<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Fast deserialization in Python</title>
	<atom:link href="http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/feed/" rel="self" type="application/rss+xml" />
	<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/</link>
	<description>building machine learning and natural language processing tools</description>
	<lastBuildDate>Mon, 30 Jan 2012 04:05:03 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Ricardo Barroso</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-1081</link>
		<dc:creator>Ricardo Barroso</dc:creator>
		<pubDate>Sun, 06 Feb 2011 01:55:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-1081</guid>
		<description>&lt;span class=&quot;topsy_trackback_comment&quot;&gt;&lt;span class=&quot;topsy_twitter_username&quot;&gt;&lt;span class=&quot;topsy_trackback_content&quot;&gt;Fast Deserialization in Python (Performance Comparison): http://bit.ly/hpMVs1
#python #WebDev #JSON #XML (RT @turian)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;</description>
		<content:encoded><![CDATA[<p><span class="topsy_trackback_comment"><span class="topsy_twitter_username"><span class="topsy_trackback_content">Fast Deserialization in Python (Performance Comparison): <a href="http://bit.ly/hpMVs1" rel="nofollow">http://bit.ly/hpMVs1</a><br />
#python #WebDev #JSON #XML (RT @turian)</span></span></span></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Turian</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-701</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Tue, 15 Jun 2010 20:48:49 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-701</guid>
		<description>Aigars, the question is not which is the most compact data set, but which is the fastest to read in (deserialize). Text processing with native Python tends to be much slower than using C, so I would be surprised if your proposal is faster than a JSON library with C implementation. However, I encourage you to post benchmarks that prove me wrong!</description>
		<content:encoded><![CDATA[<p>Aigars, the question is not which is the most compact data set, but which is the fastest to read in (deserialize). Text processing with native Python tends to be much slower than using C, so I would be surprised if your proposal is faster than a JSON library with C implementation. However, I encourage you to post benchmarks that prove me wrong!</p>
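<p>A benchmark of the kind requested here could start from something like the sketch below. It is only an illustration: the four-field record, the names <code>read_json</code> and <code>read_csv</code>, and the repetition count are assumptions, and the standard library <code>json</code> module stands in for whichever JSON library is being compared. No timings are claimed; run it on your own data.</p>

```python
import json
import timeit

# Hypothetical record: two strings, an int count, and a float weight.
record = ["the propos delet", "the proposed deletion", 3590, 7180.0]
json_text = json.dumps(record)
csv_text = ",".join(str(x) for x in record)


def read_json():
    return json.loads(json_text)


def read_csv():
    # Hand-written reader: the structure lives in the code, not in the data.
    stem, canonical, count, total = csv_text.split(",")
    return [stem, canonical, int(count), float(total)]


# Both readers must produce the same value, or the timing is meaningless.
assert read_json() == read_csv()

json_seconds = timeit.timeit(read_json, number=20000)
csv_seconds = timeit.timeit(read_csv, number=20000)
print("json.loads: %.3fs   split-based: %.3fs" % (json_seconds, csv_seconds))
```

<p>Checking that both readers return equal values before timing them keeps the comparison honest: a reader that skips type conversion would look faster without doing the same work.</p>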
]]></content:encoded>
	</item>
	<item>
		<title>By: Aigars Mahinovs</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-700</link>
		<dc:creator>Aigars Mahinovs</dc:creator>
		<pubDate>Tue, 15 Jun 2010 13:15:04 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-700</guid>
		<description>Please try creating a custom reader/writer in Python (if you don&#039;t want to bother with C). If your data structure is so limited and is not recursive, then you should be able to express it very easily as a simple comma-separated value line (one line per vocabulary term).&lt;br&gt;&lt;br&gt;It could look like this:&lt;br&gt;&lt;br&gt;the propos delet,the proposed deletion,3590,7180.0,the proposed deletion,7153.333333333333,the proposed deletions,13.666666666666666,The proposed deletion,12.0,the proposed deletes,1.0&lt;br&gt;&lt;br&gt;And that is it. This will be the most compact storage format, because all the repeated data that describes the structure of the dicts and lists inside a term will live in the code that parses it. I believe this might be faster than the json read function.</description>
		<content:encoded><![CDATA[<p>Please try creating a custom reader/writer in Python (if you don’t want to bother with C). If your data structure is so limited and is not recursive, then you should be able to express it very easily as a simple comma-separated value line (one line per vocabulary term).</p>
<p>It could look like this:</p>
<p>the propos delet,the proposed deletion,3590,7180.0,the proposed deletion,7153.333333333333,the proposed deletions,13.666666666666666,The proposed deletion,12.0,the proposed deletes,1.0</p>
<p>And that is it. This will be the most compact storage format, because all the repeated data that describes the structure of the dicts and lists inside a term will live in the code that parses it. I believe this might be faster than the json read function.</p>
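<p>A reader for a line like the one above might look roughly like this. It is a minimal sketch, not a tested format: the field names (<code>stem</code>, <code>canonical</code>, <code>count</code>, <code>total</code>, <code>variants</code>) are guesses at what the columns mean, and it assumes that no term ever contains a comma.</p>

```python
def parse_term_line(line):
    """Parse one comma-separated vocabulary line into a dict.

    Assumed layout: stem, canonical form, integer count, float total,
    then alternating (surface form, weight) pairs.
    """
    fields = line.rstrip("\n").split(",")
    stem, canonical, count, total = fields[:4]
    rest = fields[4:]
    if len(rest) % 2 != 0:
        raise ValueError("expected (form, weight) pairs after the first four fields")
    # Remaining fields alternate: surface form, then its weight.
    variants = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
    return {
        "stem": stem,
        "canonical": canonical,
        "count": int(count),
        "total": float(total),
        "variants": variants,
    }


line = (
    "the propos delet,the proposed deletion,3590,7180.0,"
    "the proposed deletion,7153.333333333333,"
    "the proposed deletions,13.666666666666666,"
    "The proposed deletion,12.0,the proposed deletes,1.0"
)
term = parse_term_line(line)
```

<p>The cost of the compactness is visible here: the parser silently depends on column order, and a term containing a comma would need quoting or a different delimiter.</p>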
]]></content:encoded>
	</item>
	<item>
		<title>By: Nir</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-36</link>
		<dc:creator>Nir</dc:creator>
		<pubDate>Tue, 10 Nov 2009 08:41:47 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-36</guid>
		<description>It seems that Bob Ippolito has fixed the simplejson slowness.
Retry with the latest version.</description>
		<content:encoded><![CDATA[<p>It seems that Bob Ippolito has fixed the simplejson slowness.<br />
Retry with the latest version.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Millikin</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-673</link>
		<dc:creator>John Millikin</dc:creator>
		<pubDate>Tue, 24 Mar 2009 18:59:48 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-673</guid>
		<description>(reposting a comment from Hacker News, at Joseph Turian&#039;s request)&lt;br&gt;&lt;br&gt;I&#039;m the author of jsonlib, and I registered specifically to post this message. Please, please, please do not use cjson!&lt;br&gt;&lt;br&gt;First, it is unmaintained. The latest version available was posted on August 24, 2007. When you encounter one of its myriad bugs, you&#039;ll either have to patch it yourself or pick another JSON library. Just skip the intermediate step and use another library to begin with.&lt;br&gt;&lt;br&gt;Second, it is buggy. In some cases, parsing text it just generated will return a different value from what you passed in! It&#039;s almost entirely ignorant of Unicode, and what little it tries to parse it gets wrong.&lt;br&gt;&lt;br&gt;Third, it&#039;s exceedingly non-compliant. The text it parses and generates bears only a passing resemblance to JSON. There are varying degrees of conformance to the spec between libraries, based on personal preference of the authors -- I prefer strict conformance, others less strict -- but cjson is so different as to be simply unusable.&lt;br&gt;&lt;br&gt;Yes, it&#039;s fast. I know. I wrote jsonlib partly because I was unsatisfied with simplejson&#039;s performance, and one goal (never truly achieved) was always to surpass cjson. However, speed isn&#039;t everything. As the saying goes, &quot;if I want my math performed fast and wrong I&#039;ll ask my cat&quot;.&lt;br&gt;&lt;br&gt;In my opinion, the only Python JSON libraries worth considering are:&lt;br&gt;&lt;br&gt;* simplejson -- it&#039;s in the standard library, and should therefore be considered first and most thoroughly.&lt;br&gt;&lt;br&gt;* jsonlib -- it&#039;s fast, well-tested, and standards-compliant.&lt;br&gt;&lt;br&gt;* demjson -- has several options for reliable parsing of invalid input.&lt;br&gt;&lt;br&gt;Last time I checked, jsonlib and simplejson&#039;s C extensions are neck-and-neck performance-wise. 
In some quick, unscientific tests, jsonlib reads faster and simplejson writes faster. However, simplejson&#039;s extensions are only used for certain subsets of input -- if you want to use an uncommon feature, performance will degrade. jsonlib has an implementation in pure C, which avoids this problem at the cost of complexity.&lt;br&gt;&lt;br&gt;Apologies for the brain-dump, but even if you skip right over it, please remember: don&#039;t use cjson.</description>
		<content:encoded><![CDATA[<p>(reposting a comment from Hacker News, at Joseph Turian’s request)</p>
<p>I’m the author of jsonlib, and I registered specifically to post this message. Please, please, please do not use cjson!</p>
<p>First, it is unmaintained. The latest version available was posted on August 24, 2007. When you encounter one of its myriad bugs, you’ll either have to patch it yourself or pick another JSON library. Just skip the intermediate step and use another library to begin with.</p>
<p>Second, it is buggy. In some cases, parsing text it just generated will return a different value from what you passed in! It’s almost entirely ignorant of Unicode, and what little it tries to parse it gets wrong.</p>
<p>Third, it’s exceedingly non-compliant. The text it parses and generates bears only a passing resemblance to JSON. There are varying degrees of conformance to the spec between libraries, based on personal preference of the authors — I prefer strict conformance, others less strict — but cjson is so different as to be simply unusable.</p>
<p>Yes, it’s fast. I know. I wrote jsonlib partly because I was unsatisfied with simplejson’s performance, and one goal (never truly achieved) was always to surpass cjson. However, speed isn’t everything. As the saying goes, “if I want my math performed fast and wrong I’ll ask my cat”.</p>
<p>In my opinion, the only Python JSON libraries worth considering are:</p>
<p>* simplejson — it’s in the standard library, and should therefore be considered first and most thoroughly.</p>
<p>* jsonlib — it’s fast, well-tested, and standards-compliant.</p>
<p>* demjson — has several options for reliable parsing of invalid input.</p>
<p>Last time I checked, jsonlib and simplejson’s C extensions are neck-and-neck performance-wise. In some quick, unscientific tests, jsonlib reads faster and simplejson writes faster. However, simplejson’s extensions are only used for certain subsets of input — if you want to use an uncommon feature, performance will degrade. jsonlib has an implementation in pure C, which avoids this problem at the cost of complexity.</p>
<p>Apologies for the brain-dump, but even if you skip right over it, please remember: don’t use cjson.</p>
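<p>The round-trip problem described above (parsing text a library just generated and getting back a different value) is easy to check for. The sketch below uses the standard library <code>json</code> module as a stand-in; the helper name <code>round_trips</code> and the sample values are illustrative assumptions, and to test another library you would pass in its own encode and decode functions.</p>

```python
import json


def round_trips(value, dumps=json.dumps, loads=json.loads):
    """Return True if decoding the encoded value gives back the same value."""
    return loads(dumps(value)) == value


samples = [
    "caf\u00e9",                 # non-ASCII text
    "a/b",                       # forward slash
    "line\nbreak \"quoted\"",    # characters that must be escaped
    {"count": 3590, "weight": 7180.0, "forms": ["x", "y"]},
]

# Collect every sample that does not survive encode-then-decode.
failures = [s for s in samples if not round_trips(s)]
print("round-trip failures:", failures)
```

<p>A check like this only covers the samples you give it, so it is worth adding values drawn from your real data, especially Unicode strings.</p>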
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Turian</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-8</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Mon, 23 Mar 2009 16:50:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-8</guid>
		<description>I am excited for a faster protobuf. In particular, haberman&#039;s &lt;a href=&quot;http://github.com/haberman/pbstream/tree/master&quot; rel=&quot;nofollow&quot;&gt;C extensions&lt;/a&gt; look promising.&lt;br&gt;&lt;br&gt;Compactness is very important for transferring data over a network.&lt;br&gt;However, during the development cycle, human readability is important and often overlooked. If all you need to do to read your data is type &#039;zcat&#039;, you are much more likely to be looking at your data, and hence more likely to catch bugs.</description>
		<content:encoded><![CDATA[<p>I am excited for a faster protobuf. In particular, haberman’s <a href="http://github.com/haberman/pbstream/tree/master" rel="nofollow">C extensions</a> look promising.</p>
<p>Compactness is very important for transferring data over a network.<br />However, during the development cycle, human readability is important and often overlooked. If all you need to do to read your data is type ‘zcat’, you are much more likely to be looking at your data, and hence more likely to catch bugs.</p>
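<p>One way to keep data readable with ‘zcat’ is to write one JSON document per line into a gzip file. This is only a sketch under assumptions: the file name <code>vocab.json.gz</code> and the record fields are made up for illustration, and the standard library <code>json</code> and <code>gzip</code> modules are used rather than any particular fast JSON library.</p>

```python
import gzip
import json
import os
import tempfile

# Hypothetical records; the field names are illustrative only.
records = [
    {"term": "the proposed deletion", "count": 3590},
    {"term": "the proposed deletions", "count": 13},
]

path = os.path.join(tempfile.mkdtemp(), "vocab.json.gz")

# One JSON document per line, so `zcat vocab.json.gz | head` shows whole records.
with gzip.open(path, "wt", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading back: decompress and decode line by line.
with gzip.open(path, "rt", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```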
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-7</link>
		<dc:creator>Justin</dc:creator>
		<pubDate>Mon, 23 Mar 2009 12:11:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-7</guid>
		<description>Nice writeup :-)  Good to see that you get the same results on a more complicated data structure.&lt;br&gt;&lt;br&gt;I still have high hopes for protobuf: it can get faster, but json can&#039;t get any smaller.  At some point protobuf will be both the fastest and most compact method.</description>
		<content:encoded><![CDATA[<p>Nice writeup <img src='http://metaoptimize.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />   Good to see that you get the same results on a more complicated data structure.</p>
<p>I still have high hopes for protobuf: it can get faster, but json can’t get any smaller.  At some point protobuf will be both the fastest and most compact method.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jasper Spaans</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-6</link>
		<dc:creator>Jasper Spaans</dc:creator>
		<pubDate>Mon, 23 Mar 2009 09:06:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-6</guid>
		<description>Check if the slower simplejson install does something with locales? I&#039;ve seen grep go really slow when trying to do utf-8 stuff, which disappeared after setting LANG=C / LC_ALL=C...</description>
		<content:encoded><![CDATA[<p>Check if the slower simplejson install does something with locales? I’ve seen grep go really slow when trying to do utf-8 stuff, which disappeared after setting LANG=C / LC_ALL=C…</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Turian</title>
		<link>http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/comment-page-1/#comment-5</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Mon, 23 Mar 2009 04:58:47 +0000</pubDate>
		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=5#comment-5</guid>
		<description>According to &lt;a href=&quot;http://kbyanc.blogspot.com/2007/07/python-serializer-benchmarks.html&quot; rel=&quot;nofollow&quot;&gt;Extra Cheese&lt;/a&gt;, cjson has an incompatibility with simplejson in processing slashes. A fix is available from &lt;a href=&quot;http://www.vazor.com/cjson.html&quot; rel=&quot;nofollow&quot;&gt;vazor&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>According to <a href="http://kbyanc.blogspot.com/2007/07/python-serializer-benchmarks.html" rel="nofollow">Extra Cheese</a>, cjson has an incompatibility with simplejson in processing slashes. A fix is available from <a href="http://www.vazor.com/cjson.html" rel="nofollow">vazor</a>.</p>
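<p>For context on the slash issue: JSON allows a forward slash inside a string to be written either bare or escaped with a backslash, and a conforming parser decodes both to the same character. The sketch below shows this with the standard library <code>json</code> module only; it does not reproduce cjson's behaviour, and the URL is a made-up example.</p>

```python
import json

# JSON permits "/" to appear either bare or escaped as "\/"; both decode to "/".
escaped = json.loads(r'"http:\/\/example.com\/a"')
bare = json.loads('"http://example.com/a"')

# The standard library encoder writes the bare form.
encoded = json.dumps("a/b")

print(escaped == bare, encoded)
```

<p>Two libraries that disagree on this escape can fail to read each other's output, which is why it matters when mixing encoders and decoders.</p>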
]]></content:encoded>
	</item>
</channel>
</rss>

