<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MetaOptimize &#187; data</title>
	<atom:link href="http://metaoptimize.com/blog/tag/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://metaoptimize.com/blog</link>
	<description>building machine learning and natural language processing tools</description>
	<lastBuildDate>Mon, 23 May 2011 01:16:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Use flag --xml when you run mysqldump</title>
		<link>http://metaoptimize.com/blog/2009/10/14/use-flag-xml-when-you-run-mysqldump/</link>
		<comments>http://metaoptimize.com/blog/2009/10/14/use-flag-xml-when-you-run-mysqldump/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 22:40:17 +0000</pubDate>
		<dc:creator>Joseph Turian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://blog.metaoptimize.com/?p=60</guid>
		<description><![CDATA[

Summary:

If you have text data (like a web scrape) stored in a MySQL database, and you want to share the data, dump it to XML using mysqldump's --xml flag.

When fields are unlikely to contain tabs, an even simpler format is a tab-separated file, created using the --tab=path flag to mysqldump. path must be owned by the [...]]]></description>
			<content:encoded><![CDATA[
<h1>Summary:</h1>
<p><a href="http://flickr.com/photos/24030756@N05/2649856228" title="psychogenic womb memory-gemini project"><img src="http://farm4.static.flickr.com/3223/2649856228_61b5405cfa_t.jpg" align="right"></a></p>
<p>If you have text data (like a <a class="zem_slink" href="http://en.wikipedia.org/wiki/Screen_scraping" title="Screen scraping" rel="wikipedia">web scrape</a>) stored in a <a class="zem_slink" href="http://www.mysql.com" title="MySQL" rel="homepage">MySQL</a> database, and you want to share the data, dump it to <a class="zem_slink" href="http://en.wikipedia.org/wiki/XML" title="XML" rel="wikipedia">XML</a> using mysqldump's <tt>--xml</tt> flag.</p>
<p>When fields are unlikely to contain tabs, an even simpler format is a tab-separated file, created using the <tt>--tab=path</tt> flag to mysqldump. The directory <tt>path</tt> must be owned by the MySQL database user.
</p>
<h1>The Problem with the standard MySQL dump format</h1>
<p>The standard MySQL dump looks like this:</p>
<pre><code>INSERT INTO `sources` VALUES (1,'2009-03-07 22:06:36','"You\'ve got to be kidding me"', ...
</code></pre>
<p>The problem is that the standard dump format is difficult to interact with programmatically.</p>
<p>It is difficult to parse using <a class="zem_slink" href="http://en.wikipedia.org/wiki/Regular_expression" title="Regular expression" rel="wikipedia">regular expressions</a> because you cannot merely search for single quotes. You have to search for single quotes that are not preceded by a <a href="http://en.wikipedia.org/wiki/Backslash">backslash</a> (unless, perhaps, that backslash is preceded by a backslash).</p>
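<p>To make the escaping problem concrete, here is a minimal Python sketch of a character-by-character splitter for a single VALUES tuple. The name <tt>split_sql_values</tt> is made up for illustration, and the sketch assumes only backslash escapes inside single quotes. It does not handle NULL, doubled quotes, or several tuples in one statement, so it is an illustration of the difficulty rather than a real dump parser.</p>

```python
def split_sql_values(tuple_text):
    """Split one "(v1,'v2',...)" tuple from an INSERT line into strings.

    Illustrative only: assumes backslash escapes inside single quotes and
    ignores NULL, doubled quotes, and multi-tuple statements.
    """
    # Two-character escapes we translate; any other escaped char maps to itself.
    unescape = {"n": "\n", "t": "\t", "r": "\r", "0": "\0"}
    inner = tuple_text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    fields, current, in_quotes = [], [], False
    chars = iter(inner)
    for ch in chars:
        if in_quotes and ch == "\\":
            # A backslash consumes the next character, so \' does not
            # close the string and \\ yields a single backslash.
            nxt = next(chars, "")
            current.append(unescape.get(nxt, nxt))
        elif ch == "'":
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


row = r"""(1,'2009-03-07 22:06:36','"You\'ve got to be kidding me"')"""
print(split_sql_values(row))
```

<p>Even this toy version needs a small state machine, which is why a single regular expression over the raw dump tends to go wrong.</p>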
<p>Also, I could not find any libraries for reading the standard dump format, nor scripts for converting it into a standard format like <a class="zem_slink" href="http://en.wikipedia.org/wiki/JSON" title="JSON" rel="wikipedia">JSON</a> or XML. I asked <a href="http://www.google.com/search?q=mysql+dump+library&amp;hl=en">the oracle</a> as well as <a href="http://stackoverflow.com/questions/1568838/library-to-read-a-mysql-dump">stackoverflow</a>.</p>
<p>So if you receive a MySQL dump in the standard format, you might have to install MySQL and import the dump to get at your data.</p>
<h1>The tabbed MySQL dump format</h1>
<p>You can dump each table into a directory as a data file with one row per line and <a class="zem_slink" href="http://en.wikipedia.org/wiki/Delimiter-separated_values" title="Delimiter-separated values" rel="wikipedia">tab-separated values</a> (mysqldump also writes a <tt>.sql</tt> file per table containing its CREATE TABLE statement):</p>
<pre><code>mysqldump --tab=path database</code></pre>
<p>Here is some example output:</p>
<pre><code>1	2009-03-07 22:06:36	"You've got to be kidding me"</code></pre>
<p>If you get an error of the following form when you issue the mysqldump command:</p>
<pre><code>mysqldump: Got error: 1: Can't create/write to file 'path/database.txt' (Errcode: 13) when executing 'SELECT INTO OUTFILE'</code></pre>
<p>You can resolve this complaint by making sure that the directory <tt>path</tt> is owned by the mysql user (and also writable by the current Unix user). Thanks <a href="http://forums.mysql.com/read.php?35,172714,172766#msg-172766">JinRong Ye</a>!</p>
<p>This format is convenient if none of your data contains tabs. In <a class="zem_slink" href="http://en.wikipedia.org/wiki/Natural_language_processing" title="Natural language processing" rel="wikipedia">NLP</a>, however, it is quite possible that your text will contain tabs.</p>
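<p>When that assumption holds, reading the file back takes only a few lines. The following Python sketch assumes that no field contains a tab, a newline, or a backslash escape; <tt>read_tab_rows</tt> is a hypothetical helper name, and the sample line is the example output shown above.</p>

```python
def read_tab_rows(lines):
    """Yield each dump line as a list of column strings.

    Assumes no field contains a tab, newline, or backslash escape.
    """
    for line in lines:
        yield line.rstrip("\n").split("\t")


# One line in the shape of the example output above.
sample = ['1\t2009-03-07 22:06:36\t"You\'ve got to be kidding me"\n']
for row in read_tab_rows(sample):
    print(row)
```

<p>For a real dump you would pass an open file object for one of the per-table <tt>.txt</tt> files instead of the <tt>sample</tt> list.</p>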
<h1>The XML MySQL dump format</h1>
<p>Enter the XML MySQL dump format:</p>
<pre><code>        &lt;table_data name="sources"&gt;
        &lt;row&gt;
                &lt;field name="id"&gt;1&lt;/field&gt;
                &lt;field name="created_at"&gt;2009-03-07 22:06:36&lt;/field&gt;
                &lt;field name="text"&gt;&amp;quot;You've got to be kidding me&amp;quot;&lt;/field&gt;
</code></pre>
<p>Ah… pure bliss. You can get the XML dump format as follows:</p>
<pre><code>mysqldump --xml database</code></pre>
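<p>Because the result is ordinary XML, any XML library can read it. As a rough sketch, the Python below uses the standard <tt>xml.etree.ElementTree</tt> module to turn the <tt>table_data</tt>, <tt>row</tt>, and <tt>field</tt> elements shown above into JSON. The function names are made up for illustration, and the sketch assumes the whole dump fits in memory and makes no attempt to tell NULL apart from an empty string.</p>

```python
import json
import xml.etree.ElementTree as ET


def dump_to_tables(xml_text):
    """Map each table name to a list of {column: value} row dicts."""
    root = ET.fromstring(xml_text)
    tables = {}
    # Look for table_data elements anywhere under the root, so the
    # sketch does not depend on the exact wrapper elements.
    for table in root.iter("table_data"):
        rows = tables.setdefault(table.get("name"), [])
        for row in table.iter("row"):
            # The parser has already unescaped entities such as quotes;
            # empty field elements come back as None.
            rows.append({f.get("name"): f.text for f in row.iter("field")})
    return tables


def dump_to_json(xml_text):
    """Serialize the parsed tables as a JSON string."""
    return json.dumps(dump_to_tables(xml_text), indent=2)
```

<p>Calling <tt>dump_to_json</tt> on the text of a dump gives one JSON object keyed by table name, with every value as a string.</p>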
]]></content:encoded>
			<wfw:commentRss>http://metaoptimize.com/blog/2009/10/14/use-flag-xml-when-you-run-mysqldump/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

