<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MetaOptimize</title>
	<atom:link href="http://metaoptimize.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://metaoptimize.com</link>
	<description>Consulting on big data, machine learning, and natural language processing</description>
	<lastBuildDate>Sat, 10 Mar 2012 05:35:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Lean Startup, and The Stooges</title>
		<link>http://metaoptimize.com/newblog2/lean-startup-and-the-stooges/</link>
		<comments>http://metaoptimize.com/newblog2/lean-startup-and-the-stooges/#comments</comments>
		<pubDate>Sat, 10 Mar 2012 05:35:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://metaoptimize.com/?p=190</guid>
		<description><![CDATA[Okay, I&#8217;m ready. After reading a handful of articles making tenuous connections between entrepreneurship and music, including : The Notorious CEO: Ten Startup Commandments from Biggie Smalls Being like The Sex Pistols can help your startup? I&#8217;ve decided to come out and share my favorite startup music. Dirt, by The Stooges, is a proto-punk cut [...]]]></description>
			<content:encoded><![CDATA[<p>Okay, I&#8217;m ready.</p>
<p>After reading a handful of articles making tenuous connections between entrepreneurship and music, including:</p>
<ul>
<li><a href="http://themetricsystem.rjmetrics.com/2009/08/10/the-notorious-ceo-ten-startup-commandments-from-biggie-smalls/">The Notorious CEO: Ten Startup Commandments from Biggie Smalls</a></li>
<li><a href="http://blog.smartupz.com/2010/03/being-like-sex-pistols-can-help-your.html">Being like The Sex Pistols can help your startup?</a></li>
</ul>
<p>I&#8217;ve decided to come out and share my favorite startup music.</p>
<p>Dirt, by <a href="http://en.wikipedia.org/wiki/The_Stooges">The Stooges</a>, is a <a href="http://www.allmusic.com/cg/amg.dll?p=amg&#038;sql=77:2698">proto-punk</a> cut that sprawls for seven minutes, brooding and smoldering. It never climaxes or burns out; it just persists and drives forward.</p>
<p>Anyway, I believe this song should be the mantra for bootstrappers, in particular those who practice the <a href="http://www.startuplessonslearned.com/">lean</a> <a href="http://groups.google.com/group/lean-startup-circle?pli=1">startup</a> <a href="http://leanstartup.pbworks.com/">methodology</a>.</p>
<ul>
<i>Ooh, I been dirt / And I don&#8217;t care / Cause I&#8217;m burning inside / I&#8217;m just a yearning inside / And I&#8217;m the fire o&#8217; life.</i>
</ul>
<p>Without further ado, <b>DIRT</b>:</p>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/zxYXV2RrwIs&#038;hl=en_US&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/zxYXV2RrwIs&#038;hl=en_US&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://metaoptimize.com/newblog2/lean-startup-and-the-stooges/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Constitution for Governance of Open-Source Projects (v20100227)</title>
		<link>http://metaoptimize.com/newblog2/constitution-for-governance-of-open-source-projects-v20100227/</link>
		<comments>http://metaoptimize.com/newblog2/constitution-for-governance-of-open-source-projects-v20100227/#comments</comments>
		<pubDate>Sat, 10 Mar 2012 05:35:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://metaoptimize.com/?p=189</guid>
		<description><![CDATA[Summary I propose a default &#8220;Constitution for Governance of Open-Source Projects&#8221;. Background I recently got involved in the OSQA project, which is a fork of CNPROG, which in turn is a clone of the StackExchange Q&#038;A forum software. Note that the OSQA project has no formal &#8220;homepage&#8221;, or instructions on how to get involved. I [...]]]></description>
			<content:encoded><![CDATA[<h1>Summary</h1>
<p>I propose a default &#8220;Constitution for Governance of Open-Source Projects&#8221;.</p>
<hr />
<h1>Background</h1>
<p>I recently got involved in the <a href="http://osqa.net/question/2/where-can-i-get-the-source-code-for-osqa">OSQA</a> project, which is a fork of <a href="http://github.com/cnprog/CNPROG">CNPROG</a>, which in turn is a clone of the <a href="http://stackexchange.com/">StackExchange</a> Q&#038;A forum software.</p>
<p>Note that the OSQA project has no formal &#8220;homepage&#8221;, or instructions on how to get involved. I only discovered by chance that there is a mailing list (unarchived) and a developer chat room. Nor was it immediately clear which OSQA GitHub fork one should use.</p>
<p>This is because OSQA grew organically from one contributor to a handful, and developer involvement was an afterthought in this project. Not that there is anything wrong with that.<br />
However, now that a handful of people are involved in the project, and <a href="http://osqa.net/questions/unanswered/">more people are trying to get involved</a>, we have begun discussing governance and decision-making policies on the mailing list. In fact,<br />
<a href="http://nmrwiki.org/">Evgeny Fadeev</a> poses this very question on <a href="http://stackoverflow.com/questions/2328631/how-to-achieve-effective-democratic-governance-for-an-open-source-project">StackOverflow</a>, and proposes some potential answers.</p>
<p>I believe that, by default, there are some simple but clear principles that should be enunciated. I hereby propose my</p>
<h1>Constitution for Governance of Open-Source Projects (v20100227)</h1>
<p>Let it be affirmed that the primary goal in instituting governance of an open-source project be to ensure the long-term health of the project.</p>
<p>Accordingly, the default bias should be towards openness and inclusiveness.<br />
However, policy should be changed as issues present themselves, in order to maintain the long-term health of the project.</p>
<p>For the model of decision making, we favor a &#8220;do-ocracy&#8221;.<br />
The people who contribute the most generally command the respect of the community.<br />
Alienating them is the best way to derail the project.</p>
<p>The repository should be open to committers, given that commits can easily be reverted and commit access easily revoked. This is preferable to alienating potential committers.</p>
<p>To ensure transparency for developers new and old, and to allow them to decide their involvement in a project based upon the history of the project, there should be transparency and openness in the inner workings of the project. For example, the email archive should be public.</p>
<p>Lastly, let us remember that too much red-tape gets in the way of progress. So red-tape and other barriers to contribution should be avoided, and only added as issues present themselves.</p>
<p>This Constitution can and should be amended as issues present themselves.</p>
<p>Therefore be it resolved.</p>
]]></content:encoded>
			<wfw:commentRss>http://metaoptimize.com/newblog2/constitution-for-governance-of-open-source-projects-v20100227/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why can&#8217;t you pickle generators in Python? A pattern for saving training state</title>
		<link>http://metaoptimize.com/newblog2/why-cant-you-pickle-generators-in-python-a-pattern-for-saving-training-state/</link>
		<comments>http://metaoptimize.com/newblog2/why-cant-you-pickle-generators-in-python-a-pattern-for-saving-training-state/#comments</comments>
		<pubDate>Sat, 10 Mar 2012 05:24:45 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://metaoptimize.com/?p=183</guid>
		<description><![CDATA[Summary A pattern for persisting generators is to turn them into pickle-able class objects. This is useful when you use generators for streaming training examples. I would also try generator_tools, which might be a more convenient alternative to the pattern I describe. I haven&#8217;t used it yet. Generators for streaming training examples For machine learning, [...]]]></description>
			<content:encoded><![CDATA[<h1>Summary</h1>
<p><a href="http://flickr.com/photos/28402283@N07/3186143355" title="Moon Rise behind the San Gorgonio Pass Wind Farm"><img align=right src="http://farm4.static.flickr.com/3118/3186143355_4840fb7620_t.jpg" /></a></p>
<p>A pattern for persisting generators is to turn them into pickle-able class objects. This is useful when you use generators for streaming training examples.</p>
<p>I would also try <a href="http://www.fiber-space.de/generator_tools/doc/generator_tools.html">generator_tools</a>, which might be a more convenient alternative to the pattern I describe. I haven&#8217;t used it yet.</p>
<hr />
<h2>Generators for streaming training examples</h2>
<p>For machine learning, Python <a href="http://www.ibm.com/developerworks/library/l-pycon.html">generators</a> are a simple idiom that makes it easy to generate a stream of training examples. Moreover, you can nest generators:</p>
<ul>
<li>The inner generator can be used to read one example at a time.</li>
<li>The outer generator can be used to read examples from the inner generator until you have a full minibatch, and then yield this minibatch.</li>
</ul>
<p>Here is some example code:</p>
<p>[Update: The example holds without the ALL CAPS magic variable names, "HYPERPARAMETERS". However, I include HYPERPARAMETERS because I am including the actual code I am using. Hyperparameters are global, read-only variables that specify the particular experimental condition being tested. I can't say that I have the best solution to this particular aspect of experimental control (hyperparameters). I might write a blog post about it in the future, to solicit feedback on improved methods. However, I have refined my current approach over several years, and I can assure you that it is far less painful than a handful of more "clean" approaches.]</p>
<pre>def get_train_example():
    """Yield one window of WINDOW_SIZE word ids at a time."""
    HYPERPARAMETERS = common.hyperparameters.read("language-model")

    from vocabulary import wordmap
    for l in myopen(HYPERPARAMETERS["TRAIN_SENTENCES"]):
        prevwords = []
        # split() with no arguments also strips whitespace around each word.
        for w in l.split():
            if wordmap.exists(w):
                prevwords.append(wordmap.id(w))
                if len(prevwords) >= HYPERPARAMETERS["WINDOW_SIZE"]:
                    yield prevwords[-HYPERPARAMETERS["WINDOW_SIZE"]:]
            else:
                # Unknown word: drop the current window and start over.
                prevwords = []

def get_train_minibatch():
    """Group examples from get_train_example() into full minibatches."""
    HYPERPARAMETERS = common.hyperparameters.read("language-model")
    minibatch = []
    for e in get_train_example():
        minibatch.append(e)
        if len(minibatch) >= HYPERPARAMETERS["MINIBATCH SIZE"]:
            assert len(minibatch) == HYPERPARAMETERS["MINIBATCH SIZE"]
            yield minibatch
            minibatch = []
</pre>
<h2>You can&#8217;t persist training state by pickling your generators</h2>
<p>However, generators become problematic when you want to persist your experiment&#8217;s state in order to later restart training at the same place. Unfortunately, <a href="http://bugs.python.org/issue1092962">you can&#8217;t pickle generators in Python</a>. And it can be a bit of a <a href="http://en.wiktionary.org/wiki/pain_in_the_ass">PITA</a> to work around this, in order to save the training state.</p>
<h2>Pattern to work around this annoyance</h2>
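<p>The failure is easy to reproduce. The following is a minimal sketch, written for Python 3 (an assumption, since the rest of this post uses Python 2); the generator count_up is a made-up stand-in for a stream of training examples:</p>
<pre>
import pickle

def count_up(limit):
    # Hypothetical toy generator standing in for a stream of training examples.
    n = 0
    while n != limit:
        yield n
        n += 1

stream = count_up(3)
first = next(stream)  # advance the generator so it has some state worth saving

try:
    pickle.dumps(stream)
    pickled_ok = True
except TypeError as err:
    # CPython refuses to serialize generator objects.
    pickled_ok = False
    error_message = str(err)

print(pickled_ok)
</pre>
<p>The position inside the loop lives in the suspended generator frame, which pickle has no way to serialize. That is why the state has to be moved into ordinary attributes of an object.</p>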
<p>Following useful discussion on <a href="http://groups.google.com/group/pylearn-dev/browse_thread/thread/c4e4dd3496bbbf08">pylearn-dev</a> and stackoverflow <a href="http://stackoverflow.com/questions/1942328/add-a-member-variable-method-to-a-python-generator">[1]</a> <a href="http://stackoverflow.com/questions/1939015/singleton-python-generator-or-pickle-a-python-generator">[2]</a>, I propose the following pattern for converting generators to pickle-able class objects:</p>
<ol>
<li>Convert the generator to a class in which the generator code is the <a href="http://stackoverflow.com/questions/1942328/add-a-member-variable-method-to-a-python-generator/1942387#1942387">__iter__</a> method</li>
<li>Add <a href="http://docs.python.org/library/pickle.html#object.__getstate__">__getstate__</a> and <a href="http://docs.python.org/library/pickle.html#object.__setstate__">__setstate__</a> methods to the class, to handle pickling. Remember that you can&#8217;t pickle file objects, so __setstate__ will have to re-open files as necessary.</li>
</ol>
<p>Here is the updated code, after applying this pattern:</p>
<pre>
import sys

class TrainingExampleStream(object):
    def __init__(self):
        # Set the state variables, in case pickling happens before __iter__ is called.
        self.filename = None
        self.count = 0
        # After unpickling, holds an iterator already advanced to the saved position.
        self.restored = None

    def __iter__(self):
        # Hand out the fast-forwarded iterator once; otherwise start from the top.
        if self.restored is not None:
            it, self.restored = self.restored, None
            return it
        return self.examples()

    def examples(self):
        HYPERPARAMETERS = common.hyperparameters.read("language-model")
        from vocabulary import wordmap
        self.filename = HYPERPARAMETERS["TRAIN_SENTENCES"]
        self.count = 0
        for l in myopen(self.filename):
            prevwords = []
            for w in l.split():
                if wordmap.exists(w):
                    prevwords.append(wordmap.id(w))
                    if len(prevwords) >= HYPERPARAMETERS["WINDOW_SIZE"]:
                        self.count += 1
                        yield prevwords[-HYPERPARAMETERS["WINDOW_SIZE"]:]
                else:
                    prevwords = []

    def __getstate__(self):
        return self.filename, self.count

    def __setstate__(self, state):
        """
        @warning: We ignore the filename.  If we wanted
        to be really fastidious, we would assume that
        HYPERPARAMETERS["TRAIN_SENTENCES"] might change.  The only
        problem is that if we change filesystems, the filename
        might change just because the base file is in a different
        path. So we issue a warning if the filename is different from what is expected.
        """
        filename, count = state
        sys.stderr.write("__setstate__(%r)...\n" % (state,))
        HYPERPARAMETERS = common.hyperparameters.read("language-model")
        # Unpickling does not call __init__, so set the state variables here.
        self.filename = HYPERPARAMETERS["TRAIN_SENTENCES"]
        self.count = 0
        # Re-open the file and skip the examples that were already consumed.
        self.restored = self.examples()
        while count != self.count:
            next(self.restored)
        if filename is not None and self.filename != filename:
            sys.stderr.write("self.filename %s != filename given to __setstate__ %s\n" % (self.filename, filename))
        sys.stderr.write("...__setstate__(%r)\n" % (state,))

class TrainingMinibatchStream(object):
    def __init__(self):
        # Create the inner stream up front, so that __getstate__ works before __iter__.
        self.get_train_example = TrainingExampleStream()

    def __iter__(self):
        HYPERPARAMETERS = common.hyperparameters.read("language-model")
        minibatch = []
        # Reuse the inner stream, so that a restored position is not thrown away.
        for e in self.get_train_example:
            minibatch.append(e)
            if len(minibatch) >= HYPERPARAMETERS["MINIBATCH SIZE"]:
                assert len(minibatch) == HYPERPARAMETERS["MINIBATCH SIZE"]
                yield minibatch
                minibatch = []

    def __getstate__(self):
        return (self.get_train_example.__getstate__(),)

    def __setstate__(self, state):
        """
        @warning: We ignore the filename.
        """
        self.get_train_example = TrainingExampleStream()
        self.get_train_example.__setstate__(state[0])
</pre>
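<p>To check the idea end to end, here is a minimal, self-contained sketch of the same pattern. It is written for Python 3 rather than the Python 2 of the code above, and the class name ResumableLineStream and the throwaway temp file are made up for illustration. It also makes one simplifying choice of its own: __iter__ skips the lines that were already counted, instead of fast-forwarding inside __setstate__. Treat it as an illustration under those assumptions, not as the code I actually run.</p>
<pre>
import os
import pickle
import tempfile

class ResumableLineStream(object):
    """Hypothetical example class: yields stripped lines and remembers its position."""

    def __init__(self, filename):
        self.filename = filename
        self.count = 0  # number of lines already yielded

    def __iter__(self):
        with open(self.filename) as f:
            for index, line in enumerate(f):
                if self.count > index:
                    continue  # already consumed before the last save
                self.count += 1
                yield line.strip()

    def __getstate__(self):
        # Only plain data goes into the pickle; the open file handle never does.
        return (self.filename, self.count)

    def __setstate__(self, state):
        # The file is re-opened lazily, the next time __iter__ runs.
        self.filename, self.count = state

# Write a small throwaway file to stream from.
handle, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as f:
    f.write("a\nb\nc\nd\n")

stream = ResumableLineStream(path)
it = iter(stream)
seen = [next(it), next(it)]    # consume two examples
saved = pickle.dumps(stream)   # persist the position, not the generator
it.close()                     # release the file held by the suspended generator

restored = pickle.loads(saved)
remaining = list(restored)     # picks up at the third line
os.remove(path)

print(seen, remaining)
</pre>
<p>Skipping inside __iter__ keeps __setstate__ trivial, at the cost of re-reading the consumed prefix of the file on the first iteration after a restore. For very large training files, storing a byte offset and seeking to it would presumably be cheaper, but that only works when examples map cleanly onto file positions.</p>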
]]></content:encoded>
			<wfw:commentRss>http://metaoptimize.com/newblog2/why-cant-you-pickle-generators-in-python-a-pattern-for-saving-training-state/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Welcome to MetaOptimize, a consultancy on Big Data, NLP, and ML</title>
		<link>http://metaoptimize.com/hompeage/welcome-to-metaoptimize-a-consultancy-on-big-data-nlp-and-ml/</link>
		<comments>http://metaoptimize.com/hompeage/welcome-to-metaoptimize-a-consultancy-on-big-data-nlp-and-ml/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 22:21:49 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Homepage]]></category>

		<guid isPermaLink="false">http://metaoptimize.com/new/?p=47</guid>
		<description><![CDATA[Welcome to MetaOptimize, a consultancy on Big Data, NLP, and ML We provide best-of-breed strategic consulting and technical implementations. We work in technology, healthcare, finance, and other data-driven verticals. Please read more about the solutions we offer, and contact us for more information about how we can help you.]]></description>
			<content:encoded><![CDATA[<h1>Welcome to MetaOptimize, a consultancy on Big Data, NLP, and ML</h1>
<p><span style="font-size: medium;">We provide best-of-breed strategic consulting and technical implementations.</span></p>
<p><span style="font-size: medium;">We work in technology, healthcare, finance, and other data-driven verticals.</span></p>
<p><span style="font-size: medium;">Please read more about the <a href="solutions">solutions</a> we offer, and <a href="contact-us">contact us</a> for more information about how we can help you.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://metaoptimize.com/hompeage/welcome-to-metaoptimize-a-consultancy-on-big-data-nlp-and-ml/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

