Category Archives: Uncategorized

Lean Startup, and The Stooges

Okay, I’m ready.
After reading a handful of articles making tenuous connections between entrepreneurship and music, including:

The Notorious CEO: Ten Startup Commandments from Biggie Smalls
Being like The Sex Pistols can help your startup?

I’ve decided to come out and share my favorite startup music.
Dirt, by The Stooges, is a proto-punk cut that sprawls for seven minutes, brooding and smoldering. It

Constitution for Governance of Open-Source Projects (v20100227)

Summary
I propose a default “Constitution for Governance of Open-Source Projects”.

Background
I recently got involved in the OSQA project, which is a fork of CNPROG, which in turn is a clone of the StackExchange Q&A forum software.
Note that the OSQA project has no formal “homepage” or instructions on how to get involved. I only discovered by chance that there is a mailing list

Why can’t you pickle generators in Python? A pattern for saving training state

Summary

A pattern for persisting generators is to turn them into pickle-able class objects. This is useful when you use generators for streaming training examples.
I would also try generator_tools, which might be a more convenient alternative to the pattern I describe. I haven’t used it yet.
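
As a rough illustration of the pattern, here is a minimal sketch in which the state a generator would keep hidden in its frame is moved into instance attributes, so the whole object can be pickled mid-stream. The class name `ExampleStream` and its `position` attribute are made up for this example, and the in-memory list stands in for whatever source the training examples really come from.

```python
import pickle


class ExampleStream:
    """Iterator over training examples whose position survives pickling.

    Hypothetical stand-in for a generator function: the loop counter that a
    generator would hide in its frame is stored as a plain attribute instead.
    """

    def __init__(self, examples):
        self.examples = list(examples)
        self.position = 0  # index of the next example to yield

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.examples):
            raise StopIteration
        example = self.examples[self.position]
        self.position += 1
        return example


stream = ExampleStream(["a", "b", "c", "d"])
first = next(stream)
second = next(stream)

# Save the training state mid-stream, then restore it.
saved = pickle.dumps(stream)
restored = pickle.loads(saved)

# The restored stream resumes where the original left off.
remaining = list(restored)
```

For a stream backed by a large file, the same idea would presumably store a file name and byte offset rather than the examples themselves.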

Generators for streaming training examples
For machine learning, Python generators are a simple idiom that makes it

Use the --xml flag when you run mysqldump

Summary:

If you have text data (like a web scrape) stored in a MySQL database, and you want to share the data, mysqldump to XML using the --xml flag.

When fields are unlikely to contain tabs, an even simpler format is a tab-separated file, created using the --tab=path flag to mysqldump. path must be a directory owned by the MySQL database user.
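
To show what the recipient of such a dump might do with it, here is a small sketch that reads rows back out of the XML using only the Python standard library. The embedded sample is modeled on the mysqldump / database / table_data / row / field layout that --xml output uses; the database name `scrape`, the table name `pages`, and the helper `iter_rows` are made up for illustration, so check the element names against a real dump before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the output of mysqldump --xml.
SAMPLE_DUMP = """<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="scrape">
  <table_data name="pages">
  <row>
    <field name="id">1</field>
    <field name="body">Hello, world</field>
  </row>
  <row>
    <field name="id">2</field>
    <field name="body" xsi:nil="true" />
  </row>
  </table_data>
</database>
</mysqldump>
"""


def iter_rows(xml_text, table):
    """Yield one dict per row of the named table; NULL fields become None."""
    root = ET.fromstring(xml_text)
    for table_data in root.iter("table_data"):
        if table_data.get("name") != table:
            continue
        for row in table_data.iter("row"):
            yield {field.get("name"): field.text for field in row.iter("field")}


rows = list(iter_rows(SAMPLE_DUMP, "pages"))
```

All values come back as strings, so any type conversion is left to the caller. For a dump too large to hold in memory, `ET.iterparse` would be the more appropriate entry point.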

The Problem

Automatically sorting graph curves

A script for automatically sorting graph curves, e.g. for gnuplot.

Fast deserialization in Python

All standard YMMV disclaimers apply.
Update (20090324-2): According to John Millikin, the author of jsonlib, cjson is buggy and unmaintained. I will evaluate further and post a follow-up blog entry. My discussion with Dan Pascu, the author of cjson, corroborates these claims. I urge readers to read John Millikin’s comment.
Summary:
For quickly deserializing data in Python, use cjson.
simplejson is mysteriously