Author Archives: Joseph Turian

Joseph Turian has been working on artificial intelligence research since 1996. His focus is on using sophisticated machine learning techniques to approach large-scale problems in natural language. He is currently a post-doctoral research fellow at the Université de Montréal, studying deep learning methods with Professor Yoshua Bengio, Canada Research Chair in Statistical Learning Algorithms. Dr. Turian defended his dissertation, “Constituent Parsing by Classification” at New York University. He received his AB from Harvard University in Computer Science (cum laude).

Lean Startup, and The Stooges

Okay, I’m ready.
After reading a handful of articles making tenuous connections between entrepreneurship and music, including:

The Notorious CEO: Ten Startup Commandments from Biggie Smalls
Being like The Sex Pistols can help your startup?

I’ve decided to come out and share my favorite startup music.
Dirt, by The Stooges, is a proto-punk cut that sprawls for seven minutes, brooding and smoldering. It

Constitution for Governance of Open-Source Projects (v20100227)

Summary
I propose a default “Constitution for Governance of Open-Source Projects”.

Background
I recently got involved in the OSQA project, which is a fork of CNPROG, which in turn is a clone of the StackExchange Q&A forum software.
Note that the OSQA project has no formal “homepage”, or instructions on how to get involved. I only discovered by chance that there is a mailing-list

Why can’t you pickle generators in Python? A pattern for saving training state

Summary

A pattern for persisting generators is to turn them into pickle-able class objects. This is useful when you use generators for streaming training examples.
I would also try generator_tools, which might be a more convenient alternative to the pattern I describe. I haven’t used it yet.
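
As a rough illustration of the pattern in this summary, here is a minimal sketch. It assumes the examples fit in an in-memory list; the class name ExampleStream and its attributes are hypothetical, chosen only for this example, and are not taken from the full post. The idea is that all iteration state lives in ordinary attributes, so pickle can save and restore it.

```python
import pickle


class ExampleStream:
    """Iterator class standing in for a generator (hypothetical name).

    Its state is held in plain attributes, so the object can be pickled.
    """

    def __init__(self, examples):
        self.examples = list(examples)  # assumption: examples fit in memory
        self.position = 0               # index of the next example to yield

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.examples):
            raise StopIteration
        example = self.examples[self.position]
        self.position += 1
        return example


# Consume part of the stream, then save and restore the training state.
stream = ExampleStream(["a", "b", "c", "d"])
first = next(stream)
second = next(stream)

saved = pickle.dumps(stream)     # works: the state is just a list and an int
restored = pickle.loads(saved)
remaining = list(restored)       # resumes after the examples already consumed
```

If the examples were streamed from a file instead, the same approach would presumably store the filename and a read offset rather than the list itself.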

Generators for streaming training examples
For machine learning, Python generators are a simple idiom that make it

Use the --xml flag when you run mysqldump

Summary:

If you have text data (like a web scrape) stored in a MySQL database, and you want to share the data, mysqldump to XML using the --xml flag.

When fields are unlikely to contain tabs, an even simpler format is a tab-separated file, created using the --tab=path flag to mysqldump. Here, path must be owned by the MySQL database user.
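
The two invocations described above could be scripted roughly as follows. This is only a sketch: the helper names, the database name scrape_db, and the paths are hypothetical, and it assumes connection credentials come from the usual MySQL client configuration rather than from the command line.

```python
import subprocess


def build_mysqldump_command(database, tab_dir=None):
    """Return the mysqldump argument list (hypothetical helper).

    With tab_dir set, request tab-separated output via --tab;
    otherwise request an XML dump via --xml.
    """
    if tab_dir is not None:
        return ["mysqldump", "--tab=" + tab_dir, database]
    return ["mysqldump", "--xml", database]


def dump_to_xml(database, output_path):
    """Run mysqldump --xml and write its standard output to output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(build_mysqldump_command(database), stdout=out, check=True)


# Build (but do not run) the two commands for a hypothetical database.
xml_command = build_mysqldump_command("scrape_db")
tab_command = build_mysqldump_command("scrape_db", tab_dir="/tmp/scrape_dump")
```

Calling dump_to_xml("scrape_db", "scrape_db.xml") would then write the XML dump to a file, provided mysqldump is installed and can reach the server.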

The Problem

Automatically sorting graph curves

A script for automatically sorting graph curves, e.g. for gnuplot.

Fast deserialization in Python

All standard YMMV disclaimers apply.
Update (20090324−2): According to John Millikin, the author of jsonlib, cjson is buggy and unmaintained. I will evaluate further and post a follow-up blog entry. My discussion with Dan Pascu, the author of cjson, corroborates these claims. I urge readers to read John Millikin’s comment.
Summary:
For quickly deserializing data in Python, use cjson.
simplejson is mysteriously