Until there is better documentation for Lucene 3.0, I recommend you use Lucene 2.4 or 2.9. Nonetheless, I provide a basic indexing and retrieval code using the PyLucene 3.0 API, perhaps the first such example code on the web.
Summary
A pattern for persisting generators is to turn them into pickle-able class objects. This is useful when you use generators for streaming training examples.
I would also try generator_tools, which might be a more convenient alternative to the pattern I describe. I haven’t used it yet.
Generators for streaming training examples
For machine learning, python generators are a simple idiom that make it …
All standard YMMV disclaimers apply.
Update (20090324−2): According to John Millikin, the author of jsonlib, cjson is buggy and unmaintained. I will evaluate further and post a followup blog entry. My discussion with Dan Pascu, the author of cjson, corroborates these claims. I urge readers to read John Millikin’s comment.
Summary:
For quickly deserializing data in Python, use cjson.
simplejson is mysteriously …