Summary
Email me your pitch and tell me how you need help monetizing your data.
If I like your pitch, I’ll give you a free consultation on data strategy (NLP, ML, business intelligence, etc.).
Afterwards, if we both think that I can add value to your business, we can talk about a longer-term relationship.
You should forward this blog post to any friend who could use …
2010.08.20; Friday – 13:22 | By Joseph Turian | Posted in Uncategorized | Tagged AI, artificial intelligence, BI, business intelligence, data mining, large datasets, machine learning, ML, natural language processing, NLP, statistical modeling, text analysis, web as corpus
Until better documentation for Lucene 3.0 is available, I recommend using Lucene 2.4 or 2.9. Nonetheless, I provide basic indexing and retrieval code using the PyLucene 3.0 API, perhaps the first such example code on the web.
Summary
I speculate that job hopping, if it becomes a widespread phenomenon, might actually lead to improved business efficiency. In this way, the “Gen Y” job hopping phenomenon could ultimately prove beneficial.
Background
Mark Suster begins the debate by writing: “[Job Hoppers] Make Terrible Employees”.
Paul Dix responds that job hopping is not correlated with employee quality and there are …
Summary
According to common wisdom, the best code is developed in-house. I am beginning to believe this is only true when the code must be tightly coupled, or there are realistic security concerns. These scenarios are less common than managers like to believe.
For run-of-the-mill development projects, outsourcing might have advantages above and beyond cost savings. If your code effort …
Okay, I’m ready.
After reading a handful of articles making tenuous connections between entrepreneurship and music, including:
The Notorious CEO: Ten Startup Commandments from Biggie Smalls
Being like The Sex Pistols can help your startup?
I’ve decided to come out and share my favorite startup music.
Dirt, by The Stooges, is a proto-punk cut that sprawls for seven minutes, brooding and smoldering. It …
Summary
I propose a default “Constitution for Governance of Open-Source Projects”.
Background
I recently got involved in the OSQA project, which is a fork of CNPROG, which in turn is a clone of the StackExchange Q&A forum software.
Note that the OSQA project has no formal “homepage”, or instructions on how to get involved. I only discovered by chance that there is a mailing-list …
Summary
A pattern for persisting generators is to turn them into pickle-able class objects. This is useful when you use generators for streaming training examples.
I would also try generator_tools, which might be a more convenient alternative to the pattern I describe. I haven’t used it yet.
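As a rough illustration of the pattern, here is a minimal sketch in which a generator is rewritten as an iterator class whose position lives in an ordinary attribute, so pickle can save and restore it. The class name ExampleStream and the in-memory list of examples are assumptions made for this sketch, not code from the post.

```python
import pickle


class ExampleStream:
    """Iterator over training examples whose position survives pickling.

    Hypothetical illustration: the examples are held in a plain list, and
    the index of the next example is stored as an attribute so that pickle
    can serialize it along with the data.
    """

    def __init__(self, examples):
        self.examples = list(examples)
        self.position = 0  # index of the next example to yield

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.examples):
            raise StopIteration
        example = self.examples[self.position]
        self.position += 1
        return example


# Consume two examples, checkpoint the stream, then resume from the copy.
stream = ExampleStream(["a", "b", "c", "d"])
first = next(stream)
second = next(stream)

saved = pickle.dumps(stream)
restored = pickle.loads(saved)
remaining = list(restored)  # picks up where the original stream left off
```

A plain generator function cannot be pickled because its state is hidden in the suspended frame; moving that state into attributes is what makes the checkpoint possible.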
Generators for streaming training examples
For machine learning, Python generators are a simple idiom that makes it …
Summary
If you have text data (like a web scrape) stored in a MySQL database, and you want to share the data, mysqldump to XML using the --xml flag.
When fields are unlikely to contain tabs, an even simpler format is a tab-separated file, created using the --tab=path flag to mysqldump. path must be owned by the MySQL database user.
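The two invocations can be sketched as argument lists suitable for Python's subprocess module. The helper name mysqldump_command, the database name scrape_db, the table name pages, and the directory /tmp/scrape_dump are all hypothetical; connection options such as user and password are omitted.

```python
def mysqldump_command(database, table=None, xml=False, tab_dir=None):
    """Build an argument list for mysqldump (illustrative helper only).

    xml=True adds the --xml flag; tab_dir adds --tab=<directory>.
    """
    command = ["mysqldump"]
    if xml:
        command.append("--xml")
    if tab_dir is not None:
        command.append("--tab=" + tab_dir)
    command.append(database)
    if table is not None:
        command.append(table)
    return command


# Hypothetical database, table, and output directory.
xml_command = mysqldump_command("scrape_db", table="pages", xml=True)
tab_command = mysqldump_command("scrape_db", table="pages",
                                tab_dir="/tmp/scrape_dump")

# To actually run the XML dump, something like the following should work:
#   import subprocess
#   with open("pages.xml", "wb") as out:
#       subprocess.run(xml_command, stdout=out, check=True)
```

With --xml the dump goes to standard output, so it is redirected to a file; with --tab the files are written into the given directory instead.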
The Problem …
A script for automatically sorting graph curves, e.g. for gnuplot.