Free consultation on data strategy (NLP, ML, business intelligence, etc.)

Summary

Email me your pitch and how you need help monetizing data.
If I like your pitch, I’ll give you a free consultation on data strategy (NLP, ML, business intelligence, etc.).
Afterwards, if we both think that I can add value to your business, we can talk about a longer-term relationship.

You should forward this blog post to any friend who could use this information.


What is data strategy?

Do you know how to monetize the data you have? How can you improve monetization using other data available to you? How do you transform your data into actionable business intelligence?

I can help you shape your data strategy, your long-term plan for how your business will capture, process, and monetize data. For example, data strategy can help you in the following circumstances:

  • You don’t know who your individual users are or what they want, so you can’t effectively target ads.
  • You don’t know what user behavior on your site to track.
  • You don’t know what information you should start scraping from the web, information that you could use months or years down the line.

Besides working backwards from your business goals and business assets to a viable data strategy, I can also help you with more concrete challenges in NLP and machine learning:

  • How do I improve my search engine so that users don’t miss out on relevant results?
  • How do I add or improve recommendations to connect users with what they want?
  • How do I scale this ML algorithm to billions of examples with millions of features?
  • How do I improve the accuracy of this NLP or ML tool?

Who am I?

My name is Joseph Turian, and I head MetaOptimize LLC. We consult on NLP, ML, and data strategy. We also run the MetaOptimize Q&A site, where ML and NLP experts share their knowledge.

  • I am a data expert, holding a Ph.D. in natural language processing and machine learning. I have a decade of experience in these topics. I specialize in large data sets.
  • I’m business-minded, so I focus on business goals and the most direct path of execution to achieve these goals.
  • I am also a technology generalist who has been hacking since age 10 and has programmed competitively at a world-class level.

References from clients past and present available upon request.


What is the offer?

You send me information about what you’re doing and why you think I can help you.
Bonus points if you send me your deck, so I can understand your entire business picture. You are asking me to invest valuable expertise and potentially IP in your company, so appeal to me as a potential investor.
Demerits if you send me an NDA prematurely. Uptight companies who think what they are doing isn’t protected by good execution are a turn-off. But if you must be all James Bond about it, I’ll still consider you.

If I like what you’re doing and I can budget time, we’ll schedule a meeting (in person or over Skype) and I’ll give you a free consultation on what you’re doing.

If the initial meeting goes well, and we both see how I can add value to your business, we can decide to continue working together. I can continue to help you by:

  • Advising you periodically about your data strategy.
  • Building you new tools to use in your product.
  • Licensing you existing tools I’ve already built.
  • Training your smart tech geeks on NLP and ML technology so you can build in-house.

Compensation accepted in the form of cash, equity, or a mix of both. Pro bono if you’re an awesome non-profit.


Why am I doing this?

  • More deals are always good.
  • I am a social hacker and enjoy connecting and sharing with other entrepreneurs. I want to meet some more excellent people.
  • I would like to improve my understanding of widespread challenges and pain points in data strategy. That way, I can build a product that is useful for many people.
  • This is an interesting social business experiment.

Who is this offer for?

  • Open-source projects looking to use NLP + ML to improve their users’ experience.
  • Unfunded startups with a promising team, product, and market.
  • Funded startups.
  • Established companies.

What are you waiting for?

Email me your pitch and how you need help monetizing data.
Or forward this blog post to a friend who could use this information.


KEA Keyphrase Extraction as an XML-RPC service (code release)

Summary
We release code written by Ali Afshar, which turns the KEA keyphrase extractor into an XML-RPC service. This allows you to use KEA as a service, calling it from a variety of programming languages. The code is released under the New BSD License.
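The general idea of wrapping an extractor as an XML-RPC service can be sketched with Python's standard `xmlrpc.server` and `xmlrpc.client` modules. This is not Ali Afshar's code and it does not call KEA. The function `extract_keyphrases`, its frequency-based ranking, and the RPC method name are hypothetical stand-ins for wherever the real KEA call would go.

```python
import re
import threading
from collections import Counter
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for KEA: a real service would call the KEA extractor here.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def extract_keyphrases(text, k=5):
    """Return up to k candidate keyphrases (here simply the most frequent non-stopwords)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def start_server(host="127.0.0.1", port=0):
    """Expose extract_keyphrases over XML-RPC from a background thread.

    port=0 lets the OS pick a free port; the port actually bound is returned.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(extract_keyphrases, "extract_keyphrases")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


if __name__ == "__main__":
    server, port = start_server()
    proxy = ServerProxy(f"http://127.0.0.1:{port}/")
    doc = "Keyphrase extraction finds key phrases. Keyphrase extraction is useful."
    print(proxy.extract_keyphrases(doc, 3))  # ['keyphrase', 'extraction', 'finds']
    server.shutdown()
    server.server_close()
```

XML-RPC is language-neutral, so any language with an XML-RPC client library could call the same method. The Python `ServerProxy` is used here only because it is the shortest way to demonstrate a round trip.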

Background
Keyphrase extraction (AKA terminology mining, term extraction, term recognition, or glossary extraction) is the


PyLucene 3.0 in 60 seconds — Tutorial sample code for the 3.0 API

Until there is better documentation for Lucene 3.0, I recommend you use Lucene 2.4 or 2.9. Nonetheless, I provide basic indexing and retrieval code using the PyLucene 3.0 API, perhaps the first such example code on the web.
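The index-then-search cycle that Lucene performs at scale can be illustrated, very roughly and without PyLucene, by a toy in-memory inverted index in pure Python. `TinyIndex`, `tokenize`, and the summed-term-frequency scoring are hypothetical simplifications. In Lucene itself these roles belong to the `IndexWriter` and `IndexSearcher` classes, with far more sophisticated analysis and scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics: a crude stand-in for a Lucene Analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory inverted index mapping term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = []

    def add(self, text):
        """Index one document and return its integer id."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1
        return doc_id

    def search(self, query, n=10):
        """Return up to n (doc_id, text) pairs, scored by summed term frequency (OR query)."""
        scores = defaultdict(int)
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # Highest score first; ties broken by lower doc_id.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [(doc_id, self.docs[doc_id]) for doc_id, _ in ranked[:n]]


if __name__ == "__main__":
    index = TinyIndex()
    index.add("Lucene is a search library")
    index.add("PyLucene wraps Lucene for Python")
    index.add("Cooking with Python")
    for doc_id, text in index.search("lucene python"):
        print(doc_id, text)  # doc 1 first (matches both terms), then 0, then 2
```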


Perhaps job hopping is a good thing?

Summary
I speculate that job hopping, if it becomes a widespread phenomenon, might actually lead to improved business efficiency. In this way, the “Gen Y” job hopping phenomenon could ultimately prove beneficial.

Background

Mark Suster begins the debate by writing: “[Job Hoppers] Make Terrible Employees”.
Paul Dix responds that job hopping is not correlated with employee quality and there are


Code maintainability, and the joy of outsourcing

Summary
According to common wisdom, the best code is developed in-house. I am beginning to believe this is only true when the code must be tightly coupled, or there are realistic security concerns. These scenarios are less common than managers like to believe.
For run-of-the-mill development projects, outsourcing might have advantages above and beyond cost savings. If your code effort


Lean Startup, and The Stooges

Okay, I’m ready.
After reading a handful of articles making tenuous connections between entrepreneurship and music, including:

The Notorious CEO: Ten Startup Commandments from Biggie Smalls
Being like The Sex Pistols can help your startup?

I’ve decided to come out and share my favorite startup music.
Dirt, by The Stooges, is a proto-punk cut that sprawls for seven minutes, brooding and smoldering. It


Constitution for Governance of Open-Source Projects (v20100227)

Summary
I propose a default “Constitution for Governance of Open-Source Projects”.

Background
I recently got involved in the OSQA project, which is a fork of CNPROG, which in turn is a clone of the StackExchange Q&A forum software.
Note that the OSQA project has no formal “homepage” or instructions on how to get involved. I only discovered by chance that there is a mailing list


Why can’t you pickle generators in Python? A pattern for saving training state

Summary

A pattern for persisting generators is to turn them into pickle-able class objects. This is useful when you use generators for streaming training examples.
I would also try generator_tools, which might be a more convenient alternative to the pattern I describe. I haven’t used it yet.
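Here is a minimal sketch of the pattern, assuming a toy stream where example i is the pair (i, i*i). `example_generator` and `ExampleStream` are hypothetical names. The class keeps its position in an ordinary attribute, so `pickle` can checkpoint it mid-stream. Pickling the equivalent generator raises `TypeError`.

```python
import pickle


def example_generator(n):
    """Plain generator: convenient, but its suspended frame cannot be pickled."""
    for i in range(n):
        yield (i, i * i)  # hypothetical (feature, label) training example


class ExampleStream:
    """Picklable replacement: all iteration state lives in ordinary attributes."""

    def __init__(self, n):
        self.n = n
        self.position = 0  # index of the next example to yield

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= self.n:
            raise StopIteration
        i = self.position
        self.position += 1
        return (i, i * i)


if __name__ == "__main__":
    stream = ExampleStream(5)
    next(stream)
    next(stream)                       # consume two examples
    saved = pickle.dumps(stream)       # checkpoint training state mid-stream
    resumed = pickle.loads(saved)
    print(list(resumed))               # [(2, 4), (3, 9), (4, 16)]
    try:
        pickle.dumps(example_generator(5))
    except TypeError as err:
        print("generator not picklable:", err)
```

A file-backed stream would store a filename and a byte offset (from `file.tell()`) instead of an index. It would also need `__getstate__`/`__setstate__` to drop the open file handle before pickling and reopen and `seek` it afterwards, because open file objects cannot be pickled.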

Generators for streaming training examples
For machine learning, Python generators are a simple idiom that makes it


Use flag --xml when you run mysqldump

Summary

If you have text data (like a web scrape) stored in a MySQL database, and you want to share the data, run mysqldump with the --xml flag to dump it as XML.

When fields are unlikely to contain tabs, an even simpler format is a tab-separated file, created using the --tab=path flag to mysqldump. path must be writable by the MySQL server process (for example, owned by the OS user that mysqld runs as), because the server itself writes the data files.
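On the receiving end, an XML dump can be read with Python's standard `xml.etree.ElementTree`. The sketch below assumes the element layout that `mysqldump --xml` emits (`<table_data name=…>`, `<row>`, `<field name=…>`, with SQL NULL marked as `xsi:nil="true"`). `rows_from_mysqldump_xml` and `SAMPLE` are hypothetical names, and the sample is a hand-written stand-in, not real dump output. It includes a tab inside a field to show that XML, unlike a tab-separated file, carries such text safely.

```python
import io
import xml.etree.ElementTree as ET

# mysqldump --xml marks SQL NULLs with xsi:nil="true"; ElementTree expands the prefix.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def rows_from_mysqldump_xml(source):
    """Yield (table_name, row_dict) pairs from mysqldump --xml output.

    source is a filename or a binary file object. iterparse streams the
    document, so large scrapes need not fit in memory. NULLs become None.
    """
    table = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "table_data":
            table = elem.get("name")
        elif event == "end" and elem.tag == "row":
            row = {}
            for field in elem.findall("field"):
                if field.get(XSI_NIL) == "true":
                    row[field.get("name")] = None
                else:
                    row[field.get("name")] = field.text or ""
            yield table, row
            elem.clear()  # discard the parsed row to keep memory bounded


# Hypothetical hand-written sample in the shape mysqldump --xml produces.
SAMPLE = b"""<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="scrape">
<table_data name="pages">
<row>
<field name="id">1</field>
<field name="body">a\tb &amp; c</field>
<field name="note" xsi:nil="true" />
</row>
</table_data>
</database>
</mysqldump>"""

if __name__ == "__main__":
    for table, row in rows_from_mysqldump_xml(io.BytesIO(SAMPLE)):
        print(table, row)  # pages {'id': '1', 'body': 'a\tb & c', 'note': None}
```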

The Problem


Automatically sorting graph curves

A script for automatically sorting graph curves, e.g. for gnuplot.
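The script itself is not shown here, so the following is only one plausible reading of "sorting graph curves," written in Python. The assumption is that the goal is to order curves by their final y-value, so that the legend (which gnuplot lists in plot order) matches the top-to-bottom order of the curves at their right-hand end. `sort_curves`, `gnuplot_plot_command`, the `data_file_for` callback, and the curve names are hypothetical.

```python
def sort_curves(curves):
    """Order curve names by final y-value, highest first.

    curves maps a curve name to a non-empty list of (x, y) points.
    """
    def final_y(name):
        # Sort points by x so the last one is the right-hand end of the curve.
        return sorted(curves[name])[-1][1]

    return sorted(curves, key=final_y, reverse=True)


def gnuplot_plot_command(curves, data_file_for):
    """Build a gnuplot 'plot' command whose legend lists curves top to bottom.

    data_file_for is a hypothetical callback mapping a curve name to its data file.
    """
    parts = [
        f'"{data_file_for(name)}" title "{name}" with lines'
        for name in sort_curves(curves)
    ]
    return "plot " + ", ".join(parts)


if __name__ == "__main__":
    curves = {
        "run1": [(0, 1.0), (10, 0.3)],
        "run2": [(10, 0.2), (0, 1.0)],   # points need not arrive in x order
        "baseline": [(0, 1.0), (10, 0.9)],
    }
    print(gnuplot_plot_command(curves, lambda name: name + ".dat"))
```

Sorting by the final point rather than by an average suits learning curves, where the legend is read against the right edge of the plot. Other criteria (mean, maximum) would only change `final_y`.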