Discussion 2.0: Personalization

[The following post is my submission to the Knight-Mozilla “Beyond Comment Threads” challenge.]

The following are the core problems with current discussion systems:

  1. Trolls, acrimonious people, and low-quality commentary can drown out thoughtful discussion and destroy a good community.
  2. Bias towards seniority: Deep insight is penalized if it comes from a new, unknown, or anonymous voice. For example, on Quora, answering a question one month earlier than someone else can lead to a rich-get-richer phenomenon: the older answer is always shown at the top, so it gets more visibility, and hence more upvotes, merely because it is older. Nepotism creates artificial friction and a barrier to entry. This friction is an effective technique for enforcing community standards, but it has the downside that it discriminates, gently or extremely, against insightful new commentators and anonymous ones.
  3. Voting systems can be gamed by voting rings and ballot stuffing.
  4. Voting systems can lead to “mob rule”.

How do we address all these core problems in traditional commenting systems?
How can we create an engaging system that most effectively promotes discussion? Can we avoid nepotism and bias against new, unknown, and anonymous commentators? How can we defend against basic trolling and voting rings?
A next generation discussion system must address these core problems.

The core value of a discussion system is to encourage stimulating and engaging discussion. We want a system that is frictionless to participate in: You can lurk for years and then jump in when you have something great and insightful to say, and your voice is heard loud and clear. This is true democratization of discussion.

The solution is personalization. A next generation discussion system is personalized, and personalization makes discussion more stimulating and engaging. Each user who reads and participates gets a comment thread sorted by personal relevance. Irrelevant comments are hidden by default but can optionally be viewed. Personalization is tuned to promote discussion that the user finds stimulating and engaging, and to hide discussion that the user finds off-topic, spammy, excessively or insufficiently detailed, etc.
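The post does not specify how personal relevance is computed. As a minimal sketch, assume each comment has already been tagged with hypothetical feature scores (for example, how detailed or how spammy it is) and each user has a set of preference weights over those features. Ranking the thread and hiding irrelevant comments could then look like the following. The linear scoring, the `Comment` class, and the `threshold` parameter are illustrative assumptions, not the author's design.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    text: str
    # Hypothetical feature scores, e.g. {"detailed": 0.9, "spammy": 0.0}
    features: dict[str, float] = field(default_factory=dict)


def relevance(prefs: dict[str, float], comment: Comment) -> float:
    """Linear score: the user's preference weights dotted with the comment's features."""
    return sum(prefs.get(name, 0.0) * value for name, value in comment.features.items())


def personalize_thread(prefs: dict[str, float], comments: list[Comment], threshold: float = 0.0):
    """Sort by personal relevance; comments at or below `threshold` are hidden by default."""
    ranked = sorted(comments, key=lambda c: relevance(prefs, c), reverse=True)
    visible = [c for c in ranked if relevance(prefs, c) > threshold]
    hidden = [c for c in ranked if relevance(prefs, c) <= threshold]
    return visible, hidden
```

Returning the hidden comments instead of discarding them keeps them available for a "show hidden comments" option, matching the idea that irrelevant comments can still be viewed on request.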

The beauty of personalization is its flexibility. It does not force a particular style of discussion. If the user enjoys:

  • heated back-and-forth discussion,
  • calm discussion with well-reasoned but concise arguments,
  • in-depth academic discourse,
  • tabloid-like ad hominem, or
  • trolling and hate speech

then the user gets what they want.

Additionally, personalization adapts on a per-topic basis. A particular user might enjoy a heated discussion about abortion, a calm well-reasoned discussion about NoSQL databases, and a low-brow discussion about celebrity romance. Per-topic personalization can satisfy all these user needs.
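One way to picture per-topic personalization is to keep a separate set of preference weights for each topic. The sketch below assumes topics are already assigned to threads and learns weights from simple like/dislike feedback with an additive update; the `TopicPreferences` class, the feature names, and the update rule are hypothetical illustrations, not a method described in this post.

```python
from collections import defaultdict


class TopicPreferences:
    """One user's preference weights, kept separately for each topic (hypothetical model)."""

    def __init__(self, learning_rate: float = 0.1):
        self.learning_rate = learning_rate
        # topic -> feature name -> weight; missing entries behave as 0.0
        self.weights = defaultdict(lambda: defaultdict(float))

    def score(self, topic: str, features: dict) -> float:
        """Relevance of a comment's features under this user's weights for `topic`."""
        topic_weights = self.weights.get(topic, {})
        return sum(topic_weights.get(name, 0.0) * value for name, value in features.items())

    def feedback(self, topic: str, features: dict, liked: bool) -> None:
        """Move this topic's weights toward (liked) or away from (disliked) the features."""
        sign = 1.0 if liked else -1.0
        topic_weights = self.weights[topic]
        for name, value in features.items():
            topic_weights[name] += sign * self.learning_rate * value
```

With this structure, one "like" on a heated abortion thread and one "dislike" on a heated NoSQL thread leave the same user with a positive weight for heated comments on the first topic and a negative one on the second, while a topic the user has never rated stays neutral.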

I can discuss more details about this approach, including how to:

  • capture personalization information through user interaction with the discussion board.
  • incorporate atomic commenting.
  • federate discussion across multiple sites and liberate discussion from a single site.

I can also discuss possible objections to personalization, and my responses to them.
Due to space limitations (500 words), I omit these details for now and focus on personalization, which I believe addresses the core problems of traditional discussion systems.


Fat Free CRM in five minutes on a fresh Amazon EC2 micro instance

Would you like to get Fat Free CRM up and running, but spend only five minutes on deployment?
I am not a Rails hacker, so getting Fat Free CRM installed and running is non-trivial for me.
fatfreecrm-ec2 will automatically deploy Fat Free CRM on a fresh Amazon EC2 micro instance. I have also tested it on a fresh Ubuntu Linode slice.
Caveat: The five minutes will


NLP Challenge: Find semantically related terms over a large vocabulary (>1M)?

Summary
In the spirit of shared tasks and NLP “bake offs”, I hereby announce the first MetaOptimize Challenge. It’s an open problem, and I am interested in involving practitioners who want to demo their style, as well as people who want to learn some large-scale IR/NLP. Hopefully, we’ll all learn something about various real-world approaches.
Join the announcement list
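The challenge leaves the method open. For readers who want a starting point, one classic baseline is distributional similarity: represent each term by counts of the words that appear near it, then rank other terms by cosine similarity of those count vectors. The toy sketch below assumes pre-tokenized sentences; the function names and the `window` parameter are illustrative choices, not part of the challenge. It compares the query against every other term, which is quadratic over the whole vocabulary, so at the >1M-term scale in the title a real entry would need dimensionality reduction or approximate nearest-neighbor search.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each term to a Counter of the terms seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_terms(term, vectors, k=3):
    """Brute-force ranking of all other terms by similarity of their context vectors."""
    if term not in vectors:
        return []
    scores = [(other, cosine(vectors[term], vec)) for other, vec in vectors.items() if other != term]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]
```

On a tiny corpus such as "i like cats", "i like dogs", "you eat bread", the terms "cats" and "dogs" share identical contexts and come out as each other's closest neighbors.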


Information Organization: A case study in music recommendations

I introduce “information organization”, an approach I have been exploring for several years. As a case study, I consider music recommendations, which should be organized but which existing applications organize poorly. I discuss issues with current applications and describe features that address these issues.


Free consultation on data strategy (NLP, ML, business intelligence, etc.)

Summary
Email me your pitch and explain how you need help monetizing data.
If I like your pitch, I’ll give you a free consultation on data strategy (NLP, ML, business intelligence, etc.)
Afterwards, if we both think that I can add value to your business, we can talk about a longer-term relationship.
You should forward this blog post to any friend who could use


KEA Keyphrase Extraction as an XML-RPC service (code release)

Summary
We release code written by Ali Afshar, which turns the KEA keyphrase extractor into an XML-RPC service. This allows you to use KEA as a service, calling it from a variety of different programming languages. The code is released under the New BSD License.
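The general pattern behind wrapping an extractor as an XML-RPC service can be sketched with Python's standard `xmlrpc` modules. The released code's method names are not given here, so this sketch uses a hypothetical `extract_keyphrases` function backed by a deliberately naive word-frequency extractor; it is NOT the KEA algorithm and only gives the service something to return. The server runs in a background thread purely so the example is self-contained.

```python
import threading
import xmlrpc.client
from collections import Counter
from xmlrpc.server import SimpleXMLRPCServer


def extract_keyphrases(text: str, top_n: int = 3) -> list:
    """Hypothetical stand-in for KEA: the most frequent words longer than three letters."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return [word for word, _ in counts.most_common(top_n)]


def start_server() -> SimpleXMLRPCServer:
    """Serve extract_keyphrases over XML-RPC on an ephemeral localhost port."""
    server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
    server.register_function(extract_keyphrases, "extract_keyphrases")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    # Any language with an XML-RPC client could make this same call.
    client = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")
    print(client.extract_keyphrases("Keyphrase extraction finds keyphrases. Extraction is useful.", 2))
    server.shutdown()
    server.server_close()
```

Because XML-RPC is a language-neutral protocol over HTTP, the client side is what changes from language to language; the service itself stays the same.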

Background
Keyphrase extraction (AKA terminology mining, term extraction, term recognition, or glossary extraction) is the


PyLucene 3.0 in 60 seconds — Tutorial sample code for the 3.0 API

Until there is better documentation for Lucene 3.0, I recommend you use Lucene 2.4 or 2.9. Nonetheless, I provide basic indexing and retrieval code using the PyLucene 3.0 API, perhaps the first such example code on the web.


Perhaps job hopping is a good thing?

Summary
I speculate that job hopping, if it becomes a widespread phenomenon, might actually lead to improved business efficiency. In this way, the “Gen Y” job hopping phenomenon could ultimately prove beneficial.

Background

Mark Suster begins the debate by writing: “[Job Hoppers] Make Terrible Employees”.
Paul Dix responds that job hopping is not correlated with employee quality and there are


Code maintainability, and the joy of outsourcing

Summary
According to common wisdom, the best code is developed in-house. I am beginning to believe this is only true when the code must be tightly coupled, or there are realistic security concerns. These scenarios are less common than managers like to believe.
For run-of-the-mill development projects, outsourcing might have advantages above and beyond cost savings. If your code effort


Lean Startup, and The Stooges

Okay, I’m ready.
After reading a handful of articles making tenuous connections between entrepreneurship and music, including:

  • The Notorious CEO: Ten Startup Commandments from Biggie Smalls
  • Being like The Sex Pistols can help your startup?

I’ve decided to come out and share my favorite startup music.
Dirt, by The Stooges, is a proto-punk cut that sprawls for seven minutes, brooding and smoldering. It