Revision history
Revision n. 1

Nov 04 '10 at 03:30

Joseph Turian

Find semantically related terms over a large vocabulary (>1M)?

Let's say I have several 100M documents, which are very short (only a few words). There are several 1M terms in the vocabulary. What is the fastest way to find the top-k semantically related terms for each term in the vocabulary?

When I say fastest, I mean that it should take under a week of computation time, and as little human time as possible. So use of existing implementations is encouraged.

Revision n. 2

Nov 04 '10 at 04:33

Joseph Turian

Find semantically related terms over a large vocabulary (>1M)?

Let's say I have several hundred million documents, which are very short (only a few words). There are several million terms in the vocabulary. What is the fastest way to find the top-k semantically related terms for each term in the vocabulary?

When I say fastest, I mean that it should take under a week of computation time, and as little human time as possible. So use of existing implementations is encouraged.

Revision n. 3

Nov 05 '10 at 21:38

Joseph Turian

Find semantically related terms over a large vocabulary (>1M)?

Let's say I have several hundred million documents, which are very short (only a few words). There are several million terms in the vocabulary. What is the fastest way to find the top-k semantically related terms for each term in the vocabulary?

When I say fastest, I mean that it should take under a week of computation time, and as little human time as possible. So use of existing implementations is encouraged.

Edit: At the behest of commenters, this problem is now an actual challenge, with a dataset you can play with.
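To make the task concrete, here is one common baseline, offered only as an illustrative sketch and not as the asker's method or a claim about what works best at this scale. It treats two terms as related when they co-occur in the same short document more often than chance, scoring each pair by positive pointwise mutual information (PPMI) over document counts. The function name `related_terms` and its `k` and `min_count` parameters are hypothetical, and all counts are held in memory.

```python
import heapq
import math
from collections import Counter, defaultdict
from itertools import combinations


def related_terms(docs, k=3, min_count=1):
    """Hypothetical helper: top-k related terms per term, scored by document-level PPMI.

    Assumption: "related" means "co-occurs in the same short document more
    often than chance". This is one simple baseline and keeps all counts in memory.
    """
    n_docs = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing each unordered term pair
    for doc in docs:
        terms = sorted(set(doc))  # dedupe so each document counts a term/pair once
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    # PPMI(a, b) = max(0, log(P(a, b) / (P(a) * P(b)))), probabilities over documents.
    scores = defaultdict(list)
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_count:  # prune rare pairs
            continue
        pmi = math.log(n_ab * n_docs / (term_df[a] * term_df[b]))
        if pmi > 0:  # keep only positive associations
            scores[a].append((pmi, b))
            scores[b].append((pmi, a))

    # Keep the k highest-scoring neighbours; equal scores are ordered by the term string.
    return {t: [w for _, w in heapq.nlargest(k, cands)] for t, cands in scores.items()}


if __name__ == "__main__":
    docs = [
        ["new", "york", "hotels"],
        ["new", "york", "pizza"],
        ["cheap", "hotels"],
        ["cheap", "flights"],
        ["pizza", "delivery"],
    ]
    print(related_terms(docs, k=3)["new"])  # ['york', 'pizza', 'hotels']
```

At several million terms and hundreds of millions of documents, the pair counts would likely not fit on one machine, so a real run would probably need to prune rare pairs (as `min_count` does here) or distribute the counting.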


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.