I have a directed graph database containing various types of nodes and relationships. My goal is to automatically classify nodes into categories based on their relationships to other nodes. To do this, I plan to identify all existing bigrams (relationship-node tuples), so that each node is represented by a 'document' consisting of the bigrams it is directly related to. (N.B. I use bigrams because a bag-of-words model would discard the original relationship configuration.)

For example, I have the following relationships: (Bob-Loves-Jane), (Bob-Likes-Jazz), (Bob-Plays-Football), (Charles-Eats-Sandwiches), ... From these, I'd extract: (Loves-Jane), (Likes-Jazz), (Plays-Football), (Eats-Sandwiches), ... and assign each of them a distinct token. The document pertaining to Bob would then be: [(Loves-Jane), (Likes-Jazz), (Plays-Football)].
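
To make the extraction step concrete, here is a minimal sketch of how the triples above could be turned into per-node documents. The function name `build_documents`, the integer token ids, and the in-memory list of triples are assumptions made for illustration; in practice the triples would come from the graph database.

```python
from collections import defaultdict

# Triples from the example above: (subject, relationship, object).
triples = [
    ("Bob", "Loves", "Jane"),
    ("Bob", "Likes", "Jazz"),
    ("Bob", "Plays", "Football"),
    ("Charles", "Eats", "Sandwiches"),
]


def build_documents(triples):
    """Assign each (relationship, object) bigram a token id and
    collect, for every subject node, the token ids it is related to.

    Hypothetical helper, not part of any library.
    """
    vocabulary = {}                 # bigram -> integer token id
    documents = defaultdict(list)   # subject -> list of token ids
    for subject, relationship, obj in triples:
        bigram = (relationship, obj)
        # Reuse the id if the bigram was seen before, else take the next free id.
        token_id = vocabulary.setdefault(bigram, len(vocabulary))
        documents[subject].append(token_id)
    return vocabulary, dict(documents)


vocabulary, documents = build_documents(triples)
```

Each entry of `documents` is then the 'document' for one node, and `vocabulary` maps every bigram to the token that stands in for it.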

I could then use LDA (Latent Dirichlet Allocation) to classify what type of person Bob is, relative to everyone else in the network.

I'd now like to include relationships such as: (Bob-Age-25) and (Bob-Earns-$1Million)

Is there a way to incorporate numerical values into the LDA algorithm in order to extract features such as "Bob is fairly young" and "Bob is mega rich"? (I understand LDA won't be this explicit!)

Thanks in advance for any advice!

asked Feb 21 '13 at 01:21

Alexander Bridi

edited Feb 22 '13 at 01:15

LDA = latent Dirichlet allocation or LDA = linear discriminant analysis?

(Feb 21 '13 at 05:14) larsmans

Latent Dirichlet Allocation. I'm sorry for the ambiguity.

(Feb 21 '13 at 05:27) Alexander Bridi

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.