I have a directed graph database containing various types of nodes and relationships. My goal is to automatically classify nodes into categories based on their relationships to other nodes. To do this, I plan to identify all existing bigrams (relationship-node tuples) so that each node can be treated as a 'document' consisting of the bigrams it is directly related to. (N.B. I use bigrams because a plain bag-of-words model would lose the original relationship configuration.)

For example, I have the following relationships:

(Bob-Loves-Jane), (Bob-Likes-Jazz), (Bob-Plays-Football), (Charles-Eats-Sandwiches), ...

From this, I'd extract:

(Loves-Jane), (Likes-Jazz), (Plays-Football), (Eats-Sandwiches), ...

and map each of them to a distinct token. The document for Bob would then be:

[(Loves-Jane), (Likes-Jazz), (Plays-Football)]

I could then use LDA (Latent Dirichlet Allocation) to classify what type of person Bob is, relative to everyone else in the network.

I'd now like to include relationships such as (Bob-Age-25) and (Bob-Earns-$1Million). Is there a way to incorporate numerical values into the LDA algorithm in order to extract features such as "Bob is fairly young" and "Bob is mega rich"? (I understand LDA won't be this explicit!)

Thanks in advance for any advice!
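One common workaround, which is not something LDA supports natively, is to discretize each numeric relationship into a few bins and treat each bin as an ordinary bigram token. For example, (Age-25) becomes something like (Age-bin0). The sketch below assumes the triples are available as (subject, relation, object) string tuples, that numeric objects are stored as parseable numbers (e.g. 1000000 rather than $1Million), and that equal-count (quantile) bins are acceptable. The function names, the bin count, the `bin` token labels, and the extra toy node Dana are hypothetical choices for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points that split the values into roughly equal-count bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def build_documents(triples, numeric_relations, n_bins=3):
    """Turn (subject, relation, object) triples into one list of bigram tokens per subject.

    Assumption: objects of relations in `numeric_relations` are parseable numbers.
    Each such value is replaced by its bin index, so e.g. all low ages share one
    token instead of every distinct age becoming its own rare "word".
    """
    # First pass: collect every numeric value per relation to compute bin edges.
    numeric_values = defaultdict(list)
    for _, rel, obj in triples:
        if rel in numeric_relations:
            numeric_values[rel].append(float(obj))
    edges = {rel: quantile_edges(vals, n_bins) for rel, vals in numeric_values.items()}

    # Second pass: emit one token per outgoing relationship of each subject node.
    docs = defaultdict(list)
    for subj, rel, obj in triples:
        if rel in numeric_relations:
            # bisect_right gives the index of the bin the value falls into.
            token = f"{rel}-bin{bisect_right(edges[rel], float(obj))}"
        else:
            token = f"{rel}-{obj}"
        docs[subj].append(token)
    return dict(docs)


# Toy data based on the question; Dana and the extra Age/Earns values are invented.
triples = [
    ("Bob", "Loves", "Jane"),
    ("Bob", "Likes", "Jazz"),
    ("Bob", "Plays", "Football"),
    ("Bob", "Age", "25"),
    ("Bob", "Earns", "1000000"),
    ("Charles", "Eats", "Sandwiches"),
    ("Charles", "Age", "40"),
    ("Charles", "Earns", "30000"),
    ("Dana", "Likes", "Jazz"),
    ("Dana", "Age", "70"),
    ("Dana", "Earns", "50000"),
]

docs = build_documents(triples, numeric_relations={"Age", "Earns"})
for node, tokens in docs.items():
    print(node, tokens)
```

With this data, Bob's document ends with `Age-bin0` (youngest bracket) and `Earns-bin2` (highest bracket). Any node in the same bracket would share those tokens, which is what would let LDA group "young" or "rich" nodes together. The resulting token lists can then be turned into a bag-of-words corpus for whichever LDA implementation you use. Quantile bins adapt to each relationship's distribution. Fixed, domain-chosen thresholds (e.g. age brackets) are an equally reasonable alternative and would give more interpretable tokens.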
LDA = latent Dirichlet allocation, or LDA = linear discriminant analysis?
Latent Dirichlet Allocation. I'm sorry for the ambiguity.