Given an input set of entities, are there established methods for inferring shared semantic properties between those entities? For example, given a list of three entities { Hillary Clinton, Angela Merkel, Margaret Thatcher }, are there systems or methods that can infer that all three are 'People' or 'Politicians' or 'Female Politicians'? Or given a set of entities like { Babe Ruth, baseball bat, pitcher }, are there methods to infer the common property of 'Baseball'?

I feel as if my question is semi-related to topic modeling, but instead of finding latent groups of words that represent a large collection of text, I want to find latent semantic properties of a select few entities.

It seems like the most obvious solution would be to use existing knowledge bases / ontologies. I've thought about using overlapping Wikipedia categories or 'Types' from Freebase / YAGO2 / DBpedia. Are there any other semantic knowledge bases / ontologies that are worth looking at for this problem? Or are there other ways in general to approach this kind of problem?

Sorry if the question is too vague, but any guidance towards the correct subfields of research / keywords would be greatly appreciated!
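For concreteness, the "overlapping types" idea I have in mind would look roughly like the sketch below. It is untested and assumes the public DBpedia SPARQL endpoint at https://dbpedia.org/sparql plus the SPARQLWrapper package; the types returned mix DBpedia ontology classes with YAGO classes, so some filtering would probably be needed.

    # Hypothetical sketch: intersect the rdf:type sets of a few DBpedia entities.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://dbpedia.org/sparql"

    def types_of(entity_uri):
        """Return the set of rdf:type URIs that DBpedia lists for one entity."""
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery("SELECT ?t WHERE { <%s> a ?t }" % entity_uri)
        sparql.setReturnFormat(JSON)
        rows = sparql.query().convert()["results"]["bindings"]
        return {row["t"]["value"] for row in rows}

    entities = [
        "http://dbpedia.org/resource/Hillary_Clinton",
        "http://dbpedia.org/resource/Angela_Merkel",
        "http://dbpedia.org/resource/Margaret_Thatcher",
    ]

    # Shared semantic properties = types common to every entity
    # (e.g. person/politician classes for this particular list).
    shared = set.intersection(*(types_of(e) for e in entities))
    for t in sorted(shared):
        print(t)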
It all depends on what kind of information you have about the entities, and on whether the target categories are given or you have to infer the possible categories as well. I think you might be after Formal Concept Analysis (FCA), though.
However, any unsupervised method may be potentially useful, depending on the details of what you are trying to achieve. Many methods share a common mathematical view of the data as an undirected (possibly weighted) bipartite graph. This is true of bag-of-words text models, collaborative recommendations, and frequent itemset analysis, as well as formal concept analysis, and methods from all of these areas may be relevant to you if your data fits this model. However, FCA sounds like the best fit from what you have said.

Thanks Daniel for the pointer to FCA. I've been reading a couple of academic papers (mostly the introduction sections, so I can get a better sense of what FCA is about), and it definitely seems like a relevant approach. From my original perspective when posting the question, the target categories would not be given (I wanted a method that could infer them). However, from my brief reading it seems like I could "transform" my problem into something that could be solved by an FCA-like approach. If I build a "concept lattice" or maybe just a "formal context" beforehand (by scraping objects/attributes from existing ontologies), then given an input set of objects/entities I should be able to quickly determine their shared properties (a rough sketch of this is below). Hrm, although now that I think about it, that sounds a lot like what I was originally thinking of doing ("using overlapping [insert-ontology-here] categories").
Anthony Wong (Aug 11 '11 at 21:23)
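A toy version of that "formal context, then intersect" step, in plain Python and with made-up objects and attributes rather than anything scraped from a real ontology:

    # Toy formal context: object -> set of attributes (made-up data; in practice
    # these would come from Wikipedia categories, DBpedia/YAGO types, etc.).
    context = {
        "Hillary Clinton":   {"person", "politician", "female", "american"},
        "Angela Merkel":     {"person", "politician", "female", "german"},
        "Margaret Thatcher": {"person", "politician", "female", "british"},
        "Babe Ruth":         {"person", "baseball", "american"},
        "baseball bat":      {"object", "baseball"},
    }

    def intent(objects):
        """FCA derivation A -> A': the attributes shared by every object in A."""
        return set.intersection(*(context[o] for o in objects))

    def extent(attributes):
        """Dual derivation B -> B': every object that has all attributes in B."""
        return {o for o, attrs in context.items() if attributes <= attrs}

    query = {"Hillary Clinton", "Angela Merkel", "Margaret Thatcher"}
    shared = intent(query)          # {'person', 'politician', 'female'}
    print(shared)
    print(extent(shared))           # closure of the query: the concept's full extent

The pair (extent(shared), shared) is exactly a formal concept, which is the sense in which the shared properties fall out of the lattice.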
Yes, basically the idea is that initially you may only have low-level attributes like "yellow", "bent" and "edible", and FCA can help you figure out that {"yellow", "bent", "edible"} is a natural concept (i.e. "banana"). One problem with FCA is that it can be sensitive to noise. If you have objects with missing attributes and/or spurious attributes, you can end up with a very large concept lattice (potentially a complete boolean lattice on the powerset of your attributes; a toy illustration of this blow-up is sketched below). So if you are using real-world data, it may be worthwhile to clean/smooth the data first, but that can be a research project in itself :) I have often thought that there should be something like robust FCA that would cope better with dirty data, but I have not come across it; then again, I have not looked hard recently either.
Daniel Mahler (Aug 12 '11 at 17:50)
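The blow-up mentioned above is easy to reproduce on a toy context: five objects with identical attributes give a single concept, but knocking a different attribute out of each object yields the full boolean lattice. A brute-force sketch (made-up data; the enumeration is exponential, so it is for illustration only):

    from itertools import combinations

    def concepts(context):
        """Enumerate all formal concepts of a small context by closing
        every subset of objects (exponential -- toy-sized contexts only)."""
        all_attrs = set().union(*context.values())
        objs = list(context)
        found = set()
        for r in range(len(objs) + 1):
            for subset in combinations(objs, r):
                # attributes common to the subset (all attributes for the empty set)
                shared = set.intersection(*(context[o] for o in subset)) if subset else all_attrs
                # closure: every object carrying the whole shared set
                ext = frozenset(o for o, a in context.items() if shared <= a)
                found.add((ext, frozenset(shared)))
        return found

    attrs = {"a", "b", "c", "d", "e"}
    clean = {i: set(attrs) for i in range(5)}                      # identical objects
    noisy = {i: attrs - {x} for i, x in enumerate(sorted(attrs))}  # each object missing a different attribute

    print(len(concepts(clean)))   # 1 concept
    print(len(concepts(noisy)))   # 2**5 = 32 concepts: the full boolean lattice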