I am working on a natural language processing application in C#. I have a text describing 30 domains; each domain is defined by a short paragraph that explains it. My aim is to build a thesaurus from this text so that I can determine, from an input string, which domains are concerned. The text is about 5000 words in total, and each domain is described in roughly 150 words.

My questions are: Is the text long enough to build a thesaurus from? Is my idea of building a thesaurus legitimate, or should I just use NLP libraries to analyse my corpus and the input string?

At the moment I have calculated the total number of occurrences of each word, grouped by domain, because I first thought of an indexed approach, but I am really not sure which method is best. Does anyone have experience in both NLP and thesaurus building?
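To make the indexed approach concrete, here is a rough sketch of what I have in mind (the class and method names are just placeholders I made up, and it deliberately ignores stop words, stemming and weighting such as TF-IDF, which I suppose an NLP library would handle for me):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Per-domain word-frequency index built from the domain description paragraphs.
class DomainIndex
{
    // domain name -> (word -> number of occurrences in that domain's paragraph)
    private readonly Dictionary<string, Dictionary<string, int>> index =
        new Dictionary<string, Dictionary<string, int>>();

    private static IEnumerable<string> Tokenize(string text)
    {
        // Naive tokenizer: lowercase the text and keep alphabetic runs only.
        return Regex.Matches(text.ToLowerInvariant(), "[a-z]+")
                    .Cast<Match>()
                    .Select(m => m.Value);
    }

    public void AddDomain(string domain, string description)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in Tokenize(description))
        {
            counts.TryGetValue(word, out var c);
            counts[word] = c + 1;
        }
        index[domain] = counts;
    }

    // Score the input against every domain: the score is the sum of the
    // occurrence counts (in the domain's paragraph) of the input's words.
    public IEnumerable<(string Domain, int Score)> Rank(string input)
    {
        var words = Tokenize(input).ToList();
        return index
            .Select(kv => (Domain: kv.Key,
                           Score: words.Sum(w => kv.Value.TryGetValue(w, out var c) ? c : 0)))
            .Where(r => r.Score > 0)
            .OrderByDescending(r => r.Score);
    }
}

class Program
{
    static void Main()
    {
        var index = new DomainIndex();
        // Two made-up domain paragraphs for illustration; in my case there are 30.
        index.AddDomain("Finance", "Budgets, invoices, payments, accounting and audits ...");
        index.AddDomain("Logistics", "Shipping, warehouses, transport routes and deliveries ...");

        foreach (var (domain, score) in index.Rank("I cannot find the invoices for these payments"))
            Console.WriteLine($"{domain}: {score}");
    }
}
```

With the two made-up descriptions above, only the Finance words match, so the query is scored against Finance alone; this is essentially the occurrence table I already have, just wrapped in a lookup.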
I am working on a natural language processing C# application. I have a text describing 30 domains. Each domain is defined with a short paragraph that explains it. My aim is to build a thesaurus from this text so I can determine from an input string which domains are concerned. The text is about 5000 words and each domains is described by 150 words. My questions are : Do I have a long enough text to create a thesaurus from ? Is my idea of building a thesaurus legit or should I just use NLP libraries to analyse my corpus and the input string ? At the moment, I have calculated the number total of occurrence of each words grouped by domains because I first thought of a indexed approach. But I am really not sure which method is the best. Does someone have experience in both NLP and thesaurus building ? |