|
Note that I am doing everything in R. The problem is as follows: I have a list of resumes (CVs). Some candidates have previous work experience and some don't. The goal is to classify candidates into job sectors based on the text of their CVs. I am particularly interested in the cases where the candidate has no experience or is a student, and I want to predict which job sector that candidate will most likely belong to after graduation. Question 1: I know machine learning algorithms, but I have never done NLP before. I came across Latent Dirichlet allocation on the internet, but I am not sure it is the best approach to my problem. My original idea: make this a supervised learning problem. Since some candidates have work experience and are currently working, we can treat them as labelled data, using the most recent job sector they worked in as the label. We train a model using ML algorithms (e.g. nearest neighbor...), feed in the unlabelled data, i.e. candidates who have no work experience or are students, and predict which job sector they will belong to. Question 2: The tricky part is how to identify and extract the keywords. Using the Any ideas would be great. Thanks |
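To make the supervised idea above concrete, here is a minimal sketch in base R, assuming a toy data set: the CV texts, the sector labels, and the helper names (`tokenize`, `dtm`, `cosine`, `predict_sector`) are all made up for illustration. It weights words by TF-IDF, which is one common way to surface keywords, and assigns each unlabelled CV the sector of its nearest labelled CV by cosine similarity. It is not meant as the definitive way to do this, and real CVs would need stop-word removal and far more data.

```r
# Toy labelled CVs (hypothetical data): text plus most recent job sector.
train_text <- c(
  "java developer built web services and databases",
  "software engineer python machine learning models",
  "financial analyst valuation accounting audit reports",
  "accountant tax audit financial statements"
)
train_label <- c("IT", "IT", "Finance", "Finance")

# Unlabelled CVs (students / no work experience), also hypothetical.
new_text <- c(
  "computer science student python java projects",
  "finance student accounting coursework audit internship"
)

# Lowercase, replace non-letters with spaces, split on whitespace.
tokenize <- function(x) {
  x <- gsub("[^a-z ]", " ", tolower(x))
  lapply(strsplit(x, " +"), function(tok) tok[nchar(tok) > 0])
}

# Document-term count matrix (one row per document) over a fixed vocabulary.
# Words outside the vocabulary are silently dropped.
dtm <- function(docs, vocab) {
  t(vapply(tokenize(docs),
           function(tok) as.numeric(table(factor(tok, levels = vocab))),
           numeric(length(vocab))))
}

# Vocabulary and inverse document frequency come from the labelled CVs only.
vocab    <- sort(unique(unlist(tokenize(train_text))))
train_tf <- dtm(train_text, vocab)
idf      <- log(nrow(train_tf) / colSums(train_tf > 0))

# TF-IDF weighting: rare words get a larger weight than common ones.
train_x <- sweep(train_tf, 2, idf, `*`)
new_x   <- sweep(dtm(new_text, vocab), 2, idf, `*`)

# Cosine similarity between one vector and every row of a matrix.
cosine <- function(a, B) {
  as.vector(B %*% a) / (sqrt(sum(a^2)) * sqrt(rowSums(B^2)))
}

# 1-nearest-neighbour prediction: label of the most similar labelled CV.
# Assumes each new CV shares at least one weighted word with the vocabulary.
predict_sector <- function(x) train_label[which.max(cosine(x, train_x))]
pred <- apply(new_x, 1, predict_sector)
print(pred)

# Highest-weighted words of the third labelled CV, as candidate keywords.
print(vocab[order(train_x[3, ], decreasing = TRUE)][1:3])
```

The vocabulary and IDF weights are computed from the labelled CVs only, so the unlabelled CVs are scored in the same feature space without leaking into training. Swapping the nearest-neighbour step for another classifier would leave the feature extraction unchanged.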