Note that I am doing everything in R.

The problem is as follows:

Basically, I have a list of resumes (CVs). Some candidates have prior work experience and some don't. The goal is to classify the candidates into job sectors based on the text of their CVs. I am particularly interested in the candidates who have no experience or are still students: for those, I want to predict which job sector they will most likely belong to after graduation.

Question 1: I am familiar with machine learning algorithms, but I have never done NLP before. I came across Latent Dirichlet Allocation (LDA) on the internet, but I am not sure whether it is the best approach for my problem.

My original idea: make this a supervised learning problem. Since some candidates have work experience and are currently working, we can treat them as labelled data, with the label being the most recent job sector they worked in. We train a model on these using an ML algorithm (e.g. nearest neighbor), then feed in the unlabelled data, i.e. the candidates who have no work experience or are students, and predict which job sector they will belong to.
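To make the supervised idea concrete, here is a rough sketch in base R of what I have in mind. It assumes a plain bag-of-words representation, cosine similarity and a 1-nearest-neighbour vote. The toy CVs, the sector labels, the tiny stopword list and the helper names (tokenize, count_matrix, cosine_sim, knn_predict) are all made up for illustration; a real pipeline would probably need better preprocessing and a stronger classifier.

```r
# Toy labelled CVs: the label is the most recent job sector (made-up data).
labelled <- data.frame(
  text = c("java developer built web services and databases",
           "software engineer python backend apis",
           "financial analyst equity valuation and accounting",
           "investment banking analyst financial modelling"),
  sector = c("IT", "IT", "Finance", "Finance"),
  stringsAsFactors = FALSE
)
# Toy unlabelled CVs (students / no work experience).
unlabelled <- c("computer science student python and java projects",
                "finance student coursework in accounting and valuation")

stopwords <- c("and", "in", "of", "the")  # tiny illustrative list

# Lower-case, strip non-letters, split on spaces, drop stopwords.
tokenize <- function(x) {
  x <- gsub("[^a-z ]", " ", tolower(x))
  lapply(strsplit(x, " +"), function(tok) tok[nzchar(tok) & !(tok %in% stopwords)])
}

# Term-count matrix over a fixed vocabulary (rows = documents).
# Words outside the vocabulary are silently ignored.
count_matrix <- function(texts, vocab) {
  m <- t(vapply(tokenize(texts),
                function(tok) as.numeric(table(factor(tok, levels = vocab))),
                numeric(length(vocab))))
  colnames(m) <- vocab
  m
}

# Cosine similarity between every row of a and every row of b.
# The pmax guard avoids dividing by zero for CVs with no known words.
cosine_sim <- function(a, b) {
  norm_a <- pmax(sqrt(rowSums(a^2)), 1e-12)
  norm_b <- pmax(sqrt(rowSums(b^2)), 1e-12)
  (a %*% t(b)) / (norm_a %o% norm_b)
}

# k-nearest-neighbour majority vote using cosine similarity.
knn_predict <- function(train_x, train_y, test_x, k = 1) {
  sims <- cosine_sim(test_x, train_x)
  apply(sims, 1, function(s) {
    nearest <- order(s, decreasing = TRUE)[seq_len(k)]
    names(which.max(table(train_y[nearest])))
  })
}

vocab   <- sort(unique(unlist(tokenize(labelled$text))))
train_x <- count_matrix(labelled$text, vocab)
test_x  <- count_matrix(unlabelled, vocab)
pred    <- knn_predict(train_x, labelled$sector, test_x, k = 1)
print(data.frame(cv = unlabelled, predicted_sector = pred))
```

The vocabulary is built from the labelled CVs only, so any word that appears only in a student's CV contributes nothing to the prediction. That seems like a real limitation of this naive version for exactly the cases I care about.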

Question 2: The tricky part is how to identify and extract the keywords. Should I use the tm package in R? What algorithm is the tm package based on? Should I use NLP algorithms, and if so, which ones should I look at? Please point me to some good resources as well.
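For what it's worth, the only keyword-extraction idea I have so far is TF-IDF weighting: score each word by how often it occurs in one CV, scaled down by how many CVs contain it. Below is a minimal base R sketch under that assumption. The documents and the helper name top_keywords are made up, and I am not claiming this is what tm does internally.

```r
# Toy CVs (made-up data), named so the output is easy to read.
docs <- c(cv1 = "java developer java web services",
          cv2 = "python developer backend services",
          cv3 = "financial analyst equity valuation")

# Split on anything that is not a lower-case letter.
tokens <- strsplit(tolower(docs), "[^a-z]+")
vocab  <- sort(unique(unlist(tokens)))

# Term frequencies: rows = documents, columns = terms.
tf <- t(vapply(tokens,
               function(tok) as.numeric(table(factor(tok, levels = vocab))),
               numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency: log(N / number of documents containing the term).
idf   <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Names of the n highest-weighted terms in one document's weight vector.
top_keywords <- function(weights, n = 2) {
  names(sort(weights, decreasing = TRUE))[seq_len(n)]
}

# One column per CV, one row per keyword rank.
keywords <- apply(tfidf, 1, top_keywords)
print(keywords)
```

Words shared by several CVs (like "developer" here) get a low weight, while words concentrated in one CV (like "java") float to the top, which is roughly the behaviour I want from a keyword extractor.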

Any ideas would be great. Thanks.

asked Jul 02 '14 at 08:34


timmonly

edited Jul 02 '14 at 08:39

