Analyzing CV/resume using natural language processing and machine learning
Abstract
This paper proposes a model for extracting important information from the semi-structured text of a curriculum vitae (CV) or resume and ranking it according to the preferences and requirements of the hiring company. To achieve this, the process is divided into three basic stages. The first stage segments the CV/resume by the topic of each part, the second extracts structured data from the unstructured text, and the third evaluates the structured data with a decision tree algorithm and trains the system. Segmentation for the structured data extraction is carried out by converting the CV/resume to HTML. Once the data are in structured form, decision tree techniques classify each input into categories based on qualifications, and the data with positive weight are then used to train the system for future use. Finally, a classifier other than the decision tree, such as logistic regression, is used to compare the classification results.
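The first two stages can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes plain-text input rather than the HTML conversion described above, and the heading list, the function names `segment_resume` and `extract_record`, the extracted fields, and the sample resume are all hypothetical choices made for illustration.

```python
import re

# Hypothetical heading list; a real system would need a larger, tuned set.
SECTION_HEADINGS = ("education", "experience", "skills")


def segment_resume(text):
    """Stage 1 (sketch): group non-empty lines under the last heading seen."""
    sections = {"header": []}
    current = "header"
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        key = stripped.rstrip(":").lower()
        if key in SECTION_HEADINGS:
            current = key
            sections.setdefault(current, [])
        else:
            sections[current].append(stripped)
    return sections


def extract_record(sections):
    """Stage 2 (sketch): turn segmented text into a small structured record."""
    header = " ".join(sections.get("header", []))
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", header)
    skills = [
        s.strip().lower()
        for line in sections.get("skills", [])
        for s in line.split(",")
        if s.strip()
    ]
    # Crude assumption: experience spans the earliest to latest year mentioned.
    years = [
        int(y)
        for line in sections.get("experience", [])
        for y in re.findall(r"\b(?:19|20)\d{2}\b", line)
    ]
    return {
        "email": email.group(0) if email else None,
        "skills": skills,
        "experience_years": max(years) - min(years) if years else 0,
    }


# Made-up sample resume used only to exercise the sketch.
SAMPLE = """Jane Doe
jane.doe@example.com

Education
BSc in Computer Science, 2012

Experience
Software Engineer, Acme Corp, 2013 - 2018

Skills:
Python, SQL, Machine Learning
"""

record = extract_record(segment_resume(SAMPLE))
print(record)
```

A record of this kind could then be converted to numeric features and passed to the third stage, where a decision tree and a comparison classifier such as logistic regression would be trained on labeled examples.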