Analyzing CV/resume using natural language processing and machine learning

Reza, Md. Tanzim; Zaman, Md. Sakib

View/Open

14101061,14101171_CSE.pdf (1.298Mb)

Date

2017

Publisher

BRAC University

Abstract

This paper proposes a model for extracting important information from the semi-structured text of a curriculum vitae (CV) or resume and ranking the document according to the preferences and requirements of the associated company. To achieve this, the process is divided into three segments. The first segments the CV or resume by the topic of each part; the second extracts structured data from the unstructured text; and the third evaluates the structured data with a decision tree algorithm and trains the system. Structured data extraction is performed by converting the CV or resume to HTML and segmenting it. After the conversion to structured data, decision tree techniques classify each input into categories based on qualifications, and the data with positive weight is then used to train the system for future use. Finally, a classifier other than the decision tree, such as logistic regression, is used to compare the classification results.
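The decision tree step described in the abstract can be illustrated with a minimal sketch of the ID3 split criterion (information gain), since ID3 appears among the keywords. This is not the thesis implementation: the record fields (`degree`, `experience`, `shortlist`), the sample values, and the function names are hypothetical and chosen only to show how structured CV data might be scored.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records, attribute, target):
    """Reduction in entropy of `target` after splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    # Group the target labels by the value each record has for the attribute.
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(records, attributes, target):
    """ID3 picks the attribute with the highest information gain at each node."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical structured records, as might result from the extraction segment.
candidates = [
    {"degree": "BSc", "experience": "high", "shortlist": "yes"},
    {"degree": "BSc", "experience": "low", "shortlist": "no"},
    {"degree": "MSc", "experience": "high", "shortlist": "yes"},
    {"degree": "MSc", "experience": "low", "shortlist": "no"},
]

root = best_attribute(candidates, ["degree", "experience"], "shortlist")
```

In this toy data set the `experience` field separates the two classes perfectly while `degree` does not, so `experience` would be chosen as the root split. A full ID3 tree would repeat the same selection recursively on each resulting subset.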

Keywords

Machine learning; CV; Resume; Natural language; NLP; JSON; ID3

Description

This thesis report is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.

Cataloged from PDF version of thesis report.

Includes bibliographical references.

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering) [1589]