This project focuses on classifying resumes into job categories using Natural Language Processing (NLP) and Machine Learning techniques.
The dataset contains resumes with the following columns:
- Category: The job role (e.g., Data Scientist, Web Developer, Java Developer, Sales, Mechanical)
- Resume: Text summary of the candidate’s resume
- Checked the number of resumes and unique job categories
- Performed basic text analysis on the resume data
- Converted job role categories (text) into numeric labels using Label Encoding to make them suitable for machine learning models.
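
The label-encoding step might look roughly like the sketch below. The category values are hypothetical stand-ins for the dataset's Category column, and the variable names are illustrative only, not taken from the project script.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of job categories (assumption: the real values come
# from the dataset's Category column).
categories = ["Data Scientist", "Web Developer", "Java Developer", "Sales", "Data Scientist"]

encoder = LabelEncoder()
# Classes are sorted alphabetically and numbered from 0.
labels = encoder.fit_transform(categories)

# The fitted encoder can map numeric predictions back to category names.
decoded = encoder.inverse_transform(labels)

print(list(encoder.classes_))
print(list(labels))
```

Keeping the fitted encoder around is useful later, since model predictions come back as integers and need `inverse_transform` to become readable job categories again.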
- Applied TF-IDF (Term Frequency–Inverse Document Frequency) using TfidfVectorizer from scikit-learn
- Transformed textual resume data into numerical vectors representing the importance of words
- Used the OneVsRestClassifier wrapper to handle multi-class classification
- Trained the model using the K-Nearest Neighbors (KNN) algorithm
- Split the data into training and testing sets
- Achieved an accuracy of around 98%
- The model successfully predicted job categories based on resume content
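
The training and evaluation steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual script: the toy texts and labels are invented, `n_neighbors=3`, `test_size=0.25`, and `random_state=42` are arbitrary choices, and the vectorizer is fitted inside a pipeline on the training split only, which may differ from the order used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the Resume and Category columns.
texts = [
    "Python machine learning pandas statistics",
    "Deep learning Python TensorFlow data analysis",
    "Data analysis statistics machine learning models",
    "Python pandas data visualization machine learning",
    "HTML CSS JavaScript responsive web design",
    "JavaScript React frontend web development",
    "Web development HTML CSS Bootstrap",
    "Frontend JavaScript CSS web applications",
    "Java Spring Boot backend microservices",
    "Java Hibernate Spring REST services",
    "Backend Java JDBC Spring development",
    "Java microservices Spring Hibernate backend",
]
categories = ["Data Scientist"] * 4 + ["Web Developer"] * 4 + ["Java Developer"] * 4

# Encode category names as integers.
labels = LabelEncoder().fit_transform(categories)

# Stratified split keeps every category represented in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# TF-IDF features feed a one-vs-rest wrapper around KNN.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    OneVsRestClassifier(KNeighborsClassifier(n_neighbors=3)),
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Accuracy on a toy set this small says nothing about the real dataset; the point is only to show how the pieces connect. Fitting the vectorizer inside the pipeline keeps test-set vocabulary out of training.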
- Independent Feature: Resume text
- Dependent Feature: Job category
- Python
- Pandas
- Scikit-learn
- Natural Language Processing (TF-IDF)
- Clone the repository:
git clone <repository_url>
cd resume-screening
- Install dependencies and run the script:
pip install -r requirements.txt
python resume_classifier.py