Resume-Classification-Dataset

Dataset Description

The dataset comprises resumes collected from three sources: Google Images, Bing Images, and LiveCareer. Each record has two columns, "Category" and "Text". The "Category" column holds the job title associated with the resume, and the "Text" column holds the resume content extracted with optical character recognition (OCR).
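As a minimal sketch of working with the two-column layout described above, the snippet below reads records and counts resumes per category. The CSV file format, the `load_records` helper, and the sample rows are assumptions made for illustration; only the "Category" and "Text" column names come from the description.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real file; the actual file name
# and storage format are not stated in this README.
SAMPLE_CSV = '''Category,Text
Accountant,"Prepared monthly financial statements and reconciled ledgers."
Teacher,"Taught grade 5 mathematics and science."
Accountant,"Managed accounts payable."
'''

def load_records(handle):
    """Return a list of (category, text) pairs from a CSV with those two columns."""
    reader = csv.DictReader(handle)
    return [(row["Category"], row["Text"]) for row in reader]

records = load_records(io.StringIO(SAMPLE_CSV))

# Count how many resumes fall under each job title.
category_counts = Counter(category for category, _ in records)
print(category_counts.most_common())
```

With a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="", encoding="utf-8")`.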

Data Sources

Total Number of Records

The dataset contains 13,389 records across all three sources, each pairing a job title with the corresponding resume text.

Data Collection Process

Visualization:

Figures: Average Word Count; Average Character Count
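The two statistics named above could be computed along the following lines. This is a sketch under assumptions: words are split on whitespace, averages are grouped by category, and the `average_lengths` helper and sample records are hypothetical. The README does not state how the original figures were produced.

```python
from collections import defaultdict

def average_lengths(records):
    """Map each category to (average word count, average character count)."""
    # Per category: [total words, total characters, number of resumes].
    totals = defaultdict(lambda: [0, 0, 0])
    for category, text in records:
        entry = totals[category]
        entry[0] += len(text.split())  # whitespace-delimited words
        entry[1] += len(text)          # characters, including spaces
        entry[2] += 1
    return {
        category: (words / count, chars / count)
        for category, (words, chars, count) in totals.items()
    }

# Hypothetical records for illustration only.
sample_records = [
    ("A", "one two"),
    ("A", "three four five six"),
    ("B", "hi"),
]
averages = average_lengths(sample_records)
print(averages)
```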

Challenges in the Dataset

Unstructured Format

Personal Information

Overlapping Content (Repeated)

Spelling Errors

Irrelevant Text

Watermarks and Highlighted Text

Special Characters

Links

Irrelevant Experience
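Several of the challenges listed above (links, personal information, and repeated content) can be reduced with simple pattern matching. The sketch below is one possible approach, not the method used to build this dataset: the regular expressions, the 10-digit phone threshold, and the `scrub` helper are all assumptions. Names, spelling errors, watermarks, and irrelevant experience need other techniques and are left untouched.

```python
import re

# Crude patterns; real resumes will contain cases these miss.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_CANDIDATE_RE = re.compile(r"\+?\d[\d \t().-]{7,}\d")

def _mask_phone(match):
    """Blank out a candidate only if it has enough digits to be a phone number."""
    digit_count = sum(ch.isdigit() for ch in match.group())
    # Assumed threshold: date ranges such as "2015 - 2019" have fewer digits.
    return " " if digit_count >= 10 else match.group()

def scrub(text):
    """Remove links, emails, and phone numbers, then drop repeated lines."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = PHONE_CANDIDATE_RE.sub(_mask_phone, text)

    seen = set()
    kept = []
    for line in text.splitlines():
        normalized = " ".join(line.split())
        key = normalized.lower()
        # Skip lines left with no letters or digits (for example a lone "|").
        if not any(ch.isalnum() for ch in key):
            continue
        # Skip lines already seen, ignoring case and spacing.
        if key in seen:
            continue
        seen.add(key)
        kept.append(normalized)
    return "\n".join(kept)

# Hypothetical OCR output for illustration only.
raw = (
    "Jane Doe\n"
    "jane.doe@example.com | +1 (555) 123-4567\n"
    "https://example.com/jane\n"
    "Skills: Python\n"
    "skills:  python\n"
    "Worked 2015 - 2019 at Acme\n"
)
cleaned = scrub(raw)
print(cleaned)
```

Checking the digit count before masking is a deliberate choice: a looser phone pattern would also delete employment date ranges, which carry useful signal for classification.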

Data Preprocessing

Lowercasing

Removing Punctuation

Additional Cleaning

Tokenization

Removing Stop Words
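The preprocessing steps above can be sketched as a single function. This is an illustrative version using only the Python standard library: the short `STOP_WORDS` set, the whitespace tokenizer, and the interpretation of "additional cleaning" as dropping any remaining non-alphanumeric characters are assumptions, since the README does not specify the tools or word lists that were used.

```python
import re
import string

# Small illustrative stop word list; a real pipeline would use a fuller one.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on", "at", "is", "are"}
)

# Map every ASCII punctuation character to a space so words stay separated.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(text):
    """Return a list of cleaned tokens for one resume text."""
    # 1. Lowercasing
    text = text.lower()
    # 2. Removing punctuation
    text = text.translate(_PUNCT_TABLE)
    # 3. Additional cleaning (assumed): drop leftover symbols and non-ASCII debris
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # 4. Tokenization on whitespace
    tokens = text.split()
    # 5. Removing stop words
    return [token for token in tokens if token not in STOP_WORDS]

tokens = preprocess("Managed a TEAM of 5 engineers; shipped the API in 2021!")
print(tokens)
```

Replacing punctuation with spaces rather than deleting it keeps OCR output such as "Python,Java" from merging into a single token.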

Image Dataset