
About this project

This project extracts the text of an article with Newspaper3k, a Python library for scraping articles, then uses NLTK (Natural Language Toolkit) to preprocess the text and find the most common words in the article.

Tools

Newspaper3k: scrapes and parses articles
NLTK: processes the text
Matplotlib: plots the word frequencies

Steps

Scrape articles with newspaper3k

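from newspaper import Article

url = 'https://mystudentvoices.com/it-took-me-2-years-to-get-1000-followers-life-lessons-ive-learned-throughout-the-journey-9bc44f2959f0'
article = Article(url)

# Fetch the page HTML, then parse it so that fields such as
# publish_date, authors, and text are populated.
article.download()
article.parse()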

Find the publish date

article.publish_date

Extract image
Find the author
Find the keywords
Find the summary
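
The image, author, keyword, and summary steps all read attributes of the same Article object. Here is a minimal sketch, assuming the article has been downloaded, parsed, and passed through newspaper's nlp() step, which is what fills keywords and summary. The helper names fetch_article and summarize_metadata are hypothetical and not part of this repository.

```python
def fetch_article(url):
    """Download, parse, and run newspaper's NLP step on one article."""
    # Imported lazily so summarize_metadata can be reused on its own.
    from newspaper import Article  # requires: pip install newspaper3k

    article = Article(url)
    article.download()
    article.parse()  # fills authors, publish_date, top_image, text
    article.nlp()    # fills keywords and summary; needs NLTK tokenizer data
    return article


def summarize_metadata(article):
    """Collect the fields from the steps above into one dictionary."""
    return {
        "publish_date": article.publish_date,  # datetime, or None if not found
        "top_image": article.top_image,        # URL of the main image
        "authors": article.authors,            # list of author names
        "keywords": article.keywords,          # list of strings, set by nlp()
        "summary": article.summary,            # string, set by nlp()
    }
```

Calling summarize_metadata(fetch_article(url)) with the URL from the scraping step would return all five fields at once. Any of them can be empty when the page does not expose that information.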
Preprocessing with NLTK
- Tokenize text
- Lowercase and remove stopwords
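
The two preprocessing steps above could look roughly like the sketch below. It assumes the NLTK tokenizer and stopword data have already been fetched with nltk.download (the 'punkt' or 'punkt_tab' resource, depending on the NLTK version, plus 'stopwords'). The function names are hypothetical, and dropping non-alphabetic tokens such as punctuation is an extra assumption beyond the steps listed.

```python
def tokenize(text):
    """Split raw article text into word and punctuation tokens."""
    from nltk.tokenize import word_tokenize  # needs the punkt tokenizer data

    return word_tokenize(text)


def load_stop_words(language="english"):
    """Return NLTK's stopword list for one language as a set."""
    from nltk.corpus import stopwords  # needs nltk.download('stopwords')

    return set(stopwords.words(language))


def clean_tokens(tokens, stop_words):
    """Lowercase tokens, then drop stopwords and non-alphabetic tokens."""
    lowered = (token.lower() for token in tokens)
    # Assumption: punctuation and numbers are noise for word counts.
    return [t for t in lowered if t.isalpha() and t not in stop_words]
```

With an article parsed as in the scraping step, clean_tokens(tokenize(article.text), load_stop_words()) would give the list of words to count.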
Visualize the frequency of words with Matplotlib
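
One possible way to do the counting and plotting is sketched here. It uses collections.Counter from the standard library for the counts. nltk.FreqDist is a Counter subclass, so it could be swapped in without changing the rest. The names top_words and plot_top_words are hypothetical, and the bar chart is an assumed choice of plot.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


def plot_top_words(pairs, title="Most common words"):
    """Draw a bar chart from a non-empty list of (word, count) pairs."""
    import matplotlib.pyplot as plt  # requires: pip install matplotlib

    words, counts = zip(*pairs)
    fig, ax = plt.subplots()
    ax.bar(words, counts)
    ax.set_xlabel("Word")
    ax.set_ylabel("Frequency")
    ax.set_title(title)
    ax.tick_params(axis="x", labelrotation=45)  # keep long words readable
    fig.tight_layout()
    return fig
```

Given the preprocessed word list, plot_top_words(top_words(words, 10)) followed by matplotlib.pyplot.show() would display the ten most common words.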

Tutorial blog

Find the Medium article for this repository here