About this project
This project extracts the text of an article with Newspaper3k, a Python article-scraping library, then uses NLTK (Natural Language Toolkit) to preprocess the text and find the most common words in the article.
Tools
- Newspaper3k: scrapes and parses articles
- NLTK: processes text
Steps
- Scrape articles with newspaper3k
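from newspaper import Article
# URL of the article to scrape
url = 'https://mystudentvoices.com/it-took-me-2-years-to-get-1000-followers-life-lessons-ive-learned-throughout-the-journey-9bc44f2959f0'
article = Article(url)
article.download()  # fetch the raw HTML
article.parse()  # extract text and metadata; required before reading attributes such as publish_date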
- Find the publish date
article.publish_date
- Extract image
- Find the author
- Find the keywords
- Find the summary
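The image, author, keyword, and summary steps above might look roughly like the sketch below. The attribute names (`top_image`, `authors`, `keywords`, `summary`) and the `nlp()` call come from newspaper3k; the helper names `fetch_article` and `describe_article` are hypothetical and only used for illustration. It assumes newspaper3k is installed and that NLTK's punkt tokenizer data is available, since `nlp()` relies on it.

```python
def fetch_article(url):
    """Download and parse an article (needs network access and newspaper3k)."""
    # Imported here so describe_article() can be tried without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the raw HTML
    article.parse()     # extract text, authors, images, publish date
    return article


def describe_article(article):
    """Collect metadata from an already downloaded and parsed article."""
    # nlp() fills in keywords and summary; it must run after parse().
    article.nlp()
    return {
        "publish_date": article.publish_date,
        "top_image": article.top_image,  # URL of the main image
        "authors": article.authors,      # list of author names
        "keywords": article.keywords,    # list of keyword strings
        "summary": article.summary,      # short extractive summary
    }
```

A typical call would be `describe_article(fetch_article(url))`. Any of these fields can come back empty when the page does not expose the corresponding metadata.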
- Preprocessing with NLTK
- Tokenize text
- Lowercase and remove stopwords
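A minimal sketch of the two preprocessing steps above, assuming NLTK is installed along with its tokenizer and stopword data. The function names are hypothetical. Dropping non-alphabetic tokens (punctuation, numbers) is an extra assumption that the steps above do not mention; remove the `isalpha()` check if you want to keep them.

```python
def tokenize(text):
    """Split text into word tokens with NLTK."""
    # Needs the punkt tokenizer data, e.g. nltk.download("punkt");
    # newer NLTK releases may ask for "punkt_tab" instead.
    from nltk.tokenize import word_tokenize

    return word_tokenize(text)


def english_stopwords():
    """Return NLTK's English stopword list as a set."""
    # Needs nltk.download("stopwords") once.
    from nltk.corpus import stopwords

    return set(stopwords.words("english"))


def clean_tokens(tokens, stop_words):
    """Lowercase tokens, then drop stopwords and non-alphabetic tokens."""
    lowered = (token.lower() for token in tokens)
    # Assumption: punctuation and numbers are not wanted in the word counts.
    return [t for t in lowered if t.isalpha() and t not in stop_words]
```

With the article text in hand, the steps chain as `clean_tokens(tokenize(article.text), english_stopwords())`.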
- Visualize word frequencies with Matplotlib
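One way to count and plot the most common words is sketched below. It uses `collections.Counter` for the counting, which is a substitution chosen here for simplicity (NLTK's `FreqDist` would also work), and a plain Matplotlib bar chart. The function names and the default of 10 words are assumptions for illustration.

```python
from collections import Counter


def most_common_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


def plot_frequencies(pairs):
    """Draw a bar chart from (word, count) pairs."""
    import matplotlib.pyplot as plt  # imported here so counting works without Matplotlib

    if not pairs:
        return  # nothing to plot
    words, counts = zip(*pairs)
    plt.bar(words, counts)
    plt.xlabel("Word")
    plt.ylabel("Count")
    plt.title("Most common words")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
```

Passing the cleaned token list to `most_common_words` and the result to `plot_frequencies` produces the chart.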
Tutorial blog
Find the Medium article for this repository here