rsscrape
Python script that extracts news items from RSS feeds and saves each one as a JSON file.
Usage
$ python3 rsscrape.py
[INFO] Found 51 in 'feeds.txt'
[INFO] Requests 51 XMLs content
[INFO] Scrape 10 items
[INFO] Write 1250 json files to './items'
[INFO] 1648 json files in './items'
Running the script creates a directory, ./items, containing the results:
./items
0a1c2b2da6e40ab4e54b8247bbbc1422.json
fc8ddcf4cc0725bfa35564fb19e4a407.json
fe15bf1383c382101984ea4fdc6a33ae.json
...
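To work with the results afterwards, the files can be read back with the standard library. The following is a minimal sketch, not part of rsscrape itself; the helper name load_items is hypothetical, and it assumes every file in the directory holds one JSON object encoded as UTF-8.

```python
import json
from pathlib import Path


def load_items(directory="./items"):
    """Load every *.json file in `directory` into a list of dicts."""
    items = []
    # Sort the paths so the result order is stable between runs.
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            items.append(json.load(fh))
    return items


if __name__ == "__main__":
    items = load_items()
    print(f"Loaded {len(items)} items")
```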
Each JSON file corresponds to a single RSS item:
// f8b40f2bb091e41c53eb35528c433d7f.json
{
"title": "USA: Corona, war da was?",
"link": "https://de.nachrichten.yahoo.com/usa-corona-war-135203870.html",
"pubDate": "2021-11-23T13:52:03Z",
"source": "ZEIT ONLINE",
"guid": "usa-corona-war-135203870.html",
"raw": "<item xmlns:media=\"http://search.yahoo.com/mrss/\"><title>USA: Corona, war da was?</title><link>https://de.nachrichten.yahoo.com/usa-corona-war-135203870.html</link><pubDate>2021-11-23T13:52:03Z</pubDate><source url=\"http://www.zeit.de/index\">ZEIT ONLINE</source><guid isPermaLink=\"false\">usa-corona-war-135203870.html</guid><media:content height=\"86\" url=\"https://s.yimg.com/uu/api/res/1.2/_rdWs7VS_33DY3PJWhkh6Q--~B/aD04MTA7dz0xNDQwO2FwcGlkPXl0YWNoeW9u/https://media.zenfs.com/de/zeit_921/2c35cfd59ae80f62a1ecb89623d2a47f\" width=\"130\"/><media:credit role=\"publishing company\"/></item>"
}
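The mapping from an RSS item to a record like the one above could be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function names item_to_record and record_filename are hypothetical, parsing uses the standard library's xml.etree.ElementTree (the script may use a different parser, so the serialized raw string can differ in detail), and the file name is derived from an MD5 digest of the raw XML only because the names in ./items are 32 hex characters long. What the script really hashes is not stated here.

```python
import hashlib
import json
import xml.etree.ElementTree as ET

# Child elements copied into the record, matching the example keys.
FIELDS = ("title", "link", "pubDate", "source", "guid")


def item_to_record(item):
    """Turn one RSS <item> element into a dict shaped like the example."""
    record = {}
    for name in FIELDS:
        child = item.find(name)
        if child is not None and child.text:
            record[name] = child.text.strip()
    # Keep the serialized element so no information from the feed is lost.
    record["raw"] = ET.tostring(item, encoding="unicode")
    return record


def record_filename(record):
    # Assumption: name the file after the MD5 hex digest of the raw XML.
    digest = hashlib.md5(record["raw"].encode("utf-8")).hexdigest()
    return digest + ".json"


xml_text = """<rss version="2.0"><channel>
<item>
<title>USA: Corona, war da was?</title>
<link>https://de.nachrichten.yahoo.com/usa-corona-war-135203870.html</link>
<pubDate>2021-11-23T13:52:03Z</pubDate>
<source url="http://www.zeit.de/index">ZEIT ONLINE</source>
<guid isPermaLink="false">usa-corona-war-135203870.html</guid>
</item>
</channel></rss>"""

root = ET.fromstring(xml_text)
records = [item_to_record(item) for item in root.iter("item")]
print(json.dumps(records[0], ensure_ascii=False, indent=2))
print(record_filename(records[0]))
```

Hashing the content gives each item a stable file name, so scraping the same item twice would overwrite the existing file rather than create a duplicate.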