Opensubtitles_dataset
Downloads and parses the subtitle dataset from opensubtitles.org.
Usage
python3 parse_opensubtitle_xml.py
The command above downloads a zip archive containing the English OpenSubtitles corpus and extracts the text from all the XML files, removing the metadata.
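The extraction step can be pictured roughly as follows. This is only a minimal sketch, not the code in parse_opensubtitle_xml.py: the function names extract_text_from_xml and extract_text_from_zip are hypothetical, the download step is left out, and the sample XML layout in the demo is an assumption about what a subtitle file might look like.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_text_from_xml(xml_bytes):
    """Return the text of one XML document with tags and attributes dropped."""
    root = ET.fromstring(xml_bytes)
    # itertext() yields only text nodes, so attribute metadata
    # (ids, timestamps, and so on) never reaches the output.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)


def extract_text_from_zip(zip_file):
    """Map each .xml member of the archive to its extracted text."""
    texts = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                texts[name] = extract_text_from_xml(archive.read(name))
    return texts


if __name__ == "__main__":
    # Build a tiny in-memory archive; the XML layout here is assumed.
    sample = (
        '<document id="1"><s id="1">'
        '<time id="T1S" value="00:00:01"/>'
        '<w id="1.1">Hello</w> <w id="1.2">world</w>'
        "</s></document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("en/sample.xml", sample)
    buffer.seek(0)
    print(extract_text_from_zip(buffer))
```

Collecting only text nodes is one simple way to drop metadata, since timestamps and ids live in attributes in this assumed layout. If the real files keep metadata in element text, those elements would need to be skipped explicitly.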