Politic Bots
Politic Bots is a side-project that started within the research project ParticipaPY of the Catholic University "Nuestra Señora de la Asunción", which aims at designing and implementing solutions to generate spaces of civic participation through technology.
Motivated by a series of journalistic investigations (e.g., "How a Russian 'troll soldier' stirred anger after the Westminster attack", "Anti-Vaxxers Are Using Twitter to Manipulate a Vaccine Bill", "A Russian Facebook page organized a protest in Texas. A different Russian page launched the counterprotest", "La oscura utilización de Facebook y Twitter como armas de manipulación política", "How Bots Ruined Clicktivism") into the use of social media to manipulate public opinion, especially during elections, we decided to study the use of Twitter during the presidential elections that took place in Paraguay in December 2017 (primary) and April 2018 (general).
To understand how the public and the political candidates use Twitter, we collected tweets published through the accounts of the candidates or containing hashtags used in the campaigns. These accounts and hashtags were recorded in a CSV file that was provided to the tweet collector. The source code of the collector is available here and the CSV file used to pull data from Twitter during the primaries can be found here.
Data Augmentation
The accounts and hashtags employed to collect tweets during the primary elections were augmented with information about the parties and internal movements of the candidates. A similar approach was followed to collect tweets for the general election. In this case, however, the hashtags and accounts were supplemented with information not only about the candidates' parties but also about their region, the name of their coalition (if any), and the political positions that they stand for. The CSV file used to collect tweets during the general elections can be found here.
The functions `create_flags` and `add_values_to_flags`, which annotate the tweets with information about the candidate's party and movement, are implemented in the module `add_flags.py` in `src/tweet_collector`.
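The annotation step can be illustrated with a minimal sketch. It is not the code in `add_flags.py`: the helper names, the `party` and `movement` column names, the sample rows, and the `flags` key are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical metadata shaped like the CSV described above: one row per
# account or hashtag, plus extra columns about the candidate.
METADATA_CSV = """keyword,party,movement
@candidate_a,Party A,Movement 1
#VoteA,Party A,Movement 1
#VoteB,Party B,Movement 2
"""

# Assumed names of the extra columns copied onto each tweet.
FLAG_COLUMNS = ("party", "movement")


def load_metadata(csv_text):
    """Index the CSV rows by lower-cased keyword."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["keyword"].lower(): row for row in reader}


def annotate_tweet(tweet, metadata):
    """Return a copy of the tweet annotated with the matching parties and movements."""
    # Split the text into tokens and drop trailing punctuation such as "today!".
    tokens = {token.strip(".,;:!?").lower() for token in tweet["text"].split()}
    flags = {column: set() for column in FLAG_COLUMNS}
    for token in tokens:
        row = metadata.get(token)
        if row is None:
            continue
        for column in FLAG_COLUMNS:
            flags[column].add(row[column])
    annotated = dict(tweet)
    annotated["flags"] = {column: sorted(values) for column, values in flags.items()}
    return annotated


metadata = load_metadata(METADATA_CSV)
print(annotate_tweet({"text": "Great rally with @candidate_a today! #VoteA"}, metadata))
```

Indexing the metadata by keyword keeps the lookup per token cheap, which matters when millions of tweets are annotated.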
Data Cleaning
Some of the hashtags used by the candidates were generic Spanish words that are also employed in other contexts and other Spanish-speaking countries (e.g., a marketing campaign in Argentina), so before starting any analysis we had to ensure that the collected tweets were actually related to the elections in Paraguay. We labeled a collected tweet as relevant if it mentioned a candidate account or if it contained more than one of the hashtags of interest. The class `TweetEvaluator` in `src/utils/data_wrangler.py` contains the code that labels the collected tweets as relevant or not for this project.
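The relevance rule can be sketched as follows. This is not the `TweetEvaluator` implementation: the function name, the flat `mentions` and `hashtags` lists, and the sample values are assumptions, and the hashtag rule is read here as "at least two distinct hashtags of interest".

```python
def is_relevant(tweet, candidate_accounts, hashtags_of_interest, min_hashtags=2):
    """Apply the rule: a candidate is mentioned, or enough hashtags of interest appear.

    `tweet` is a simplified, hypothetical dictionary with two lists,
    `mentions` and `hashtags`; real tweet objects are nested differently.
    """
    # Normalize case and leading symbols so "@Candidate_A" matches "candidate_a".
    mentions = {m.lower().lstrip("@") for m in tweet.get("mentions", [])}
    hashtags = {h.lower().lstrip("#") for h in tweet.get("hashtags", [])}
    accounts = {a.lower().lstrip("@") for a in candidate_accounts}
    wanted = {h.lower().lstrip("#") for h in hashtags_of_interest}

    if mentions & accounts:
        return True
    # A single generic hashtag is not enough evidence on its own.
    return len(hashtags & wanted) >= min_hashtags


accounts = ["@candidate_a"]            # hypothetical account
tags = ["#VoteA", "#VoteB"]            # hypothetical hashtags
print(is_relevant({"mentions": [], "hashtags": ["votea"]}, accounts, tags))
print(is_relevant({"mentions": ["Candidate_A"], "hashtags": []}, accounts, tags))
```

Requiring two hashtags trades recall for precision: a tweet that uses only one generic word as a hashtag is treated as unrelated unless it also mentions a candidate.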
Structure of the repository
├── LICENSE
├── README.md <- The top-level README for this project.
├── requirements.txt <- The requirements file for reproducing the project environment, e.g.
│ generated with `pip freeze > requirements.txt`
├── data
├── sna <- Social Network Analysis
│ ├── gefx <- Files that record the interaction network among users
│ ├── img <- Images that illustrate the interaction network among users
├── reports <- Reports about the usage of Twitter during elections in Paraguay
│ ├── notebooks <- Jupyter notebooks used to conduct the analyses
│ ├── html <- HTML files with the results of the analyses
├── src <- Source code of the project
│ ├── __init__.py <- Makes src a Python module
│ ├── run.py <- Main script to run analysis tasks
│ ├── config.json.example <- Example of a configuration file
│ ├── analyzer
│ │ └── data_analyzer.py <- Functions to conduct analyses on tweets
│ │ └── network_analysis.py <- Class used to conduct Social Network Analysis
│ ├── bot_detector
│ │ └── bot_detector.py <- Main class to conduct the detection of bots
│ │ └── run.py <- Main function to execute the detection of bots
│ │ └── heuristics
│ │ │ └── fake_handlers.py <- Functions to execute the heuristic fake handlers
│ │ │ └── fake_promoter.py <- Functions to execute the heuristic fake promoter
│ │ │ └── heuristic_config.json <- Configuration file with the parameters of the heuristics
│ │ │ └── simple.py <- Functions to execute a set of straightforward heuristics
│ ├── tweet_collector
│ │ └── add_flags.py <- Functions used to augment tweets with information about the candidates
│ │ └── generales.csv <- CSV file with the hashtags and accounts used to collect tweets related to
│ │ │ the general elections
│ │ └── internas_2017.csv <- CSV file with the hashtags and accounts used to collect tweets related to
│ │ │ the primary elections
│ │ └── tweet_collector.py <- Class implemented to collect tweets by hitting the API of Twitter
│ ├── utils
│ │ └── data_wrangler.py <- Functions and classes to clean and pre-process the data
│ │ └── db_manager.py <- Main class to operate the MongoDB used to store the tweets
│ │ └── utils.py <- General utilitarian functions
Analyses
The directory `reports` contains the analyses conducted to study the use of Twitter during the primary and general elections. Jupyter notebooks were used to document the analyses and report the results. HTML files were generated to make the analyses and results easier to access.
Getting Started
Installation guide
Links to packages are provided below.
- Download and install Python >= 3.4.4;
- Download and install MongoDB community version;
- Create a Twitter APP by following the instructions here;
- Clone the repository: `git clone https://github.com/ParticipaPY/politic-bots.git`;
- Get into the directory of the repository: `cd politic-bots`;
- Create a virtual environment by running `virtualenv env`;
- Activate the virtual environment by executing `source env/bin/activate`;
- Inside the directory of the repository, install the project dependencies by running `pip install -r requirements.txt`.

Optional for social network analysis: Download and install Gephi
Run
Several tasks can be run from `src/run.py`. Below we explain each of them.
Pre-requirements
- Set in `src/config.json` the information of the MongoDB database that is used to store the tweets;
- Activate the virtual environment by executing `source env/bin/activate`.
Collect Political Tweets
The data sets of tweets collected during the presidential primary and general elections that took place in Paraguay in December 2017 and April 2018, respectively, are available for download.
If a new set of tweets needs to be collected, follow the steps below.
- Create a CSV file containing the names of the Twitter accounts and the hashtags of interest. The CSV file should have a column called `keyword` with the list of accounts and hashtags. Additional columns can be added to the CSV to complement the information about accounts and hashtags. An example CSV file can be found here;
- Rename `src/config.json.example` to `src/config.json`;
- Set `metadata` in `src/config.json` to the path to the CSV file;
- Set `consumer_key` and `consumer_secret` in `src/config.json` to the authentication tokens of the Twitter App created during the installation process;
- Go to the `src` directory and execute `python run.py --collect_tweets`.

Depending on the number of hashtags and accounts, the collection can take several hours or even days.
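Before launching a collection that may run for days, it can help to check the CSV file. The sketch below is not part of the repository: `read_keywords` is a hypothetical helper, and it assumes that account keywords start with `@` while everything else is treated as a hashtag.

```python
import csv
import io


def read_keywords(csv_file):
    """Read the `keyword` column and split it into accounts and hashtags.

    Assumption: accounts are written with a leading "@"; any other
    non-empty keyword is treated as a hashtag.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or "keyword" not in reader.fieldnames:
        raise ValueError("the CSV file needs a column called 'keyword'")
    accounts, hashtags = [], []
    for row in reader:
        keyword = (row["keyword"] or "").strip()
        if not keyword:
            continue  # skip rows without a keyword
        if keyword.startswith("@"):
            accounts.append(keyword)
        else:
            hashtags.append(keyword)
    return accounts, hashtags


# Hypothetical content; a real file would be opened with open(path, newline="").
sample = io.StringIO("keyword,party\n@candidate_a,Party A\n#VoteA,Party A\n")
print(read_keywords(sample))
```

Failing early on a missing `keyword` column is cheaper than discovering the problem after the collector has started.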
Create database of tweet authors
Before conducting analyses on the tweets, a database of the authors of the tweets should be created. To create the database of users, activate the virtual environment (`source env/bin/activate`) and execute `python run.py --db_users` from the `src` directory. A new MongoDB database called `users` is created as a result of this process.
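The idea of deriving author records from tweets can be sketched in plain Python, without MongoDB. This is only an illustration: the function name and the fields of each record are assumptions, while `user`, `screen_name`, and `retweeted_status` follow the tweet objects returned by the Twitter API v1.1.

```python
def build_user_records(tweets):
    """Aggregate tweets into one record per author, keyed by screen name."""
    users = {}
    for tweet in tweets:
        screen_name = tweet["user"]["screen_name"]
        record = users.setdefault(
            screen_name,
            {"screen_name": screen_name, "tweets": 0, "retweets": 0},
        )
        # Retweets carry the original tweet under `retweeted_status`.
        if "retweeted_status" in tweet:
            record["retweets"] += 1
        else:
            record["tweets"] += 1
    return users


sample_tweets = [  # hypothetical data
    {"user": {"screen_name": "alice"}, "text": "hello"},
    {"user": {"screen_name": "alice"}, "text": "RT",
     "retweeted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "bob"}, "text": "hi"},
]
for record in build_user_records(sample_tweets).values():
    print(record)
```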
Analyze the sentiment of tweets
It is possible to analyze the tone of tweets by executing `python run.py --sentiment_analysis` from the `src` directory.
The sentiment of each tweet is stored in the dictionary that contains the information of the tweet, under the key `sentimiento`. We use the library CCA-Core to analyze the sentiment embedded in tweets.
See here for more information about the CCA-Core library.
Identify relevant tweets
Tweets should be evaluated to determine their relevance for this project. See the Data Cleaning section to understand the problems with the hashtags used to collect tweets. From the `src` directory of the repository, and after activating your virtual environment (`source env/bin/activate`), run `python run.py --flag_tweets` to perform this task. The flag `relevante`, added to the dictionary that stores the information of each tweet, indicates whether the tweet is relevant for the purpose of this project.
Generate network of interactions
Once the database of users has been generated, a network that shows the interactions among them can be created for a follow-up social network analysis. From the `src` directory, and after activating your virtual environment (`source env/bin/activate`), run `python run.py --interaction_net` to generate the network of interactions among the tweet authors. Examples of interaction networks can be found in the directory `sna` of the repo.
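A minimal sketch of how such a network could be derived is shown below. It is not the code in `network_analysis.py`: the function name and the weighting scheme are assumptions. The fields `retweeted_status`, `in_reply_to_screen_name`, and `entities.user_mentions` follow the tweet objects of the Twitter API v1.1.

```python
from collections import Counter


def interaction_edges(tweets):
    """Count directed interactions (retweet, reply, mention) between authors.

    Returns a Counter mapping (source, target) to the number of tweets in
    which `source` interacted with `target`.
    """
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"]["screen_name"]
        targets = set()
        retweeted = tweet.get("retweeted_status")
        if retweeted:
            targets.add(retweeted["user"]["screen_name"])
        reply_to = tweet.get("in_reply_to_screen_name")
        if reply_to:
            targets.add(reply_to)
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            targets.add(mention["screen_name"])
        # Each tweet counts once per pair; self-interactions are ignored.
        for target in targets:
            if target != source:
                edges[(source, target)] += 1
    return edges


sample_tweets = [  # hypothetical data
    {"user": {"screen_name": "alice"},
     "retweeted_status": {"user": {"screen_name": "bob"}},
     "entities": {"user_mentions": [{"screen_name": "bob"}]}},
    {"user": {"screen_name": "alice"}, "in_reply_to_screen_name": "bob"},
    {"user": {"screen_name": "bob"},
     "entities": {"user_mentions": [{"screen_name": "carol"}, {"screen_name": "bob"}]}},
]
for (source, target), weight in sorted(interaction_edges(sample_tweets).items()):
    print(source, "->", target, weight)
```

The resulting weighted edge list can then be exported to a format that Gephi reads for the follow-up analysis.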
Troubleshooting
If you get the error `ImportError: No module named` when trying to execute the scripts, make sure you are in the `src` directory. If you still get the same error from the `src` directory, you may need to add the parent of the `src` directory to the `PYTHONPATH` so that the `src` package can be found. To do so, add `PYTHONPATH=../` at the beginning of the execution commands, e.g.,
PYTHONPATH=../ python analyzer/pre_analysis.py
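The same effect can be obtained from inside a script by extending `sys.path` before the project imports. The helper below is a hypothetical sketch, not part of the repository.

```python
import sys
from pathlib import Path


def add_to_import_path(directory):
    """Put `directory` at the front of the module search path, once."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# When the working directory is `src`, ".." is the repository root,
# which is what PYTHONPATH=../ adds to the search path.
print(add_to_import_path(".."))
```

Setting `PYTHONPATH` on the command line leaves the scripts untouched, which is why the command-line form is usually preferable for one-off runs.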
Technologies
Issues
Please use GitHub's issue tracker to report issues and suggestions.
Contributors
Jammily Ortigoza, Jorge Saldivar, Josué Ibarra, Laura Achón, Cristhian Parra