# Using Semantics to Understand Fake News

Code for the EMNLP 2019 TextGraphs workshop paper "Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification".
Make sure the following files are present as per the directory structure below before running the code:
```
fake_news_semantics
│   README.md
│   *.py
│
└───data
    │   balancedtest.csv
    │   fulltrain.csv
    │   test.xlsx
```
`balancedtest.csv` and `fulltrain.csv` can be obtained from https://drive.google.com/file/d/1njY42YQD5Mzsx2MKkI_DdtCk5OUKgaqq/view?usp=sharing. `test.xlsx` is the SLN dataset referenced in the paper, which can be obtained from http://victoriarubin.fims.uwo.ca/news-verification/data-to-go/. Contact me if you have trouble finding these datasets.
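For a quick sanity check after downloading, the data can be loaded along these lines. This is a minimal sketch: the `(label, text)` column layout and the header-less CSVs are assumptions, not something the repo guarantees, so adjust to your copy of the data.

```python
# Minimal loading sketch. The (label, text) layout and header-less CSVs
# are assumptions; adjust if your copy of the data differs.
import pandas as pd

train = pd.read_csv("data/fulltrain.csv", header=None, names=["label", "text"])
balanced = pd.read_csv("data/balancedtest.csv", header=None, names=["label", "text"])
sln = pd.read_excel("data/test.xlsx")  # needs xlrd (see Dependencies)

print(train["label"].value_counts())
```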
## Dependencies

- pytorch 1.0.0
- pandas
- tqdm
- xlrd (`pip install xlrd`)
- bert-pytorch (`pip install pytorch-pretrained-bert`)
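If in doubt, a quick import check confirms the environment is set up; the `pytorch_pretrained_bert` module name comes with the `pytorch-pretrained-bert` package above:

```python
# Environment sanity check for the dependencies listed above.
import torch, pandas, tqdm, xlrd
from pytorch_pretrained_bert import BertTokenizer

print("torch", torch.__version__)  # expecting 1.0.0 per the list above
```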
To train an LSTM model, run the following command:

```
python main.py --batch_size 1024 --config lstm --encoder 0 --mode 0
```

To train a CNN model, run the following command:

```
python main.py --batch_size 1024 --config cnn --encoder 1 --mode 0
```

To train a BERT model, run the following command:

```
python bert_classifier.py --batch_size 4 --max_epochs 10 --max_seq_length 500 --max_sent_length 70 --mode 0
```

To train a GCN-based model, run the following command:

```
python main.py --batch_size 32 --max_epochs 10 --config gcn --max_sent_len 50 --encoder 2 --mode 0
```

To train a GCN-based model with attention, run the following command:

```
python main.py --batch_size 32 --max_epochs 10 --config gcn_attn --max_sent_len 50 --encoder 3 --mode 0
```

To train a GATConv-based model, run the following command:

```
python main.py --batch_size 32 --max_epochs 10 --config gat --max_sent_len 50 --encoder 4 --mode 0
```
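For orientation, the core idea behind the graph encoders (`--encoder 2/3/4`) is an attention hop over LSTM-encoded sentence vectors, so every sentence can interact with every other sentence in the document. The sketch below is illustrative only; the class name and shapes are hypothetical, and this is not the actual layer from `main.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceGraphAttention(nn.Module):
    """Illustrative single-head graph attention over sentence embeddings."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.attn = nn.Linear(2 * dim, 1)  # score for each sentence pair

    def forward(self, sents):                      # sents: (num_sents, dim)
        n, d = sents.shape
        h = self.proj(sents)
        # all pairwise concatenations -> (n, n, 2d)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, d),
                           h.unsqueeze(0).expand(n, n, d)], dim=-1)
        scores = F.leaky_relu(self.attn(pairs)).squeeze(-1)  # (n, n)
        weights = torch.softmax(scores, dim=-1)    # each row sums to 1
        return weights @ h                         # updated sentence states
```

A semantic-adjacency variant would mask `scores` with a sentence-similarity matrix before the softmax, so only semantically related sentence pairs exchange information.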
To test the accuracy of a trained model on the out-of-domain test set, run the following commands.

For the LSTM model:

```
python main.py --batch_size 1024 --encoder 0 --model_file model_lstm.t7 --mode 1
```

For the CNN model:

```
python main.py --batch_size 1024 --encoder 1 --model_file model_cnn.t7 --mode 1
```

For the BERT model:

```
python bert_classifier.py --batch_size 4 --model_file model_bert.t7 --max_seq_length 500 --max_sent_length 70 --mode 1
```

For the GCN model:

```
python main.py --batch_size 32 --max_sent_len 50 --encoder 2 --model_file model_gcn.t7 --mode 1
```

For the GCN model with attention:

```
python main.py --batch_size 32 --max_sent_len 50 --encoder 3 --model_file model_gcn_attn.t7 --mode 1
```

For the GATConv model:

```
python main.py --batch_size 32 --max_sent_len 50 --encoder 4 --model_file model_gat_attn.t7 --mode 1
```
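The tables below report accuracy, precision, recall, and F1. If you recompute them outside the provided scripts, something along the lines of the sketch below works; note that scikit-learn is not a listed dependency here, and weighted averaging is an assumption about how the reported numbers are aggregated.

```python
# Illustrative only: scikit-learn is not a listed dependency of this repo,
# and "weighted" averaging is an assumption about the reported metrics.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    print(f"Acc {acc:.4f}  Prec {prec:.4f}  Recall {rec:.4f}  F1 {f1:.4f}")
```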
## Baseline Results

### Out-of-domain test set accuracy
Model | Acc | Prec | Recall | F1 |
---|---|---|---|---|
CNN | 67.5 | 67.5 | 67.5 | 67.4 |
LSTM | 81.4 | 82.2 | 81.4 | 81.3 |
BERT | 78.1 | 78.1 | 78.1 | 78.0 |
LSTM + GCN + Max Pool | 85.0 | 85.9 | 85.0 | 85.1 |
LSTM + GCN + Max Pool + Semantic Adj | 86.4 | 86.4 | 86.3 | 86.4 |
LSTM + GCN + Self Attn | 86.6 | 87.1 | 86.9 | 86.9 |
LSTM + GCN + Self Attn + Semantic Adj | 87.8 | 87.8 | 87.8 | 87.8 |
LSTM + GAT | 86.1 | 86.2 | 86.1 | 86.1 |
LSTM + GAT + Semantic Adj | 87.5 | 87.5 | 87.5 | 87.4 |
LSTM + GAT + 2 Attn Heads | 88.6 | 89.1 | 88.9 | 88.9 |
LSTM + GAT + 2 Attn Heads + Semantic Adj | 84.7 | 85.2 | 84.7 | 84.6 |
SoTA | - | 88.0 | 82.0 | - |
## Results with a dev/test split based on news sources

This might be a more realistic split.
### Two classes: Satire / Trusted

#### In-domain dev set accuracy
Model | Acc | Prec | Recall | F1 |
---|---|---|---|---|
CNN | 96.82 | 96.84 | 96.62 | 96.73 |
LSTM | 95.65 | 95.64 | 95.41 | 95.52 |
BERT | 91.72 | 92.74 | 90.56 | 91.31 |
LSTM + GCN + Max Pool | 98.08 | 98.12 | 97.89 | 98.02 |
LSTM + GCN + Max Pool + Semantic Adj | 96.77 | 97.57 | 97.85 | 97.7 |
LSTM + GCN + Attn | 98.27 | 98.05 | 98.42 | 98.22 |
LSTM + GCN + Attn + Semantic Adj | 98.17 | 98.15 | 98.06 | 98.11 |
LSTM + GAT | 98.36 | 98.44 | 98.12 | 98.29 |
LSTM + GAT + Semantic Adj | 98.25 | 98.29 | 98.09 | 98.19 |
LSTM + GAT + 2 Attn Heads | 98.44 | 98.44 | 98.34 | 98.39 |
LSTM + GAT + 2 Attn Heads + Semantic Adj | 98.02 | 98.01 | 97.9 | 97.95 |
#### Out-of-domain test set 1 accuracy
Model | Acc | Prec | Recall | F1 |
---|---|---|---|---|
CNN | 67.5 | 67.79 | 67.5 | 67.37 |
LSTM | 81.11 | 82.12 | 81.11 | 80.96 |
BERT | 75.83 | 76.62 | 75.83 | 75.65 |
LSTM + GCN + Max Pool | 85.83 | 86.16 | 85.83 | 85.8 |
LSTM + GCN + Max Pool + Semantic Adj | 83.89 | 84.73 | 83.89 | 83.79 |
LSTM + GCN + Attn | 85.27 | 85.59 | 85.27 | 85.24 |
LSTM + GCN + Attn + Semantic Adj | 85.56 | 85.57 | 85.56 | 85.55 |
LSTM + GAT | 86.39 | 86.44 | 86.38 | 86.38 |
LSTM + GAT + Semantic Adj | 85.27 | 85.31 | 85.27 | 85.27 |
LSTM + GAT + 2 Attn Heads | 84.72 | 85.65 | 84.72 | 84.62 |
LSTM + GAT + 2 Attn Heads + Semantic Adj | 86.94 | 87.04 | 86.94 | 86.94 |
SoTA | - | 88.0 | 82.0 | - |
#### Out-of-domain test set 2 accuracy
Model | Acc | Prec | Recall | F1 |
---|---|---|---|---|
CNN | 91.13 | 91.28 | 91.13 | 91.12 |
LSTM | 91.53 | 91.54 | 91.53 | 91.53 |
BERT | 83.46 | 83.56 | 83.46 | 83.45 |
LSTM + GCN + Max Pool | 92.6 | 92.61 | 92.59 | 92.59 |
LSTM + GCN + Max Pool + Semantic Adj | 89.73 | 90.57 | 89.73 | 89.68 |
LSTM + GCN + Self Attn | 91.26 | 91.99 | 91.26 | 91.22 |
LSTM + GCN + Self Attn + Semantic Adj | 92.4 | 92.53 | 92.39 | 92.39 |
LSTM + GAT | 94.2 | 94.21 | 94.2 | 94.19 |
LSTM + GAT + Semantic Adj | 92.6 | 92.69 | 92.59 | 92.59 |
LSTM + GAT + 2 Attn Heads | 89.66 | 90.37 | 89.67 | 89.62 |
LSTM + GAT + 2 Attn Heads + Semantic Adj | 92.86 | 93.06 | 92.87 | 92.86 |
### Four classes: Satire, Hoax, Propaganda, Trusted

#### In-domain dev set accuracy
Model | Acc | Prec | Recall | F1 |
---|---|---|---|---|
CNN | 96.48 | 96.41 | 96.18 | 96.28 / 96.48 |
LSTM | 88.75 | 88.67 | 88.11 | 88.35 / 88.75 |
BERT | 95.07 | 94.81 | 94.57 | 94.68 / 95.07 |
LSTM + GCN + Max Pool | 96.76 | 96.61 | 96.58 | 96.59 / 96.76 |
LSTM + GCN + Max Pool + Semantic Adj | ||||
LSTM + GCN + Attn | 97.57 | 97.25 | 97.63 | 97.43 / 97.57 |
LSTM + GCN + Attn + Semantic Adj | ||||
LSTM + GAT | 97.73 | 97.9 | 97.36 | 97.62 / 97.28 |
LSTM + GAT + Semantic Adj | ||||
LSTM + GAT + 2 Attn Heads | 97.8 | 97.69 | 97.74 | 97.71 / 97.82 |
LSTM + GAT + 2 Attn Heads + Semantic Adj | ||||
SoTA | - | - | - | 91.0 |
#### Out-of-domain test set 2 accuracy
Model | Acc | Prec | Recall | F1 |
---|---|---|---|---|
CNN | 54.03 | 54.5 | 54.03 | 52.6 / 54.03 |
LSTM | 55.06 | 58.88 | 55.06 | 52.5 / 55.05 |
BERT | 55.56 | 57.45 | 54.86 | 54.0 / 54.87 |
LSTM + GCN + Max Pool | 65.0 | 66.75 | 64.84 | 63.79 / 65.0 |
LSTM + GCN + Max Pool + Semantic Adj | ||||
LSTM + GCN + Attn | 67.08 | 68.6 | 67.0 | 66.42 / 67.08 |
LSTM + GCN + Attn + Semantic Adj | ||||
LSTM + GAT | 65.5 | 69.45 | 65.33 | 63.83 / 65.51 |
LSTM + GAT + Semantic Adj | ||||
LSTM + GAT + 2 Attn Heads | 66.94 | 68.05 | 66.86 | 66.37 / 66.95 |
LSTM + GAT + 2 Attn Heads + Semantic Adj | ||||
SoTA | - | - | - | 65.0 |
For more complete results, refer to the tables in the paper. The following results are for document classification when the model is applied outside the fake-news domain.
## Document classification

### AG News (4 news categories)
Model | Acc | Test Error Rate |
---|---|---|
GAT | 89.61 | 10.39 |
GAT + 2 Attn Heads | 89.72 | 10.28 |
SOTA | - | 5.01 |
### IMDB (2 sentiment categories)
Model | Acc | Test Error Rate |
---|---|---|
GAT | ||
GAT + 2 Attn Heads | ||
SOTA | 4.6 |
### DBPedia (14 ontology categories)
Model | Acc | Test Error Rate |
---|---|---|
GAT | 99.13 | |
GAT + 2 Attn Heads | ||
SOTA | 0.80 |
## Bibtex

If you find this work or the code useful in your research, please cite the paper:
```
@inproceedings{vaibhav-etal-2019-sentence,
title = "Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification",
author = "Vaibhav, Vaibhav and
Mandyam, Raghuram and
Hovy, Eduard",
booktitle = "Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)",
month = nov,
year = "2019",
address = "Hong Kong",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-5316",
doi = "10.18653/v1/D19-5316",
pages = "134--139",
abstract = "The rising growth of fake news and misleading information through online media outlets demands an automatic method for detecting such news articles. Of the few limited works which differentiate between trusted vs other types of news article (satire, propaganda, hoax), none of them model sentence interactions within a document. We observe an interesting pattern in the way sentences interact with each other across different kind of news articles. To capture this kind of information for long news articles, we propose a graph neural network-based model which does away with the need of feature engineering for fine grained fake news classification. Through experiments, we show that our proposed method beats strong neural baselines and achieves state-of-the-art accuracy on existing datasets. Moreover, we establish the generalizability of our model by evaluating its performance in out-of-domain scenarios. Code is available at https://github.com/MysteryVaibhav/fake{\textbackslash}{\_}news{\textbackslash}{\_}semantics.",
}
```