Models

Rumor Detection: An Industry Survey

Current approaches to real-time rumor detection leverage both spatial and temporal features. Those that are simple to implement have shown low predictive power, while those that perform well are typically too complicated to implement in practice.

**Too Sophisticated to be Practical – Propagation Path Classification (PPC) + RNN + CNN (0.92 Accuracy)**

Our Differentiation: Meaningful predictive power with ease of implementation

Sophisticated enough to leverage the propagation patterns of a tweet, semantic associations, linguistic features, and user attributes, yet easy to implement.
Sufficiently accurate even with a relatively small number of tweets.
We build on important work already being done in this space, especially that of Dr Soroush (MIT, now Dartmouth College) and his paper Automatic Detection and Verification of Rumors on Twitter.

Model	Accuracy	Precision	Recall	F1 Score
HMM	0.46	–	–	–
LSTM	0.50	–	–	–
Random Forest	0.75	0.71	0.86	0.77
Linear Regression	0.80	0.79	0.80	0.79
Gradient Boosted Classifier	0.82	0.80	0.86	0.83

The table above reports model performance on an independent hold-out set of tweets derived from the PHEME rumor dataset.
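For readers who want to see how the four columns relate, here is a minimal sketch of computing accuracy, precision, recall, and F1 from binary labels. It is not the evaluation code used for the table. The names `Confusion`, `confusion_from_labels`, and `metrics`, and the choice of `1` as the "rumor" label, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int
    fp: int
    fn: int
    tn: int


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int],
                          positive: int = 1) -> Confusion:
    """Tally confusion counts; `positive` is the label treated as 'rumor' (assumed)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def metrics(c: Confusion) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1; a zero denominator yields 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, not data from the hold-out set.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
    print(metrics(confusion_from_labels(y_true, y_pred)))
```

Because F1 is the harmonic mean of precision and recall, each F1 value in the table can be checked against the two columns to its left, up to rounding.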

Visualizing the predictive power of propagation patterns

The charts below show the features leveraged by the model and examine how the model selects features across time steps.

Predictive power of spatial and temporal features
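To make the idea of features evaluated across time steps concrete, the following is a rough sketch of how spatial (propagation depth) and temporal (retweet timing) signals might be summarized at successive cutoffs after the source tweet. The `Retweet` fields, the function `features_at_steps`, and the three chosen features are all hypothetical; the actual feature set used by the model is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Retweet:
    """One retweet in a propagation cascade (hypothetical schema)."""
    seconds_after_source: float  # temporal: delay relative to the source tweet
    follower_count: int          # user attribute of the retweeting account
    depth: int                   # spatial: hops from the source in the cascade


def features_at_steps(retweets: Sequence[Retweet], step_seconds: float,
                      n_steps: int) -> List[Dict[str, float]]:
    """Summarize the cascade as observed at each cumulative time step."""
    ordered = sorted(retweets, key=lambda r: r.seconds_after_source)
    rows: List[Dict[str, float]] = []
    for step in range(1, n_steps + 1):
        cutoff = step * step_seconds
        # Only retweets that have happened by this cutoff are visible.
        seen = [r for r in ordered if r.seconds_after_source <= cutoff]
        mean_followers = (sum(r.follower_count for r in seen) / len(seen)
                          if seen else 0.0)
        rows.append({
            "step": step,
            "retweet_count": len(seen),
            "max_depth": max((r.depth for r in seen), default=0),
            "mean_followers": mean_followers,
        })
    return rows


if __name__ == "__main__":
    # Toy cascade, not taken from the PHEME dataset.
    cascade = [Retweet(30, 100, 1), Retweet(90, 300, 2), Retweet(200, 50, 1)]
    for row in features_at_steps(cascade, step_seconds=60, n_steps=3):
        print(row)
```

Each row could then be fed to a classifier, so that predictions can be compared as more of the cascade becomes visible.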

Built to Scale

The framework that we have implemented:

A flexible API that decouples the front-end user interface logic from the back-end machine learning logic, so that models can be changed dynamically.
The API is self-contained, so it can run in parallel as a microservice and scale horizontally if required.
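One way such decoupling could look is sketched below: the front end sends only a feature payload, while a registry on the back end decides which model answers. This is an illustrative assumption, not the project's actual API. `ModelRegistry`, `handle_request`, the payload shape, and the toy predictors are all hypothetical names.

```python
from typing import Callable, Dict, List, Optional

# A predictor maps a feature vector to a rumor score (assumed interface).
Predictor = Callable[[List[float]], float]


class ModelRegistry:
    """Holds named models and tracks which one currently serves requests."""

    def __init__(self) -> None:
        self._models: Dict[str, Predictor] = {}
        self._active: Optional[str] = None

    def register(self, name: str, predictor: Predictor) -> None:
        self._models[name] = predictor

    def activate(self, name: str) -> None:
        """Switch the serving model without touching any front-end code."""
        if name not in self._models:
            raise KeyError(f"unknown model: {name}")
        self._active = name

    def predict(self, features: List[float]) -> Dict[str, object]:
        if self._active is None:
            raise RuntimeError("no model has been activated")
        score = self._models[self._active](features)
        return {"model": self._active, "rumor_score": score}


def handle_request(registry: ModelRegistry, payload: Dict[str, List[float]]) -> Dict[str, object]:
    """Request handler that keeps no per-request state, so any worker holding
    a copy of the registry can serve any request."""
    return registry.predict(payload["features"])


if __name__ == "__main__":
    registry = ModelRegistry()
    # Toy stand-ins for trained models.
    registry.register("constant", lambda features: 0.9)
    registry.register("mean", lambda features: sum(features) / len(features))
    registry.activate("mean")
    print(handle_request(registry, {"features": [0.25, 0.75]}))
    registry.activate("constant")  # swap models at run time
    print(handle_request(registry, {"features": [0.25, 0.75]}))
```

Because the handler keeps no per-request state, several copies of the service could run side by side behind a load balancer, which is the property horizontal scaling relies on.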

Dynamic Model Selection and Support for Horizontal Scaling

Additional Notes

Originally based on the MIT research paper Rumor Gauge: Predicting the Veracity of Rumors on Twitter
The paper used a hidden Markov model (HMM) to model the tweets
Dataset used by the paper
- Full access to the Twitter firehose APIs
- Approximately 640,000 total tweets, including retweets
- Similar claims grouped to analyze propagation dynamics across similar tweets
Limitations of our dataset
- No access to Twitter firehose
- 297 original tweets; approximately 60,000 total tweets, including retweets
- No grouping of similar claims
- Incomplete access to retweet and follower information for inference