Rumor Detection: An Industry Survey
Current approaches to real-time rumor detection draw on both spatial and temporal features. Those that are simple to implement have shown low predictive power, while those that perform well are typically too complex to implement for practical use.

Too Sophisticated to be Practical – Propagation Path Classification (PPC) + RNN + CNN (0.92 Accuracy)
Our Differentiation: Meaningful predictive power with ease of implementation
- Sophisticated enough to leverage propagation patterns of a tweet, semantic associations, linguistic features and user attributes yet easy to implement.
- Sufficiently accurate even with a relatively low number of tweets
- We build on important prior work in this space, especially the paper Automatic Detection and Verification of Rumors on Twitter by Dr. Soroush Vosoughi (MIT, now Dartmouth College).
| Model | Accuracy | Precision | Recall | F1 Score |
| HMM | 0.46 | – | – | – |
| LSTM | 0.50 | – | – | – |
| Random Forest | 0.75 | 0.71 | 0.86 | 0.77 |
| Linear Regression | 0.80 | 0.79 | 0.80 | 0.79 |
| Gradient Boosted Classifier | 0.82 | 0.80 | 0.86 | 0.83 |
The table above shows model performance on an independent hold-out set of tweets derived from the PHEME rumor dataset.
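For readers who want to reproduce the columns in the table above, the following is a minimal sketch of how accuracy, precision, recall and F1 are conventionally computed for a binary rumor / non-rumor label. The function name `binary_metrics`, the `BinaryMetrics` container and the toy labels are illustrative assumptions; they are not taken from the project's code or from the PHEME data.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; labels are assumed to be
    comparable values such as 1 (rumor) and 0 (non-rumor).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels, invented for this example only.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 1, 0, 1, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Because F1 is the harmonic mean of precision and recall, a model with high recall but lower precision (such as the Random Forest row) can score a lower F1 than its recall alone would suggest.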
Visualizing the predictive power of propagation patterns
The charts below show the features leveraged by the model and examine how the model's selection of features changes across time steps.
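As a rough illustration of what a per-time-step propagation feature might look like, the sketch below bins retweet timestamps into fixed-width windows after the original tweet and derives a cumulative count. The window width, the number of steps and the function names are assumptions made for this example; the source does not specify which propagation features the model actually uses.

```python
def retweet_counts_per_step(original_ts, retweet_ts, step_seconds=3600, n_steps=6):
    """Count retweets in each fixed-width window after the original tweet.

    Hypothetical feature extractor: timestamps are assumed to be seconds
    (for example Unix epoch seconds). Retweets before the original tweet
    or beyond the last window are ignored.
    """
    if step_seconds <= 0 or n_steps <= 0:
        raise ValueError("step_seconds and n_steps must be positive")
    counts = [0] * n_steps
    for ts in retweet_ts:
        offset = ts - original_ts
        if offset < 0:
            continue  # clock skew or bad data: skip
        index = int(offset // step_seconds)
        if index < n_steps:
            counts[index] += 1
    return counts


def cumulative_counts(counts):
    """Running total of retweets seen up to and including each time step."""
    totals = []
    running = 0
    for c in counts:
        running += c
        totals.append(running)
    return totals


# Invented timestamps (seconds after the original tweet at t=0).
example_retweets = [10, 3599, 3600, 7300, 30000, -5]
per_step = retweet_counts_per_step(0, example_retweets, step_seconds=3600, n_steps=3)
print(per_step, cumulative_counts(per_step))
```

Computing features per time step in this way lets a classifier be evaluated with only the first few windows of data, which is one way to probe how early a rumor can be flagged.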

Built to Scale
The framework we have implemented provides:
- A flexible API that decouples the front-end user interface logic from the back-end machine learning logic, so that models can be changed dynamically.
- A self-contained API that can run in parallel as a microservice and scale horizontally if required.
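The decoupling described above can be sketched as a small, stateless request handler that looks up a model by name, so the front end only ever sees a JSON contract. Everything here is hypothetical: the `ModelRegistry` class, the `model` and `features` request fields and the `rumor_score` response field are assumptions for illustration, not the project's actual API.

```python
import json
from typing import Callable, Dict, List

# A "model" here is any callable mapping a feature vector to a score.
Model = Callable[[List[float]], float]


class ModelRegistry:
    """Hypothetical back end: maps model names to scoring callables."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> None:
        # Re-registering a name swaps the model without touching callers.
        self._models[name] = model

    def handle(self, request_body: str) -> str:
        """Handle one JSON request and return a JSON response string."""
        try:
            request = json.loads(request_body)
            name = request["model"]
            features = request["features"]
        except (json.JSONDecodeError, KeyError, TypeError):
            return json.dumps({"error": "malformed request"})
        model = self._models.get(name)
        if model is None:
            return json.dumps({"error": "unknown model: " + str(name)})
        return json.dumps({"model": name, "rumor_score": model(features)})


# Usage with a placeholder scorer (not a real rumor model).
registry = ModelRegistry()
registry.register("mean", lambda feats: sum(feats) / len(feats))
print(registry.handle(json.dumps({"model": "mean", "features": [0.25, 0.75]})))
```

Because the handler keeps no per-request state, several copies of such a service could in principle run side by side behind a load balancer, which is the property that horizontal scaling relies on.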

Additional Notes
- Originally based on the MIT research paper Rumor Gauge: Predicting the Veracity of Rumors on Twitter
- The paper used a hidden Markov model (HMM) to model the tweets
- Dataset used by the paper
  - Full access to the Twitter firehose APIs
  - Approximately 640,000 total tweets, including retweets
  - Similar claims grouped to analyze propagation dynamics across similar tweets
- Limitations of our dataset
  - No access to the Twitter firehose
  - 297 original tweets and approximately 60,000 total tweets, including retweets
  - No grouping of similar claims
  - Inadequate access to retweet and follower information for inference
