Models

Rumor Detection: An Industry Survey

Current attempts at real-time rumor detection leverage both spatial and temporal features. The approaches that are simple to implement have turned out to have low predictive power, while the ones that work well are typically too complicated to implement for practical use.


Too Sophisticated to be Practical – Propagation Path Classification (PPC) + RNN + CNN (0.92 Accuracy)

Our Differentiation: Meaningful predictive power with ease of implementation

  • Sophisticated enough to leverage the propagation patterns of a tweet, semantic associations, linguistic features, and user attributes, yet easy to implement.
  • Sufficiently accurate even with a relatively low number of tweets.
  • We build on important work already being done in this space, especially the paper Automatic Detection and Verification of Rumors on Twitter by Dr Soroush (MIT, now Dartmouth College).

Model                        Accuracy  Precision  Recall  F1 Score
HMM                          0.46
LSTM                         0.50
Random Forest                0.75      0.71       0.86    0.77
Linear Regression            0.80      0.79       0.80    0.79
Gradient Boosted Classifier  0.82      0.80       0.86    0.83

The table above shows model performance on an independent hold-out set of tweets derived from the PHEME rumor dataset.
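
As a quick sanity check on these numbers, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values reported above. The function name `f1_score` is our own choice for illustration, and because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid dividing by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall) pairs copied from the results reported above.
# HMM and LSTM are omitted because only their accuracy is reported.
reported = {
    "Random Forest": (0.71, 0.86),
    "Linear Regression": (0.79, 0.80),
    "Gradient Boosted Classifier": (0.80, 0.86),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
```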

Visualizing the predictive power of propagation patterns

The charts below show the features leveraged by the model and examine the model's selection of features across time-steps.

Predictive power of spatial and temporal features

Built to Scale

The framework that we have implemented:

  • A flexible API that decouples the front-end user interface logic from the back-end machine learning logic, so models can be changed dynamically.
  • The API is self-contained, so it can run in parallel as a microservice and scale horizontally if required.
Dynamic Model Selection and Support for Horizontal Scaling
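
One way to picture this decoupling is a small registry that maps model names to prediction functions, so the front end only ever sends a model name and a feature payload. This is a minimal sketch under our own assumptions, not the project's actual API: the names `MODEL_REGISTRY`, `register_model`, and `handle_request`, the feature keys, and both toy scoring rules are hypothetical placeholders with no predictive meaning.

```python
from typing import Callable, Dict

# A predictor takes a dict of numeric features and returns a score in [0, 1].
Predictor = Callable[[Dict[str, float]], float]

# Hypothetical registry: model name -> prediction function.
MODEL_REGISTRY: Dict[str, Predictor] = {}


def register_model(name: str) -> Callable[[Predictor], Predictor]:
    """Decorator that adds a predictor to the registry under `name`."""
    def decorator(fn: Predictor) -> Predictor:
        MODEL_REGISTRY[name] = fn
        return fn
    return decorator


@register_model("baseline")
def baseline(features: Dict[str, float]) -> float:
    # Placeholder model: always returns an uninformative score.
    return 0.5


@register_model("retweet_ratio")
def retweet_ratio(features: Dict[str, float]) -> float:
    # Toy rule for illustration only; it stands in for a trained model.
    followers = features.get("followers", 0.0)
    if followers <= 0:
        return 0.5
    return min(1.0, features.get("retweets", 0.0) / followers)


def handle_request(request: dict) -> dict:
    """Stateless handler: look up the requested model and score the features."""
    name = request.get("model", "baseline")
    predictor = MODEL_REGISTRY.get(name)
    if predictor is None:
        return {"status": 404, "error": f"unknown model: {name}"}
    score = predictor(request.get("features", {}))
    return {"status": 200, "model": name, "score": score}


print(handle_request({"model": "retweet_ratio",
                      "features": {"retweets": 50, "followers": 200}}))
```

Because the handler keeps no per-request state, several copies of it could in principle run side by side behind a load balancer, and a new model can be added by registering one more function without touching the front end.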

Additional Notes

  • Originally based on the MIT research paper Rumor Gauge: Predicting the Veracity of Rumors on Twitter
  • The paper used a hidden Markov model (HMM) to model the tweets
  • Dataset used by the paper
    • Full access to the Twitter firehose APIs
    • Approximately 640,000 total tweets, including retweets
    • Similar claims grouped to analyze propagation dynamics across similar tweets
  • Limitations of our dataset
    • No access to Twitter firehose
    • 297 original tweets; approximately 60,000 total tweets, including retweets
    • No grouping of similar claims
    • Incomplete access to retweet and follower information for inference
