Feature Engineering

Propagation Attribution

Our method relies on network propagation dynamics determined by retweet attribution.

Every retweet ultimately traces back to the original claim (left), but we attribute each retweet to the earliest tweet posted by a user whom the retweeting user follows (right).
For example, here we attribute the third tweet to the second because the third tweet’s user doesn’t follow the first tweet’s user, but does follow the second tweet’s user.
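
The attribution rule above can be sketched in a few lines of Python. This is only an illustration, not the implementation used here: the `Tweet` record, the `follows` mapping (user to the set of users they follow), and the `attribute_retweets` function are hypothetical names. It also assumes that a retweet whose user follows none of the earlier posters falls back to the original claim, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: str
    user: str
    timestamp: float  # any monotonically increasing time unit


def attribute_retweets(tweets, follows):
    """Map each tweet id to the id of the tweet it is attributed to.

    `follows` maps a user to the set of users they follow (hypothetical input).
    The original claim is taken to be the earliest tweet and has parent None.
    """
    ordered = sorted(tweets, key=lambda t: t.timestamp)
    root = ordered[0]
    parents = {root.tweet_id: None}
    for i, tweet in enumerate(ordered[1:], start=1):
        followed = follows.get(tweet.user, set())
        # Earliest earlier tweet whose author this user follows.
        # Assumption: fall back to the original claim if there is none.
        parent = next((e for e in ordered[:i] if e.user in followed), root)
        parents[tweet.tweet_id] = parent.tweet_id
    return parents


# The three-tweet example: the third user follows only the second user.
example = [Tweet("t1", "a", 0.0), Tweet("t2", "b", 1.0), Tweet("t3", "c", 2.0)]
example_parents = attribute_retweets(example, {"b": {"a"}, "c": {"b"}})
```

Here `example_parents` attributes `t3` to `t2` rather than to the original claim `t1`, matching the example in the text.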

Feature Generation

Our tweet propagation time series data set is created in two steps.

  1. Build an individual tweet data set describing the original tweet with the claim being evaluated and each of its retweets.
  2. Build a time series by aggregating all individual tweets up to the hour being sampled and adding propagation features.

Individual Tweet Data Set

For the original tweet and each of its retweets, we build a number of diffusion, user, tweet, and linguistic features.

Propagation Time Series

At each sampling point (t) we build our propagation time series with

  • aggregates of tweet level features across the tweets thus far,
  • values of these features for the most recent tweet,
  • differences between current and past values (e.g. number of new tweets),
  • diffusion tree shape summaries (e.g. total number of parents).
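
As a rough illustration of the four kinds of features listed above, the sketch below builds one row per hourly sampling point. The details are assumptions made for the example and are not taken from the actual pipeline: tweets are plain dicts with hypothetical keys `tweet_id`, `hour` (hours since the original claim), and `followers` (a stand-in for any tweet-level feature); `parents` is a hypothetical mapping from tweet id to attributed parent id (None for the original claim); and "number of parents" is read as the number of distinct tweets with at least one attributed child.

```python
def build_time_series(tweets, parents, num_hours):
    """Return one feature row per sampling hour t = 1..num_hours."""
    ordered = sorted(tweets, key=lambda tw: tw["hour"])
    rows = []
    prev_count = 0
    for t in range(1, num_hours + 1):
        # All tweets posted up to the hour being sampled.
        seen = [tw for tw in ordered if tw["hour"] <= t]
        seen_ids = {tw["tweet_id"] for tw in seen}
        # Distinct tweets that are the attributed parent of a seen tweet.
        parent_ids = {parents[i] for i in seen_ids if parents.get(i) is not None}
        followers = [tw["followers"] for tw in seen]
        rows.append({
            "hour": t,
            # Aggregates across the tweets thus far.
            "tweet_count": len(seen),
            "mean_followers": sum(followers) / len(followers) if followers else 0.0,
            # Value of the feature for the most recent tweet.
            "latest_followers": seen[-1]["followers"] if seen else 0,
            # Difference between current and past values.
            "new_tweets": len(seen) - prev_count,
            # Diffusion tree shape summary.
            "num_parents": len(parent_ids),
        })
        prev_count = len(seen)
    return rows


sample_tweets = [
    {"tweet_id": "t1", "hour": 0.0, "followers": 100},
    {"tweet_id": "t2", "hour": 0.5, "followers": 50},
    {"tweet_id": "t3", "hour": 1.5, "followers": 30},
]
sample_parents = {"t1": None, "t2": "t1", "t3": "t2"}
series = build_time_series(sample_tweets, sample_parents, num_hours=2)
```

Rescanning all tweets at every sampling point keeps the sketch short; a running aggregate would be the natural choice for long propagation windows.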