Charlie Hebdo Online Discussion

In the wake of the Charlie Hebdo killings, many discussion threads emerged on Twitter, and with them spread rumours and other information related to the event. The data shown here is part of PHEME, a larger rumour detection and veracity dataset that contains online discussions collected from Twitter in response to 9 events.

Our work is intended for data exploration: the visualisations below allow the various tweet trees to be explored along with other meta-information pertaining to the event.

Original Dataset

The original event data consists of nested folders. Each top-level folder contains the annotations and the structure of the discussion, both stored as JSON files. The source tweet is stored in another folder as a JSON file. Reactions are stored in a further folder that holds one folder per reaction tweet in the discussion, each containing a JSON file with the data for that tweet. The figure below shows the file structure.

For some tweet trees, the structure JSON file was either missing or incomplete, so we decided to reconstruct the tweet graph structure from the tweet data ourselves. To do so, we first add all the tweet IDs in the tree folder to a list and sort them in increasing order. We then iterate through the tweet IDs, extract each tweet's parent tweet ID from its JSON data, and check that the parent tweet ID is present in the tweet ID list. If it is not, we remove the tweet from the list. This prevents the graph from having disjoint sub-graphs. Processing the IDs in sorted order works because Twitter assigns tweet IDs in increasing order with respect to time: tweets with smaller IDs were created earlier than tweets with larger IDs, so a parent is always checked before its replies.
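The pruning step described above can be sketched as follows. This is a minimal illustration, not the original preprocessing code: the function name `prune_orphans` and the in-memory `parent_of` mapping are assumptions made for the example. In practice the parent ID would be read from each tweet's JSON (in Twitter's tweet objects this is the `in_reply_to_status_id` field).

```python
def prune_orphans(parent_of, root_id):
    """Keep only tweets that are connected to the root.

    parent_of: dict mapping tweet ID -> parent tweet ID (None for the root).
    Returns the set of kept tweet IDs and the list of (parent, child) edges.
    """
    kept = set()
    edges = []
    # Sorted order means a parent is visited before any of its replies,
    # so dropping a tweet also drops everything that replied to it.
    for tweet_id in sorted(parent_of):
        parent_id = parent_of[tweet_id]
        if tweet_id == root_id:
            kept.add(tweet_id)
        elif parent_id in kept:
            kept.add(tweet_id)
            edges.append((parent_id, tweet_id))
    return kept, edges


# Made-up IDs for illustration: tweet 102 replies to a tweet (999)
# that is missing from the folder, so 102 and its reply 103 are dropped.
parent_of = {100: None, 101: 100, 102: 999, 103: 102, 104: 101}
kept, edges = prune_orphans(parent_of, root_id=100)
print(sorted(kept))  # [100, 101, 104]
print(edges)         # [(100, 101), (101, 104)]
```

The resulting edge list describes a single connected tree rooted at the source tweet.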

User Influence

For this event, we want to see who has the most influence in the discussion, so we use a packing chart to visualise the influence of individual users. We first define user influence as the number of reactions from other users, either in direct response to the user's root tweet or in response to other users' reactions discussing the content generated by that root.

In the interest of brevity, we show only the 10 most influential users. The first level of the packing chart shows these 10 users; clicking on a circle zooms in to a second level of nested circles, which represent the 10 other users most influenced by that user.
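One possible reading of this definition is sketched below. It assumes each thread is given as the root author plus the authors of all reactions in that thread; the function name `user_influence`, the input layout, and the user names are hypothetical and chosen only for the example.

```python
from collections import Counter, defaultdict


def user_influence(threads):
    """threads: iterable of (root_user, [reaction_user, ...]) pairs.

    Returns (influence, influenced):
      influence[u]      = number of reactions by other users in u's threads
      influenced[u][v]  = how many of those reactions came from user v
    """
    influence = Counter()
    influenced = defaultdict(Counter)
    for root_user, reaction_users in threads:
        for user in reaction_users:
            if user != root_user:  # a user's own replies do not count
                influence[root_user] += 1
                influenced[root_user][user] += 1
    return influence, influenced


# Made-up threads for illustration.
threads = [
    ("alice", ["bob", "carol", "bob", "alice"]),
    ("alice", ["dave"]),
    ("bob", ["alice"]),
]
influence, influenced = user_influence(threads)

# First level of the chart: the 10 most influential users.
top_users = influence.most_common(10)
# Second level: for each of them, the 10 users who reacted most.
nested = {user: influenced[user].most_common(10) for user, _ in top_users}
print(top_users)  # [('alice', 4), ('bob', 1)]
```

The two `most_common(10)` calls correspond to the two levels of nested circles.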

User Activity Distribution

Besides user influence, we are also interested in user activity in aggregate. To quantify user activity, we look at 3 variables:

  1. Tweet counts: The number of tweets made by the user;
  2. Link counts: The number of direct user-to-user interactions;
  3. Thread counts: The number of discussion threads which the user participates in.
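The three variables can be computed in a single pass over the tweets, as in the sketch below. The record layout (`user`, `thread`, `parent_user`) is an assumption made for the example, and a link is counted here for the replying user whenever a tweet replies to a different user's tweet; other ways of counting interactions are possible.

```python
from collections import Counter, defaultdict


def user_activity(tweets):
    """tweets: iterable of dicts with keys 'user', 'thread', 'parent_user'."""
    tweet_counts = Counter()
    link_counts = Counter()
    threads_of = defaultdict(set)
    for tweet in tweets:
        user = tweet["user"]
        tweet_counts[user] += 1
        threads_of[user].add(tweet["thread"])
        parent_user = tweet.get("parent_user")
        # Count a link only for replies to a different user.
        if parent_user is not None and parent_user != user:
            link_counts[user] += 1
    thread_counts = {user: len(ids) for user, ids in threads_of.items()}
    return tweet_counts, link_counts, thread_counts


# Made-up records for illustration.
tweets = [
    {"user": "alice", "thread": 1, "parent_user": None},
    {"user": "bob", "thread": 1, "parent_user": "alice"},
    {"user": "alice", "thread": 1, "parent_user": "bob"},
    {"user": "bob", "thread": 2, "parent_user": "carol"},
    {"user": "carol", "thread": 2, "parent_user": None},
]
tweet_counts, link_counts, thread_counts = user_activity(tweets)
print(tweet_counts["bob"], link_counts["bob"], thread_counts["bob"])  # 2 2 2
```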

We visualise this data in a stacked histogram to show the distribution of overall user activity. A stacked histogram also lets us see whether the 3 variables agree with each other. In general, we observe that most users are not very active, having only 1 or 2 tweets. In addition, users are generally active in only 1 or 2 threads and have few links from their tweets to other tweets. Some users are more active and garner more attention, but these are exceptional cases and, because there are so few of them, they are barely visible in the histogram. The number of bins can be adjusted via the input box to get different bin sizes.
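To illustrate how the number of bins affects what is visible, the sketch below bins the three variables over a shared range using equal-width bins. This is an assumption about the binning rule rather than a description of the chart's actual code, and the function names and sample values are hypothetical.

```python
def bin_counts(values, num_bins, lo, hi):
    """Count values into num_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / num_bins or 1  # avoid zero width when hi == lo
    counts = [0] * num_bins
    for value in values:
        # The maximum value falls into the last bin rather than past it.
        index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


def stacked_histogram(series, num_bins):
    """series: dict of variable name -> list of per-user values.

    All variables share the same bins so their bars can be stacked.
    """
    all_values = [v for values in series.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    return {name: bin_counts(values, num_bins, lo, hi)
            for name, values in series.items()}


# Made-up per-user values: most users are barely active, one is an outlier.
series = {
    "tweets": [1, 1, 2, 10],
    "links": [0, 1, 1, 4],
    "threads": [1, 1, 1, 2],
}
hist = stacked_histogram(series, num_bins=5)
print(hist["tweets"])  # [2, 1, 0, 0, 1]
```

With few bins, almost all users land in the first bin and the outlier occupies a nearly empty bin at the far end, which is why the exceptional users are hard to spot.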

Tweet Trees

Finally, we want to look at the structures of the various tweet trees to see whether there are differences in the types of conversations happening. For example, the default tree displayed has a root tweet with a relatively high out-degree, while its children have relatively low out-degree. There are also trees, such as 553174338380517376, in which a child also has a high out-degree, in essence making that child another major contributor to the spread of the root content, as that user is also highly connected. Other trees, such as 552812984343330816, have a root tweet with few responses but a child that generated more reactions.
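The comparison between the root's out-degree and that of its replies can be sketched in plain Python as follows. The edge list and node names are made up, and the helper names `out_degrees` and `busiest_reply` are hypothetical; for a directed NetworkX graph, the same numbers are available through `DiGraph.out_degree`.

```python
from collections import Counter


def out_degrees(edges):
    """edges: iterable of (parent, child) pairs. Returns node -> out-degree."""
    degree = Counter()
    for parent, child in edges:
        degree[parent] += 1
        degree.setdefault(child, 0)  # leaves appear with out-degree 0
    return degree


def busiest_reply(root, edges):
    """Return (node, out_degree) for the non-root tweet with most replies."""
    degree = out_degrees(edges)
    replies = [node for node in degree if node != root]
    if not replies:
        return None, 0
    busiest = max(replies, key=lambda node: degree[node])
    return busiest, degree[busiest]


# Made-up tree: the root has 2 replies, but reply "a" attracts 3 of its own.
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("a", "d"), ("a", "e")]
degree = out_degrees(edges)
print(degree["root"])                # 2
print(busiest_reply("root", edges))  # ('a', 3)
```

A tree where the busiest reply has a higher out-degree than the root corresponds to the last kind of conversation described above.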

To explore the different trees, use the next and back buttons below to browse the trees sequentially, or type a thread ID into the input field to select a specific tree. Hover over individual nodes to see tweet details. Trees are ordered by decreasing size.
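The browsing behaviour can be modelled with a small helper, sketched below under a few assumptions: tree size is taken to be the number of tweets in the tree, ties are broken by thread ID, and the buttons stop at either end of the list. The class name `TreeBrowser` and the thread IDs are hypothetical.

```python
def order_by_size(tree_sizes):
    """tree_sizes: dict of thread ID -> size. Largest trees come first."""
    return sorted(tree_sizes, key=lambda tid: (-tree_sizes[tid], tid))


class TreeBrowser:
    def __init__(self, tree_sizes):
        self.order = order_by_size(tree_sizes)
        self.position = 0  # start at the largest tree

    def current(self):
        return self.order[self.position]

    def next(self):
        self.position = min(self.position + 1, len(self.order) - 1)
        return self.current()

    def back(self):
        self.position = max(self.position - 1, 0)
        return self.current()

    def select(self, thread_id):
        # Raises ValueError if the thread ID is unknown.
        self.position = self.order.index(thread_id)
        return self.current()


# Made-up thread IDs and sizes for illustration.
browser = TreeBrowser({"t1": 5, "t2": 12, "t3": 5})
print(browser.current())  # t2
print(browser.next())     # t1
```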

User Threads and Tweets

The table below lists the thread and tweet IDs associated with each user ID. Use it to search for a specific user, then copy and paste a thread ID into the search bar above to view that tree.
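A lookup table of this kind can be built by grouping tweets by user and then by thread. The sketch below assumes the tweets are available as `(user_id, thread_id, tweet_id)` tuples; the function names and IDs are hypothetical.

```python
from collections import defaultdict


def build_user_index(tweets):
    """tweets: iterable of (user_id, thread_id, tweet_id) tuples.

    Returns user_id -> thread_id -> list of tweet IDs.
    """
    index = defaultdict(lambda: defaultdict(list))
    for user_id, thread_id, tweet_id in tweets:
        index[user_id][thread_id].append(tweet_id)
    return index


def rows_for_user(index, user_id):
    """Return table rows (user_id, thread_id, tweet_ids) for one user."""
    threads = index.get(user_id, {})
    return [(user_id, thread_id, sorted(tweet_ids))
            for thread_id, tweet_ids in sorted(threads.items())]


# Made-up IDs for illustration.
tweets = [(7, 100, 100), (8, 100, 101), (7, 100, 102), (7, 200, 201)]
index = build_user_index(tweets)
print(rows_for_user(index, 7))  # [(7, 100, [100, 102]), (7, 200, [201])]
```

Searching for a user then amounts to reading off that user's rows and picking one of the thread IDs.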

Technology

The first two visualisations were built with d3.js, while the tree visualisations were built with the Python library NetworkX. All data preprocessing was done in Python.