
Discovering the Twitter Botnet

In my last blog post, I discussed our data preparation and collection. In this blog post I will cover 1- a brief summary of some of our preliminary findings and 2- the discovery of the botnet in our dataset.

To recap my last two blog posts: we first collected tweets from Twitter to analyze the Syrian civil war. We selected 3 violent and 3 nonviolent events, and then conducted 2 different kinds of analyses on the retweeted tweets: log analysis (content influence, based on the content of the most retweeted tweets) and network analysis (account influence, based on the network diagram). In the last step, we compared the top retweeted accounts (Twitter handles) from the log analysis and the network analysis, and then conducted a comparative analysis of the top retweeted accounts across the different event types (3 violent and 3 nonviolent events).

The results from these 2 different analyses were:

1- In the nonviolent events data set, people were not tweeting about the salient events we had selected. For example, Angelina Jolie’s visit to the Syrian refugee camp in Jordan on September 11, 2012, wasn’t discussed in the tweets; instead, people were tweeting about war-related issues (e.g., chemical bombs) and comparing 9/11 with the Syrian civil war.

2- From the salient violent events, we picked the Houla Massacre, which occurred on 5/25/2012, and compared the authors of the most retweeted tweets in the Log Analysis with the top retweeted accounts in the Network Analysis (we identified these by node size, a.k.a. node centrality). The results showed that the two sets were completely different (top retweeted authors ≠ top retweeted nodes).

3- To better understand our results, we compared our findings with the Influence Matrix (source: Klout.com). We found that we were interested in 3 different types of Twitter users: Curators, Celebrities, and Activists.
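Finding 2 (top retweeted authors ≠ top retweeted nodes) is less surprising than it sounds, because the two analyses rank different things: the log analysis ranks individual posts, while node size in the network reflects all the retweets an account receives. The minimal Python sketch below illustrates this with made-up toy data; the (retweeter, original_author, post_id) tuple layout is an assumption for illustration, and the sketch ignores the edge cut applied to the real network graphs.

```python
from collections import Counter

def top_post_authors(retweets, k):
    """Log-analysis view: authors of the k most retweeted posts."""
    post_counts = Counter(post_id for _, _, post_id in retweets)
    post_author = {post_id: author for _, author, post_id in retweets}
    return {post_author[p] for p, _ in post_counts.most_common(k)}

def top_accounts(retweets, k):
    """Network-analysis view: accounts that receive the most retweets in total."""
    account_counts = Counter(author for _, author, _ in retweets)
    return {a for a, _ in account_counts.most_common(k)}

# Hypothetical toy data: (retweeter, original_author, post_id).
# @a has one viral post; @b has three moderately retweeted posts.
retweets = [(f"u{i}", "@a", "p1") for i in range(5)]
retweets += [(f"v{i}", "@b", p) for p in ("p2", "p3", "p4") for i in range(3)]

print(top_post_authors(retweets, 1))  # {'@a'}
print(top_accounts(retweets, 1))      # {'@b'}
```

The single viral post wins the log-analysis ranking, while the account with steadier output wins the network ranking.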


We were curious to know whether we could find any celebrity type in the data set, that is, someone with both high content influence and high account influence. So we compared the top retweeted nodes against the entire log analysis data set (450 posts), searching for overlapping accounts. We found one such user account: @g1.
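The overlap search itself is a set intersection. A minimal sketch, assuming each log-analysis post is a record with a hypothetical `author` field (the records below are made up):

```python
def overlapping_accounts(top_nodes, log_posts):
    """Accounts that are both top RT nodes and authors of posts in the log sample."""
    log_authors = {post["author"] for post in log_posts}
    return sorted(set(top_nodes) & log_authors)

# Hypothetical records; the real log sample held 450 posts per event.
top_nodes = ["@g1", "@x1", "@x2"]
log_posts = [
    {"author": "@g1", "retweets": 40},
    {"author": "@y1", "retweets": 35},
]
print(overlapping_accounts(top_nodes, log_posts))  # ['@g1']
```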

We wanted to learn more about this user’s attributes; however, the account was suspended. Therefore, we started searching the Internet for the name associated with the account, in both English and Arabic. We found some interesting information, but none of it was related to the war. We suspected that this person might be the human user behind @g1. However, she did not have much of an online presence, which made us doubt that she was the one running the account (at that time we started suspecting that we might be dealing with a fake account of a celebrity).

In the network graph, @g1 was clustered with 19 other users, 17 of whom were suspended. Wondering what might be the reason behind this large number of account suspensions, we started following @g1 across different events in the data set.

Content Analysis

To better understand why the @g1 account might have been suspended, we conducted a high-level content analysis of her tweets archived from April to December 2012. We found that the account had stopped posting (and therefore had presumably been suspended) on November 20, 2012. The content analysis also showed that most of the tweets were highly political, so their content alone did not seem to be the reason for the suspension by Twitter.

From there, we started conducting the same analyses on the accounts clustered with @g1 across all six events. As a result, we identified 42 Twitter handles that had stopped posting on November 20, 2012. Interestingly, we found that the majority of these accounts had been suspended on that same date. Moreover, all of their last tweets were posted around 6:30 AM UTC, indicating a systematic ban. Lastly, we discovered that they all shared the same last tweet.
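Spotting accounts that went silent together boils down to taking each account's last archived tweet and checking whether it falls close to a common cutoff. This sketch assumes timezone-aware (account, timestamp) pairs; the ±30-minute tolerance and the toy data are illustrative, not values from the study. It only works because the archive continues past the cutoff (our collection ran until December 2012), so an account that kept posting shows a later last tweet.

```python
from datetime import datetime, timedelta, timezone

def last_tweet_times(tweets):
    """Map each account to the timestamp of its last archived tweet."""
    last = {}
    for account, ts in tweets:
        if account not in last or ts > last[account]:
            last[account] = ts
    return last

def stopped_together(tweets, center, tolerance=timedelta(minutes=30)):
    """Accounts whose last archived tweet falls within +/- tolerance of center."""
    return sorted(
        account
        for account, ts in last_tweet_times(tweets).items()
        if abs(ts - center) <= tolerance
    )

utc = timezone.utc
cutoff = datetime(2012, 11, 20, 6, 30, tzinfo=utc)
tweets = [  # hypothetical accounts and times
    ("@g1", datetime(2012, 11, 19, 10, 0, tzinfo=utc)),
    ("@g1", datetime(2012, 11, 20, 6, 28, tzinfo=utc)),
    ("@b2", datetime(2012, 11, 20, 6, 31, tzinfo=utc)),
    ("@human", datetime(2012, 11, 20, 6, 30, tzinfo=utc)),
    ("@human", datetime(2012, 12, 1, 12, 0, tzinfo=utc)),  # kept posting
]
print(stopped_together(tweets, cutoff))  # ['@b2', '@g1']
```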

Additional analyses of the data set revealed:

  1. 21 additional accounts that had stopped posting at that time (63 accounts in total).
  2. All of the accounts were retweeting (specifically with RT) one unique account: @h1.
  3. All shared the same last retweet content.
  4. All stopped tweeting at almost the same time, around 6:30 AM UTC on November 20, 2012.
  5. Each account was tweeting continuously around the clock.
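Round-the-clock activity (item 5) can be checked crudely by counting how many distinct hours of the day an account posts in. This is a hedged sketch: the 22-hour threshold is an arbitrary illustrative value, and the check should be applied to a short window such as one event's 3-day data set, since over several months even a human account may touch every hour.

```python
from datetime import datetime, timedelta, timezone

def active_hours(timestamps):
    """Distinct UTC hours of the day (0-23) with at least one tweet.

    Timestamps must be timezone-aware; naive datetimes would be
    interpreted as local time by astimezone().
    """
    return {ts.astimezone(timezone.utc).hour for ts in timestamps}

def looks_round_the_clock(timestamps, min_hours=22):
    # Most human accounts show a multi-hour daily gap (sleep). The 22-hour
    # threshold is an arbitrary illustrative value, not one from the study.
    return len(active_hours(timestamps)) >= min_hours

start = datetime(2012, 11, 1, tzinfo=timezone.utc)
bot = [start + timedelta(minutes=50 * i) for i in range(3 * 24)]  # every 50 min, ~2.5 days
human = [start + timedelta(hours=h) for h in range(8, 21)]         # 08:00-20:00 only
print(looks_round_the_clock(bot), looks_round_the_clock(human))    # True False
```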

Why is this network a botnet?

What made us suspect that this might be a botnet were the following indicators:

  1. The links attached to tweets
  2. The links attached to RT
  3. The frequency of tweeting
  4. Tweet text (the random 3-letter hashtag)

An example is this tweet: “RT @h1: #سوريا #Syria لوهان ستمثّل في أغنية مصوّرة لليدي غاغا http://t.co/uv2e3OGV #xmy” (English translation: RT @h1: Lindsay Lohan to appear on Lady Gaga’s next music video #Syria #سوريا http://t.co/uv2e3OGV #xmy).

When we searched for the sentence “Lindsay Lohan to appear on Lady Gaga’s next music video” in Arabic, we found a news headline with the exact text on the website http://www.elnashrafan.com. However, clicking the link in the tweet redirected us to http://alwatan.sy.

Another example is: “#سوريا #Syria بدء امتحانات الفصل الثاني للمرحلة الجامعية الأولى في جامعة #دمشق http://t.co/OTUpaarW #dmq” (English translation: Second-semester exams begin for undergraduate students at the University of #Damascus #Syria #سوريا http://t.co/OTUpaarW #dmq).

The botnet was using a random 3-letter hashtag in all its tweets (e.g., #xmy, #dmq). Why they were adding this hashtag is something we still don’t know; we assume that it was their tracking method or a technique for testing reach.
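Flagging this trailing marker can be done with a regular expression. The sketch below assumes the marker is always a lowercase three-letter Latin hashtag at the very end of the tweet, preceded by whitespace, as in the two examples above; real tweets may vary, so treat the pattern as a starting point rather than a validated detector.

```python
import re

# Assumption: a lowercase 3-letter Latin hashtag at the very end of the tweet,
# preceded by whitespace or the start of the text (as in #xmy and #dmq).
TRAILING_MARKER = re.compile(r"(?<!\S)#([a-z]{3})\s*$")

def trailing_marker(text):
    """Return the 3-letter marker at the end of a tweet, or None."""
    match = TRAILING_MARKER.search(text)
    return match.group(1) if match else None

def marker_share(texts):
    """Fraction of an account's tweets that end with such a marker."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(trailing_marker(t) is not None for t in texts) / len(texts)

examples = [
    "RT @h1: #سوريا #Syria لوهان ستمثّل في أغنية مصوّرة لليدي غاغا http://t.co/uv2e3OGV #xmy",
    "#سوريا #Syria بدء امتحانات الفصل الثاني للمرحلة الجامعية الأولى في جامعة #دمشق http://t.co/OTUpaarW #dmq",
    "An ordinary tweet about #Syria",  # hypothetical non-bot tweet
]
print([trailing_marker(t) for t in examples])  # ['xmy', 'dmq', None]
```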

Lastly, clicking on the link embedded in the #dmq tweet redirects to an article on a news website, an independent Arabic news forum.

These are just two examples of many similar incidents. Most of the tweets that we randomly tested led to one of three websites.
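Once the t.co links have been followed to their final destinations (that step needs live HTTP requests and is not shown), tallying the landing sites is a simple count. A sketch using only the Python standard library; the resolved URLs and their paths are hypothetical:

```python
from collections import Counter
from urllib.parse import urlsplit

def landing_domains(resolved_urls):
    """Count landing domains for links whose redirects were already followed."""
    counts = Counter()
    for url in resolved_urls:
        host = urlsplit(url).hostname or ""  # hostname is returned lowercased
        if host.startswith("www."):
            host = host[len("www."):]
        counts[host] += 1
    return counts

# Hypothetical resolved URLs (the paths are made up for illustration).
resolved = [
    "http://alwatan.sy/article-a",
    "http://www.alwatan.sy/article-b",
    "http://news.example.org/story",
]
print(landing_domains(resolved).most_common(1))  # [('alwatan.sy', 2)]
```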

Currently, we are still conducting content and network analyses to understand this botnet’s behavior and the motives behind its creation. One thing we are fairly confident about is that the botnet’s tweets were all in support of Al-Assad’s government and that the botnet was followed by real people who also support the current Syrian regime. We asked ourselves: was this Twitter botnet created, at a time when the majority of tweets on the Syrian civil war were against the regime, to influence public opinion and amplify the voices of pro-regime people?

In the meantime, stay tuned for further results of this project.

*The following and follower data were collected on March 18, 2013, not on the dates of the events. For the top RT nodes, we only used data for 3 accounts because 17 accounts were suspended.

** This project is in collaboration with Daisy Yoo and David McDonald from the iSchool at the University of Washington. Please don’t copy the content without first contacting the blog admin.

***The Twitter handles used in this post are not real; they are pseudonyms created by the team.

[1]http://www.elcinema.com/person/pr1104200/

How did Angelina Jolie help us discover the botnets? (Cont.)

In my previous post, I started discussing our latest project on the role of Twitter in the Syrian civil war, where I talked a little bit about the research objective, hypotheses, and new directions. The new research direction was a result of the botnets that we started finding in our network analysis.

In this blog post, I want to talk about two important phases of the study: first, the data preparation and collection phase, and second, the data analysis phase. In my next blog post I will start talking about the results of the data analysis.

Data preparation and collection

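Since April 2012, we have been archiving Twitter posts that include the following hashtags: #Syria, #Damascus, #Aleppo, #Hama, #Idlib, #Homs, #سوريا, #سورية, #دمشق, #حلب, #حماه, #ادلب, and #حمص (the Arabic equivalents of Syria, Damascus, Aleppo, Hama, Idlib, and Homs; سوريا and سورية are two spellings of Syria).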

As I mentioned in the previous blog post, we wanted to know how retweeting on the day of a violent event might differ (if at all) from retweeting on the day of a non-violent event. Therefore, to scope the analyses, we purposefully selected three dates for each event type. First, we browsed timelines of the Syrian civil war on many news portals to choose three violent events:

  1.  Houla Massacre on May 25, 2012
    The UN human rights office reported that at least 190 civilians were killed, including 34 women and 49 children, in Houla, Homs province.
  2. Hama Massacre on June 6, 2012
    78 civilians were executed in a massacre by the Syrian army and Shabiha in the small village of Qubair, part of Maarzaf, Hama province. Over 140 people were killed across Syria, including in the Qubair and Maarzaf massacres.
  3. Damascus Massacre on December 12, 2012
    Three bombs exploded outside the Interior Ministry building in Damascus, killing five and injuring at least 23 people. The Local Coordination Committees (LCC) reported 113 civilians killed by the Syrian army, including 41 in Aleppo and 31 in the Damascus suburbs.
Violent events (graph credits to Daisy Yoo)

Next, we browsed headlines from the Middle East section of the BBC News website to choose three non-violent events:

1. Syria’s Olympics chief denied visa for London Games on June 22, 2012
The head of the Syrian Olympic Committee, General Mowaffak Joumaa, has been refused a visa to travel to London for the Games [1].

2. Angelina Jolie visits Syrian refugees in Jordan on September 11, 2012
Actress Angelina Jolie has called for an end to the violence in Syria after meeting refugees in Jordan’s Zaatari camp [2].

3. Christian elected as head of Syrian National Council on November 10, 2012.
The Syrian National Council, the main political opposition to Bashar Al-Assad’s regime, defied accusations of being Islamist-led by electing a Christian opposition activist George Sabra to the presidency on Friday night [3].

Non-violent events (collage credits to Daisy Yoo)

After we selected the salient events, the next step was the data analysis phase. Considering the lag time in the spread of news, we collected a data set for each event over a 3-day period: from 00:00 UTC on the day before the event to 23:59 UTC on the day after. Please note that we used UTC instead of local time. Later in the study, we realized that this might be problematic in terms of sampling validity. However, the difference between Syrian local time and UTC is only 2 hours (UTC/GMT+2), and because we allowed enough lag time (±24 hours), we expect the effect to be insignificant. Another thing we realized after discovering the botnets was that the ±24-hour margin gave us a better understanding of the bots’ behavior.
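The collection window can be pinned down in code. This sketch treats the window as half-open, from 00:00 UTC on the day before the event up to, but not including, 00:00 UTC two days after it, which is equivalent to ending at 23:59:59 UTC on the day after; the function names are illustrative.

```python
from datetime import date, datetime, time, timedelta, timezone

def collection_window(event_day):
    """UTC bounds [start, end) of the 3-day window around an event date.

    start is 00:00 UTC on the day before the event; end is exclusive
    (00:00 UTC two days after the event), so 23:59:59 on the day after
    the event is still inside the window.
    """
    start = datetime.combine(event_day - timedelta(days=1), time(0, 0), tzinfo=timezone.utc)
    return start, start + timedelta(days=3)

def in_window(ts, event_day):
    """True if the timezone-aware timestamp ts falls inside the event's window."""
    start, end = collection_window(event_day)
    return start <= ts < end

houla = date(2012, 5, 25)
start, end = collection_window(houla)
print(start.isoformat(), end.isoformat())
# 2012-05-24T00:00:00+00:00 2012-05-27T00:00:00+00:00
```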

Data Analyses

Our data analysis phase included the following:

a) Network analysis: For each event, we generated a graph of the RT network. For legibility, we applied an edge cut at a minimum of 3 retweets (edges representing fewer than 3 retweets were removed from the network analysis). Through this analysis, we measured account influence. Using the graphs, we examined clustering patterns and identified high-profile users based on node size and edge density.

b) Log analysis: We generated a log data set of the 150 most retweeted posts per day, i.e., a total of 450 most retweeted posts for each event (3 days × 150). Through this analysis, we measured content influence. We identified high-profile users based on the number of retweets they received during the event period.

c) High-profile user analysis: We compared sets of high-profile users between the network analysis and the log analysis. This helped us to identify distinct types of influencers, which we will share in the findings.

d) Cluster analysis: From the network analyses, we identified a unique cluster and monitored its trend across the timeline of events. This helped us understand how a network might evolve over time to increase its influence in a microblogging space.
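Steps a), b), and d) can be sketched in a few functions. This is an illustration under assumptions, not our actual pipeline: retweets are assumed to be available as simple tuples, the edge cut of 3 and the 150 posts per day come from the descriptions above, and the cluster in d) was identified by inspecting the graphs, so the sketch takes its membership as given and uses Jaccard overlap (our choice here) to summarize how much of it reappears in each event. The account names and memberships in the usage lines are hypothetical.

```python
from collections import Counter, defaultdict

def rt_network(retweets, min_weight=3):
    """a) Weighted retweeter -> author edges; edges with fewer than
    min_weight retweets are cut. `retweets` holds (retweeter, author) pairs."""
    weights = Counter(retweets)
    return {edge: w for edge, w in weights.items() if w >= min_weight}

def node_sizes(edges):
    """a) Account-influence proxy: retweets each author receives over the kept edges."""
    sizes = Counter()
    for (_, author), w in edges.items():
        sizes[author] += w
    return sizes

def top_posts_per_day(records, per_day=150):
    """b) `records` holds one (utc_day, post_id, author) tuple per retweet.
    Returns {day: [(post_id, author, retweet_count), ...]}, at most per_day posts each."""
    counts = defaultdict(Counter)
    authors = {}
    for day, post_id, author in records:
        counts[day][post_id] += 1
        authors[post_id] = author
    return {day: [(p, authors[p], n) for p, n in c.most_common(per_day)]
            for day, c in counts.items()}

def track_cluster(seed, clusters_by_event):
    """d) For each (event, members) pair, count reappearing seed accounts
    and compute the Jaccard overlap with the seed cluster."""
    seed = set(seed)
    report = []
    for event, members in clusters_by_event:
        members = set(members)
        union = seed | members
        shared = len(seed & members)
        report.append((event, shared, round(shared / len(union), 2) if union else 0.0))
    return report

# Hypothetical usage: the @u2 -> @a edge (2 retweets) falls below the cut.
edges = rt_network([("@u1", "@a")] * 3 + [("@u2", "@a")] * 2 + [("@u3", "@b")] * 4)
sizes = node_sizes(edges)
print(sizes["@a"], sizes["@b"])  # 3 4
print(track_cluster({"@g1", "@b1", "@b2"},
                    [("Houla", {"@g1", "@b1"}),
                     ("SNC election", {"@g1", "@b1", "@b2", "@b3"})]))
# [('Houla', 2, 0.67), ('SNC election', 3, 0.75)]
```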

The 4 data analysis methods we used on our dataset resulted in 3 major findings that I will talk about in my next blog post. However, before I end this post, I would like to give you a sneak peek of next week’s post.

The Evolution of the botnet (Photo Credit to Daisy Yoo)

This graph shows the evolution of the activist-botnet across the events timeline. Who are they? What were they saying? What was their influence? I will answer all these questions in my next posts. Stay tuned!

Political Bots: Who is Re-tweeting the Syrian Civil War?


This is a project that I am currently working on with David McDonald and Daisy Yoo, who are both from the iSchool. The project started last year (2012), and its main objective is to understand the role of Twitter in the ongoing conflict in Syria. We also aim to understand how people use the retweet function to amplify their voices during protracted political conflicts such as war. In this study, we use two metrics to measure influence: (1) content influence, the number of retweets that a piece of content receives; and (2) account influence, the number of retweets that an account receives.
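Written as formulas (the notation is ours, not from the study): let R be the set of retweets in an event's data set, target(r) the post that retweet r reposts, and author(p) the account that wrote post p. Then content influence CI and account influence AI are:

```latex
\mathrm{CI}(p) = \bigl|\{\, r \in R : \mathrm{target}(r) = p \,\}\bigr|,
\qquad
\mathrm{AI}(a) = \sum_{p \,:\, \mathrm{author}(p) = a} \mathrm{CI}(p)
```

In other words, an account can gain high account influence either through one heavily retweeted post or through many moderately retweeted ones, which is why the two measures can point to different users.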

That said, we had an initial hypothesis about the types of influential voices on the RT network: (a) activists, people with an idea or a cause, who have high content influence but low account influence; and (b) celebrities, who have both high content influence and high account influence. Furthermore, we assumed that activists’ influence would be based on proactive networking ability (two-way communication) while celebrities’ influence would be based on individual fame and authority (one-way communication).


However, our data revealed other interesting patterns that neither of our hypotheses explained. For example, the image below is a network analysis of all retweets from our dataset (see note 1) for November 10, 2012, the Syrian National Council election. On the left of the diagram, you will notice nodes with both high and low centrality, connected by edges of varied thickness. By comparison, the small cluster on the right (purple) looks as if its members were retweeting each other equally, and not only on this date but on many others.

bot net
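One way to put numbers on the impression that the purple cluster was "RTing each other equally" is to measure, among the cluster's internal edges, how many are reciprocated and how much the edge weights vary. The sketch below uses reciprocity and the coefficient of variation of edge weights; both are our illustrative choice rather than measures from the study, and the edge dictionary and toy data are hypothetical. A cluster like the purple one would be expected to show high reciprocity and a low coefficient of variation.

```python
from statistics import mean, pstdev

def cluster_profile(edges, members):
    """Summarize retweet edges among `members`.

    edges: {(retweeter, author): retweet_count}
    Returns (reciprocity, cv): the share of internal edges whose reverse edge
    also exists, and the coefficient of variation of internal edge weights.
    """
    members = set(members)
    inner = {(u, v): w for (u, v), w in edges.items()
             if u in members and v in members and u != v}
    if not inner:
        return 0.0, 0.0
    reciprocity = sum((v, u) in inner for (u, v) in inner) / len(inner)
    weights = list(inner.values())
    return reciprocity, pstdev(weights) / mean(weights)

# Hypothetical: three accounts retweeting each other 5 times in each direction ...
bots = {(u, v): 5 for u in "abc" for v in "abc" if u != v}
# ... versus two fans retweeting one popular account unevenly.
organic = {("x", "y"): 10, ("z", "y"): 1}
print(cluster_profile(bots, "abc"))  # (1.0, 0.0)
print(cluster_profile(organic, "xyz"))
```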

To learn more about the right cluster and what its members were tweeting, please stay tuned for the next blog post, where I will explain what we found and our next steps.

1: Preparing the Dataset will be discussed in the coming post