In my previous post, I started discussing our latest project on the role of Twitter in the Syrian civil war, where I talked a little bit about the research objective, hypotheses and new directions. The new research direction was a result of the botnets that we started finding in our network analysis.
In this blog post, I want to talk about the two important phases of the study: first, the data preparation and collection phase, and second, the data analysis phase. In my next blog post, I will start talking about the results of the data analysis.
Data preparation and collection
Since April 2012, we have been archiving Twitter posts that include the following hashtags: #Syria, #Damascus, #Aleppo, #Hama, #Idlib, #Homs, #سوريا, #سورية, #دمشق, #حلب, #حماه, #ادلب, and #حمص.
As I mentioned in the previous blog post, we wanted to know how retweeting on a day of violent events might differ from retweeting on a day of non-violent events (if at all). Therefore, to scope the analysis, we purposefully selected three dates for each event type. First, we browsed timelines of the Syrian Civil War from many news portals to choose three violent events:
- Houla Massacre on May 25, 2012
The UN human rights office reported that at least 108 civilians were killed, including 34 women and 49 children, in Houla, Homs province.
- Hama Massacre on June 6, 2012
78 civilians were executed in a massacre by the Syrian army and Shabiha in the small village of Qubair, part of Maarzaf, Hama province. Over 140 people were killed across Syria, including in the Qubair and Maarzaf massacres.
- Damascus Massacre on December 12, 2012
Three bombs exploded outside the Interior Ministry building in Damascus, killing five and injuring at least 23 people. The LCC reported 113 civilians killed by the Syrian army, including 41 in Aleppo and 31 in the Damascus suburbs.
Next, we browsed headlines from the Middle East section of the BBC News website to choose three non-violent events:
1. Syria’s Olympics chief denied visa for London Games on June 22, 2012
The head of the Syrian Olympic Committee, General Mowaffak Joumaa, has been refused a visa to travel to London for the Games.
2. Angelina Jolie visits Syrian refugees in Jordan on September 11, 2012
Actress Angelina Jolie has called for an end to the violence in Syria after meeting refugees in Jordan’s Zaatari camp.
3. Christian elected as head of Syrian National Council on November 10, 2012
The Syrian National Council, the main political opposition to Bashar Al-Assad’s regime, defied accusations of being Islamist-led by electing a Christian opposition activist, George Sabra, to the presidency on Friday night.
After we selected the salient events, the next step was to assemble the data for analysis. Considering the lag time in the spread of news, we collected a data set for each event over a 3-day period: from 00:00 (UTC) on the day before the event to 23:59 (UTC) on the day after. Please note that we used UTC instead of local time. Later in the study, we realized that this might be problematic in terms of sampling validity. However, the difference between Syrian local time and UTC is only 2 hours (UTC+2), and because we allowed enough lag time (± 24 hours), we suspect the effect is insignificant. We also realized, after we discovered the botnets, that the ± 24 hours would give us a better understanding of the bot behavior.
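To make the collection window concrete, here is a minimal sketch in Python of how such a window could be computed and applied. The function names `collection_window` and `in_window` are hypothetical and are not taken from our actual pipeline; the sketch only assumes that every tweet carries a timezone-aware timestamp.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def collection_window(event_day: date) -> Tuple[datetime, datetime]:
    """Return (start, end) of the 3-day window in UTC.

    start is 00:00 UTC on the day before the event (inclusive);
    end is 00:00 UTC two days after the event (exclusive), which
    covers everything up to 23:59 UTC on the day after the event.
    """
    start = datetime.combine(
        event_day - timedelta(days=1), time.min, tzinfo=timezone.utc
    )
    end = start + timedelta(days=3)
    return start, end


def in_window(created_at: datetime, event_day: date) -> bool:
    """True if a timezone-aware tweet timestamp falls inside the window."""
    start, end = collection_window(event_day)
    return start <= created_at < end
```

Because the comparison is done on timezone-aware timestamps, a tweet posted at 01:00 Syrian local time (UTC+2) on the first day of the window is actually 23:00 UTC on the previous day and falls outside it. This is the kind of boundary effect we had in mind when we mentioned sampling validity.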
Our data analysis phase included the following:
a) Network analysis: For each of the events, we generated a graph of the RT network. To keep the graphs legible, we applied an edge cut at a minimum of three retweets (edges representing fewer than three retweets were removed from the network analysis). Through this analysis, we measured account influence. With the graphs, we examined clustering patterns and identified high-profile users based on the size of the node and the density of the edges.
b) Log analysis: We generated a log data set of the 150 most retweeted posts per day, for a total of 450 most retweeted posts for each event. Through this analysis, we measured content influence. We identified high-profile users based on the number of retweets during the period of the event.
c) High-profile user analysis: We compared sets of high-profile users between the network analysis and the log analysis. This helped us to identify distinct types of influencers, which we will share in the findings.
d) Cluster analysis: From the network analyses, we identified a unique cluster and monitored its trend across the timeline of events. This helped us to understand how a network might evolve over time to increase its influence in a microblogging space.
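The first three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code we actually ran: the record fields (`retweeter`, `author`, `post_id`) and all function names are hypothetical, the edge cut is interpreted as "a retweeter retweeted the same author at least three times", and account influence is approximated by the total number of retweets an author receives over the surviving edges.

```python
from collections import Counter

MIN_RETWEETS = 3         # edge-cut threshold described in the text
TOP_POSTS_PER_DAY = 150  # size of the daily log data set


def build_rt_edges(retweets):
    """Count retweets for each (retweeter, author) pair: the RT network edges."""
    return Counter((rt["retweeter"], rt["author"]) for rt in retweets)


def apply_edge_cut(edges, min_retweets=MIN_RETWEETS):
    """Drop edges that represent fewer than min_retweets retweets."""
    return {pair: n for pair, n in edges.items() if n >= min_retweets}


def account_influence(edges):
    """Assumed proxy for node size: retweets received per author."""
    score = Counter()
    for (_, author), n in edges.items():
        score[author] += n
    return score


def top_retweeted_posts(retweets, k=TOP_POSTS_PER_DAY):
    """Return the k most retweeted posts as (post_id, count) pairs."""
    return Counter(rt["post_id"] for rt in retweets).most_common(k)


def compare_influencers(network_users, log_users):
    """Split high-profile users by which analysis surfaced them."""
    network_users, log_users = set(network_users), set(log_users)
    return {
        "both": network_users & log_users,
        "network_only": network_users - log_users,
        "log_only": log_users - network_users,
    }
```

Running `top_retweeted_posts` once per day of the 3-day window gives the 450 posts per event mentioned in b). The two measures can disagree: an account whose single post is retweeted lightly by many different users can top the log analysis while losing all of its edges to the cut in the network analysis.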
The four data analysis methods we used on our dataset resulted in three major findings that I will be talking about in my next blog post. However, before I end this post, I would like to give you a sneak peek of next week's post.
This graph shows the evolution of the activist botnet across the timeline of events. Who are they? What were they saying? What was their influence? I will answer all these questions in my next posts. Stay tuned!