This dataset contains aggregated information about tweets on the London Olympics, Koninginnedag (a Dutch national holiday) in 2012, Pinkpop 2012 and the UEFA Euro 2012 football tournament.
koninginnedag.txt, olympics.txt, pinkpop.txt and uefa_euro.txt contain the number of tweets written per hour for the corresponding events. You can plot them using gnuplot and the timeline.dat script, and see the result in timeline.pdf:
gnuplot timeline.dat
The timeline (*.txt) files contain two columns separated by a space:
head olympics.txt
2012-08-11-00 13350
2012-08-11-01 12777
2012-08-11-02 14249
2012-08-11-03 15746
2012-08-11-04 9104
2012-08-11-05 9262
2012-08-11-06 9056
2012-08-11-07 10495
2012-08-11-08 15119
2012-08-11-09 18497
where the first column is the hour (in YYYY-MM-DD-HH format) and the second is the number of tweets collected during that hour.
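The two-column format above can be read with the standard library alone. The following is a minimal sketch, not part of the repository: the function name `parse_timeline` is hypothetical, and the timestamp layout `%Y-%m-%d-%H` is inferred from the sample lines shown above.

```python
from datetime import datetime


def parse_timeline(lines):
    """Yield (hour, count) pairs from lines such as '2012-08-11-00 13350'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        stamp, count = line.split()
        # The hour is assumed to be encoded as year-month-day-hour.
        yield datetime.strptime(stamp, "%Y-%m-%d-%H"), int(count)


# Two lines copied from the olympics.txt sample above.
sample = ["2012-08-11-00 13350", "2012-08-11-01 12777"]
rows = list(parse_timeline(sample))
```

Passing an open file object instead of a list works the same way, since the function only iterates over lines.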
The *_reply.csv.gz files are gzip-compressed CSV files with three columns separated by a comma and a space:
- the time the tweet was created
- the ID of the tweet
- the ID of the tweet it replies to, or None if it is not a reply
Here is a short sample:
2012-08-11 00:01:58, 234077159721943040, None
2012-08-11 00:01:58, 234077159453499393, 234071638377889792
2012-08-11 00:01:58, 234077161231888384, None
2012-08-11 00:01:58, 234077160770514944, None
2012-08-11 00:01:59, 234077162729242624, 233155831913390080
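One way to read rows in this format is sketched below with the standard library only. The names `parse_replies` and `read_replies` are hypothetical. The IDs are kept as Python integers because they exceed 2^53, so any route through floating point (which can happen when a column mixes numbers and missing values) may silently change the last digits.

```python
import csv
import gzip
from datetime import datetime


def parse_replies(lines):
    """Yield (created, tweet_id, reply_to) tuples; reply_to is None for non-replies."""
    # skipinitialspace drops the space that follows each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        created, tweet_id, reply_to = row
        yield (
            datetime.strptime(created, "%Y-%m-%d %H:%M:%S"),
            int(tweet_id),
            None if reply_to == "None" else int(reply_to),
        )


def read_replies(path):
    """Read a gzip-compressed reply file, e.g. a *_reply.csv.gz file."""
    with gzip.open(path, "rt", newline="") as f:
        yield from parse_replies(f)


# Two lines copied from the sample above.
sample = [
    "2012-08-11 00:01:58, 234077159721943040, None",
    "2012-08-11 00:01:58, 234077159453499393, 234071638377889792",
]
rows = list(parse_replies(sample))
```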
Write a script that generates a timeline plot for a collection of tweets.
The script should take the name of an input file (e.g. olympics.txt) as a positional argument. The generated plot should be stored in a file whose name is passed using the optional argument --output (e.g. --output olympics.png). It should be possible to run the script using the following command:
timeline.py olympics.txt --output olympics.png
Modify the timeline() function in timeline.py to generate the plots.
Write a script that, given a *_reply.csv.gz file, plots the distribution of the differences between the ID of a tweet and the ID of the tweet it replies to.
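The core of the second task can be sketched as follows. This is one possible approach under stated assumptions, not a reference solution: the function names are hypothetical, the input is assumed to be (tweet ID, reply-to ID) pairs with None for non-replies, and grouping the differences by order of magnitude is only one of several reasonable ways to summarise a distribution that spans many scales. The plotting helper assumes matplotlib is installed and imports it lazily, so the rest of the code runs without it.

```python
from collections import Counter


def id_differences(pairs):
    """Return tweet_id - reply_to for every pair that is a reply."""
    return [tweet_id - reply_to for tweet_id, reply_to in pairs if reply_to is not None]


def magnitude_histogram(diffs):
    """Count positive differences by order of magnitude, i.e. floor(log10(d))."""
    # For a positive integer, the number of decimal digits minus one
    # equals floor(log10(d)) without any floating-point rounding.
    return Counter(len(str(d)) - 1 for d in diffs if d > 0)


def plot_histogram(hist, output):
    """Save a bar chart of the histogram (assumes matplotlib is available)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    magnitudes = sorted(hist)
    fig, ax = plt.subplots()
    ax.bar(magnitudes, [hist[m] for m in magnitudes])
    ax.set_xlabel("order of magnitude of ID difference")
    ax.set_ylabel("number of replies")
    fig.savefig(output)


# ID pairs taken from the sample rows of the reply file shown earlier.
sample = [
    (234077159721943040, None),
    (234077159453499393, 234071638377889792),
    (234077162729242624, 233155831913390080),
]
diffs = id_differences(sample)
hist = magnitude_histogram(diffs)
```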
You might want to use Pandas for data processing and plotting, and opster for handling command-line arguments. Opster is available in this repo as opster.py. Refer to the Overview section of the opster docs to see how options are defined.
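The command-line interface requested for timeline.py can be sketched as follows. Note that this sketch deliberately uses argparse from the standard library as a stand-in for opster, so it does not show opster's way of declaring options. The default output name timeline.png, the `build_parser` helper and the two-argument signature of `timeline()` are assumptions made for illustration; the actual `timeline()` in timeline.py may look different. matplotlib is assumed to be installed and is imported only when a plot is drawn.

```python
import argparse
from datetime import datetime


def timeline(input_path, output_path):
    """Plot tweets per hour from a timeline file (hypothetical signature)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    hours, counts = [], []
    with open(input_path) as f:
        for line in f:
            if line.strip():
                stamp, count = line.split()
                hours.append(datetime.strptime(stamp, "%Y-%m-%d-%H"))
                counts.append(int(count))

    fig, ax = plt.subplots()
    ax.plot(hours, counts)
    ax.set_xlabel("hour")
    ax.set_ylabel("tweets per hour")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    fig.savefig(output_path)


def build_parser():
    """Describe the interface: one positional input file and --output."""
    parser = argparse.ArgumentParser(description="Plot a tweet timeline.")
    parser.add_argument("input", help="timeline file, e.g. olympics.txt")
    parser.add_argument("--output", default="timeline.png",
                        help="file to store the plot in (default is an assumption)")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    timeline(args.input, args.output)
```

Calling `main()` at the bottom of the script makes it usable as `timeline.py olympics.txt --output olympics.png`.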
The datasets are based on the collection of tweets gathered for the experiments described in [1] and processed by poultry. Contact [email protected] in case you want to get the tweets.
This work is licensed under a Creative Commons Attribution 4.0 International License.
[1] Milajevs, Dmitrijs, and Gosse Bouma. Real time discussion retrieval from Twitter. In Proceedings of the 22nd international conference on World Wide Web companion, pp. 795-800. International World Wide Web Conferences Steering Committee, 2013.