Week 2: Set Up and Introduction to Collecting Data on Twitter
In our second session, we focus on setting up our data collection and getting acquainted with Twitter’s APIs. We start by discussing the research process with digital trace data. Following this, we will prepare your machine for working with Python and get you access to Twitter’s API. Finally, we will use a couple of example scripts provided in our tutorial to get some first practice in collecting data on Twitter.
To prepare for this session, or to build on the issues we discussed, you could have a look at Twitter’s API documentation. Make sure to have an extended look at the data fields provided by Twitter’s API in its API objects. This will show you which information Twitter provides through its API and which information is available to you for your analyses.
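For a first impression of these objects, here is a minimal sketch of how you would access some of the documented fields of a tweet once it has been parsed into a Python dictionary. The tweet shown is just an illustrative placeholder, not real data:

# "tweet" stands in for a tweet object returned by Twitter's API,
# parsed into a Python dictionary.
tweet = {
    "created_at": "Mon Sep 24 03:35:21 +0000 2018",
    "text": "An example tweet.",
    "user": {"screen_name": "example_user", "followers_count": 42},
}

print(tweet["created_at"])           # when the tweet was posted
print(tweet["text"])                 # the tweet's text
print(tweet["user"]["screen_name"])  # the author's handle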
For more information on the APIs of various services, have a look at Matthew A. Russell’s book Mining the Social Web (2018, 3rd ed.).
In the course we will work quite heavily with the command line of your system, so make sure you (re-)acquaint yourself with some basic commands (e.g. navigating to a specific directory through the command line). Windows users, have a quick look at this tutorial; Mac or Linux users, please have a look at this tutorial. If you want to dive a little more deeply into using the command line for neat tasks, have a look at this tutorial on DataCamp. You might have to register with DataCamp to access the course, but the course itself should be free. Alternatively, you could check out this Codecademy course on using the command line.
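For example, the following commands print your current directory, move into a folder named “twitterresearch”, and list its contents (on Mac or Linux; on Windows, replace pwd with cd and ls with dir):

pwd
cd twitterresearch
ls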
Also, keep in mind that the command line might have trouble accessing a file on your machine whose name or path contains spaces. In these cases, either rename your file or put quotation marks around the complete file path, e.g.:
cd "/Users/(…)/twitterresearch"
If you want to dive a little deeper into the use of GitHub, have a look at this free Codecademy course.
Also, please prepare your machine for the following sessions. First, make sure you have a Python distribution up and running. For the purposes of this course, I recommend Continuum Analytics’ Anaconda.
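You can quickly check that the installation succeeded by asking for version numbers on your command line:

python --version
conda --version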
Now, make sure your system is prepared to work with Python. If you are using a Mac, make sure you have Apple’s Xcode installed. If you are using a PC, please install a current version of Microsoft’s Visual Studio. Please make sure your version includes both “Visual C++” and “Common Tools for Visual C++ 2015”. This is important, as both are needed to install and run specific Python modules. In case you run into any trouble, have a look at this comment thread.
As a final step, please follow the procedure described in Jürgens & Jungherr (2016), p. 18.
Now your machine should be ready for the purposes of this course. You can test this by running the following code examples.
Code Examples:
Setup
Pascal Jürgens and Andreas Jungherr. 2016. twitterresearch [Computer software].
First, make sure you have installed Python and the relevant libraries as described in Jürgens & Jungherr (2016, pp. 15-20).
Start your command line or terminal and type:
ipython
Now navigate to the working directory where you saved the scripts provided in twitterresearch:
cd "twitterresearch"
Import the Python script “examples.py” from the script set:
import examples
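If you want a quick overview of the commands the script provides, you can list the public names defined in the module. This is standard Python, not specific to the tutorial:

[name for name in dir(examples) if not name.startswith("_")]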
Examples of how to collect data through Twitter’s streaming APIs
Keyword-based searches
Run the command defined in “examples.py”:
examples.track_keywords()
Now you should see tweets containing the predefined keywords “politics” or “election” in your command line. To cancel this process, press
ctrl-c
Here you see the command as defined in “examples.py”:
def track_keywords():
    """
    Track two keywords with a tracking stream and print matching tweets and notices.
    To stop the stream, press ctrl-c or kill the python process.
    """
    keywords = ["politics", "election"]
    stream = streaming.stream(
        on_tweet=print_tweet, on_notification=print_notice, track=keywords)
You can easily adjust the keywords used in the predefined command, as shown here. Just enter the following command, replacing the placeholder “trump” in the field “keywords” with the keywords of interest to you.
def track_keywords():
    """
    Track keywords with a tracking stream and print matching tweets and notices.
    To stop the stream, press ctrl-c or kill the python process.
    """
    keywords = ["trump"]
    stream = examples.streaming.stream(
        on_tweet=examples.print_tweet, on_notification=examples.print_notice, track=keywords)
Now, let’s test the adjusted command:
track_keywords()
The command described above allowed you to see tweets matching your search criteria. The following command allows you to save them to a .json file in your working directory.
examples.save_track_keywords()
Here is the command as defined in “examples.py”. You can adjust it to your interests as shown above.
def save_track_keywords():
    """
    Track two keywords with a tracking stream and save matching tweets.
    To stop the stream, press ctrl-c or kill the python process.
    """
    # Set up file to write to
    outfile = open("keywords_example.json", "w")

    def save_tweet(tweet):
        json.dump(tweet, outfile)
        # Insert a newline after one tweet
        outfile.write("\n")

    keywords = ["politics", "election"]
    stream = streaming.stream(
        on_tweet=save_tweet, on_notification=print_notice, track=keywords)
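Since save_tweet() writes one JSON object per line, you can read the resulting file back line by line. Here is a minimal sketch, assuming you have stopped the stream and “keywords_example.json” now sits in your working directory:

import json

# Read the file back: one JSON-encoded tweet per line.
with open("keywords_example.json") as infile:
    tweets = [json.loads(line) for line in infile if line.strip()]

print(len(tweets))        # number of collected tweets
print(tweets[0]["text"])  # text of the first tweet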
User-based searches
The following command allows you to track tweets coming in from a selection of users.
examples.follow_users()
Here is the command as defined in “examples.py”. You can adjust it to your interests as shown above. In this case, exchange the IDs listed in “users” for the IDs of accounts of your choosing.
def follow_users():
    """
    Follow several users, printing their tweets (and retweets) as they arrive.
    To stop the stream, press ctrl-c or kill the python process.
    """
    # user IDs are: nytimes: 807095, washingtonpost: 2467791
    # they can be obtained through:
    # users = ["nytimes", "washingtonpost"]
    # users_json = rest.fetch_user_list_by_screen_name(screen_names=users)
    # for u in users_json:
    #     print("{0}: {1}".format(u["screen_name"], u["id"]))
    users = ["807095", "2467791"]
    stream = streaming.stream(
        on_tweet=print_tweet, on_notification=print_notice, follow=users)
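The commented lines above also show how to obtain user IDs from screen names. Run interactively, this might look as follows; note that reaching the rest module through examples assumes that examples.py imports it, as the comments suggest:

# Look up user IDs by screen name, following the commented recipe above.
users = ["nytimes", "washingtonpost"]
users_json = examples.rest.fetch_user_list_by_screen_name(screen_names=users)
for u in users_json:
    print("{0}: {1}".format(u["screen_name"], u["id"]))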
Examples of how to collect data through Twitter’s REST APIs
User-based searches
The following command prints all available tweets from a user’s archive.
examples.print_user_archive()
The following command downloads all available tweets from a user’s archive and saves them to a .json file in your active working directory.
examples.save_user_archive_to_file()
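Once tweets are saved to a file, you can already run small analyses on them. As a minimal sketch, assuming you collected tweets into “keywords_example.json” with save_track_keywords() above, the following counts how many tweets each author contributed:

import json
from collections import Counter

with open("keywords_example.json") as infile:
    tweets = [json.loads(line) for line in infile if line.strip()]

# Tally tweets per author, using the nested "user" object.
authors = Counter(tweet["user"]["screen_name"] for tweet in tweets)
print(authors.most_common(10))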
For more examples and advice on how to prepare and analyze Twitter data, please see the tutorial.
If you run into trouble with or find any bugs in the code provided in “twitterresearch” please report an issue in our GitHub repository.
Required Readings:
- Jürgens, P. & Jungherr, A. (2016). A tutorial for using Twitter data in the social sciences: Data collection, preparation, and analysis. Social Science Research Network (SSRN). doi:10.2139/ssrn.2710146. (pp. 15-20).
Background Readings:
- Bell, P. & Beer, B. (2018). Introducing GitHub: A non-technical guide (2nd ed.). Sebastopol, CA: O’Reilly Media.
- Eubank, N. (2015). Data analysis in Python.
- Russell, M. A. (2018). Mining the social web (3rd ed.). Sebastopol, CA: O’Reilly Media.
Additional Courses:
Command Line:
- Codecademy. Learn the Command Line.
- Wilson, G. Introduction to Shell for Data Science. DataCamp.
Git:
- Codecademy. Learn Git.