Using Digital Trace Data in the Social Sciences, University of Konstanz (Summer 2018)

Instructor: Andreas Jungherr

Week 5: Collecting Data Through Twitter’s API

In this session, we will focus on different approaches to getting data from Twitter’s various APIs. We will be using the example scripts discussed in detail in our tutorial.

We will collect data through Twitter’s Streaming APIs, using keywords, hashtags, and user names as selectors. We will also use Twitter’s REST APIs to collect messages containing specific keywords or hashtags posted during the last seven days and to collect messages from users’ tweet archives.

In preparation for the session, have a look at the mandatory readings and Twitter’s documentation of its Streaming and REST APIs, as well as the API objects found in the metadata of specific tweets.

Code Examples:

First, make sure you have installed Python and the relevant libraries as discussed in session 2.

Start your command line or terminal and enter:

ipython

Now navigate to the working directory where you saved the scripts provided in twitterresearch.

cd "/Users/().../twitterresearch"

Import the Python script examples.py from the script set:

import examples

Examples on how to collect data through Twitter’s streaming APIs

Keyword-based searches

Load the command defined in examples.py:

examples.track_keywords()

Now you should see tweets containing the predefined keywords “politics” or “election” in your command line.

To cancel this process, enter:

ctrl-c

Here you see the command as defined in examples.py:

def track_keywords():
    """
    Track two keywords with a tracking stream and print matching tweets and notices.
    To stop the stream, press ctrl-c or kill the python process.
    """
    keywords = ["politics", "election"]
    stream = streaming.stream(
        on_tweet=print_tweet, on_notification=print_notice, track=keywords)
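As the listing shows, streaming.stream follows a callback pattern: the stream hands each incoming tweet to on_tweet and each control message (e.g. a rate-limit notice) to on_notification. The following self-contained sketch illustrates that pattern with a stand-in stream function and canned data; fake_stream and its sample items are hypothetical, whereas the real streaming.stream connects to Twitter’s Streaming API:

```python
def fake_stream(on_tweet, on_notification, track):
    """Feed canned items to the callbacks, as streaming.stream would."""
    items = [
        {"text": "The election is near", "user": {"screen_name": "alice"}},
        {"limit": {"track": 5}},  # a rate-limit notice, not a tweet
    ]
    for item in items:
        if "text" in item:  # tweets carry a "text" field
            on_tweet(item)
        else:
            on_notification(item)

collected = []
fake_stream(
    on_tweet=lambda t: collected.append(t["text"]),
    on_notification=lambda n: print("notice:", n),
    track=["politics", "election"],
)
print(collected)  # ['The election is near']
```

Any function that accepts a single tweet dict can be passed as on_tweet, which is exactly how save_track_keywords below swaps printing for saving.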

You can easily adjust the keywords used in the predefined command. Just replace the placeholder keywords “politics” and “election” in the keywords list with the keywords of interest to you.

Important: The original function uses helper functions defined in the file examples.py (i.e. “streaming.stream”, “print_tweet”, “print_notice”). In the original function, these helper functions could be called directly, as they were defined in the same file. If you adjust this function in your workspace or in a new Python file, you have to point your function to the namespace where these helper functions are defined (i.e. examples). Keeping this in mind, an adjusted track_keywords function would look like this:

def track_keywords_morgen():
    """
    Track one or more keywords with a tracking stream and print matching tweets and notices.
    To stop the stream, press ctrl-c or kill the python process.
    """
    keywords = ["morgen"]
    stream = examples.streaming.stream(
        on_tweet=examples.print_tweet, on_notification=examples.print_notice, track=keywords)

Adapting these steps to any other of our functions should allow you to build on our scripts to develop scripts according to your interests.

Now, let’s test the adjusted command:

track_keywords_morgen()

The command described above allowed you to see tweets matching your search criteria. The following command allows you to save them in a .json file in your working directory.

examples.save_track_keywords()

Here is the command as defined in examples.py. You can adjust it to your interests as shown above.

def save_track_keywords():
    """
    Track two keywords with a tracking stream and save matching tweets.
    To stop the stream, press ctrl-c or kill the python process.
    """
    # Set up file to write to
    outfile = open("keywords_example.json", "w")

    def save_tweet(tweet):
        json.dump(tweet, outfile)
        # Insert a newline after one tweet
        outfile.write("\n")
    keywords = ["politics", "election"]
    stream = streaming.stream(
        on_tweet=save_tweet, on_notification=print_notice, track=keywords)
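save_track_keywords writes one JSON object per line (newline-delimited JSON), so reading the file back is one json.loads call per line. The sketch below first writes two minimal demo tweets in that same format so it runs on its own; with real data you would skip the writing step and just parse keywords_example.json:

```python
import json

# For demonstration only: write two minimal tweet dicts in the same
# newline-delimited format that save_track_keywords() produces.
with open("keywords_example.json", "w") as outfile:
    for tweet in [{"id": 1, "text": "politics"}, {"id": 2, "text": "election"}]:
        json.dump(tweet, outfile)
        outfile.write("\n")

# Reading the file back: one json.loads() call per non-empty line.
tweets = []
with open("keywords_example.json") as infile:
    for line in infile:
        if line.strip():
            tweets.append(json.loads(line))

print(len(tweets))  # 2
```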

User-based searches

The following command allows you to track tweets coming in from a selection of users.

examples.follow_users()

Here is the command as defined in examples.py. You can adjust it to your interests as shown above. In this case, you exchange the IDs listed in “users” with IDs of your choosing.

def follow_users():
    """
    Follow several users, printing their tweets (and retweets) as they arrive.
    To stop the stream, press ctrl-c or kill the python process.
    """
    # user IDs are: nytimes: 807095, washingtonpost: 2467791
    # they can be obtained through:
    # users = ["nytimes", "washingtonpost"]
    # users_json = rest.fetch_user_list_by_screen_name(screen_names=users)
    # for u in users_json:
    #   print("{0}: {1}".format(u["screen_name"], u["id"]))
    users = ["807095", "2467791"]
    stream = streaming.stream(
        on_tweet=print_tweet, on_notification=print_notice, follow=users)

Examples of how to collect data through Twitter’s REST APIs

User-based searches

The following command shows all available tweets from a user’s archive.

examples.print_user_archive()

The following command downloads all available tweets from a user’s archive and saves them to a .json file in your current working directory.

examples.save_user_archive_to_file()

Let’s open examples.py and have a closer look at what this command does.

Now, let’s examine rest.py.
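The key point is that rest.fetch_user_archive behaves as a paginated generator: it yields pages, each a list of tweet dicts. The self-contained sketch below uses a stand-in generator in place of the real API call to show that consumption pattern, which is the same page-by-page, tweet-by-tweet loop used by save_multiple_user_archive_to_database below:

```python
# Stand-in for rest.fetch_user_archive(): the real function pages through
# Twitter's REST API; this fake yields two pages of minimal tweet dicts.
def fake_fetch_user_archive(screen_name):
    yield [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
    yield [{"id": 3, "text": "third"}]

# Consume the generator page by page, then tweet by tweet.
all_tweets = []
for page in fake_fetch_user_archive("example_user"):
    for tweet in page:
        all_tweets.append(tweet)

print(len(all_tweets))  # 3
```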

OK, now let’s try collecting messages from multiple users:

import database
import rest
import logging

usernames = ["GregorGysi", "peersteinbrueck"]

def save_multiple_user_archive_to_database(usernames):
    """
    Fetch all available tweets for multiple users and save them to the database.
    """
    for name in usernames:
        archive_generator = rest.fetch_user_archive(name)
        for page in archive_generator:
            for tweet in page:
                database.create_tweet_from_dict(tweet)
        logging.warning(u"Wrote tweets from {0} to database".format(name))

save_multiple_user_archive_to_database(usernames)
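Once tweets are in a SQLite database, you can query them with Python’s built-in sqlite3 module. The table and column names below are simplified assumptions for illustration, not the schema defined in database.py; the sketch builds a small in-memory table so it runs on its own, whereas in practice you would connect to the .db file the scripts create:

```python
import sqlite3

# In-memory demo database; with real data you would use e.g.
# sqlite3.connect("tweets.db"). Table and columns here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, user TEXT, text TEXT)")
rows = [
    (1, "GregorGysi", "Guten Morgen"),
    (2, "GregorGysi", "Zweiter Tweet"),
    (3, "peersteinbrueck", "Hallo"),
]
conn.executemany("INSERT INTO tweet VALUES (?, ?, ?)", rows)

# Count tweets per user - a typical first query on a tweet archive.
for user, n in conn.execute(
        "SELECT user, COUNT(*) FROM tweet GROUP BY user ORDER BY user"):
    print(user, n)
# GregorGysi 2
# peersteinbrueck 1
```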

Preparation for the next sessions

Install SQLite

PC Users

Please install SQLite on your machine. To download the program go here.

You probably want to go with the precompiled binaries for Windows.

For installation follow the instructions given here.

Mac Users

You’re good. SQLite is already available on your machines.
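Either way, you can check that SQLite is usable from Python, since the sqlite3 module ships with the standard library:

```python
import sqlite3

# Prints the version of the SQLite engine bundled with your Python.
print(sqlite3.sqlite_version)
```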

Create a database

Please execute the following command to download the respective tweets to a database file (.db):

examples.save_user_archive_to_database()

Required Readings:

Background Readings:

