Assignment 1

Please answer the question by entering runable python code into the cells. Add comments at the beginning of each cell which list the packages that need to be installed (e.g., pip install collections). Run the code so that the output is visible in the notbook before you submit. The 4 data files which you download should lie in the same directory as the python notebooks (use relative paths!). Only for the first 2 questions you have to provide unix commands. Please copy your commands together with the shell output into a text cell.

Submit the notebook (as .ipynb and .pdf) via email to until June 11th.

Subject of email: "CSS2014 ass 1"

Filename: clwagner_mastrohmaier_ass1.ipynb and clwagner_mastrohmaier_ass1.pdf if Markus Strohmaier and Claudia Wagner worked together on assignment 1.

Analyzing Affiliation Networks and Social Networks

  1. Download the following dataset from these two location-based social networking services: and

The Gowalla dataset contains all checkins between Feb 2009 and October 2010 (6.4 Mio checkins). The Brightkite dataset contains all checkins between April 2008 and October 2010 (4.5 Mio checkins).

Beside the checkin data that constitute an affiliation network (two-mode network of users and locations), we will also analyze the friendship network (one mode network of users). Note that in Gowalla the friendships are directed, in brightkite they are undirected. In this assignment we will treat both networks as undirected networks.

  1. What are the 10 most popular locations -- i.e., the locations with most checkins? Print location-ids and number of checkins using unix command line tools (e.g. awk). Enter you command and the output of the command below. (1 Point)
  1. Who are the 10 most active users -- i.e., user with most checkins? Print the user ids and the number of checkin using unix command line tools (e.g., awk). Enter your command and the output below. (1 Point)
In []:
  1. Count the number of distinct brightkite users who checked-in at each location (using python!). What are the top 10 locations -- i.e. the locations where most users checked in? Plot the rank ordered frequency distributions of locations (x-axis: locations ranked by the number of distinct users, y-axis: number of distinct users). (2 Points)
In []:
  1. Construct a unweighted two-mode network of brightkite users and locations. A user and a location are connected if the user checked-in at the location at least once. Compute the degree of each location in the two-mode network and list the top 10 location-ids with their corresponding degree.

Fold the two-mode network and construct a location network. What are the most central locations in the one-mode network and what does that mean? Compute the degree centrality and print the ids and degrees of the 10 most central nodes.
(4 points)

In []:
  1. How evenly distributed is the attention of brightkite users towards locations? The attention a location receives is measured by the number of distinct users who checked in at this loction. Compute the normalized entropy of the degree distribution (using python!). The degree of a location corresponds to the number of distinct users who checked in there.

Assume we ignore all locations where 0-1 different people checked in. That means we make the long tail of the degree distribution shorter. How would the entropy change?

(3 Points)

In []:
  1. In how many distinct locations did users check in on average in brightkite? Create a boxplot that shows the distribution of the number of distinct checkin-locations per user. What is the mean and the variance of this distribution? More than half of the users checked in in more than X locations? Compute X. (2 Points)
In []:
  1. Select the top location (i.e., the location where most distinct brightkite user checked in) and filter the brightkite social network so that it only contains users (nodes) which checked in at the top location. Load this subpart of the social networks into python (using the NetworkX library) Plot the sub-network. Use the spring layout with a suitable number of iterations such that the network is plotted in an appealing and clear way. (3 Points)
In []:
  1. What are the basic properties of this sub-network? Compute and print the number of nodes, number of egdes, average degree per node, clustering coefficient and density of the network. Compute all maximal cliques (cliques that cannot be enlarged). A clique in a graph is a maximal complete subgraph of three or more nodes. What's the total number of maximal cliques in the graph? What is the size of the largest maximal clique in the subnetwork? (2 Point)
In []:
  1. Load full Brightkite social network. What's the number of nodes, edges and average degree of a node? What is the size of the largest maximal clique in the network? (2 Point)
In []:

Analyze your Facebook ego-network

If you are not on facebook ask a friend or colleauge and analyze his/her facebook ego network. Login to your Facebook account (or ask you friend to login) and go to to obtain and set permissions for an access token and retrieved the "User Token" value from the Access Token Tool. Install facebook-sdk (pip install facebook-sdk)

Be sure to explore the permissions that are available by clicking on the "Get Access Token" button that's on the page and exploring all of the tabs available. For example, you will need to set the "friends_likes" option under the "Friends Data Permissions" since this permission is used by the script below but is not a basic permission and is not enabled by default.

  1. What do your friends have in common? Compute the 10 most popular likes among your friends. List the top 10 likes and the corresponding number of friends who liked the thing. (2 Points)
In []:
  1. Finding common likes between an ego and its friendships in a social network. How many likes do you have in common with your friends? (2 Points)
In []:
  1. Find the friends that like the same things than you - list the 10 friends which are most similar to you according to your fb likes. (2 Points)
In []:
  1. Crawl your friends and your mutual friends (via and construct your ego network which contains all your friends and the links between them (hint: you can crawl your friendlist and you can find out which friends you have in common with each of your friends. Plot the network. In how many triangles are you involved? In how many triangles could you be involved? What's the diameter of the network? (2 points)
In []: