Kaggle Social Networks Competition

front_pageThis week I was surprised to learn that I won the Kaggle Social Networks competition!

This was a bit different from other Kaggle competitions.  Typically, a Kaggle competition will provide a large set of data and want to optimize some particular number (say, turning anonymized personal data into a prediction of yearly medical costs).  The dataset here intrigued me because it’s about learning from and reconstructing graphs, which is a very different kind of problem.  In this post, I’ll discuss my approach and insights on the problem.

Continue reading

Advertisements

Finding Community

I attended a really nice talk by Arash Amini yesterday about detecting ‘communities’ in sparse graphs.  The basic problem is: In a big graph (like the Facebook graph, or the graph of scientific papers citations) you have clusters of friends/associates, and you want to pick out those clusters. Dr. Amini and his collaborators have been working on methods to solve this problem in particularly noisy cases.  The methods presented were developed for cases where you know the specific number of clusters you are seeking to find.  It was a fascinating talk, and I came away wondering if there are good methods to use when you aren’t sure of how many clusters there are.

communities
At the left is a randomly generated block-diagonal adjacency matrix (using a stochiastic block model). The center is the same matrix with the rows randomly shuffled; you can’t “see” that it has block diagonal structure. The right is the matrix with blocks recovered using the techniques outlined in this post. Note the noise in the lower-right corner; these correspond to individual, isolated vertices that ended up in their own clusters.

Continue reading