machine learning – Inventing Situations.

Seeing Like a Statistical Learning Algorithm

July 10, 2016July 12, 2016sdenton4

I recently had the pleasure of reading James Scott’s “Seeing Like a State,” which examines a certain strain of failure in large centrally-organized projects. These failures come down to the kinds of knowledge available to administrators and governments: aggregates and statistics, as opposed to the kinds of direct experience available to the people living ‘on the ground,’ in situations where the centralized knowledge either fails to or has no chance to describe a complex reality. The book classifies these two different kinds of knowledge as techne (general knowledge) and metis (local knowledge). In my reading, the techne – in both strengths and shortcomings – bears similarity to the knowledge we obtain from traditional algorithms, while metis knowledge is just starting to become available via statistical learning algorithms.

In this (kinda long) post, I will outline some of the major points of Scott’s arguments, and look at how they relate to modern machine learning. In particular, the divides Scott observes between the knowledge of administrators and the knowledge of communities suggest an array of topics for research. Beyond simply looking at the difference between the ways that humans and machines process data, we observe areas where traditional, centralized data analysis has systematically failed. And from these failures, we glean suggestions of where we need to improve machine learning systems to be able to solve the underlying problems.

Continue reading →

4D Scatter Plotting

November 12, 2015November 20, 2015sdenton4

I recently read Edward Tufte’s ‘Visualizing Quantitative Information,’ a classic book on visualizing statistical data. It reads a little bit like the ‘Elements of Style’ for data visualization: Instead of ‘omit needless words,’ we have ‘maximize data-ink.’ Indeed, the primary goal of the book is to establish some basic design principles, and then show that those principles, creatively applied, can lead to genuinely new modes of representing data.

One of my favorite graphics in the book was a scatter plot adapted from a physics paper, mapping four dimensions in a single graphic. It’s pretty typical to deal with data with much more than three dimensions; I was struck by the relative simplicity with which this scatter plot was able to illustrate four dimensional data.

I hacked out a bit of python code to generate similar images; here’s a 4D scatter plot of the Iris dataset:

scatter4d — 4D scatter plot of the Iris dataset

Continue reading →

Machine Learning Resources for Mathematicians

August 29, 2015August 29, 2015sdenton4

What it feels like to wade into a new field.

I met up with some mathematician friends in Toronto yesterday, who were interested in how one goes about getting started on machine learning and data science and such. There’s piles of great resources out there, of course, but it’s probably worthwhile to write a bit about how I got started, and place some resources that might be of more interest to people coming from a similar background. So here goes.
First off, it’s important to understand that machine learning is a gigantic field, with contributions coming from computer science, statistics, and occasionally even mathematics… But on the bright side, most of the algorithms really aren’t that complicated, and indeed they can’t be if they’re going to run at scale. Overall though, you’ll need to learn some coding, algorithms, and theory.

Oh, and you need to do side-projects. Get your hands dirty with a problem quickly, because it’s the fastest way to actually learn.

Continue reading →

Principal Component Analysis via Similarity

February 21, 2015February 21, 2015sdenton4

Recently I’ve seen a couple nice ‘visual’ explanations of principal component analysis (PCA). The basic idea of PCA is to choose a set of coordinates for describing your data where the coordinate axes point in the directions of maximum variance, dropping coordinates where there isn’t as much variance. So if your data is arranged in a roughly oval shape, the first principal component will lie along the oval’s long axis.

My goal with this post is to look a bit at the derivation of PCA, with an eye towards building intuition for what the mathematics is doing.

Continue reading →

Kaggle Social Networks Competition

November 9, 2014November 10, 2014sdenton4

This week I was surprised to learn that I won the Kaggle Social Networks competition!

This was a bit different from other Kaggle competitions. Typically, a Kaggle competition will provide a large set of data and want to optimize some particular number (say, turning anonymized personal data into a prediction of yearly medical costs). The dataset here intrigued me because it’s about learning from and reconstructing graphs, which is a very different kind of problem. In this post, I’ll discuss my approach and insights on the problem.

Continue reading →

One Weird Fourier Trick for Combinatorial Data

April 29, 2014April 29, 2014sdenton4

Cell phone photo from the scene of the crime

Here are the slides from the talk I gave in Montreal last Friday.

The talk was about using Fourier transforms to get polynomial-time encodings of permutation statistics, as well as a look at the Kondor-Borgwardt approach to graph invariants via the Fourier transform over $S_n$ . The talk was given at a representation theory conference, and I was making the point that we can get new research ideas by taking trips into the world of applications – in my case, by looking at machine learning problems. The opening joke was that I asked my computer for the best possible title for the talk, and received the click-bait title as a response. It was admittedly a pretty funny moment watching the chair of the session trying to decide whether to read the title originally submitted for the talk (‘Compressed Combinatorial Statistics’) or the ridiculousness on the screen (he went with the original).

Finding Community

February 12, 2014February 12, 2014sdenton4

I attended a really nice talk by Arash Amini yesterday about detecting ‘communities’ in sparse graphs. The basic problem is: In a big graph (like the Facebook graph, or the graph of scientific papers citations) you have clusters of friends/associates, and you want to pick out those clusters. Dr. Amini and his collaborators have been working on methods to solve this problem in particularly noisy cases. The methods presented were developed for cases where you know the specific number of clusters you are seeking to find. It was a fascinating talk, and I came away wondering if there are good methods to use when you aren’t sure of how many clusters there are.

Continue reading →

RoShamBo Part IV: Implementation

February 11, 2014February 11, 2014sdenton4

I’ve finally had a bit of time to finish up the code for the Rock-Paper-Scissors bot discussed in the previous posts; I’ve put the code in a GitHub repository here. Feel free to file issues if you would like it to be a bit more user friendly.

Win percentage over time, Diaconis vs Switchbot. As time increases, the win percentage settles in on 2/3's which is a natural limit for play against Switchbot. — Win percentage over time, Diaconis vs Switchbot. As time increases, the win percentage settles in on 2/3’s which is a natural limit for play against Switchbot.

The bot which uses the Fourier transform on move probabilities to search for profitable patterns is named `diaconis`, after Persi Diaconis. It’s currently working just fine, but is a bit slow computationally. On startup, it generates all possible move patterns that it will investigate during the course of each game, as well as some character tables. After each play, it tests some of these patterns, and tries to determine if their predictive power is better than any of the patterns seen thus far. If so, it begins using that pattern to choose move probabilities.

This works fine as a proof-of-concept of the basic ideas. Additional improvements could be had by doing some code optimization to speed things up a bit, and keeping a list of good patterns and allowing a bit more dexterity in switching between the patterns used for prediction.

Roshambo Part III – Representation Theory

January 29, 2014February 11, 2014sdenton4

In the last two posts, we’ve looked at using machine learning for playing iterated Roshambo. Specifically, we saw how to use Bayes’ theorem to try to detect and exploit patterns, and then saw how Fourier transforms can give us a concrete measurement of the randomness (and non-randomness) in our opponent’s play. Today’s post is about how we can use representation theory to improve our chances of finding interesting patterns.

Niels Henrik Abel, for whom 'Abelian groups' are named. — Niels Henrik Abel, for whom ‘Abelian groups’ are named. These are groups where $xy=yx$ for any *x, y.*

Niels Henrik Abel, for whom ‘Abelian groups’ are named. These are groups where $xy=yx$ for any *x, y.*

Continue reading →

Roshambo Part II – Fourier Analysis

January 22, 2014March 25, 2014sdenton41 Comment

In the last post, we looked at using an algorithm suggested by Bayes’ Theorem to learn patterns in an opponent’s play and exploit them. The game we’re playing is iterated rock-paper-scissors, with 1000 rounds of play per game. The opponent’s moves are a string of choices, ‘r’, ‘p’, or ‘s’, and if we can predict what they will play, we’ll be able to beat them. In trying to discover patterns automatically we’ll gain some general knowledge about detecting patterns in streams of characters, which has interesting applications ranging from biology (imagine ‘GATC’ instead of ‘rps’) to cryptography.

Fourier analysis is helpful in a wide variety of domains, ranging from music to image encoding. A great example suggested by ‘Building Machine Learning Algorithms with Python‘ is classifying pieces of music by genre. If we’re given a wave-form of a piece of music, automatically detecting its genre is difficult. But applying the Fourier transform breaks the music up into its component frequencies, which turn out to be quite useful in determining whether a song is (say) classical or metal.

This goat recognizes and apparently enjoys metal. Possibly using a furrier transform... — This goat recognizes and apparently enjoys metal. Possibly using a furrier transform… (sorry.)

Continue reading →