Machine Learning Resources for Mathematicians

What it feels like to wade into a new field.

I met up with some mathematician friends in Toronto yesterday who were interested in how one goes about getting started in machine learning and data science and such.  There are piles of great resources out there, of course, but it’s probably worthwhile to write a bit about how I got started, and to point to some resources that might be of particular interest to people coming from a similar background.  So here goes.
First off, it’s important to understand that machine learning is a gigantic field, with contributions coming from computer science, statistics, and occasionally even mathematics…  But on the bright side, most of the algorithms really aren’t that complicated, and indeed they can’t be if they’re going to run at scale.  Overall though, you’ll need to learn some coding, algorithms, and theory.

Oh, and you need to do side-projects.  Get your hands dirty with a problem quickly, because it’s the fastest way to actually learn.

Theory

On the theory front, I learned quite a lot by reading Christopher Bishop’s book Pattern Recognition and Machine Learning.  It’s not an easy book if you don’t know anything, though: Bishop’s description of how K-means works (as a specialization of EM) isn’t terribly useful unless you’ve already got an idea of how K-means works, for example.  So I found it very useful to watch the lectures from Andrew Ng’s Machine Learning course on Coursera, which is pitched at undergraduates, and then go read a relevant chapter from Bishop to get a more in-depth understanding and really think about the mathematics of a method.  I find this to be a generally good approach: Find the ‘easy’ resources pitched at undergrads, burn through them quickly, and then dive directly into work aimed at grad students and researchers.  The authors of the easy resources put a lot of effort into distilling the fundamental ideas, and it’s immensely helpful to have that starting point before wading into the deep end.
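To make that concrete, here’s a minimal sketch of K-means (Lloyd’s algorithm) on some made-up blob data, assuming only numpy: the hard assign/update alternation in the loop is exactly the structure that EM generalizes.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm.  The assign/update alternation below is
    the special case of EM that Bishop describes (hard assignments,
    spherical clusters).  Empty clusters aren't handled -- it's a sketch."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iters):
        # "E step": assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # "M step": move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, labels

# Two made-up blobs, just to have something to cluster.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
print(kmeans(X, k=2)[0])
```

Replace the hard labels with soft responsibilities and you get EM for a Gaussian mixture, which is the direction Bishop takes it.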

Another book I found quite useful is Flach’s Machine Learning: The Art and Science of Algorithms that Make Sense of Data.  It’s a quick read, and deals in depth with tree-based methods, which aren’t quite as popular lately but are fantastically useful.  Decision trees train quickly, require no additional normalization step, and are very interpretable.  Random forest, a technique which uses many decision trees together, often provides an as-good-as-anything machine learning solution, and gets one thinking about how to make proper use of ensembles.
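As a taste of how little friction this involves, here’s a sketch using scikit-learn’s RandomForestClassifier on one of its bundled datasets; note that there’s no scaling or normalization step anywhere.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Trees split on raw feature thresholds, so no normalization is needed.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))

# Interpretability: the forest scores each feature's importance.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: -pair[1])
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")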

If you’re coming from mathematics, you probably already know linear algebra in some detail.  You will learn it in greater detail.  Knowing when it’s useful to compute an eigenvector (answer: always), and being able to explain the eigenvectors that you find, will make you immensely useful.
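For instance, principal component analysis is nothing but the eigendecomposition of a covariance matrix; here’s a small numpy sketch on synthetic data.  The job-relevant skill is the interpretation in the last two lines, not the computation.

```python
import numpy as np

# Correlated 2-D data: a cloud stretched mostly along one direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])

# Principal components are the eigenvectors of the covariance matrix;
# eigh is the right call, since covariance matrices are symmetric.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so the last column
# is the direction of greatest variance.
print("principal direction:", eigvecs[:, -1])
print("fraction of variance explained:", eigvals[-1] / eigvals.sum())
```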

Coding

The way to get good at coding is to get a lot of practice.

For me, this is easy, because I view code as a way to solve problems and as a way to understand structures.  When you start viewing code as a general problem-solving technique, you automatically get a lot more practice: Oh, I need to send grades to all my students? Let’s write a little Python script to do it. Bam, practice.  Oh, I need to count an obscure combinatorial object?  I can write some code to generate the objects, and probably learn a lot about natural ways to arrange the objects in the process.  Bam, lots of practice…
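As a toy version of that second kind of practice, here’s a brute-force count of derangements (permutations with no fixed point), written so the objects themselves get generated along the way:

```python
from itertools import permutations

def count_derangements(n):
    """Count permutations of range(n) with no fixed point by
    brute-force generation -- fine for small n."""
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )

# Staring at the generated objects (or at the counts 0, 1, 2, 9, 44, ...)
# is often how you spot the structure -- here, the recurrence
# D(n) = (n - 1) * (D(n - 1) + D(n - 2)).
for n in range(1, 8):
    print(n, count_derangements(n))
```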

If you haven’t written any code at all, Codecademy is a great place to start learning Python.  It is stupidly fantastic.

To get a lot of mathy practice, you can work problems at Project Euler, which collects interesting math problems that require a computer program to solve.  The first few just ensure that you know your programming language, but they get much more interesting quickly.
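The first problem, for instance, asks for the sum of all the multiples of 3 or 5 below 1000.  A one-line loop solves it, and a bit of inclusion-exclusion solves it without the loop:

```python
# Direct: the one-liner mostly tests that you know the language.
print(sum(n for n in range(1000) if n % 3 == 0 or n % 5 == 0))

# Closed form: multiples of 3, plus multiples of 5, minus multiples
# of 15 (counted twice), each summed with the triangular-number formula.
def multiple_sum(d, limit):
    """Sum of the multiples of d strictly below limit."""
    k = (limit - 1) // d
    return d * k * (k + 1) // 2

print(multiple_sum(3, 1000) + multiple_sum(5, 1000) - multiple_sum(15, 1000))
```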

Algorithms

At some point, I knew that I was probably going to be interviewing with Google, and started going through preparation for the interview process.  The interviews are probably easier for a mathematician than they are for most people: thinking on your feet while writing precise statements on a whiteboard is pretty much exactly what math grad school prepares you for.  It’s kind of like a qualifying exam, but with easier questions.

You do need to study, though.  I took Steve Yegge’s advice and worked through Skiena’s Algorithm Design Manual, which is a fantastic book that falls solidly in the ‘quick and enlightening’ end of the spectrum.  Work through all of the problems and you will probably kick the ass of any interview question that comes your way.  (Unless it’s deep-in-the-weeds language-knowledge stuff, which, ugh. That tends not to show up in Google interviews anyway.)  You can also read Cormen if you feel like you have lots of time…

You’ll also get a better sense of what algorithmic efficiency really means: in the real world of giant data sets, everything needs to run fast.  My general experience is that any interesting algorithm starts its life running in cubic time.  After a couple of years, it gets pared down to an exact version that runs in quadratic time.  And then somebody comes up with a burnt-out husk of the algorithm that runs in almost-linear time, with a shady statistical guarantee that the answers aren’t complete garbage.  Polynomial is cool and all, but at big enough scale, we need algorithms that run in almost-linear time.
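Here’s a toy illustration of the gap that matters at scale: counting pairs of equal elements, first by brute force over all pairs, then with a hash map.  (Timings will obviously vary by machine; the point is the shape of the curve as the data grows.)

```python
from collections import Counter
from itertools import combinations
import random
import time

data = [random.randrange(1000) for _ in range(5_000)]

# Quadratic: check every pair directly -- about 12.5 million comparisons here.
start = time.perf_counter()
quadratic = sum(a == b for a, b in combinations(data, 2))
print(f"O(n^2): {quadratic} pairs in {time.perf_counter() - start:.2f}s")

# Linear: count occurrences once, then take k-choose-2 for each value.
start = time.perf_counter()
counts = Counter(data)
linear = sum(k * (k - 1) // 2 for k in counts.values())
print(f"O(n):   {linear} pairs in {time.perf_counter() - start:.4f}s")
```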

Further Resources Everyone Should Know About

  • Kaggle is a website for data science competitions.  It’s also a great source for side-projects if you don’t know what else to do.  Go download a dataset and start hacking.
  • Scikit-Learn is the primary machine learning library in Python.  Its ‘fit’ and ‘predict’ framework makes it easy to try lots of different algorithms without much friction.  You can build something that basically works with almost no effort (see the sketch after this list).
  • Matplotlib is the Python library for basic plotting.  It’s a bit more painful to use than it really should be, but it’s still a great library; Seaborn and some similar projects try to take some of the rough edges off.
  • Hacker News is where bored programmers hang out while their code compiles.  There are also lots of stories about what’s going on in machine learning and algorithms research.  You can also see other people’s random side projects, and aspire to get yours to the top!
  • Partially Derivative is a podcast about data science, and is sort of like having the relevant parts of Hacker News in audio form.  They also maintain their own list of resources.
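To round things off, here’s the sketch promised in the Scikit-Learn bullet above: the fit/predict pattern plus a basic matplotlib scatter, on scikit-learn’s bundled iris dataset.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The fit/predict pattern: swap LogisticRegression for any other
# estimator (SVC, RandomForestClassifier, ...) and nothing else changes.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# A basic matplotlib scatter of the first two features, colored by class.
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")
plt.title("Iris, first two features")
plt.show()
```

That uniform interface is what makes trying five algorithms on a Kaggle dataset an afternoon’s work rather than a week’s.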