The ML Box

Sat Jul 23 2016

5 min read

Cumilla, Bangladesh

Machine Learning is working like magic all around us. Suddenly, facebook is recognizing our faces, youtube is finding for us what feel's like we always wanted to watch but could not find, gmail is filtering all the spams for us. ML is at play in all these cases. Machine Learning has been the buzz word for some time now. So, let's take a look at what's inside this ML magic box.

Arthur Samuel said in 1959,

Machine Learing is the field of study that gives the computers the ability to learn without being explicitly programmed.

1959? Well, that does not seem like something new then, huh? But this field of AI picked up steam only recently due to the deluge of data and exponential imporvement in computer storage and hardware.

Tom Michel defined it in a more geeky way,

A computer program is said to learn from experience E with respect to some class of tasks T and a measure of performance P, if its performance at tasks in T, as measured by P improves with experience E.

Whoa! That went way over the head. Well, at least for the first few times for me.

Mowed down to ground, it is the ability to do things better with experience. We're all familiar with it. Happens to all of us. Just think of it in the case of computers, voila! You just understood what Machine Learning is.

I'll describe what voodoo goes in this ML box with this diagram (courtesy: Prof.Carlos and Emily from UW) and touch the small boxes with in. This should give you a feel about what goes on in ML world.

ML diagram

So, now you know what goes inside the box. Let's go for a ride.

Training Data

The entry point to the box is Training Data. You need data to teach the computer just as you need books to teach us humans. The data that is used to teach computers about a specific task is called Training Data.

Feature extraction

Now the data will not always be ready to go into the next box, that is the ML model box. So, we need to groom the data and make it a delicious dish for the next box. Data cleaning, transforming, analysis etc. goes in this Feature Extraction box. So, all the shaggy data goes into this box and comes out all neat and tidy(well, almost) ready to attend the prom, the ML model.

ML model

This is where you start flexing your ML muscles. The data is ready to give you all it's secrets. Now you just need the right model to take it all in. Think of yourself as the data(or features) and the ML Model box as a new hat you're trying. So, you wear the hat and come out as that y with the hat on top that is to the left of the ML Model box. Now, you need to know how you look. That's what takes us to the next box.

Quality metric

Think of this box like a mirror for you. U wear the hat and see how you look. On the ML universe, the ML Model box gave some output about the data in the form of that y with the hat. In the Quality metric box, we try to compare the real outputs from the training data (the y coming out beneath the Training data pot). So, now Brad Pitt looked real cool with the hat but it just is not suiting you. So, you try moving it a bit, change of colors, etc. That is what the next box does.

ML algorithm

In this box, you try to fix the looks to make you look more like Brad with that hat. In the ML universe, in this ML algorithm box we try to make the ML model better with the feedback from the Quality metric box so that it could fit better to the data. And this Model to Metric to Algorithm to Model loop keeps on going for sometime while you are wearing the hat a bit differently (Model), looking into the mirror (Quality metric) then changing some things again (ML algorithm). At some point, you are happy with your look, it's close enough to Brad Pitt now. So, you go out happy from this store and tell your friends to try exactly the same way you tried if they wanted to look like Brad as well. On the other universe, we stop looping over that Model to Metric to Algorithm to Model merry-go-round when the Quality metric is good enough for us and then we try to get output with this final model for alien data that was not in the Training pot.

To touch on a fast and short example, think about spam filtering. We feed the Model a lot of spam and non-spam Training data. So, eventually it grows an eye to see what spam and non-spam emails look like. So, then the model is deployed in front of your mailbox and it identifies spam mails as soon as it sees them and catches all the spam (well, almost all) before they can sneak in your inbox.

So, this sums up our ride through ML valley. Hopefully you have started to see some light inside that ML voodoo box. I'll refer to this box while explaining different ML models in future hopefully. Now go buy a hat!