Demystifying Machine Learning through a recommendation system

Too much has been written about Artificial Intelligence being a black box whose decision-making even computer scientists don’t understand. This can be true once a system has been scaled up, but the fundamentals are still based on binary and on computers selecting the right choice.

A really good place to start is with a machine learning recommendation system, because it goes right back to the roots of binary: either somebody clicked on the recommendation or they did not. (It is useful if students first understand the idea that everything in a computer is binary.) This subset of machine learning is used on sites like Netflix, YouTube and Amazon. Every time you see a recommended video on these sites, there is a recommendation system trying to guess the best video choice for you.

Students need to understand that everything in a computer is binary, so to start with we ask for a simple thumbs up or thumbs down for each of the following snacks:

Bakewell tart
Crisps (Chips in the US)
Jaffa Cakes
Pork pie
Sausage rolls

The snacks are then ordered by the number of thumbs up. You could choose to teach a sorting lesson or use Python's built-in sort, for example items.sort(reverse=True).

We now have our initial weightings, and based on these we will order our list of snacks. If there is a tie in our system, the tied snacks are sorted in alphabetical order. A real recommendation system might choose randomly, or even give each option a certain amount of time at the top to see which works better.
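The ordering described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original lesson code: the votes and the rank_snacks function are made up for the example.

```python
# Hypothetical class votes: True is a thumbs up, False is a thumbs down.
votes = {
    "Bakewell tart": [True, False, True],
    "Crisps": [True, True, True],
    "Jaffa Cakes": [True, True, False],
    "Pork pie": [False, False, True],
    "Sausage rolls": [True, False, True],
}


def rank_snacks(votes):
    """Return (snack, thumbs_up) pairs, most thumbs up first, ties alphabetical."""
    # sum() counts each True as 1 and each False as 0.
    totals = {snack: sum(thumbs) for snack, thumbs in votes.items()}
    # Negating the count puts the biggest first; the name breaks ties from A to Z.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_snacks(votes)
for position, (snack, thumbs_up) in enumerate(ranking, start=1):
    print(position, snack, thumbs_up)
```

Note that a plain items.sort(reverse=True) on (count, name) pairs would also reverse the alphabetical order of tied snacks, which is why this sketch negates the count instead.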

Good recommendations

The reason many recommendation systems are seen as black boxes is the number of factors that go into them. So far, we have used just one factor. The next step is to see who chooses what and then dynamically change the ranking.

This is a bit like Football (Soccer), where we might choose a player based on the number of goals that they score. This might be an effective measure for a striker, but even for a striker it does not take into account the passes that enabled a goal to be scored or the number of times that they lost possession. It becomes even more complex for midfielders, and from this apparently simple game you can see the large number of complexities that would need to be programmed to make effective recommendations and predictions. There is now a platform for scouting new players called https://www.ai.io/ which uses millions of data points to select potential players for professional clubs, including athletic, technical and cognitive abilities, as well as other relevant information such as a player’s age, height, weight and playing history.

Many people have noticed that the longer they use YouTube while signed in, the better their recommendations become. This is because, whenever you are signed in, it continues to collect data on what you clicked, how long you watched for and what others who watched similar videos to you also watched. Netflix recently shared information on its key metrics and these include: “starters” (i.e. households that watch two minutes of a film or one episode) and “completers” (i.e. households that watch 90% of a film or season of a series) for the first seven and 28 days on Netflix, and “watchers” (i.e. households that watch 70% of a film or single episode of a series).
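The idea of "what others who watched similar videos to you also watched" can be sketched as follows. This is a toy illustration and not how YouTube or Netflix actually work: the user names, video titles and the recommend function are all invented for the example.

```python
from collections import Counter

# Hypothetical viewing histories: the set of videos each signed-in user watched.
histories = {
    "ana": {"python basics", "sorting algorithms", "binary numbers"},
    "ben": {"python basics", "sorting algorithms", "spreadsheets"},
    "cal": {"python basics", "binary numbers"},
    "dee": {"football skills", "spreadsheets"},
}


def recommend(user, histories, top_n=2):
    """Suggest unseen videos, favouring users whose history overlaps most."""
    seen = histories[user]
    scores = Counter()
    for other, watched in histories.items():
        if other == user:
            continue
        overlap = len(seen & watched)  # videos both users watched
        for video in watched - seen:   # videos only the other user watched
            scores[video] += overlap
    # Highest score first, ties alphabetical; drop videos with no overlap behind them.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [video for video, score in ranked if score > 0][:top_n]


suggestions = recommend("cal", histories)
print(suggestions)
```

The more history a user has, the more overlaps the system can find, which is one reason recommendations tend to improve the longer you stay signed in.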

If we look at our snacks, there are many potential factors that could affect the popularity of a snack such as:

Weather – Ice cream is particularly popular on hot days.
Popularity – Overall popularity of the snack
Frequency Ordered – Sometimes people get bored of the same snacks
Cost – The cheapest snack might be the one people choose more frequently.

Once you have decided on the factors, you then need to give each one a weighting, and from there you will get a ranking.
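For classes that are ready to move beyond the spreadsheet, the same weighting idea can be sketched in Python. All of the numbers here are assumptions for illustration: each factor is scored from 0 to 10 (higher is better for the snack), "variety" stands in for how long it has been since the snack was last ordered, and the weights are simply ones we picked.

```python
# Hypothetical factor scores from 0 to 10; higher always favours the snack.
snacks = {
    "Bakewell tart": {"weather": 5, "popularity": 6, "variety": 8, "cost": 4},
    "Crisps": {"weather": 6, "popularity": 9, "variety": 2, "cost": 9},
    "Jaffa Cakes": {"weather": 5, "popularity": 8, "variety": 5, "cost": 7},
    "Pork pie": {"weather": 4, "popularity": 3, "variety": 9, "cost": 3},
    "Sausage rolls": {"weather": 3, "popularity": 7, "variety": 6, "cost": 5},
}

# Hypothetical weightings: how much each factor matters to the final score.
weights = {"weather": 1, "popularity": 3, "variety": 1, "cost": 2}


def weighted_score(factors, weights):
    """Multiply each factor by its weighting and add the results together."""
    return sum(weights[name] * value for name, value in factors.items())


def rank(snacks, weights):
    """Return snack names, best weighted score first, ties alphabetical."""
    return sorted(snacks, key=lambda s: (-weighted_score(snacks[s], weights), s))


print(rank(snacks, weights))

# Changing a single weighting can reshuffle the whole ranking.
variety_heavy = dict(weights, variety=5)
print(rank(snacks, variety_heavy))
```

With the first set of weights Crisps come out on top; once variety counts five times as much, Bakewell tart takes over. Students can experiment with the weights in exactly the same way they would with the spreadsheet.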

It is useful to use a spreadsheet for this, because 2D arrays and lists can be challenging for younger students. The chance to modify a spreadsheet is much more appealing, especially when they have the power to change a chart to something they feel is more realistic.

Potential Dangers

Currently the ranking multipliers are fixed, but in many machine learning systems the program adjusts them itself as data comes in and it sees how much effect each one has on the outcome. This can be a big problem. Even in our simple system of snacks there are potential dangers, and these are important points to discuss with students. The snacks seen so far are not particularly healthy, and if machine learning continues to run purely on popularity it might not suggest any healthy snack choices. In this case the machine learning does not have to be fully autonomous, and key rules can be programmed in to ensure it highlights healthy snacks.
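One way such a rule might look is sketched below. The popularity counts are invented, "Apple slices" and "Carrot sticks" are hypothetical healthy options added for the example, and recommend_with_rule is just one possible rule a human could choose to program in.

```python
# Hypothetical order counts; the two healthy options are invented for this sketch.
popularity = {
    "Crisps": 30,
    "Jaffa Cakes": 25,
    "Sausage rolls": 20,
    "Bakewell tart": 15,
    "Pork pie": 10,
    "Apple slices": 6,
    "Carrot sticks": 4,
}
healthy = {"Apple slices", "Carrot sticks"}


def recommend_with_rule(popularity, healthy, top_n=3):
    """Rank by popularity, but guarantee at least one healthy snack is shown."""
    ranked = sorted(popularity, key=lambda snack: (-popularity[snack], snack))
    top = ranked[:top_n]
    if top and not any(snack in healthy for snack in top):
        # Human-written rule: give the last slot to the most popular healthy snack.
        best_healthy = next((snack for snack in ranked if snack in healthy), None)
        if best_healthy is not None:
            top[-1] = best_healthy
    return top


print(recommend_with_rule(popularity, healthy))
```

On popularity alone the top three would all be unhealthy; the rule replaces the third slot with the best-selling healthy option. This makes a good discussion point: who decides which rules go in, and which snacks count as healthy?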

However, if the people creating AI systems do not think carefully, then there is potential for dire consequences. Back in 2016, a Mercedes executive said that their self-driving cars would protect the driver at the expense of pedestrians. (Source: https://fortune.com/2016/10/15/mercedes-self-driving-car-ethics/) These consequences can be even more dire if these technologies are given lethal powers: it was reported in the news that a colonel suggested that, in a military simulation, a drone might kill its operator in order to win points. (Source: https://www.scmp.com/news/world/united-states-canada/article/3222714/ai-powered-drone-tried-kill-its-human-operator-us-military-simulation)

It’s essential that our students understand that AI is not a mystery: it is a set of computer programs that, just like any other, follow rules that humans program. Only then will the next generation be confident enough to know that they have influence and do not have to accept that it is beyond their control.

(Originally published in Hello World, issue 22)

Author

James Abela
