A.I. Learns to Drive From Scratch in Trackmania - YouTube

Channel: unknown

[0]
Each of these cars is controlled by an Artificial Intelligence (AI) in the racing game Trackmania.
[5]
This AI is not very intelligent yet. But that's normal: it has just started to learn. In fact,
[12]
I want to use a method called Reinforcement Learning to make this AI learn by itself
[17]
how to drive as fast as possible. I also want it to become intelligent enough to
[21]
master various combinations of turns without ever falling off the road. And to ensure this,
[27]
the AI will have to pass a final challenge: to complete this giant track. But first of all,
[34]
how is a simple computer program supposed to learn things? This isn't the first time I've experimented
[42]
with AI in Trackmania. And to achieve this, I'm using a method called Machine Learning.
[49]
First, I'm running a program that controls the car in-game to make it turn and accelerate. So the AI
[56]
can choose between 6 different actions. But how can it decide which action to take? The AI needs
[62]
to get information about the game. It receives that in the form of numbers called inputs.
[68]
Some inputs describe the state of the car, such as its current speed and acceleration.
[74]
Others indicate how the car is positioned on the road section it's currently crossing.
[80]
And the last inputs indicate what's further ahead. This is now what the AI sees when playing. But how
[87]
can it interpret that? It needs to use this data in an intelligent way. To link inputs
[94]
to the desired action, the AI is going to use a neural network, which basically acts like a brain.
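For illustration, here is a minimal sketch in PyTorch of what such a network could look like. The layer sizes and the number of inputs are made-up values for the example, not the ones used in the video:

```python
import torch
import torch.nn as nn

N_INPUTS = 20   # hypothetical number of input values from the game
N_ACTIONS = 6   # the six actions mentioned above

# Small fully connected network: maps the game inputs (speed,
# acceleration, position on the road, what lies ahead) to one
# score per possible action.
policy_net = nn.Sequential(
    nn.Linear(N_INPUTS, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),  # one output value per action
)

inputs = torch.randn(N_INPUTS)        # stand-in for real game inputs
action = policy_net(inputs).argmax()  # pick the highest-scoring action
```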
[101]
Now, all that remains is to parameterize the neural network so that it results in fast
[106]
driving. And that's where Machine Learning comes into play. As I said earlier, the objective here
[112]
is that the AI learns to drive by itself. So it will have to experiment with different strategies,
[119]
through trial and error, to progressively select the neural network that leads to the best driving.
[125]
One way to do this would be to use a genetic algorithm.
[129]
I've already tried that in Trackmania and it works fairly well. Basically, the idea is to start with
[135]
a population of several AIs, each with its own neural network. All AIs compete on the same map,
[142]
and the best ones are selected and recombined through a process similar to natural selection.
[148]
This can be repeated for many generations, to get a better and better neural network.
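As a rough sketch of that idea (not the actual code behind those earlier experiments), a generational loop could look like this, where each individual is a flat list of neural-network weights and `evaluate` is a placeholder for racing one AI on the map and returning its score:

```python
import random

def crossover(a, b):
    # Each weight comes from one of the two parents, chosen at random.
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(w, rate=0.1, scale=0.1):
    # Randomly nudge a fraction of the weights.
    return [x + random.gauss(0, scale) if random.random() < rate else x
            for x in w]

def evolve(population, evaluate, n_generations=100):
    """Generational loop: score every AI on the same map, keep the
    best half, and refill the population with mutated offspring."""
    for _ in range(n_generations):
        population.sort(key=evaluate, reverse=True)
        parents = population[: len(population) // 2]
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return max(population, key=evaluate)
```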
[154]
One problem with this method is that you only compare the different AIs based on their end
[159]
result. To make an AI progress, it might be better to give it feedback on what it did
[165]
well or not so well during the race. So it's time to try something else: Reinforcement Learning.
[172]
And this goes with a crucial idea: the concept of reward.
[179]
This time, the AI has only one goal in mind: to get as many rewards as possible.
[185]
The idea of reinforcement learning is to learn to pick the action that brings the most reward,
[190]
in any situation. In fact, this is quite like a pet being trained, which will interpret pleasure
[196]
or food intake as positive reinforcement. But in Trackmania, there is no food. So how can we define
[202]
rewards? The AI can take 10 actions per second. Each action will be associated with a reward equal
[210]
to the distance traveled up to the next action. So the faster the AI goes, the more rewards it gets.
[217]
If the AI ever tries to go the wrong way, it will receive a punishment,
[221]
which is actually just a negative reward. And if the AI falls off the road, it will be directly
[227]
punished with a zero reward, but also indirectly by the race stopping, which means no more rewards.
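Put as code, the reward for one step could look like this sketch. The magnitude of the wrong-way punishment is my assumption; the video only gives the signs:

```python
def step_reward(distance_gained, wrong_way, fell_off):
    """Reward for one action step (the AI acts 10 times per second)."""
    if fell_off:
        return 0.0  # zero reward, and the run stops: no future rewards
    if wrong_way:
        return -abs(distance_gained)  # negative reward as punishment
                                      # (illustrative value)
    return distance_gained  # distance covered until the next action
```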
[235]
Now, it's time to start training. To learn which inputs and actions lead to which reward,
[240]
the AI must first gather information about the game. This is the exploration phase. The
[246]
AI simply takes random actions and doesn't use its neural network for the moment. The runs are driven
[253]
one by one. And after a thousand of them, here is what the AI has explored of the map so far.
[260]
Each line corresponds to one race trajectory. The AI has already collected plenty of data about the
[267]
rewards it can expect to get for various sets of inputs and actions. Now, it's time to use this
[273]
data to train its neural network. This is the role of the reinforcement learning algorithm. There
[279]
are many different variants of this method, and here I chose to use one called Deep Q-Learning.
[287]
Basically, for a given set of inputs, the role of the neural network is to predict the expected
[292]
reward for each possible action. But which reward are we talking about? Is it an immediate one? In
[299]
Trackmania, although some actions may result in an immediate positive reward, they may have
[305]
negative consequences in the long run. Sometimes, it may be useful to sacrifice short-term income,
[312]
for example by slowing down when approaching a turn, in order to gain more long-term reward.
[318]
The AI therefore needs to consider the long-term consequences of each action. To achieve this,
[324]
the AI tries to imagine the cumulative reward that it's most likely to obtain in the future.
[331]
Although the long term is important, an action still has more impact in the short term. Thus,
[336]
the events in the immediate future are weighted more. So each time the AI gets inputs, its neural network
[342]
tries to predict the expected cumulative reward for each possible action, and the AI just
[348]
selects the one with the highest value.
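Put as code, that weighting and selection step might look like this minimal sketch. The discount factor GAMMA and its value are my assumption; the video only says that nearer rewards weigh more:

```python
import numpy as np

GAMMA = 0.99  # discount factor; an assumed value, not given in the video

def discounted_return(rewards):
    """Cumulative future reward where near-term rewards weigh more:
    rewards[0] counts fully, rewards[1] is scaled by GAMMA, and so on."""
    return sum(r * GAMMA**k for k, r in enumerate(rewards))

def choose_action(q_values):
    """Greedy choice: the network outputs one expected cumulative
    reward (Q-value) per action; take the action with the highest."""
    return int(np.argmax(q_values))
```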
[355]
Let's resume training where we left off. In parallel to driving, the AI is continuously trying to improve its neural network with the data it collects.
[361]
But by only doing random exploration, the AI ends up not having much new to learn. Instead
[367]
of just exploring, it's time for the AI to also start exploiting the knowledge it has acquired,
[373]
meaning using its neural network instead of just acting randomly. The AI is still a bit
[379]
immature, though, to rely only on its neural network. If it does too much exploitation, it
[384]
will just experience the same things over and over again, which will not teach it much. For now, I'm
[391]
setting the proportion of exploration at 90%, and I'll decrease it progressively during training.
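A sketch of this exploration/exploitation trade-off, assuming an epsilon-greedy scheme; the decay schedule below is illustrative, since the video only gives the starting proportion and says it decreases progressively (down to 5% later on):

```python
import random

epsilon = 0.9  # start with 90% exploration, as stated above

def pick_action(q_values):
    """Epsilon-greedy: random action with probability epsilon
    (exploration), otherwise trust the network (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay_epsilon(eps, rate=0.9999, floor=0.05):
    # Illustrative schedule; the floor matches the 5% mentioned later.
    return max(eps * rate, floor)
```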
[426]
After more than 20 000 attempts on this map, here is the best run the AI has done so far. The AI
[432]
drives quite carefully, and it's not too bad for a start! It has definitely learned something.
[440]
Going further into the map, things seem a bit more complicated,
[443]
and the AI ends up falling. Time to get back to training!
[450]
At this point, you might think that the AI hasn't learned much,
[454]
after training on the same map for so many hours. But I think it's quite
[458]
normal. Reinforcement learning is known to require a large number of iterations to work.
[464]
The time displayed here is in-game time. Fortunately, training is faster in practice,
[470]
since I can increase the game speed using a tool called TMInterface. This project would probably
[476]
not have been possible without this tool, so a big thanks to Donadigo, its developer.
[483]
The AI has made some nice progress. The driving style it learned in the first turns
[488]
seems to apply well to the following ones, which shows a good capacity
[492]
for generalization. The AI has now reached 5% exploration, which I will not decrease further.
[515]
It seems that the AI is stuck and can no longer progress. Here is its current personal best.
[524]
In the first part of the map, the AI shows very little hesitation.
[528]
This first portion has a lot of turns and short straights. But then the AI arrives
[533]
in a new section with mainly long straight lines. Its driving becomes a little sketchy.
[541]
At one point, it even stops, as if it's afraid to continue. After a long minute, it finally decides
[550]
to continue, and dies. The AI seems to have difficulty adapting to this new type of road.
[557]
Or maybe it just needs more time. To be sure, I decided to push the training a little longer.
[574]
After 10 000 more attempts, the AI hasn't made much progress. It still has a lot of trouble with
[580]
long straight lines. There may be several reasons for this, but I think the main one is overfitting,
[586]
which is common in machine learning. In the exploration phase, the AI practiced the same first
[591]
few turns over and over again. Its neural network became a specialist in these trajectories,
[598]
learning them almost by heart, as if nothing else existed. But when the AI faces a new situation,
[604]
the driving style it learned in the past is no longer appropriate: it needs to adapt. In a way,
[610]
adapting means questioning everything it has learned in the past. If the AI tries to
[615]
drastically change its strategy to adapt to these new roads, it risks breaking everything that was
[621]
working for the first few turns. When there is overfitting, there is no generalization.
[627]
So what's the solution? Maybe the AI could drive each run on a different map, to constantly learn
[634]
new things. But at this point, I really don't want to spend hours building dozens of different maps.
[640]
So, I'm gonna do things differently. I'm going to restart training from the beginning. But now,
[647]
each time the AI starts a new run, it will spawn at a random location on
[652]
the map, with a random speed and a random orientation. This should limit overfitting,
[657]
since the AI will be forced to consider many different situations from the beginning.
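A sketch of what such a randomized reset could look like; the interface to the game and the value ranges are assumptions, since the video doesn't detail how the spawning is implemented:

```python
import random

def random_start(track_sections):
    """Pick a random starting state for the next run so the AI sees
    many different situations from the start of training.
    All ranges below are illustrative guesses."""
    section = random.choice(track_sections)  # random map location
    speed = random.uniform(0.0, 400.0)       # random speed (km/h)
    heading = random.uniform(-30.0, 30.0)    # random orientation (degrees)
    return section, speed, heading
```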
[678]
This time, the AI is learning way faster. However, perhaps the AI managed to cover long distances
[685]
just because it spawned in easy sections of the map. The real challenge is still to complete
[690]
the track from start to finish. From now on, I will regularly test the AI outside of training,
[697]
on a normal race. Outside of training, I remove any exploration to optimize
[702]
the AI's performance. I also increase the action frequency from 10 to 30 per second.
[720]
The AI is able to drive in all sections of the map, so there is clearly less
[724]
overfitting this time! Now, the AI only has to combine everything in one run.
[735]
In this attempt, the AI manages to surpass its previous record, going further than ever. But
[741]
it fails within 500 meters of the finish. It has never been so close to finishing this map.
[748]
And finally, a few attempts later, and after 53 hours of training, the AI gets this run.
[772]
The AI was able to complete 230 turns without ever falling. Sounds good, but
[779]
is the AI fast? Now, it's my turn to drive, to compare.
[788]
After a few attempts, I made a run of 4 minutes and 44 seconds.
[792]
Without using the brake, of course, for a fair comparison. So yeah, the AI is not very fast. But
[800]
training is not over! Now, the AI has one goal: to finish this map as fast as possible.
[823]
6 minutes and 28 seconds. After this run, I continued training, and the AI kept getting
[830]
slightly faster on average, more consistent too, but it never managed to beat its personal best.
[837]
With this version of its neural network, the AI drives quite aggressively, and takes most
[841]
turns very sharply. It's quite surprising that it survived the whole race with such a driving
[847]
style. But it's the best the AI has found. Perhaps there is still a way to improve the AI's record
[854]
one last time, still with the same neural network. If I randomly force some of the AI's actions at the
[861]
beginning, the AI will have to adapt to this small perturbation. And this is the start
[867]
of a completely different run. Now, I can repeat this a few hundred times to see what happens.
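A sketch of this perturbation trick, with a hypothetical `env`/`policy` interface standing in for the game and the trained network; the number of forced steps is an assumption:

```python
import random

def perturbed_run(env, policy, n_forced=10, n_actions=6):
    """Force a few random actions at the start of the run, then hand
    control back to the trained network. Each perturbation seeds a
    different trajectory; repeating this hundreds of times may stumble
    on a run faster than the current record."""
    obs = env.reset()
    done, total_reward, step = False, 0.0, 0
    while not done:
        if step < n_forced:
            action = random.randrange(n_actions)  # forced perturbation
        else:
            action = policy(obs)                  # normal greedy choice
        obs, reward, done = env.step(action)
        total_reward += reward
        step += 1
    return total_reward
```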
[898]
And here is the final improvement of the AI's record. Not a big improvement,
[903]
but it was visually worth it! There is still a big gap with human performance, but I'm still
[909]
very happy with the result. Trackmania is a game that requires a lot of practice, even for humans,
[915]
and from my experience, I'm pretty sure this AI could beat a good number of beginners.
[921]
If there's anything this AI is doing well, it's generalization. It can adapt to any new map with a
[927]
similar road structure. I even tried to change the road surface to see if it could drive on grass,
[934]
and the AI is doing quite well! Same thing on dirt, even though the AI has never experienced these
[941]
surfaces during training. But can it still survive on a new map, with a mix of road,
[947]
dirt, and grass surfaces, and a few slopes and obstacles?
[971]
So yeah, of course there is room to improve this AI. But with reinforcement learning,
[976]
it seems that the main limitation is always the same: training time, even with a tool to increase
[983]
game speed. That's why I never venture into more complex maps, and that's why I try to
[989]
limit any complexity in general: few inputs, no brakes, not too many actions per second,
[995]
and so on. Anyway, for now, the AI deserves a rest after those long hours of training.
[1002]
And maybe it will be back one day, with new surprises!