what is stochastic gradient descent? - YouTube

Channel: Code Byte

[16]
Hello guys, welcome to another video
[18]
on stochastic gradient descent.
[21]
So in the last video, we successfully completed
[22]
gradient descent,
[25]
and we ended up with a problem
[27]
where the gradient gets stuck in a local minimum.
[30]
So how do we get from that local minimum to the
[33]
global minimum?
[35]
So we will see
[37]
a technique that solves that problem.
[40]
This technique is called stochastic gradient descent.
[44]
What does stochastic mean, right?
[46]
I will explain it shortly, and
[49]
this algorithm, very clearly.
[52]
First,
[52]
we will see a few
[54]
things before going into stochastic gradient descent.
[57]
You can think of this as a revised version of gradient descent
[60]
or something like that.
[62]
So,
[63]
so what is our goal?
[65]
We need to
[66]
find the perfect gradients.
[69]
That means the gradients
[71]
at, not a local minimum, but the global minimum.
[74]
So we start from the top of the bowl, and we are
[78]
taking some steps downward.
[81]
Finally,
[81]
we
[82]
end up at the global minimum.
[86]
So this is called the learning rate. Based on the learning rate,
[89]
we will take big steps or small steps.
[95]
When the learning rate is too small,
[97]
our model, that is the gradients, will take very small
[101]
steps
[102]
from top to bottom.
[104]
So this affects the time: it will take a lot of time to
[108]
converge those gradients to, not a local minimum,
[111]
but the global minimum.
[114]
And the other thing is, when our learning rate is too
[117]
large,
[119]
our gradients will jump from this side to that side, and that
[122]
side to this side, like that.
[124]
And another thing: in the last video,
[127]
I didn't change the
[130]
learning rate.
[131]
I kept it at 0.01.
[134]
Here you can go and change that learning rate and play
[136]
with that learning rate also, and you will find these
[140]
things:
[140]
if the learning rate is too high,
[142]
you will get some NaN output, that is, I mean, that means not
[145]
a number.
[146]
Sometimes, when our learning rate is
[149]
very small, our model takes a lot of time to converge. I will
[152]
visualize that thing also.
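To make that concrete, here is a minimal sketch of the plain gradient descent update on a toy one-dimensional loss; the loss function, starting point, and step counts are just assumptions for illustration, not the exact code from the last video:

def gradient(w):
    # Toy loss L(w) = (w - 3)^2, so its gradient is 2 * (w - 3).
    return 2.0 * (w - 3.0)

def gradient_descent(learning_rate, steps=50):
    w = 10.0                                  # arbitrary start at the "top of the bowl"
    for _ in range(steps):
        w = w - learning_rate * gradient(w)   # the basic update rule
    return w

print(gradient_descent(0.01))   # small rate: after 50 steps w is still well above the minimum at 3.0
print(gradient_descent(1.5))    # too-large rate: w overshoots and its magnitude keeps growing (diverges)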
[156]
So the problem with gradient descent is that sometimes it gets stuck
[159]
at a local minimum, so
[161]
it can't get from the local minimum
[163]
to the global minimum.
[165]
So what is the solution?
[170]
Stochastic gradient descent.
[172]
So here there is
[174]
one thing
[174]
I want to refer to first.
[177]
Batch gradient descent is the same,
[180]
nothing but gradient descent. Full batch gradient descent
[182]
means
[183]
taking the whole data and passing it to the model, and
[186]
finally we are getting the weights and the
[188]
gradients.
[190]
That means batch gradient descent uses all the data.
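As a rough sketch of what "all the data" means here (this is my own illustrative linear regression setup, not code from the video), the batch gradient is computed over every training row at once:

import numpy as np

def batch_gradient(w, X, y):
    # Gradient of the mean squared error, computed over ALL rows of X in one shot.
    errors = X @ w - y
    return 2.0 / len(X) * (X.T @ errors)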
[194]
But in the case of stochastic gradient descent,
[197]
we are going to take only one instance, and we will pass that
[202]
instance into our model and we will find the gradient, and
[204]
like that we will
[206]
continue.
[208]
So here is the definition: stochastic gradient descent just
[211]
picks a random instance in the training set at each
[213]
iteration,
[215]
then computes the gradients
[217]
based on only that single instance.
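Following that definition, a from-scratch version could look roughly like this (a minimal sketch with an assumed linear regression loss; the real from-scratch implementation is coming in the next video):

import numpy as np

def stochastic_gradient_descent(X, y, learning_rate=0.01, n_epochs=50):
    m, n = X.shape
    w = np.zeros(n)
    for epoch in range(n_epochs):
        for _ in range(m):
            i = np.random.randint(m)              # pick one random instance
            xi, yi = X[i:i+1], y[i:i+1]
            grad = 2.0 * xi.T @ (xi @ w - yi)     # gradient from that single instance only
            w = w - learning_rate * grad          # same update rule, much noisier direction
    return w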
[220]
So due to this we can work around that problem.
[224]
So when we are stuck at a local minimum,
[227]
by using this technique our gradients will bounce from the
[230]
local minimum to the global minimum.
[233]
And there are some problems with this technique also,
[235]
so I want to talk about those too.
[240]
So the main problem is
[242]
that the gradients will bounce in this technique. That means
[245]
when they are coming to the global minimum, they won't
[248]
stop; they will bounce again from the left side to the right side, or the
[252]
right side to the left side, somewhere around it.
[254]
So
[255]
you will get fairly good gradient values and our algorithm
[259]
stops, but those gradients are not optimal.
[262]
That is something you have to remember.
[265]
So how to solve that problem?
[268]
There is one solution:
[270]
we have to reduce the learning rate at every iteration.
[273]
That is called
[274]
simulated annealing,
[276]
or you can call it learning rate decay or anything you want.
[280]
That is the solution for this problem,
[283]
for this
[284]
up-and-down bouncing.
[286]
You can reduce the
[288]
bouncing, so finally you will get very good optimal
[292]
values by
[294]
using this
[295]
learning rate schedule.
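A learning rate schedule of that kind could be sketched like this (the t0 and t1 constants are just illustrative values I am assuming, similar in spirit to the schedule used in the book mentioned below):

def learning_schedule(t, t0=5.0, t1=50.0):
    # The learning rate shrinks as the iteration counter t grows,
    # so the updates bounce less and less around the global minimum.
    return t0 / (t + t1)

# Inside the SGD loop, instead of a fixed learning rate:
#     eta = learning_schedule(epoch * m + step)
#     w = w - eta * grad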
[297]
So this is the overall idea of stochastic gradient descent.
[302]
And another thing I want to say is that
[304]
these pictures are taken from Aurélien Géron.
[307]
You can refer to his book
[308]
also, Hands-On Machine Learning.
[310]
I actually followed that only, when I was a
[313]
beginner,
[315]
so that book is really very helpful for beginners.
[318]
So you can read that book, Hands-On Machine Learning with Scikit-Learn
[321]
and TensorFlow.
[325]
Okay.
[326]
Okay guys, we have seen about stochastic gradient descent.
[330]
So in the next video, we will see how to implement this
[333]
stochastic gradient descent from scratch and
[336]
how to use the scikit-learn library.
[340]
The scikit-learn library actually provides this stochastic gradient descent
[344]
algorithm built in, so you can use that library also. So
[348]
how to do that,
[348]
I will teach in the next video.
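As a small preview of the scikit-learn side (just a typical usage sketch with made-up toy data; the actual walkthrough is in the next video), SGDRegressor already implements stochastic gradient descent:

import numpy as np
from sklearn.linear_model import SGDRegressor

# Toy data: y = 4 + 3x plus noise (made-up numbers just for this demo).
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X.ravel() + rng.normal(size=100)

sgd = SGDRegressor(max_iter=1000, eta0=0.01)   # eta0 is the initial learning rate
sgd.fit(X, y)
print(sgd.intercept_, sgd.coef_)               # should land somewhere near 4 and 3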
[350]
I will say,
[351]
okay guys, thank you, we will see you in the next video.