what is stochastic gradient descent? - YouTube

Channel: Code Byte

[16]
Hello guys, welcome to another video
[18]
on stochastic gradient descent.
[21]
So in the last video, we successfully completed
[22]
gradient descent,
[25]
and we ended up with a problem
[27]
where the gradient gets stuck in a local minimum.
[30]
So how do we get from that local minimum to the
[33]
global minimum?
[35]
So we will see
[37]
a technique that solves that problem.
[40]
This technique is called stochastic gradient descent.
[44]
What does stochastic mean, right?
[46]
I will explain it shortly, and
[49]
this algorithm, very clearly.
[52]
First,
[52]
we will see a few
[54]
things before going into stochastic gradient descent.
[57]
You can think of this as a revised version of gradient descent
[60]
or something like that.
[62]
So,
[63]
so what is our goal?
[65]
We need to
[66]
find the perfect gradients.
[69]
That means the gradients
[71]
at, not a local minimum, but the global minimum.
[74]
So we start from the top of the bowl, and we are
[78]
taking some steps downward.
[81]
Finally,
[81]
we
[82]
end up at the global minimum.
[86]
So this is called the learning rate. Based on the learning rate,
[89]
we will take big steps or small steps.
[95]
When the learning rate is too small,
[97]
our model, that is the gradients, will take very small
[101]
steps
[102]
from top to bottom.
[104]
So this affects the time: it will take a lot of time to
[108]
converge those gradients to, not a local minimum,
[111]
but the global minimum.
[114]
And the other thing is, when our learning rate is too
[117]
large,
[119]
our gradients will jump from this side to that side, and that
[122]
side to this side, like that.
[124]
And another thing: in the last video,
[127]
I didn't change the
[130]
learning rate.
[131]
I kept it at 0.01.
[134]
Here you can go and change that learning rate and play
[136]
with that learning rate also, and you will find these
[140]
things:
[140]
if the learning rate is too high,
[142]
you will get some NaN output, that is, I mean, that means not
[145]
a number.
[146]
Sometimes, when our learning rate is
[149]
very small, our model takes a lot of time to converge. I will
[152]
visualize that thing also.
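To make that concrete, here is a minimal sketch of the plain gradient descent update on a toy one-dimensional loss; the loss function, starting point, and step counts are just assumptions for illustration, not the exact code from the last video:

def gradient(w):
    # Toy loss L(w) = (w - 3)^2, so its gradient is 2 * (w - 3).
    return 2.0 * (w - 3.0)

def gradient_descent(learning_rate, steps=50):
    w = 10.0                                  # arbitrary start at the "top of the bowl"
    for _ in range(steps):
        w = w - learning_rate * gradient(w)   # the basic update rule
    return w

print(gradient_descent(0.01))   # small rate: after 50 steps w is still well above the minimum at 3.0
print(gradient_descent(1.5))    # too-large rate: w overshoots and its magnitude keeps growing (diverges)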
[156]
So the problem with gradient descent is that sometimes it gets stuck
[159]
at a local minimum, so
[161]
it can't get from the local minimum
[163]
to the global minimum.
[165]
So what is the solution?
[170]
Stochastic gradient descent.
[172]
So here there is
[174]
one thing
[174]
I want to refer to first.
[177]
Batch gradient descent is the same,
[180]
nothing but gradient descent. Full batch gradient descent
[182]
means
[183]
taking the whole data and passing it to the model, and
[186]
finally we are getting the weights and the
[188]
gradients.
[190]
That means batch gradient descent uses all the data.
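As a rough sketch of what "all the data" means here (this is my own illustrative linear regression setup, not code from the video), the batch gradient is computed over every training row at once:

import numpy as np

def batch_gradient(w, X, y):
    # Gradient of the mean squared error, computed over ALL rows of X in one shot.
    errors = X @ w - y
    return 2.0 / len(X) * (X.T @ errors)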
[194]
But in the case of stochastic gradient descent,
[197]
we are going to take only one instance, and we will pass that
[202]
instance into our model and we will find the gradient, and
[204]
like that we will
[206]
continue.
[208]
So here is the definition: stochastic gradient descent just
[211]
picks a random instance in the training set at each
[213]
iteration,
[215]
then computes the gradients
[217]
based on only that single instance.
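Following that definition, a from-scratch version could look roughly like this (a minimal sketch with an assumed linear regression loss; the real from-scratch implementation is coming in the next video):

import numpy as np

def stochastic_gradient_descent(X, y, learning_rate=0.01, n_epochs=50):
    m, n = X.shape
    w = np.zeros(n)
    for epoch in range(n_epochs):
        for _ in range(m):
            i = np.random.randint(m)              # pick one random instance
            xi, yi = X[i:i+1], y[i:i+1]
            grad = 2.0 * xi.T @ (xi @ w - yi)     # gradient from that single instance only
            w = w - learning_rate * grad          # same update rule, much noisier direction
    return w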
[220]
So due to this we can work around that problem.
[224]
So when we are stuck at a local minimum,
[227]
by using this technique our gradients will bounce from the
[230]
local minimum to the global minimum.
[233]
And there are some problems with this technique also,
[235]
so I want to talk about those too.
[240]
So the main problem is
[242]
that the gradients will bounce in this technique. That means
[245]
when they are coming to the global minimum, they won't
[248]
stop; they will bounce again from the left side to the right side, or the
[252]
right side to the left side, somewhere around it.
[254]
So
[255]
you will get fairly good gradient values and our algorithm
[259]
stops, but those gradients are not optimal.
[262]
That is something you have to remember.
[265]
So how to solve that problem?
[268]
There is one solution:
[270]
we have to reduce the learning rate at every iteration.
[273]
That is called
[274]
simulated annealing,
[276]
or you can call it learning rate decay or anything you want.
[280]
That is the solution for this problem,
[283]
for this
[284]
up-and-down bouncing.
[286]
You can reduce the
[288]
bouncing, so finally you will get very good optimal
[292]
values by
[294]
using this
[295]
learning rate schedule.
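A learning rate schedule of that kind could be sketched like this (the t0 and t1 constants are just illustrative values I am assuming, similar in spirit to the schedule used in the book mentioned below):

def learning_schedule(t, t0=5.0, t1=50.0):
    # The learning rate shrinks as the iteration counter t grows,
    # so the updates bounce less and less around the global minimum.
    return t0 / (t + t1)

# Inside the SGD loop, instead of a fixed learning rate:
#     eta = learning_schedule(epoch * m + step)
#     w = w - eta * grad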
[297]
So this is the overall idea of stochastic gradient descent.
[302]
And another thing I want to say is that
[304]
these pictures are taken from Aurélien Géron.
[307]
You can refer to his book
[308]
also, Hands-On Machine Learning.
[310]
I actually followed that only, when I was a
[313]
beginner,
[315]
so that book is really very helpful for beginners.
[318]
So you can read that book, Hands-On Machine Learning with Scikit-Learn
[321]
and TensorFlow.
[325]
Okay.
[326]
Okay guys, we have seen about stochastic gradient descent.
[330]
So in the next video, we will see how to implement this
[333]
stochastic gradient descent from scratch and
[336]
how to use the scikit-learn library.
[340]
The scikit-learn library actually provides this stochastic gradient descent
[344]
algorithm built in, so you can use that library also. So
[348]
how to do that,
[348]
I will teach in the next video.
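As a small preview of the scikit-learn side (just a typical usage sketch with made-up toy data; the actual walkthrough is in the next video), SGDRegressor already implements stochastic gradient descent:

import numpy as np
from sklearn.linear_model import SGDRegressor

# Toy data: y = 4 + 3x plus noise (made-up numbers just for this demo).
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X.ravel() + rng.normal(size=100)

sgd = SGDRegressor(max_iter=1000, eta0=0.01)   # eta0 is the initial learning rate
sgd.fit(X, y)
print(sgd.intercept_, sgd.coef_)               # should land somewhere near 4 and 3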
[350]
I will say,
[351]
okay guys, thank you, we will see you in the next video.