what is stochastic gradient descent? - YouTube
Channel: Code Byte
[16]
Hello guys, welcome to another video on stochastic gradient descent.
[21]
So in the last video, we successfully covered
[22]
gradient descent,
[25]
and we ended up with a problem
[27]
where the gradients get stuck in a local minimum.
[30]
So how do we get from that local minimum to the
[33]
global minimum?
[35]
We will see the technique
[37]
that answers that problem.
[40]
This technique is called stochastic gradient descent.
[44]
So what does "stochastic" mean?
[46]
I will explain it shortly, and explain
[49]
this algorithm very clearly.
[52]
First, we will go over a few
[54]
things before getting into stochastic gradient descent.
[57]
You can think of this as a revised version of gradient descent,
[60]
or something along those lines.
[63]
So what is our goal?
[65]
We need to
[66]
find the optimal gradients;
[69]
that means the gradients
[71]
at, not some local minimum, but the global minimum.
[74]
So we start from the top of the bowl, and we
[78]
take steps down the slope.
[81]
Finally,
[82]
we end up at the global minimum.
[86]
The step size is controlled by what is called the learning rate; based on the learning rate
[89]
we take either big steps or small steps.
[95]
When the learning rate is too
[97]
small, our gradients will take very small
[101]
steps
[102]
from top to bottom.
[104]
This affects the time: it will take a lot of time to
[108]
converge those gradients to, not a local minimum, but the global minimum.
[114]
And the other thing is, when our learning rate is too
[117]
large,
[119]
our gradients will jump from this side to that
[122]
side, back and forth, over and over.
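As a reference, here is a minimal sketch of that idea in code (this is not the code from the video; the bowl-shaped function f(w) = w² and all the values are just illustrative):

```python
# Plain gradient descent on a simple bowl-shaped function f(w) = w**2,
# whose gradient (slope) at any point is 2*w.
def gradient_descent(start_w=5.0, learning_rate=0.01, n_steps=100):
    w = start_w
    for _ in range(n_steps):
        gradient = 2 * w                   # slope of the bowl at the current point w
        w = w - learning_rate * gradient   # the learning rate controls the step size
    return w

print(gradient_descent(learning_rate=0.01))  # walks down toward the minimum at w = 0
```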
[124]
And another thing: in the last video
[127]
I didn't change the
[130]
learning rate;
[131]
I kept it at 0.01.
[134]
Here we can go ahead and change that learning rate and play
[136]
with it as well, and you will find these
[140]
things:
[142]
if the learning rate is too high, you will get NaN output,
[145]
which means "not a number".
[146]
And sometimes, when our learning rate is
[149]
very small, our model takes a lot of time to converge; we will
[152]
visualize that as well.
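Using the little sketch above (again with illustrative values, not the video's actual numbers apart from 0.01), you can see both behaviours just by changing the learning rate:

```python
# Too small: after 100 steps, w has barely moved away from its start value of 5.0.
print(gradient_descent(learning_rate=0.0001, n_steps=100))

# Too large: every step overshoots, the updates explode, and the value
# eventually overflows to inf and then NaN instead of a useful number.
print(gradient_descent(learning_rate=2.0, n_steps=1000))
```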
[156]
So the problem with gradient descent is that sometimes it gets stuck
[159]
at a local minimum, and it
[161]
cannot get from that local minimum
[163]
to the global minimum.
[165]
So what is the solution?
[170]
Stochastic gradient descent.
[172]
But before that,
[174]
I want to clarify one term first:
[177]
batch gradient descent is the same thing,
[180]
nothing but the gradient descent we already saw. Batch gradient descent
[182]
means
[183]
taking all the data and passing it through the model, and
[186]
finally we get the weights and the gradient values.
[190]
That means batch gradient descent uses all the data.
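As a rough sketch (assuming a simple linear model with mean squared error, not necessarily the exact code from the last video), batch gradient descent looks something like this; notice that every single update uses the whole dataset X, y:

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, n_epochs=1000):
    m, n = X.shape                # m instances, n features
    w = np.zeros(n)
    for _ in range(n_epochs):
        predictions = X @ w
        # gradient of the mean squared error, computed over ALL m instances at once
        gradients = (2 / m) * X.T @ (predictions - y)
        w = w - learning_rate * gradients
    return w
```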
[194]
But in the case of stochastic gradient descent,
[197]
we are going to take only one instance; we pass that
[202]
instance into the model, we find the gradient, and we update,
[204]
and we keep
[206]
going like that, one instance at a time.
[208]
So here is the definition: stochastic gradient descent just
[211]
picks a random instance in the training set at each
[213]
iteration
[215]
and then computes the gradients
[217]
based on only that single instance.
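Here is a minimal sketch of that definition for the same linear model (illustrative only; the from-scratch version comes in the next video): each update uses one randomly picked instance instead of the whole dataset.

```python
import numpy as np

def stochastic_gradient_descent(X, y, learning_rate=0.01, n_epochs=50):
    m, n = X.shape
    w = np.zeros(n)
    rng = np.random.default_rng(42)
    for _ in range(n_epochs):
        for _ in range(m):
            i = rng.integers(m)                  # pick one random training instance
            xi, yi = X[i], y[i]
            gradient = 2 * xi * (xi @ w - yi)    # gradient of the squared error on that single instance
            w = w - learning_rate * gradient
    return w
```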
[220]
Due to this, we can work around that problem:
[224]
when we are stuck at a local minimum,
[227]
by using this technique our gradients will bounce out of the
[230]
local minimum toward the global minimum.
[233]
But there are some problems with this technique as well,
[235]
so I want to mention them.
[240]
The main problem is
[242]
that the gradients will keep bouncing around in this technique. That means
[245]
when they come close to the global minimum, they won't
[248]
stop; they will bounce again from the left side to the right side, or
[252]
from the right side to the left side.
[254]
So
[255]
you will get pretty good gradient values when the algorithm
[259]
stops, but those gradients are
[262]
not optimal. That is the thing you have to remember.
[265]
So how do we solve that problem?
[268]
One solution is
[270]
to reduce the learning rate at every iteration.
[273]
That is called
[274]
simulated annealing,
[276]
or you can call it learning rate decay, or whatever you like.
[280]
That is the solution for this problem.
[286]
You can reduce the
[288]
bouncing, so finally you will get very good, near-optimal
[292]
values by
[294]
using this
[295]
learning rate schedule.
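A small sketch of that learning-rate-schedule idea on top of the SGD loop above (the schedule constants t0 and t1 are made up here, purely to illustrate the decay):

```python
import numpy as np

def sgd_with_schedule(X, y, n_epochs=50, t0=5.0, t1=50.0):
    m, n = X.shape
    w = np.zeros(n)
    rng = np.random.default_rng(42)
    for epoch in range(n_epochs):
        for step in range(m):
            i = rng.integers(m)
            xi, yi = X[i], y[i]
            gradient = 2 * xi * (xi @ w - yi)
            eta = t0 / (epoch * m + step + t1)   # the learning rate shrinks at every iteration,
            w = w - eta * gradient               # so the bouncing around the minimum gets smaller
    return w
```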
[297]
So this is the overall idea of stochastic gradient descent.
[302]
And another thing I want to mention:
[304]
these pictures are taken from Aurélien Géron's book;
[307]
you can refer to his book
[308]
on machine learning as well.
[310]
I actually followed that book when I was a
[313]
beginner,
[315]
and the book is really very helpful for beginners.
[318]
So you can read that book: Hands-On Machine Learning with Scikit-Learn
[321]
and TensorFlow.
[325]
Okay.
[326]
Okay guys, we have seen what stochastic gradient descent is.
[330]
So in the next video, we will see how to implement this
[333]
stochastic gradient descent from scratch, and
[336]
how to use the scikit-learn library.
[340]
The scikit-learn library actually provides this stochastic gradient
[344]
descent algorithm built in, so you can use that library as well; a rough sketch is shown below.
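For example, a quick sketch of how the scikit-learn version can be called (the tiny dataset here is made up purely to show the call; eta0 is the initial learning rate, which scikit-learn then decays with its own schedule):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# made-up data: y is roughly 4 + 3x plus some noise
X = np.random.rand(100, 1)
y = 4 + 3 * X.ravel() + np.random.randn(100)

sgd_reg = SGDRegressor(max_iter=1000, eta0=0.01, penalty=None)  # plain SGD, no regularization
sgd_reg.fit(X, y)
print(sgd_reg.intercept_, sgd_reg.coef_)
```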
[348]
I will teach how to do that in the next video.
[350]
Okay guys, thank you, and I will see you
[351]
in the next video.