Optimization in Crowdsourcing - YouTube
Thank you. This is a nice follow-up to the first part of Collin's presentation, which was on crowdsourcing. So, I was asked to speak on optimization for crowdsourcing and education. I'm not going to speak on education very much, because I haven't done that much work there. But I have done some work on crowdsourcing, in particular in the context of sequential optimization, which is what I'm going to be talking about.
It is really nice to hear the word POMDP come up twice in the morning session. I feel like I am in an AI setting, so I am going to be talking more about POMDPs. Just to start, this is joint work with Dan Weld. Most of this work has happened at the University of Washington, even after my coming back to India. Some of the work in this particular field is with Dan Weld and two of our students who are about to get their degrees at the University of Washington.
So this is the 30,000-feet view. Crowdsourcing has become huge. And this is probably the only slide I'm going to connect with social good. Crowdsourcing is social good from one vantage point. Starting in 2006, when the word "crowdsourcing" was coined, Amazon Mechanical Turk and many, many other platforms have made crowdsourcing grow really rapidly. And I would make the point that it democratizes labor; this is, I think, also the point that Collin was trying to make: now we can give labor to a lot of people in the world.

At the same time, there are some challenges in making crowdsourcing as successful as we want it to be. And it is our thesis that AI, machine learning, and optimization can really make crowdsourcing achieve its potential. It can reduce work errors; in our experiments we have sometimes been able to reduce errors by up to 85%. Sometimes we have found crowdsourcing to be super, super successful using these AI techniques.
So now, I don't have to go too much into how crowdsourcing came up. Wikipedia is one of the first examples of a crowdsourced encyclopedia; look at the viral growth of Wikipedia articles that happened when people got interested. Citizen science was being referred to by Bart earlier, and Galaxy Zoo was one such example, which was getting 50 million classifications in a year, something machine learning algorithms were not able to do at the time. And it was not just that: the human workers, the human citizen scientists, actually ended up discovering a special form of galaxies known as the Pea Galaxies, which the astronomers had no idea about.
Most of my talk is going to be about labor marketplaces, which are projected to grow by $5 billion by 2018. These are older projections and some of the older statistics; I couldn't find newer statistics. But basically, for example, Amazon Mechanical Turk had more than half a million workers three or four years ago. And oDesk was seeing 35 million hours clocked on their platform in 2012; now they have become UpWork. So there's so much going on.
There are so many very interesting strengths which got me really excited about crowdsourcing. For example, the world becoming a unified labor force, like a global meritocracy of sorts. Whether you are in India, in Africa, or on some Pacific island, as long as you have internet connectivity and a skill you can monetize digitally, you can probably get work. This is such a cool thing for mankind in general.

And of course it had many other strengths beyond the social good aspect: it was perfect for startups; it was cloud computing for human work. Really interesting new applications could come about. For example, one of my colleagues thought about an app for blind people, where a blind person stuck in a new circumstance can ask a question, and some crowd worker can answer it based on the image that they are sending. Things you would not have expected if such a platform were not available, and so on. But you can find pretty much every kind of expertise on these crowdsourcing platforms.
At the same time, there are lots of challenges, which is what got us interested. The most important challenge was high variance in worker quality. There are challenges about how to track the quality of output, and in general broad challenges about how to get high-quality output. Because these are crowd workers: some workers are awesome, some workers are not that great, some workers need training. How do you manage this large enterprise? And there are many other things where AI can play a role. Usually you divide a complex task into small micro-tasks; so how do you divide it? How do you test such workflows? How do you optimize them? How do you figure out who is the right worker to be working on my kind of task, and so on and so forth?
So we have worked very hard on demonstrating the value of AI and machine learning for crowdsourcing. At a high level it works like this. Imagine an AI agent; there is a requester who is telling it what tasks need to be solved, and every AI system operates in some environment, so here my environment is a crowdsourcing platform. When a new task comes along, the AI system figures out which jobs to give out, and when work comes back, you can do your learning and your planning.

So you can ask questions like: what are my workflow parameters? What are the optimal parameters? What are individual workers' abilities? What are they good at? What may they not be as good at? How difficult is my task, and so on? By learning all these parameters, you can control the task better, to figure out: if I have multiple workflows, which workflow do I select? Within a workflow, which job do I post? When do I know that my work has been done at a high quality? If I have a worker who is good, but has some hole in their understanding, when do I teach them? When do I test their quality, and so on and so forth?
We have done a reasonably large body of work in this space, and let me just give you a couple of quick examples. Everybody starts out with simple yes/no questions: is this bird an Indigo Bunting? Now, I can ask several of you, and some of you will say yes, some of you will say no, and hopefully people who know this thing will give us answers, but they may make mistakes. So can we do some kind of consensus on top of them? But not just consensus: when do we know we have a good-quality answer, and when do we know that we need to ask more people? That becomes a decision-theory question, and when we used POMDPs in a cost-controlled setting and did dynamic sequential optimization, we got much higher accuracy than the static controller for the same cost.
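The sequential "ask more or stop" idea above can be sketched as a tiny Bayesian controller. This is a minimal illustration, not the actual POMDP model from the work being described; the worker accuracy, vote cost, threshold, and budget are invented for the example.

```python
# Toy sketch of sequential consensus: maintain a belief that the true
# answer is "yes", and after each vote decide whether to stop (confident
# enough, or out of budget) or pay for one more worker. All parameters
# here are assumptions for illustration.

def update_belief(belief, vote, accuracy=0.8):
    """Posterior P(true answer = yes) after one vote, assuming each
    worker answers correctly with the given accuracy, independently."""
    like_yes = accuracy if vote == "yes" else 1 - accuracy
    like_no = (1 - accuracy) if vote == "yes" else accuracy
    p = like_yes * belief
    q = like_no * (1 - belief)
    return p / (p + q)

def ask_until_confident(votes, threshold=0.95, cost_per_vote=0.01, budget=0.05):
    """Sequential controller: consume votes one at a time, stopping as
    soon as the belief is confident enough or the budget would be exceeded."""
    belief, spent = 0.5, 0.0
    for vote in votes:
        if max(belief, 1 - belief) >= threshold or spent + cost_per_vote > budget:
            break
        belief = update_belief(belief, vote)
        spent += cost_per_vote
    answer = "yes" if belief >= 0.5 else "no"
    return answer, belief, spent
```

With three agreeing votes at 0.8 accuracy, the belief crosses 0.95 and the controller stops without spending the remaining budget; a static controller would have paid for all five votes regardless.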
This got us interested, but then we went to more complex tasks. Suppose this is your doctor, and he wrote this nice thing for you. Now, how do you know what is going on? You can give it to a crowd worker, and for $0.27 in this particular case, they were able to get the full transcription using a very interesting workflow. But they didn't know how to optimize this workflow; we did. When we optimized it, we found that we were able to use the same amount of money but get much higher-quality image transcription, and so on.

We have also done taxonomization of items. One of the graduate students, an HCI graduate student, came up with a very creative workflow to do this taxonomization, but she was not an optimization person. And when you model this for optimization, you find that for 13% of the work you can get the same quality, if you use optimization carefully.
We have many, many examples of this. If you have lots of workers with varying abilities, and lots of questions with varying difficulties, we know which particular worker to give each question to; again, red is our method, and higher is better.
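The worker-question matching just mentioned can be illustrated with a small greedy sketch. This is an assumption-laden toy, not the actual model from the work: I assume a logistic response model where the chance of a correct answer depends on worker skill minus question difficulty, and a per-worker capacity limit.

```python
import math

def p_correct(skill, difficulty):
    """Assumed response model (a logistic guess, not the actual model):
    higher skill and lower difficulty mean a higher chance of being right."""
    return 1 / (1 + math.exp(-(skill - difficulty)))

def assign(workers, questions, capacity=2):
    """Greedy matching: take the hardest questions first and give each to
    the available worker with the best expected accuracy.
    Assumes capacity * len(workers) >= len(questions)."""
    load = {w: 0 for w in workers}          # questions assigned per worker
    assignment = {}
    for q, diff in sorted(questions.items(), key=lambda kv: -kv[1]):
        best = max((w for w in workers if load[w] < capacity),
                   key=lambda w: p_correct(workers[w], diff))
        assignment[q] = best
        load[best] += 1
    return assignment
```

With this simple model the strongest worker fills up on the hardest questions first, and easier questions fall to weaker workers; a real system would solve this jointly rather than greedily, but the sketch shows the shape of the decision.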
We have thought about when to test a worker and when not to test a worker; again, higher is better and ours is the red one. We have thought about when to train a worker, and what to train the worker on. And again, green is what we could eventually achieve, and we were able to achieve red in our experiments.
And in the latest work, we have done not just quality-cost optimization, but quality-cost-completion-time optimization. If I start paying each worker more, then more workers will take my job and my completion time will drop, but my budget will be exhausted quickly. So I will not be able to solve the whole task, and my quality will go down. There's a very interesting interplay that happens when I start doing this three-way optimization: how to manage the whole optimization so that I'm able to use my budget best, given the parameters of my task.
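The interplay described above can be made concrete with a toy model. All the numbers and the linear "higher pay attracts workers faster" assumption here are invented for illustration; the actual work optimizes this tradeoff properly.

```python
def tradeoff(price, budget=100.0, n_tasks=1000, base_rate=2.0):
    """Toy quality/cost/time model (invented numbers): higher pay per task
    attracts workers faster, but a fixed budget then buys fewer tasks."""
    affordable = min(n_tasks, int(budget / price))   # tasks the budget covers
    rate = base_rate * price                         # tasks/hour, rises with pay
    hours = affordable / rate                        # completion time
    quality = affordable / n_tasks                   # fraction of task completed
    return quality, hours
```

Sweeping the price shows the tension: a low price completes everything but slowly, while a high price finishes fast yet exhausts the budget with most of the task undone, exactly the three-way interplay described in the talk.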
Our latest paper on this is coming out. So, at a high level, we have looked at intelligent control and optimization in the context of smart crowdsourcing. Crowdsourcing has a lot of advantages, but there are some challenges to be resolved. For example, we can figure out what to ask, how many times to ask, who to ask, how and when to teach, when to test, and when to stop. All these actions can be taken by our AI agents using POMDPs as the base, and we can do it for data quality. In other work, we've also done it for classifier quality: if I'm in an active-learning setting, then I can use workers very effectively. And from our point of view, from an AI-minded person's point of view, it's a small step towards the vision of human intelligence and machine intelligence coming together to achieve something bigger.
In our more recent work, we are not only thinking about the democratization of workers, but also about the democratization of requesters. Suppose you, as an individual, want to give out a crowdsourcing task: can I make it easy for you to crowdsource? There are a lot of good practices, but it's still not very streamlined. You don't know how to make the right task; you may be confused about your own task; you may not know what the workers are getting confused about. How do you improve the task design? So we've built an HCI interface that allows you to learn your task better through interactions with workers. This is the work that we have done in the context of crowdsourcing.
I also wanted to add that in my other lives, I do research in natural language processing, and recently we have started two very interesting projects. One is on analyzing legal data sets: lots of court data in the Indian judicial system has been languishing, and we have started analysis to understand which courts are outperforming other courts, which districts are doing well and which are not, and where cases are getting stuck, so that we can inform the legal department someday. In the second project, we have started a healthcare initiative, where we're trying to read MRI images and CAT scans automatically, to see if we can help radiologists do their job more effectively. And since we're talking about social good: for young people, I have also advised a dating company, to help you find the right partner.
So there is a lot of very, very interesting stuff that we can do as AI people in the mix, but I always feel that until we find the right partner to partner with, the domain expert, it's a non-starter for us. So in the remaining time, as we are doing these discussions, I would love to hear more about interesting problems from domain experts where AI people can actually help contribute. So I'll stop here.
>> [APPLAUSE]