Optimization in Crowdsourcing - YouTube
Thank you. This is a nice follow-up to the first part of Collin's presentation, which was on crowdsourcing. So, I was asked to speak on optimization for crowdsourcing and education. I'm not going to speak on education very much, because I haven't done that much work there. But I have done some work on crowdsourcing, in particular in the context of sequential optimization, which is what I'm going to be talking about.
It is really nice to hear the word POMDP come up twice in the morning session. I feel like I am in an AI setting, so I am going to be talking more about POMDPs. Just to start, this is joint work with Dan Weld. Most of this work has happened at the University of Washington, even after my coming back to India. Some of the work in this particular field is with Dan Weld and two of our students who are about to get their degrees at the University of Washington.
So this is the 30,000-feet view. Crowdsourcing has become huge. And this is probably the only slide I'm going to connect with social good. Crowdsourcing is social good from one vantage point. Starting in 2006, when the word "crowdsourcing" was coined, Amazon Mechanical Turk and many, many other platforms have made crowdsourcing grow really rapidly. And I would make the point that it democratizes labor; this is, I think, also the point that Collin was trying to make: now we can give labor to a lot of people in the world.

At the same time, there are some challenges in making crowdsourcing as successful as we want it to be. And it is our thesis that AI, machine learning, and optimization can really make crowdsourcing achieve its potential. It can reduce work errors; in our experiments we have sometimes been able to reduce errors by up to 85%. Sometimes we have found crowdsourcing to be super, super successful using these AI techniques.
So now, I don't have to go too much into how crowdsourcing came up. Wikipedia is one of the first examples of a crowdsourced encyclopedia; look at the viral growth of Wikipedia articles that happened when people got interested. Citizen science was being referred to by Bart earlier, and Galaxy Zoo was one such example, which was getting 50 million classifications in a year, something machine learning algorithms were not able to do at the time. And it was not just that: the human workers, the human citizen scientists, actually ended up discovering a special form of galaxies known as the Pea Galaxies, which the astronomers had no idea about.
Most of my talk is going to be about labor marketplaces, which are projected to grow by $5 billion by 2018. These are older projections and some of the older statistics; I couldn't find newer statistics. But basically, for example, Amazon Mechanical Turk had more than half a million workers three or four years ago. And oDesk was seeing 35 million hours clocked on their platform in 2012; now they have become UpWork. So there's so much going on.
There are so many very interesting strengths which got me really excited about crowdsourcing. For example, the world becoming a unified labor force, like a global meritocracy of sorts. Whether you are in India, in Africa, or on some Pacific island, as long as you have internet connectivity and a skill you can monetize digitally, you can probably get work. This is such a cool thing for mankind in general.

And of course it had many other strengths beyond the social good aspect: it was perfect for startups; it was cloud computing for human work. Really interesting new applications could come about. For example, one of my colleagues thought about an app for blind people, where a blind person stuck in a new circumstance can ask a question, and some crowd worker can answer it based on the image that they are sending. Things you would not have expected if such a platform were not available, and so on. But you can find pretty much every kind of expertise on these crowdsourcing platforms.
At the same time, there are lots of challenges, which is what got us interested. The most important challenge was high variance in worker quality. There are challenges about how to track the quality of output, and in general broad challenges about how to get high-quality output. Because these are crowd workers: some workers are awesome, some workers are not that great, some workers need training. How do you manage this large enterprise? And there are many other things where AI can play a role. Usually you divide a complex task into small micro-tasks; so how do you divide it? How do you test such workflows? How do you optimize them? How do you figure out who is the right worker to be working on my kind of task, and so on and so forth?
So we have worked very hard on demonstrating the value of AI and machine learning for crowdsourcing. At a high level it works like this. Imagine an AI agent; there is a requester who is telling it what tasks need to be solved, and every AI system operates in some environment, so here my environment is a crowdsourcing platform. When a new task comes along, the AI system figures out which jobs to give out, and when work comes back, you can do your learning and your planning.

So you can ask questions like: what are my workflow parameters? What are the optimal parameters? What are individual workers' abilities? What are they good at? What may they not be as good at? How difficult is my task, and so on? By learning all these parameters, you can control the task better, to figure out: if I have multiple workflows, which workflow do I select? Within a workflow, which job do I post? When do I know that my work has been done at a high quality? If I have a worker who is good, but has some hole in their understanding, when do I teach them? When do I test their quality, and so on and so forth?
We have done a reasonably large body of work in this space, and let me just give you a couple of quick examples. Everybody starts out with simple yes/no questions: is this bird an Indigo Bunting? Now, I can ask several of you, and some of you will say yes, some of you will say no, and hopefully people who know this thing will give us answers, but they may make mistakes. So can we do some kind of consensus on top of them? But not just consensus: when do we know we have a good-quality answer, and when do we know that we need to ask more people? That becomes a decision-theory question, and when we used POMDPs in a cost-controlled setting and did dynamic sequential optimization, we got much higher accuracy than the static controller for the same cost.
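The sequential "ask more or stop" idea above can be sketched as a tiny Bayesian controller. This is a minimal illustration, not the actual POMDP model from the work being described; the worker accuracy, vote cost, threshold, and budget are invented for the example.

```python
# Toy sketch of sequential consensus: maintain a belief that the true
# answer is "yes", and after each vote decide whether to stop (confident
# enough, or out of budget) or pay for one more worker. All parameters
# here are assumptions for illustration.

def update_belief(belief, vote, accuracy=0.8):
    """Posterior P(true answer = yes) after one vote, assuming each
    worker answers correctly with the given accuracy, independently."""
    like_yes = accuracy if vote == "yes" else 1 - accuracy
    like_no = (1 - accuracy) if vote == "yes" else accuracy
    p = like_yes * belief
    q = like_no * (1 - belief)
    return p / (p + q)

def ask_until_confident(votes, threshold=0.95, cost_per_vote=0.01, budget=0.05):
    """Sequential controller: consume votes one at a time, stopping as
    soon as the belief is confident enough or the budget would be exceeded."""
    belief, spent = 0.5, 0.0
    for vote in votes:
        if max(belief, 1 - belief) >= threshold or spent + cost_per_vote > budget:
            break
        belief = update_belief(belief, vote)
        spent += cost_per_vote
    answer = "yes" if belief >= 0.5 else "no"
    return answer, belief, spent
```

With three agreeing votes at 0.8 accuracy, the belief crosses 0.95 and the controller stops without spending the remaining budget; a static controller would have paid for all five votes regardless.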
This got us interested, but then we went to more complex tasks. Suppose this is your doctor, and he wrote this nice thing for you. Now, how do you know what is going on? You can give it to a crowd worker, and for $0.27 in this particular case, they were able to get the full transcription using a very interesting workflow. But they didn't know how to optimize this workflow; we did. When we optimized it, we found that we were able to use the same amount of money but get much higher-quality image transcription, and so on.

We have also done taxonomization of items. One of the graduate students, an HCI graduate student, came up with a very creative workflow to do this taxonomization, but she was not an optimization person. And when you model this for optimization, you find that for 13% of the work you can get the same quality, if you use optimization carefully.
We have many, many examples of this. If you have lots of workers with varying abilities, and lots of questions with varying difficulties, we know which particular worker to give each question to; again, red is our method, and higher is better.
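The worker-question matching just mentioned can be illustrated with a small greedy sketch. This is an assumption-laden toy, not the actual model from the work: I assume a logistic response model where the chance of a correct answer depends on worker skill minus question difficulty, and a per-worker capacity limit.

```python
import math

def p_correct(skill, difficulty):
    """Assumed response model (a logistic guess, not the actual model):
    higher skill and lower difficulty mean a higher chance of being right."""
    return 1 / (1 + math.exp(-(skill - difficulty)))

def assign(workers, questions, capacity=2):
    """Greedy matching: take the hardest questions first and give each to
    the available worker with the best expected accuracy.
    Assumes capacity * len(workers) >= len(questions)."""
    load = {w: 0 for w in workers}          # questions assigned per worker
    assignment = {}
    for q, diff in sorted(questions.items(), key=lambda kv: -kv[1]):
        best = max((w for w in workers if load[w] < capacity),
                   key=lambda w: p_correct(workers[w], diff))
        assignment[q] = best
        load[best] += 1
    return assignment
```

With this simple model the strongest worker fills up on the hardest questions first, and easier questions fall to weaker workers; a real system would solve this jointly rather than greedily, but the sketch shows the shape of the decision.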
We have thought about when to test a worker and when not to test a worker; again, higher is better and ours is the red one. We have thought about when to train a worker, and what to train the worker on. And again, green is what we could eventually achieve, and we were able to achieve red in our experiments.
And in the latest work, we have done not just quality-cost optimization, but quality-cost-completion-time optimization. If I start paying each worker more, then more workers will take my job and my completion time will drop, but my budget will be exhausted quickly. So I will not be able to solve the whole task, and my quality will go down. There's a very interesting interplay that happens when I start doing this three-way optimization: how to manage the whole optimization so that I'm able to use my budget best, given the parameters of my task.
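The interplay described above can be made concrete with a toy model. All the numbers and the linear "higher pay attracts workers faster" assumption here are invented for illustration; the actual work optimizes this tradeoff properly.

```python
def tradeoff(price, budget=100.0, n_tasks=1000, base_rate=2.0):
    """Toy quality/cost/time model (invented numbers): higher pay per task
    attracts workers faster, but a fixed budget then buys fewer tasks."""
    affordable = min(n_tasks, int(budget / price))   # tasks the budget covers
    rate = base_rate * price                         # tasks/hour, rises with pay
    hours = affordable / rate                        # completion time
    quality = affordable / n_tasks                   # fraction of task completed
    return quality, hours
```

Sweeping the price shows the tension: a low price completes everything but slowly, while a high price finishes fast yet exhausts the budget with most of the task undone, exactly the three-way interplay described in the talk.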
Our latest paper on this is coming out. So, at a high level, we have looked at intelligent control and optimization in the context of smart crowdsourcing. Crowdsourcing has a lot of advantages, but there are some challenges to be resolved. For example, we can figure out what to ask, how many times to ask, who to ask, how and when to teach, when to test, and when to stop. All these actions can be taken by our AI agents using POMDPs as the base, and we can do it for data quality. In other work, we've also done it for classifier quality: if I'm in an active-learning setting, then I can use workers very effectively. And from our point of view, from an AI-minded person's point of view, it's a small step towards the vision of human intelligence and machine intelligence coming together to achieve something bigger.
In our more recent work, we are not only thinking about the democratization of workers, but also about the democratization of requesters. Suppose you, as an individual, want to give out a crowdsourcing task: can I make it easy for you to crowdsource? There are a lot of good practices, but it's still not very streamlined. You don't know how to make the right task; you may be confused about your own task; you may not know what the workers are getting confused about. How do you improve the task design? So we've built an HCI interface that allows you to learn your task better through interactions with workers. This is the work that we have done in the context of crowdsourcing.
I also wanted to add that in my other lives, I do research in natural language processing, and recently we have started two very interesting projects. One is on analyzing legal data sets: lots of court data in the Indian judicial system has been languishing, and we have started analysis to understand which courts are outperforming other courts, which districts are doing well and which are not, and where cases are getting stuck, so that we can inform the legal department someday. In the second project, we have started a healthcare initiative, where we're trying to read MRI images and CAT scans automatically, to see if we can help radiologists do their job more effectively. And since we're talking about social good: for young people, I have also advised a dating company, to help you find the right partner.
So there is a lot of very, very interesting stuff that we can do as AI people in the mix, but I always feel that until we find the right partner to partner with, the domain expert, it's a non-starter for us. So in the remaining time, as we are doing these discussions, I would love to hear more about interesting problems from domain experts where AI people can actually help contribute. So I'll stop here.
>> [APPLAUSE]