NVIDIA’s AI Puts Video Calls On Steroids! 💪

Channel: Two Minute Papers

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. This paper is really something else. Scientists at NVIDIA just came up with an absolutely insane idea for video conferencing.
Their idea is not to do what everyone else is doing, which is transmitting our video to the person on the other end. No, of course not, that would be too easy! What they do in this work is take only the first image from the video, and they throw away the entire video afterwards! But before discarding it, the method stores a tiny bit of information: how our head is moving over time, and how our expressions change. That is an absolutely outrageous idea… and of course, we like those around here, so, does this work? Well, let’s have a look.
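In other words, the heavy lifting moves from the network to the receiver. Here is a minimal sketch of that transmission scheme, purely to make the idea concrete; `extract_keypoints` and `generate_frame` are hypothetical stand-ins for the paper's learned networks, not NVIDIA's actual API:

```python
# Sketch of the idea only, with hypothetical stand-ins for the learned networks.

def sender(video_frames, extract_keypoints):
    """Transmit the first frame once, then only tiny per-frame keypoint/pose data."""
    yield ("source", video_frames[0])                 # the single image that gets sent
    for frame in video_frames:
        yield ("driving", extract_keypoints(frame))   # a handful of numbers per frame

def receiver(stream, generate_frame):
    """Re-render every frame from the single source image plus the received keypoints."""
    source_image = None
    for kind, payload in stream:
        if kind == "source":
            source_image = payload
        else:
            yield generate_frame(source_image, payload)
```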
This is the input video. Note that this is not transmitted; only the first image and some additional information are, and the rest of this video is discarded. And hold on to your papers, because this is the output of the algorithm compared to the input video. No, this is not some kind of misunderstanding, nobody has copy-pasted the results there. This is a near-perfect reconstruction of the input, except that the amount of information we need to transmit through the network is significantly less than with previous compression techniques.
How much less? Well, you know what’s coming, so let’s try it out! Here is the output of the new technique, and here is the comparison against H.264, a powerful and commonly used video compression standard. Well, to our disappointment, the two seem close; the new technique appears better, especially around the glasses, but the rest is similar. And if you have been holding on to your papers so far, now squeeze that paper, because this is not a reasonable comparison. And that is because the previous method was allowed to transmit 6 to 12 times more information. Look, as we further decrease the data allowance of the previous method, it can still transmit more than twice as much information, and at this point, there is no contest. This bitrate would be unusable for any kind of videoconferencing, while the new method uses less than half as much information and still transmits a sharp and perfectly fine video. Overall, the authors report that their new method is ten times more efficient. That is unreal.
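To get a feel for where that saving comes from, here is a back-of-the-envelope estimate; every number in it (frame rate, keypoint count, values per keypoint) is an assumption for illustration, not a figure from the paper:

```python
# Rough, assumed numbers only; not measurements from the paper.
fps = 30                    # assumed frame rate
num_keypoints = 10          # assumed number of facial keypoints
values_per_keypoint = 4     # assumed: 3-D position plus some pose/expression data
bytes_per_value = 4         # 32-bit floats

keypoint_kbps = fps * num_keypoints * values_per_keypoint * bytes_per_value * 8 / 1000
print(f"keypoint stream: ~{keypoint_kbps:.1f} kbit/s")   # ≈ 38 kbit/s with these guesses

# Conventional compressed video for a call typically needs hundreds of kbit/s or more,
# which is why an order-of-magnitude saving, like the roughly ten-fold figure the
# authors report, is plausible even after the one-time source image is sent.
```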
This is an excellent video reconstruction technique, that much is clear. And if it only did that, it would be a great paper. But this is not just a great paper, this is an absolutely amazing paper, so it does even more. Much, much more! For instance, it can also rotate our head to make a frontal video, it can fix potential framing issues by translating our head, and it can transfer all of our gestures to a new model. And it is also evaluated well, so all of these new features are tested in isolation.
Look at these two previous methods trying to frontalize the input video. One would think the task is hardly even possible to perform properly, given how much these techniques are struggling with it… until we look at the new method. My goodness. There is some jumpiness in the neck movement in the output video here, and some warping issues here, but otherwise, very impressive results. Now, if you have been holding on to your papers so far, squeeze that paper, because these previous methods are not some ancient papers that were published a long time ago. Not at all! Both of them were published within the same year as the new paper. How amazing is that? Wow.
I really liked this page from the paper, which showcases both the images and the mathematical measurements against previous methods side by side. There are many ways to measure how close two videos are to each other. The up and down arrows tell us whether the given quality metric is subject to minimization or maximization; for instance, pixelwise errors are typically minimized, so lower is better, but we are to maximize the peak signal-to-noise ratio. And the cool thing is that none of this matters too much as soon as we insert the new technique, which really outpaces all of these.
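As a concrete example of one of those metrics, here is the standard peak signal-to-noise ratio computation (this is the textbook definition, not code from the paper):

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means the two frames match more closely."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                      # identical frames
    return 10.0 * np.log10(max_value ** 2 / mse)

# Pixelwise errors like MSE get a "down" arrow (lower is better),
# while PSNR gets an "up" arrow (higher is better).
```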
And we are still not done yet! So we said that the technique takes the first image, reads the evolution of expressions and the head pose from the input video, and then discards the entirety of the video, save for the first image. The cool thing about this was that we could pretend to rotate the head pose information, and the result is that the head appears rotated in the output image. That was great. But what if we take the source image from someone, and take this data, the driving keypoint sequence, from someone else? Well, what we get is motion transfer. Look! We only need one image of the target person, and we can transfer all of our gestures to them, in a way that is significantly better than most previous methods.
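Viewed through the earlier sketch, all three tricks run through one and the same pipeline; the only thing that changes is which keypoints drive the generator. The helpers below (`extract_keypoints`, `generate_frame`, `rotate_pose`) are again hypothetical stand-ins for the paper's learned components:

```python
# Sketch only: the three functions passed in are hypothetical stand-ins
# for the paper's learned keypoint representation and image generator.

def reconstruct(source_image, driving_video, extract_keypoints, generate_frame):
    # Ordinary video-call reconstruction: drive the source person with their own motion.
    return [generate_frame(source_image, extract_keypoints(f)) for f in driving_video]

def frontalize(source_image, driving_video, extract_keypoints, generate_frame, rotate_pose):
    # Same pipeline, but the head pose in each keypoint set is rotated before
    # rendering, so the output faces the camera.
    return [generate_frame(source_image, rotate_pose(extract_keypoints(f)))
            for f in driving_video]

def motion_transfer(target_image, driving_video, extract_keypoints, generate_frame):
    # Same pipeline again, except the single source image now shows a different
    # person than the one whose keypoints are driving the motion.
    return [generate_frame(target_image, extract_keypoints(f)) for f in driving_video]
```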
Now, of course, not even this technique is perfect; it still struggles a great deal in the presence of occluder objects. But still, just the fact that this is possible feels like something straight out of a science fiction movie. What a time to be alive!
Thanks for watching and for your generous support, and I'll see you next time!