
Okay. Let's get started, guys.

So welcome to lecture number 4.

Um, today we will go over two topics that are not discussed,

uh, in the Coursera videos.

Uh, you've been learning C2M1 and C2M2,

if I'm not mistaken.

So you've learned about, uh,

what, uh, an initialization is,

how to tune neural networks,

what test, validation, and train sets are.

Today, we're going to go a little further, uh,

you should have the background to understand 80 percent of this, uh, lecture.

There is gonna be 20 percent that I want you to come back to

after you've seen the BatchNorm videos, for those of you who haven't seen them.

So we split the lecture in two parts,

and I put back the attendance code at the,

at the very end of the lecture so don't worry.

Ah, one topic is attacking, ah,

neural networks, ah, with adversarial examples.

Ah, the second one is generative adversarial networks.

[NOISE] And although these two topics have a common word which is adversarial,

they are two separate topics.

You will understand why it's called adversarial in both cases.

So let's get started with adversarial examples.

And in 2013, ah,

Christian Szegedy and his team, uh,

published a paper called Intriguing Properties of Neural Networks.

What they noticed is that neural networks,

neural networks have kind of a blind spot, spots, uh,

for which several machine learning models, including

the state-of-the-art ones that you will learn about, ah,

VGG-16, VGG-19, Inception, uh,

networks and residual networks,

are vulnerable to something called adversarial examples.

You're going to learn what these adversarial examples are in three parts.

First, how these examples in

the context of images can attack a network in its blind spots,

and, and make the network classify these images as something totally wrong.

Second, how to defend against these types of examples,

and third, why networks are vulnerable to these types of examples.

This is a little bit more theoretical,

and we're going to go over it on the board.

The, the papers that are listed on the bottom are the two big papers that,

that started this field of research.

So I would advise you to go and,

and read them because we have only one hour-and-a-half to go over two big topics,

um, in, in, in deep learning and,

ah, we will not have the time to go into details of everything.

Okay. So let's set up the goal.

The goal is that, given a pre-trained network,

so a network trained on ImageNet on 1,000 classes, millions of images,

ah, we find an input image that is not an iguana,

so it doesn't look like the animal iguana,

but that will be classified by the network as an iguana.

We will call this an adversarial example if we manage to find it.

Okay. Yeah, one question.

Ah, what was the magic code for those that came in this late?

Uh, let me - so 284889,

let me write it down on the board so that you can -

Thank you.

Can you guys see? [NOISE] Okay. Let's move on.

So we have a network pre-trained on ImageNet and it's a very good network.

Ah, what I want is to fool it

by giving it an image that doesn't

look like an iguana but is classified as an iguana.

So if I give it a cat image to start with.

The network is obviously going to give me a vector of

probabilities that has the maximum probability for a cat,

because it's a good network.

And you can guess what's the output layer of this network,

it's probably a softmax, since it's a classification network.

Now what I want is to find

an image x that is going to be classified as an iguana by the network.

Okay. Does the, the,

the setting make sense to everyone?

Okay. Now as usual, uh,

this ma - this, this might remind you of what

we've seen together about neural style transfer.

You remember the, the art generation thing,

where we wanted to generate an image based on

the content of the first image and the style of another image.

And in that problem,

the main difference with classic

supervised learning was that we fixed the parameters of the network,

which was also pre-trained,

and we back propagate the error of the loss all the

way back to the input image to update the pixels,

so that it looks like the content of the content image

and the style of the style image. The first thing we did is that we rephrased the problem.

We, we try to, to,

to phrase what exactly we want.

So wha - what would you say is a sentence that defines our loss function, let's say.

Any ideas?

Okay. Complicated. Yep.

An image that provides minimum cost.

An image that provides minimum cost.

Okay. What's the cost you're talking about?

Cost of the, the difference between the expected iguana and non-expected iguana.

Expected iguana and non-expected iguana.

Wha - what do you mean exactly by that?

So if we're sort of going back in the training session,

we're trying to train it on

an image and we wanted to think that [NOISE] this is a cat and iguana.

Yeah. Okay. So you want,

ah, this image to minimize a certain loss function,

and the loss function would be the distance metric

between the output you get and the output you want.

Okay. Yeah. So I would say,

we want to find x, the image,

such that y hat of x,

which is the result of the forward propagation of x in the network is equal to y-iguana,

which is a one-hot vector with the one at the position of iguana.

Does that make sense? So now based on that we define our loss function,

which can be an L2 loss,

can be an L1 loss,

can be a cross-entropy in practice.

Ah, this one, ah, works better.

So you see that minimizing this loss function,

would lead our image x to be outputted as an iguana by the network.

Does that make sense?

And then the process is very similar to neural style transfer,

where we will optimize the image iteratively.

So we will start with x,

we will forward propagate it,

compute the loss function that we just defined.

And remember, we're not training the network, right?

We'll just take the, the derivative of the loss function all the way back to the inputs,

and update the input using a gradient descent algorithm until we

get something that is classified as an iguana.
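
As a rough sketch of what that loop might look like in code (this is not code from the lecture; the model, the target-class index, the step size, and the number of iterations are placeholders), assuming a PyTorch classifier:

```python
import torch
import torch.nn.functional as F

def forge_adversarial(model, x_init, target_class, steps=200, lr=0.01):
    """Iteratively update the input image so the frozen, pre-trained `model`
    classifies it as `target_class` (a tensor such as torch.tensor([iguana_idx]))."""
    model.eval()                                 # the network's weights are never updated
    x = x_init.clone().requires_grad_(True)      # we optimize the pixels only
    optimizer = torch.optim.SGD([x], lr=lr)      # plain gradient descent on the image
    for _ in range(steps):
        logits = model(x)                        # forward propagate the current image
        loss = F.cross_entropy(logits, target_class)   # distance to the target class
        optimizer.zero_grad()
        loss.backward()                          # back-propagate all the way to the input
        optimizer.step()                         # update the pixels, not the parameters
        x.data.clamp_(0.0, 1.0)                  # keep pixel values in a valid range
    return x.detach()
```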

Yeah, any question on that?

But this doesn't necessarily mean that the x that you get in -

Okay. So you mentioned that it

doesn't guarantee that x is loo - going to look like something.

The only thing it guarantees is that

this x will be classified as an iguana if we train properly.

[NOISE] We will, we will talk about that now.

Er, another question in the back I thought. Yeah.

For the last question we miss the one that for logistic regression.

Oh yeah, it could be binary cross en - it could be cross entropy.

Yeah. So in this case not binary cross entropy because we have a, uh, uh, uh,

a vector of, of n classes,

so it could have been cross-entropy.

Okay. So yeah that's true.

We - are we guaranteed that the forged image x,

this one, i - is going to look like an iguana?

Who thinks it's going to look like an iguana?

If you - who thinks it's not going to look like an iguana?

Okay. Majority of people.

So can someone tell me why i - it's not going to look like an iguana?

[NOISE].

[inaudible] making a vector through a vector.

Okay. So you're saying, uh,

the loss function is unconstrained,

is very unconstrained, so we didn't

put any constraints on what the image should look like.

That's true. Actually, the answer to this question is,

it depends. We don't know.

Maybe it looks like an iguana or maybe it doesn't.

But in terms of probabilities,

it's high chance that it doesn't look like an iguana.

So the reason is here. Let's say this is our space of input images.

And the interesting thing is that, as humans, on

a daily basis we deal with images of the real world.

So like, I mean,

if you look at a TV,

uh, that is totally buggy,

you see pixels, random pixels,

but in other contexts,

we usually see images from the real-world distribution.

A network is deterministic,

it means it takes an image.

Any input image that fits the,

the first layer would,

would be - would produce an output, right.

So this is the whole space of input images that the network can see.

Um, this is the space of real images,

it's a lot smaller.

Can someone tell me what's the size of the, the,

the space of possible input images for a network?

[NOISE].

Infinite.

Huh? Sorry.

Infinite.

Infinite?

Yeah.

Uh, It's not infinite.

It's, it's a lot but not - [NOISE]

It's the number of the pixels to the power of the number of things it could be.

Okay. Uh, yeah, there is an idea here. Someone there?

I also said the same thing with just number of possible pixel permutations.

Yeah, that's true.

So more precisely - you would start with how many pixel values are there?

There are 256 pixel values,

and then what's the size of an image?

Let's say 64 by 64 by 3,

and your result would be 256 times 256 times 256 and so on:

you fix the first pixel,

256 possible values, then the second one can be anything else,

then the third one can be anything else,

and you end up with a very big number.

So this is a huge number.
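
For concreteness, with 8-bit pixels and a 64 by 64 by 3 image, the count being described is

$$256^{\,64 \times 64 \times 3} \;=\; 256^{12288} \;\approx\; 10^{29592}.$$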

And the space of real images is here.

Now if we had to plot the space of im - of images classified as an iguana,

it would be something like that.

Right. And you see that there is a small overlap between the space

of real images and the space of images classified by - as an iguana by the network.

And this is where we probably are not.

We're probably in the green part that is not overlapping with the red part,

because we didn't constrain our optimization problem.

Does that make sense? Okay. Now we're

going to constrain it a little bit more, because in practice,

these type of attacks are not too dangerous because as a,

as a human we would see that the pictures look like garbage.

The dangerous attack is if the picture looks like a cat,

but the network sees it as an iguana and a human sees it as a cat.

Can someone think of, uh,

of like malicious applications of that?

[NOISE] Face recognition: you could show a, a

picture of your face,

and push the network [NOISE] to think it's the face of someone else.

What else? Yeah.

Breaking CAPTCHAs and breaking like against bot detection.

Yeah. Breaking CAPTCHAs.

If you know what output you want,

you can force the network to think that this CAPTCHA,

uh, this input CAPTCHA is the output it's looking for.

Or in general, I would say like social media, uh,

if someone is malicious and wants to put, uh,

violent content online,

there is - all these companies have algorithms to check for this violent content.

If people can use adversarial examples that still look violent,

but are not detected as violent by the algorithms using this methodology,

they could still publish their violent pictures.

Uh, think about self-driving cars.

A stop sign that looks like a stop sign for everyone,

but when the self-driving car sees it, it's not a stop sign.

So these are malicious applications of adversarial examples, and there are a lot more.

Okay. And in fact, the picture we generated

previously would look like that. It's nothing special.

So now let's constrain our problem a little bit more.

We're going to say we want the picture to look like a cat but be classified as an iguana.

Okay. So now say we have our neural network.

If we give it a cat it's going to predict that it's a cat.

What we want is still give it a cat but predict that it's an iguana.

Okay. I, I go quickly over that because it's very similar to what we did before,

so I just put back what we had on the previous slide.

Okay, exactly the same thing.

Now, the way we rephrase our problem will be a little different.

Instead of saying we want only y hat of x equals y - iguana,

we have another constraint.

What's the other constraint?

The picture x should be close to a picture of a cat.

So we want x equal or very close to x-cat.

And in terms of loss function,

what it does is that it adds

another term which is going to push x to be close to x-cat.

If we minimize this loss now,

we should have an image that looks like a cat because of the second term,

and that is predicted as an iguana because of the first term.

Does that make sense? So we're just building up our loss functions,

and I guess you guys are very familiar with this type of thought process now.

Okay, and same process,

we optimize until we hopefully get the cat.
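
In symbols, one way to write the loss being built up here, using L2 distances for both terms and a weighting coefficient lambda (one reasonable choice rather than the only one), is

$$\mathcal{L}(x) \;=\; \underbrace{\lVert \hat{y}(x) - y_{\text{iguana}} \rVert_2^2}_{\text{classified as an iguana}} \;+\; \lambda\,\underbrace{\lVert x - x_{\text{cat}} \rVert_2^2}_{\text{looks like the cat}}.$$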

Now our question is,

what should be the initial image we start with?

We didn't talk about that in the previous example. [NOISE] Yeah.

White noise?

White noise.

Yeah, possibly white noise.

Any other, uh, proposals?

Maybe a cat.

A cat? Yeah, which cat?

The [inaudible] [NOISE].

I don't know. Probably the cat that we put in the loss function, right?

Because it's - is the closest one to what we want to get.

So if we want to have a fast process,

we'd better start with exactly this cat,

which is the one we put in our loss function here, right?

If we put another cat,

it's going to be a little longer because we have to

change the pixel of the other cat to look like this cat.

That's what we told our loss function.

If we start with white noise,

it will take even longer because we have to change the pixels all the way

so that it looks real and then it looks like a cat that we defined here.

So yeah, the best thing would be probably to start with the picture of the cat.

Does that make sense? And then move the pixels so that

this term is also minimized. Yeah.

So when you write that loss function,

it seems like you are implicitly saying that what a human sees as a cat will

just be like minimizing the RMSE error to the actual cat picture, right?

Yeah?

Is that - I mean, I thought that RMSE error was

actually a really bad way to gauge whether or not a human,

like saw two images as similar.

Yeah. This is, this is empirical,

the fact that we use that type of, of loss function.

But in practice, it could have been any distance between X and X cat,

and any distance between Y hat and Y cat, yeah,

and Y iguana, sorry. Yes.

So when you say X cat is [inaudible] just one specific cat.

Yeah.

[inaudible].

Exactly.

I can't think of a way of making a constrained,

like a complex loss function that takes a bunch of cats.

And then it puts like something like a minimum of it, right?

The minimum distance between [inaudible]

Can we just look at this wide [inaudible]

probability of like 0.55 probability of iguana and cat and then try to [inaudible]

I'm not sure about the second method.

But just to repeat the point you mentioned,

is that here we had to choose a cat.

It means the X cat is actually an image of a cat.

So what if we don't know what the cat should look like,

we just want a random cat to come out and be classified as an iguana.

We're going to see, uh, generative networks

afterwards, which can be used to do that type of stuff.

But, uh, but for the second part of the question,

I'm not sure what the optimization process would look like.

Okay, let's move on?

So yeah, it's probably a good idea to start

with the cat image that we specified in the loss function.

Okay. And so then we have an image of a cat that originally

was classified as 92 percent cat and we modified a few pixels.

So you can see that this image looks a little blurry.

So by doing this modification,

the network will think it's an iguana.

Okay? And sometimes this modification can be very slight and we

might not even be able to notice it. Sounds good.

Now, let's add something else to this,

uh, to this, uh,

to this, uh, draft.

We add a third set which is the space of images that look real to a human.

So that's interesting because the,

the space of images that look real to a human is

actually bigger than space - than the space of real images.

An example is this one.

This is probably an image that looks real to a human,

but it's not an image that we could see in,

in daily life, because of these slight pixel changes.

Okay? So this is the space of dangerous adversarial examples.

They look real to a human but they're not actually real.

They might be used to fool a model.

Okay. Now let's see a video, uh,

by Kurakin et al, uh,

on real-world example of adversarial examples.

So for those who cannot read,

they're taking, uh, a camera which,

which has a classifier.

And the classifier classifies the first image as

a library and the second image, which looks the same, as a prison.

So the second image has slightly different pixels but

it's hard to see for a human. Same here.

So the, the classifier on the phone classifies

the first image as a washer with 52 percent

confidence, and the second one as a doormat.

Yeah. So yeah, this is,

uh, a small example of - of what can, what can be done.

Okay. Now let's go,

we've seen how to generate these adversarial examples.

It's an optimization process.

We will see, uh,

what are the types of attacks that we can

carry out and what are the defenses against these adversarial examples.

So we would usually,

uh, split the attacks into two parts:

non-targeted attacks and targeted attacks.

So a non-targeted attack means that we just want

to find an adversarial example that is going to fool the model,

while a targeted attack means we want to force

this adversarial example to be output - to output a specific class that we chose.

These are two different type of attacks that,

that are widely discussed in, in the research.

Knowledge of the attacker is something very important.

For those of you who did some crypto,

you know that we talk about white-box attacks, black-box attacks.

So one interesting thing is that,

uh, a black-box attack - a white-box attack is when you have access to the network.

So we have our image and the pre-trained network.

We have full access to,

to all the parameters and, and the gradients.

So it's probably an easier attack.

Right? We can, we can back-propagate all the way

back to the image and update the image, like we did.

A black-box attack is when the model is probably encrypted or something like that,

so that we don't have access to its parameters,

activations, and, uh, architecture.

So the question is how do we attack in

black-box attack if we cannot back-propagate because we don't have access to the layers?

Any ideas? Yeah.

Numerical gradient.

Numerical gradient. Yeah, good idea.

So you know you will tweak the image a little

bit and you will see how it changes the loss.

Looking at that you can,

you can have an estimate of the numerical gradient.

Even if the model is a black-box model.
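
A minimal sketch of that idea (not from the lecture; `query_loss` is a hypothetical function that sends an input to the black-box model and returns the scalar loss we care about):

```python
import numpy as np

def estimate_gradient(query_loss, x, eps=1e-3):
    """Finite-difference estimate of dL/dx when we can only query the model."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):        # two queries per pixel: very expensive
        old = x[idx]
        x[idx] = old + eps
        loss_plus = query_loss(x)
        x[idx] = old - eps
        loss_minus = query_loss(x)
        x[idx] = old                       # restore the original pixel
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad
```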

This assumes that you can query the model,

right? You can query it.

What if you cannot even query the model, or you can query it one time only,

which is to send the adversarial example?

How would you do that? So this becomes more complicated.

So, there is ve - a very interesting property of

these adversarial examples, which is that they're highly transferable.

It means I have a model here that is,

uh, an animal classifier, okay?

I don't have access to it.

I cannot even query it.

I still wanna fool it.

What I'm going to do is that I'm going to build my own animal classifier,

forge an adversarial example on it.

It's highly likely that it's going to

be an adversarial example for the other one as well.

So, this is called transferability,

and it's still a, uh, research topic, okay?

We're trying to understand why this happens and,

uh, also, uh, how to defend against that.

You know, maybe a defense against that is to,

is to - we're going to see it after, I'm not gonna say it now, sorry.

Uh, does that make sense or no, this transferability?

It's probably because two animal classifiers look at the same features in images, right?

And maybe these pixels that we're play - we're playing

with are also changing the output of the other network.

Let's go over some kind of defenses.

So, one solution to defend against

these adversarial examples is to create a safet - Safety Net. A Safety Net is what?

Is, uh, a net that - like a firewall,

you would put it before your network.

Every image that comes in will be classified as fake like forged or real

by the network and you only take those which are real and no - not adversarial.

Does that make sense? So, you could - you could - you could say that,

okay, but we can also build an adversarial example that,

that fools this network, right?

Whether it's black-box or white-box,

we can just create an adversarial net - example for this network.

It's true. But the issue is that now we have two constraints.

We have to fool the first one and the second one at the same time.

You know, maybe if you fool the first one,

there is a chance that the second one is going to be fooled.

We don't know, okay?

It just makes it more complex.

There is no good defense at this point to - to - to all types of adversarial examples.

This is an option that people are researching.

So, the paper is here if you want to check it out.

Can you guys think of another solution?

[NOISE].

I've got one.

Yeah.

Just like multiple in terms of loss functions [inaudible]

adversarial examples loss functions and train them.

Train on multiple loss functions of different networks?

Yes.

So, you're talking about ensembling.

Maybe we can - maybe we can create five networks to do our tasks,

and it's highly unlikely that the adversarial example is going

to fool the five networks the same way, right?

Any other idea? Yes.

Uh, generate adversarial examples and train on them.

Exactly. Generate adversarial examples and train on those, okay?

So, you will generate a cat image that is adversarial.

So, some pixels have been changed to fool a network.

You will label it as the human sees it.

So as a cat because you want the network to still

see that as a cat and you will train on those.

The downside of that is that it's very costly.

We've seen that generating adversarial examples is super

costly and also we don't know if it can generalize to other adversarial examples.

Maybe we are going to overfit to the ones we have.

So, it is another optimization problem.

Now, another solution is to

train on adversarial examples at the same time as we train on - on normal examples.

So, look at this loss function.

This loss function, the loss mu is a sum of two loss functions.

One is the classic loss function we would use.

So, let's say, cross entropy in the case of a - of

a classification and the second one is

the same loss function but we give it the adversarial version of x.

So, what's the complexity of that at every gradient descent step?

For every iteration of our gradient descent,

we're going to have to iterate enough to forge

an adversarial example at every step, right?

Because we have x, what we wanna do is

forward propagate x through the network to compute the first term,

generate x adversarial with the optimization process and forward propagate

it to calculate the second term and then back propagate over the weights of the network.

This is super costly as well and is very similar to what you said,

it's just done online, all the time, okay?
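
A rough sketch of what one such training step could look like (not from the lecture; the inner adversarial example here is forged with a single gradient-sign step, a cheaper stand-in for the full optimization process described above, and `epsilon` and `lam` are illustrative):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.01, lam=1.0):
    """One update on the combined loss J(x, y) + lam * J(x_adv, y)."""
    # First term: loss on the clean minibatch.
    clean_loss = F.cross_entropy(model(x), y)

    # Forge adversarial versions of x (x is assumed not to require grad).
    x_pert = x.clone().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x + epsilon * x_pert.grad.sign()).detach()

    # Second term: same loss on the adversarial minibatch, then update the weights.
    adv_loss = F.cross_entropy(model(x_adv), y)
    total = clean_loss + lam * adv_loss
    optimizer.zero_grad()          # clears gradients accumulated while forging x_adv
    total.backward()
    optimizer.step()
    return total.item()
```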

So, what is interesting is we're going to delve a little more.

There's another technique called logit pairing, I just put it here.

We're not going to talk about it.

There is the paper here if you want to check it.

It's another way to do adversarial training.

Uh, but what I would like to talk about is more,

from a theoretical perspective,

why are neural networks vulnerable to adversarial examples?

So, let's, let's do some,

some work on the board.

Yeah, one question.

Let's say, uh, so, when you want to expose the [inaudible] probably look like a cat, all right?

So, you expect to be able to [inaudible] can't you just [inaudible] denoise it [inaudible]?

Denoising is also a method that's interesting, but you - so the thing is that it's just like in crypto,

every time you come up with a defense,

someone will come up with an attack and it's a race between humans, you know.

So, this is the same type of problem.

And security problems are open-ended.

Okay. So, let's go over, uh,

something interesting that is more on the ins - on the intuition side of adversarial examples.

So, let me - let me write down something.

Uh, so, one question we ask ourselves

is why do adversarial examples exist? What's the reason?

And Ian Goodfellow and - and his team came up

with - with one of the seminal papers on adversarial examples,

where they argue that although many people in the past have - have

attributed the existence of adversarial examples to

the high non-lineari - non-linearities of neural networks and to overfitting.

So, because we over-fit to a specific dataset,

we actually don't understand what cats are.

We just understand what,

what we've been trained on.

Uh, they argue that it's actually the linear parts of networks that

are the cause of the existence of adversarial examples. So, let's see why.

And the example I'm gonna - I'm gonna look at is linear regression.

So, together we've seen logistic regression.

Linear regression is basically the same thing without the sigmoid.

So, before the sigmoid,

we have y-hat equals wx plus b.

So, the forward propagation of our network is going

to be y-hat equals wx plus b, okay?

And our first example is going to be a six-dimensional input.

Okay. We have a neuron here,

but the neuron doesn't have any activation because we're in linear regression.

So here what happens is simply w x plus b.

Okay? And then we get y-hats.

And we probably use an L1 or L2 loss because it's a regression problem to,

uh, to train this network.

Now let's look at our first example.

Our first example where, uh,

where it's - where we trained our network.

So network has been trained - sorry.

Network has been trained and

converged to

w equals one,

three, minus one, two, two, three.

This is w. And you know, like,

because we defined x to be a vector of size 6,

a column vector, w has to be a row vector of size 6.

So the network converge to this value of w and b equals 0.

So now, we're going to look at these inputs.

We're giving a new input to the network.

And the net - th - the input is going to be one,

minus one, two, zero, three, minus two.

Okay. So I'm going to forward propagate this to get y-hat equals wx plus b.

[NOISE].

And this value is going to be 1 times 1 minus 3

minus 2 plus 0 plus 6 minus 6.

If I didn't make a mistake, up, up,

2 minus 3 plus, okay.

[NOISE] Okay.

And so we - we - we basically get minus 4.

And so this is the - the - the first - the first example that was propagated.

Now, the question is [NOISE] how to change x

into x-star

such that y-hat changes

radically but x-star is close to x?

So this is basically a problem of adversarial examples.

Can we find an example that is very close to

x but radically - radically changes the output of our network?

And we're trying to build intuition on - on adversarial neural networks.

So the interesting part is to - is to identify how we should modify x.

And the intuition comes from the derivative.

If you take the derivative of y-hat with respect to x,

you know that the definition of this term is - is like correlated to

the impact on y-hat of

small changes of x, right?

How - what's the impact of small changes of x to - on - on the output?

And if you compute it, what do you get?

W.

W? Everybody agrees?

What's - what's the shape of this thing?

Shape of that is the same as shape of x.

So it should be w-transpose.

Remember, derivative of a scalar with respect to a vector is the shape of the vector.

Okay. Now it's interesting to - to see this because if we compute x-star to be,

let's say, x plus a small perturbation like,

I will call it, perturbation value.

Can you write bigger?

Yeah. Sorry. And can you see the top one?

Yeah.

You said yes or no?

Yes.

Okay. [NOISE]. So what if x-star equals x plus epsilon times w-transpose?

You know, and this epsilon,

I will call it value of the perturbation.

Now, if we forward propagate x-star,

it means we do y-hat-star equals w x-star plus b,

would be zero at this point.

We're going to get w x plus epsilon w times w-transpose.

And w times w-transpose is a dot product, right?

So this is the same as the norm of w squared.
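
Written out in the board's notation, with b equal to zero,

$$\hat{y}^{*} \;=\; w\,x^{*} \;=\; w\,(x + \epsilon\, w^{\top}) \;=\; \underbrace{w\,x}_{\hat{y}} \;+\; \epsilon\, w\,w^{\top} \;=\; \hat{y} + \epsilon\,\lVert w \rVert_2^{2}.$$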

So what is interesting?

It's interesting because the -

the smart part is that this term is always going to be positive.

It means we - we moved x a little bit, because we

can make this change small by setting epsilon to a small value.

But it's going to push y-hat to a larger value for sure. You know?

And if I had a minus here instead of a plus,

it will push y-hat to a smaller value.

And the - the interesting thing is, now,

if we compute x-star to be x plus epsilon times w-transpose,

and we take epsilon to be a small value like, let's say, 0.2.

You can make the calculation.

What we get is - is this.

So 1, minus 1, 2, 0,

3, minus 2, plus 0.2 times 1,

0.2 times 3, minus 0.2,

plus 0.4, plus 0.4, and plus 0.6.

So if you look at that,

all the positive values have been pushed on the right. You agree?

And all the negative values - uh, sorry, sorry.

No, that's my bad. No, no, that's not it.

So let - let's finish the calculation and I'll give the insight after.

1.2, minus 0.4,

1.8, 0.4, 3.4, and minus 1.4.

So this is our x-star that we hope to be adversarial.

Okay. Let's compute y-hat-star to see what happens.

It's w x-star plus b, which is zero.

So what we get when we multiply w by x-star is 1.2 -

[NOISE]

1.2 minus 1.2,

minus 1.8 plus 0.8

plus 6.8 and minus 4.2.

[NOISE], which is going to give us 1.6.

All right.

So we see that a very slight change from x to x-star has pushed y-hat from minus 4 to 1.6.
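
A quick NumPy check of the numbers on the board:

```python
import numpy as np

w = np.array([1., 3., -1., 2., 2., 3.])    # trained weights, b = 0
x = np.array([1., -1., 2., 0., 3., -2.])   # original input
eps = 0.2

y_hat = w @ x                              # -4.0
x_star = x + eps * w                       # [1.2, -0.4, 1.8, 0.4, 3.4, -1.4]
y_hat_star = w @ x_star                    # -4.0 + 0.2 * ||w||^2 = -4.0 + 5.6 = 1.6
```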

And so a few things we want to notice here.

[NOISE].

So insights on this - on this small example.

The first one is that, uh,

if W is large,

then X star is not similar to X, right?

The larger the W, the less X star is - is likely to be like X.

And specifically, if one entry of W is very large,

X_i, the pixel corresponding to this entry, is going to be very different from X_i star.

Um, if W is large,

X star is going to be different than X.

So what we're going to do is that we are going to take

sign - sign of W instead of taking W. What's the reason why we do that?

Because the interesting part is the sign of - of the W. It means,

if we play correctly with the sign of W,

we will always push the X,

this term W x-star, in the positive direction.

Because every entry here,

this multiplication is going to give us a positive number, right?

And the second insight is that as X grows in dimension,

the impact of plus epsilon sign of W increases.

Does that make sense? So the impact

of sign of W on Y hat increases.

And so what's interesting to notice is that we can keep epsilon as small as possible.

It means X and X star will be very similar but as we grow in dimension,

we're going to get more terms in this sum, a lot more terms.

And the change in Y hat is going to grow and grow and grow and grow and grow.

And so the one reason why adversarial examples

exist for images is because the dimension is very high,

64 by 64 by three.

So we can make epsilon very small and take the sign of W,

we will still get Y hat to be far from the original value that it had.

Does that make sense? Yeah. Do you guys have any questions on that?

So epsilon doesn't grow with the dimension,

but the impact of this term increases with the dimension.

[NOISE] Okay.

[NOISE].

The one hot encoder changes what into what? So you have the input image cat, right?

Yeah.

It puts it right between these two that gives [inaudible].

Okay. So you like - you try to unadversarially [inaudible] the cat?

Yeah.

Yeah. I - I don't know if that had been done.

I don't think that has been done.

So you're talking about taking an encoder that takes the adversarial example,

convert it into a normal image of the cat and then give the cat.

Yeah.

Maybe yeah. I don't know.

So it's a topic of research.

Uh, okay, let's move on because we don't have too much time.

So just to conclude,

what we're going to count as

a general way to generate adversarial examples is this formula.

[NOISE]

This is going

to be a fast way to generate adversarial example.

So this method is called the fas - Fast Gradient Sign Method.

So basically what we're doing is that we can - we - we are linearizing

the cost function in - in the proximity of, uh, the parameters.

And we're saying that what's applied to linear networks here

is going to also apply for this general formula for deeper networks.

So we're pushing the pixel images in one direction

that is going to impact highly the output, okay?

So that's the intuition behind it.
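
A minimal PyTorch sketch of the Fast Gradient Sign Method as just described (the model, inputs, labels, and epsilon are placeholders): the adversarial image is x plus epsilon times the sign of the gradient of the cost with respect to x.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.01):
    """x_adv = x + epsilon * sign(dJ/dx), computed in a single step."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # J(theta, x, y)
    loss.backward()                        # gradient with respect to the input pixels
    return (x + epsilon * x.grad.sign()).detach()
```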

Now you might say that, okay,

we did this example on the linear network,

but neural networks are not linear,

they are highly non-linear.

In fact, if you look where the research has been going for the past few years,

we are trying to linearize all the behaviors of these neural networks.

With ReLU for example, or with Xavier initialization.

All that type of methods,

even the sigmoid, when we train on sigmoid,

we do all we can to put sigmoid in the linear regime,

because we want fast training.

Okay? And one last thing that I'll mention for

adversarial examples is if I have a network like this.

[NOISE]

So fully connected

with three-dimensional inputs, up, yeah.

And then one here and then the output.

What's interesting is that computing the chain rule on - on - on this neuron

will give you that the derivative of the loss function with respect to, let's say,

X is equal to the derivative of the loss function with respect to Z one, one,

here, times the derivative of Z one,

one with respect to X.

Let's say we're - we're going - we're going,

there is actually a summation here.

But anyway. Uh, just let me illustrate the point.

Uh, what we're - what we're saying is that - what we're - what we

try to do with neural networks is to have this gradient be high.

Because if this gradient is not high,

we're not able to train the parameters of

this neuron and we need this gradient to be high.

Because if you want to do the same thing with the - with W one,

one, which is the parameter related to this neuron,

you would need to go through this chain rule.

Correct? So we need this gradient to be high.

And if this gradient is high,

the gradient with respect to the input is also going to be high.

Because you use the same gradient in the chain rule.
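
In symbols, for this first-layer neuron with pre-activation z one, one equal to its weights times the input plus its bias, and ignoring the summation over the other neurons as the lecture does,

$$\frac{\partial \mathcal{L}}{\partial x} \;=\; \frac{\partial \mathcal{L}}{\partial z_{1,1}}\,\frac{\partial z_{1,1}}{\partial x} \;=\; \frac{\partial \mathcal{L}}{\partial z_{1,1}}\,w_{1,1}, \qquad \frac{\partial \mathcal{L}}{\partial w_{1,1}} \;=\; \frac{\partial \mathcal{L}}{\partial z_{1,1}}\,x,$$

so the same factor that has to be large to train this neuron's weights also makes the gradient with respect to the input large.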

So networks that are - that have

high gradients and that are operating in the linear regime are even more,

uh, vulnerable to adversarial examples because of this observation.

So any question on, on adversarial examples?

Before we move on, I think we don't have time and I would like to,

to go over the, the GANs with you guys.

So let's move on to GANs.

I'll stick around to answer questions on that part.

So the general question we're asking now is,

uh, do neural networks understand the data?

Because we've seen that some,

some data points look like they would be real,

uh, but the neural networks don't understand it.

So more generally, uh, can we build generative networks that

can mimic the real-world distribution of images?

Let's say, and this is what we will call generative adversarial networks.

We'll start by motivating it,

and then we look at something called the minimax game between two networks,

a generator and a discriminator,

that are going to help each other improve,

and finally we'll see that GANs are hard to train, uh,

we'll see some tips to train them, and finally,

go over some nice results and methods to evaluate GANs, okay?

So, uh, the motivation behind generative adversarial networks is to endow

computers with an understanding of our world, okay?

So by, by that we mean that we want to collect a lot of data,

use it to train a model that can generate

images that look like they're real even if they're not,

so a dog that has never existed can be generated by this network.

Um, and finally, uh,

the number of parameters of the model, uh,

is smaller than the amount of data,

we already talked about that,

and this is the intuition behind why a generative network can exist.

It's because there is too much data in the world,

any image counts as data for the generative network,

and there are not enough parameters to mimic this data.

You know, you have - the network needs to understand the salient features of the dataset,

because it doesn't have enough parameters to overfit everything.

So let's talk about probability distributions.

So these are samples from real images that have been taken,

and if you plot this real data distribution in a 2-D map,

uh, it would look like something like that.

I made it up, but this is

the image space, similar to what we talked about with adversarial examples,

and this green shape is the space of real-world images.

Now, uh, if you train a generator and generate some images that look like this,

and these images come from StackGAN, uh, from Zhang et al.

Uh, this distribution, if the generator is not good,

is not going to match the real world distribution.

So our goal here is to do something so

that the red distribution matches the real-world distribution,

then to train the network so that it realizes what we want.

So this is our generator and it's what counts,

it's what, what we want to train ultimately.

We want to give it, let's say,

a random number or a random latent code of 100 scalar numbers,

and we want it to output an image.

But of course because it's not trained initially,

it's going to output a random image,

which looks something like that, random pixels.

Now, this image doesn't look very good.

What we want is these images to look like

generated images that are very similar to the real world.

So how are we going to help this generator train?

It's not like what we did in classic supervised learning,

because we don't have,

uh, we don't really have inputs and labels,

you know, there is no label.

We could maybe give it an image of a cat and ask it to output another cat,

but we want the network to be able to output things that don't exist,

things that we've never seen.

Right. So we want the network to understand what a cat

is but not overfit to the cat we give it.

So the way we're going to do it is through

a small game between this network, called the generator G,

and another network called the discriminator D. Let's,

let's look at how it works.

We have a database of real images,

and we're going to start with this distribution on the bottom,

which is the real-world data distribution,

the distribution of the images in this database.

Now our generator has this distribution initially,

it means the pixels that you see here

probably follow a distribution that doesn't match the real world.

We'll define the discriminator D,

and the goal of the discriminator will be to detect if an image is real or not.

So we're going to give several images to this discriminator,

sometimes we will give it generated images,

and sometimes we will give it real-world images.

What we want is that this discriminator is a binary classifier that outputs

one if the image is real and zero if the image was generated, okay?

So let's say we give it x coming from the generator, it is going to give us zero,

because we want the discriminator to detect that x was actually G of z.

If the image came from our database of real images,

we want the discriminator to say one.

So it seems like the discriminator would be easy to train, right?

It's just a binary classification.

We can define a loss function.

That is the binary cross entropy.

And the good thing is we can have as many labels as we want,

like it's, it's unsupervised but a little bit supervised, you know,

we have this database and we label it all as one,

it's just that these images exist,

let's label them as one for the discriminator,

and everything that comes out of the generator, let's label it as zero for the discriminator.

So basically, data is not costly at all at this point.

The way we will train is that we will backpropagate

the gradient to the discriminator to train the discriminator,

using a binary cross entropy.

But what we ultimately want is to train the generator, that's what we want.

At the end, we were not going to use the discriminator,

we just want to generate images.

So we are going to direct the gradient to go back to the generator.

And why can this gradient go back to the generator?

The reason is that x is G of z,

it means we can backpropagate

the gradient all the way back to the input of the discriminator.

But this input depends on the input of the generator if the image was generated.

So we can also backpropagate and direct

the gradient to the generator. Does it make sense?

There is a direct relation between z and the loss function,

in the case where the image was generated.

If the image was real,

then the generator couldn't get the gradient,

because x doesn't depend on z or on the features and parameters of the generator.

Okay? So we would run an algorithm such as Adam,

um, simultaneously on two minibatches,

one for the true data and one for, for the generated data.

Does this scheme make sense to everyone?

Yeah, one question?

So you said there was two minibatches, you're not mixing two and generating it together.

So there are many methods; your question is about mixing the minibatches.

Usually we would use, uh, we would,

we would use one minibatch for the real data and one minibatch for the fake data.

But in, in practice,

you can try other things.

Yeah. So there are many methods that are being tried to train GANs properly.

We're going to delve a little more into

the details of that when we will see the loss functions.

So we hope that the probability distributions will match at the end,

and if they match,

we're going to just take the generator and generate images,

normally it should be able to generate images that look real,

[NOISE] that look like they came from this distribution.

Okay? Sounds good?

So now let's talk more about the training procedure and

try to figure out what the loss functions should be in this case.

What should be the cost of the discriminator?

Assuming, assuming we give two minibatches,

one for real data, so real images,

and one for generated data that come from G [NOISE].

Yes.

The same basic loss function we use for every binary classifier, right?

The same basic loss function we use for the binary class - for the binary classification case.

It's true we're going to tweak it a tiny bit,

but it's the same idea.

So this is what it can look like.

We're going to call it JD,

cost function of the discriminator.

It has two terms. What does the first term say?

What does the second term say?

And you can recognize the binary cross-entropy here.

[NOISE].

The only difference is that we have

a label that is Y_real and a label that is Y_generated.

In practice, Y_real and Y_generated are always going to be set to values.

We know that Y_generated is zero and we know that Y_real is one.

So we can just remove these two terms because they're both equal to one.

The first term is telling us D should correctly

label real data as one, the cross-entropy term.

The first term of a binary cross-entropy.

The second term is going to tell us,

D should correctly label generated data as zero.

So the difference with the classic cross-entropy we've seen is that

the first summation is the summation over the real mini-batch.

And the summation on the second cross-entropy is a summation over the generated mini-batch.

Does that make sense?

So we both want the D to correctly identify real data,

and also correctly identify fake data.

That's why we have two terms.
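
Written out, with m_real and m_gen the sizes of the two minibatches, the discriminator cost being described is

$$J^{(D)} \;=\; -\frac{1}{m_{\text{real}}} \sum_{i=1}^{m_{\text{real}}} \log D\!\big(x^{(i)}\big) \;-\; \frac{1}{m_{\text{gen}}} \sum_{j=1}^{m_{\text{gen}}} \log\!\Big(1 - D\big(G(z^{(j)})\big)\Big),$$

with the first sum over the real minibatch and the second over the generated one.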

Now, what about the generator?

What do you think should be the cost function of the generator? Yes.

So just about that cost function.

If I've been putting data that's from the generator,

I won't run the first pass because I don't have a,

uh, a Y_real if I have the - an input that's coming in from the generator.

Yeah. Exactly.

It's about half of this.

Yeah. But in your batch, we have had, like,

a certain number of real example,

a certain number of generated examples.

The generated examples have no impact on the first cross-entropy,

and same for the real examples on the second cross-entropy. Any other questions?

Okay. So coming back to the cross - to the - to the cost of the generator.

What should it be? This is a tiny bit complicated.

Let's move - let's move on because we don't have too much time.

The cost of the generator basically should say that G should try to

fool D. [NOISE] The goal is for G to generate real samples.

And in order to generate real samples,

we want to fool D. If G manages to fool D and D is very good,

it means G is very good, right?

The problem is that it's a game.

Because if D is bad and G fools D,

it doesn't mean that G is good.

Because G - because D is bad,

it doesn't detect very well the real versus fake examples.

We want D to go up to - to be very good and G to go up at the same time.

Until the equilibrium is reached at a certain point where D will always output one-half,

like, random probabilities because it cannot

distinguish the samples coming from G versus the real samples.

So this cost function is basically saying, uh,

for generated images, we want D to classify them as one.

That's what it's saying. We want to fool D,

okay? Yeah. One question.

Uh, just a little bit of a side question, um, I

can kind of see - so if you're implementing this,

I can kind of see how you would, uh, you know,

implement for D, but how would you implement for G if you're actually implementing this?

Um, is there - has there been a module to train this

because it's not immediately obvious how you do this setup?

So, you know, like, if you - if,

if you're using - so how to implement that?

If you're using a deep learning framework,

you've been building a graph, right?

And at the end of your graph,

you've been building your cost function D that is very close to a binary cross-entropy.

Uh, what you're going to just do is to define a node that is going to be minus

the cost function of D. It's going - every time you are going to call the function J of G,

it's going to run the graph that you define for J of D and run,

uh, an in - an opposite operation. Yeah.

So now you have two different cost functions.

How can they propagate gradients back the same way?

These are two different cost functions.

Propagate gradients back the same way?

Yeah.

We're not going to propagate the same way.

We are going to - to return [OVERLAPPING]

to a minus sign for the grad - for the generator.

So, you know, you - you - you backpropagate on the - on

the - on - on D. And when you backpropagate on G,

you would flip - you would flip the sign. That's all we do.

The same thing with the sign flipped.

In terms of implementation it's just, uh, another operation.

Okay. Now, let's look at someth - something interesting, which is this, uh, log - logarithm.

Let's look at [NOISE] at the graph of the logarithm.

So I'm going to plot against the axes, axis G,

oh sorry, D of G of z.

So what does this mean?

This axis is the output of D when given a generated example, G of z.

It's going to be between zero and one because it's a probability.

D is a binary classifier with a sigmoid, uh, output probably.

Um, if we plot logarithm of X.

So, like, this type of thing.

This would be log of D of G of z.

Does it make sense? That's the logarithm function.

Um, if I plot minus that, minus that.

So let me - let me plot minus logarithm of D of G of z or,

or let me - let me do something else.

Let me plot logarithm of minus D of G of z.

This is it. Do, do you guys agree?

Now, what I'm going to do is that I'm going to plot another function that is this one.

That is logarithm of one minus D of

G of z, okay?

So the question is,

right now, what we're doing is that we're saying the,

the cost function of the generator is logarithm of 1 minus D of G of z.

So it looks like this,

right? It looks like this one.

[NOISE] What's the issue with this one?

What do you think is the issue with this cost function looking at it like that?

It goes to negative infinity?

Sorry.

It goes to negative infinity?

Can you say it louder?

I mean, it go - goes to negative in - infinity.

It goes to negative infinity in,

in one, that's what you mean?

Yeah.

Yeah. And so the, the consequence of that is that

the gradient here is going to be very large,

the closer we go to one.

But the closer we are to zero,

the lower is the gradient.

And it's the reverse phenomenon for this lo - logarithm.

The gradient is very high,

and very high I mean in absolute value.

Very high when we're close to zero,

but it's very low when we go close to one, okay?

So which loss function do you think would be better?

A loss function that looks like this one or a loss function that looks like

this one to train our generator?

The broader question is where are we early in the training?

Are we close to here or are we close to there?

What does it mean to be close there?

Close to one? [NOISE].

You're fooling the network.

Hmm?

You're fooling the network.

You're fooling the network. It means that D thinks that generated,

uh, samples are real.

They're here. This place is the contrary.

D thinks that generated samples are fake.

It means it correctly finds out that they're fake.

Early on, we're generally here.

Because the discriminator is better than the generator.

Generator outputs garbage at the beginning,

and it's very easy for the discriminator to figure out that it's fake

because this garbage looks very different from real world data.

So early on, we're here.

So which function is the best one to - to - to - to - to be our cost?

[inaudible].

Yeah. So probably, this one is better.

So we have to use a mathematical trick to change this into that.

Right. And the mathematical trick is pretty standard.

Right now, we're minimizing something that is in log of one minus X.

We can say that doing so is the same as maximizing something that is in log of X.

Do you agree? Simple flip.

I mean, max flip.

And we can also say that it is the same as minimizing something in minus log of X.

Does it make sense? So we are going to use this mathematical trick

to convert our function that is a saturating cost,

we would say, into a non-saturating cost that is going to look more like this.

Let's see what it looks like.

So to sum up,

our cost function currently looks like that.

It's a saturating cost.

Because early on, the gradients are small.

We cannot train G. We're going to do a flip that I just talked about on the board,

and convert this into another function that is a non-saturating cost.

Okay. Yeah. Well, actually, yeah.

So the reason the blue one looks like that is because I added a minus sign here.

So I'm flipping this.

Okay? And it's the same thing,

it's just the - the sign of the gradient that is going to be different.

Like that, the gradient is high at the beginning and low at the end. That makes sense?

[NOISE] So we're going to do the - use this flip.

And so we have a new training procedure now where J of D

didn't change but J of G changed.

We have a minus sign here and instead of the log of one minus D of G of Z,

we have the log of,

uh, D of G of Z.

Does that make sense to everyone?
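
Side by side, the saturating cost from before and the non-saturating cost after the flip are

$$J^{(G)}_{\text{saturating}} \;=\; \frac{1}{m_{\text{gen}}} \sum_{j} \log\!\Big(1 - D\big(G(z^{(j)})\big)\Big), \qquad J^{(G)}_{\text{non-saturating}} \;=\; -\frac{1}{m_{\text{gen}}} \sum_{j} \log D\big(G(z^{(j)})\big).$$

Both are minimized; early in training, when D of G of z is close to zero, the second one has large gradients while the first one's are tiny.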

Good. And actually, so this is a fun thing

if you - if you check this paper which is really cool, Are GANs

Created Equal, a large-scale

study of many, many different GANs.

It shows what people have tried.

And you can see that people have tried all types of loss to make GANs trainable.

So it looks - it looks complicated here.

But actually, the MM GAN is the first one we saw together.

It's the mini-max loss function.

The second one is the non-saturating one that we just saw.

So you see between the first two.

The only difference is that on the generator,

we get the log of one minus D of X hat becoming log - minus log of D of X hat.

Okay. Now, another trick to train GANs is

to use the fact that D is usually easier to train

than G. But as D improves, G can improve.

If D doesn't improve, G cannot improve.

So you can see the - the - the - the performance

of D as an upper bound to what G can achieve.

Because of that, we will usually train D more time than we will train G.

So we will basically train, for num_iterations:

K times D, one time G. K times D,

one time G, and so on.

So that the discriminator becomes better, then the - the generator can catch up.

Becomes better, then it can catch up,

and so on. Does that make sense?
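
As a rough sketch of that schedule (not from the lecture; the generator, discriminator, data loader, latent dimension, and k are placeholders, the discriminator is assumed to end in a sigmoid, and the losses follow the non-saturating form above):

```python
import torch
import torch.nn.functional as F
from itertools import cycle

def train_gan(G, D, real_loader, z_dim, num_iterations, k=3, lr=2e-4):
    """Alternate k discriminator updates with one generator update."""
    opt_D = torch.optim.Adam(D.parameters(), lr=lr)
    opt_G = torch.optim.Adam(G.parameters(), lr=lr)   # could also use a different lr
    data = cycle(real_loader)                         # assumes the loader yields image batches
    for _ in range(num_iterations):
        for _ in range(k):                            # k steps on D
            x_real = next(data)
            z = torch.randn(x_real.size(0), z_dim)
            x_fake = G(z).detach()                    # do not update G here
            ones = torch.ones(x_real.size(0), 1)
            zeros = torch.zeros(x_real.size(0), 1)
            # D should say 1 on real data and 0 on generated data
            loss_D = F.binary_cross_entropy(D(x_real), ones) + \
                     F.binary_cross_entropy(D(x_fake), zeros)
            opt_D.zero_grad()
            loss_D.backward()
            opt_D.step()
        z = torch.randn(x_real.size(0), z_dim)        # one step on G
        # non-saturating cost: G wants D to output 1 on generated samples
        loss_G = F.binary_cross_entropy(D(G(z)), torch.ones(z.size(0), 1))
        opt_G.zero_grad()
        loss_G.backward()
        opt_G.step()
```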

There are also methods that use, like,

different learning rates for D and G to take this into account,

to train the discriminator faster.

Okay. Uh, because we don't have too much time,

I'm going to skip the BatchNorm with GANs.

We are going to see it probably next week, uh,

together, after you guys have seen the BatchNorm videos.

Okay. It's cool. So just to sum up.

Some - some tips to train GANs is to modify the cost function.

We've seen one modification, there are many more.

Uh, keeping D up-to-date with respect to G. So updating D

more than you update G, using Virtual BatchNorm, which is a derivative of BatchNorm,

so it's a different type of BatchNorm that is used here.

And something called one-sided la - label

smoothing that I'm not going to talk about today because we don't have time.

So let's see some nice results now,

and that's the funniest part.

Um, so some of you have worked with word embeddings,

and you - you might know that word embeddings

are vectors that can encode the meaning of a word.

And you can compute operations sometimes on these - on these words.

So if you take, um,

if you take king minus queen,

it should be equal to man minus woman.

Operations like that.

That happens in the encoding space. So here's the thing.

You can use a generator to generate faces,

and the paper is listed on the bottom here.

So you give a code that is a random code and it will give you an image of a - a face.

You can give it a second code,

it's going to give you a second image that is

different from the first one because the code was different.

You can give it a third one,

it's going to give you a third fa - third face.

The fun part is,

if you take code one minus code two plus code three.

So basically, image of a man with glasses minus image of

a man plus image of a woman will give you an image of a woman with glasses.

So [OVERLAPPING].

So this is interesting because it means that linear operation in

the latent space of codes have impact directly on the image space.

Okay. Let's look at something even better.

So you can use GANs for image generation.

Of course, these are very nice samples.

You see that sometimes,

GANs have problems with - with the - [LAUGHTER] I don't know. I don't think that's a dog.

But - but - but these

are from StackGAN++, which is a - is a very impressive GAN

that has generated - that has been state of the art for a long time.

Okay. So let's see something fun.

Something called image-to-image translation.

So, uh, actually, the - the -

the project winners last quarter in Spring was a project dealing with exactly that.

Generating satellite images based on the map image.

So given the map image, generate the satellite image using a GAN.

So you see that instead of giving a latent code that was 100 dimensional,

you could give a very detailed code.

The code can be this image.

Right? And you have to find a way to constrain your network in a certain - with - in

a certain way to push it to output

exactly the satellite image that corresponded to this map image.

There are many other results that are fun.

Converting zebras to - horses to zebras and zebras to horses.

Um, and apples to oranges and oranges to apples.

So let's do a - a case study together.

Let's say our goal is to convert horses to zebras on images and vice versa.

Can you tell me what data we need?

Let's go quickly so that we have some time.

Horses and zebras?

Yeah. Horses and zebras.

Do you need paired images?

You know, like, do you need to have the same image of a horse as a zebra?

No.

Yeah. So the problem is, uh, okay,

we could have labeled images, you know,

like uh, a horse and its,

uh, zebra doppelganger in the same position.

Uh, and we could train a network to take one and output the other.

Unfortunately, not every horse has

a doppelganger that is a zebra, so we cannot do that.

Uh, so instead, we're going to do unpaired,

uh, unpaired generative adversarial networks.

It means we have a database of horses and a database of zebras.

But these are different horses and different zebras.

They're not one-to-one - there's no one-to-one mapping between them.

There's no mapping at all. What architecture do you wanna use?

GAN?

Nice.

[LAUGHTER] GAN, not a [inaudible].

Okay. So let's see about the architecture and the cost.

So I'm going over it very quickly because it's a -

it's a very fun GAN - it's called CycleGAN.

So the way we are going to work it out is we have a horse called

capital H. We want to generate the zebra version of this horse, right.

So we give it to a generator that we call G1.

You can call it H2Z,

like horse to zebra.

It should give us this horse H as a zebra, right.

And in fact, if we're training a GAN,

we need a discriminator.

So, we will add a discriminator that is going to be a binary classifier to tell us

if this image outputted by Generator 1 is real or not.

So this discriminator is going to take in some images of zebras, probably,

or - yeah, zebras or horses [NOISE],

and it's also going to take the generated images

and see which one is fake and which one is real.

On the other hand, we're going to do - and the vice versa is very important.

We need to enforce the fact that this horse G1 of H

should be the same horse as H. In order to do that,

we're going to create another gen - generator which is going to take the generated image,

and generate back the input image.

And this is where we will be able to enforce the constraint that G2 of G1 of

H should be equal to H. Do you see why this loop is super important?

Because if we don't have this loop,

we don't have a constraint enforcing that the generated zebra

should be the horse as a zebra,

the same horse as H. So we'll do that and

we have a second discriminator to decide if this image is real.

This is one step, H2Z.

Another step might be Z2H, where we start with a zebra,

give it to Generator 2,

generate the horse version of the zebra.

Discriminate, generate back the zebra version

of that horse, and discriminate. Does that make sense?

So this is the general pattern using CycleGANs.
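As a minimal sketch of how the four pieces on the board wire together (the class and method names below are my own; the losses come next):

```python
import torch.nn as nn

class CycleGANPattern(nn.Module):
    """Wires up the four modules from the board: G1 (H2Z), G2 (Z2H), D1, D2."""

    def __init__(self, G1, G2, D1, D2):
        super().__init__()
        self.G1, self.G2, self.D1, self.D2 = G1, G2, D1, D2

    def h2z_pass(self, horse):
        fake_zebra = self.G1(horse)          # the horse rendered as a zebra
        realness = self.D1(fake_zebra)       # D1: does this look like a real zebra?
        reconstructed = self.G2(fake_zebra)  # loop back -- should match the input horse
        return fake_zebra, realness, reconstructed

    def z2h_pass(self, zebra):
        fake_horse = self.G2(zebra)          # the zebra rendered as a horse
        realness = self.D2(fake_horse)       # D2: does this look like a real horse?
        reconstructed = self.G1(fake_horse)  # loop back -- should match the input zebra
        return fake_horse, realness, reconstructed
```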

And what I'd like to go over is what loss should we minimize in

order to enforce the fact that we want

the horse to be converted to a zebra that is the same as the horse.

Can someone give me the terms that we need?

Someone wants to give it a try?

Go for it. Two minutes. Yes.

So you want to make sure that the picture at

the end matches the zebra that you started off with,

or that the horse you end up with

matches the horse that you had originally.

Okay.

But at the same time, you also need to have Discriminator 2

identifying that the image is a real zebra or a real horse -

Yeah.

- because you don't want it to just sort of take

in the sample image and output the same sample image back to you.

Yeah, correct.

So I think you'd want to add the output of the cost function for Discriminator

2 to the cost that you get from comparing the starting images.

Okay, that's correct. So you're saying we need

the classic cost functions that we've seen previously,

plus another one that is the matching between H and G2 of G1 of H,

and Z and G1 of G2 of Z.

Yes.

Correct. So we have all these terms.

One term to train D1,

which is the classic term we've seen,

differentiate real images from generated images.

G1 is what? The same: we are using the non-saturating cost on generated images.

Same for D2. Same for G2. These are classics.

The one we need to add to all of this is

the cycle cost, which is the distance between this term,

G2 of G1 of H and H,

and the same thing for zebras.

Does that make sense? So you have the intuition to build that type of loss.

We just sum everything and it gives us the cost function we're looking for. Yeah.
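Written out, the summed objective he describes looks roughly like this, with lambda weighting the cycle term (the CycleGAN paper itself uses a least-squares variant of the adversarial losses, but the structure is the same):

\[
\mathcal{L}(G_1, G_2, D_1, D_2) =
\mathcal{L}_{\mathrm{GAN}}(G_1, D_1) + \mathcal{L}_{\mathrm{GAN}}(G_2, D_2)
+ \lambda \Big( \mathbb{E}_{H}\,\big\| G_2(G_1(H)) - H \big\|_1
+ \mathbb{E}_{Z}\,\big\| G_1(G_2(Z)) - Z \big\|_1 \Big)
\]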

Can we use the same,

uh, D1 as D2?

It's the same [inaudible] recognized [inaudible]

Oh, the same cost function for D1 and D2?

Yeah. Could you use the same -

So, the, the - you could but it's not going to work that well.

I think - So I think there's a - there's a tiny mistake here,

is that, uh, the Zi here,

the small Zi should be small Hi,

and the small Hi on top should be a small Zi.

Because the Discriminator 1 is going to receive

generated samples that look like zebras because it came out of G1.

So you want the real database that you give it to be zebras as well.

To force Generator 1 to output things that look like zebras,

and vice versa for the second one.

Okay? And this is my favorite.

So you can convert the ramen to a face and back to a ramen.

[LAUGHTER] It's the most fun application I found.

It's from Naritomi et al, and Takuya Tako.

So a Japanese research lab is working hard

to do face2ramen [LAUGHTER].

And actually, in two - in two to three weeks,

you will learn, um,

object detection, you know, to detect faces.

And if you learn that, maybe you can start a project to like

detect the face and then replace it by a ramen.

[LAUGHTER] Because I don't know, this is also a funny,

funny work by Naritomi.

Okay. Oh, this is a super cool application as well.

So let's look at that.

Okay. So we have - so this model is a conditional GAN that was conditioned on edges,

um, learning to generate cats based on the edges.

So I'm gonna - I'm gonna try to draw a cat.

[LAUGHTER] Okay, sorry.

I cannot see [LAUGHTER].

Again, I'm not a good drawer - [LAUGHTER]. It's a cat.

Okay. It's going to download the model.

I hope it's gonna work. [LAUGHTER] Okay.

Yeah,

I,

I don't think it worked,

but it's supposed to work.

So you can generate cats based on,

on edges and you can do it for different things.

You can do it for a shoe.

So all these models have been trained for that. Okay.

Yeah, I have a question.

Yes, go for it.

[NOISE] So, so for this model,

would you have to train specific models for the things that you want it to generate?

Like two things, so cats and shoes in this case?

Uh, sorry. Can you repeat?

Is it generalizable or do you have to train it specifically for the domains?

You have to train it specifically for the domain.

So like these models are different models that have [NOISE] been trained.

Okay.

Okay. I'm looking for my presentation,

[NOISE] I missed it. The presentation disappeared.

Okay. Another application is super resolution.

You can give a lower resolution image and generate

the super resolution version of it using GANs.

And this is pretty cool because you can get,

uh, a high-resolution image,

down-sample it, and use this pair in the minimax game, you know.

[NOISE] Like you have

the high-resolution version of the lower-resolution image.
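A minimal sketch of building those training pairs, assuming you start from a batch of high-resolution images (the 4x factor and the bicubic down-sampling are illustrative choices):

```python
import torch.nn.functional as F

def make_sr_pair(hr, factor=4):
    """Down-sample a high-resolution batch into a (low-res input, high-res target) pair.

    hr: tensor of shape (batch, channels, H, W). The generator learns to map the
    low-res input back up, while a discriminator judges whether the result looks
    like a real high-resolution photo -- that's the minimax game.
    """
    lr = F.interpolate(hr, scale_factor=1 / factor, mode="bicubic", align_corners=False)
    return lr, hr
```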

Um, other applications can be privacy-preserving.

So some people have been working on - you know in medical - uh,

in the medical space privacy is a huge issue.

You cannot share a dataset among hospitals or

among medical teams - it's a common issue -

so people have been looking at generating a dataset that looks like a medical dataset.

If you train a model on this dataset,

it's going to give you the same type of parameters as the other one,

but this dataset is anonymized.

So they can share the anonymized data with each

other and train their model on that, without

being able to access, uh,

the information of the patient and who it is.

Um, manufacturing is important as well,

so GANs can generate, um,

very specific, uh, objects that can replace bones for humans,

personalized to, to the human body.

So same for dental.

If you lose a tooth, uh,

the technician can take a picture and decide what

the crown should look like.

The GAN can generate it.

Um, another topic is how to evaluate GANs, you know.

Um, you might say we can just look at the images and see if they

look real and it will give us an idea if the GAN is working well.

In fact, this is hard because maybe the images you're looking at are overfitted copies of

the real samples you gave to the - to the discriminator.

Uh, so how do you check that?

It's very complicated.

So human annotation is a big one,

where you would, uh,

[NOISE] you would build a piece of software,

push it to the cloud, and people all around the world are

gonna select which images look generated,

which images don't look generated, to see whether a human can, can,

can tell your GAN's samples apart from real-world data,

and how your GAN performs.

So it would look like that.

A web app where you indicate which image is fake and which image is real.

You can - you can do different experiments like you can show very quickly

an image for a fraction of a second and ask them was it real or not,

or you can give them unlimited time.

Different experiments can be led.

Uh, there is another one that is more scalable because human annotation is very painful.

You know, every time you train a GAN,

you want to do that to verify if the GAN is working well. It takes a lot of time.

So instead of using humans,

why don't we use a very good network that is good at classification?

In fact, the Inception network is a tremendous network that does classification.

We're going to give our image samples to

this Inception network and see what the network thinks of this image.

Does it think that it's a dog or not?

Does it look like a dog for the network or not?

And we can scale it and make it very quick.

And there is an Inception score that,

that we can talk about next week when we'll have time.

Uh, it measures the quality of

the samples and also it measures the diversity of the samples.

I'll go over it next week, hopefully.
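For reference, the usual definition runs each generated sample x through the Inception classifier to get class probabilities p(y|x) and compares them to the marginal distribution p(y) over all generated samples:

\[
\mathrm{IS} = \exp\!\Big( \mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big( p(y \mid x)\,\big\|\, p(y) \big) \Big)
\]

A high score requires each sample to be classified confidently (quality) and the samples as a whole to spread over many classes (diversity), which are exactly the two properties mentioned above.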

Uh, there is another distance that is very popular, uh,

that has been increasingly popular recently, called the Fréchet Inception Distance.
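For reference, the Fréchet Inception Distance fits Gaussians to Inception features of the real and generated samples, with means mu_r, mu_g and covariances Sigma_r, Sigma_g, and computes (lower is better):

\[
\mathrm{FID} = \big\| \mu_r - \mu_g \big\|_2^2
+ \operatorname{Tr}\!\big( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \big)
\]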

And I, I - I'll advise you to check some of

these papers if you're more interested in it for your projects.

So just to end, um, for next Wednesday,

we'll have, uh, C2M3 and also the whole C3 modules.

[NOISE] Uh, you'll have three quizzes.

Be careful, these two quizzes,

C3M1 and C3M2, are longer than normal quizzes.

They're like wide case studies, so take your time,

and go over them, um,

and you have one programming assignment.

Uh, make sure you understand the BatchNorm videos,

so that we can go over the virtual BatchNorm hopefully next week together.

Um, and hands-on section this Friday, uh,

you will receive your project proposal as soon as possible, uh,

and meet with your project TAs to go over the proposal and

to make decisions regarding the next steps for your projects.

Uh, I'll stick around in case you have any questions. Okay. Thanks, guys.


知识点

重点词汇
infinite [ˈɪnfɪnət] n. 无限;[数] 无穷大;无限的东西(如空间,时间) adj. 无限的,无穷的;无数的;极大的 {cet4 cet6 ky ielts :6045}

gauge [ɡeɪdʒ] n. 计量器;标准尺寸;容量规格 vt. 测量;估计;给…定规格 {cet4 cet6 ky toefl ielts gre :6046}

detection [dɪˈtekʃn] n. 侦查,探测;发觉,发现;察觉 {cet4 cet6 gre :6133}

attacker [əˈtækə(r)] n. 攻击者;进攻者 { :6197}

download [ˌdaʊnˈləʊd] vt. [计] 下载 {gk :6382}

radically ['rædɪklɪ] adv. 根本上;彻底地;以激进的方式 {ky toefl :6472}

discriminate [dɪˈskrɪmɪneɪt] vi. 区别;辨别 vt. 歧视;区别;辨别 {ky toefl ielts gre :6572}

proximity [prɒkˈsɪməti] n. 接近,[数]邻近;接近;接近度,距离;亲近 {cet6 toefl ielts :6588}

overlapping [əʊvə'læpɪŋ] adj. 重叠;覆盖 v. 与…重叠;盖过(overlap的ing形式) {toefl :6707}

overlap [ˌəʊvəˈlæp] n. 重叠;重复 vi. 部分重叠;部分的同时发生 vt. 与…重叠;与…同时发生 {cet6 ky toefl ielts gre :6707}

unlimited [ʌnˈlɪmɪtɪd] adj. 无限制的;无限量的;无条件的 {cet6 :6742}

MA [mɑ:] abbr. 文学硕士(Master of Arts);磁放大器(magnetic amplifier);主报警信号(main alarm) { :6756}

inaudible [ɪnˈɔ:dəbl] adj. 听不见的;不可闻的 { :6808}

algorithm [ˈælgərɪðəm] n. [计][数] 算法,运算法则 { :6819}

algorithms [ˈælɡəriðəmz] n. [计][数] 算法;算法式(algorithm的复数) { :6819}

mimic [ˈmɪmɪk] vt. 模仿,摹拟 n. 效颦者,模仿者;仿制品;小丑 adj. 模仿的,模拟的;假装的 {toefl ielts gre :6833}

plucked [plʌkt] [纺] 粗细不匀 { :6870}

dental [ˈdentl] n. 齿音 adj. 牙科的;牙齿的,牙的 {ky toefl :7161}

vice [vaɪs] prep. 代替 n. 恶习;缺点;[机] 老虎钳;卖淫 adj. 副的;代替的 vt. 钳住 n. (Vice)人名;(塞)维采 {gk cet4 cet6 ky ielts :7210}

latent [ˈleɪtnt] adj. 潜在的;潜伏的;隐藏的 {cet6 ky toefl ielts gre :7284}

numerical [nju:ˈmerɪkl] adj. 数值的;数字的;用数字表示的(等于numeric) {cet6 ky toefl ielts :7312}

gradient [ˈgreɪdiənt] n. [数][物] 梯度;坡度;倾斜度 adj. 倾斜的;步行的 {cet6 toefl :7370}

gradients [ˈgreɪdi:ənts] n. 渐变,[数][物] 梯度(gradient复数形式) { :7370}

binary [ˈbaɪnəri] adj. [数] 二进制的;二元的,二态的 { :7467}

Et ['i:ti:] conj. (拉丁语)和(等于and) { :7820}

compute [kəmˈpju:t] n. 计算;估计;推断 vt. 计算;估算;用计算机计算 vi. 计算;估算;推断 {cet4 cet6 ky toefl ielts :7824}

intuition [ˌɪntjuˈɪʃn] n. 直觉;直觉力;直觉的知识 {cet6 ky toefl ielts gre :7905}

conditional [kənˈdɪʃənl] n. 条件句;条件语 adj. 有条件的;假定的 { :8076}

converged [kən'vɜ:dʒd] v. 聚集,使会聚(converge的过去式) adj. 收敛的;聚合的 { :8179}

converge [kənˈvɜ:dʒ] vt. 使汇聚 vi. 聚集;靠拢;收敛 {cet6 toefl ielts gre :8179}

encode [ɪnˈkəʊd] vt. (将文字材料)译成密码;编码,编制成计算机语言 { :8299}

encoding [ɪn'kəʊdɪŋ] n. [计] 编码 v. [计] 编码(encode的ing形式) { :8299}

validation [ˌvælɪ'deɪʃn] n. 确认;批准;生效 { :8314}

downside [ˈdaʊnsaɪd] n. 负面,缺点;下降趋势;底侧 adj. 底侧的 { :8709}

implicitly [ɪm'plɪsɪtlɪ] adv. 含蓄地;暗中地 { :8775}

quiz [kwɪz] n. 考查;恶作剧;课堂测验 vt. 挖苦;张望;对…进行测验 {gk cet4 cet6 ky :8784}

loo [lu:] n. 厕所,洗手间;赌金;卢牌戏(一种纸牌赌博) vt. 使罚赌金 n. (Loo)人名;(德、法)洛 { :8889}

derivative [dɪˈrɪvətɪv] n. [化学] 衍生物,派生物;导数 adj. 派生的;引出的 {toefl gre :9140}

neural [ˈnjʊərəl] adj. 神经的;神经系统的;背的;神经中枢的 n. (Neural)人名;(捷)诺伊拉尔 { :9310}

activations [,æktɪ'veɪʃən] n. [电子][物] 激活;活化作用 { :9314}

activation [ˌæktɪ'veɪʃn] n. [电子][物] 激活;活化作用 { :9314}

neuron [ˈnjʊərɒn] n. [解剖] 神经元,神经单位 {cet6 toefl :9397}

salient [ˈseɪliənt] n. 凸角;突出部分 adj. 显著的;突出的;跳跃的 n. (Salient)人名;(西)萨连特 {toefl gre :9408}

en [en] n. 半方;字母N prep. 在…中 n. (En)人名;(芬、柬)恩 { :9798}

metric [ˈmetrɪk] adj. 公制的;米制的;公尺的 n. 度量标准 {cet4 cet6 ky ielts :10163}

propagate [ˈprɒpəgeɪt] vt. 传播;传送;繁殖;宣传 vi. 繁殖;增殖 {cet6 toefl ielts gre :10193}

propagated [ˈprɔpəɡeitid] 传播 { :10193}

inception [ɪnˈsepʃn] n. 起初;获得学位 n. 《盗梦空间》(电影名) {gre :10325}

grad [græd] n. 毕业生;校友 n. (Grad)人名;(英、法、德、罗、瑞典)格拉德 { :10355}

pixels ['pɪksəl] n. [电子] 像素;像素点(pixel的复数) { :10356}

pixel [ˈpɪksl] n. (显示器或电视机图象的)像素(等于picture element) { :10356}

generalize [ˈdʒenrəlaɪz] vi. 形成概念 vt. 概括;推广;使...一般化 {cet6 ky toefl ielts gre :10707}

tweak [twi:k] n. 扭;拧;焦急 vt. 扭;用力拉;开足马力 { :10855}

wha [ ] [医][=warmed,humidified air]温暖、潮湿的空气 { :11046}

saturating [ˈsætʃəreitɪŋ] v. 浸湿,浸透( saturate的现在分词 ); 使…大量吸收或充满某物 { :11157}

infinity [ɪnˈfɪnəti] n. 无穷;无限大;无限距 {cet6 gre :11224}

delve [delv] n. 穴;洞 vi. 钻研;探究;挖 vt. 钻研;探究;挖 n. (Delve)人名;(英)德尔夫 {gre :11237}

axes [ˈæksi:z] n. 轴线;轴心;坐标轴;斧头(axe的复数) { :11322}

malicious [məˈlɪʃəs] adj. 恶意的;恶毒的;蓄意的;怀恨的 {cet6 toefl gre :11330}

washer [ˈwɒʃə(r)] n. [机] 垫圈;洗涤器;洗衣人 { :11379}

seminal [ˈsemɪnl] adj. 种子的;精液的;生殖的 adj. 有创造力的,对未来有影响的;重大的 {gre :11387}

optimize [ˈɒptɪmaɪz] vt. 使最优化,使完善 vi. 优化;持乐观态度 {ky :11612}

propagation [ˌprɒpə'ɡeɪʃn] n. 传播;繁殖;增殖 {cet6 gre :12741}

multiplication [ˌmʌltɪplɪˈkeɪʃn] n. [数] 乘法;增加 {cet6 :12748}

personalized [ˈpəːs(ə)n(ə)lʌɪzd] adj. 个性化的;个人化的 v. 个性化(personalize的过去式);个人化 { :13175}

buggy [ˈbʌgi] n. 童车;双轮单座轻马车 adj. 多虫的 {toefl gre :13418}

entropy [ˈentrəpi] n. [热] 熵(热力学函数) { :13494}

blurry [ˈblɜ:ri] adj. 模糊的;污脏的;不清楚的 { :13819}

zebra [ˈzebrə] n. [脊椎] 斑马 adj. 有斑纹的 {zk gk cet4 cet6 ky :13912}

zebras ['zɪbrəz] n. 斑马( zebra的名词复数 ) { :13912}

logistic [lə'dʒɪstɪkl] adj. 后勤学的;[数] 符号逻辑的 { :14538}

dimensional [dɪ'menʃənəl] adj. 空间的;尺寸的 {toefl :15066}

adversarial [ˌædvəˈseəriəl] adj. 对抗的;对手的,敌手的 { :15137}

transferable [trænsˈfɜ:rəbl] adj. 可转让的;[数] 可转移的 { :16039}

APP [æp] abbr. 应用(Application);穿甲试验(Armor Piercing Proof) n. (App)人名;(英)阿普 { :16510}

mu [mju:] n. 希腊语的第12个字母;微米 n. (Mu)人名;(中)茉(广东话·威妥玛) { :16619}

optimization [ˌɒptɪmaɪ'zeɪʃən] n. 最佳化,最优化 {gre :16923}

firewall [ˈfaɪəwɔ:l] n. 防火墙 vt. 用作防火墙 { :17087}

iteration [ˌɪtəˈreɪʃn] n. [数] 迭代;反复;重复 { :17595}

summation [sʌˈmeɪʃn] n. 和;[生理] 总和;合计 {gre :17935}

dataset ['deɪtəset] n. 资料组 { :18096}

permutations [pɜ:mju:'teɪʃnz] n. [数] 排列(permutation的复数) { :18648}

iguana [ɪˈgwɑ:nə] n. 鬣蜥蜴 { :18852}

BOT [bɒt] n. 马蝇幼虫,马蝇 n. (Bot)人名;(俄、荷、罗、匈)博特;(法)博 { :18864}


难点词汇
unsupervised [ˌʌn'sju:pəvaɪzd] adj. 无人监督的;无人管理的 { :19787}

perturbation [ˌpɜ:təˈbeɪʃn] n. [数][天] 摄动;不安;扰乱 { :19948}

rephrased [ri:ˈfreizd] v. 改述,改撰( rephrase的过去式和过去分词 ) { :20756}

rephrase [ˌri:ˈfreɪz] vt. 改述;重新措辞 { :20756}

encrypted [inˈkriptid] v. 把…编码;把…加密(encrypt的过去分词) { :21117}

generative [ˈdʒenərətɪv] adj. 生殖的;生产的;有生殖力的;有生产力的 { :21588}

epsilon [ˈepsɪlɒn] n. 希腊语字母之第五字 { :22651}

annotation [ˌænə'teɪʃn] n. 注释;注解;释文 { :22939}

deterministic [dɪˌtɜ:mɪ'nɪstɪk] adj. 确定性的;命运注定论的 { :23481}

unconstrained [ˌʌnkən'streɪnd] adj. 不勉强的;非强迫的;不受约束的 { :23653}

IM [ ] abbr. 感应电动机(Induction Motor) abbr. 即时通信(Instant Messaging) { :24105}

encoder [ɪn'kəʊdə] n. 编码器;译码器 { :24604}

GE [ʒei] abbr. 美国通用电气公司(General Electric Co.);总能量(gross energy) n. (Ge)人名;(朝)揭;(俄)格 { :25836}

xavier ['zʌvɪə] n. 泽维尔(男子名) { :26299}

embeddings [ɪm'bɛd] v. [医] 植入;埋藏(embed的ing形式) { :27523}

logarithm [ˈlɒgərɪðəm] n. [数] 对数 { :27896}

ve [ ] 委内瑞拉 abbr. 虚拟环境(Virtual Environment) { :28191}

transferability [ˌtrænsˌfɜ:rə'bɪlətɪ] n. 可转移性;可转让性 { :28407}

scalar [ˈskeɪlə(r)] n. [数] 标量;[数] 数量 adj. 标量的;数量的;梯状的,分等级的 { :28925}

ramen [rɑmən] n. (方便)拉面,拉面 { :29936}

unpaired ['ʌn'peəd] adj. 不成双的;无对手的;无配偶的 { :30070}

scalable ['skeɪləbl] adj. 可攀登的;可去鳞的;可称量的 { :30540}

thi [ ] abbr. 温度-湿度指数(Temperature Humidity Index) { :30640}

sigmoid ['sɪgmɔɪd] n. 乙状结肠(等于sigmoidal);S状弯曲 adj. 乙状结肠的;C形的;S形的 { :31478}

doppelganger ['dɒpelɡɑ:nɡər] n. 面貌极相似的人;幽灵 { :31488}

discriminator [dɪ'skrɪmɪneɪtə] n. [电子] 鉴别器;辨别者 { :34448}

classifiers [ ] (classifier 的复数) n. 分类者, 分粒器, 分级机, 汉语中的量词 [计] 分类符, 分类器 { :37807}

classifier [ˈklæsɪfaɪə(r)] n. [测][遥感] 分类器; { :37807}

iterate [ˈɪtəreɪt] vt. 迭代;重复;反复说;重做 {toefl gre :38640}

Goodfellow [ ] [人名] [英格兰人姓氏] 古德费洛绰号,意气相投的伙伴,来源于中世纪英语,含义是“好+伙伴”(good+fellow) { :39174}

initialization [ɪˌnɪʃəlaɪ'zeɪʃn] n. [计] 初始化;赋初值 { :40016}

VER [vɜ:] n. DOS命令:显示DOS版本号 { :45633}

iteratively [ ] [计] 迭代的 { :48568}


生僻词
anonymized [ ] 隐去姓名资料 使匿名

backpropagate [ ] [网络] 反向传播

Coursera [ ] [网络] 免费在线大学课程;免费在线大;斯坦福

crypto ['krɪptəʊ] n. 秘密赞同者;秘密党员

denoise [di:'nɔiz] 降噪, 消除干扰

denoising [ ] [网络] 去噪;去噪声;小波去噪

derivate ['derɪveɪt] n. 导数;派生词;派生的事物 adj. 引出的;系出的

ensembling [ ] [网络] 综合

Frechet [ ] [计] F-拓扑

generalizable ['dʒenərəlaɪzəbl] adj. 可概括的,可归纳的

growingly [ ] adv. growing的变形

kurakin [ ] [网络] 库拉金;寄给阿金

linearize ['lɪnɪəraɪz] vt. 使线性化

linearizing ['liniәraiziŋ] 线性化的

logit ['lɒdʒɪt] 分对数

medias [ ] n. (西)丝袜

minibatch [ ] [网络] 迷你

minibatches [ ] [网络] 小批量

minimax ['mɪnəˌmæks] n. 极小化极大,极小极大(极大中的极小),鞍点

oppositive [ә'pɔzitiv] a. 反对的;相反的

outputted ['aʊt.pʊt] n. 产量;产品;【电】发电力;供给量 [网络] 输出;产出;输出量

overfit [ ] [网络] 过拟合;过度拟合;过适应

overfitting [ ] n. 过适;[数] 过度拟合 v. 过适(overfit现在分词)

quizzes [kwiziz] n. 小测验(quiz复数形式);智力比赛 v. 测验;盘问(quiz的第三人称单数形式)

relu [ ] [网络] 关节轴承

resi [ ] [网络] 万鼎工程;菜;树脂

safet [ ] abbr. self-aligned MESFET process 自对准金属-半导体场效应晶体管工艺

softmax [ ] [网络] 柔性最大传递函数;前回收的日志文件的百分比;西风狂诗曲系列篇章

someth [ ] [网络] 官方完整

tako [ ] [地名] [日本] 多古

takuya [ ] [网络] 拓也;寺田拓哉;卓也

trainable [t'reɪnəbl] adj. 可训练的,可教育的

versa ['vɜ:sə] adj. 反的

wx [ ] abbr. weather 天气; weather report 气象报告; watts second 瓦特秒; waxy 蜡(状)的

zhang [ ] n. 张,章(中国姓氏)

zi [,zi 'aɪ] abbr. 美国本土,后方地带(等于zone of interior)


词组
a batch [ei bætʃ] un. 一批 [网络] 同批产品;罩式炉;详细

a dot [ ] [网络] 阿顿;阿突

a flip [ ] [网络] 翻筋斗

A minus [ ] [网络] A减

an algorithm [ ] [网络] 规则系统;运算程式

and vice versa [ ] [网络] 反之亦然;反过来也一样;科技中的政府

AX (axis) [ ] 线

binary classification [ ] 二元分类

classify as [ ] [网络] 归类为;分类为;出库类型

column vector [ ] un. 列向量;列向和;纵矢量 [网络] 列矢量;行向量;向量了

correlate to [ ] [网络] 使相互关联;相关;与…相互关联

cross entropy [ ] 交叉熵

descent algorithm [ ] 下降算法

door mat [ ] un. 门前擦鞋棕垫;蹭鞋胶垫;门口踏脚垫 [网络] 门垫;门前棕垫;乡村小花脚踏垫

dot product [dɔt ˈprɔdʌkt] un. 点积;标量积 [网络] 点乘;数量积;内积

Epsilon sign [ ] 《英汉医学词典》Epsilon sign 艾泼斯龙征

et al [ ] abbr. 以及其他人,等人

et al. [ˌet ˈæl] adv. 以及其他人;表示还有别的名字省略不提 abbr. 等等(尤置于名称后,源自拉丁文 et alii/alia) [网络] 等人;某某等人;出处

et. al [ ] adv. 以及其他人;用在一个名字后面 [网络] 等;等人;等等

flip that [ ] None

forward propagation [ ] 正向传播

generator output [ ] 电机输出[功率]

gradient descent [ ] n. 梯度下降法 [网络] 梯度递减;梯度下降算法;梯度递减的学习法

gradient descent algorithm [ ] [网络] 梯度下降算法;梯度陡降法;梯度衰减原理

gradient sign [ ] 坡度标

high gradient [ ] 高梯度

in the proximity of [ ] na. 在…附近 [网络] 在...附近

latent code [ ] 隐性分类码

linear network [ ] [网络] 线性网络;线性网路;线性神经网络

linear operation [ ] [网络] 线性运算;它并不是个线性操作;线性演算

linear regression [ ] un. 线性回归;直线回归 [网络] 线性回归分析;线性回归法;线性衰退

logarithm function [ ] n.对数函数

logistic regression [loˈdʒɪstɪk rɪˈɡrɛʃən] n. 逻辑回归 [网络] 吉斯回归;逻辑斯回归;罗吉斯回归

maximum probability [ ] un. 最高概率 [网络] 最大概率;最大机率

minus one [ ] [网络] 桃花源;幸福意外;谢谢你捧场

minus sign [ˈmainəs sain] n. 负号 [网络] 减号;减号的故事;负符号

negative infinity [ ] 负无穷大,负无限大

neural network [ˈnjuərəl ˈnetwə:k] n. 神经网络 [网络] 类神经网路;类神经网络;神经元网络

neural networks [ ] na. 【计】模拟脑神经元网络 [网络] 神经网络;类神经网路;神经网络系统

object detection [ ] [科技] 物体检测

optimization problem [ ] un. 最佳化问题 [网络] 最优化问题;次要最佳化问题

optimization problems [ ] [网络] 最佳化问题;最优化问题;最适化问题

optimization process [ ] un. 最优化过程 [网络] 优化历程;最佳化处理

overlap with [ ] vt.与...相一致

pixel image [ˈpiksəl ˈimidʒ] [医]像素显像

pixel value [ ] [网络] 像素值;像素数值;像素单元值

probability distribution [ ] un. 概率分布 [网络] 机率分布;机率分配;确率分布

probability distributions [ ] [网络] 概率分布;学过几率分布;机会率分布

probability of [ ] na. (飞弹不被击落的)概率 [网络] 变异概率

row vector [ ] un. 行向量;单行矩阵;行矢量 [网络] 列向量;列矢量;列向量使用

salient feature [ ] un. 特征 [网络] 特点;鲜明特征

salient features [ ] na. 特点 [网络] 特征;特色;突出特点

small perturbation [ ] 小微扰

test validation [ ] [网络] 测验效度

the algorithm [ ] [网络] 算法

the ax [ ] 斧子

the downside [ ] [网络] 不利方面;缺点

the equilibrium [ ] [网络] 平衡;那种平静

the FA [ ] [网络] 英格兰足总;英国足球协会;英国足总

the vice [ ] [网络] 罪恶谷

time derivative [ ] 时间导数,时间微商

to compute [ ] [网络] 计算;用计算机计算

to download [ ] 下载

to forge [ ] [网络] 煅炼;稳步前进;假造

to propagate [ ] [网络] 传播;传种;推展

to skip [ ] [网络] 略过;跳越;跳过

to update [ ] [网络] 更新;重要更新公告;每月更新

vice versa [ˌvaɪs ˈvɜ:sə] adv. 反之亦然;反过来也一样 [网络] 小爸爸大儿子;反过来亦然;反过来的

web app [ ] [网络] 网页应用;网络应用;应用程序


惯用语
does that make sense
does that makes sense
i don't know
i mean
if you
in fact
let's say
one question
plus 0
so yeah
you know
you're fooling the network



单词释义末尾数字为词频顺序
zk/中考 gk/高考 ky/考研 cet4/四级 cet6/六级 ielts/雅思 toefl/托福 gre/GRE
* 词汇量测试建议用 testyourvocab.com