Okay. Let's get started, guys.
So welcome to lecture number 4.
Um, today we will go over two topics that are not discussed,
uh, in the videos.
Uh, you've been learning C2M1 and C2M2,
if I'm not mistaken.
So you've learned about, uh,
what, uh, an optimization algorithm is,
how to tune it, and so on.
Today, we're going to go a little further, uh,
you should have the background to understand 80 percent of this, uh, lecture.
There's gonna be 20 percent that I want you to look back at
after you've seen the BatchNorm videos, for those of you who haven't seen them.
So we split the lecture in two parts,
and I put back the attendance code at the,
at the very end of the lecture so don't worry.
Ah, one topic is attacking networks with adversarial examples.
Ah, the second one is Generative Adversarial Networks, GANs.
[NOISE] And although these two topics have a common word, which is "adversarial,"
they are two separate topics.
You will understand why each is called "adversarial."
So let's get started with adversarial examples.
And in 2013, ah,
Christian Szegedy and his team have, uh,
published a paper called Intriguing Properties of Neural Networks.
What they noticed is that
several machine learning models, including
the state-of-the-art ones that you will learn about, ah,
networks like VGG-16 and VGG-19,
are vulnerable to something called adversarial examples.
These adversarial examples — you're going to learn what they are, in three parts.
First, by explaining how these examples, in
the context of images, can attack a network in its blind spots,
and, and make the network classify these images as something totally wrong.
Second, how to defend against these types of examples,
and third, why networks are vulnerable to these types of examples.
This is a little bit more theoretical,
and we're going to go over it on the board.
The, the papers that are listed on the bottom are the two big papers that,
that started this field of research.
So I would advise you to go and,
and read them, because we have only an hour and a half to go over two big topics,
um, in, in, in deep learning and,
ah, we will not have time to go into the details of everything.
Okay. So let's set up the goal.
The goal is, given a pre-trained network —
so a network trained on ImageNet, on 1,000 classes, millions of images —
ah, to find an input image that is not an iguana,
so it doesn't look like the animal, but that the network classifies as an iguana.
We will call this an adversarial example if we manage to find it.
Okay. Yeah.
Ah, what was the magic code, for those who came in late?
Uh, let me - so 284889,
let me write it down on the board so that you can -
Thank you.
Can you guys see? [NOISE] Okay. Let's move on.
So we have a network pre-trained on ImageNet, and it's a very good network.
Ah, what I want is to fool it
by giving it an image that doesn't
look like an iguana but gets classified as one.
So if I give it a cat image to start with,
the network is obviously going to give me a vector of
probabilities that has its highest value at the cat entry,
because it's a good network.
And you can guess what's the output layer of this network —
it's probably a softmax.
Now what I want is to find
an image x that is going to be classified as an iguana.
Okay. Does the, the,
the setting make sense to everyone?
Okay. Now as usual, uh,
this should remind you of something
we've seen together: neural style transfer.
You remember the, the art generation thing,
where we wanted to generate an image based on
the content of the first image and the style of another image.
And in that problem,
the main difference with classic
supervised learning was that we fixed the parameters of the network,
which was also pre-trained,
and we back-propagated all the
way back to the input image
so that it looks like the content of the content image
and the style of the style image. The first thing we did is that we defined a loss.
We, we tried to, to,
to phrase what exactly we want.
So how would we phrase what we want here?
Any ideas?
Okay. Complicated. Yep.
An image that provides minimum cost.
An image that provides minimum cost.
Okay. What's the cost you're talking about?
Cost of the, the difference between the expected iguana and the non-expected iguana.
Expected iguana and non-expected iguana.
So if we're sort of going back to the training setting,
we're trying to train it on
an image, and we want it to think that [NOISE] this cat is an iguana.
Yeah. Okay. So you want,
ah, this image to minimize a certain loss function,
and the loss function would be the distance
between the output you get and the output you want.
Okay. Yeah. So I would say,
we want to find x, the image,
such that y-hat of x,
which is the result of the forward propagation,
is equal to y-iguana,
which is a one-hot vector with the one at the position of iguana.
So we minimize a loss between y-hat of x and y-iguana,
which can be an L2 loss,
can be an L1 loss,
can be a cross-entropy in practice.
Ah, the L2 one, ah, works well here.
So you see that minimizing this loss function
would lead our image x to be classified as an iguana.
And then the process is very similar to neural style transfer,
where we will iteratively update the input.
So we will start with x,
we will forward propagate it.
And remember, we're not training the network, right?
We'll just take the, the gradient of the loss with respect to the input,
and update the input using a gradient descent step, until we
get something that is classified as an iguana.
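To make that loop concrete, here is a minimal sketch in PyTorch — an assumption on my part, since the lecture doesn't prescribe a framework; `model` stands for the frozen pre-trained classifier and `target_class` would be the iguana's index:

```python
import torch
import torch.nn.functional as F

def forge_adversarial(model, target_class, steps=200, lr=0.1):
    """Optimize the input pixels (not the weights) until the frozen model predicts target_class."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)                    # the network is fixed; only x moves
    x = torch.rand(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)       # the optimizer holds x, not model.parameters()
    target = torch.tensor([target_class])
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), target)   # could equally be an L2 loss to the one-hot y
        loss.backward()                            # the gradient flows all the way back to the pixels
        optimizer.step()
    return x.detach()
```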
Yeah, any question on that?
But this doesn't necessarily mean that the x that you get in —
Okay. So you mentioned that it
doesn't guarantee that x is going to look like an iguana.
The only thing it's guaranteeing is that
this x will be classified as an iguana.
[NOISE] We will, we will talk about that now.
Er, another question in the back I thought. Yeah.
For the last question — we missed the one that, for —
Oh yeah, it could be.
Yeah. So in this case it's not
a vector of, of n classes,
where it could have been a cross-entropy.
Okay. So yeah, that's true.
We - are we guaranteed that the forged image x,
this one, i - is going to look like an iguana?
Who thinks it's going to look like an iguana?
Okay. Majority of people.
So can someone tell me why it's not going to look like an iguana?
[NOISE].
[inaudible]
Okay. So you're saying, uh,
the loss function is,
is very loose — it doesn't
put any constraints on what the image should look like.
That's true. Actually, the answer to this question is,
it depends. We don't know.
Maybe it looks like an iguana or maybe it doesn't.
But in terms of probabilities,
there's a high chance that it doesn't look like an iguana.
So the reason is here. Let's say this is our space of input images.
And the interesting thing is that, even if as humans, on
a daily basis, we deal with images of the real world —
so like,
if you look at a TV,
uh, that has no signal,
you see static,
but in other contexts,
we usually see real-world distribution images —
a network is just a function:
it means it takes an image.
Any input image that fits the,
the first layer would,
would be — would produce an output, right?
So this is the whole space of input images that the network can see.
Um, this is the space of real images,
it's a lot smaller.
Can someone tell me what's the size of the, the,
the space of possible input images for a network?
[NOISE].
Huh? Sorry.
Yeah.
Uh, it's not infinite.
It's, it's a lot, but not — [NOISE]
It's the number of possible pixel combinations.
Okay. Uh, yeah, there is an idea here. Someone there?
I also said the same thing — it's just the number of possible permutations of pixels.
Yeah, that's true.
So more precisely — you would start with: how many values can one pixel take?
There are 255 — 256 possible values per pixel,
and then what's the size of an image?
Let's say 64 by 64 by 3,
and your result would be 256 to the power of 64 times 64 times 3:
you fix the first pixel to one of its
256 possible values, then the second one can be anything else,
then the third one can be anything else,
and you end up with a very big number.
So this is a huge number.
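You can get a feel for how huge with a two-line check:

```python
# Count of possible 64x64x3 images with 256 values per pixel channel.
n = 256 ** (64 * 64 * 3)
print(len(str(n)))  # 29593 -- the count itself has almost thirty thousand digits
```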
And the space of real images is here.
Now if we had to plot the space of images classified as an iguana by the network,
it would be something like that.
Right. And you see that there is a small overlap
of real images and the space of images classified by — as an iguana by the network.
And this is where we probably are not.
We're probably in the green part that is not real-looking,
because we didn't constrain our image to look real. We're
going to constrain it a little bit more, because in practice,
these types of attacks are not too dangerous, because as a,
as a human, we would see that the pictures look like garbage.
The dangerous attack is if the picture looks like a cat,
but the network sees it as an iguana and a human see it as a cat.
Can someone think of, uh,
of, like, dangerous applications of this?
[NOISE] Face recognition: it could show a face of — you,
you could show your, your —
a picture of your face, and it pushes the network [NOISE] to think it's the face of someone else.
What else? Yeah.
Breaking CAPTCHAs, and breaking, like, defenses against bots.
Yeah. Breaking CAPTCHAs.
If you know what the output —
what output you want, you can force the network to think that this CAPTCHA,
uh, reads as something else.
Or in general, I would say, like, social media:
if someone is posting
violent content online,
there is — all these companies have detection systems.
If people can use adversarial examples that still look violent,
but are not detected as violent by those systems,
they could still publish their violent pictures.
Uh, think about self-driving cars.
A stop sign that looks like a stop sign for everyone,
but when the self-driving car sees it, it's not a stop sign.
So these are dangerous attacks.
Okay. And in fact, the picture we generated
previously would look like that. It's nothing special.
So now let's constrain our problem a little bit more.
We're going to say we want the picture to look like a cat but be classified as an iguana.
Okay. So now say we have our neural network.
If we give it a cat it's going to predict that it's a cat.
What we want is still give it a cat but predict that it's an iguana.
Okay. I, I'll go quickly over that, because it's very similar to what we did before,
so I'll just show the setup.
Okay, exactly the same thing.
Now, the way we phrase the problem changes.
Instead of saying we want only y hat of x equals y - iguana,
we have another constraint.
What's the other constraint?
The picture x should be close to a picture of a cat.
So we want x equal or very close to x-cat.
And in terms of loss function,
what it does is that it adds
another term, which is going to decide how close x should be to x-cat.
If we minimize this loss now,
we should have an image that looks like a cat because of the second term,
and that is predicted as an iguana because of the first term.
And I guess you guys are very familiar with this type of thought process now.
Okay, and same process:
we forward propagate, back-propagate to the input, and update it iteratively.
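A sketch of the two-term loss, reusing the hypothetical PyTorch setup from before; the weight `lam` that balances the two terms is made up for illustration:

```python
import torch
import torch.nn.functional as F

def constrained_loss(model, x, x_cat, target_class, lam=0.05):
    """First term: be classified as the target (iguana). Second term: stay close to the cat."""
    fool_term = F.cross_entropy(model(x), torch.tensor([target_class]))
    stay_close_term = ((x - x_cat) ** 2).mean()    # L2 distance to the reference cat image
    return fool_term + lam * stay_close_term
```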
Now our question is,
what should be the initial image we start with?
We didn't talk about that in the previous example. [NOISE] Yeah.
White noise?
White noise.
Yeah, possibly white noise.
Any other, uh, proposals?
Maybe a cat.
A cat? Yeah, which cat?
The one in the loss [inaudible].
Because it's - is the closest one to what we want to get.
So if we want to have a fast process,
we'd better start with exactly this cat,
which is the one we put in our loss function here, right?
If we put another cat,
it's going to be a little longer, because we have to
change the pixels to match the cat we defined in the loss.
That's what we told our loss function.
If we start with white noise,
it will take even longer, because we have to change the pixels a lot,
so that it looks real and then it looks like the cat that we defined here —
so that this term is also minimized. Yeah.
So when you write that loss function,
it seems like you are
just, like, minimizing the RMSE error to the actual cat picture, right?
Yeah?
Isn't that —
actually a really bad way to,
like, say two images are similar?
Yeah. This is, this is empirical,
the fact that we use that type of, of loss function.
But in practice, it could have been any distance between x and x-cat,
and any distance between y-hat and y-cat — yeah,
and y-iguana, sorry. Yes.
So when you say x-cat is [inaudible] —
Yeah.
[inaudible]
Exactly.
Can't you make a constraint,
like a complex loss function that takes a bunch of cats,
and then it puts, like, something like a minimum over it, right?
The minimum distance between [inaudible].
Can we just look at this wide [inaudible]?
I'm not sure about the second method.
But just to repeat the point you mentioned:
here, we had to choose a cat.
It means the x-cat is actually an image of a cat.
So what if we don't know what the cat should look like —
we just want a random cat to come out and be classified as an iguana?
We're going to see, uh, generative networks
afterwards, which can be used to do that type of stuff.
But, uh, but for the second part of the question,
I'm not sure what the answer would be.
Okay, let's move on.
So we start the optimization with the cat image that we specified in the loss function.
Okay. And so then we have an image of a cat that originally
was classified as 92 percent cat and we modified a few pixels.
So you can see that this image looks a little blurry.
So by doing this modification,
the network will think it's an iguana.
Okay? And sometimes this modification can be so slight that we
can't even notice it. Sounds good.
Now, let's add something else to this,
uh, to this, uh,
to this, uh, drawing.
We add a third set which is the space of images that look real to a human.
So that's interesting because the,
the space of images that look real to a human is
actually bigger than space - than the space of real images.
An example is this one.
This is probably an image that looks real to a human,
but it's not an image that we could see in,
in daily life, because of these slight pixel changes.
Okay? So this is the space of dangerous adversarial examples.
They look real to a human, but they're not actually real.
They might be used to fool a model.
Okay. Now let's see a video, uh,
on a real-world example of adversarial examples.
So for those who cannot read,
they're taking, uh, a camera which,
which classifies — which has a classifier running on it.
And the camera classifies the first image as
a library, and the second image — that is, the same to our eyes — as a prison.
So the second image has slightly different pixels, but
it's hard to see for a human. Same here.
So the, the, the classifier labels
the first image as a washer with, ah, high
confidence, and the second one as a completely different object.
Yeah.
So that's, uh, a small example of — of what can, what can be done.
Okay. Now let's go on —
we've seen how to generate these adversarial examples.
It's an optimization process on the input.
We will see, uh,
what are the types of attacks that we can
lead, and what are the defenses against these adversarial examples.
So we would usually,
uh, split the attacks into two parts:
non-targeted attacks and targeted attacks.
So a non-targeted attack means that we just want to find
an adversarial example that is going to fool the model — any wrong output will do.
While a targeted attack is when we want to force
this adversarial example to be output — to make the model output a specific class that we chose.
These are two different type of attacks that,
that are widely discussed in, in the research.
Another way to categorize attacks is by the knowledge of the attacker.
For those of you who did some security,
you know that we talk about white-box attacks, black-box attacks.
So one interesting thing is that,
uh, a black-box attack — a white-box attack is when you have access to the network.
So we have our image and the pre-trained network.
We have full access to,
to all the parameters and, and the architecture.
So it's probably an easier attack.
Right? We can, we can back-propagate all the way
back to the image and update the image, like we did.
A black-box attack is when the model is hidden,
so that we don't have access to its parameters.
So the question is how do we attack in
black-box attack if we cannot back-propagate because we don't have access to the layers?
Any ideas? Yeah.
So you know, you would tweak the image a little
bit and you will see how it changes the loss.
Looking at these, you can,
you can have an estimate of the gradient,
even if the model is a black-box model.
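That suggestion — estimating the gradient from queries alone — could be sketched with finite differences; `query_loss` is a hypothetical function returning whatever scalar the black-box model lets you read:

```python
import numpy as np

def estimate_gradient(query_loss, x, h=1e-3):
    """Central-difference estimate of d(loss)/dx using only forward queries."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        grad.flat[i] = (query_loss(x + e) - query_loss(x - e)) / (2 * h)
    return grad  # two queries per pixel: expensive, but needs no access to the layers
```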
This assumes that you can query the model,
right? You can query it.
What if you cannot even query the model or you can query it one time only,
it's to send the adversarial example.
How would you do that? So this becomes more complicated.
So, there is an interesting property of
these adversarial examples, which is that they're highly transferable.
It means: I have a model here that is,
uh, an animal classifier.
I don't have access to it.
I cannot even query it.
I still wanna fool it.
What I'm going to do is that I'm going to build my own animal classifier and
forge an adversarial example on it.
It's highly likely that it's going to
be an adversarial example for the other one as well.
So, this is called transferability,
and it's still a, uh, research topic, okay?
We're trying to understand why this happens and,
uh, also, uh, how to defend against that.
One way to defend is to — we're going to see it after; I'm not gonna say it now, sorry.
Uh, does that make sense or no, this transferability?
It probably is because two animal classifiers learn similar boundaries.
And maybe these pixels that are play — we're playing
with are changing also the output of the other network.
Let's go over some kind of defenses.
So, one solution to defend against
these adversarial examples is to create a safety net.
It's, uh, a net that — like a firewall —
you would put before your network.
Every image that comes in will be classified as fake — like, forged — or real
by this network, and you only take those which are real and no — not adversarial.
Okay, but we can also build an adversarial example that,
that fools this network, right?
Whether we need black-box or white-box,
we can just create an adversarial net — example for this network.
It's true. But the issue is that now we have two constraints.
We have to fool the first one and the second one at the same time.
Of course, there is a chance that the second one is going to be fooled as well.
We don't know, okay?
It just makes it more complex.
There is no good defense at this point to — to — to all types of adversarial examples.
This is an option that people are researching.
So, the paper is here if you want to check it out.
Can you guys think of another solution?
[NOISE].
I've got one.
Yeah.
Just, like, multiple — in terms of loss functions [inaudible]
adversarial example loss functions, and train on them.
Train on multiple loss functions of different networks?
Yes.
So, you're talking about an ensemble.
Maybe we can - maybe we can create five networks to do our tasks,
and it's highly unlikely that the adversarial example is going
to fool the five networks the same way, right?
Any other idea? Yes.
Uh, generate adversarial examples and train on them.
Exactly. Generate adversarial examples and train on those, okay?
So, you will generate a cat image that is adversarial.
So, some pixels have been changed to fool a network.
You will label it as the human sees it.
So as a cat because you want the network to still
see that as a cat and you will train on those.
We've seen that generating adversarial examples is super
costly, and also we don't know if it can cover all the types of adversarial examples.
Maybe we are going to miss some.
So, it is another option.
Now, another solution is to
train on adversarial examples at the same time as we train on - on normal examples.
So, look at this loss function.
This loss function — the loss has two terms.
One is the classic loss function we would use —
so, say, a cross-entropy for
a classification — and the second one is
the same loss function, but we give it the adversarial version of x.
So, what's the complexity of that at every iteration?
For every example,
we're going to have to generate
an adversarial example at every step, right?
Because we have x; what we wanna do is
forward propagate it,
generate x-adversarial with the technique we saw,
use it to calculate the second term, and then back-propagate.
This is super costly as well, and it's very similar to what you said —
it's just done online, all the time, okay?
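A rough sketch of one such training step — the single-gradient-step attack inside it and the weight `lam` are my assumptions, not the lecture's exact recipe:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, eps=0.1, lam=0.5):
    """Train on the combined loss: the clean term plus the same loss on an adversarial twin of x."""
    # Generate x_adversarial with one gradient step on the input.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + eps * x_adv.grad.sign()).detach()  # fast-gradient-sign-style perturbation

    # Combined objective: classify both the clean batch and the perturbed batch correctly.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```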
So, what is interesting is what we're going to see next.
There's another technique that
we're not going to talk about.
There is the paper here if you want to check it.
It's another way to do adversarial training.
Uh, but what I would like to talk about is more,
from a theoretical perspective,
why are neural network vulnerable to adversarial examples?
So, let's, let's do some,
some work on the board.
Yeah,
So, you expect to be able to [inaudible] — can't you just [inaudible]?
Every time you come up with a defense,
someone will come up with an attack, and it's a race between humans.
So, this is the same type of problem.
And security problems are open-ended.
Okay. So, let's go over, uh,
something interesting that is more on the — on the theoretical side.
So, let me - let me write down something.
Uh, so, one question we ask ourselves
is why do adversarial example exist? What's the reason?
And Ian Goodfellow came up
with Explaining and Harnessing Adversarial Examples — one of the seminal papers —
where they argue that, although many people in the past have — have
attributed the existence of adversarial examples to the
high non-lineari — non-linearities of neural networks and to overfitting —
so, because we overfit to a specific dataset,
we actually don't understand what cats are;
we just understand what,
what we've been trained on —
uh, they argue that it's actually the linear parts of networks that
are the cause of the existence of adversarial examples. So, let's see why.
And the example I'm gonna — I'm gonna look at is a linear one.
So, together we've seen logistic regression.
So, before the activation,
we have y-hat equals w x plus b.
So, the model here is just going
to be y-hat equals w x plus b.
And our first example is going to be a six-dimensional input.
Okay. We have a single unit,
but the activation is just the identity.
So here, what happens is simply w x plus b.
Okay? And then we get y-hat.
And we probably use an L1 or L2 loss, because it's a regression problem, to,
uh, to train this network.
Now let's look at our first example.
Our first example where, uh,
where it's — where we trained our network.
So the network has been trained — sorry.
The network has been trained, and
w equals one, three,
minus one, two, two, three.
This is w. And you know, like,
because we defined x to be a vector of size 6,
w also has six entries, and let's say b is zero.
So the network computes y-hat equals w x.
So now, we're going to look at these inputs.
We're giving a new input to the network.
And the net — th — the input is going to be one,
minus one, two, zero, three, minus two.
Okay. So I'm going to forward propagate this to get y-hat equals w x.
[NOISE].
And this value is going to be 1 times 1, plus 3 times minus 1,
minus 1 times 2, plus 2 times 0, plus 2 times 3, plus 3 times minus 2 —
so 1 minus 3, minus 2, plus 0, plus 6, minus 6.
If I didn't make a mistake — up, up — okay.
[NOISE] Okay.
And so we — we — we basically get minus 4.
And so this is the — the — the first — the first example, forward propagated.
Now, the question is, [NOISE] how to change x
into x-star,
such that y-hat changes a lot while x-star stays close to x.
So this is basically the problem of adversarial examples.
Can we find an example that is very close to
x, but whose output is very different?
And we're trying to build one.
So the interesting part is to - is to identify how we should modify x.
And the key quantity is the derivative of y-hat with respect to x.
If you take the derivative of y-hat with respect to x,
you know that the definition of this term is — is, like,
the impact on y-hat of
small changes of x, right?
How — what's the impact of small changes of x to — on — on the output?
And if you compute it, what do you get?
W.
W? Everybody agrees?
What's — what's the shape of this thing?
The shape of that is the same as the shape of x.
So it should be w-transpose.
Remember, w here is a row vector.
Okay. Now it's interesting to - to see this because if we
I will call it,
Can you write bigger?
Yeah. Sorry. And can you see the top one?
Yeah.
You said yes or no?
Yes.
Okay. [NOISE] So what if x-star equals x plus epsilon times w-transpose?
I will call epsilon the size of the perturbation.
Now, if we forward propagate x-star,
it means we do y-hat-star equals w x-star plus b,
and b would be zero at this point.
We're going to get w x plus epsilon times w w-transpose.
And w times w-transpose is a scalar — the squared norm of w.
So this is the same as w-squared.
So what is interesting?
It's interesting because the —
the smart part is that this term is always going to be positive.
It means we — we moved x a little bit, because we
can make this change small by changing epsilon.
But it's going to push y-hat to a larger value for sure.
And if I had subtracted epsilon w-transpose instead,
it would push y-hat to a smaller value.
And the — the interesting thing is, now,
if we compute x-star,
and we take epsilon to be a small value like 0.2,
you can make the calculation.
What we get is — is this.
So x is 1, minus 1, 2, 0, 3, minus 2,
and we add 0.2 times w —
so plus 0.2, plus 0.2 times 3, minus 0.2, plus 0.4, plus 0.4, plus 0.6.
So if you look at that,
all the positive values have been pushed on the right. You agree?
And all the negative values — uh, sorry, sorry.
No, that's my bad. No, no, that's not it.
So let — let's finish the calculation, and I'll give the insight after.
We get 1.2, minus 0.4,
1.8, 0.4, 3.4, and minus 1.4.
So this is our x-star that we hope to be adversarial.
Okay. Let's compute y-hat-star to see what happens.
It's w x-star plus b, and b is zero.
So what we get when we multiply w by x-star is 1.2 —
[NOISE]
1.2, minus 1.2,
minus 1.8, plus 0.8,
plus 6.8, and minus 4.2,
[NOISE] which is going to give us 1.6.
All right.
So we see that a very slight change of x has pushed y-hat from minus 4 to 1.6.
And so a few things we want to notice here.
[NOISE].
So, insights on this — on this small example.
The first one is that, uh,
if w is large,
then x-star is not similar to x, right?
The larger the w, the less x-star is — is likely to be like x.
And specifically, if one entry of w is very large,
then x-i-star, the pixel corresponding to this entry, is going to be very different from x-i.
Um, so if w is large,
x-star is going to be different from x.
So what we're going to do is that we are going to take
the sign — sign of w, instead of taking w. What's the reason why we do that?
Because the interesting part is the sign of — of the w. It means,
if we play correctly with the sign of w,
we will always push y-hat in the direction we want:
this term, w times epsilon sign-of-w-transpose, is epsilon times the sum of the absolute values of w.
Because every entry here —
this w-i times sign of w-i — is positive.
And the second insight is that, as x grows in dimension,
the impact of adding epsilon
times sign of w on y-hat increases.
And so what's interesting to notice is that we can keep epsilon as small as possible.
It means x and x-star will be very similar, but as we grow in dimension,
we're going to get more terms in this sum — a lot more terms.
And the change in y-hat is going to grow and grow and grow and grow and grow.
And so one reason why adversarial examples
exist for images is because the dimension is very high:
64 by 64 by 3.
So we can make epsilon very small and take the sign of w, and
we will still get y-hat to be far from the original value that it had.
So epsilon doesn't grow with the dimension,
but the impact of this term increases with the dimension.
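You can verify all the board arithmetic in a few lines of numpy:

```python
import numpy as np

w = np.array([1., 3., -1., 2., 2., 3.])     # the trained weights from the board
x = np.array([1., -1., 2., 0., 3., -2.])    # the original input
eps = 0.2

print(w @ x)                       # -4.0, the original prediction
print(w @ (x + eps * w))           # -4 + eps * ||w||^2 = 1.6, the perturbed prediction
print(w @ (x + eps * np.sign(w)))  # -4 + eps * sum(|w_i|) = -1.6
```

With sign(w), each input entry moves by exactly plus or minus epsilon, yet the shift in y-hat is epsilon times the sum of the absolute values of w — a sum that grows with the dimension, which is exactly the second insight.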
[NOISE] Okay.
[NOISE].
The one-hot [inaudible].
Yeah.
It puts it right between these two, that gives [inaudible].
Okay. So you — like, you'd try to un-adversarialize [inaudible] the cat?
Yeah.
Yeah. I — I don't know if that has been done.
I don't think that has been done.
So you're talking about taking an adversarial image of a cat,
converting it into a normal image of the cat, and then classifying the cat.
Yeah.
Maybe, yeah.
So it's a topic of research.
Uh, okay, let's move on because we don't have too much time.
So just to conclude:
what we're going to keep as
a general way to generate adversarial examples is this formula:
x-adversarial equals x plus epsilon times the sign of the gradient of the cost with respect to x.
[NOISE]
This is going
to be a fast way to generate adversarial examples.
So this method is called the Fast Gradient Sign Method.
So basically, what we're doing is that we can — we — we are linearizing
the cost function around x.
And we're saying that what applied to the linear example
is going to also apply, through this general formula, to deeper networks.
So we're pushing the input in the direction of the sign of the gradient,
which is going to impact the output highly, okay?
So that's the Fast Gradient Sign Method.
Now you might say that, okay,
we did this example on a linear model,
but neural networks are not linear,
they are highly non-linear.
Well, in practice, we are trying to make them behave linearly.
With ReLU activations,
all that type of methods —
even in the choices we make,
we do all we can to put the network in a near-linear regime,
because we want fast training.
Okay? And one last thing that I'll mention for
adversarial examples is: if I have a network like this —
[NOISE]
so a fully connected network,
with three-dimensional inputs — up, yeah —
and then one hidden unit here, and then the output —
what's interesting is that computing the chain rule on — on — on this network
will give you that the derivative of the loss with respect to
x is equal to the derivative of the loss function with respect to z-one-one,
here, times the derivative of z-one-one with respect to x.
Let's say we're — we're going — we're going —
there is actually a sum over the units here.
But anyway. Uh, just let me illustrate the point.
Uh, what we're — what we're saying is that — what we — what we
try to do with neural networks is to have this gradient be high.
Because if this gradient is not high,
we're not able to train the parameters of
this layer.
Because if you want to do the same thing with the — with w-one-one,
which is the parameter related to this unit,
you would need to go through this chain rule.
Correct? So we need this gradient to be high.
And if this gradient is high,
the gradient with respect to the input is also going to be high,
because you use the same gradient in the chain rule.
So networks that are — that train well are,
uh, vulnerable to adversarial examples, because of this observation.
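To put that argument in symbols — my notation, not the board's — take the first layer's pre-activation $z^{[1]} = W^{[1]} x + b^{[1]}$. The chain rule gives

$$\frac{\partial \mathcal{L}}{\partial W^{[1]}} = \frac{\partial \mathcal{L}}{\partial z^{[1]}} \, x^{\top}, \qquad \frac{\partial \mathcal{L}}{\partial x} = \bigl(W^{[1]}\bigr)^{\top} \frac{\partial \mathcal{L}}{\partial z^{[1]}}.$$

Both expressions share the factor $\partial \mathcal{L} / \partial z^{[1]}$: training needs it to be non-vanishing so the weights can move, and the very same factor makes the input gradient — the handle the attacker pushes on — non-vanishing too.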
So any question on, on adversarial examples?
Before we move on, I think we don't have time and I would like to,
to go over the, the GANs with you guys.
So let's move on to GANs.
I'll stick around to answer questions on that part.
So the general question we're asking now is,
uh, do neural networks understand the data?
Because we've seen that some,
some data points look like they would be real,
uh, but the neural networks don't understand them.
So more generally, uh, can we build generative networks that
can generate data that looks real?
Let's say — this is what we will call generative models.
We'll start by motivating it,
and then we look at something called the GAN game:
a generator and a discriminator
that are going to help each other improve,
and finally we'll see that GANs are hard to train, uh,
we'll see some tips to train them, and finally,
go over some nice results and methods to evaluate GANs, okay?
So, uh, the motivation behind generative models is to build
computers with an understanding of our world, okay?
So by, by that we mean that we want to collect a lot of data,
use it to train a model that can generate
images that look like they're real even if they're not,
so a dog that has never existed can be generated by this network.
Um, and finally, uh,
the number of parameters of the model, uh,
is smaller than the amount of data,
we already talked about that,
and this is the key point.
It's because there is too much data in the world —
any image counts as data for a generative network —
and there are not enough parameters to memorize it all.
You know, you have — the network needs to understand the data,
because it doesn't have enough parameters to memorize every example.
So let's talk about data distributions.
So these are samples from real images that have been taken,
and if you plot this real data distribution in a 2-D map,
uh, it would look like something like that.
I made it up, but this is a 2-D view of
the image space, similar to what we talked about for adversarial examples,
and this green shape is the space of real-world images.
Now, uh, if you train a generator and generate some images that look like this —
and these images come from StackGAN, uh, from that paper —
uh, this distribution, if the generator is not good,
is not going to match the real-world distribution.
So our goal here is to do something so
that the red distribution matches the real-world distribution,
then to train the network so that it realizes what we want.
So this is our generator and it's what counts,
is what, what we want to train ultimately.
We want to give it, let's say,
a random number or a random code z,
and we want it to output an image.
But of course, because it's not trained initially,
it's going to output a random image —
it looks like something like that: random pixels.
Now, this image doesn't look very good.
What we want is these images to look like
generated images that are very similar to the real world.
So how are we going to help this generator train?
It's not like what we did in classic supervised learning,
because we don't have,
uh, we don't really have inputs and labels,
you know, there is no label.
We could maybe give it an image of a cat and ask it to output another cat,
but we want the network to be able to output things that don't exist,
things that we've never seen.
Right. So we want the network to understand what a cat
is, but not memorize the cats it has seen.
So the way we're going to do it is through
a small game between this network, called the generator G,
and another network called the discriminator D.
So let's look at how it works.
We have a database of real images,
and we're going to start with this distribution on the bottom,
which is the real-world data distribution,
is the distribution of the images in this database.
Now our generator has this distribution initially,
it means the pixels that you see here
probably follow a distribution that doesn't match the real world.
We'll define the discriminator D,
and the goal of the discriminator is to tell whether an image is real or generated.
So we're going to give several images to this discriminator:
sometimes we will give it generated images,
and sometimes we will give it real-world images.
What we want is that this discriminator is a binary classifier that outputs
one if the image is real, and zero if the image was generated, okay?
So let's say we give it x coming from the generated images — it should give us zero,
because we want the discriminator to detect that x was actually G of z.
If the image came from our database of real images,
we want the discriminator to say one.
So it seems like the discriminator would be easy to train, right?
It's just a binary classifier.
We can define a loss function:
it's the binary cross-entropy.
And the good thing is, we can have as many labels as we want —
like, it's, it's free:
we have this database and we label it all as one —
it's just: these images exist,
let's label them as one for the discriminator —
and everything that comes out of the generator, let's label it as zero for the discriminator.
So basically, data is not costly at all at this point.
The way we will train is that we will back-propagate
the gradient to the discriminator to train the discriminator,
using a binary cross-entropy loss.
But what we ultimately want is to train the generator, that's what we want.
At the end, we were not going to use the discriminator,
we just want to generate images.
So we are going to direct the gradient to go back to the generator.
And why can this gradient go back to the generator?
The reason is that x is G of z;
it means we can back-propagate
the gradient all the way back to the input of the discriminator.
But this input depends on the input of the generator, if the image was generated.
So we can also send
the gradient to the generator. Does it make sense?
There is a direct relation between z and the loss function,
in the case where the image was generated.
If the image was real,
then the generator couldn't get the gradient,
because x doesn't depend on z or on the features and parameters of the generator.
Okay? So we would run the updates,
um, simultaneously on two batches:
one for the true data and one from, from generated data.
Does this scheme make sense to everyone?
Yeah?
So you said there were two batches [inaudible]?
So there are many methods — your question is about mixing the batches?
Usually we would use, uh, we would,
we would use one batch of each.
But in, in practice,
you can try other things.
Yeah. So there are many methods that are being tried to train GANs properly.
We're going to go over
the details of that when we see the loss functions.
So we hope that the generated distribution converges to the real one,
and if it matches,
we're going to just take the generator and generate images;
normally, it should be able to generate images that look real,
[NOISE] that look like they came from this distribution.
Okay? Sounds good?
So now let's talk more about the training procedure and
try to figure out what the loss functions should be in this case.
What should be the cost of the discriminator?
Assuming, assuming we give it two batches:
one for real data, so real images,
and one for generated data that comes from G. [NOISE]
Yes.
The same basic loss function we use for every binary classifier, right?
The same basic loss function we use for binary class — for the binary classification case.
It's true — we're going to adapt it a little,
but it's the same idea.
So this is what it can look like.
We're going to call it JD,
cost function of the discriminator.
It has two terms. What does the first term say?
What does the second term say?
And you can recognize the binary cross-entropy here.
[NOISE].
The only difference is that we have
a label that is y_real and a label that is y_generated.
In practice, y_real and y_generated are always going to be set to fixed values.
We know that y_generated is zero and we know that y_real is one.
So we can just drop these two label coefficients, because they're both equal to one.
The first term is telling us this should correctly
label real data as one, the cross-entropy term.
The first term of a binary cross-entropy.
The second term is going to tell us,
D should correctly label generated data as zero.
So the difference with classic cross-entropy we've seen is that,
this one is computed over two batches — a batch of real images
and a batch of generated images — with the labels fixed in advance.
So we both want the D to correctly identify real data,
and also correctly identify fake data.
That's why we have two terms.
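As a sketch in the same hypothetical PyTorch setup, assuming D ends in a sigmoid so its output is a probability:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, x_real, z):
    """Two-term binary cross-entropy with fixed labels: real -> 1, generated -> 0."""
    d_real = D(x_real)
    d_fake = D(G(z).detach())      # detach: this loss should train D only, not G
    return F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
           F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
```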
Now, what about the generator?
What do you think should be the cost function of the generator? Yes.
So just about that cost function.
If I'm putting in data that's from the generator,
I won't run the first part, because I don't have a,
uh, a y_real if I have the — an input that's coming from the generator.
Yeah. Exactly.
It's about half of this.
Yeah. But in your batch, we have had, like,
a certain number of real example,
a certain number of generated examples.
The generated examples have no impact on the first cross-entropy,
and same for the real examples on the second cross-entropy. Any other questions?
Okay. So coming back to the cross - to the - to the cost of the generator.
What should it be? This is a tiny bit complicated.
Let's move - let's move on because we don't have too much time.
The cost of the generator basically should say that G should try to
fool D. [NOISE] The goal is for G to generate real-looking samples.
And in order to generate real samples,
we want to fool D. If G managed to fool D and D is very good,
it means G is very good, right?
The problem is that it's a game.
Because if D is bad and G fools D,
it doesn't mean that G is good.
Because G - because D is bad,
it doesn't detect very well the real versus fake examples.
We want D to go up — to be very good — and G to go up at the same time,
until D outputs,
like, random probabilities, because it cannot
distinguish the samples coming from G versus the real samples.
So this cost function is basically saying, uh,
for generated images, we want D to classify them as one.
That's what it's saying. We want to fool D,
okay? Yeah.
Uh, just a little bit of a side question. Um, I
can kind of see — so if you're implementing this,
I can kind of see how you would, uh, you know,
implement it for D, but how would you implement it for G, if you're actually implementing this?
Um, is there — has there been a module to, like, train this?
Because it's not immediately obvious how you do this setup.
So, you know, like,
if you're using — so, how to implement that?
If you're using a deep learning framework,
you've been building a graph, right?
And at the end of your graph,
you've been building your cost function D that is very close to a binary cross-entropy.
Uh, what you're going to do is just define a node that is going to be minus
the cost function of D. It's going — every time you are going to call the function J of G,
it's going to run the graph that you defined for J of D and run,
uh, an in — an opposite operation — a negation on it.
So now you have two different cost functions.
How can they propagate [inaudible]?
These are two different cost functions.
Propagate [inaudible]?
Yeah.
We're not going to [inaudible].
We are going to — to return [inaudible]
to a [inaudible].
So, you know, you — you — you back-propagate
the — on — on D. And when you train G,
you would flip — you would flip the sign. That's all we do.
The same thing with the sign flipped.
In terms of implementation, it's just, uh, another operation.
Okay. Now, let's look at the graphs of these costs.
Let's look [NOISE] at the graph of the generator's cost.
So I'm going to plot against —
oh sorry — D of G of z.
So what does this mean?
This axis is the output of D when given a generated example, G of z.
It's going to be between zero and one, because it's a probability.
D is a binary classifier with a sigmoid, uh, output, probably.
Um, if we plot log of D of G of z, it looks like —
so, like, this type of thing.
This would be log of D of G of z.
Does it make sense? That's the logarithm.
Um, if I plot minus that, minus that —
so let me — let me plot minus —
or let me — let me do something else.
Let me plot minus the logarithm of D of G of z.
This is it. Do, do you guys agree?
Now, what I'm going to do is that I'm going to plot another function, which is this one:
that is the logarithm of one minus D of
G of z, okay?
So the question is,
right now, what we're doing is that we're saying the,
the cost function of the generator is logarithm of 1 minus D of G of z.
So it looks like this,
right? It looks like this one.
[NOISE] What's the issue with this one?
What do you think is the issue with this cost function looking at it like that?
It goes to minus infinity.
Sorry?
It goes to minus infinity.
Can you say it louder?
It goes to minus infinity
at one — that's what you mean?
Yeah.
Yeah. And so the, the consequence of that is that
the gradient here is going to be very large,
the closer we go to one.
But the closer we are to zero,
the lower is the gradient.
And it's the reverse phenomenon for this lo — logarithm:
the gradient is very high —
and by very high I mean in absolute value —
very high when we're close to zero,
but it's very low when we go close to one, okay?
So which loss function do you think would be better?
A loss function that looks like this one or a loss function that looks like
this one to train our generator?
The broader question is where are we early in the training?
Are we close to here or are we close to there?
What does it mean to be close to there?
Close to one? [NOISE]
Hmm?
It means D thinks the generated, uh, samples are real.
That's here. This place is the contrary:
D thinks that generated samples are fake.
It means it correctly finds out that they're fake.
Early on, we're generally here,
because the discriminator is better than the generator,
and it's very easy for the discriminator to figure out that a sample is fake,
because this garbage looks very different from real-world data.
So early on, we're here.
So which function is the best one to — to — to — to — to be our cost?
[inaudible]
Yeah. So probably, this one is better.
So we have to use a mathematical trick to change this into that.
Right. And the mathematical trick is pretty standard.
Right now, we're minimizing something that is in log of one minus X.
We can say that doing so is the same as maximizing something that is in log of X.
Do you agree? Simple flip.
And we can also say that it is the same as minimizing something in minus log of X.
Does it make sense? So we are going to use this mathematical trick
to convert our function — a saturating cost,
as we would say — into a non-saturating cost that is going to look more like this.
Let's see what it looks like.
So to sum up,
our cost function currently looks like that.
It's a saturating cost.
Because early on, the gradient is flat where we are,
we cannot train G. We're going to do a flip
and convert this into another function that is a non-saturating cost.
Okay. Yeah. Well, actually, yeah.
So the reason the blue one is like that is because I added a minus sign.
So I'm flipping this.
Okay? And it's the same thing —
it's just the — the sign of the gradient that is going to be different.
Like that, the gradient is high at the beginning and low at the end. That makes sense?
[NOISE] So we're going to do the — use this flip.
And so we have a new training procedure now, where J of D
didn't change, but J of G changed.
We have a minus log:
we have minus the log of,
uh, D of G of z.
Does that make sense to everyone?
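In code, the flip amounts to "label the fakes as real and reuse the same cross-entropy" — same hypothetical setup as the discriminator sketch:

```python
import torch
import torch.nn.functional as F

def generator_loss(D, G, z):
    """Non-saturating cost: minimize -log D(G(z)) instead of log(1 - D(G(z)))."""
    d_fake = D(G(z))               # no detach here: gradients must reach G
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))  # = -mean(log D(G(z)))
```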
Good. And actually, so this is a fun thing:
some researchers created a large, uh,
study of many, many different GANs.
It shows what people have tried.
And you can see that people have tried all types of losses to make GANs work.
So it looks — it looks complicated here.
But actually, the MM GAN is the first one we saw together.
It's the minimax loss function.
The second one is the non-saturating one that we just saw.
So you see, between the first two,
the only difference is that, on the generator,
we get the log of one minus D of x-hat becoming log — minus log of D of x-hat.
Okay. Now, another trick to train GANs is to use the fact that,
uh — to use the fact that D is usually easier to train
than G. But as D improves, G can improve.
If D doesn't improve, G cannot improve.
So you can see the — the — the — the performance
of D as an upper bound on what G can achieve.
Because of that, we will usually train D more times than we train G.
So we will basically train, for num_iterations:
K steps of D, one step of G; K steps of D,
one step of G; and so on.
So that the discriminator becomes better, then the — the generator can catch up;
it becomes better, then G can catch up,
and so on. Does that make sense?
There are also methods that use, like,
different learning rates for D and G to take this into account —
to train the discriminator faster.
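Putting the schedule together — k, the code dimension, and the loss helpers from the earlier sketches are all assumptions:

```python
import torch

def train_gan(D, G, data_loader, opt_D, opt_G, k=5, z_dim=100):
    """Alternate k discriminator updates with one generator update, as described above."""
    for x_real in data_loader:
        for _ in range(k):                              # D trains more often...
            z = torch.randn(x_real.size(0), z_dim)
            opt_D.zero_grad()
            discriminator_loss(D, G, x_real, z).backward()
            opt_D.step()
        z = torch.randn(x_real.size(0), z_dim)          # ...then G catches up
        opt_G.zero_grad()
        generator_loss(D, G, z).backward()
        opt_G.step()
```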
Okay. Uh, because we don't have too much time,
I'm going to skip this.
We are going to see it probably next week, uh,
together, after you guys have seen the BatchNorm videos.
Okay. It's cool. So just to sum up.
Some - some tips to train GANs is to modify the cost function.
We've seen one modification, there are many more.
Uh, keeping D up-to-date with respect to G — so updating D
more than you update G; using Virtual BatchNorm, which is a variant —
so it's a different type of BatchNorm that is used here;
and something called one-sided la — label
smoothing, which I'm not going to talk about today because we don't have time.
So let's see some nice result now,
and that's the funniest part.
Um, so some of you have worked with word vectors,
and you — you might know that word vectors, uh,
are vectors that can encode the meaning of words.
And you can compute operations sometimes on these - on these words.
So if you take, um,
if you take king minus queen,
it should be equal to man minus woman.
Operations like that.
That happens in the space of word vectors.
You can use a generator to generate faces,
and the paper is listed on the bottom here.
So you give a code that is a random code and it will give you an image of a - a face.
You can give it a second code,
it's going to give you a second image that is
different from the first one because the code was different.
You can give it a third one,
it's going to give you a third fa - third face.
The fun part is,
if you take code one minus code two plus code three.
So basically, image of a man with glasses minus image of
a man plus image of a woman will give you an image of a woman with glasses.
So [LAUGHTER] —
so this is interesting, because it means that
the latent codes encode semantic information: you can do arithmetic in the latent space.
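A hypothetical sketch of that arithmetic, with `G` standing for a trained face generator; in practice the three codes would be ones you've already decoded and inspected, not fresh random draws:

```python
import torch

z_man_glasses = torch.randn(1, 100)   # code that decodes to a man with glasses
z_man = torch.randn(1, 100)           # code that decodes to a man
z_woman = torch.randn(1, 100)         # code that decodes to a woman

z_new = z_man_glasses - z_man + z_woman  # "man with glasses" - "man" + "woman"
image = G(z_new)                          # hopefully: a woman with glasses
```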
Okay. Let's look at something even better.
So you can use GANs for image generation.
Of course, these are very nice samples.
You see that sometimes,
GANs have problems with — with the — [LAUGHTER].
But — but — but the — but these samples
are from StackGAN++, which is a — is a very impressive GAN
that has generated — that has been state of the art for a long time.
Okay. So let's see something fun.
Something called image-to-image translation.
So, uh, actually, the - the -
the project winners last quarter in Spring was a project dealing with exactly that.
Generating satellite images based on the map image.
So given the map image, generate the satellite image using a GAN.
So you see that, instead of giving a random code,
you could give a very detailed code.
The code can be this image.
Right? And you have to find a way to constrain your network in a certain - with - in
a certain way to push it to output
exactly the satellite image that corresponded to this map image.
There are many other results that are fun.
Converting day scenes to night scenes,
um, and apples to oranges and oranges to apples.
So let's do a - a case study together.
Let's say our goal is to convert horses to zebras.
Can you tell me what data we need?
Let's go quickly so that we have some time.
Horses and zebras.
Yeah. Horses and zebras.
Do you need paired images?
You know, like, do you need to have the same image of a horse as a zebra?
No.
Yeah. So the problem is, uh, okay,
we could have labeled images, you know,
like, uh, a horse and its,
uh, zebra version.
Uh, and we could train a network to take one and output the other.
Unfortunately, we don't — not —
not every horse has a zebra version.
Uh, so instead, we're going to do,
uh, unpaired training.
It means we have a database of horses and a database of zebras.
But these are different horses and different zebras.
They're not one-to-one - there's no one-to-one mapping between them.
There's no mapping at all. What architecture do you wanna use?
GAN?
Nice.
[LAUGHTER] GAN, not a [inaudible].
Okay. So let's see about the architecture and the cost.
So I'm going over it very quickly because it's a -
it's a very fun GAN with - it's called CycleGAN.
So the way we are going to work it out is: we have a horse, called
capital H. We want to generate the zebra version of it.
So we give it to a generator that we call G1.
You can call it H2Z,
like horse-to-zebra.
It should give us this horse H as a zebra, right?
And in fact, if we're training a GAN,
we need a discriminator.
So, we will add a discriminator that is going to be a binary classifier, to tell us
if this image is a real zebra or not.
So this discriminator is going to take in some images of zebras, probably —
or — yeah, zebras or horses [NOISE] —
and it's going to also take the generated images,
and it's going to see which one is fake and which one is real.
On the other hand, we're going to do — and this is the key point:
we need to enforce the fact that this horse, G1 of H,
should be the same horse as H. In order to do that,
we're going to create another gen — generator, which is going to take the generated image,
and generate back the input image.
And this is where we will be able to enforce the constraint that G2 of G1 of
H should be equal to H. Do you see why this loop is super important?
Because if we don't have this loop,
we don't have a constraint on the fact that the horse —
the — the zebra should be the horse as a zebra,
the same horse as H. So we'll do that, and
we have a second discriminator to decide if this image is real.
This is one step: H2Z.
Another step might be Z2H, where we start with a zebra,
give it to Generator 2,
get the horse version of the zebra, and cycle back.
So this is the general pattern using CycleGANs.
And what I'd like to go over is what loss should we minimize in
order to enforce the fact that we want
the horse to be converted to a zebra that is the same as the horse.
Can someone give me the terms that we need?
Someone wants to give it a try?
Go for it. Two minutes. Yes.
So you want to make sure that the picture at
the end — of the zebra that you started off with —
matches the zebra that you started with, or
the horse that you started off with matches the horse that you had originally.
Okay.
But at the same time, you also need to have Discriminator 2
identifying that the image is a real zebra or a real horse -
Yeah.
- because you don't want it to just sort of input
in the sample image and it output back to you the sample image.
Yeah, correct.
So I think you'd want to add the output of the cost function for Discriminator
2 to the cost that you get at for comparing the starting images.
Okay, that's correct. So you're saying we need
the classic cost functions that we've seen previously,
plus another one that is the matching between H and G2 of G1 of H,
and Z and G1 of G2 of Z.
Yes.
Correct. So we have all these terms.
One term to train G1,
which is the classic term we've seen,
differentiate real images from generated images.
What about G1? Same: we are using the non-saturating cost on generated images.
Same for D2. Same for G2. These are classics.
The one we need to add to all of this is
the cycle costs which is the distance between this term,
G2 of G1 of H and H,
and the same thing for zebras.
Does that make sense? So you have the intuition to build that type of loss.
We just sum everything and it gives us the cost function we're looking for. Yeah.
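A sketch of that cycle term — the L1 distance and the weight `lam` are assumptions in the spirit of the CycleGAN paper:

```python
def cycle_cost(G1, G2, H, Z, lam=10.0):
    """Cycle-consistency: horse -> zebra -> horse should land back on the original horse, and vice versa."""
    horse_cycle = (G2(G1(H)) - H).abs().mean()  # G2(G1(H)) should reconstruct H
    zebra_cycle = (G1(G2(Z)) - Z).abs().mean()  # G1(G2(Z)) should reconstruct Z
    return lam * (horse_cycle + zebra_cycle)    # added on top of the usual D1/G1 and D2/G2 terms
```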
Can we use the same,
uh, D1 as D2?
It's the same [inaudible] recognized [inaudible]
Oh, the same cost function for D1 and D2?
Yeah. Could you use the same -
So, the, the - you could but it's not going to work that well.
I think — so I think there's a — there's a tiny mistake here,
which is that, uh, the,
the small h at the bottom should be a small z,
and the small h on top should be a small z as well.
Because Discriminator 1 is going to receive
generated samples that look like zebras, because they came out of G1.
So you want the real database that you give it to be zebras as well —
to force — to force Generator 1 to output things that look like zebras.
Okay? And this is my favorite.
So you can convert faces to ramen.
[LAUGHTER] It's the most fun application I found.
It's from Naritomi.
So Japanese research labs are working hard to,
to, to do face2ramen [LAUGHTER].
And actually, in two — in two to three weeks,
you will learn, um, about face detection.
And if you learn that, maybe you can start a project to, like,
detect the face and then replace it with ramen.
[LAUGHTER] Because, I don't know, this is also a funny,
funny work by Naritomi.
Okay. Oh, this is a super cool application as well.
So let's look at that.
Okay. So we have — so this model is,
um, learning edges and generating cats based on the edges.
So I'm gonna - I'm gonna to try to draw a cat.
[LAUGHTER] Okay, sorry.
I cannot see [LAUGHTER].
Again, I'm not a good drawer - [LAUGHTER]. It's a cat.
Okay. It's going.
I hope it's gonna work. [LAUGHTER] Okay.
Yeah,
I,
I don't think it worked,
but it's supposed to work.
So you can generate cats based on,
on edges and you can do it for different things.
You can do it for a shoe.
So all these models have been trained for that. Okay.
Yeah, I have a question.
Yes, go for it.
[NOISE] So, so for this model,
would you have to train it specifically for the things that you want it to generate?
Like two things — so cats and shoes in this case?
Uh, sorry. Can you repeat?
Is it [inaudible]?
Yes — you have to train it specifically for the domain.
So, like, these models are different models that have [NOISE] been trained separately.
Okay.
Okay. I'm looking for my presentation,
[NOISE] I missed it. The presentation disappeared.
Okay. Another application is super resolution.
You can give a lower resolution image and generate
the super resolution version of it using GANs.
And this is pretty cool, because you can get,
uh, a high-resolution image,
down-sample it, and use this as the training data:
[NOISE] like, you have
the high-resolution version of the lower-resolution image for free.
Um, other applications can be privacy-preserving.
So some people have been working on - you know in medical - uh,
in the medical space privacy is a huge issue.
You cannot share a dataset of patients
among medical teams — it's a common issue —
so people have been looking at generating a synthetic dataset.
If you train a model on this dataset,
it's going to give you the same type of parameters as the other one,
but this dataset is anonymous.
So they can share the synthetic dataset with each
other and train their models on that, without
being able to access, uh,
the information of the patient and who it is.
Um, manufacturing is important as well:
so GANs can generate, um,
very specific, uh, objects that can replace bones for humans.
So same for dental crowns.
If you lose a tooth, uh, the,
the technician can take a picture and decide what the,
the crown should look like.
The GAN can generate it.
Um, another topic is how to evaluate GANs, you know.
Um, you might say we can just look at the images and see if they
look real, and it will give us an idea if the GAN is working well.
But you also want to make sure the GAN didn't just memorize samples
from the real data you gave to the — to the — to the discriminator.
Uh, so how do you check that?
It's very complicated.
So one method is human evaluation,
where you would, uh,
[NOISE] you would build a software,
push it on the cloud and people all around the world are
gonna select which images look generated,
which images look not generated to see if a human can, can,
can compare your GAN to real-world data,
and how your GAN performs.
So it would look like that.
A labeling interface.
You can - you can do different experiments like you can show very quickly
an image for a fraction of a second and ask them was it real or not,
or you can give them unlimited time.
Different experiments can be led.
Uh, there is another one that is more automated.
You know, every time you train a GAN,
you want to do that to verify if the GAN is working well. It takes a lot of time.
So instead of using humans,
why don't we use a very good network that is good at classification?
We're going to give our image samples to
this network — the Inception network — and see what it thinks.
Does it think that it's a dog or not?
Does it look like a dog for the network or not?
And we can scale it and make it very quick.
And there is a metric, the Inception score,
that we can talk about next week, when we'll have time.
Uh, it measures the quality of
the samples, and also it measures the diversity of the samples.
I'll go over it next week, hopefully.
Uh, there is another distance that is very popular, uh —
the Fréchet Inception Distance — that has been used a lot.
And I, I — I'd advise you to check some of
these papers if you're more interested in it for, for your projects.
So just to end: um, for next Wednesday,
we'll have, uh, C2M3 and also the whole C3 modules.
[NOISE] Uh, you'll have three quizzes.
Be careful: these two modules,
C3M1 and C3M2, are longer than ca — than normal modules.
They're like wide case studies, so take your time,
and go over them, um,
and you have one programming assignment.
Uh, make sure you understand the BatchNorm videos,
so that we can go over the virtual BatchNorm hopefully next week together.
Um, and in the hands-on section this Friday, uh,
you will receive your project proposal feedback as soon as possible, uh,
and meet with your project TAs to go over the proposal and
to make decisions regarding the next steps for your projects.
Uh, I'll stick around in case you have any questions. Okay. Thanks, guys.
知识点
重点词汇
infinite [ˈɪnfɪnət] n. 无限;[数] 无穷大;无限的东西(如空间,时间) adj. 无限的,无穷的;无数的;极大的 {cet4 cet6 ky ielts :6045}
gauge [ɡeɪdʒ] n. 计量器;标准尺寸;容量规格 vt. 测量;估计;给…定规格 {cet4 cet6 ky toefl ielts gre :6046}
detection [dɪˈtekʃn] n. 侦查,探测;发觉,发现;察觉 {cet4 cet6 gre :6133}
attacker [əˈtækə(r)] n. 攻击者;进攻者 { :6197}
download [ˌdaʊnˈləʊd] vt. [计] 下载 {gk :6382}
radically ['rædɪklɪ] adv. 根本上;彻底地;以激进的方式 {ky toefl :6472}
discriminate [dɪˈskrɪmɪneɪt] vi. 区别;辨别 vt. 歧视;区别;辨别 {ky toefl ielts gre :6572}
proximity [prɒkˈsɪməti] n. 接近,[数]邻近;接近;接近度,距离;亲近 {cet6 toefl ielts :6588}
overlapping [əʊvə'læpɪŋ] adj. 重叠;覆盖 v. 与…重叠;盖过(overlap的ing形式) {toefl :6707}
overlap [ˌəʊvəˈlæp] n. 重叠;重复 vi. 部分重叠;部分的同时发生 vt. 与…重叠;与…同时发生 {cet6 ky toefl ielts gre :6707}
unlimited [ʌnˈlɪmɪtɪd] adj. 无限制的;无限量的;无条件的 {cet6 :6742}
MA [mɑ:] abbr. 文学硕士(Master of Arts);磁放大器(magnetic amplifier);主报警信号(main alarm) { :6756}
inaudible [ɪnˈɔ:dəbl] adj. 听不见的;不可闻的 { :6808}
algorithm [ˈælgərɪðəm] n. [计][数] 算法,运算法则 { :6819}
algorithms [ˈælɡəriðəmz] n. [计][数] 算法;算法式(algorithm的复数) { :6819}
mimic [ˈmɪmɪk] vt. 模仿,摹拟 n. 效颦者,模仿者;仿制品;小丑 adj. 模仿的,模拟的;假装的 {toefl ielts gre :6833}
plucked [plʌkt] [纺] 粗细不匀 { :6870}
dental [ˈdentl] n. 齿音 adj. 牙科的;牙齿的,牙的 {ky toefl :7161}
vice [vaɪs] prep. 代替 n. 恶习;缺点;[机] 老虎钳;卖淫 adj. 副的;代替的 vt. 钳住 n. (Vice)人名;(塞)维采 {gk cet4 cet6 ky ielts :7210}
latent [ˈleɪtnt] adj. 潜在的;潜伏的;隐藏的 {cet6 ky toefl ielts gre :7284}
numerical [nju:ˈmerɪkl] adj. 数值的;数字的;用数字表示的(等于numeric) {cet6 ky toefl ielts :7312}
gradient [ˈgreɪdiənt] n. [数][物] 梯度;坡度;倾斜度 adj. 倾斜的;步行的 {cet6 toefl :7370}
gradients [ˈgreɪdi:ənts] n. 渐变,[数][物] 梯度(gradient复数形式) { :7370}
binary [ˈbaɪnəri] adj. [数] 二进制的;二元的,二态的 { :7467}
Et ['i:ti:] conj. (拉丁语)和(等于and) { :7820}
compute [kəmˈpju:t] n. 计算;估计;推断 vt. 计算;估算;用计算机计算 vi. 计算;估算;推断 {cet4 cet6 ky toefl ielts :7824}
intuition [ˌɪntjuˈɪʃn] n. 直觉;直觉力;直觉的知识 {cet6 ky toefl ielts gre :7905}
conditional [kənˈdɪʃənl] n. 条件句;条件语 adj. 有条件的;假定的 { :8076}
converged [kən'vɜ:dʒd] v. 聚集,使会聚(converge的过去式) adj. 收敛的;聚合的 { :8179}
converge [kənˈvɜ:dʒ] vt. 使汇聚 vi. 聚集;靠拢;收敛 {cet6 toefl ielts gre :8179}
encode [ɪnˈkəʊd] vt. (将文字材料)译成密码;编码,编制成计算机语言 { :8299}
encoding [ɪn'kəʊdɪŋ] n. [计] 编码 v. [计] 编码(encode的ing形式) { :8299}
validation [ˌvælɪ'deɪʃn] n. 确认;批准;生效 { :8314}
downside [ˈdaʊnsaɪd] n. 负面,缺点;下降趋势;底侧 adj. 底侧的 { :8709}
implicitly [ɪm'plɪsɪtlɪ] adv. 含蓄地;暗中地 { :8775}
quiz [kwɪz] n. 考查;恶作剧;课堂测验 vt. 挖苦;张望;对…进行测验 {gk cet4 cet6 ky :8784}
derivative [dɪˈrɪvətɪv] n. [化学] 衍生物,派生物;导数 adj. 派生的;引出的 {toefl gre :9140}
neural [ˈnjʊərəl] adj. 神经的;神经系统的;背的;神经中枢的 n. (Neural)人名;(捷)诺伊拉尔 { :9310}
activations [,æktɪ'veɪʃən] n. [电子][物] 激活;活化作用 { :9314}
activation [ˌæktɪ'veɪʃn] n. [电子][物] 激活;活化作用 { :9314}
neuron [ˈnjʊərɒn] n. [解剖] 神经元,神经单位 {cet6 toefl :9397}
salient [ˈseɪliənt] n. 凸角;突出部分 adj. 显著的;突出的;跳跃的 n. (Salient)人名;(西)萨连特 {toefl gre :9408}
en [en] n. 半方;字母N prep. 在…中 n. (En)人名;(芬、柬)恩 { :9798}
metric [ˈmetrɪk] adj. 公制的;米制的;公尺的 n. 度量标准 {cet4 cet6 ky ielts :10163}
propagate [ˈprɒpəgeɪt] vt. 传播;传送;繁殖;宣传 vi. 繁殖;增殖 {cet6 toefl ielts gre :10193}
propagated [ˈprɔpəɡeitid] 传播 { :10193}
inception [ɪnˈsepʃn] n. 起初;获得学位 n. 《盗梦空间》(电影名) {gre :10325}
grad [græd] n. 毕业生;校友 n. (Grad)人名;(英、法、德、罗、瑞典)格拉德 { :10355}
pixels ['pɪksəl] n. [电子] 像素;像素点(pixel的复数) { :10356}
pixel [ˈpɪksl] n. (显示器或电视机图象的)像素(等于picture element) { :10356}
generalize [ˈdʒenrəlaɪz] vi. 形成概念 vt. 概括;推广;使...一般化 {cet6 ky toefl ielts gre :10707}
tweak [twi:k] n. 扭;拧;焦急 vt. 扭;用力拉;开足马力 { :10855}
saturating [ˈsætʃəreitɪŋ] v. 浸湿,浸透( saturate的现在分词 ); 使…大量吸收或充满某物 { :11157}
infinity [ɪnˈfɪnəti] n. 无穷;无限大;无限距 {cet6 gre :11224}
delve [delv] n. 穴;洞 vi. 钻研;探究;挖 vt. 钻研;探究;挖 n. (Delve)人名;(英)德尔夫 {gre :11237}
axes [ˈæksi:z] n. 轴线;轴心;坐标轴;斧头(axe的复数) { :11322}
malicious [məˈlɪʃəs] adj. 恶意的;恶毒的;蓄意的;怀恨的 {cet6 toefl gre :11330}
washer [ˈwɒʃə(r)] n. [机] 垫圈;洗涤器;洗衣人 { :11379}
seminal [ˈsemɪnl] adj. 种子的;精液的;生殖的 adj. 有创造力的,对未来有影响的;重大的 {gre :11387}
optimize [ˈɒptɪmaɪz] vt. 使最优化,使完善 vi. 优化;持乐观态度 {ky :11612}
propagation [ˌprɒpə'ɡeɪʃn] n. 传播;繁殖;增殖 {cet6 gre :12741}
multiplication [ˌmʌltɪplɪˈkeɪʃn] n. [数] 乘法;增加 {cet6 :12748}
personalized [ˈpəːs(ə)n(ə)lʌɪzd] adj. 个性化的;个人化的 v. 个性化(personalize的过去式);个人化 { :13175}
buggy [ˈbʌgi] n. 童车;双轮单座轻马车 adj. 多虫的 {toefl gre :13418}
entropy [ˈentrəpi] n. [热] 熵(热力学函数) { :13494}
blurry [ˈblɜ:ri] adj. 模糊的;污脏的;不清楚的 { :13819}
zebra [ˈzebrə] n. [脊椎] 斑马 adj. 有斑纹的 {zk gk cet4 cet6 ky :13912}
zebras ['zɪbrəz] n. 斑马( zebra的名词复数 ) { :13912}
logistic [lə'dʒɪstɪkl] adj. 后勤学的;[数] 符号逻辑的 { :14538}
dimensional [dɪ'menʃənəl] adj. 空间的;尺寸的 {toefl :15066}
adversarial [ˌædvəˈseəriəl] adj. 对抗的;对手的,敌手的 { :15137}
transferable [trænsˈfɜ:rəbl] adj. 可转让的;[数] 可转移的 { :16039}
APP [æp] abbr. 应用(Application);穿甲试验(Armor Piercing Proof) n. (App)人名;(英)阿普 { :16510}
mu [mju:] n. 希腊语的第12个字母;微米 n. (Mu)人名;(中)茉(广东话·威妥玛) { :16619}
optimization [ˌɒptɪmaɪ'zeɪʃən] n. 最佳化,最优化 {gre :16923}
firewall [ˈfaɪəwɔ:l] n. 防火墙 vt. 用作防火墙 { :17087}
iteration [ˌɪtəˈreɪʃn] n. [数] 迭代;反复;重复 { :17595}
summation [sʌˈmeɪʃn] n. 和;[生理] 总和;合计 {gre :17935}
dataset ['deɪtəset] n. 资料组 { :18096}
permutations [pɜ:mju:'teɪʃnz] n. [数] 排列(permutation的复数) { :18648}
iguana [ɪˈgwɑ:nə] n. 鬣蜥蜴 { :18852}
BOT [bɒt] n. 马蝇幼虫,马蝇 n. (Bot)人名;(俄、荷、罗、匈)博特;(法)博 { :18864}
难点词汇
unsupervised [ˌʌn'sju:pəvaɪzd] adj. 无人监督的;无人管理的 { :19787}
perturbation [ˌpɜ:təˈbeɪʃn] n. [数][天] 摄动;不安;扰乱 { :19948}
rephrased [ri:ˈfreizd] v. 改述,改撰( rephrase的过去式和过去分词 ) { :20756}
rephrase [ˌri:ˈfreɪz] vt. 改述;重新措辞 { :20756}
encrypted [inˈkriptid] v. 把…编码;把…加密(encrypt的过去分词) { :21117}
generative [ˈdʒenərətɪv] adj. 生殖的;生产的;有生殖力的;有生产力的 { :21588}
epsilon [ˈepsɪlɒn] n. 希腊语字母之第五字 { :22651}
annotation [ˌænə'teɪʃn] n. 注释;注解;释文 { :22939}
deterministic [dɪˌtɜ:mɪ'nɪstɪk] adj. 确定性的;命运注定论的 { :23481}
unconstrained [ˌʌnkən'streɪnd] adj. 不勉强的;非强迫的;不受约束的 { :23653}
IM [ ] abbr. 感应电动机(Induction Motor) abbr. 即时通信(Instant Messaging) { :24105}
encoder [ɪn'kəʊdə] n. 编码器;译码器 { :24604}
GE [ʒei] abbr. 美国通用电气公司(General Electric Co.);总能量(gross energy) n. (Ge)人名;(朝)揭;(俄)格 { :25836}
xavier ['zʌvɪə] n. 泽维尔(男子名) { :26299}
embeddings [ɪm'bɛd] v. [医] 植入;埋藏(embed的ing形式) { :27523}
logarithm [ˈlɒgərɪðəm] n. [数] 对数 { :27896}
transferability [ˌtrænsˌfɜ:rə'bɪlətɪ] n. 可转移性;可转让性 { :28407}
scalar [ˈskeɪlə(r)] n. [数] 标量;[数] 数量 adj. 标量的;数量的;梯状的,分等级的 { :28925}
ramen [rɑmən] n. (方便)拉面,拉面 { :29936}
unpaired ['ʌn'peəd] adj. 不成双的;无对手的;无配偶的 { :30070}
scalable ['skeɪləbl] adj. 可攀登的;可去鳞的;可称量的 { :30540}
sigmoid ['sɪgmɔɪd] n. 乙状结肠(等于sigmoidal);S状弯曲 adj. 乙状结肠的;C形的;S形的 { :31478}
doppelganger ['dɒpelɡɑ:nɡər] n. 面貌极相似的人;幽灵 { :31488}
discriminator [dɪ'skrɪmɪneɪtə] n. [电子] 鉴别器;辨别者 { :34448}
classifiers [ ] (classifier 的复数) n. 分类者, 分粒器, 分级机, 汉语中的量词 [计] 分类符, 分类器 { :37807}
classifier [ˈklæsɪfaɪə(r)] n. [测][遥感] 分类器; { :37807}
iterate [ˈɪtəreɪt] vt. 迭代;重复;反复说;重做 {toefl gre :38640}
Goodfellow [ ] [人名] [英格兰人姓氏] 古德费洛绰号,意气相投的伙伴,来源于中世纪英语,含义是“好+伙伴”(good+fellow) { :39174}
initialization [ɪˌnɪʃəlaɪ'zeɪʃn] n. [计] 初始化;赋初值 { :40016}
VER [vɜ:] n. DOS命令:显示DOS版本号 { :45633}
iteratively [ ] [计] 迭代的 { :48568}
生僻词
anonymized [ ] 隐去姓名资料 使匿名
backpropagate [ ] [网络] 反向传播
Coursera [ ] [网络] 免费在线大学课程;免费在线大;斯坦福
crypto ['krɪptəʊ] n. 秘密赞同者;秘密党员
denoise [di:'nɔiz] 降噪, 消除干扰
denoising [ ] [网络] 去噪;去噪声;小波去噪
derivate ['derɪveɪt] n. 导数;派生词;派生的事物 adj. 引出的;系出的
ensembling [ ] [网络] 综合
Frechet [ ] [网络] 弗雷歇(Fréchet);弗雷歇距离
generalizable ['dʒenərəlaɪzəbl] adj. 可概括的,可归纳的
growingly [ ] adv. growing的变形
kurakin [ ] [网络] 库拉金;寄给阿金
linearize ['lɪnɪəraɪz] vt. 使线性化
linearizing ['liniәraiziŋ] 线性化的
logit ['lɒdʒɪt] 分对数
medias [ ] n. (西)丝袜
minibatch [ ] [网络] 小批量
minibatches [ ] [网络] 小批量
minimax ['mɪnəˌmæks] n. 极小化极大,极小极大(极大中的极小),鞍点
oppositive [ә'pɔzitiv] a. 反对的;相反的
outputted ['aʊt.pʊt] n. 产量;产品;【电】发电力;供给量 [网络] 输出;产出;输出量
overfit [ ] [网络] 过拟合;过度拟合;过适应
overfitting [ ] n. 过适;[数] 过度拟合 v. 过适(overfit现在分词)
quizzes [kwiziz] n. 小测验(quiz复数形式);智力比赛 v. 测验;盘问(quiz的第三人称单数形式)
relu [ ] [网络] ReLU 激活函数(线性整流单元)
softmax [ ] [网络] 柔性最大传递函数;前回收的日志文件的百分比;西风狂诗曲系列篇章
tako [ ] [地名] [日本] 多古
takuya [ ] [网络] 拓也;寺田拓哉;卓也
trainable [t'reɪnəbl] adj. 可训练的,可教育的
versa ['vɜ:sə] adj. 反的
wx [ ] abbr. weather 天气; weather report 气象报告; watts second 瓦特秒; waxy 蜡(状)的
zhang [ ] n. 张,章(中国姓氏)
zi [,zi 'aɪ] abbr. 美国本土,后方地带(等于zone of interior)
词组
a batch [ei bætʃ] un. 一批 [网络] 同批产品;罩式炉;详细
a dot [ ] [网络] 阿顿;阿突
a flip [ ] [网络] 翻筋斗
A minus [ ] [网络] A减
an algorithm [ ] [网络] 规则系统;运算程式
and vice versa [ ] [网络] 反之亦然;反过来也一样;科技中的政府
AX (axis) [ ] 线
binary classification [ ] 二元分类
classify as [ ] [网络] 归类为;分类为;出库类型
column vector [ ] un. 列向量;列向和;纵矢量 [网络] 列矢量;行向量;向量了
correlate to [ ] [网络] 使相互关联;相关;与…相互关联
cross entropy [ ] 交叉熵
descent algorithm [ ] 下降算法
door mat [ ] un. 门前擦鞋棕垫;蹭鞋胶垫;门口踏脚垫 [网络] 门垫;门前棕垫;乡村小花脚踏垫
dot product [dɔt ˈprɔdʌkt] un. 点积;标量积 [网络] 点乘;数量积;内积
Epsilon sign [ ] 《英汉医学词典》Epsilon sign 艾泼斯龙征
et al [ ] abbr. 以及其他人,等人
et al. [ˌet ˈæl] adv. 以及其他人;表示还有别的名字省略不提 abbr. 等等(尤置于名称后,源自拉丁文 et alii/alia) [网络] 等人;某某等人;出处
et. al [ ] adv. 以及其他人;用在一个名字后面 [网络] 等;等人;等等
flip that [ ] None
forward propagation [ ] 正向传播
generator output [ ] 电机输出[功率]
gradient descent [ ] n. 梯度下降法 [网络] 梯度递减;梯度下降算法;梯度递减的学习法
gradient descent algorithm [ ] [网络] 梯度下降算法;梯度陡降法;梯度衰减原理
gradient sign [ ] 坡度标
high gradient [ ] 高梯度
in the proximity of [ ] na. 在…附近 [网络] 在...附近
latent code [ ] 隐性分类码
linear network [ ] [网络] 线性网络;线性网路;线性神经网络
linear operation [ ] [网络] 线性运算;它并不是个线性操作;线性演算
linear regression [ ] un. 线性回归;直线回归 [网络] 线性回归分析;线性回归法;线性衰退
logarithm function [ ] n.对数函数
logistic regression [loˈdʒɪstɪk rɪˈɡrɛʃən] n. 逻辑回归 [网络] 吉斯回归;逻辑斯回归;罗吉斯回归
maximum probability [ ] un. 最高概率 [网络] 最大概率;最大机率
minus one [ ] [网络] 桃花源;幸福意外;谢谢你捧场
minus sign [ˈmainəs sain] n. 负号 [网络] 减号;减号的故事;负符号
negative infinity [ ] 负无穷大,负无限大
neural network [ˈnjuərəl ˈnetwə:k] n. 神经网络 [网络] 类神经网路;类神经网络;神经元网络
neural networks [ ] na. 【计】模拟脑神经元网络 [网络] 神经网络;类神经网路;神经网络系统
object detection [ ] [科技] 物体检测
optimization problem [ ] un. 最佳化问题 [网络] 最优化问题;次要最佳化问题
optimization problems [ ] [网络] 最佳化问题;最优化问题;最适化问题
optimization process [ ] un. 最优化过程 [网络] 优化历程;最佳化处理
overlap with [ ] vt.与...相一致
pixel image [ˈpiksəl ˈimidʒ] [医]像素显像
pixel value [ ] [网络] 像素值;像素数值;像素单元值
probability distribution [ ] un. 概率分布 [网络] 机率分布;机率分配;确率分布
probability distributions [ ] [网络] 概率分布;学过几率分布;机会率分布
probability of [ ] na. (飞弹不被击落的)概率 [网络] 变异概率
row vector [ ] un. 行向量;单行矩阵;行矢量 [网络] 列向量;列矢量;列向量使用
salient feature [ ] un. 特征 [网络] 特点;鲜明特征
salient features [ ] na. 特点 [网络] 特征;特色;突出特点
small perturbation [ ] 小微扰
test validation [ ] [网络] 测验效度
the algorithm [ ] [网络] 算法
the ax [ ] 斧子
the downside [ ] [网络] 不利方面;缺点
the equilibrium [ ] [网络] 平衡;那种平静
the FA [ ] [网络] 英格兰足总;英国足球协会;英国足总
the vice [ ] [网络] 罪恶谷
time derivative [ ] 时间导数,时间微商
to compute [ ] [网络] 计算;用计算机计算
to download [ ] 下载
to forge [ ] [网络] 煅炼;稳步前进;假造
to propagate [ ] [网络] 传播;传种;推展
to skip [ ] [网络] 略过;跳越;跳过
to update [ ] [网络] 更新;重要更新公告;每月更新
vice versa [ˌvaɪs ˈvɜ:sə] adv. 反之亦然;反过来也一样 [网络] 小爸爸大儿子;反过来亦然;反过来的
web app [ ] [网络] 网页应用;网络应用;应用程序
惯用语
does that make sense
does that makes sense
i don't know
i mean
if you
in fact
let's say
one question
plus 0
so yeah
you know
you're fooling the network
单词释义末尾数字为词频顺序
zk/中考 gk/中考 ky/考研 cet4/四级 cet6/六级 ielts/雅思 toefl/托福 gre/GRE
* 词汇量测试建议用 testyourvocab.com
