Hi everyone. Uh, welcome to lecture number seven.
Um, so, up to now, uh,
I believe, can you hear me in the back? Is it easy?
Okay. So, in the last set of modules that you've seen,
you've learned about convolutional neural networks and how they
can be applied to imaging, notably.
Uh, you've played with different types of layers including pooling,
max pooling, average pooling, and convolutional layers.
You've also seen some classification, uh,
with the most classic architectures,
uh, all the way up to more recent ones.
Uh, and then you jumped into advanced applications like YOLO,
uh, and the Fast R-CNN,
Faster R-CNN series with an optional video.
And finally, uh, face recognition and neural style transfer.
So, today, we are going to build on top of everything you've seen in this set of modules,
to try to understand what's happening inside these networks.
Because you, you, you notice after seeing, uh,
the set of modules up to now that a lot of, uh,
improvements of these networks come from trial and error.
So, we try something, uh,
we run an experiment,
sometimes the model improves, sometimes it doesn't.
We use a mix
of methods that would make our model improve.
It's not satisfactory from a scientific standpoint,
so people are also searching how can we find, uh,
an effective way to improve our models,
not only with trial and error,
but with theory that goes into the network and explains what's happening inside.
So, today, we will focus on that.
We first, uh, we'll see three methods,
saliency maps, occlusion sensitivity, and class activation maps,
which are used to kind of understand what was the decision process of the network.
Given this output, how can we map back
the output decision on the input space to
see which part of the inputs were important for this decision.
And later on, we will go into more
detail into the network by looking at
what happens at an activation level,
at a layer level,
and at a network level with another set of methods.
We will spend some time on the deconvolution,
uh, it's a cool, it's a cool type of, uh,
more advanced layer.
Uh, if we have time, we'll go over a fun application called Deep Dream, um,
which is super cool visuals for some of you who know it.
Okay? Let's go.
Menti code is on the board,
if you guys need to, to sign up.
So, uh, as usual,
we'll go over some contextual information and small case studies,
so don't hesitate to participate.
So, you've built an animal classifier for a pet shop,
um, and you gave it to them.
It's, it's super good.
It's been trained, uh, on ImageNet plus some other data.
And what, what is a little worrying is that
the pet shop is a little reluctant to use your network,
because they don't understand the decision process of the model.
So, how can you quickly show that the model is actually looking at a specific animal,
let's say a cat, if I give it an input that is a cat.
We've seen that together,
one time, everybody remembers?
So, I'll go quickly. Uh, you have a network,
here is a dog given as an input to a CNN.
The CNN, assuming the constraint is that there is one animal per image, was trained with
a softmax output.
And what we want is to take the derivative of
the score of dog with respect to the input,
to know which parts of the inputs were influencing this score.
So, this derivative has the size of the input.
It's a matrix of numbers.
If the numbers are large in absolute value,
it means the corresponding pixels had a big influence on the score of dog.
Okay? What do you think the score of dog is?
Is it the output probability or no?
What -
[NOISE]
Yeah?
Score of the dog?
It's the score of the dog, yeah.
But is it, uh, 0.85, that's what I mean?
[NOISE] No, there are actually formulas used
Yes. So, it's the,
it's the score that is pre-softmax.
It's the score that comes before the softmax.
So, as a reminder, here's a softmax.
So, you get a vector, that is a set of scores that are not necessarily probabilities,
they are just scores between minus infinity and plus infinity.
You give them to the softmax;
the softmax, what it's going to do is that it's going to output
a vector where the sum of
all the probabilities in this vector are going to sum up to one.
Okay?
Instead of using the probability,
if we use the score of dog,
we will get a better representation here.
The reason is, in order to maximize this number,
score of dog divided by the sum of the scores of al - all animals,
or like maybe I,
I should write exponential of score of dog divided
by sum of exponentials of all scores,
one way is to minimize the scores of the other classes.
So, you see, so maybe moving a certain pixel changes
the general output of the network.
But it actually doesn't have an influence on the score of dog
one layer before. Does it make sense?
So, that's why we would use, uh,
the scores pre-softmax instead of
using the scores post-softmax that are the probabilities.
Okay. And what's fun is, here you cannot see that well,
the slides are online if you wanna - if you wanna look at it on your computers.
But you have some of the pixels at
the same positions as the dog is on the input image that are stronger.
So, we see some white pixels there.
And this can be used to segment the - the dog probably.
So, you could use a simple segmentation algorithm,
uh, on top of this map.
It doesn't work too - too well in practice,
so we have better methods to do segmentation,
but this can be done as well.
So, this is what is called saliency maps,
and it's a common technique to quickly, uh, check what the network is looking at.
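To make this concrete, here is a minimal sketch of the saliency map computation, assuming a PyTorch classifier `model` whose forward pass returns the pre-softmax scores; the function name and tensor shapes are illustrative, not the code used in class.

```python
import torch

def saliency_map(model, image, class_idx):
    """Gradient of the pre-softmax class score with respect to the input pixels.

    image: tensor of shape (1, 3, H, W); model(image) returns logits (pre-softmax).
    """
    model.eval()
    image = image.clone().requires_grad_(True)
    scores = model(image)              # (1, num_classes), scores before the softmax
    scores[0, class_idx].backward()    # d(score of dog) / d(pixels)
    # One value per pixel: max of |gradient| over the three color channels.
    return image.grad.abs().max(dim=1)[0].squeeze(0)   # (H, W)
```

Large values in the returned map mark the pixels that move the dog score the most, which is why thresholding it gives the rough segmentation mentioned above.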
So, here's another scenario.
Now you've built the animal classifier for the pet shop,
they're still a little scared,
but you wanna prove that the model is actually looking
at the input image at the right position.
You don't need to be quick but you have to be very precise.
[NOISE] Yeah?
So, going back from the last slide,
is the map the output probability?
No, the derivative.
Okay.
It's the values of the de - the derivative.
Oh, okay. So, it's like the gradients at [inaudible]
So, you - you take the score of dog,
you differentiate it with respect to the input,
it gives you a matrix that's exactly the same size as the x.
And you use - you use like a specific color scheme to see which pixels had the largest values.
Perfect, thank you.
Okay. So, here we have our CNN.
The dog is forward propagated and we get an output,
uh, probability score for the dog.
Now, you want a method that is more
precise than the previous one but not necessarily too fast.
And this one, we've talked about it a little bit, it's called occlusion sensitivity.
So, the idea here is to put a gray square on the dog here.
And we forward propagate this new image through the network.
What we get is another probability distribution
that is probably similar to the one we had before,
because the gray square doesn't seem to impact too much of the image.
At -, uh, at least from a human perspective,
we still see a dog, right?
So, the score of dog might be high, 83 percent probably.
What we can say, is that we can build
a probability map corresponding to the class dog and ha - and we
will write down on this map how confident is
the network if the gray square is at a specific location.
So, for our first location,
it seems that the network is very confident,
so let's put a red square here.
Now, I'm going to move the gray square a little bit.
I'm shifting it just as we do for a convolution filter,
to send again this new image in the network.
It's going to give me
a new probability distribution output and the score of dog might change.
So, looking at the score of dog,
I'm going to say, okay,
the network is still very confident that there is a dog here, and I continue.
I shift it again, here same,
network's still very confident that there is a dog.
Now, I shift the, the, the square, um,
so that the - the face of the dog is partially hidden.
Probability of dog will probably go down,
because the network cannot see one eye of the dog.
It's not confident that there's a dog anymore.
So, probably, the confidence of the network went down.
I'm going to put a, a square that is tending to be blue, and I continue.
I shift it again and here we don't see the dog face anymore.
So, probably the network might,
might classify this as a chair, right?
Because the chair is more obvious than the dog now.
So, I'm gonna put a blue square here and we're going to continue.
Here, we don't see the tail of the dog,
it's still fine, the network is pretty confident,
And what I will look at now is
this probability map which tells me roughly where the dog is.
So, here we used a pretty big gray square compared to the size of the image.
Sorry, the smaller the gray square,
the more precise this probability map is going to be,
but the longer it takes to compute.
So if you have time, if you can,
you can take your time with the pet shop to explain to them, uh,
what's happening, you would do that. Yeah?
Would you ever, in an
We will see that in the next slide. That's correct.
So let's see more examples.
Here, we have three classes and these, these,
these images have been generated by Matthew Zeiler and Rob Fergus.
This paper, Visualizing and Understanding Convolutional Networks,
is one of the papers that
led the research in, in network visualization.
So, I'd advise you to take a look at it,
and we will refer to it a lot of times in this lecture.
So, now we have three examples.
One is a Pomeranian,
which is this type of cute dog, a car wheel,
which is the true class of the second image,
and an Afghan hound,
which is this type of dog here on the last image.
So, if you do the same thing as we did before that's what you would see.
So, just to clarify,
here we see a blue color.
It means when the gray square was positioned here or centered at this location,
the network was less confident that the true class was the Pomeranian;
the confidence of Pomeranian went down,
because the confidence of tennis ball went up.
And another interesting thing to notice is on the last picture here.
You see that there is a,
a red color on the top left of the image.
And this is exactly what - what you mentioned, Adam, it's that,
when the square was on the face of the human,
the network was much more confident that the true class was the dog.
Because you removed a lot of information that was,
uh, meaningful for the network, which was the face of the human.
And similarly, if you put the square on the dog,
the class that the network was outputting was probably human. Does that make sense?
Okay. So, this is called occlusion sensitivity,
and it's the second method that you now have seen for
interpreting where the network looks at on an input.
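As a rough sketch of the procedure just described, assuming again a PyTorch `model` returning logits; the patch size, stride, and gray value are placeholder choices.

```python
import torch
import torch.nn.functional as F

def occlusion_map(model, image, class_idx, patch=40, stride=20, gray=0.5):
    """Slide a gray square over the image and record the class probability.

    image: tensor (1, 3, H, W). Low values in the returned map mean the square
    hid something the class score depended on.
    """
    model.eval()
    _, _, H, W = image.shape
    rows, cols = (H - patch) // stride + 1, (W - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    with torch.no_grad():
        for i in range(rows):
            for j in range(cols):
                occluded = image.clone()
                y, x = i * stride, j * stride
                occluded[:, :, y:y + patch, x:x + patch] = gray  # gray square here
                probs = F.softmax(model(occluded), dim=1)
                heat[i, j] = probs[0, class_idx]
    return heat
```

Blue (low) cells in this map are the positions where hiding the patch hurt the class probability, which is exactly the effect seen on the Pomeranian's face.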
So, let's move to class activation maps.
So, I don't know if you remember,
but two weeks ago, Pranav, when he
discussed the techniques that he has used in his research,
he explained that - he did a chest x-ray project.
And he manages to,
to tell the doctor where the network is looking at when predicting
a certain disease based on a chest X-ray, right? You remember that?
So, this was done through class activation maps,
and that's what we're going to see now.
So, one important thing to notice is that we discussed that
classification networks seem to have a very good sense of localization,
and we can see it with the two methods that we previously discussed.
Think also of the detection networks that you've studied in this set of modules.
The YOLOv2 network,
because classification has a lot of data,
a lot more than detection,
has been trained on classification first,
builds a very good feature extractor,
and is then fine-tuned for detection.
Okay. And so the core idea of class activation maps is that
CNNs have a very good ability to localize objects,
even if they were trained only on image level labels.
So, we have this network.
There is a very classic network used for classification.
We give it a kid and a dog.
Uh, this class activation map work is coming from MIT,
the MIT lab with Bolei Zhou.
And we forward propagate this image of
a kid and a dog through the network, which has some CONV layers.
And at the end, you usually flatten the volume,
and run it through several fully connected layers
which are going to play the role of the classifier,
and send it to a softmax,
and get the probability output.
Now, what we're going to do is that we are going to prove that
this CNN is actually able to localize objects.
So, we're going to convert this same network in another network.
And the part which is going to change is only the last part.
The problem with flattening is that you lose all the spatial information.
You have a volume that has spatial information,
although it's been going through some max pooling,
so it's been down sampled and you lost some part of the resolution.
You flatten it,
run it through a fully connected layer, and then it's over.
You - it's, it's super hard to find out what
the activation corresponds to on the input space.
So, instead of flattening plus fully connected layers,
we're going to use global average pooling.
We're going to explain what it is.
Then a fully connected layer with a softmax, and we get the probability output.
And we're going to show that now this network can
be trained very quickly because we just need to train one layer,
the fully connected here,
and can show where the network looks at.
The same as the previous network.
So, let's talk about it more in detail.
Assume this was the last CONV layer,
and it outputs a volume,
a volume that is sized four by four by six.
So, six filters were used in the last CONV layer.
And so we have six feature maps now.
I'm going to convert this using
a global average pooling.
What is global average pooling?
It's just taking these feature maps.
Each of them averaging them into one number.
So, now instead of having a four by four by six volume,
I have a one by one by six volume,
but we can call it a vector.
So, what's interesting is that this number,
actually holds the information of the whole feature map that
came before, averaged into one number.
I'm going to put these in a vector,
and I'm going to call them a.
As usual a_1, a_2,
a_3, a_4, a_5, a_6.
As I said, I'm going to train a fully-connected layer here with the softmax activation,
and the outputs are going to be the probabilities.
So, what is interesting about that?
It's that the feature maps here as you know will contain some visual patterns.
So, if I look at the first feature map,
I can plot it here,
so these are the values.
And of course, this one has many more values.
It's not a four by four, it's much more numbers.
But this - you can say that this is the feature map,
and it seems that the filter fired at these locations.
There was a visual pattern in the inputs that activated the feature map,
and the filters which generated this feature map here in this location.
Same for the second one, there's probably two objects or
two patterns that activated the filters that generated this feature map,
So we have six of those.
And after I've trained my fully connected layers here - my fully connected layer,
I look at the score of dog.
Score of dog is 91 percent.
What I can do is to know this 91 percent,
how much did it come from these feature maps?
And how can I know it?
It's because now I have a direct mapping using the weights.
I know that the weight number one here,
this edge you see it,
is how much the score was dependent on the orange feature map.
If you look at the green edge,
it is the weight that has multiplied
this feature map to give birth to the output score of dog.
So, this weight is telling me how much this feature map, the green one,
has influence on the output.
So, now what I can do is to sum all of these,
a weighted sum of all these feature maps.
And if I just do this weighted sum,
I will get another feature map.
Something like that. And you notice that,
this one seems to be highly influenced by the green one,
the green feature map, yeah.
It means probably the weight here was higher.
It probably means that the second feature map of
the last CONV layer was important for detecting the dog.
Okay. And then, once I get this feature map,
this feature map is not the size of the input image, right?
It's the size of the height and width of the output of the last CONV.
So, the only thing I'm going to do is,
I am going to up sample it back simply,
so that it fits the size of the input image,
and I'm going to overlay it on the input image as a heatmap.
The reason it's called class activation map is because
this feature map is dependent on the class you're talking about.
If I was using, uh,
let's say I was using car here,
if I was using car,
the weights would have been different, right?
Look at the edges that connect
the first activation to the activation of the previous layer.
These weights are different. So, if I sum
all of these feature maps I'm going to get something else.
And what you can notice is,
probably if I look at the class of human the weights number one might be very high,
because it seems that this visual pattern that
activated the first feature map was the face of the kid.
Okay. So, what is super cool is that you can get your network,
and just change the last few layers into
global average pooling plus a softmax fully connected layer.
And you can do that, and only retrain that last part.
It requires a small fine tuning.
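A minimal sketch of the class activation map computation, assuming you already have the last CONV layer's feature maps for one image and the weights of the single fully connected layer that follows the global average pooling; names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def class_activation_map(feature_maps, fc_weights, class_idx, out_size):
    """CAM = weighted sum of the last conv feature maps, upsampled to the input size.

    feature_maps: (K, h, w), output of the last conv layer for one image.
    fc_weights:   (num_classes, K), weights of the fully connected layer
                  that sits after global average pooling.
    """
    w = fc_weights[class_idx]                           # (K,) weights for this class
    cam = (w[:, None, None] * feature_maps).sum(dim=0)  # (h, w) weighted sum
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)                      # rescale to [0, 1]
    # Upsample back to the input resolution so it can be overlaid on the image.
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode="bilinear", align_corners=False)
    return cam[0, 0]
```

Changing `class_idx` changes which row of weights is used, which is why the same feature maps give a different map for dog than for human.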
Yeah.
So are these like saliency maps?
It's a different vocabulary,
I would use saliency maps for the first method we saw,
the derivative of the score with respect to the input.
Uh, it's not the same thing here,
it's just an up sampling to the,
to the input space based on the feature maps of the last CONV layer.
So it's mostly just examining the weights and sort of doing like
a weighted sum on them, not so much that different from before?
Yes.
Good [NOISE].
Any other questions on class activation maps? Yes.
Does taking the average not kill the spatial information?
Yeah. That's a good question. So, taking
the average, does it kill the spatial information?
So, let me, let me write down the formula here.
This is the score that we're interested in,
let's say the score of dog, call it class c. What you could say is that this score
is a sum over k equals one to six of w_k,
which is the, the weight that,
that connects the output activation to the previous layer,
times a_k, the activation of the previous layer.
And a_k, uh, let's say we, we,
we use the global average pooling,
so it's the sum of A_k over the locations i, j, divided by 16.
Can you see in the back? Roughly? So, what I'm saying is that here,
I have my global average pooling that happened
here, and I can divide it by the right number,
so divided by 16, four by four.
Okay. I can switch the two sums,
so I can say that this thing is a sum over i, j,
the locations, of the sum over k equals one to six of
w_k times A_k at i, j, so the
weighted sum of the feature maps at that location.
Does it make sense? Does this make sense?
So I, I still have the,
the, the locations, I just moved,
I just moved the sums around, and what I can say is that the inner sum is
the value at location i, j of the class activation map;
it is a class score for this location i, j, and I'm summing it over all locations.
So, just by flipping what the average pooling was doing over the locations,
I can say that by weighting, using my weights,
all the activations in a specific location across all the feature maps,
I get the score of this position with regard to the final output.
[NOISE] The reason we're not losing it is because we know,
we know what the feature maps are.
Right. We know what they are and we know that they've been averaged exactly,
so we exactly can map it back.
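Written out, the sum swap is just this, using the lecture's four by four by six example and writing $w_k^c$ for the weight from the $k$-th averaged feature map to class $c$:

\[
S_c \;=\; \sum_{k=1}^{6} w_k^c \, a_k
\;=\; \sum_{k=1}^{6} w_k^c \cdot \frac{1}{16}\sum_{i,j} A_k(i,j)
\;=\; \frac{1}{16}\sum_{i,j} \underbrace{\sum_{k=1}^{6} w_k^c \, A_k(i,j)}_{M_c(i,j)}
\]

so the class activation map $M_c(i,j)$ is the per-location class score, and the average pooling only sums it over locations, it never throws away where things happened.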
Were you giving only one weight to each [inaudible]
Yeah. Because we, we assume that each filter that
generated these feature maps detects one, one specific thing.
So, like if, if this is the feature map it means assuming the filter was detecting dog,
that we're going to see just,
just something here meaning that there is a dog here and if there was a dog on
the lower part of the image we would also
have strong values there as well.
I, I say, if you wanna see more of the math behind it,
check the paper, but this is the idea.
You can flip the sums in
the global average pooling and show that you keep the spatial information.
The thing is you do the global average pooling,
but you don't lose the feature maps because you know where they
were in the output of the CONV, right?
So, you're not, you're not deleting this information.
So, the,
the activation a_k is the sum divided by 16, that's instead of taking the average, right, for that [inaudible]
Yeah.
[NOISE] Okay, let's move on and watch a cool video on how act - class activation maps work.
This video was from the authors of the class activation map paper.
And it's, uh, it's live so it's very quick.
So, you can see that the network is looking at this speed boat.
Okay.
So, what we've seen are methods that are roughly mapping back the output to
the input space and helping us understand
which parts of the input mattered most.
Now we're going to try to go deeper
in the network, to understand how
the network sees our world, not necessarily related to a specific input, but in general.
Okay. So, the pet shop now trusts your model
because you - you've used
class activation maps to show that the model is looking at the right place,
uh, but they got a little scared when you did that.
And they asked you to explain what the model thinks a dog is.
So, you have this trained network,
and you have an output probability.
Yeah, let me take one in the back. Yeah.
Um, what are some good ways to do this for non-image data?
Non-image data, that's a, that's a good question.
It's actually, so the reason we're seeing images is that they're the easiest to visualize,
um, but if you look at, let's say, time series data,
so, either speech or natural language,
the main way to visualize those is, uh,
with the attention method,
uh, are you familiar with that?
So, in the next set of modules that you're going to
start this week, and you will just study
in the next two weeks, you will see attention models,
which will tell you which part of a sentence was important,
let's say to output a certain word, assuming you're doing machine translation.
You know some languages,
they don't have a direct one to one mapping.
It means I might say,
uh, I love cats,
but in another language maybe [NOISE] this same sentence
will be 'cats I love' or something like that.
And you want an [NOISE] attention model to
show you which word 'cats' in the output was referring to in the input.
I think it's, it's, it's okay.
Okay, sorry guys [NOISE].
[NOISE] So, going back to the presentation.
Now, we're going to continue with images.
And so the new thing is the pet shop
is a little scared and asked you to explain what the network thinks a dog is.
What's the representation of dog for the network?
So, here, we're going to use a method that we've
already seen together called gradient ascent,
which is defining an objective,
that is technically the score of the dog,
minus a regularization term.
What the regularization term does
is it's - it's saying that x should look natural.
It's not necessarily L2 regularization,
it can be something else,
and we - we will discuss it in the next slide,
but don't think about it right now.
What we will do is we will compute
the back-propagation of this objective function all the way back to the input,
and perform gradient ascent on the input pixels.
So, it's an iterative process, it
takes longer than the class activation map.
And we repeat the process, forward propagate, back propagate,
and update the pixels and so on.
You guys are familiar with that?
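A minimal sketch of this gradient ascent loop, assuming a PyTorch `model` that returns pre-softmax scores and pixels kept in [0, 1]; the step size, number of steps, and the simple L2 penalty are placeholder choices.

```python
import torch

def class_visualization(model, class_idx, steps=200, lr=1.0, l2_reg=1e-3,
                        size=(1, 3, 224, 224)):
    """Gradient ascent on the pixels to maximize a pre-softmax class score.

    Objective: score[class_idx] - l2_reg * ||x||^2. The L2 term is the
    regularizer that keeps pixel values from blowing up.
    """
    model.eval()
    x = torch.rand(size, requires_grad=True)       # start from a random image
    for _ in range(steps):
        score = model(x)[0, class_idx]             # pre-softmax score
        objective = score - l2_reg * (x ** 2).sum()
        model.zero_grad()
        if x.grad is not None:
            x.grad.zero_()
        objective.backward()
        with torch.no_grad():
            x += lr * x.grad                       # ascent step on the pixels
            x.clamp_(0.0, 1.0)                     # keep pixels in the image range
    return x.detach()
```

The clamp at the end is the "keep values in a valid pixel range" constraint mentioned below, and swapping the L2 term for an occasional blur is the Yosinski-style regularization discussed next.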
So let's see what - what we can get.
So, actually, if you take an ImageNet classification network,
and you perform this on the classes of goose or ostrich,
you can see what the network is
looking at, or what the network thinks these classes look like.
So, for the goose you can recognize the shape
somehow, but these are - are still quite hard to interpret.
It's not super easy to see, and even worse here on the screen,
better on your computers.
But you can see,
you can see orange color for the flamingo.
It means that pushing the pixels to an orange color would
actually lead to a higher score of the class flamingo.
If you use a better regularizer,
you might get better pictures.
So, this is for one class, and this is for another one.
So, a few things that are interesting to see,
is that in order to maximize the score of a class,
what the network generates often contains many copies of the object.
It means that having 10 of them in the image leads to
a higher score of the class than having just one.
Talking about the regularization term.
It says that for the L2 regularization,
we don't want to have extreme values of pixels.
It doesn't help much to have one pixel with an extreme value,
one pixel with a low value and so on.
So, we're going to penalize
all the pixels so that all the values are around each other,
and then we can re-scale it between zero and 20 - 255 if you want.
One thing to notice is that the optimization doesn't
constrain the inputs to be between zero and 255.
You can go to any value,
while an image is stored with numbers between zero and 255,
so you might want to clip the pixel values.
This is another type of regularization.
One thing that led to beautiful pictures was what Jason Yosinski and his team did:
they forward propagated, back propagated,
updated the pixels, and blurred the image a little every few steps.
Because what - what is not useful for the visualization
is if you have high frequency variation between pixels,
it doesn't help to visualize
if you have many pixels close to each other that have many different values.
Instead, you want to have a smooth transition among pixels,
and this is another type of regularization called blurring, a Gaussian blur.
Okay? So, this method actually makes a lot of sense in - in - in scientific terms.
You're - you're maximizing an objective function that
gives you what the network sees as flamingo,
which would maximize the score of flamingo.
So, we call it also class model visualization.
So, does a more realistic class model visualization mean a more accurate model?
Um, does a more realistic class model visualization mean a more accurate model?
So, it's hard to map the accuracy of the model based on this visualization,
but it's a good way to sanity-check what the network has learned.
Yeah. We're going to - to see more of this later.
I think the most interesting part is actually on this slide is,
we - we did it for the class score,
but we could have done it with any activation.
So, let's say I stop in the middle of the network,
and I define my objective function to be this activation.
I'm going to back propagate from there all the way to the input and do the same gradient ascent.
It will tell me what this activation represents.
What does this activation fire for?
So, that's even more interesting I think than looking at
the inputs and then the output. Does that make sense?
That we could do it on any activation?
Yep.
[NOISE] Any questions on that? [NOISE]
Okay. So, now, we're going to do another trick which is data-set search.
It's actually one of the most useful, I think.
Not fast, but very useful.
So, the pet shop loved the previous technique,
and asks if there are other alternatives to - to
show what - what an activation in the middle of a network is thinking.
You take an image, forward propagate it through the network.
Now, what you're going to do is select a feature map,
let's say this one, where at this layer,
and the feature map is of size five by five by 256.
It means that the CONV layer here had 256 filters, right?
You are going to look at these feature maps and select probably,
uh, yeah, what you're going to do is select one of the feature maps, okay?
We select one out of 256 feature maps,
and we're going to learn - run a lot of data,
forward propagate it through the network,
and look which data points have had the maximum activation of this feature map.
So, let's say we do it with the first feature map.
We notice that these are the top five images that really fired this feature map,
like with high activation values.
What it tells us, is that probably this feature map is
detecting shirts. Could do the same thing,
let's say we take the second feature map,
and we look which data points have maximized the activations of this feature map,
out of a lot of data.
And we see that this is what we got,
the top five images.
Probably means that the other feature map seems to be activated when seeing edges.
So, the second one is much more likely to
appear earlier in the network obviously than later.
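A sketch of this data-set search, assuming a PyTorch `model`, a `layer` module inside it, and a `dataset` that yields (image, label) pairs; the forward hook is just one convenient way to grab the feature map.

```python
import heapq
import torch

def top_activating_images(model, layer, channel, dataset, k=5):
    """Find the k dataset images that most activate one feature map.

    layer: a module inside `model`; channel: index of the feature map.
    Returns (mean_activation, dataset_index) pairs, best first.
    """
    grabbed = {}
    def hook(_, __, output):                 # record the layer's output
        grabbed["out"] = output
    handle = layer.register_forward_hook(hook)

    best = []                                # min-heap keeping the k best
    model.eval()
    with torch.no_grad():
        for idx, (image, _) in enumerate(dataset):
            model(image.unsqueeze(0))
            score = grabbed["out"][0, channel].mean().item()
            heapq.heappush(best, (score, idx))
            if len(best) > k:
                heapq.heappop(best)
    handle.remove()
    return sorted(best, reverse=True)
```

The images you get back are then cropped to the receptive field of that feature map, which is what the next part explains.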
So, one thing that you may ask is,
do these images seem cropped?
Like I don't think that this was an image in the data-set,
it's probably a crop of an image from the data-set.
What do you think this crop corresponds to?
[NOISE]
Any idea how we cropped
the image, and why these are cropped?
[NOISE] Like, why - why didn't I show you the full images?
How was I able to show you the cropped?
[NOISE].
[inaudible]
That's correct. So, let's say we pick an activation,
an activation in the network.
This activation, for a given input, doesn't see the full image.
Right? Doesn't see it.
What it sees is a small crop of the input, its receptive field.
Does that make sense? So, let's look at another slide.
Here, we have a picture of units,
64 by 64 by 3.
It's our input. We run it through a five-layer ConvNet.
The volumes get smaller in height and width, but bigger in depth.
If I tell you what this activation is seeing.
If you map it back, you look at the stride and the filter size you've used,
you could say that this is the part that this filter is seeing.
This - this -, uh, this activation is seeing.
It means the pixel that was up there had no influence on this activation,
and it makes sense when you think of it.
You're - you're - the - the easiest way to think about it is looking at the - the top picks,
the - the - the top entry on the
You have the input image, you put a filter here.
This filter gives you one number, right?
This number, this activation only depends on this part of the image,
but then if you add a second layer,
the activation there will take the outputs of several filters,
so the deeper you go, the more part of the image the activation will see.
So, if you look at an activation in layer 10,
it will see much - a much larger part of the input
than an activation in layer one.
So, that's why - that's why probably the pictures that I showed here,
these ones are very small part
small crops of the image,
which means the activation I was talking about here is probably earlier in the network.
It sees a much smaller part of the input.
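The growth of this receptive field follows a standard recurrence; here is a tiny sketch, with a made-up layer list rather than the exact five-layer ConvNet on the slide.

```python
def receptive_field(layers):
    """Receptive field of one unit after a stack of conv/pool layers.

    layers: list of (filter_size, stride) pairs, first layer first.
    Recurrence: r grows by (f - 1) * jump, and jump multiplies by the stride.
    """
    r, jump = 1, 1
    for f, s in layers:
        r = r + (f - 1) * jump
        jump = jump * s
    return r

# A unit after one 3x3 conv sees 3 pixels; after a deeper stack it sees far more.
print(receptive_field([(3, 1)]))                          # 3
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # 10
```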
[inaudible] [NOISE]
Yeah, yeah. So, what you look at it which activation was maximum.
You look at this one and then you match this one back to crop. Does that make sense?
Okay, so here's units again,
up and same, this one would correspond more in the center of the image.
This
Okay cool. So, let's talk about deconvolution.
This is gonna be the hardest part of the lecture,
but probably helping with - with more intuition.
Remember, that was the generator we discussed before.
And we said that giving a code to the generator,
the generator is able to output an image.
So, there is something happening here that we didn't talk about,
is how can we start with a 100-dimensional code and
output a 64 by 64 by 3 image? That seems weird.
We could use, you might say,
a fully connected layer with a lot of parameters;
another option is to use a deconvolution.
So, the convolution compresses an image
into a smaller volume in height and width, deeper in - in depth,
while the deconvolution will do the reverse.
It will up-sample the height and width of an image.
So, that would be useful in this case.
Another case where it would be useful is segmentation.
You remember our case studies, uh,
for image segmentation. You take an image,
give it to an encoder network.
It's going to compress it.
So, it's going to lower the height and width.
The interesting thing about this compressed volume
is that it holds a lot of meaningful information.
But what we want ultimately,
is to get a segmentation mask that has the size of the input,
and this volume is much smaller.
So we need a deconvolution network to up-sample it.
So, deconvolutions are used in these cases.
Today the case we're going to talk about is visualization.
Remember the gradient ascent method?
We define an objective function by choosing an activation in the middle of the network,
and we want the objective to be equal to this activation, to find
the input image that maximizes this activation through an iterative process.
Now, we don't want to use an iterative process.
We want to use a reconstruction of this activation
directly in the input space by one backward pass.
So, let's say I select this feature map out of
256, sorry, the volume is 5 by 5 by 256.
What I'm going to do is,
I'm going to identify the maximum activation in this feature map.
It's this one, third column second row.
I'm going to set all the others to zero.
Just this one I keep it,
because it seems that this one has detected something.
Don't wanna talk about the others.
I'm going to try to reconstruct what it corresponds to in the input space.
So, I'm going to go through
the reverse operations.
I will unpool, I will un-ReLU,
let's say, that word doesn't really exist.
But un-ReLU and deconv.
And I will do it several times because this activation went through several of them.
So I will do it again and again until I see, oh,
this specific activation that I selected in
the feature map fired because it saw the ears of the dog.
And as you see, this image is cropped again.
It's not the entire image,
it's just the part that the activation has seen.
And if you look at where the activation is located on the feature map,
it makes sense that this is the part that corresponds to it.
We are going to define what we mean by unpool,
what do we mean by un-ReLU,
and what do we mean by de-conv.
Okay. Yes.
So, if we had [inaudible].
Would we have just gotten a reconstruction of the whole image?
So, the difference is, you mean if we don't set the other activations to zero?
Then this reconstruction would include the contribution of all of them.
It would be more noisy.
Doesn't, doesn't necessarily mean you will not get the full image,
because probably the other activations probably didn't even fire,
means they didn't detect anything else.
It's just that it's gonna - it's gonna add some noise to this reconstruction.
Okay, so let's talk about deconvolution a little bit on the board.
[NOISE] So, to start with deconvolution,
and you, you guys can take notes if you want.
We are going to spend about 20 minutes on the board now to discuss deconvolution, okay?
[NOISE] To understand the deconvolution,
we first need to understand the convolution.
We've seen it, uh, from a
computer science perspective, but actually,
what we are going to do here is we are going to frame
the convolution as a matrix multiplication.
You're going to see that it's actually possible.
So let's start with a 1D conv.
For the 1D convolution,
I will take an input x which is of size 12,
x1, x2, x3, x4,
x5, x6, x7, x8.
So, 8 values plus a padding of 2 on each side,
which gives me the 12 that I mentioned.
So, the input is a one-dimensional vector which has padding of two on both sides.
I will give it to a layer that will be a 1D conv.
And this layer would have only one filter.
And the filter size will be four.
We will also use a stride equal to two.
[NOISE] So, my first question is,
what's the size of the output?
Can you guys compute it quickly
and tell me what's the size of the output.
[NOISE]. Input size 12,
[NOISE] filter of size four,
stride of two, padding of two.
Five, yeah I heard you, yeah.
So, remember, use the formula: (n plus 2p minus f) over s, plus 1, which is (8 plus 4 minus 4) over 2, plus 1, equals 5.
So, what I'm going to get is Y1,
Y2, Y3, Y4, Y5.
[NOISE] So, I'm going to focus on this specific convolution for now.
And I'm going to show now that we can define it as,
as a matrix multiplication.
So, the way to do it is,
I guess the easiest way is to write the system of equation
that is underlying here. What is Y1?
Y1 is the filter applied to the four first values here. This makes sense?
So, if I define my filter as being y W1,
W2, W3, and W4,
what I'm gonna get is that Y1 equals W1 times
zero plus W2 times zero plus W3 times x1 plus W4 times x2.
This makes sense? Just the convolution,
Y2 is going to be same thing,
but with a stride of two, going two down.
So, it's going to give me W1 times x1 plus W2 times
x2 plus W3 times x3 plus W4 times x4.
Correct? Everybody is following?
No.
We will do it for all the y's until Y5,
and we know that Y5 is the dot product between
the filter and the four last numbers here, summing them.
So, it will give me W1 times x7 plus W2 times
x8 plus W3 times zero plus W4 times zero.
[NOISE]
Okay. Now what we're going to do is to try to write down
y as W times x, a matrix multiplication.
We need to find what this w matrix is.
And looking at this system of equation,
it seems that it's not impossible. So let's try to do it.
I will write my Y vector here, Y_1,
Y_2, Y_3, Y_4, Y_5.
And I will write my matrix here and my vector x here.
So first question is,
what do you think will be the shape of
this w matrix? Um?
5 by 12.
5 by 12. Correct. We know that this is 5 by 1,
this is 12 by 1,
so of course w is going to be 5 by 12.
Right?
So, now, let's fill in the x vector: 0,
0, x_1, x_2, x_3, all the way to x_8, then 0, 0.
Can you guys see in the back or no?
Yeah? Okay. Cool. Ah, so,
I'm going to fill in this matrix regarding this system of equation.
I know that the Y1 would be w_1 times 0,
w_2 times 0, w_3 times x_1, w_4 times x_2.
So this vector is going to multiply the first row here.
So I just have to place my ws here.
w_1 will come here, multiply 0,
w_2 will come here, w_3 would come here,
and w_4 would come here.
And all the rest would be filled in with 0s, right?
I don't want any other terms in there.
How about the second row of this matrix?
I know that Y_2 has to be equal to this second equation.
And I know that it's going to give me w_1x_1 plus w_2x_2 plus w_3x_3 plus w_4x_4.
x_1 is the third input on this vector, third - third entry.
So, I would need to shift what I had in
the previous row with a stride of two, it will give me that.
I should get the second equation up there.
And so on and you understand what happens, right?
This pattern will just shift with the stride of two on the side.
So, I would get zeros here and I will get my w_1,
w_2, w_3, w_4 and then zeros.
And all the way down here.
And all the way down here,
what I will get is w_4,
w_3, w_2, w_1 and zeros.
So the only thing I wanna mention here
is that the convolution can be
framed as a matrix multiplication.
So why did you have the weights
on the left side, that's when multiplying the - [NOISE]
For the top row, why the zeros are on the right side?
Yes.
Because I don't want Y hat - Y_1 to be dependent on x_3 to x_8.
So I want these entries to be zero.
Okay. Oh, because of the stride and the window size.
Okay.
Thank you.
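Here is the same construction in code, a small numpy sketch with made-up filter values, checking that the W matrix built row by row gives the same y as sliding the filter with a stride of two.

```python
import numpy as np

def conv_as_matrix(w, n_in=12, stride=2):
    """Build the 5x12 matrix W so that W @ x_padded equals the 1D convolution."""
    f = len(w)
    n_out = (n_in - f) // stride + 1              # (12 - 4) / 2 + 1 = 5
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        W[i, i * stride: i * stride + f] = w      # the filter, shifted by the stride
    return W

w = np.array([1.0, 2.0, 3.0, 4.0])                          # w1..w4 (made up)
x = np.concatenate([[0, 0], np.arange(1.0, 9.0), [0, 0]])   # 2 zeros, x1..x8, 2 zeros
W = conv_as_matrix(w)
y = W @ x
# Same result as sliding the filter directly with stride 2:
y_direct = np.array([w @ x[i: i + 4] for i in range(0, 9, 2)])
assert np.allclose(y, y_direct)
```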
So why is this important for
the intuition behind the deconvolution and the existence of the deconvolution?
It's because if we manage to write down y equals w times x,
we probably can write down x equals w inverse times
y, if w is an invertible matrix, and we can define this
to be our deconvolution.
what's the shape of this new matrix?
Yes.
We have 12 by 1 on one side,
5 by 1 on the other, it has to be 12 by 5. So it's flipped compared to
w. So one thing we're going to do here is we're going to make an assumption.
First assumption is that w is an invertible matrix.
And on top of that, we're going to make a stronger assumption which
is that w is an orthogonal matrix.
And without going into the details here,
same as for other results we proved in this class,
we made some assumptions that are not always true.
This assumption is not going to be always true.
One, one intuition that you can have is,
if I'm using a filter that is,
ah, assume the filter is an
So like, ah,
zero, zero,
In this case,
Why? A matrix that is
I dot-product them together, it should give me zero.
Same with the rows, you can see it.
So, what's interesting is that, ah,
if the stride was four,
there will be no
It would give me an
Here a stride is two but if I replace this w_1 by
zero, zero,
so plus one, zero, zero,
you can see that the
The zeros will multiply the ones and the ones
will multiply the zeros, it gives me a zero
So, this is a case where it works.
Practices doesn't always work.
The reason we're making this assumption is because we wanna make a reconstruction, right?
So, we wanna be able to have this w minus one, this,
this, this
But at, at first-order
we can assume that
even if this assumption is not always true.
In the case where w is
I know that the
Or another way to write it,
is that for
w
So, what it tells me is that x is going to be w
So, let's see what we get from that.
Let me write down the Menti code.
So, let's say now we have our x and we wanna reconstruct it,
or rather, we will have our y and we want to generate our x using this method.
So, I would, what I would write is to understand the 1D deconv.
We can use the following illustrations,
where we have x here,
which is zero, zero, x_1,
x_2, x_3, all the way down to x_8.
Okay? And I will have my w transpose matrix here,
multiplying my y vector,
Y_1, Y_2, Y_3, Y_4, and Y_5 here.
And so, I know that this matrix will be the transpose of the w we built.
So, I can just write down the transpose.
The transpose will be w_1, w_2, w_3, w_4.
Okay? I will shift it down with a stride
of two and so on.
[NOISE]
And this whole thing will be W Transpose.
So, th - the small issue here is that this, in
practice, is going to be very similar to a convolution,
but it's going to be a tiny little bit different in terms of implementation.
Another question I might ask is,
how can we do the same thing with the same pattern as we have here?
It means the stride is going from left to right,
instead of going from up to down.
I'm going to introduce that with a technique called sub-pixel convolution.
And for those of you who read papers on segmentation and on visualization,
you will run into this term. So, let's see how it works.
I just wanna do the same operation,
but instead of doing it with a stride going from top to bottom,
I want to do it with a stride going from left to right.
O - one, one thing you wanna,
you wanna notice here,
is that, uh, the two lines that I wrote here are cropped.
And the reason is because we're using a padded input.
Here, we will just crop the two top lines.
And same for the two last lines.
They will be cropped. Look at that.
W1 will multiply Y1,
and this one will multiply Y2 and so on.
So, this would generate values at the padded positions,
but I don't want that to happen because I wanna get the padded zeros here.
So, I will just crop that.
In this matrix it's actually going to be smaller than it seems,
and is going to generate my X1 through X8 and then I will
pad the top values and the bottom values.
Okay, just the height.
So, let's look at the sub-pixel convolution. I have my input.
And I will do something quite fun.
I will perform a sub-pixel operation on Y. What does it mean?
I will insert zeros almost everywhere.
I will insert them, and I will get 0,
0, Y1, 0, Y2,
0, Y3, 0, Y4,
0, Y5 and 0, 0.
Even more, one more 0 here, one more 0 here.
So, this vector is just the vector Y with
some zeros inserted around it and also in the middle between the elements of Y.
Now, why is that interesting?
It's interesting because I can now write down my convolution by flipping my weight.
[NOISE]
So, let me explain a little bit what happened here.
What we wanted is,
in order to be able to efficiently compute the deconvolution
the same way as we've learned
We wanted to have the weights
scattered from left to right with a stride moving from left to right.
What we did, is that we used a sub-pixel version of Y by inserting zeros,
and we divided the stride by two.
So, instead of having a stride of two as we had in our convolution,
we have a stride of one in our deconvolution.
So, notice that I shift my weights from one at every step,
when I move from one row to another.
Second thing is, I flipped my weights.
I flipped my weights. So, instead of having W1, W2,
W3, W4, now I have W4, W3, W2, W1.
And what you could see is looking at that,
first, look at this row,
the first row that is not cropped.
The result of the dot product of this row with this vector is going to be Y1 times W3,
plus Y2 times W1.
Yeah? Now, let's look what happened here.
I look at my first row here,
the dot product of this first row with my Y here is going to be - sorry,
sorry, we - these two are cropped as well.
And same here. So, looking at my first non-cropped row
here, the dot product with this vector is going to be W3 times Y1,
plus W2 - sorry, plus W1 times Y2.
So, exactly the same thing as I got there.
So, these two operations are exactly the same operations. They're the same thing.
You get the same results two different way of doing it.
One, is using a weird operation with strides going from top to bottom.
And the second one is exactly a convolution. This is a convolution.
Convolution plus flipped weights plus inserted zeros,
And on top of that,
padding here and there.
So, this was the hardest part.
Okay? Does it give you more intuition on the deconvolution here?
You know now how a convolution can be framed as
a matrix multiplication.
And you know also that under these assumptions,
the way we will compute the deconvolution is by flipping the weights,
dividing the stride by two, and inserting zeros.
If we just do that, we're doing the deconvolution.
For the implementation,
the way you wanna remember it is:
just flip the filter,
insert zeros sub-pixel, and finally divide the stride.
And that's the de-convolution.
So, super complex thing to understand but this is the intuition behind it.
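The whole argument fits in a short numpy check: reconstructing with W transpose gives exactly the same vector as inserting zeros between the y's, padding, and sliding the flipped filter with a stride of one. The filter and y values here are made up.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0, 4.0])             # filter w1..w4
y = np.array([5.0, -1.0, 2.0, 0.5, 3.0])       # conv output, size 5

# The 5x12 convolution matrix from before: each row is the filter, shifted by the stride.
W = np.zeros((5, 12))
for i in range(5):
    W[i, 2 * i: 2 * i + 4] = w

# Route 1: multiply by the transpose (assuming W is roughly orthogonal,
# so W.T stands in for W^-1).
x_rec = W.T @ y                                 # reconstruction, size 12

# Route 2: sub-pixel convolution. Insert zeros between the y's, pad,
# then slide the *flipped* filter with a stride of one.
y_up = np.zeros(2 * len(y) - 1)
y_up[::2] = y                                   # y1 0 y2 0 y3 0 y4 0 y5
y_pad = np.pad(y_up, (3, 3))
w_flip = w[::-1]
x_sub = np.array([w_flip @ y_pad[i: i + 4] for i in range(12)])

assert np.allclose(x_rec, x_sub)                # the two routes give the same vector
```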
Now, let, let's try to have an intuition of how it would work in two-dimension.
Uh, let me write it down.
The sub-pixel convolution, we already have that [inaudible] [NOISE]
Why do we use that?
Yeah.
Because in terms of implementation this is the same as what we've been using here.
It's, it's very similar,
while this one is another implementation.
So, you could do both, it's the same,
it is the same operation.
But in practice this one is easier to understand because it,
it's exactly the same operation as the convolution,
with flipped weights and inserted zeros.
That's why I wanted to show that. Yeah.
So, uh, what, what happens when,
uh, the assumption [OVERLAPPING].
When the assumption doesn't hold?
Yeah.
So, then the reconstruction is not going to be exact,
but what we want is to be able to see a reconstruction.
And if we use this method we will still see a reconstruction.
In practice, if we had really W minus one, it would be exact.
So, uh, let me go over the 2D,
uh, the 2D example.
We are going to go a little over time because we have
two hours technically for - one hour and 50 minutes,
and uh, and let me go over the 2D example.
And then we will answer this question on why we need to make this assumption.
So, here is the interpretation of the 2D deconvolution.
Let me write it down here.
[NOISE]
The intuition behind the 2D deconv is, I get my inputs.
Which is five by five,
and this I call it x. I forward propagate it using a filter of size two-by-two,
in a conv layer,
and a stride of two.
This is my convolution. What I get.
So, if you do five minus two,
plus the padding which is zero,
divided by two,
I forgot the plus one actually here,
plus one, and you floor it.
So - so, five minus two divided by two gives you,
uh, one point five, you floor it, plus one.
Um, so it will give you two by two,
yeah, two by two.
A y of two by two. That's what you get.
What you're going to do here,
is you're going to deconvolve it.
In order to up-sample it,
in order to get back to the five-by-five,
you're going to use a stride of one.
And what we said is that we need to divide this stride by two, right?
So, we need a stride of one,
and the filter will be the same, two-by-two.
And you remember that what we've seen,
is that the filter is the same.
It's just that it's going to be flipped.
So, you will use a filter of two-by-two, but flipped.
We hope to get a five-by-five output,
which is going to be our reconstructed input.
And the way we're going to do it,
is this is the intuition behind it. Yeah.
Sorry, is it two by two? [NOISE].
Five minus two divided by two. Yeah, it's two by two.
Okay. Two by two.
Thanks. [OVERLAPPING]. Two by two.
Five-by-five here.
That's what we hope to get.
The way we will do it, is we will take the filter,
its size is two by two.
We will put it here.
And we will multiply all the weights of this filter by y11.
All the weights will be multiplied by this one value.
So, I will get four values here,
which are going to be w4 times y11,
w3 times y11 and so on.
Now, I will shift this with a stride of one.
And I will put my filter again here.
And I will multiply all the entries by y12 and so on.
And you see that this entry has an
So, it will, it will be updated at every step of the convolution.
It's not like what happened in the forward pass.
So, this is the intuition behind the 2D convolution.
In 3D, you would have,
uh, a volume here.
So, your filter is going to be a volume.
What you're going to do is you're going to put the volume here,
multiplied by y11.
you would put it again on top of it and multiply
by y11 all the weights of the filter and so on.
It's a little complicated,
but this is the intuition behind deconvolution.
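The "place a scaled copy of the filter and sum the overlaps" intuition can be written directly; a small numpy sketch, not tied to the exact sizes on the board.

```python
import numpy as np

def deconv2d(y, w, stride=1):
    """Up-sample y by placing a copy of the filter, scaled by each entry of y.

    y: (h, w) input to the deconvolution; w: (f, f) filter.
    Overlapping placements are summed, which is why one output cell can be
    updated several times.
    """
    h_in, w_in = y.shape
    f = w.shape[0]
    out = np.zeros(((h_in - 1) * stride + f, (w_in - 1) * stride + f))
    for i in range(h_in):
        for j in range(w_in):
            out[i * stride: i * stride + f,
                j * stride: j * stride + f] += y[i, j] * w   # scaled filter, added in place
    return out

y = np.arange(4.0).reshape(2, 2)       # a 2x2 input
w = np.ones((2, 2))                    # a 2x2 filter
print(deconv2d(y, w, stride=1).shape)  # (3, 3): with stride 1 the placements overlap
print(deconv2d(y, w, stride=2).shape)  # (4, 4): no overlap
```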
Okay, let's get back to the lecture.
I'm going to take one question here if you guys need
[NOISE] Don't worry if you don't understand deconvolution truly.
The important part is that you get the intuition here and you understand how we do it.
So, let me make a comment.
[NOISE] Why do we need to make this assumption and do we need to make it?
[NOISE] When we want to visualize,
we need to make this assumption because we don't want
to train new weights for the deconvolution.
What we know is that the activation we selected here on
the feature map is - has gone through the entire pipeline of the ConvNet.
So, to reconstruct it, we need the filters that were actually used.
We need to pass them to the deconvolution and use their transpose.
If we're doing the segmentation,
like we talked about for the live cell segmentation,
we don't need to do this assumption.
We're just saying that this is a procedure that is a deconvolution,
and we will train the weights of the deconvolution.
So, there is no need to make this assumption,
it's just that we have a technique that is dividing the stride by
two and inserting zeros, and then boom,
we get an operation
that is an up-sampling, and we train its weights.
So, there's two use cases.
One where you use the weights and one where you don't.
In this case, we don't want to train new weights,
we wanna use the network's weights. So let's see.
Let's see a - a more visual version of the deconvolution.
So, we do the sub-pixel image.
This is my image, four by four,
I insert zeros and I pad it,
I get a nine by nine image.
I have my filter like that.
And this filter will slide over the input.
I will - it will move with a stride of one,
so I will place it on my input,
and at every step I will perform a convolution operation.
I will get a value here.
The value is blue because as you can see the weights that
affected the output were only the blue weights.
I would use a stride of one, boom.
Now, the weights that affect my output are the green ones and so on.
And I would just continue.
And now one step down.
I see that the weights that are impacting this output value are the purple ones.
So, I would put a purple square here and so on.
So, I just do the convolution like that.
What's interesting is that the values that are blue in my six by six output,
were generated only using the blue values of the filter,
the blue weights in the filter.
The ones that are green were only
used-were only generated using the green values of my filter.
So, actually this up-sampling,
or deconvolution, could have been done with four separate convolutions,
with the blue weights, green weights,
purple weights and yellow weights.
And then, just - just interleave the results so that together they form the output.
Just take the output of each of these convs and mix them to give out a six by six output.
The only thing you need to know is we have an input four by four
and we get an output six by six. That's what we wanted.
We wanted to up-sample the four by four.
We can do that with this deconvolution.
So, let's see what happens now.
We understood what, uh,
what deconv was doing.
So, we're able to deconv.
What we need to do is also to unpool and to unReLU.
Fortunately, it's easier than the deconv.
So, we're not gonna do board work anymore.
So, let's see how unpool works.
If I give you this, uh,
input to the pooling - to the max pooling layer.
The output is obviously going to be this one,
42 is the maximum of these four numbers.
Assuming we're using a two-by-two filter with stride of two,
six is the maximum of the red numbers and seven the - the orange ones.
Now, question. I give you back the outputs and I tell you, give me the input.
Can you give me the input or no?
No.
No, why - why? [NOISE] You only keep the maximum.
So, you - you lost all the other numbers.
I don't know anymore the zero,
one and minus one that were the red numbers here
because they didn't pass through the maximum.
So, max pool is not invertible,
from a mathematical standpoint.
What we can do is remember where the maximums were.
How can we do that? [NOISE].
Spread it out.
Spread it out. That's a good point, we could spread out the six among the four values.
That would be an approximation.
A better way, if we manage to remember where the maximums came from,
is to cache their positions.
We cache them
using a matrix that is very easy to store,
of zeros and ones.
And we pass it to the unpooling.
And now we can put the values back where they belong,
because we know where 6 was,
we know where 12 was,
we know where 42 was and 7 was.
But it's still not the exact input.
Think about maxpool during backpropagation.
It's exactly the same thing.
These numbers 0, 1, - 1,
they had no impact on the loss function at the end,
because they didn't pass through the max.
So, actually with the switches you can have the exact reconstruction of what mattered,
because the other values didn't affect the loss during the forward pass.
That - that make sense?
Okay. So, this is maxpooling,
unpooling, unmaxpooling.
And we can use it with the switches. We can reconstruct.
Why not just cache the whole input?
Yeah, why don't we just cache everything?
We could - could cache the entire thing.
But in terms of memory, for reasons
of efficiency we will just use this switching because it's enough.
But not for unpooling though.
Yeah, yeah, for unpooling you're right, we could cache everything.
But then it's cheating, like you - you kept it, it's like, just give it back.
Okay.
So, what we need to do, in fact,
is to pass the switches and the filters back
to the unpooling deconv in order to reconstruct.
Switches are the cached positions of the maximums,
and filters are the filters that I will transpose under this assumption on the board.
Okay. And so on and so on,
and I get my reconstruction.
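In PyTorch this pairing already exists: max pooling can return the switches, and MaxUnpool2d puts the kept values back where they came from. A tiny sketch with numbers similar to the board example.

```python
import torch
import torch.nn as nn

# Max pooling that remembers the switches (positions of the maxima),
# and unpooling that puts each kept value back at its original position.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.tensor([[[[ 1.0,  0.0,  5.0,  6.0],
                    [ 3.0, 42.0, -1.0,  2.0],
                    [12.0,  9.0,  0.0,  4.0],
                    [ 2.0,  1.0,  7.0,  3.0]]]])

pooled, switches = pool(x)           # 2x2 maxima (42, 6, 12, 7) + where they were
restored = unpool(pooled, switches)  # maxima return to their positions,
print(restored[0, 0])                # every other entry is zero
```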
I just need to explain the un-ReLU now.
I give you this input to a ReLU.
All the negative numbers are going to be set to zero,
and the others are going to be kept.
Now, let's say I'm doing a backward pass through this ReLU.
What do I get if I give you that?
This is the gradient coming from the next layer,
and I'm asking you what are the gradients with respect to the input.
[NOISE] How does the ReLU behave in the backward pass?
[NOISE].
Zeros? [NOISE] Which ones are zero?
Um, the negative.
The negatives are going to be zero.
The negatives in this yellow matrix are going to be zeros during the backward pass.
Are you guys sure? [NOISE] Think always
about what was the influence of the input on
the loss function and you will find out what was the backpropagation.
Look at this number. This number here, - 2.
Did this number have,
the fact that it was - 2,
did it have any influence on the loss function?
No, it could have been - 10,
it could have been - 20.
It's not gonna impact the loss function.
So, what do you think should be the number here?
Zero.
Zero. Even if the number that is coming back,
the gradient is 10.
So, what do you think should be the ReLU backward output?
[NOISE]
Same idea as max-pooling.
What we need to do is to remember the switches.
Remember which of these values had an impact on the loss.
We pass the switches,
all these values here that are kind of a y, you know this is a y.
All these ones had no impact on the loss function.
So, when you backpropagate,
their gradient should be zero,
it doesn't matter what comes from the next layer.
It's not gonna make the loss go down.
So, these are all zeros and the rest they just pass.
Why do they pass with the same value?
Because ReLU for positive numbers was 1.
So, this number 1 here that passed the ReLU during the forward pass,
it was not modified.
Its gradient is going to be 1.
Now, in this visualization method,
we're not going to use ReLU backward.
We're going to use something we call ReLU DeconvNet let's say.
The reason we're not, the intuition behind why we're not
using ReLU backward is because what we're interested
in is to know which pixels of the input positively affected the,
the activation that we're talking of.
So, what we're going to do is that we're just going to apply a ReLU to the signal coming back.
We're not going to do the exact ReLU backward.
Another reason is when we reconstruct,
we wanna have the minimum influence from the forward pass,
because we don't really want our reconstruction to depend on the specific input image.
We would like our reconstruction to
just look at this activation and reconstruct what happened.
So, that's what we're going to use.
Again, this is an approximation,
and it's not going to be an exact inverse.
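The difference between the two is easiest to see on a tiny example; the numbers are made up.

```python
import torch

forward_input = torch.tensor([[ 1.0, -2.0],
                              [-0.5,  3.0]])
signal_coming_back = torch.tensor([[-4.0, 10.0],
                                   [ 2.0, -1.0]])

# Regular ReLU backward: keep the positions where the *forward* input was
# positive (the switches); only those values influenced the loss.
relu_backward = signal_coming_back * (forward_input > 0).float()
# tensor([[-4.,  0.],
#         [ 0., -1.]])

# "DeconvNet" ReLU used for the visualization: just apply a ReLU to the signal
# coming back, so only positive contributions are propagated, independently
# of what happened in the forward pass.
relu_deconvnet = signal_coming_back.clamp(min=0)
# tensor([[ 0., 10.],
#         [ 2.,  0.]])
```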
Okay.
So we can unpool, un-ReLU, deconv, and find out what this activation corresponds to.
It took time to understand it,
but it's super fast to do now.
It's just one pass, not an iterative process.
We could do it with every layer.
So, let's say we do it with the first block of conv, ReLU, maxpool.
I go here. I choose an activation.
I, I, I, I find the maximum activation.
I set all the others to 0.
I unpool, un-ReLU, deconv, and I find out that
this specific activation was looking at edges like that.
So, let's visualize now
what's happening inside the network.
So, all the visualization we're going to see now can be found in
Matthew Zeiler's and Rob Fergus'
paper, Visualizing and Understanding Convolutional Networks.
I'm going to explain what they correspond to, but check,
check out their papers if you want to understand more into detail.
So, what happens here is that on,
on the top left, you have nine pictures.
These are the cropped pictures of the data set that
activated the first filter of the first layer maximum.
So, we have a first filter on the first layer and we run
all the data sets and we recorded what are the main pictures that activate this filter.
These were the main ones. And we did the same thing for
all the filters of the first layer and there are nine times nine of them.
There are a lot of them, I think.
In the bottom here you have the filters,
which are the weights that were plotted.
Just take the filter, plot the weights.
This is th - this is important only for the first layer.
When you go deeper into your network,
the filter itself cannot be interpreted.
It's super hard to understand it.
Here, because the weights are directly multiplying the pixels,
the first layer weights can be interpreted directly.
For example, let's look at the third one,
the third filter here on the first row.
The third filter has weights that are kind of
like an oriented edge detector.
And in fact if you look at the images that maximally activated
the feature map corresponding to this filter,
they're all like cropped images that contain edges with that orientation.
That's what happens. Now, the,
the deeper we go, the more fun we have.
So let's go. Results on layer two.
What's happened here is they took 50,000 images,
they forward propagated them through the network.
They recorded which image is the maximum,
the one that's maximized the activation of
the feature map corresponding to the first filter of layer two,
second filter and so on for all the filters.
Let's look at one of them.
We can see that's, okay,
we have a circle on this one.
It means that this,
the filter corresponding
[NOISE] to this feature map has been activated through probably a wheel or something like that.
So, that the image of the wheel was the one that maximized
the activation of this one and then we use the deconv method to reconstruct it.
Any questions on that? Yeah.
What if the
Good question, yeah. What if the
You would use the same,
the same type of method and you would try to
Okay, let's go a little deeper.
forward propagate all the images of the data set,
find the nine images that are
the maximum activate - that lead to the maximum activation of the first filter.
These are plotted on top here.
What you can see is like for this filter,
that is the sixth row first filter,
the features are more invariant.
So, this filter actually was activated to many different types of circles,
it's still activated although the circles were different sized.
Can go even deeper up third layer.
What's interesting is that the deeper you go,
the more complexity you see.
So, at the beginning we were seeing only edges,
now we see much more complex figures.
You can see a face here,
in this - in this entry.
It means that this filter activated for when
it sees this - when it has seen a data point that had this face,
then we reconstructed it with the deconv method and
cropped it on the face.
Uh, the face is kind of red,
it means that the more red it was,
the more activation it led to.
And same top nine for layer three.
So, these are the nine images that actually led to the face.
These are the nine images that maximize the, the,
the activation of the feature map corresponding to that filter and so on.
So, here is a very funny video.
[inaudible] [NOISE].
Can you stand up? [NOISE].
And realization layers,
we can switch back and forth between showing
the actual activations and showing images
So, he's - he's giving his own image to the network right now.
By the time we get to the fifth convolutional layer,
the features being computed are much more abstract.
So, these are the activations of that layer.
For example, this unit seems to respond to faces.
We can further investigate this unit in a couple of ways.
First, we can synthesize images that maximize its activation,
using new regularization techniques that are described in [OVERLAPPING].
Our paper, the one we talked about.
These are shown here.
[OVERLAPPING] We can also plot the images from the training set that activate this unit the most,
as well as pixels from those images most responsible for
the high activations,
computed via the deconvolution approach.
This feature responds to multiple faces in different locations.
And by looking at the deconv,
we can see that it would respond more strongly if we had even darker eyes and
We can also confirm that it cares about the head and shoulders,
but ignores the arms and
We can even see that it fires to some extent for cat faces.
Using back-prop or deconv,
we can see that this unit depends most strongly
on a couple of units in the previous layer conv4,
and about a dozen or so in conv3.
So they're trying to track back track where - which
So, let's look at another unit.
So, what is this unit doing?
From the top nine images,
we may conclude that it fires for different types of clothing,
but examining the
detecting not clothing
In the live plot, we can see that it's activated by my shirt and
smoothing out half of my shirt causes that half of the activations to decrease.
Finally, here's another interesting neuron.
This one has learned to look for printed text in a variety of sizes, colors, and fonts.
This is pretty cool because we never asked
the network to look for text.
The only labels we provided were at the very last layer.
So, the only reason the network learned features like texts and faces in
the middle was to support final decisions at that last layer.
For example, the text detector may provide good evidence of
a book seen on edge, and detecting many books
next to each other might be a good way of detecting a bookcase,
which was one of the categories we trained the net to recognize.
In this video, we've shown some of the features of
the DeepViz toolbox and what
we've learned by using it. You can download it and try it yourselves.
Yeah, so they had a toolbox,
which is exactly what you visualized here,
and you could test it yourself. It
takes time to get it to run,
but it's worth it if you want to visualize all the activations.
Okay. So, uh, let's go quickly.
We'll spend about three minutes on the optional Deep Dream one because it's fun.
And yeah, feel free to jump in and ask questions.
So, the Deep Dream one is, uh,
implemented by Google in a blog post.
The idea here is to generate images using this knowledge of
visualization, and how they do that is quite interesting.
They would take an input,
forward propagate it through the network, and at
a specific layer that we call the dream layer,
pick the activations and set the gradient to be equal to these activations.
They set the gradient at this layer, and then they back propagate.
So, earlier, what we did is that we defined a new objective function
that was equal to an activation, and we tried to maximize this objective function.
Here, they're doing it even more strongly.
They take the activations and they set the gradient equal to them.
And so the stronger the activation,
the stronger it's going to become later on, and so on.
So, they are trying to see what the network is
activating for and increase this activation even more.
So, forward propagate the image,
set the gradient of the dreaming layer to be equal to its activation,
then back propagate all the way back to the input and update the pixels of the image.
Do that several times, and every time the activations will change.
So, you have to set the new activations again to
be the gradients of the dream layer and back propagate,
and ultimately, you will see things happening.
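As a rough sketch of that loop, assuming a PyTorch model; the dream_layer handle, the step size, and the number of steps are illustrative choices, not from the lecture:

import torch

def deep_dream(model, dream_layer, image, steps=20, lr=0.01):
    """Amplify whatever `dream_layer` already responds to in `image`:
    the gradient flowing back from that layer is set to its own activation."""
    stored = {}
    handle = dream_layer.register_forward_hook(lambda m, inp, out: stored.update(act=out))
    img = image.clone().requires_grad_(True)
    model.eval()
    for _ in range(steps):
        model(img)                                   # forward pass through the network
        act = stored["act"]
        # Objective whose gradient at the dream layer equals the activation itself:
        loss = 0.5 * (act ** 2).sum()
        model.zero_grad()
        if img.grad is not None:
            img.grad.zero_()
        loss.backward()                              # back-propagate all the way to the pixels
        with torch.no_grad():
            # Gradient ascent on the pixels, normalized for stability.
            img += lr * img.grad / (img.grad.abs().mean() + 1e-8)
    handle.remove()
    return img.detach()

Squaring and summing the activation is just a convenient objective whose gradient at the dream layer is the activation itself, which is the "set the gradient equal to the activation" trick described above.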
So, it's hard to see here on the screen,
but you would have a pig appearing here.
You'd have like a tree somewhere there, and some animals,
and a lot of animals are going to start appearing in this cloud.
It's interesting because it means,
let's say, you see this cloud here?
If the network thought that this cloud looked a little bit like a dog,
then one of the feature maps, the one
generated by the filter that detects dogs, would activate a little bit.
Because we set the gradient to be equal to the activation,
it's going to increase the appearance of the dog in the image, and so on.
And then you will see a dog appearing after a few iterations.
So, it's quite fun.
So, you see a pig-snail,
it's kind of a pig with a snail's carapace.
Camel-bird, dog-fish.
I'd advise you to like look at this on
the slides rather than on the screen, but it's quite fun.
And same, if you give that type of image,
you would see that, because the network thought there was a tower there a little bit,
you will increase the network's confidence in the fact that there is
a tower by changing the image, and the tower would come out.
And so on, it's quite cool.
Uh, yeah, and if you dream at lower layers,
obviously you will see edges or patterns appearing,
because the lower layers tend to detect an edge, and you will
increase the confidence in that edge, so it will create an edge on the image.
This is a fun video I have, Deep Dream on a video.
[MUSIC].
So, everything that the
[MUSIC] And what's funny
is that there are so many animals in the video.
And the reason is [MUSIC].
Gets too
[LAUGHTER] So, one insight that is fun about it is,
uh, if the network - and this is not only for Deep Dream,
it's mostly for gradient ascent.
Let's say we have an output score for a dumbbell,
and we define our objective function to be this dumbbell score,
and we try to find the image that
maximizes this score.
What's interesting is that the network thinks that
the dumbbell always comes with an arm attached to it.
Not only the dumbbell. And you can see it here, you see the hands.
And the reason is it has never seen a dumbbell alone.
So, probably in ImageNet there is no picture of a dumbbell
alone in a corner and labeled as dumbbell.
But instead, it's usually a human trying to push hard.
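A minimal sketch of that kind of class visualization by gradient ascent, assuming a PyTorch classifier whose forward pass returns pre-softmax scores; the class index, the step size, and the simple L2 regularizer are illustrative choices, not the exact recipe from the lecture:

import torch

def visualize_class(model, class_idx, steps=200, lr=0.5, weight_decay=1e-4,
                    shape=(1, 3, 224, 224)):
    """Gradient ascent on the input: find an image that maximizes the
    pre-softmax score of `class_idx`, e.g. the dumbbell class."""
    img = torch.zeros(shape, requires_grad=True)      # start from a blank image
    model.eval()
    for _ in range(steps):
        scores = model(img)                           # pre-softmax scores
        objective = scores[0, class_idx] - weight_decay * (img ** 2).sum()
        model.zero_grad()
        if img.grad is not None:
            img.grad.zero_()
        objective.backward()
        with torch.no_grad():
            img += lr * img.grad                      # ascend the class score
    return img.detach()

The image this produces is what reveals the bias: for a class like dumbbell, the optimized image tends to include the arm as well, because that is what the training data showed.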
Okay. So, just to summarize,
we are now able to answer all the following questions.
What part of the input is responsible for the output? Saliency maps and
class activation maps seem to be the best way to go.
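As a reminder of the first one, a saliency map is a single backward pass of the pre-softmax class score to the input pixels. Here is a minimal sketch assuming a PyTorch classifier; the class_idx argument and the way the channels are collapsed are illustrative choices:

import torch

def saliency_map(model, image, class_idx):
    """Gradient of the pre-softmax class score w.r.t. the input pixels;
    large absolute values mark the pixels the score depends on most."""
    img = image.clone().requires_grad_(True)   # image: (3, H, W)
    model.eval()
    scores = model(img.unsqueeze(0))           # (1, num_classes), pre-softmax
    scores[0, class_idx].backward()
    # Collapse the channel dimension: keep the strongest gradient per pixel.
    return img.grad.abs().max(dim=0).values    # (H, W) map, same size as the input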
What is the role of a given neuron, feature map, or layer?
Deconvolve and reconstruct, search in the dataset for
the top images, and do gradient ascent.
Can we check what the network focuses on?
How does the network see our world?
I would say
maybe Deep Dream is the cool thing for that.
And then, what are the implications and use cases of these methods?
Uh, you can use saliency maps to segment,
though it's not very useful given the new methods we have.
But the deconvolution that we've seen together is
widely used for segmentation and reconstruction,
also for generating [inaudible].
Uh, these visualizations can also be used
to detect if some of the filters are dead.
So, let's say you have a network and you use these visualizations:
you see that, whatever input image you give,
some feature maps are always dark.
It means that the filter that generates
this feature map is never activated.
So, it's not even being trained.
That's the type of insight you can get.
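A minimal sketch of that dead-filter check, assuming a PyTorch model and a small loader of validation images; the layer handle, the threshold, and the loader name are illustrative, not from the lecture:

import torch

def find_dead_filters(model, layer, dataloader, threshold=1e-6, device="cpu"):
    """Flag filters of `layer` whose feature maps stay (almost) all-zero
    no matter which input image is given."""
    stored = {}
    handle = layer.register_forward_hook(lambda m, inp, out: stored.update(out=out))
    max_response = None
    model.eval()
    with torch.no_grad():
        for batch, _ in dataloader:
            model(batch.to(device))
            # Strongest absolute response of each filter over this batch.
            fmap = stored["out"].abs().amax(dim=(0, 2, 3))      # shape (C,)
            max_response = fmap if max_response is None else torch.maximum(max_response, fmap)
    handle.remove()
    return (max_response < threshold).nonzero().flatten()       # indices of "dark" filters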
Okay, thanks guys.
Sorry we went over time.
[NOISE]
知识点
重点词汇
detection [dɪˈtekʃn] n. 侦查,探测;发觉,发现;察觉 {cet4 cet6 gre :6133}
standpoint [ˈstændpɔɪnt] n. 立场;观点 {cet4 cet6 ky toefl :6206}
blurred [blɜ:d] v. 玷污;使…模糊不清;使感觉迟钝(blur的过去式和过去分词) adj. 模糊不清的;被弄污的 { :6364}
blurring [blɜ:rɪŋ] n. 模糊 adj. 模糊的 vi. 模糊(blur的现在分词) { :6364}
download [ˌdaʊnˈləʊd] vt. [计] 下载 {gk :6382}
synthetic [sɪnˈθetɪk] n. 合成物 adj. 综合的;合成的,人造的 {cet4 cet6 ky toefl ielts :6608}
messy [ˈmesi] adj. 凌乱的,散乱的;肮脏的,污秽的;麻烦的 {gk toefl :6651}
messier [ˈmesi:ə] adj. 肮脏的( messy的比较级 ); 混乱的; 难以处理的; 令人厌烦的 { :6651}
overlap [ˌəʊvəˈlæp] n. 重叠;重复 vi. 部分重叠;部分的同时发生 vt. 与…重叠;与…同时发生 {cet6 ky toefl ielts gre :6707}
inaudible [ɪnˈɔ:dəbl] adj. 听不见的;不可闻的 { :6808}
algorithm [ˈælgərɪðəm] n. [计][数] 算法,运算法则 { :6819}
algorithms [ˈælɡəriðəmz] n. [计][数] 算法;算法式(algorithm的复数) { :6819}
simplify [ˈsɪmplɪfaɪ] vt. 简化;使单纯;使简易 {gk cet4 cet6 ky ielts :7074}
torso [ˈtɔ:səʊ] n. 躯干;裸体躯干雕像;未完成的作品;残缺不全的东西 {gre :7317}
reconstruct [ˌri:kənˈstrʌkt] vt. 重建;改造;修复;重现 {toefl :7327}
reconstructed [ri:kən'strʌktɪd] adj. 重建的;改造的 v. 重建;改造(reconstruct的过去式) { :7327}
gradient [ˈgreɪdiənt] n. [数][物] 梯度;坡度;倾斜度 adj. 倾斜的;步行的 {cet6 toefl :7370}
gradients [ˈgreɪdi:ənts] n. 渐变,[数][物] 梯度(gradient复数形式) { :7370}
flattening ['flætnɪŋ] n. 整平;扁率;压扁作用 v. 压扁(flatten的ing形式) { :7436}
flattened ['flætnd] adj. 没精打采的;垂头丧气的 v. 平整;打倒(flatten的过去分词) { :7436}
flatten [ˈflætn] vt. 击败,摧毁;使……平坦 vi. 变平;变单调 n. (Flatten)人名;(德)弗拉滕 {cet6 gre :7436}
fonts ['fɒnts] n. 字体(font的复数) { :7448}
validate [ˈvælɪdeɪt] vt. 证实,验证;确认;使生效 {toefl gre :7516}
blog [blɒg] n. 博客;部落格;网络日志 { :7748}
wrinkles ['rɪŋklz] n. 皱纹;皱褶(wrinkle的复数形式) v. 起皱(wrinkle的第三人称单数形式) { :7819}
Et ['i:ti:] conj. (拉丁语)和(等于and) { :7820}
compute [kəmˈpju:t] n. 计算;估计;推断 vt. 计算;估算;用计算机计算 vi. 计算;估算;推断 {cet4 cet6 ky toefl ielts :7824}
computed [kəmˈpju:tid] v. 计算(compute的过去式) adj. 计算的((compute的过去分词) { :7824}
intuition [ˌɪntjuˈɪʃn] n. 直觉;直觉力;直觉的知识 {cet6 ky toefl ielts gre :7905}
spirals [ˈspaiərəlz] n. 螺旋(线)( spiral的名词复数 ); 螺旋式的上升(或下降) v. 盘旋上升(或下降)( spiral的第三人称单数 ); (物价等)不断急剧地上升(或下降) { :8028}
whirls [hwə:lz] v. (使)飞快移动,使旋转( whirl的第三人称单数 ) { :8035}
hound [haʊnd] n. 猎犬;卑劣的人 vt. 追猎;烦扰;激励 {cet6 ky ielts :8069}
ascents [əˈsents] n. 上升( ascent的名词复数 ); (身份、地位等的)提高; 上坡路; 攀登 { :8121}
ascent [əˈsent] n. 上升;上坡路;登高 {toefl ielts :8121}
hack [hæk] n. 砍,劈;出租马车 vt. 砍;出租 vi. 砍 n. (Hack)人名;(英、西、芬、阿拉伯、毛里求)哈克;(法)阿克 {gre :8227}
encode [ɪnˈkəʊd] vt. (将文字材料)译成密码;编码,编制成计算机语言 { :8299}
encoding [ɪn'kəʊdɪŋ] n. [计] 编码 v. [计] 编码(encode的ing形式) { :8299}
notation [nəʊˈteɪʃn] n. 符号;乐谱;注释;记号法 {cet6 toefl ielts :8312}
validation [ˌvælɪ'deɪʃn] n. 确认;批准;生效 { :8314}
zoom [zu:m] vi. 嗡嗡作响; 急速上升 n. 嗡嗡声; 隆隆声; (车辆等)疾驰的声音; 变焦 vt. 使急速上升; 使猛增 {gk ky :8608}
visualize [ˈvɪʒuəlaɪz] vt. 形象,形象化;想像,设想 vi. 显现 {cet6 ielts :8673}
visualized [vɪʒʊəˌlaɪzd] adj. 直观的;直视的 v. 使形象化;想像(visualize的过去分词) { :8673}
visualizing ['vɪzjʊəlaɪzɪŋ] n. 肉眼观察 { :8673}
downside [ˈdaʊnsaɪd] n. 负面,缺点;下降趋势;底侧 adj. 底侧的 { :8709}
snail [sneɪl] n. 蜗牛;迟钝的人 vt. 缓慢移动 vi. 缓慢移动 {cet6 :8765}
cache [kæʃ] n. 电脑高速缓冲存储器;贮存物;隐藏处 vt. 隐藏;窖藏 vi. 躲藏 {gre :8893}
clarification [ˌklærəfɪ'keɪʃn] n. 澄清,说明;净化 {toefl :8909}
mcdonald [mәk'dɔnәld] 麦当劳(McDonald's) 麦克唐纳(人名) { :8947}
insertion [ɪnˈsɜ:ʃn] n. 插入;嵌入;插入物 { :9116}
derivative [dɪˈrɪvətɪv] n. [化学] 衍生物,派生物;导数 adj. 派生的;引出的 {toefl gre :9140}
neural [ˈnjʊərəl] adj. 神经的;神经系统的;背的;神经中枢的 n. (Neural)人名;(捷)诺伊拉尔 { :9310}
activations [,æktɪ'veɪʃən] n. [电子][物] 激活;活化作用 { :9314}
activation [ˌæktɪ'veɪʃn] n. [电子][物] 激活;活化作用 { :9314}
Fergus ['fә:^әs] 费格斯(姓氏, 男子名) { :9390}
neuron [ˈnjʊərɒn] n. [解剖] 神经元,神经单位 {cet6 toefl :9397}
neurons [ ] n. 神经元,神经细胞(neuron的复数形式) { :9397}
microscopic [ˌmaɪkrəˈskɒpɪk] adj. 微观的;用显微镜可见的 {cet6 toefl gre :9581}
vertically ['vɜ:tɪklɪ] adv. 垂直地 { :9720}
approximate [əˈprɒksɪmət] adj. [数] 近似的;大概的 vt. 近似;使…接近;粗略估计 vi. 接近于;近似于 {cet4 cet6 ky toefl ielts gre :9895}
scientifically [ˌsaɪən'tɪfɪklɪ] adv. 系统地;合乎科学地;学问上 { :9981}
rectangle [ˈrektæŋgl] n. 矩形;长方形 {gk cet6 ky toefl ielts gre :10058}
rosier [ˈrəʊzi:ə] adj. 玫瑰色的( rosy的比较级 ); 愉快的; 乐观的; 一切都称心如意 { :10106}
metric [ˈmetrɪk] adj. 公制的;米制的;公尺的 n. 度量标准 {cet4 cet6 ky ielts :10163}
propagate [ˈprɒpəgeɪt] vt. 传播;传送;繁殖;宣传 vi. 繁殖;增殖 {cet6 toefl ielts gre :10193}
propagated [ˈprɔpəɡeitid] 传播 { :10193}
propagating [ˈprɔpəɡeitɪŋ] v. 传播(propagate的ing形式);繁殖 adj. 传播的;繁殖的 { :10193}
approximation [əˌprɒksɪˈmeɪʃn] n. [数] 近似法;接近;[数] 近似值 { :10242}
diagonal [daɪˈægənl] n. 对角线;斜线 adj. 斜的;对角线的;斜纹的 {toefl gre :10261}
diagonals [daɪˈægənəlz] n. <数>对角线( diagonal的名词复数 ); 斜线 { :10261}
inception [ɪnˈsepʃn] n. 起初;获得学位 n. 《盗梦空间》(电影名) {gre :10325}
pixels ['pɪksəl] n. [电子] 像素;像素点(pixel的复数) { :10356}
pixel [ˈpɪksl] n. (显示器或电视机图象的)像素(等于picture element) { :10356}
generalizing [ˈdʒenərəlaizɪŋ] 归纳 { :10707}
contextual [kənˈtekstʃuəl] adj. 上下文的;前后关系的 { :10846}
synthesized ['sɪnθɪsaɪzd] adj. 合成的;综合的 v. 合成(synthesize的过去分词);综合 { :10905}
blah [blɑ:] n. 废话;空话;瞎说 n. (Blah)人名;(捷)布拉赫 int. 废话 { :10986}
wha [ ] [医][=warmed,humidified air]温暖、潮湿的空气 { :11046}
artificially [ˌɑ:tɪ'fɪʃəlɪ] adv. 人工地;人为地;不自然地 { :11137}
infinity [ɪnˈfɪnəti] n. 无穷;无限大;无限距 {cet6 gre :11224}
delve [delv] n. 穴;洞 vi. 钻研;探究;挖 vt. 钻研;探究;挖 n. (Delve)人名;(英)德尔夫 {gre :11237}
seminal [ˈsemɪnl] adj. 种子的;精液的;生殖的 adj. 有创造力的,对未来有影响的;重大的 {gre :11387}
overlay [ˌəʊvəˈleɪ] n. 覆盖图;覆盖物 vt. 在表面上铺一薄层,镀 { :11456}
optimized ['ɒptɪmaɪzd] adj. 最佳化的;尽量充分利用 { :11612}
horizontally [ˌhɒrɪ'zɒntəlɪ] adv. 水平地;地平地 { :11924}
ex [eks] n. 前妻或前夫 prep. 不包括,除外 { :12200}
dumbbell ['dʌmbel] n. 哑铃;蠢人 { :12245}
bookcase [ˈbʊkkeɪs] n. [家具] 书柜,书架 {gk ielts :12527}
mo [məʊ] abbr. 卫生干事,卫生管员(Medical Officer);邮购(Mail Order);方式(Modus Operandi);邮政汇票(Money Order) { :12537}
oftentimes [ˈɒfntaɪmz] adv. 时常地 { :12676}
propagation [ˌprɒpə'ɡeɪʃn] n. 传播;繁殖;增殖 {cet6 gre :12741}
multiplications [ ] (multiplication 的复数) n. 乘法, 增加, 乘法运算 [医] 增殖; 倍增 { :12748}
kyle [kaɪl] n. (苏)狭海峡,海峡 n. (Kyle)人名;(英)凯尔;(瑞典)许勒;(西)基莱 { :13115}
Afghan [ˈæfɡæn] n. 阿富汗语;阿富汗人 adj. 阿富汗人的;阿富汗的 { :13137}
healthcare ['helθkeə] n. 医疗保健;健康护理,健康服务;卫生保健 {ielts :13229}
segmentation [ˌsegmenˈteɪʃn] n. 分割;割断;细胞分裂 { :13396}
visualization [ˌvɪʒʊəlaɪ'zeɪʃn] n. 形象化;清楚地呈现在心 { :13979}
visualizations [ ] (visualization 的复数) n. 可见性, 形象化 [医] 使显形, 造影[术], 想象 { :13979}
husky [ˈhʌski] adj. 声音沙哑的;有壳的;强壮的 n. 强壮结实之人;爱斯基摩人 {gre :14361}
regenerate [rɪˈdʒenəreɪt] vt. 使再生;革新 adj. 再生的;革新的 vi. 再生;革新 {cet6 ky toefl :14883}
transpose [trænˈspəʊz] n. 转置阵 vt. 调换;移项;颠倒顺序 vi. 进行变换 {gre :14972}
transposed [ ] adj. 移调的;变调的 v. 调换;颠倒顺序;移项(transpore的过去分词) { :14972}
dimensional [dɪ'menʃənəl] adj. 空间的;尺寸的 {toefl :15066}
adversarial [ˌædvəˈseəriəl] adj. 对抗的;对手的,敌手的 { :15137}
retrain [ˌri:ˈtreɪn] vt. 重新教育;再教育 vi. 再训练;再教育 n. (Retrain)人名;(法)雷特兰 { :15253}
retrained [ri:ˈtreind] v. 重新教育,再教育( retrain的过去式和过去分词 ) { :15253}
unbiased [ʌnˈbaɪəst] adj. 公正的;无偏见的 {toefl :15836}
inverse [ˌɪnˈvɜ:s] n. 相反;倒转 adj. 相反的;倒转的 vt. 使倒转;使颠倒 {cet4 ky toefl gre :15867}
inverts [inˈvə:ts] v. 使倒置,使反转( invert的第三人称单数 ) { :15967}
invert [ɪnˈvɜ:t] n. 颠倒的事物;倒置物;倒悬者 adj. 转化的 vt. 使…转化;使…颠倒;使…反转;使…前后倒置 {cet6 ky toefl ielts gre :15967}
normalization [ˌnɔ:məlaɪ'zeɪʃn] n. 正常化;标准化;正规化;常态化 {cet6 ky :16091}
toolbox [ˈtu:lbɒks] n. 工具箱 { :17283}
SE [ ] abbr. 东南方(southeast) { :17431}
iterations [.ɪtə'reɪʃ(ə)n] n. 迭代次数;反复(iteration的复数) { :17595}
notepads [ ] 注释板(notepad的复数) { :17692}
equalized [ˈi:kwəlaizd] v. (使某事物)相等( equalize的过去式和过去分词 ) { :17737}
exponential [ˌekspəˈnenʃl] n. 指数 adj. 指数的 {toefl :17748}
summations [səˈmeɪʃənz] n. 总和( summation的名词复数 ); 加在一起; 总结; 概括 { :17935}
summation [sʌˈmeɪʃn] n. 和;[生理] 总和;合计 {gre :17935}
datasets [ ] (dataset 的复数) [电] 资料组 { :18096}
dataset ['deɪtəset] n. 资料组 { :18096}
ostrich [ˈɒstrɪtʃ] n. 鸵鸟;鸵鸟般的人 {gre :18490}
pelican [ˈpelɪkən] n. [鸟] 鹈鹕 { :18790}
iguana [ɪˈgwɑ:nə] n. 鬣蜥蜴 { :18852}
难点词汇
granular [ˈgrænjələ(r)] adj. 颗粒的;粒状的 {ielts :20261}
subsample ['sʌbsɑ:mpl] n. (从样品中再抽取的)子样品;二次抽样样品 vt. 对…作二次抽样 { :20642}
flamingo [fləˈmɪŋgəʊ] n. [鸟] 火烈鸟 { :21112}
flamingos [fləˈmɪŋgəʊz] n. 红鹳,火烈鸟(羽毛粉红、长颈的大涉禽)( flamingo的名词复数 ) { :21112}
generative [ˈdʒenərətɪv] adj. 生殖的;生产的;有生殖力的;有生产力的 { :21588}
localization [ˌləʊkəlaɪ'zeɪʃn] n. [计] 定位;局限;地方化 { :21883}
NY [ ] abbr. 纽约(美国一座城市,New York) { :21993}
carapace [ˈkærəpeɪs] n. 壳;甲壳 {toefl gre :23667}
occlusion [ə'klu:ʒn] n. 闭塞;吸收;锢囚锋 { :24330}
convoluting [ˈkɔnvəlju:tɪŋ] v. 回旋,卷绕,盘旋( convolute的现在分词 ) { :24355}
orthogonal [ɔ:'θɒgənl] adj. [数] 正交的;直角的 n. 正交直线 { :24671}
dalmatian [dæl'meiʃiәn] n. 达尔马西亚狗;达尔马西亚人 adj. 达尔马西亚的 { :25118}
Dalmatians [dælˈmeiʃiənz] n. 斑点狗( Dalmatian的名词复数 ) { :25118}
iterative ['ɪtərətɪv] adj. [数] 迭代的;重复的,反复的 n. 反复体 { :25217}
invariant [ɪnˈveəriənt] n. [数] 不变量;[计] 不变式 adj. 不变的 { :26080}
xavier ['zʌvɪə] n. 泽维尔(男子名) { :26299}
occluded [əˈklu:did] v. 闭塞的;堵塞;咬合的(occlude 的过去分词) { :27220}
SU [ ] abbr. 后勤部队(Service Unit) n. (Su)人名;(土、柬)苏;(中)苏(普通话·威妥玛) { :27413}
regularize [ˈregjələraɪz] vt. 调整;使有秩序;使合法化 { :29422}
Gaussian ['gaʊsɪən] adj. 高斯的 { :29650}
interpretable [ɪn'tɜ:prɪtəbl] adj. 可说明的;可判断的;可翻译的 { :30754}
convolution [ˌkɒnvəˈlu:ʃn] n. [数] 卷积;回旋;盘旋;卷绕 { :30767}
convolutions [kɒnvə'lu:ʃnz] n. 回旋,盘旋,卷绕( convolution的名词复数 ) { :30767}
trippy ['trɪpɪ] adj. 由致幻药引起幻觉的 { :31207}
saliency [ˌseɪ'ljənsɪ] n. 显著;卓越;特点;凸起 { :33942}
discriminative [dɪs'krɪmɪnətɪv] adj. 区别的,歧视的;有识别力的 { :36291}
subspace ['sʌbspeɪs] n. 子空间 { :36324}
regularization [ˌregjʊlərɪ'zeɪʃən] n. 规则化;调整;合法化 { :37553}
classifier [ˈklæsɪfaɪə(r)] n. [测][遥感] 分类器; { :37807}
initialization [ɪˌnɪʃəlaɪ'zeɪʃn] n. [计] 初始化;赋初值 { :40016}
subpart [sʌb'pɑ:t] n. 子部件 { :41301}
zhou [dʒəu] n. 周(中国姓氏);周朝(中国古代王朝) { :49559}
生僻词
backprop [bæk prɒp] un. 后撑
backpropagate [ ] [网络] 反向传播
backpropagation [ ] n. 反向传播算法 [网络] 反向传播了;反向传播法;传播网络
cla [ ] abbr. communication link analyzer 通讯连接分析器
conv ['kənv] [医][=convalescence]恢复(期),康复(期)
convolutional [kɒnvə'lu:ʃənəl] adj. 卷积的;回旋的;脑回的
convolve [kən'vɒlv] vt. 使卷;使盘旋;使缠绕 vi. 盘旋;卷;缠绕
cro [ ] n. (Cro)人名;(法、意)克罗 阴极射线示波器(Cathode-Ray Oscillograph)
datas [ ] n. 数据输入
deconvolution [di:kɒnvə'lu:ʃən] n. [地质] 反褶积,[计] 去卷积
deconvolutional [ ] [网络] 去卷积
deconvolve [,di:kәn'vɔlv] vt.[计]去…卷积,展开…卷积
deconvolving [,di:kən'vɔlv] vt. 展开…卷积;去…卷积
elementwise [ ] [网络] 元素对元素
gener [ ] [网络] 产生;制造;出生
gla [ ] abbr. γ—亚麻酸(Gamma-Linolenic Acid);大伦敦政府(Greater London Authority);总可出租面积(Gross Leasable Area)
google [ ] 谷歌;谷歌搜索引擎
hartebeest ['hɑ:tɪbi:st] n. 大羚羊(产于非洲)
hyperparameter [ ] [网络] 超参数;分别有一个带有超参数
invertible [ɪn'vɜ:tɪbl] adj. 可逆的;倒转的
kth [ ] abbr. Kungliga Tekniska Hegskolan (Royal Institute of Technology, Stockholm) 斯德哥尔摩皇家工学院
multiplicate ['mʌltɪplɪkeɪt] adj. 多种的;多重的
nx [ ] abbr. next 接下去的; 其次的; 下一个的; nonexpendable 非消耗品
oftentime [ ] [网络] 的时间
Pomeranian [.pɒmә'reiniәn] a. 波美拉尼亚的 n. 波美拉尼亚人, 波美拉尼亚种小狗
relu [ ] [网络] 关节轴承
softmax [ ] [网络] 柔性最大传递函数;前回收的日志文件的百分比;西风狂诗曲系列篇章
thresholding [ ] [网络] 二值化;阈值处理;阈值化
upsample [ ] [网络] 内插滤波进行升采样;升频;对输入信号过采样
upsampled [ ] [网络] 升频
upsampling [ ] [网络] 提升采样;增采样;提昇采样
wx [ ] abbr. weather 天气; weather report 气象报告; watts second 瓦特秒; waxy 蜡(状)的
zeroes [ˈziərəuz] n. (数字)零( zero的名词复数 ); 零点; 零度; 没有
词组
a dot [ ] [网络] 阿顿;阿突
a fox [ ] [网络] 狐狸;一只狐狸;狐理
a hack [ ] [网络] 网络攻击
a max [ ] [网络] 最大值;最大净光合速率;最大聚集率
a toolbox [ ] 工具箱
activation function [ ] 激活函数
activation level [ ] 激动水平
activation mapping [ ] 《英汉医学词典》activation mapping 激动标测法
Afghan hound [ˌæfgæn 'haʊnd] n. 阿富汗猎狗 [网络] 阿富汗猎犬;阿富汗狩猎犬;阿富汗犬
back propagation [ˈbækˌprɔpəˈgeɪʃən] [网络] 反向传播;误差反向传播;反向传播算法
backward path [ ] un. 回程通路;反向通路 [网络] 反向路径
be messy [ ] [网络] 那会很麻烦
black dot [ ] un. 黑斑 [网络] 黑点型;黑点款;圆点图案
blah blah [ ] [网络] 等等;生活废话;磨嘴皮子
blog post [ ] [网络] 博客文章;博客帖子;部落格文章
convolution operation [ ] un. 褶积运算 [网络] 卷积运算
delve into [ ] [网络] 钻研;深入研究;探究
dot product [dɔt ˈprɔdʌkt] un. 点积;标量积 [网络] 点乘;数量积;内积
edge detection [ ] un. 边缘检测;边检测 [网络] 边缘侦测;边界检测;边沿检测
edge detector [ ] un. 边缘检测器 [网络] 边缘觉察器;边缘检测算子;信号缘侦测器
equalize to [ ] (或with)使相等;使相同;使平等
et al [ ] abbr. 以及其他人,等人
et al. [ˌet ˈæl] adv. 以及其他人;表示还有别的名字省略不提 abbr. 等等(尤置于名称后,源自拉丁文 et alii/alia) [网络] 等人;某某等人;出处
et. al [ ] adv. 以及其他人;用在一个名字后面 [网络] 等;等人;等等
flip all [ ] [WIN]全部翻转
forward propagation [ ] 正向传播
Gaussian blur [ ] [网络] 高斯模糊;高斯模糊滤镜;高度模糊
gradient descent [ ] n. 梯度下降法 [网络] 梯度递减;梯度下降算法;梯度递减的学习法
hack that [ ] [网络] 这样砍
identity matrix [ ] un. 〔数〕幺矩阵;纯量矩阵;恒等矩阵;单位矩阵 [网络] 单位化矩阵;单位阵;产生单位矩阵
intermediate layer [ ] un. 中间层;过渡层 [网络] 中层;中间界面层;中间过渡层
invertible matrix [ ] n. 非奇异方阵 [网络] 可逆矩阵;可泄矩阵;反矩阵
iterative process [ ] un. 迭代过程;迭绕法 [网络] 迭代程序;迭代估计控制;反复式
kit fox [kit fɔks] 小狐,小狐毛皮; 敏狐
machine translation [məˈʃi:n trænsˈleiʃən] n. 机器翻译;计算机翻译 [网络] 机骗译;机译;机器翻译技术
mathematical operation [ ] un. 数学运算 [网络] 数字运算;数学计算
mathematical operations [ ] [数] 数学运算
mathematical perspective [ ] 《英汉医学词典》mathematical perspective 几何透视
microscopic imaging [ ] 显微成像
minus infinity [ ] [网络] 负无穷大;负无限大
minus one [ ] [网络] 桃花源;幸福意外;谢谢你捧场
multiply by [ ] v. 乘 [网络] 乘以;乘上;使相乘
neural network [ˈnjuərəl ˈnetwə:k] n. 神经网络 [网络] 类神经网路;类神经网络;神经元网络
neural networks [ ] na. 【计】模拟脑神经元网络 [网络] 神经网络;类神经网路;神经网络系统
object detection [ ] [科技] 物体检测
orthogonal matrices [ ] 正交矩阵
orthogonal matrix [ɔ:ˈθɔɡənl ˈmeɪtrɪks] [网络] 正交矩阵;正交阵;直交矩阵
per se [ˌpɜ: ˈseɪ] adv. 本身;本质上 [网络] 自身;本来;本身餐厅
pixel image [ˈpiksəl ˈimidʒ] [医]像素显像
plus infinity [ ] [网络] 正无穷大;正无限大
plus zero [ ] un. 正零
reconstruction method [ ] 重建法
saliency map [ ] [网络] 显著性地图;显著性图;显著图
set to zero [ ] un. 调到零位;调零 [网络] 设置为零;置零;零调整
simple matrix [ ] 单纯矩阵
spatial information [ ] 空间信息
spatial localization [ ] 《英汉医学词典》spatial localization 空间定位
spiral whirl [ ] [网络] 螺旋形旋涡
spiral whirls [ ] 螺旋形旋涡
synthetic image [ ] 综合图象
the ass [ ] [网络] 驴子;菊门;深渊
the downside [ ] [网络] 不利方面;缺点
the fox [ ] [网络] 狐狸;女狐;沙狐
the matrix [ ] [网络] 黑客帝国;骇客任务;骇客帝国
the Max [ ] [网络] 麦克斯;牛魔王;电子产品配件
the purple [ ] 帝位;王位;显位;红衣主教的职位
the reconstruction [ ] [网络] 重构法;构建;战地雄心
the snail [ ] [网络] 蜗牛;井底的蜗牛;丝瓜花上蜗牛
time zero [ ] 计时起点,时间零点
to clip [ ] [网络] 夹娃娃机;擦撞;到剪切板
to compute [ ] [网络] 计算;用计算机计算
to encode [ ] [网络] 编码;内码;骗码
to overlay [ ] 覆盖
to summarize [ ] [网络] 总结;总结来说;概括
to update [ ] [网络] 更新;重要更新公告;每月更新
validation set [ ] 验证集
vector operation [ ] un. 向量运算 [网络] 矢量操作;矢量运算;向量操作
visualization method [ ] 显像法
visualize doing [ ] 历历描绘......于心
zero in [ˈziərəu in] na. 调整(枪炮的)射距;把(火力)对准目标 [网络] 归零;瞄准;瞄准锁定
zero in on [ˈziərəu in ɔn] (使)瞄准…,(使)对准…,对…集中火力[注意力]
Zero Minus [ ] [网络] 绝对零点
zero out [ ] na. 给…以免税待遇 [网络] 清零了;取消;置零
zero zero [ˈziərəu ˈziərəu] 零
惯用语
12 by 5
and in fact
and now
and so
and so on
does that make sense
does that makes sense
in practice
occlusion sensitivity
plus one
same thing
so now
that makes sense
单词释义末尾数字为词频顺序
zk/中考 gk/高考 ky/考研 cet4/四级 cet6/六级 ielts/雅思 toefl/托福 gre/GRE
* 词汇量测试建议用 testyourvocab.com
