06


Hey, everyone. Welcome back.

Um, this is, people can hear me okay?

So if as usual,

you can take a second to enter your,

uh, SU ID so we know who's here.

Um, so today's lecture will be a Choose Your Own Adventure lecture, um.

So I think,

by now you've learned a lot about the, um,

technical aspects of building learning algorithms and then,

in the third course, uh,

in the third set of modules,

you saw some of the principles for

Learning and how best to use these tools,

um, in order to be efficient in how you build a machine learning application.

What I want to do today is step through with you,

a complicated machine learning application,

and, um, throughout all of today's lecture,

I'm gonna,

step you through a scenario and then ask you to kind of choose your own adventure.

Because if you have to work on this project,

what are you gonna do, right?

Um, and to give you more of that practice in the next, um,

what, hour and a bit that we have, uh,

on thinking through machine learning strategy, um.

And you know I've,

I've seen in so many projects, uh, there are,

there are sometimes things that

a less sophisticated team will take a year to do,

but if you're actually very strategic and very

sophisticated in deciding what you will do next, right,

how to drive a project forward,

I've seen many times that what a different team will take to do,

maybe you could do it in a month or two, right.

And, if you're trying to, um,

write a research paper, or build a business, or build a product,

the, the, the ability to drive

a machine learning project quickly gives you a huge advantage and just,

you're making much more efficient use of your life as well, right.

Um, so, in, for today I'd like to,

er, er, I'm gonna pose a scenario,

pose a machine learning application and say,

I mean, you are the CEO of this project, what are you going to do next?

So, but I'd like to have today's meeting be quite interactive as well.

So, can I get people to sit in groups of two,

and ideally three or so,

maybe one and

try to sit next to someone that you don't work with all the time.

Er, so, so, if you're sitting,

sitting next to your best friend,

I'm glad your best friend is in the class with you,

but go sit with someone else because I think,

um - I've done this multiple times and the discussion's

actually richer if you talk to someone that you don't know super well.

So actually take a second,

introduce yourself and, and, and, and just greet your neighbor, I guess.

So, the example I wanna go through today is actually

a of the example I described briefly,

uh, uh, in the last lecture I taught, uh,

Building a Speech Recognition System, right.

So, remember I briefly motivated this ,

wake word or last time where,

, uh, right.

I, I, I, I, I actually have both an Echo and a Home, uh,

but you know, it's it's a lot of work to

these things to turn on and off the ,

um, and so if you can build a chip,

uh, to sell to say a lamp maker,

to recognize uh, phrases like,

you know - let's say we call the lamp Robert, right, um.

Then you can recognize phrases like Robert turn on, right?

Robert turn off, and you have a little switch to give this thing different names,

you can call it Robert, or,

or or Alice or something.

So you can also have ,

, , turn off.

Just give, give your lamp a name and just say Hey,

Robert, , right?

So, rather than detecting different names and turn on and turn off,

I'm just gonna focus on - just for the technical discussion,

I'm just going to focus on the phrase Robert turn on, er,

but it's kind of the same problem we need to solve like,

four times to give it two names or to turn on and turn off.

So, I'm gonna abbreviate Robert turn on as RTO, right,

so if you wanna call your name Robert and,

um, tell your lamp to turn on, um.

I think I was inspired by one ,

wrote these um, novel series and,

and all his robots' name started with R. So,

maybe R is Robot turn on, um.

And so, uh, let's see.

So, let's say that, um,

you are the new CEO of a small startup with,

you know, three persons, uh,

and your goal is to build an, um,

is to build a circuit or actually,

your goal is to build a learning algorithm, um,

that can recognize this phrase,

Robert turn on, er,

so that when someone, you know,

buys this lamp and they say Robert turn on,

then the lamp can turn on, right?

And just focusing on the task of building, ,

then, you know, to,

to be CEO of the startup, it means you need to do a lot of things, right?

You need to figure out how to do the ,

you figure out who are the lamp makers, etc.

So there's all that stuff, but for today,

let's just focus on the machine learning aspect of it, um.

And so my first question to you is very open-ended,

is - but and this is the life of a CEO, right?

You wake up one day and you've just got to decide what to do, um.

But so my first question to you is the open-ended question is,

you're the CEO, uh,

you're gonna show up at work uh, you know,

tomorrow in your office and you want to build

a learning algorithm to detect the phrase Robert turn on for this application, right?

So, um, so my question is,

what are you gonna do, right?

So take a, take a minute to,

uh, answer that by yourself first.

Uh, don't, don't discuss with your neighbor yet,

but you know, you're gonna show up in your office and,

and then you're gonna start working on

these engineering problems of building a to do this so,

uh, and, and do this as yourself, right?

Don't, don't, don't pretend the - yeah, this whatever,

er, er, er, er, CEO with $10 billion to spend or whatever.

Just do it. Just say yeah,

I - but I don't think this is a terrible idea.

I, I, I, I, this is not the best idea but I think this could work.

So you're actually welcome to do this.

But let's say you decide to do this and you go

into your office tomorrow, right, what do you do, right?

Why don't you take, um,

why don't you take let's say two minutes to enter an answer,

then we can, then we can discuss.

[NOISE] In fact,

uh, yeah, yes, I,

one thing I really like about the answer was actually

the read exist - read existing literature part, right, um.

In fact when you start a new project, um, uh,

uh, and I think, um,

uh, when you start doing a new project like that,

assuming you've not worked with trigger word detection before, you know,

reading research papers or reading code on , or reading on

this problem is actually a very good way to quickly level up your knowledge, um.

And I think that, you know it, it, it turns out that,

uh, in terms of your,

uh, exploration strategy, right, um,

I want to describe to you how I read research papers, um,

uh, which is - so this is, um,

not a good way to review the literature which is if

the x-axis is time and the y-axis is research papers,

what some people will do is find the first research paper and read that until it's done.

And then go and find a second research paper and read that until it's done,

and then go and find

the third research paper and this - because this is a very sequential way

of reading research papers and I find

that the more strategic way to go through these resources,

everything ranging from , um,

lots of good Medium articles that explain things,

right, uh, research papers,

um, right.

Is if you use a parallel exploration process where -

This, this is actually what it feels like when I'm doing research.

I'm trying to learn about a new field I'm not that experienced in.

Right, so I've just done a lot of work on trigger word detection.

But if I hadn't worked on this before,

then I would probably find you know, three papers.

So again x-axis is time and y-axis is different papers.

You know, read a few papers,

kind of at a surface level and skim them.

And based on that, you might decide to read that one in greater detail,

and then to add other papers that you start

and maybe find another one that you want to read in great detail,

and then to gradually add new papers to

your reading list and read some to completion and some not to completion.

Um, you, I, I was actually um, ah, ah,

one of my friends, uh,

a for - former student at ,

who mentioned that he was wanting to learn about a new topic.

And he, he was,

he told me he's compiling a reading list of 200 research papers that

he wanted to read. That sounds like a lot.

You rarely read 200 papers.

But I think if you read 10 papers,

you have a basic understanding.

If you read 50, you have a pretty decent understanding.

If you read like 100,

I think you have a very good understanding, uh, uh,

of, of, of a few but often this is [NOISE] time well spent. I guess.

Um and ah, some other tips, again this is,

I'm - I'm really thinking if you really are

a CEO of this startup and this is what you wanna do,

what advice would I give you?

Um, ah, ah when you're reading papers ah,

other thing to realize,

one is that uh,

some papers don't make sense.

Right. And this is fine.

Uh, uh, even I read some papers I would just go .

I don't think that makes sense.

Uh, and, and it's not that unusual for us to uh,

find papers from a decade ago and we learned that half of it was great

and the other half of it you know

was really talking about things that were not that important.

Right, so it's okay.

Uh, uh, authors you know,

usually papers are technically accurate but often what they thought was

important like maybe an author thought

that using batch norm was really important for this problem.

But it just turns out not to be the case.

That, that happens a lot.

That happens sometimes.

[NOISE] Um, and I think the other tactic that I see

students sometimes not use enough is uh,

talking to experts including contacting the authors.

So when I read a paper [NOISE] um, uh,

I don't, I don't bother the authors

unless I've actually like tried to figure it out myself.

Right. But if you actually spend some time trying to understand

the paper and if it really doesn't make sense to you uh,

uh, uh, it's, it's,

it's okay to email the authors and see if they respond.

And [NOISE] and people are busy.

Maybe there's a 50 percent chance they respond.

And that's okay because it takes you five minutes

to write an email and there's a 50 percent chance they get back to you.

That could be time pretty well spent.

Uh, uh, but, but don't, don't,

don't bother people unless you've tried to do your own work.

I, I usually get a lot of emails from you know high-school students - that do

not feel like they've done their own work an - and I just write, and then yes.

So just don't, don't,

don't bother people unless you've actually tried to do your own work.

[LAUGHTER] Um [NOISE] cool.

So, after um, [NOISE] looking at the literature ah,

and having a base maybe

an open source implementation or getting a sense of the avenue you wanna try.

Oh and it, it turns out the,

the trigger word detection literature is actually

one literature where there isn't consensus on.

This is a , this a bad .

Right. Despite all the , wake word

that you know some of you may use already uh,

there, there, there isn't actually consensus in the,

in, in, in the research community today on like,

this is the best avenue to try.

Um, but so let's say that um, you've read some papers,

some open-source implementations and now

you want to start training your first system.

Right. And last time we talked about this we talked a little bit

about how much time you will spend to collect data,

and, we said you - spend a small amount of time.

Spend like a day, or maybe two days at most.

So collect your first data sets and start training up a model.

Um, but my next question to you is what data would you collect?

Right. Um, in particular what train test data

[NOISE] would you collect?

So you've decided on an initial and you

want to [NOISE] train something to recognize this phrase.

Uh, I think this uh, probably I don't think it's possible to download the data set.

I don't think anyone has collected a data set

with the words Robert turn on and posted it on the Internet.

So you have to collect your own data for

this particular trigger phrase that you wanna use.

But um, you know as CEO of this startup trying to build a learning algorithm to

detect the phrase "Robert turn on," um what data do you collect?

Right. So why don't you take,

why don't you again, take on um,

let's say three minutes to write an answer to this.

Yeah. I think this is an interesting one.

Um, Robert turn on over and over and then data augmentation.

[NOISE] Um, data augmentation is one of those techniques that um,

uh, is a way to reduce

the variance of your learning algorithm because you're generating more data.

And ah, having worked on this problem I happen to know data augmentation works uh,

is very useful for this problem.

But if you didn't already know that fact this is

one of the things I would probably not do right away,

because I would train a quick and dirty system to confirm that you really

have a high variance problem before investing the effort into data augmentation.

So data augmentation is one of those techniques that so you know,

like it never hurts here or it rarely hurts,

usually helps but I don't bother to make that investment unless you have

collected the evidence that you actually have

a high variance problem and that this is actually a good use of your time.

Right.

Yeah. [NOISE] I think this,

this one actually, this is actually nice.

Uh, record everyone at startup saying, Robert turn on 100 times.

So - really nice thing about that.

You get it done really quickly.

Um, uh, when I'm working with teams um,

I actually think in terms of hours,

in terms of how long it take us to do something.

So this one you can probably do in like 30 minutes.

Right so you get your data set collected in 30 minutes and get going.

Uh, or, or, or you run around and just ask you know

friends or strangers to speak into your uh, uh, laptop microphone.

You could spend a few hours to get a much bigger data set than possibly at startup.

I probably do that. Probably I should go and collect

data in several hours rather than only spend 30 minutes.

But this is actually pretty interesting as well because it lets you

get it done really quickly. That makes sense?

Right. [NOISE] Yeah.

So let me actually uh, uh,

share some more concrete advice.

Right. And, and I think I should someti - sometime back um, to,

to prepare a homework that you'll see later in this course.

Ken and and I, we're actually you know building the system partially to,

to, to create a homework right, that, that,

that you'll see later in this course. So this is like a [NOISE].

Uh, this thing is

a nice learning example that we're using in a few points throughout this course.

Um, so here's one thing you can do.

Uh, and this, this is actually what's um,

[NOISE] what we did.

Right. Which is uh, collect [NOISE] um,

well a little bit um,

[NOISE] collect a hundred examples [NOISE] of uh,

uh, 10-second audio clips.

Right? And so, uh,

it turns out once you grab a hold of someone,

uh, and ask them to speak into your microphone, you know,

you can keep them for, um, three seconds

which is how long it takes to say,

or you can keep them for ten seconds which they're actually very

willing to spend the extra seven seconds with you.

Right? Um, but so if this is ten seconds of audio data, you know,

so this is ten seconds of audio, right, and,

and audio is just patterns of,

uh, little changes in air pressure.

Right. So, if you plot audio,

the reason it looks like this.

It's just, uh, the,

the way you're hearing my voice is you know,

my voice or the speakers are creating very rapid changes in air pressure and

your ear measures those very rapid changes in

air pressure, interprets the sound and so a microphone, uh,

is a sensitive device for recording these very very high frequency changes in

air pressure and these plots that you see

in audio is just, what is the air pressure at different moments in time.

Right? But so given a, um,

uh, ten second clip like this.

If this is a three-second section,

where they said, Robert turn on,

then what you would like to do is to build a desk lamp say,

that can sit here and the lamp is turned off,

, turn off,

, , .

And at the moment they finish saying, Robert turn on, yeah, you turn it on.

So this is, uh, output label y really, right?

And then it's not detecting the phrase.

So - so what you want to do for the system is,

um, at you know,

pretty much the moment they finished saying Robert turn on, uh,

you want your learning algorithm to output a one, that's your target label y saying,

"Yeah, I just heard this trigger word."

Uh, and for all other times,

you want it to output zero,

right? Cause - because, uh, and the one is when you

decide to turn on the lamp at that moment in time.
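
A minimal sketch of the target labels described here, assuming the 10-second clip is divided into fixed time steps. The function name `make_step_labels`, the rate of 10 steps per second, and the phrase ending at 6.0 seconds are illustrative assumptions, not values from the lecture.

```python
def make_step_labels(clip_seconds, phrase_end_seconds, steps_per_second=10):
    """Return one 0/1 target label per time step for a single clip.

    Every step is 0 except the step in which the trigger phrase finishes.
    steps_per_second is an assumed resolution chosen for illustration.
    """
    n_steps = int(clip_seconds * steps_per_second)
    labels = [0] * n_steps
    # Index of the time step in which the phrase finishes (clamped to the clip).
    end_step = min(int(phrase_end_seconds * steps_per_second), n_steps - 1)
    labels[end_step] = 1
    return labels


# Hypothetical clip: 10 seconds long, "Robert turn on" finishes at 6.0 seconds.
labels = make_step_labels(clip_seconds=10, phrase_end_seconds=6.0)
print(len(labels), sum(labels), labels.index(1))
```

With a single 1 per clip, almost every label is 0, which is the imbalance that comes up later in the discussion.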

Right? So to collect the data set,

um, here's something you can do,

which is collect 100 audio clips,

of 10 seconds each.

, when I'm my work or my team's work,

I will really, you know,

look at these numbers and think okay,

- let's say you actually if you are doing it.

Let's say you are running around and you want to collect 100 audio clips,

uh, uh, maybe 10 people,

10 clips per person or maybe 100 different people.

Um, I will actually estimate, you know,

if you go to uh,

how long does it take to get one person?

You can probably get one person every minute or two,

if you go to a busy place on,

on like the .

So you could probably get this done in like, uh,

100 to 200 minutes, like, two or three hours right?

It's not that bad. So you get this done quite quickly.

Um, and so and,

and let's say you collect 100 and actually for the,

for the, for the purposes of,

uh, today let's say,

you collect 100 to use for training,

25 for your dev set,

um, [NOISE] and zero for the test set.

Right. That's actually not that unusual if you're building

a new product to just not have a test set because your goal is, uh,

to build something that convinces, you know,

just early phases of a project,

sometimes they don't bother with a test set.

If, if, the goal is to publish a paper then

of course you need a collected test set.

But if you're just building a product,

then you don't need a evaluation sometimes you can

just get started without dealing with a test set, right?

So it's pretty easy to get started.

[NOISE]

So taking that audio clip from above, um,

one thing you can do to turn this into a supervised learning problem, uh,

is to take - so the,

the phrase Robert turn on can be said in less than three seconds.

So let's say you take three seconds as of audio.

Right? So what we can do is, uh,

So let's say here was when Robert turn on was said.

So what you can do is, uh,

oh, right, and the target label is zero, zero, zero,

zero, zero, zero, zero one,

zero, zero, zero, zero.

Um, what you can do is then clip out different audio clips of three seconds.

So here's one audio clip,

you can take that audio clip.

This is X and the target label is zero because,

because Robert turn on was not said [NOISE].

Um, and you can take on - on this audio clip at different random, each clipped,

three second clip and that clip also has a target label 0.

Um,

for this one right?

Which is a three-second clip that can,

that-that, that ends at,

uh, on, the last part of the on sound you would have a target label of 1.

Right? So, and - and when - when you learn about sequence models, RNNs,

you learn a better method than this explicit clipping.

But for now let's say you take these,

um, audio clips and turn it into, uh,

three - so take a 10 second clip and by clipping

out ran - different windows you can take your,

um, let's say 100, uh - uh, clips.

And because for each ten second clip you can take different windows,

you could turn this into let say,

uh, 3,000 [NOISE] training examples.

Right? So here I took a ten second clip and - and [NOISE] show, you know,

took three different three second windows, but you take 30 three second windows,

then each 10 second audio clip becomes 30 examples.

And now, you've turned the problem into

a binary classification problem where you need to train a classifier

that inputs a three-second clip and labels it as either zero or one.

Right? This make sense?
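
The windowing step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's implementation: `clip_windows`, the random choice of window start times, the phrase ending at 6.0 seconds, and the 0.25-second `tolerance` for calling a window positive are all hypothetical. Only the counts (100 clips, 10 seconds each, 3-second windows, 30 windows per clip) come from the lecture.

```python
import random


def clip_windows(clip_seconds, phrase_end, window_seconds=3.0,
                 n_windows=30, tolerance=0.25, seed=0):
    """Sample fixed-length windows from one clip and label each one.

    A window gets label 1 only if it ends within `tolerance` seconds after
    the trigger phrase finishes; otherwise it gets label 0.
    Returns a list of (start_seconds, end_seconds, label) tuples.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n_windows):
        start = rng.uniform(0.0, clip_seconds - window_seconds)
        end = start + window_seconds
        label = 1 if 0.0 <= end - phrase_end <= tolerance else 0
        examples.append((start, end, label))
    return examples


# 100 clips x 30 windows each = 3,000 training examples.
dataset = []
for clip_id in range(100):
    dataset.extend(clip_windows(10.0, phrase_end=6.0, seed=clip_id))

print(len(dataset))
```

Because a window is positive only when it ends right after the phrase, randomly placed windows are overwhelmingly negative.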

And so this is an example of, uh,

the - the more complex, uh,

pipelines you might have,

if you're building a learning algorithm.

So take a continuous, you know,

audio and turn it into

a binary classification problem which you've

learned how to build various classifiers for.

Right? And again, when you learn about RNNs,

you learn about other ways to process sequence data or data.

Okay. So, um, oh, go ahead.

[] uh is that getting the data?

[NOISE] Uh is this getting, yes, I would, yeah, actually,

if you have 100 examples, uh,

it's not that hard to just listen to it on your laptop or

some audio playing software to figure out when - when they finished saying,

uh, Robert turn on.

And then at that moment to put a one in the target label right?

Because this is really when you want the lamp to turn on, right?

Makes sense? Cool.

So, um, any other questions?

Actually, feel free to ask clarifying questions, yeah, go ahead.

I, I wonder if this is gonna cause a problem but ones are too .

Oh, sure, let me get back to that. Anything else? .

Is there a specific reason why you only train them on a few seconds instead of ten seconds, []

[]

[NOISE] I see, yeah, why do we do three seconds, or four or five seconds

then there's another to contend. So I think.

uh, [NOISE] ,

you have say it really slowly to take a few seconds is this, right?

This is Robert turn on.

And so again is this design choice?

[LAUGHTER] Um, yeah, all right.

So, so, um, let's say you do this,

feed it to a supervised learning algorithm,

train a classifier, um,

and let's say that when you classify this,

ah, when, when you run this algorithm,

you end up with 99.5 percent accuracy [NOISE] right? Um,

uh, but you find that has .

[NOISE]

Right? Um, and, and what I mean is that whatever audio you give it,

it just outputs zero all the time.

So it just says,

I never heard the phrase "Robert turn on", you know.
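
To see how 99.5 percent accuracy can coexist with a detector that never fires, here is a small sketch. The split of 15 positives out of 3,000 examples is an assumption chosen to match the 99.5 percent figure; the lecture does not state the actual counts.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model actually detected."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# Assumed split: 15 positive windows out of 3,000 (0.5 percent positives).
y_true = [1] * 15 + [0] * 2985
y_pred = [0] * 3000  # a model that outputs zero all the time

print(accuracy(y_true, y_pred))  # 0.995
print(recall(y_true, y_pred))    # 0.0
```

Accuracy rewards the model for the 2,985 easy negatives, while recall shows it never detects the phrase.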

So, so, uh, so, uh,

so, so my question to you is,

you know - and by the way,

the reason I'm going through these scenarios is,

um, I found that, uh,

a good way to gain good and,

and to become good at making these decisions,

is these are the decisions that a project leader, , or leader or a CEO needs to make.

These are actually like pretty much exactly the decisions you need to make.

And I find that, um,

one of the ways to gain this type of experience is you,

you know, find a job with a good AI team and work with them for five years, right?

And then you actually live through this and you see what they do.

But instead of needing you to go and spend five years to see ten examples of this,

I'm trying to step you through maybe one example in, in one hour.

So, so instead of, uh, you know,

gaining this experience through work experience, which is great,

but takes many, many years [LAUGHTER] or many,

many months hoping to,

you know, let's just put you in the position of making these decisions.

You can learn from them much faster, right?

Um, so - and all the examples I'm giving are actually completely realistic, right?

They're either exactly or very similar to things I have seen in,

in actual, you know, very real projects.

So question is your learning algorithm gives this result,

99.5 percent accuracy, what do you do?

Let me mention some of - some of the answers I really liked.

I think that, uh, um, you know,

I - when I think of building learning algorithms,

the process is often specify a dev set and/or test set that measure what you care about.

And then you don't always have to do it,

but it's good .

It's just - it is,

uh, the clarity of your thinking, right?

If you have a very clear specification of the problem.

And I think one insight out of this is that if your dev set is really unbalanced, right?

Because it's so unbalanced,

that accuracy on your dev set doesn't translate to what you actually care about.

Because, you know, presumably,

this is 99.5 percent accurate on the dev set as well.

But this performance is terrible.

So it's doing great on a dev set,

on your accuracy metric, but giving you terrible performance.

So I think of it as good .

You know, this is kind of good sound practice, uh, to,

to just specify, make sure you at least have a dev set and

evaluation metric that corresponds more closely to what you care about.

So making the dev set more balanced, uh,

equal numbers of positive and negative would,

would be a good step toward that.

Um, and then I think,

um, uh, you could also,

um - there are a few people that talked about, um,

give the higher weights to the positive examples, right?

So, you know, um, uh,

one way to do this is to rebalance your training and your dev sets to make

them more balanced in terms of

maybe closer to a balanced ratio of positive to negative examples.

That'd be okay. The other way, to not do rebalancing,

would be to just give the positive examples a greater weight, right?
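
Giving positive examples a greater weight could look something like the sketch below, which scales the positive term of a binary cross-entropy loss. The function `weighted_log_loss`, the `pos_weight` value of 9.0, and the toy data are assumptions for illustration; the lecture does not specify a loss or a weight.

```python
import math


def weighted_log_loss(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)


# Toy data: 1 positive among 10 examples, and a model that always says "no".
y_true = [1] + [0] * 9
always_low = [0.01] * 10

unweighted = weighted_log_loss(y_true, always_low)
weighted = weighted_log_loss(y_true, always_low, pos_weight=9.0)
print(unweighted, weighted)
```

With the weight applied, missing the single positive costs far more, so always predicting zero stops being a cheap solution for the optimizer.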

Um, I would probably .

Um, another thing you could do, um,

uh, you know, in the,

in the interests of, um, speed,

even if it's not the most,

most sound thing to do,

is to change the target labels to be a bunch of ones after that.

And this is , this is not formally .

But if you've implemented the rest of this code already,

this might be a reasonable,

you know, a little bit thing to do.

But this is - this, this, this might work well enough.

Right? I would - I might not - I don't know if I would want

to try to write an academic research paper with this method,

maybe you can get away with it.

But this little thing that I think if you tried to publish a paper with this,

academic might raise their eyebrows and say maybe,

you know, maybe this is okay.

But I think if you want something quick and dirty, that just works.

, uh, uh, the ones,

changing a bunch of labels to be ones so that's,

say, here, right?

Uh, that ends just a little bit after Robert turn on, that's still labeled one,

that'll be pretty reasonable.

But this will be saying that, uh,

for anywhere [NOISE] within maybe a 0.5 second period after Robert turn on finished,

it's okay to turn on the light within that period.

That you kind of wanna be turning on the light.

Turning on the lamp, you know,

say within half a second, right, after Robert turn on is - has been said.

And this would be a not - this would be a way to just get more labels of ones in there.
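
The quick fix of labeling a short stretch after the phrase as ones can be sketched as below. The 0.5-second period comes from the lecture; the helper name `widen_positive_labels`, the 10 steps per second, and the position of the original 1 are illustrative assumptions.

```python
def widen_positive_labels(labels, steps_per_second, extra_seconds=0.5):
    """Return a copy of labels with each 1 extended forward in time.

    Every step within extra_seconds after an original 1 is also set to 1.
    """
    extra_steps = int(round(extra_seconds * steps_per_second))
    widened = list(labels)
    for i, value in enumerate(labels):
        if value == 1:
            for j in range(i, min(i + extra_steps + 1, len(labels))):
                widened[j] = 1
    return widened


# Hypothetical clip: 100 time steps, phrase finishes at step 60.
labels = [0] * 100
labels[60] = 1
widened = widen_positive_labels(labels, steps_per_second=10)
print(sum(labels), sum(widened))  # 1 6
```
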

All right, that makes sense? Yeah?

With like your data sets with [] ,

how does that translate to when you deploy this,

you're not going to see Robert turn on as much, right?

Like one out of 1,000 might be reflected,

but what do we expect to see?

Yeah, yeah. Right. So, um,

I think that, uh - how do we put it?

Um, so if you actually - yes, so uh, this,

this is sort of a dev set and evaluation measure kind of question, right?

So, uh, one of the - a couple of the metrics that people often use,

uh, when actually working on this, is,

uh, when someone says Robert turn on,

what is the chance that it actually wakes up, or the lamp turns on?

And then the second is, if no one is saying anything to the lamp, you know,

how often does it turn on by itself without you having said anything?

So those are the two metrics people actually use.

And, and sometimes you also

try to combine them in a single number evaluation metric or something.
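
The two measurements described here might be computed along these lines. Normalizing false triggers per hour of background audio is an assumption (the lecture only says "how often"), and the function names and sample numbers are hypothetical.

```python
def detection_rate(spoken_clip_fired):
    """Fraction of clips containing the phrase in which the lamp turned on."""
    return sum(spoken_clip_fired) / len(spoken_clip_fired)


def false_alarms_per_hour(n_false_triggers, hours_of_background_audio):
    """How often the lamp turns on when nobody said the phrase."""
    return n_false_triggers / hours_of_background_audio


# Hypothetical evaluation: 4 clips with the phrase, 12 hours with no phrase.
fired = [True, True, False, True]
print(detection_rate(fired))           # 0.75
print(false_alarms_per_hour(3, 12.0))  # 0.25
```
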

Uh, but I think that, um, uh,

you could identify the data sets and measure both of these things.

And then hopefully find a way to combine them

into a single real number, which I think - yeah.

And I think one of the ways you talked about in the,

in the videos as well as, right?

Does that make sense? Uh, yeah.

But I think - , uh, so the question is really, um, uh,

on what is it that satisfies a user need, right?

And, uh, and just one, one thing about,

uh, the straightforward way of rebalancing,

is that if you don't do this then your whole data set,

just has very few positive examples, right?

Um, and so if you throw away all the negative examples,

so that you cut down the number of negative examples until

you have exactly equal numbers of positives and negatives,

you've actually thrown away a lot of negative examples.

This makes sense? And so one,

one problem with the straightforward way of rebalancing,

is that, you know, in your audio clip,

in your test 10 second clip that we collected by running around Stanford,

um, you have one example of Robert turn on.

And so if you want exactly per,

perfectly balanced positive and negative,

it means that you're allowed to only clip out one negative example out of this.

You can say, that's negative and that's a positive.

And you can't clip out more negative examples from this, right?

So, so if you use,

uh - if you insist on a perfect balance,

you're actually throwing away a lot of negative examples that,

that could be helpful for the learning algorithm.

Great. [NOISE] Um, so all right.

[NOISE] So, um, you know,

a lot of the work of,

uh, building learning algorithms is,

um, uh, building learning algorithms feels more like debugging, right?

Because what happens in

a typical machine learning project is you implement something and it doesn't work.

So you figure out what is the problem,

so fix that, uh,

like rebalancing or adding more ones.

And so that fixes the current problem.

And then after fixing the current problem, which,

which is the one we just solved, say,

you then come across a new problem and you have to solve that.

And you fix that problem,

you click somewhere else, another new problem.

So I find that, uh, the of, um,

when I'm working on a machine learning project,

it often feels more like software debugging than software development, right?

Because you're often trying to figure out what doesn't work and you're trying to fix that.

And after you fix that problem,

then another bug surfaces and you fix that,

and you do that, and another, and you kind of keep doing that until it works.

So if I keep talking about,

you know, your algorithm doesn't work,

what do you do next, right?

That, that's kind of the theme of today's presentation.

But that, that is what the - That is what

your day-to-day work of developing

a learning algorithm is usually like because it's like,

it doesn't work, and you fix it.

It still doesn't work, then you fix that,

and it still doesn't work, you fix it.

And you do that enough times until it works, right?

That, that is actually what often working

on a learning algorithm works - look, looks like.

All right. So let's say you fix that problem, um,

and you conclude, uh,

through doing error analysis,

that your algorithm is overfitting, right?

So you know, you've - you've added a lot more ones,

so the data set is a little bit more balanced.

So let's just add a bunch of ones like I did on that previous board,

so let's just add a lot of ones here,

so the data set isn't as unbalanced.

[NOISE] Let's see if - [NOISE],

um, right, okay, good.

Let's say that - sorry [NOISE] ,

see, too many pages of notes here.

Okay, good. So let's say that you find that it achieves now 98 percent

accuracy on training and 50 percent accuracy on the dev set, right?

So very large gap between your training and your dev set performance,

and so a clear sign of overfitting.

And so I think of one of the earlier questions,

someone talked about data augmentation,

and so we have this clear sign of overfitting,

this is a good time to consider data augmentation, right?
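
As a rough sketch of the check being made here, the gap between training and dev accuracy can be computed directly. The 98 and 50 percent figures are from the lecture; the 0.10 `gap_threshold` and the function names are arbitrary assumptions, not a rule the lecture states.

```python
def train_dev_gap(train_accuracy, dev_accuracy):
    """Difference between training and dev accuracy."""
    return train_accuracy - dev_accuracy


def looks_like_high_variance(train_accuracy, dev_accuracy, gap_threshold=0.10):
    """Flag a large train/dev gap; gap_threshold is an illustrative choice."""
    return train_dev_gap(train_accuracy, dev_accuracy) > gap_threshold


gap = train_dev_gap(0.98, 0.50)
print(looks_like_high_variance(0.98, 0.50))  # True
```
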

And so let's say you go ahead and do data augmentation.

So for audio, this is how you could do data augmentation,

which is, um, collect a bunch of background audio.

You know, so I guess if you're trying to build a lamp that might go into people's homes,

then you could go into your friends' homes and,

you know, with their permission, record, right?

What the background sound of their home looks like.

You know, maybe there are people talking in the background,

maybe the TV on in the background.

Well, whatever goes on in people - people's homes.

Um, and then, it turns out that if you take a, um, say,

a one second clip,

of Robert turn on, or RTO,

and you add that to a background clip,

then you can synthesize an audio clip of

what it sounds like in your friend's house if someone were

to suddenly pop up and say Robert turn on

against the background sound of your friend's house, right?

And it turns out that, um,

if you want to make the system robust,

so actually, for example,

I actually know someone that lives, unfortunately,

close to a train station and so

their house actually has a lot of train station noise from the Caltrain.

And so what you can do to make your system more robust is

also take clips of, say, train noise, right?

Like Caltrain noise,

and if you take that noise and take,

uh, in this case, let's say one-second,

one-second of a three-second clip of someone saying Robert turn

on and you synthesize that on top of the train in the background,

then what you end up with is a 10 second clip of someone saying Robert

turn on against the noisy train in the background type of noise.
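
The overlay step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: audio is represented as a plain list of float samples in [-1, 1], the helper name `overlay` and the tiny `SAMPLE_RATE` are hypothetical, and random noise and a constant signal stand in for real recordings.

```python
import random


def overlay(background, clip, start):
    """Return a copy of background with clip added sample by sample at start."""
    if start < 0 or start + len(clip) > len(background):
        raise ValueError("clip does not fit inside the background")
    mixed = list(background)
    for i, sample in enumerate(clip):
        # Sum the two waveforms, then clamp to the valid range [-1, 1].
        mixed[start + i] = max(-1.0, min(1.0, mixed[start + i] + sample))
    return mixed


SAMPLE_RATE = 100  # hypothetical; tiny so the example stays readable
rng = random.Random(0)

# 10 seconds of stand-in "train noise" and a 1-second stand-in for "Robert turn on".
background = [rng.uniform(-0.1, 0.1) for _ in range(10 * SAMPLE_RATE)]
trigger = [0.5] * SAMPLE_RATE

# Place the trigger phrase 3 seconds into the background.
mixed = overlay(background, trigger, start=3 * SAMPLE_RATE)
print(len(mixed) / SAMPLE_RATE, "seconds of synthesized audio")
```

The output clip has the same length as the background, and only the region where the phrase was placed differs from the original noise.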

All right. And so in order to do data augmentation or data synthesis,

you can take some one-second clips of people saying Robert turn on in

a quiet background and then take

some one-second clips of people saying random words, right?

Cardinal, you know, right?

Since you're at Stanford, and synthesize this against

the train noise background and then you would have, in this case,

you would have what sounds like train noise, a random word,

Robert turn on, another random word,

more train noise, and so on, right?

And then you could generate the labels now as zeros there,

ones there and then zeros there, right?

Because if this is what it actually sounded like in a user's home,

then you want the lamp to turn on after Robert turn on but not after these random words.

So you can pick different random words.
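
Because you chose where each clip was placed, the labels for a synthesized example come for free. The sketch below is an assumption-laden illustration: `make_labels`, the per-time-step label layout, and the `ones_after` width are hypothetical choices, following the earlier idea of labeling several steps after the trigger phrase with ones.

```python
def make_labels(n_steps, trigger_ends, ones_after=5):
    """Label each time step 0, except a run of 1s right after each trigger phrase.

    trigger_ends lists the time steps at which "Robert turn on" finishes.
    Random words contribute nothing, so their positions stay at 0.
    """
    labels = [0] * n_steps
    for end in trigger_ends:
        for t in range(end, min(end + ones_after, n_steps)):
            labels[t] = 1
    return labels


# One trigger phrase ending at step 8 in a 20-step clip.
labels = make_labels(20, trigger_ends=[8], ones_after=3)
print(labels)
```

Random words are simply never passed in as `trigger_ends`, which is what teaches the model not to turn the lamp on after them.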

Great. Um, so let's see.

Great.

What I'd

like you to do is evaluate three different possible ways,

um, to collect noisy data, right?

To - to - to collect this type of background data, right?

Um, and so what I'd like you to do for the next question is let's say you and your team,

you know, have, ah, brainstormed,

um, a few different ways to collect this type

of background noise data and let's say you've decided

that you would like to collect 10 hours of background noise data, right?

Okay? So I'm going to present to you three options.

One is, um, you know,

run around Stanford and place microphones around Stanford or in your friends' homes.

Do this with consent and don't, you know,

in California you're not supposed to,

don't record people without their knowledge and consent, right?

Second is download clips online.

It turns out if you go to YouTube there are these, like, 10-hour-long clips of,

you know, rain noise or cars driving around.

So you actually - and again, if you do that,

find something that's Creative Commons and appropriately licensed, right?

Another thing you could do is, ah,

use Amazon Mechanical Turk,

where you can have people all around the world be paid,

you know, modest amounts of money to submit audio clips, right?

So for the next exercise, um,

and I want you to go through this exercise as a discipline,

what I want you to do is, um,

estimate. Let's see.

What time is it now?

Okay, it's 12:30 PM right now.

What I want you to do is write down three numbers in

the next exercise to estimate if you were to do this,

you know, let's say you were to go do this right now, right?

By what time will you have finished if you were to do option one?

What time would you finish if you were to do option two?

What time would you finish if you were to do option three?

If your goal is to collect 10 hours of data through one of these mechanisms.

Does that make sense? So it's 12:30 PM now.

So what I'd like you to do is just write down three numbers.

First number is what time is it,

what time would it be by the time you collected 10 hours of data,

you know, from around Stanford.

And so if you think you would do it by tonight,

then write 9:00 PM.

If you think it will take you one week,

then write the date one week from now,

right? Whatever it is.

But just write down three numbers for these three activities, okay?

Why don't you do this one relatively quickly?

Can people do this in like maybe a minute and a half?

All right, cool. This is interesting, um.

Yeah, actually this is a surprisingly large variance.

I'll mention one thing that um, surprised me.

Um, I'll give you my own assessment.

I think that, uh, you

know, when I'm leading startup teams we tend to be very scrappy, right?

And so I think that um,

if the goal is to collect 10 hours of data,

if you have three friends who each have a laptop you can collect

three hours of data per hour because you've got three recordings going in parallel.

So if I were doing this with say two other friends you know,

I bet, I bet we could get this done by tonight, right?

Because if you need nine hours of data that's each person needs to collect

three hours of data and you run around Stanford and you keep the microphones running,

I bet, I bet I could get this done by 6:00 PM right maybe,

maybe even earlier I don't know.
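
The arithmetic behind this estimate can be written down directly. This is a rough sketch: the function `finish_time`, the `overhead_hours` parameter, and the placeholder calendar date are assumptions added for illustration.

```python
from datetime import datetime, timedelta


def finish_time(start, hours_needed, recorders, overhead_hours=0.0):
    """Estimate when recording finishes if all recorders run in parallel."""
    wall_clock_hours = hours_needed / recorders + overhead_hours
    return start + timedelta(hours=wall_clock_hours)


# It is 12:30 PM now; the calendar date is only a placeholder.
now = datetime(2000, 1, 1, 12, 30)

# Nine hours of audio split across three people recording at once.
pure_recording = finish_time(now, hours_needed=9, recorders=3)
print(pure_recording.strftime("%H:%M"))  # prints 15:30

# Allow a couple of hours for setup and walking around (an assumed figure).
with_overhead = finish_time(now, hours_needed=9, recorders=3, overhead_hours=2.0)
print(with_overhead.strftime("%H:%M"))  # prints 17:30
```

Even with generous slack for setup, the parallel recording plan finishes before the 6:00 PM guess above.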

Downloading clips online, uh,

is actually an interesting one, maybe about the same time.

Um, it turns out one thing about clips online is that um,

uh, there are people that, um,

have trouble sleeping at night so they listen to highway noise or whatever.

And so there are these, you know, 20 hours of

highway noise clips on YouTube that you can find.

But I, I don't know how those clips are generated.

And I suspect a lot of them loop, right?

Meaning it's the same one hour played over and over.

So I actually think it's harder than

you'd think to get 10 hours of,

um, [NOISE] non-repetitive data, and it's one of those things, you know,

if I take an hour of highway sound and loop

it you can't tell the difference because all highway sound sounds the same.

I just can't tell one minute of highway sound from another one

but if you have one hour of highway sound looped 10 times,

the learning algorithm will actually perform much less well

than if you have 10 hours of fresh highway sound.
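
One way to sanity-check a downloaded clip for looping is to count how much of it is actually distinct. The sketch below is a deliberately simplified illustration: `unique_seconds` is a hypothetical helper that only catches exact, chunk-aligned repeats in raw integer samples, whereas a re-encoded clip from the web would need a fuzzier comparison.

```python
import random


def unique_seconds(samples, sample_rate):
    """Count distinct one-second chunks (exact match, chunk-aligned only)."""
    seen = set()
    for start in range(0, len(samples) - sample_rate + 1, sample_rate):
        seen.add(tuple(samples[start:start + sample_rate]))
    return len(seen)


SAMPLE_RATE = 8  # hypothetical; tiny so the example runs instantly
rng = random.Random(0)

# 60 "seconds" of fresh stand-in audio, then the same audio looped 10 times.
fresh = [rng.randint(-32768, 32767) for _ in range(60 * SAMPLE_RATE)]
looped = fresh * 10

total = len(looped) // SAMPLE_RATE
print(total, "seconds in the file,", unique_seconds(looped, SAMPLE_RATE), "of them distinct")
```

The looped file is ten times longer but contains no more distinct audio than the original, which is why it helps the learning algorithm far less than ten fresh hours would.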

So this I would actually have a harder time doing.

I think if I were doing this,

because of these problems I would probably budget until sometime tomorrow, right?

Maybe 9:00 PM or something. Maybe that's [inaudible].

I'm not sure. Um, the one surprise

to me was some people thought they could do this by tonight.

Uh, again, I've used Amazon Mechanical Turk; it's actually

a huge process to set up, get people on board.

Um, and especially to get them on microphone.

Uh, I don't know if you implement something

on Flash that they can speak into in their web browser.

[NOISE] And then Flash isn't supported.

It's actually not that easy to get a lot of

Turkers to do this and the global supply of [NOISE] Turkers is also limited.

So if I were doing this I would

probably budget, [NOISE] I don't know, maybe a week or something, right?

Hard to say, I'm not sure um.

But so the specific opinion isn't that important but I want

you to go through this exercise because this is how um,

efficient startup teams should, you know, brainstorm

a list of things and then you all figure out how long you

think it'll take to do these things and I think uh,

you can have a debate about how high quality the data is, I

think you can get very high quality data from this and from this.

Uh, [NOISE] I just don't trust a lot of those online clips.

Uh, but this is really fast and you can get pretty high-quality data.

I would probably do this to collect the background sound to get going, right?

But I think that part of the workflow I see of, you know,

fast-moving teams is pretty much exactly what you did,

which is the exercise of

brainstorming the list of options and then really

estimating by what time we can get this done and then using that to pick an option, right?

Um, and then I wanna just mention one last thing uh,

which is that [NOISE] these differences matter, right?

Um, you know, I've actually built a lot of machine learning systems.

But um, oh and,

and I think by the way if you do everything we just

described and you'll see this later in a problem set.

Uh, you can actually, with pretty much

this set of ideas that we just went through today,

build a pretty decent trigger word

or wake word detection system.

In fact, we'll step through pretty much this in a later homework exercise.

But now you know when you get to that homework exercise when you do RNNs uh,

you know how you could come up with this sort of process yourself if,

if you didn't already know how to make these types of choices. Yeah, question?

Just one question. Uh, when I conduct my research, [inaudible]

how my microphone affects my result, but

at the beginning it seems like it's not important, like my [inaudible].

So when the microphone messes up my result, how should I think about it?

Yeah, so my advice - [NOISE]

does the microphone affect your result, right?

[OVERLAPPING] My advice would be to, uh,

get something going quick and dirty and then develop a data set, right,

with the actual types of data you can develop on

your real microphone and then see if there is a problem and it may be er,

different microphones do have different characteristics.

And if it is a problem then go back and think about

how you collect data that's more representative of how you

test. I wanna mention one more quick thing before

the class surveys, just something real quick, which is, um,

I wanna tell you why these things really matter, which is, um,

if the vertical axis is, uh, performance, right?

Uh, actually let's say error.

And the horizontal axis is time, right?

And if this is today and you're the CEO of a startup remember that's,

that's what we're doing in this lesson.

And this is six months from now,

and this is 12 months from now.

Um, you know maybe if a competitor,

actually maybe, maybe I don't know.

[LAUGHTER] Maybe because we've talked about this so much in

this class maybe two of you in this class are gonna build a startup and be a competitor.

Um, but for most machine learning teams,

you know, the error actually goes down over time as you work on problems,

and this is what I see in practical projects.

You know, we work on project, improve the system,

and the error actually goes down over time as you work on

this over the next 12 months, say. But if you're really the CEO of

a startup doing this, [NOISE] it turns out that it's

the teams that have the discipline to constantly be the most efficient that come out ahead.

Um, don't do something that takes you two days

if you can get a similar result in one day.

The difference is not that you're one day slower,

the difference is that you are 2x faster, right?

And then if we take

this whole chart and compress it on the horizontal axis,

um, then you [NOISE] want to

be the startup that, you know, makes

the same amount of progress in six months instead of 12 months, right?

And because uh, if you're able to do this then

your startup will actually perform much better in the marketplace,

assuming accuracy's important, which it seems to be for wake word.

And so don't think of this as saving you a day here

and there think of this as making your team twice as fast.

And that's the difference between this level

of performance and that level of performance.
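
The "compress the chart on the horizontal axis" idea can be made concrete with a toy curve. Everything numeric here is an assumption for illustration: the lecture only sketches a falling error curve, so the exponential shape, `start_error`, and `halving_months` are hypothetical.

```python
def error_at(month, speedup=1.0, start_error=0.40, halving_months=6.0):
    """Toy error curve: error halves every halving_months of effective work.

    speedup rescales the time axis: a team that is 2x faster reaches
    any given point on the curve in half the calendar time.
    """
    effective_months = speedup * month
    return start_error * 0.5 ** (effective_months / halving_months)


# The 2x team at month 6 matches the 1x team at month 12.
print(error_at(12), error_at(6, speedup=2.0))

# At the same calendar date, the 2x team is well ahead.
print(error_at(12), error_at(12, speedup=2.0))
```

Under this toy model, being twice as fast does not save a fixed number of days; it moves the team to a different point on the error curve at every future date.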

So that's why when I'm, you know, building teams

and executing these projects I tend to be pretty obsessive about, uh,

making sure we're very efficient in exploring the options,

and [NOISE] don't wait till tomorrow to collect data of

dubious quality when you have a better idea for collecting

data by today, because the difference is not that you wasted 12 hours,

the difference is you are twice as slow as a company, right?

So I think, uh, hopefully this example and

your ongoing experiences throughout this course can help

you continue to get better at this.

Right. Um, last thing we want to do [NOISE] is, uh,

we're about halfway through the course, and so, um,

we want to hand out a survey, um,

an anonymous survey, uh,

to get some feedback from you about this class.

And whenever we get these surveys, uh,

thanks to previous generations of students' feedback,

we've already been gradually making the class better.

So I think Ken and I actually read all of these questions ourselves and try to find

ways to take your feedback to improve the class so if you can take you know, five minutes um,

for this survey and you can hand it in just drop it off up here in front.

Uh, we're very grateful for your suggestions.

Okay? I think if you haven't entered your ID yet, uh,

you can still do so but uh, that's it for today.

So please fill out the survey and just

drop it off back in front then we'll wrap up. Okay, thank you.

知识点

重点词汇
embedded [ɪm'bedɪd] v. 嵌入（embed的过去式和过去分词形式） adj. 嵌入式的；植入的；内含的 { :6007}

appropriately [ə'prəʊprɪətlɪ] adv. 适当地；合适地；相称地 { :6089}

continuation [kənˌtɪnjuˈeɪʃn] n. 继续；续集；延长；附加部分；扩建物 {toefl gre :6109}

detection [dɪˈtekʃn] n. 侦查，探测；发觉，发现；察觉 {cet4 cet6 gre :6133}

detections [dɪ'tekʃnz] n. 察觉，发觉，侦查( detection的名词复数 ) { :6133}

downloaded [ˈdaunləudid] vt. [计] 下载 { :6382}

download [ˌdaʊnˈləʊd] vt. [计] 下载 {gk :6382}

downloading ['daʊnləʊdɪŋ] n. 下装，[计] 下载；[计] 下传 v. [计] 下载（download的现在分词形式） { :6382}

tricky [ˈtrɪki] adj. 狡猾的；机警的 { :6391}

robust [rəʊˈbʌst] adj. 强健的；健康的；粗野的；粗鲁的 {cet6 ky toefl ielts gre :6419}

hygiene [ˈhaɪdʒi:n] n. 卫生；卫生学；保健法 {toefl ielts gre :6492}

randomly ['rændəmlɪ] adv. 随便地，任意地；无目的地，胡乱地；未加计划地 { :6507}

rigorous [ˈrɪgərəs] adj. 严格的，严厉的；严密的；严酷的 {cet6 ky toefl ielts :6606}

uncommon [ʌnˈkɒmən] adj. 不寻常的；罕有的 adv. 非常地 { :6725}

unlimited [ʌnˈlɪmɪtɪd] adj. 无限制的；无限量的；无条件的 {cet6 :6742}

inaudible [ɪnˈɔ:dəbl] adj. 听不见的；不可闻的 { :6808}

algorithm [ˈælgərɪðəm] n. [计][数] 算法，运算法则 { :6819}

algorithms [ˈælɡəriðəmz] n. [计][数] 算法；算法式（algorithm的复数） { :6819}

dubious [ˈdju:biəs] adj. 可疑的；暧昧的；无把握的；半信半疑的 {cet6 ky toefl ielts gre :6855}

temporal [ˈtempərəl] n. 世间万物；暂存的事物 adj. 暂时的；当时的；现世的 n. (Temporal)人名；(法)唐波拉尔 {toefl gre :6868}

squash [skwɒʃ] n. 壁球；挤压；咯吱声；南瓜属植物；（英）果汁饮料 vt. 镇压；把…压扁；使沉默 vi. 受挤压；发出挤压声；挤入 {cet6 toefl gre :6923}

simplifying [ˈsimplifaiŋ] v. 简约，简化（simplify进行时形式） { :7074}

cafeteria [ˌkæfəˈtɪəriə] n. 自助餐厅 {gk cet4 cet6 ky ielts :7076}

Turk [tә:k] n. 土耳其马；土耳其人 { :7202}

cardinal [ˈkɑ:dɪnl] adj. 主要的，基本的；深红色的 n. 红衣主教；枢机主教；鲜红色；【鸟类】(北美)主红雀 {ky toefl ielts gre :7343}

binary [ˈbaɪnəri] adj. [数] 二进制的；二元的，二态的 { :7467}

rainier [ ] n. 雷尼尔山；赤阳苹果 n. (Rainier)人名；(英)雷尼尔；(法)兰尼埃 { :7477}

validate [ˈvælɪdeɪt] vt. 证实，验证；确认；使生效 {toefl gre :7516}

sharpens [ˈʃɑ:pənz] v. （使）提高( sharpen的第三人称单数 ); （使声音）变得尖锐; （使）变得更好（或技术更高、更有效等）; （使）变得锋利 { :7542}

blog [blɒg] n. 博客；部落格；网络日志 { :7748}

intuitions [ˌɪntjuˈiʃənz] n. 直觉( intuition的名词复数 ); 凭直觉感知的知识; 直觉力 { :7905}

hypothetical [ˌhaɪpəˈθetɪkl] adj. 假设的；爱猜想的 {gre :8049}

moderately [ˈmɒdərətli] adv. 适度地；中庸地；有节制地 {cet6 :8132}

berkeley ['bɑ:kli, 'bә:kli] n. 贝克莱（爱尔兰主教及哲学家）；伯克利（姓氏）；伯克利（美国港市） { :8189}

isaac ['aizәk] n. 以撒（希伯来族长，犹太人的始祖亚伯拉罕和萨拉的儿子）；艾萨克（男人名） { :8205}

hack [hæk] n. 砍，劈；出租马车 vt. 砍；出租 vi. 砍 n. (Hack)人名；(英、西、芬、阿拉伯、毛里求)哈克；(法)阿克 {gre :8227}

reviewers [rɪv'ju:əz] n. 评论者（reviewer的复数）；评审员 { :8282}

tha [,ti ɛtʃ 'e] abbr. thaumatin 竹芋蛋白 { :8395}

variability [ˌveəriəˈbɪləti] n. 可变性，变化性；[生物][数] 变异性 {toefl :8403}

skimming ['skɪmɪŋ] n. 撇取浮沫；浮渣 v. 撇去…的浮物（skim的ing形式） { :8515}

skim [skɪm] n. 撇；撇去的东西；表层物；瞒报所得的收入 adj. 脱脂的；撇去浮沫的；表层的 vt. 略读；撇去…的浮物；从…表面飞掠而过；去除；（为逃税而）隐瞒（部分收入） vi. 浏览；掠过 {cet4 cet6 ky toefl ielts gre :8515}

augment [ɔ:gˈment] n. 增加；增大 vt. 增加；增大 vi. 增加；增大 {cet6 ky toefl ielts gre :8589}

neural [ˈnjʊərəl] adj. 神经的；神经系统的；背的；神经中枢的 n. (Neural)人名；(捷)诺伊拉尔 { :9310}

compress [kəmˈpres] vt. 压缩，压紧；精简 vi. 受压缩小 {cet4 cet6 ky toefl ielts gre :9510}

sparse [spɑ:s] adj. 稀疏的；稀少的 {toefl ielts gre :9557}

browser [ˈbraʊzə(r)] n. [计] 浏览器；吃嫩叶的动物；浏览书本的人 { :9689}

nope [nəʊp] adv. 不是，没有；不 { :9734}

robotic [rəʊˈbɒtɪk] n. 机器人学 adj. 机器人的，像机器人的；自动的 { :10115}

metric [ˈmetrɪk] adj. 公制的；米制的；公尺的 n. 度量标准 {cet4 cet6 ky ielts :10163}

metrics ['metrɪks] n. 度量；作诗法；韵律学 { :10163}

obsessive [əbˈsesɪv] adj. 强迫性的；着迷的；分神的 { :10199}

strategically [strə'ti:dʒɪklɪ] adv. 战略性地；战略上 { :10245}

micro [ˈmaɪkrəʊ] adj. 极小的；基本的；微小的 n. 微型计算机；微处理器 { :10740}

sequential [sɪˈkwenʃl] adj. 连续的；相继的；有顺序的 {gre :10797}

synthesized ['sɪnθɪsaɪzd] adj. 合成的；综合的 v. 合成（synthesize的过去分词）；综合 { :10905}

synthesize [ˈsɪnθəsaɪz] vt. 合成；综合 vi. 合成；综合 {cet6 toefl :10905}

wha [ ] [医][=warmed,humidified air]温暖、潮湿的空气 { :11046}

whack [wæk] n. 重击；尝试；份儿；机会 vt. 重打；猛击；击败；削减 vi. 重击 { :11376}

manually ['mænjʊəlɪ] adv. 手动地；用手 {toefl :12167}

amazon ['æməzən] 亚马逊；古希腊女战士 { :12482}

configure [kənˈfɪgə(r)] vt. 安装；使成形 { :13210}

prioritizing [praiˈɔritaizɪŋ] v. 目标优选；指定优先权；依主次程序排列（prioritize的ing形式） { :13446}

rigorously ['rɪɡərəslɪ] adv. 严厉地；残酷地 { :13742}

unbalanced [ˌʌnˈbælənst] adj. 不平衡的；错乱的；不稳定的；收支不平衡的，未决算的 v. 使失去平衡；使（精神）错乱（unbalance的过去分词） { :14269}

anonymously [ə'nɒnɪməslɪ] adv. 不具名地；化名地 { :14502}

proportionate [prəˈpɔ:ʃənət] adj. 成比例的；相称的；适当的 vt. 使成比例；使相称 { :15394}

mathematically [ˌmæθə'mætɪklɪ] adv. 算术地，数学上地 { :15474}

oscillate [ˈɒsɪleɪt] vt. 使振荡；使振动；使动摇 vi. 振荡；摆动；犹豫 {ielts gre :15486}

circuitry [ˈsɜ:kɪtri] n. 电路；电路系统；电路学；一环路 { :15641}

stanford ['stænfәd] n. 斯坦福（姓氏，男子名）；斯坦福大学（美国一所大学） { :15904}

难点词汇
SP [ ] abbr. 自行驱动的（Self Propelled）；服务提供商（Service Provider） { :16790}

waveform ['weɪvfɔ:m] n. [物][电子] 波形 { :17729}

labeling ['leɪblɪŋ] n. 标签；标记；[计] 标号 v. 贴标签；分类（label的现在分词） { :17997}

dataset ['deɪtəset] n. 资料组 { :18096}

doable [ˈdu:əbl] adj. 可做的 { :18441}

anytime ['enɪˌtaɪm] adv. 任何时候；无例外地 { :19391}

scrappy [ˈskræpi] adj. 爱打架的；杂凑的；不连贯的；生气勃勃的 {gre :20002}

brainstorm [ˈbreɪnstɔ:m] n. 集思广益；头脑风暴；灵机一动 vt. 集体讨论；集思广益以寻找 vi. 集体讨论；动脑筋；出主意 { :20387}

abbreviate [əˈbri:vieɪt] vt. 缩写，使省略；使简短 vi. 使用缩写词 {toefl gre :20690}

brainstorming [ˈbreɪnstɔ:mɪŋ] n. 集体研讨；发表独创性意见 v. 集思广益以寻找；集体自由讨论（brainstorm的ing形式） { :20927}

microfilms ['maɪkrəʊfɪlmz] n. （拍摄文件等用的）缩微胶卷( microfilm的名词复数 ) { :22448}

augmentation [ˌɔ:ɡmen'teɪʃn] n. 增加，增大；增加物 {gre :22669}

RI [ ] abbr. （美国）罗得岛州（Rhode Island）；剩余收入（residual income） n. (Ri)人名；(日)理(名) { :24818}

rea [ ] abbr. （美）农村电气化管理局（Rural Electrification Administration）；铁路快运代办处（Railway Express Agency） n. (Rea)人名；(英)雷；(西、意、罗、瑞典)雷亚 { :25673}

debugging ['di:'bʌgɪŋ] n. 调试以排除故障 v. 排除故障；发现并改正错误（debug的ing形式） { :28755}

dev [dev] abbr. 发展（develop）；偏差（deviation）；开发人员（developer）；设备驱动程序 n. (Dev)人名；(尼、印)德夫 { :28908}

workflow ['wɜ:kfləʊ] n. 工作流，工作流程 { :31107}

prototyping [prəʊtə'taɪpɪŋ] n. [计] 样机研究；原型设计 { :37954}

prob [p'rɒb] abbr. problem 问题; probability 概率; problematic 问题的; problematical 问题的 { :38611}

rebalancing [ ] [财]调整资金组合 { :41158}

mindset [ˈmaɪndset] n. 心态；倾向；习惯；精神状态 { :42826}

eunice [ˈju:nis] n. 尤妮斯（女子名，义为快乐的胜利） { :44408}

生僻词
Asimov [ ] 阿西莫夫（人名）

brainstormed [ˈbreɪnˌstɔ:md] v. 集中各人智慧猛攻( brainstorm的过去式和过去分词 )

caltrain [ ] [网络] 加州铁路；加州火车；加州湾区铁路

email ['i:meɪl] n. 电子信函 vt. 给…发电子邮件 n. (Email)人名；(法)埃马伊 {zk ielts :0}

emails ['iːmeɪl] n. 电子信函 vt. 给…发电子邮件 n. (Email)人名；(法)埃马伊

ght [ ] abbr. growth hormone treatment 生长激素治疗; Goodenough-Harris test 戈-哈试验; general hypoxemic test 普通血氧不足试验; glaucoma hemifield test 青光眼半视野试验

GitHub [ ] [网络] 源码托管；开源项目；控制工具

google [ ] 谷歌；谷歌搜索引擎

hacky ['hækɪ] n. 出租汽车司机

hyperparameter [ ] [网络] 超参数；分别有一个带有超参数

overfitting [ ] n. 过适；[数] 过度拟合 v. 过适（overfit现在分词）

rebalance [re'bæləns] 再平衡

resample [ri:'sæmpl] 重采样，重复取样

resampling [re'sɑ:mplɪŋ] 再取样，重采样，重取样

reweighting [ ] [网络] 权；重权

startup ['stɑ:tʌp] n. 启动；开办

startups [ ] n. 创业（startup的复数）；开办

youtube ['ju:tju:b] n. 视频网站（可以让用户免费上传、观赏、分享视频短片的热门视频共享网站）

词组
a clip [ ] [网络] 主叫线路识别显示

a hack [ ] [网络] 网络攻击

a tech [ ] [网络] 泓晟；科技；艾德贴标机

Amazon Mechanical Turk [ ] [网络] 亚马逊土耳其机器人；亚马逊人端运算平台；亚马逊的土耳其机器人网站

audio clip [ ] [网络] 音频剪辑；音频复制文件；音频剪切

audio data [ ] [网络] 语音数据；音频数据；声音数据

audio source [ ] [网络] 音频源；音源；音频文件

batch number [ ] un. 组号 [网络] 批号；批次号；生产批号

binary classification [ ] 二元分类

blog post [ ] [网络] 博客文章；博客帖子；部落格文章

chat with [ ] [网络] 和…聊天；与…闲聊；与某人聊天

classification problem [ ] 分类问题

clip out [ ] vt.剪下来

clip to [ ] vt.夹在...上

data synthesis [ ] 数据合成

detection problem [ ] 探测问题

detection system [ ] un. 检测系统；探测系统 [网络] 侦测系统；检测系统试剂；检测体系

duration of [ ] 持续时间

good algorithm [ ] un. 优质算法

halfway through [ ] [网络] 过半了；在某事过了一半时；没有译准

homework problem [ ] [网络] 作业性问题；家庭作业问题

homework problems [ ] [网络] 功课问题

horizontal axis [ˌhɔriˈzɔntəl ˈæksis] un. 横轴；水平轴 [网络] 横坐标；水平轴线；横轴线

in parallel [in ˈpærəlel] adv. 并联；并行 [网络] 并行的；平行；并联的

Isaac Asimov [ ] [网络] 阿西莫夫；艾西莫夫；艾萨克·阿西莫夫

learning algorithm [ ] [网络] 学习演算法；学习算法；学习机制

light bulb [lait bʌlb] n. 灯泡 [网络] 电灯泡；右下灯泡；白炽灯

Mechanical Turk [ ] 土耳其机器人

minus one [ ] [网络] 桃花源；幸福意外；谢谢你捧场

neural net [ ] adj. 神经网络的 [网络] 神经网络法；类神经网分析；神经法则

neural network [ˈnjuərəl ˈnetwə:k] n. 神经网络 [网络] 类神经网路；类神经网络；神经元网络

neural network architecture [ ] 《英汉医学词典》neural network architecture 神经网络构筑学

neural networks [ ] na. 【计】模拟脑神经元网络 [网络] 神经网络；类神经网路；神经网络系统

noisy data [ˈnɔizi ˈdeitə] [医]噪声数据

on the web [ ] [网络] 在互连网上；在互联网上；网站内容

out of whack [aut ɔv hwæk] na. 〈非正式〉坏了；不对头；(身体)不舒服；(机械)运转不正常 [网络] 紊乱；出故障；运行不正常

plus minus [ ] un. 正负 [网络] 加减；加减符；加减乐团

temporal data [ˈtempərəl ˈdeitə] [网络] 暂态数据；时态数据；时序资料

the algorithm [ ] [网络] 算法

the duration [ ] [网络] 持续时间；的期限；时值

the horizontal [ ] [网络] 中脑水平切面

the marketplace [ ] [网络] 市集；市场购物中心；市场价值

the vertical [ ] [网络] 垂直性

the web [ ] [网络] 网；网页；网络

to download [ ] 下载

ton of [ ] 大量,许多

trigger word [ˈtriɡə wə:d] [网络] 触发字；触发语；触发者

variance in [ ] ...的变化

vertical axis [ˈvə:tikəl ˈæksis] na. 【数】(直)立轴 [网络] 垂直轴；纵轴；竖轴

WEB BROWSER [web ˈbrauzə] n. Web浏览器 [网络] 网络浏览器；网页浏览器；网路浏览器

word detection [ ] [计] 字检测

zero detection [ ] 检“0”

zero zero [ˈziərəu ˈziərəu] 零

惯用语
all right
and then
and um
and you know
i don't know
i think
let's say
robert turn on
so um
train noise
turn on
turned off
you know

单词释义末尾数字为词频顺序
zk/中考 gk/高考 ky/考研 cet4/四级 cet6/六级 ielts/雅思 toefl/托福 gre/GRE
* 词汇量测试建议用 testyourvocab.com