Everyone should have one extra token, as a token (hahahahahahaha) of good will.
Remember that P1 is due at 2:20pm, and that HW04 is due next Tuesday before class.
28 September 2010
27 September 2010
Lecture 9: Dynamic programming and the CKY algorithm
We will begin by briefly attempting to recover from my confusing explanation of X-bar theory, and talk about specifiers, complements and adjuncts. We'll then finish up the DP-versus-NP distinction and IP-for-S, and then movement. We'll then get to the computational side...
Parsing as search
At a basic level, the parsing problem is to find a tree rooted at S (or IP...) that spans the entire sentence. We can think of this as a (bottom-up) search problem over partial hypotheses (e.g., hypotheses that span an initial segment of the sentence).
- At any given time, we've built a partial tree over the first n words
- Want to extend it to include the n+1st word
- A search step involves:
- Choose a rule from the grammar
- Choose existing constituents that match that rule
- Create a new constituent
Dynamic Programming
Search can be expensive because we end up re-creating the same constituents over and over again. This suggests dynamic programming. For the following, we will assume our grammar has been binarized, so rules always look like "A -> B C". Unary rules (e.g., "A -> B") can be handled as well, via a simple extension.
We will write sentences as "0 The 1 man 2 ate 3 a 4 sandwich 5" where the numbers indicate positions. A partial hypothesis has a label (e.g., "NP") and a span (e.g., "from 0 to 2"). We will write this as NP[0,2].
The key insight in dynamic programming is to store a chart. A chart is a two-dimensional array of (e.g.) sets. Chart[i,j] will store all possible constituents that can span the range from i to j.
Initialization:
- For each position i=1..|sentence|
- For each possible part of speech P for word i
- Insert P into Chart[i-1,i]
- (This means that we've found a way to cover i-1 to i using non-terminal P)
- For phrase lengths l=2..|sentence|
- For each start position i=0..|sentence|-l
- Set end position k=i+l
- For each split point j=i+1..k-1
- For each grammar rule X -> Y Z
- If we can find a Y in Chart[i,j] and a Z in Chart[j,k]:
- Add X to Chart[i,k] with backpointers, à la Viterbi
- (This means we've found a way to cover i to k using the non-terminal X)
Time complexity is O(|sentence|^3 * |Grammar|).
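To make the loop structure concrete, here is a minimal recognizer sketch in Python. It assumes the grammar is handed to us as two dictionaries (a lexicon from words to preterminals, and a map from right-hand-side pairs to left-hand sides); those names and that representation are just illustrative, not part of P1.

from collections import defaultdict

def cky_recognize(words, lexicon, binary_rules):
    """lexicon: word -> set of preterminals; binary_rules: (Y, Z) -> set of X for X -> Y Z."""
    n = len(words)
    chart = defaultdict(set)               # chart[i, k] = labels spanning positions i..k

    # Initialization: preterminals over single words.
    for i, w in enumerate(words):
        chart[i, i + 1] |= lexicon.get(w, set())

    # Build longer spans from shorter ones.
    for length in range(2, n + 1):         # phrase lengths l = 2..n
        for i in range(n - length + 1):    # start positions
            k = i + length                 # end position
            for j in range(i + 1, k):      # split points
                for Y in chart[i, j]:
                    for Z in chart[j, k]:
                        chart[i, k] |= binary_rules.get((Y, Z), set())

    return "S" in chart[0, n]

(In a real parser you would also store backpointers, as in Viterbi, so the tree can be read back out.)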
24 September 2010
23 September 2010
Lecture 8: Context Free Grammars
Context free grammars:
* are a strictly more powerful way of expressing languages than regular expressions
* generate what are called context-free languages
* many human languages are approximately context-free (there are always exceptions), while most are not regular
Context free grammars consist of:
* An alphabet (the "words" in the language)
* A set of non-terminals
* A "start" non-terminal
* A set of rewrite rules ("productions") for expanding a non-terminal into a sequence of other non-terminals and/or terminals
Simple context free grammar:
S -> NP VP
NP -> D N
VP -> V NP
D -> "the"
D -> "a"
N -> "man"
N -> "sandwich"
V -> "ate"
V -> "likes"
This is actually a finite language (what are the sentences in this language?).
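One way to answer that question is to just expand the grammar exhaustively; here is a quick throwaway sketch (it only terminates because the language above is finite):

from itertools import product

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D":  [["the"], ["a"]],
    "N":  [["man"], ["sandwich"]],
    "V":  [["ate"], ["likes"]],
}

def expand(symbol):
    """All word sequences derivable from symbol."""
    if symbol not in GRAMMAR:                 # terminal
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        for combo in product(*(expand(s) for s in rhs)):
            results.append([w for piece in combo for w in piece])
    return results

for sentence in expand("S"):
    print(" ".join(sentence))                 # 2*2*2*2*2 = 32 sentences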
Derivations are "proofs" that a sentence belongs to the language: a derivation gives a sequence of productions that leads from the start symbol to that sentence.
We'll usually draw trees as proofs, but other formats are more useful as text, for instance: (S (NP (D the) (N sandwich)) (VP (V ate) (NP (D the) (N man))))
In such grammars, we will often call the things just above the terminals (parts of speech) "pre-terminals".
We can make this grammar more fun by adding rules:
NP -> NP PP
VP -> VP PP
PP -> P NP
P -> "with"
Now the grammar generates an infinite language, including strings like "the man with the man with the sandwich with the man with the ..."
Note also that this grammar is now ambiguous. "the man likes the man with the sandwich" could either be (S (NP the man) (VP (V likes) (NP (NP the man) (PP (P with) (NP the sandwich))))) or (S (NP the man) (VP (VP (V likes) (NP the man)) (PP (P with) (NP the sandwich)))). The first one has the "correct" meaning, while the second would mean that he uses the sandwich to like the man.
A simple context free language that is not regular is:
S -> a S b
S -> a b
This describes the language { a^n b^n : n > 0 }: strings consisting of some number of a's followed by the same number of b's. You can show formally that this language is not regular, or you can observe that an FSM would need on the order of N states just to recognize the finite version of this language restricted to strings of length up to N.
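For intuition, here is a tiny sketch contrasting the two: checking a^n b^n is trivial with a counter (more memory than any finite-state machine has), while a regular expression can only check the shape, not that the counts match.

import re

def is_anbn(s):
    """Accept exactly a^n b^n with n > 0."""
    half = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s == "a" * half + "b" * half

shape_only = re.compile(r"^a+b+$")            # right shape, wrong language

for s in ["ab", "aabb", "aaabbb", "aab", "abab"]:
    print(s, is_anbn(s), bool(shape_only.match(s)))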
A popular approach for context free models of human language is "X-bar" theory. Here, we have base categories (X0 = N, V, A, P), then "projections" of these categories: X' (aka X-bar) and X'' (aka X-double-bar, aka XP). The general form of a phrase is (XP YP (X' X0 ZP)), where YP is the "specifier", X0 is the head, and ZP is the complement (argument). Only full projections (XPs) can fill the specifier, complement, or adjunct slots.
The idea is that the arguments fit in the "bar" slots, and the adjuncts fit in the "phrase" slots. Roughly, an argument is something that is sort-of necessary and an adjunct is sort-of optional. Usually you can have arbitrarily many adjuncts, but only a fixed number of arguments. For instance, in "the man ate a sandwich of turkey with lettuce on a plate with gusto at the restaurant", the verb "ate" has one argument (the sandwich) and two adjuncts ("with gusto" and "at the restaurant"), while "sandwich" has three adjuncts ("of turkey", "with lettuce", and "on a plate").
Here are some example sentences in X-bar style analyses:
- (PP (DP (D' (D so))) (P' (ADVP (ADV' (ADV completely))) (P' (P in) (NP (N' (DP (D' (D the))) (N wrong))))))
- (IP (NP Poirot) (I' (Infl will) (VP (V' (V abandon)) (NP (N' (DP (D' (D the))) (N investigation))))))
- Identical structure for the second half of "I did not expect Poirot to abandon the investigation."
- Under some theories, (IP (NP Poirot) (I' (Infl +ed) (VP (V abandon) (NP the investigation)))) is appropriate
(Warning: by now, lots of people no longer think NPs are really headed by nouns, but are rather headed by determiners, so that what we usually think of as an "NP" is really a "DP." That is, instead of "(NP (N' (DP (D' (D the))) (N man)))" you would have "(DP (D' (D the) (NP (N' (N man)))))". In this class, we'll always use NP and not DP, but you should know that this is slightly outdated.)
21 September 2010
HW3: you may ignore the part about Chomsky normal form
If you've already done it, or choose to do it anyway, I'll give you a tiny bit of extra credit. I thought we'd be a bit further along by now when I wrote that homework a few weeks ago. (You should still make a semi-valiant effort on the first part of the CFG question.)
16 September 2010
Lecture 6: Part of Speech Tagging
Woohoo, some linguistics!
Parts of speech are useful for several things:
- Knowing how to pronounce a word ("OBject" the noun vs. "obJECT" the verb)
- Knowing what words come around it (personal pronouns vs possessive pronouns)
- Knowing how it behaves syntactically -- most verbs behave the same and are "substitutable" from a syntactic perspective (though probably not from a semantic perspective)
Note: not all parts of speech exist in every language, and if a word is in one part of speech in language X, its translation isn't necessarily the same part of speech in language Y. (E.g., beautiful in English/Korean, or green in English/Japanese.)
Some major open classes:
- Nouns: things. Concrete (ship, chair), abstract (happiness), verb-like "gerunds" (running), or proper nouns ("Hal")
- Verbs: actions. Eat, drink, sleep; English has auxiliaries but these are usually considered closed class (see below)
- Adjectives: mostly modify nouns (blue, big, old)
- Adverbs: mostly modify verbs (slowly, very) but also other weird things (every, unfortunately, not)
Some closed class:
- Prepositions: the bunny ran ___ the hill
- Determiners: articles: a, an, the; others: this blog
- Pronouns: (personal) she, I, who, (possessive) my, her, (wh) what, how
- Conjunctions: and, but, either/or
- Auxiliary verbs: would, could, can, to -- but not the preposition "to"
- Particles: like prepositions, but go with verbs (put down, raise up, etc.)
- Numerals: one, two, three, first, second, third, etc.
Rule based approaches usually consist of:
- A small lexicon that covers some common open-class words and most common closed-class words
- A set of rules for disambiguation
- A set of heuristics for guessing the tag of unknown (open class) words
- Does it end in "ed"? "ly"? "ing"? "s"?
- Is it capitalized? (Doesn't apply to P1)
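Here is a minimal sketch of that recipe. The tag names, the tiny lexicon, and the suffix rules are made-up placeholders for illustration, not the actual P1 tag set or rules.

LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "he": "PRON", "she": "PRON", "it": "PRON",
    "and": "CONJ", "in": "PREP", "on": "PREP",
    "can": "AUX", "will": "AUX",
}

def guess_unknown(word):
    """Suffix heuristics for open-class words not in the lexicon."""
    if word.endswith("ly"):
        return "ADV"
    if word.endswith("ing") or word.endswith("ed"):
        return "VERB"
    if word.endswith("s"):
        return "NOUN"        # crude: plurals, but also 3rd-person verbs
    return "NOUN"            # default guess for open-class words

def tag(words):
    return [(w, LEXICON.get(w.lower(), guess_unknown(w.lower()))) for w in words]

print(tag("the dog barked loudly".split()))
# [('the', 'DET'), ('dog', 'NOUN'), ('barked', 'VERB'), ('loudly', 'ADV')]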
Rules get more complicated...
- Noun/Verb disambiguation is usually the biggest issue
- Is there a determiner or adjective to my left? How far?
- Am I at the end of a sentence?
- Is there an auxiliary to my left? An adverb?
- There's also Verb/Aux disambiguation
- "Verb Aux" is a rare order, but "Aux Verb" is common
- Is there a verb anywhere in the sentence?
- Noun/WH-complementizer disambiguation
- John saw that book
- John saw that book burn up
- Is there a verb to the right, nearby?
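Contextual rules like the ones above might be sketched as a second pass over an initial tagging; again, the tag names and the two rules below are illustrative assumptions, not a complete system:

def disambiguate_noun_verb(tagged):
    """Revisit NOUN/VERB decisions using the tag immediately to the left."""
    result = []
    for word, tag in tagged:
        prev_tag = result[-1][1] if result else None
        if tag == "VERB" and prev_tag in ("DET", "ADJ"):
            tag = "NOUN"     # "the run", "a quick run"
        if tag == "NOUN" and prev_tag == "AUX":
            tag = "VERB"     # "will run"
        result.append((word, tag))
    return result

print(disambiguate_noun_verb([("the", "DET"), ("run", "VERB")]))
print(disambiguate_noun_verb([("they", "PRON"), ("will", "AUX"), ("run", "NOUN")]))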
15 September 2010
Slightly updated notes.sty file
If you're annoyed with the "solution" environment in the notes.sty file that I've provided, please download the new one off the web page. It is much friendlier and will let you have better equations, line breaks, etc., inside solutions. (Thanks to your fellow student, Ke Wu, for the patch!)
14 September 2010
Lecture 5: N-gram models
Today we'll talk first about estimation of:
- Bernoulli random variables (coin flips)
- p(Heads) ~= Count(Heads) / Count(total flips)
- Multinomial random variables (die rolls)
- p(die comes up "4") ~= Count("4"s) / Count(total rolls)
- Now think about a giant die with |Vocabulary|-many sides!
We'll then run through the chain rule to break p(s) = p(w_1 w_2 ... w_N) = prod_n p(w_n | w_{1..n-1}) and then apply a Markov assumption to the terms in the product to arrive at the n-gram language model.
This language model will be a product of Multinomial random variables, so we can apply everything we know about estimation.
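As a concrete (toy-sized) illustration, here is a sketch of maximum likelihood bigram estimation with add-one smoothing; the corpus and the boundary markers are made up for the example.

from collections import Counter

corpus = [
    "<s> the man ate the sandwich </s>".split(),
    "<s> the man likes the man </s>".split(),
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((u, v) for sent in corpus for u, v in zip(sent, sent[1:]))
V = len(unigrams)                        # vocabulary size

def p_bigram(w, prev, alpha=1.0):
    """Add-alpha smoothed estimate of p(w | prev)."""
    return (bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * V)

def p_sentence(words):
    """Markov (bigram) approximation of p(w_1 ... w_N)."""
    prob = 1.0
    for prev, w in zip(words, words[1:]):
        prob *= p_bigram(w, prev)
    return prob

print(p_sentence("<s> the man ate the man </s>".split()))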
Finally, we'll talk about:
- Smoothing (adding fake counts)
- Structural zeros versus statistical zeros
(Warning: the treatment we give of out of vocabulary -- "OOV" -- terms will not be very detailed!)
Slight change to P1 "tagger" script
The old rules were that if you ran the online tagger at time X, you would have to wait until time X+24 hours to run it again.
To be nice to people who have work schedules, this has been "softened." The new policy is that every 24 hours you earn one "token." You spend tokens by running the tagger. However, you can only keep a total of three tokens at any one time.
The effect of this policy is that you could continue doing one handin every 24 hours, or you could do two every 48 hours, or three every 72 hours. But not more than that. So on average it's the same, but you get a bit more flexibility.
The script is a bit more complicated now (the logic is harder), so please let me know if you run into any bugs.
Also, if you run the handin and just enter your ID, but not a file to upload, it will tell you how many tokens you have left and, if it's less than three, the date/times that those tokens were used so you can figure out how long until you get a new one (you get a new one when the oldest one "dies" -- i.e., 72 hours pass since the oldest one).
Hopefully this is clear :).
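If it helps, the accounting can be pictured as a rolling 72-hour window; here is an illustrative sketch of the bookkeeping (not the actual handin script):

from datetime import datetime, timedelta

WINDOW = timedelta(hours=72)             # a spent token comes back 72 hours later
MAX_TOKENS = 3

def tokens_left(past_uses, now):
    """Three tokens, minus however many were spent in the last 72 hours."""
    recent = [t for t in past_uses if now - t < WINDOW]
    return MAX_TOKENS - len(recent)

uses = [datetime(2010, 9, 14, 10, 0), datetime(2010, 9, 15, 9, 0)]
print(tokens_left(uses, datetime(2010, 9, 16, 12, 0)))   # -> 1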
13 September 2010
HW03 is posted
(You'll need to look at it for the HMM part of P1.)
10 September 2010
Project matchmaking
If you need, or want, one or more partners for the first project (you're free to switch up between projects if you'd like), please email me now (before you forget!). We'll have a 5-10 minute matchmaking session at the end of class on Tuesday. When you email me, please give me your name, your department (CS, EE, Linguistics, etc.), and whether you already have a partial team (e.g., a team of two that wants one more) or are currently alone (i.e., need one or two people to make a team).
09 September 2010
Lecture 4: Probability and statistics
You should be sure that you are comfortable with the basic rules of probability:
- Conditional probabilities: what does p(a|b) mean?
- Marginalization: p(a) = sum_b p(a,b)
- Chain rule: p(a,b) = p(a) p(b|a) = p(b) p(a|b)
- Iterated chain rule: p(a,b,c,d,...,z) = p(a) p(b|a) p(c|a,b) ... p(z|a,b,c,...,y)
- Bayes' rule: p(a|b) = p(a) p(b|a) / p(b)
- Generalized Bayes' rule: how to do BR on p(a|b,c) or p(a,b|c)
- Expectations: E_p[f(x)] = sum_x p(x) f(x)
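If you want to sanity-check those identities, here is a throwaway sketch that verifies the chain rule and Bayes' rule numerically on a made-up joint distribution over two binary variables:

p_joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def p_a(a):                                  # marginalization: p(a) = sum_b p(a,b)
    return sum(p for (x, _), p in p_joint.items() if x == a)

def p_b(b):
    return sum(p for (_, y), p in p_joint.items() if y == b)

def p_b_given_a(b, a):                       # p(b|a) = p(a,b) / p(a)
    return p_joint[(a, b)] / p_a(a)

def p_a_given_b(a, b):
    return p_joint[(a, b)] / p_b(b)

a, b = 1, 0
assert abs(p_joint[(a, b)] - p_a(a) * p_b_given_a(b, a)) < 1e-12              # chain rule
assert abs(p_a_given_b(a, b) - p_a(a) * p_b_given_a(b, a) / p_b(b)) < 1e-12   # Bayes' rule
print("chain rule and Bayes' rule check out")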
You should also be comfortable with some basic statistics:
- Maximum likelihood estimates
- = relative frequencies
- = counting and normalizing
- The difference between
- Structural zeros
- Statistical zeros
- The importance of smoothing
07 September 2010
Project 1 is posted, due 30 Sept
Use this space to ask questions / discuss the project.
[Update 14 Sep] Several people have asked what tools/data are allowed for the rule-based tagger and for the "do as well as you can" part. Here is an attempt to clarify. My goal isn't to make life hard on you, but to not give unfair advantages to some subsets of the class.
- For the rule-based tagger, any rules are fair game, but it would be better if you made your tagger WITHOUT looking at the labeled training data. That is, feel free to look at the sentences, but pretend that you don't have access to the tags. After all, that's how it would be in the real world. Yes, this means that you'll have to use some linguistics. That's not a terrible thing :).
- For the last part of the project, where you have to build the best tagger possible, people have asked about using external tools. For this project, I would ask that anything you use beyond normal library / unix tools stuff, you implement yourself. Yes, you could probably do better using well-engineered tools, but that probably won't help you learn as much as you will if you do something slightly less good, but do it on your own.
- (Note that this final rule will not be the case for the final class project: there, anything is fair game!)
06 September 2010
How to get a little extra credit...
Students always ask me about extra credit opportunities at the end of the semester. This is your opportunity to be pro-active. The CL lab here has a regular colloquium series, which has a shared calendar here. Roughly every other week we have an external speaker coming through. The first few are Roger Levy (psycho/cognitive computational linguistics from UCSD), William Cohen (machine learning and information extraction, CMU) and Eugene Charniak (parsing and everything else NLP, Brown). (Eugene is actually my great-grand-advisor!). I highly encourage all of these talks, and they should all be quite different. We'll have more folks posted on the schedule as the semester moves on. At any rate, if you go to the talk, and write a "response" that's approximately one page long, I'll give you some extra credit. You'll get (up to) 3 points (one homework) for the first one you do, (up to) 2 points for the second and third, and 1 point for the fourth and fifth. That's 9 points of possible extra credit. You won't get further credit for more than five, but of course I encourage you to keep going to the seminars :).
Lecture 3: Regular languages and finite state machines
A language, in a formal sense, is simply a set of strings. Languages can be defined explicitly, as in L = { the , a , an }, which is a language containing three strings, or implicitly, as in L = all valid English sentences (to the extent that the latter is well defined).
Languages can be finite or infinite. The first example is finite. The second is (probably) infinite, for instance:
- Bob watched Alice eat the apple.
- Claire watched Bob watch Alice eat the apple.
- Doug watched Claire watch Bob watch Alice eat the apple.
- Elif watched Doug watch Claire watch Bob watch Alice eat the apple.
- ... and so on
A simpler way to get an infinite language is as follows:
- Bob is happy.
- Bob is very happy.
- Bob is very very happy.
- Bob is very very very happy.
- ... and so on
Regular expressions are built from pieces like the following (a short usage example follows the list):
- /a/ means the string "a"
- /ab/ means match "a" followed by "b"
- /a*/ means a sequence of zero or more "a"s
- /a?/ means an optional "a"
- /a|b/ means "a" or "b"
- /^a/ means match "a" but only at the beginning of a string
- /a$/ means match "a" only at the end
- /a(b|c)/ means match "a" followed by one of "b" or "c"
- /a+/ means /aa*/ (i.e., one or more "a"s) -- note that "grep" doesn't support this: you have to use "egrep"
- /./ means any single character
- /[abcde]/ means /a|b|c|d|e/
- /[a-z]/ means /a|b|c|...|y|z/
- If you need to match a literal "." or "*" or "+", put a "\" before it.
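And a quick usage example in Python's re module (the pattern and test strings are made up):

import re

pattern = re.compile(r"^a+(b|c)$")        # one or more "a"s, then "b" or "c"

for s in ["ab", "aaac", "abc", "b"]:
    print(s, bool(pattern.match(s)))
# ab -> True, aaac -> True, abc -> False, b -> False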
One way of defining a "regular language" is "any language that can be described by a regular expression."
Note: regular languages are closed under: concatenation, intersection, disjunction, Kleene closure.
Another way to approach languages is through "machines" or "automata." A finite state automaton (FSA) is defined by:
- A (finite) set of states
- A set of labeled directed edges ("transitions") connecting states. The labels are from the alphabet of our language.
- A set of initial states (usually we assume just one)
- A set of final states (usually we assume just one)
You can think of an FSA as a machine that jumps around states, and emits the "letters" on the edges as it goes. It can only stop in a final state. This is the generative view.
The alternative view is that you give an FSA a string, and it tries to follow the corresponding edges until it reaches a final state. If it can reach a final state, it "accepts" the string. If it cannot, it "rejects" it. (Note that if one path leads to acceptance and another to rejection, the FSA accepts.)
What we defined above is a non-deterministic FSA. A deterministic FSA is one in which two outgoing edges from a single state never have the same label.
An additional augmentation is to allow epsilon-transitions, namely, transitions that don't emit any words.
It's not too hard to verify that the set of languages that can be defined by an FSA is exactly the set of regular languages. Moreover, adding epsilon transitions and non-determinism don't give you any extra power.
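To make the acceptance view concrete, here is a small sketch of a non-deterministic (epsilon-free) FSA simulator; the particular machine encoded below, which accepts "a"s followed by at least one "b", is just an illustration:

def accepts(transitions, start, finals, string):
    """Track the set of reachable states; accept if a final state is reachable at the end."""
    states = {start}
    for symbol in string:
        states = {dst for s in states for dst in transitions.get((s, symbol), set())}
        if not states:                     # no live path left
            return False
    return bool(states & finals)

# A machine for a* b+ : loop in state 0 on "a", move to and loop in state 1 on "b".
transitions = {(0, "a"): {0}, (0, "b"): {1}, (1, "b"): {1}}

for s in ["b", "aab", "abbb", "aba", "aa"]:
    print(s, accepts(transitions, start=0, finals={1}, string=s))
# b, aab, abbb -> True; aba, aa -> False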
FSAs are very powerful despite being so simple. They are useful (at least) for the following tasks:
- Grammars of person names
- Times, dates, emails, addresses, etc.
- Morphological analysis / segmentation
- (By contrast, phenomena that require unbounded memory are beyond finite state power, e.g. center embedding -- "The mouse the cat chased died." -- and recursion of arbitrary depth -- "The man who [some sentence] is nice.")
Morphological analysis maps a surface form to a stem plus features, e.g.:
- Cats -> cat +N +PL
- Cat -> cat +N +SG
- Cities -> city +N +PL
- Merging -> merge +V +present-participle
- Caught -> catch +V +past-participle
Turkish is a classic example of very rich morphology; for instance, forms built on the verb "sleep":
- uyuyorum I am sleeping
- uyuyorsun you are sleeping
- uyuyor he/she/it is sleeping
- uyuyoruz we are sleeping
- uyuyorsunuz you are sleeping
- uyuyorlar they are sleeping
- uyuduk we slept
- uyudukça as long as (somebody) sleeps
- uyumalıyız we must sleep
- uyumadan without sleeping
- uyuman your sleeping
- uyurken while (somebody) is sleeping
- uyuyunca when (somebody) sleeps
- uyutmak to cause somebody to sleep
- uyutturmak to cause (somebody) to cause (another) to sleep
- uyutturtturmak to cause (somebody) to cause (some other) to cause (yet another) to sleep
- ... and so on
Finite state transducers (FSTs) extend FSAs by attaching an output to each transition; they are useful for:
- Recognizing pairs of strings
- Mapping input string to (sets of) output strings
- Mapping output strings to (sets of) input strings
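As a toy illustration of the string-to-string mapping idea, here is a sketch of a non-deterministic transducer simulator with a two-word lexicon ("cat"/"cats"); the state numbering and output format are invented for the example.

def transduce(arcs, start, finals, inp):
    """Return the set of output strings an (epsilon-free) FST produces for inp."""
    configs = {(start, "")}                        # (state, output so far)
    for symbol in inp:
        configs = {(dst, out + emit)
                   for (state, out) in configs
                   for (dst, emit) in arcs.get((state, symbol), set())}
    return {out for (state, out) in configs if state in finals}

arcs = {
    (0, "c"): {(1, "c")},
    (1, "a"): {(2, "a")},
    # Non-deterministic guess at the analysis: singular or plural.
    (2, "t"): {(3, "t +N +SG"), (4, "t +N +PL")},
    (4, "s"): {(5, "")},                           # the plural "s" adds nothing more
}
print(transduce(arcs, start=0, finals={3, 5}, inp="cat"))    # {'cat +N +SG'}
print(transduce(arcs, start=0, finals={3, 5}, inp="cats"))   # {'cat +N +PL'}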
Don't discount this approach for being too simple, either... people have done the following with finite state transducers:
- Transliteration
- Summarization
- Translation
- Question answering
- Dialogue agents
- Parsing (needs some extensions)
- ... pretty much everything :)
02 September 2010
Python Tutorial
For those of you who'd like to learn python, I've used the following tutorial for previous classes I've taught, and I think it's pretty good (I didn't write it: I stole most of it from some friends at Berkeley): http://www.cs.utah.edu/~hal/courses/2010S_AI/p0/p0.html
01 September 2010
HW01 is posted
(Reminder: if you need an extension for Rosh Hashanah, please let me know by this coming Tuesday, the 7th.)
Lecture 2: NLP Applications and Some History
There are loads of applications of NLP technology; we'll talk about some in more detail at the tail end of the course.
One of the first applications was translation
- Perhaps one of the first CS problems
- Wanted to translate Russian to English during the cold war
- Primary trend: think of translation as a problem of code breaking
- This eventually led to information theory
- Claude Shannon's noisy channel model
- Shannon's notion of entropy and the Shannon game
In the 60s, there was a split into the AI camp (which was primarily symbolic) and the statistical camp (which was mostly EE folks working on speech or OCR).
In 1966, the ALPAC report killed much of the funding for machine translation (and related computational linguistics) research in the US. This, together with Chomsky's anti-statistics views (1959), led to a decade in which a lot of this work was not mainstream.
In the 70s and early 80s, the IBM folks (statisticians) pushed statistical speech recognition using hidden Markov models. Logic arose as a language for talking about grammar (e.g., LFG, which we'll see later). NLU came about, beginning with a blocks world ("put the red block next to the green one") and then later the Yale school (Schank and colleagues) with conceptual knowledge. This was also when lots of work was being done on discourse.
In the late 80s and early 90s, finite state methods and statistics made a comeback, led largely by the translation efforts at IBM, and the work on morphology and phonology by Kaplan and Kay, and syntax by Church. This was the beginning of the return of empiricism.
Nowadays we have tons of data, empiricism is at its peak (maybe???), and there's a lot you can do when you have lots of data. We'll see some examples in class for several of the linguistic phenomena we talked about last time.
Other big applications:
- Automatically summarizing documents, or collections of documents
- Web search (only some people consider this true NLP)
- Question answering
- Dialogue systems
- Speech recognition
- Natural language generation
- Sentiment/opinion analysis
- ... lots more ...
HW00
Sorry, meant to post this earlier. There will usually be an announcement on the blog when each homework and project is posted. This gives you a spot to ask questions about them.
Since I didn't post this earlier, I got the following question by email, which I'll answer here:
"I was looking at HW00 and I am unsure what you meant by 'clause structure ambiguity'?" Clause structure ambiguity is syntactic ambiguity: the "Structural ambiguity" part from the Lecture 1 post, or the "woman on the hill with the telescope" example from class.
As an aside, if you have difficulty submitting HW00 (i.e., the system doesn't work), please email me. HW00 is the only homework we'll accept late, but only because of bugs on my end, not because of bugs on your end :). But if you weren't registered for the class when I created the database for the handin script, then it'll probably reject you. So just email me with your uID and name, and I'll add you manually.