28 October 2010

Lecture 16: Computational Semantics

A recent development in NLP-land is the interest in "semantic role labeling."  I tend to think of this as "Parsing++", but it's basically a halfway point between "syntax" (which we've already passed a bit) and "semantics" (which we don't know how to do).

The idea in SRL is to find predicates (mostly verbs) and identify the arguments of these verbs (e.g., Agents, Patients, Experiencers, Themes, etc.).  For example:
  • John ate a sandwich
    • For the predicate ate, John is the agent and sandwich is the patient
  • A sandwich was eaten by John
    • Same as above.
  • Bill saw John eat a sandwich
    • For the predicate eat, it's the same as above
    • For the predicate saw, the agent is Bill
There were basically two initial attempts to annotate data with semantic roles: PropBank and FrameNet.  The biggest difference between the two is that FrameNet used a consistent labeling of arguments across verbs, whereas PropBank just called things "Arg0", "Arg1", ... "ArgN", where "Arg0" for one verb might have nothing to do with "Arg0" for another verb.

The general approach to doing semantic role labeling is a pipeline of "first parse, second find verbs, third find the arguments to those verbs."  (There's some argument as to whether parsing is strictly necessary, but it's clearly at least helpful.)

The biggest challenge here is finding the arguments.  Basically we'll go over the parse tree, find potential arguments (basically internal nodes in the tree), and then try to use statistical information to figure out which nodes correspond to which arguments (or none).
  • John saw a man with a telescope.
For the "John was using the telescope" interpretation, the predicate saw has an Agent (John), a Patient (man) and a Means (telescope).

First, we'll identify all potential argument candidates:
  • John
  • a
  • man
  • a man
  • with
  • a
  • telescope
  • a telescope
  • with a telescope
We might initially filter some of these out based on label (e.g., DTs are unlikely to be arguments -- see the sketch after this list).  Maybe this leaves us with:
  • John
  • man
  • a man
  • telescope
  • a telescope
  • with a telescope
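
Concretely, here is a minimal sketch (not from the lecture) of this candidate extraction using an NLTK-style constituency tree; the set of labels to skip is just an illustrative filtering choice.

from nltk import Tree

tree = Tree.fromstring(
    "(S (NP John) (VP (V saw) (NP (DT a) (NN man)) "
    "(PP (P with) (NP (DT a) (NN telescope)))))")

SKIP = {"S", "VP", "V", "DT", "P"}   # the clause, the predicate, and unlikely-argument labels

candidates = []
for subtree in tree.subtrees():
    if subtree.label() in SKIP:
        continue
    candidates.append((subtree.label(), " ".join(subtree.leaves())))

print(candidates)
# [('NP', 'John'), ('NP', 'a man'), ('NN', 'man'),
#  ('PP', 'with a telescope'), ('NP', 'a telescope'), ('NN', 'telescope')]
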
Now, we need to figure out which (if any) of these maps to one of the given semantic roles.  A baseline approach is to estimate something like P("telescope" is the means | verb is saw), but this is very brittle because it depends on "bilexical dependencies".

A saner approach is to instead (or in addition) use tree paths.  This idea has come up in a lot of places, and often works pretty well.  If we look at the parse of this sentence:
  • (S (NP John) (VP (V saw) (NP (DT a) (NN man)) (PP (P with) (NP (DT a) (NN telescope)))))
Then we can ask ourselves how we can get from the "saw" node to the "telescope" node.  In this case, we go up to the VP, then down to the PP, then down to the NP, then down to the NN.

There are lots of ways to represent this path, but a baseline might be to say that the path looks like:
  • up:VP  down:3:PP  down:2:NP  down:2:NN
Here, the numbers represent the child number.  (Perhaps we'd want those on the "up" as well so we can remember where we were.  Alternatively, we might encode up-left and up-right, since order matters a lot in English.)
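
Here is a minimal sketch of computing exactly this path from an NLTK-style tree (the helper names are my own): walk up from the predicate to the lowest common ancestor, then down to the candidate, recording labels and child numbers.

from nltk import Tree

tree = Tree.fromstring(
    "(S (NP John) (VP (V saw) (NP (DT a) (NN man)) "
    "(PP (P with) (NP (DT a) (NN telescope)))))")

def node_position(tree, label, word):
    # Position of the preterminal with the given label dominating the given word.
    for pos in tree.treepositions("leaves"):
        if tree[pos] == word and tree[pos[:-1]].label() == label:
            return pos[:-1]

def tree_path(tree, src, dst):
    common = 0                    # length of the shared prefix = lowest common ancestor
    while common < min(len(src), len(dst)) and src[common] == dst[common]:
        common += 1
    steps = []
    for i in range(len(src) - 1, common - 1, -1):       # walk up from the predicate
        steps.append("up:" + tree[src[:i]].label())
    for i in range(common + 1, len(dst) + 1):           # walk down to the candidate
        steps.append("down:%d:%s" % (dst[i - 1] + 1, tree[dst[:i]].label()))
    return "  ".join(steps)

src = node_position(tree, "V", "saw")
dst = node_position(tree, "NN", "telescope")
print(tree_path(tree, src, dst))    # up:VP  down:3:PP  down:2:NP  down:2:NN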

However we encode the path, we now need to estimate something like P(means | path).  A naive approach would be to Bayes' rule this to get P(means) * P(path | means) and then we could model the path with a Markov model.  Or, if you've taken machine learning, you can use the path as a collection of features in some learning algorithm (or if you're being fancy, turn it into a kernel!).  We'll talk about this more when we talk about learning.
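
As a toy illustration of the count-and-normalize version of this estimate (the training pairs below are entirely made up), one can simply count how often each path co-occurs with each role:

from collections import Counter, defaultdict

# Hypothetical (path, role) pairs extracted from role-annotated parses.
training = [
    ("up:VP  down:3:PP  down:2:NP  down:2:NN", "Means"),
    ("up:VP  down:2:NP  down:2:NN",            "Patient"),
    ("up:VP  up:S  down:1:NP",                 "Agent"),
    ("up:VP  down:3:PP  down:2:NP  down:2:NN", "Means"),
    ("up:VP  down:3:PP  down:2:NP  down:2:NN", "Patient"),
]

counts = defaultdict(Counter)
for path, role in training:
    counts[path][role] += 1

def p_role_given_path(role, path):
    total = sum(counts[path].values())
    return counts[path][role] / total if total else 0.0

print(p_role_given_path("Means", "up:VP  down:3:PP  down:2:NP  down:2:NN"))
# 0.666... (2 of the 3 training items with this path are Means)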

As a final note, Matthew Gerber and Joyce Chai got best paper at ACL 2010 for their paper Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates, where they look at similar things for nominal predicates rather than verbal predicates.  In this case, I agree: this is a great paper.

26 October 2010

Midterm is Posted

here...

Lecture 15: Computational Lexical Semantics

Today's topic is word sense disambiguation.  This is a very "heated" topic and we'll discuss in class (not on the public blog!) why this might be!

Here are some examples of the verb "drive"... which are different senses?
  1. Can you drive this four-wheel truck?
  2. We drove to the university every morning.
  3. She drove me to school every day.
  4. He drives me mad.
  5. She is driven by her passion.
  6. They drove back the invaders.
  7. She finally drove him to change jobs.
  8. He drove a nail into the wall.
  9. The player drove the ball far out into the field.
  10. She is driving away at her doctoral thesis.
  11. What are you driving at?
  12. My new truck drives well.
  13. He drives for the taxi company in DC.
  14. The car drove around the corner.
  15. The farmers drove the cows into the barn.
  16. We drive the beltway to work.
  17. He drove a golf ball across the street.
  18. I hit the ball with a bat and drove it into left field.
  19. The workers drove a tunnel through the hill.
  20. These engines are driven by steam.
  21. The hunter drove the forest looking for deer.
  22. After finding deer, he drove it into open ground.
We'll talk about this example after you've clustered these examples into senses.

Two of the most important concepts in WSD are:
  1. One sense per discourse (which may or may not be true...)
  2. One sense per collocation (which also may or may not be true...)
The Yarowsky algorithm is an "unsupervised" algorithm (i.e., doesn't really require people to label data) for doing WSD that has inspired a ton of other work in WSD as well as other areas (including some of my own!!!).  Wikipedia has a good description of the algorithm if you don't like the one in the book; we'll talk about how and why it works in class.
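
Here is a heavily simplified sketch of the Yarowsky-style bootstrapping loop (seed collocations, label what you can, re-learn collocation rules from the confident labels, repeat).  The contexts, seeds, and threshold are all made up, and a real implementation would rank rules by log-likelihood, filter stopwords, and use one-sense-per-discourse.

from collections import Counter, defaultdict

# Hypothetical contexts for the ambiguous verb "drive".
contexts = [
    "can you drive this four wheel truck".split(),
    "my new truck drives well".split(),
    "he drove a nail into the wall".split(),
    "the batter drove the ball into the wall".split(),
    "the player drove the ball far out into the field".split(),
    "she drove me to school in her truck".split(),
]

rules = {"truck": "vehicle", "nail": "hit"}   # seed collocations
labels = {}

for iteration in range(5):
    # 1. Label every context we can with the current decision list.
    for i, ctx in enumerate(contexts):
        votes = Counter(rules[w] for w in ctx if w in rules)
        if votes:
            labels[i] = votes.most_common(1)[0][0]

    # 2. Re-learn the rules: a word becomes a collocation rule for a sense
    #    if it (almost) always co-occurs with that sense in the labeled data.
    cooc = defaultdict(Counter)
    for i, sense in labels.items():
        for w in contexts[i]:
            cooc[w][sense] += 1
    for w, senses in cooc.items():
        sense, count = senses.most_common(1)[0]
        if count / sum(senses.values()) > 0.9:
            rules[w] = sense

print(labels)   # which contexts ended up with which sense
print(rules)    # the learned decision list (seeds plus bootstrapped collocations)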

21 October 2010

HW07 is posted

(I'm trying to get better about posting these early, like in the beginning of the semester when I wasn't as behind schedule on everything!)

Lecture 14: Linguistic Challenges

Final project details:
  • Groups, just like normal projects.
  • Scope: something that with an extra month of work could turn into a paper at a major conference (ACL, EMNLP, NAACL, etc....)
  • Four styles of projects:
    • Reimplement some cool, interesting system from the past 2 or 3 years in an area of NLP/CL that you think is interesting.  (+1 month would be to figure out what it does wrong, propose some new idea, and then get a paper out of it.)
    • Take some problem you care about from your own research and look at how NLP can help it.  (You're a better judge than I am about turning this into a paper, but at least you care about the topic!)
    • Pick some interesting and under-studied linguistic phenomenon (like the things we're talking about today) and do a study on it.  Things not in English are totally allowed, but it would help if you knew the language you were working with.  If you want to do data collection using Mechanical Turk, I have a budget of $200 for this: we can see how many people want to do this and then try to divide fairly (note: fairly is not the same as evenly).
    • Something else you can convince me is interesting...
  • You will be graded entirely on your writeup, which should look more or less like a 4-6 page ACL workshop paper (you're encouraged to use the ACL style files).
Here are some examples of linguistic phenomena that I really don't have much idea how to handle.  After are some additional project ideas....
  1. Metaphor.  This can be narrowly construed ("John devoured his steak like a lion") or broadly construed ("John drove Mary crazy").  The usual theory is that there are concrete, physical verbs/nouns; and then when we want to talk about something abstract, we ground it in one of these physical verbs.  So a metaphor will usually consist of one abstract thing and one physical thing.  There are interesting things to be done both on interpreting metaphors as well as generating metaphors.  I have a small amount of data.
  2. Quantifier degree.  I can say "Most men are jerks" and I can say "Most jerks drink beer" but does this mean that most men drink beer (or rather, that I think that most men drink beer?).  What about other quantifiers?  How consistent is this across different statements ("Most men are jerks" versus "Most animals are invertebrates" -- is the meaning of "most" the same)?  I don't have any data, but you could collect it.
  3. Metonymy.  This is when I say "The White House" and really mean "Obama" (or at least I mean "someone from the White House").  This combines with ideas of selectional preferences, where we might be able to figure out that the agent (or subject) of "said" should be animate, and might know that the White House is not animate, and so it must be metonymic.  Can we identify metonymic usages?  Can we resolve them?  I have a reasonable amount of data.
  4. Quantifier scope.  This is a classic semantics problem.  "All the women on the island have a husband."  Does this mean that each woman has her own husband, or that there's one man who is married to all of the women simultaneously?  Similarly, "Each man entered the bar with a dog" -- how many dogs are there?  I have no data.
  5. Negation scope.  This is perhaps particularly interesting in the case of sentiment analysis.  "I did not like the plot but liked the acting."  -- Does "not" scope over just "like the plot" or over the full "like the plot but liked the acting"?  There is also an interaction with quantifiers ("No man should marry a leprechaun"), but this is both rare and hard.  I have no data, but I think we could find some.
  6. Language change.  We have English documents going back 500 years, but English back then looks a lot different than English now.  Moreover, our part of speech taggers and parsers that work well now won't work well on Shakespeare (I would guess :P).  Can we track changes back in time, and use this information to get today's models to do a good job way back when?  I have a lot of data.
  7. Traces and Ellipsis.  "John went to the store but Jerry didn't ______" -- what was it that Jerry didn't do?  This is a form of a trace.  The Penn Treebank comes with some but not all traces spelled out.  Try to figure out where they should be and what they refer to.
  8. Exaggeration.  No one has ever looked at this problem.  Okay that's not true, but only because it's an overstatement.  Can we automatically identify overstatements in text?  Can we tone them down or at least identify the degree of overstatement?  I have no data, but it shouldn't be too hard to find some.
  9. Emotion from text.  Can we identify characters' emotions by looking at what they say, for instance in novels?  I have data of quoted speech, paired with emotional strength -- can we predict emotional strength given the quoted speech and perhaps the context in which it was spoken?  I have lots of data.
  10. Language and X.  Often we can get language paired with objects in another medium, for instance images and captions.  Can we identify correspondences?  Or perhaps generate captions from images?  We have (or can get) similar data for comments and code, or formal specifications (variant of first order logic) and descriptions.  Or database queries and text.  I have some data, depending on what you want "X" to be.
  11. Shannon game (word/character prediction) with domain knowledge.... The Shannon game is a way of measuring entropy of a language.  Perhaps make an online game (like hangman?) where we can get people to play this game, and then can also tell them things like "this is a news story" or "this is a children's book" or "this is a medical text" or something and see how it affects their ability to guess the next word.
  12. "Translating" from tweets to language that your parents could understand.  We have lots of SMS/twitter data, and lots of normal text.  A lot of this is abbreviation expansion, but some is not.  Can we figure out which terms in tweets should be expanded, and then try to find their expansions in normal text?  There are at least two types ("something" -> "sth" or "laugh out loud" -> "lol"), but maybe more.  We can get lots of data easily.
  13. Identifying character relationships in literature.  We can track character interactions in literature, but the relationship between these characters is not known.  And perhaps it changes over time.  Can we figure out how two characters in a novel feel about each other on the basis of what they do/say to each other? I have some data and we can get a lot more easily.
  14. Many other languages have phenomena that English does not.  For example, subject dropping happens in lots of languages (e.g., Japanese) when it's clear from context.  Also stuff in the Semantic Analysis chapter of J+M is pretty much up for grabs.
I'm sure there are lots of things I'm not thinking of that would be fun/interesting to do.  This is your opportunity to be creative and choose something that just plain sounds cool.

    19 October 2010

    Lecture 13: Interpretation as Abduction

    "Interpretation as Abduction" is one of the more infuential computational semantics papers from its era (and really, ever).

    The basic claim is that through (weighted) abduction, one can explain a whole host of complex linguistic phenomena:
    • Reference resolution ("John_1 saw himself_1 in the mirror" -- where "John" and "himself" co-refer)
    • Interpretation of compound nominals ("computer science" versus "cs department")
    • Syntactic ambiguity (e.g., PP attachment)
    • Metonymy ("The White House announced that..." really means that someone in the White House announced something)
    • Etc....
    In modern terms, if you're familiar with probabilistic modeling, interpretation as abduction can be thought of as computing most probable explanations in graphical models / databases.  (If that means nothing to you, that's fine, too.)

    Abduction is the type of inference we try to do in science.  We observe something, and we try to posit what caused this to happen.  For instance, we may observe that apples fall from trees.  We can posit this is due to gravity.  Or we can posit that this is due to little elves that hang out in apples and little pots of gold in the earth and the elves are trying to get to the pots of gold.  One of these is a more plausible explanation than the other, since it requires us to invent fewer things.  (Though as far as I know, no one really knows how gravity works... so maybe it's a bad example... maybe it is elves/gold.)  This is often referred to as Occam's Razor: of all explanations, choose the "simplest."

    The key idea in using abduction to do sentence interpretation is:
    1. Prove the logical form of the sentence
      1. together with the constraints that predicates impose on their arguments (e.g., the subject of "run" must be animate)
      2. allowing for coercions (e.g., allowing "the White House" to stand in for someone who works at the White House)
    2. Merge redundancies where possible
    3. Make assumptions where necessary
    The last point is the key point: anything we have to assume in order to prove the sentence is the new information conveyed in the sentence.

    In weighted abduction, we assume that there is a cost for coercions and making assumptions, and that we want to find the set of things we can assume that have the least possible cost.

    The idea is that our database might contain entries like:
    • If P and Q then R
    We will add weights to these to get something like:
    • If P / w1 and Q / w2 then R
    The semantics is that if it costs $c to assume R, then it costs $w1*c to assume P and $w2*c to assume Q.

    If we have redundancy... e.g., something like "blah blah q(x) and q(y) blah blah" and at some point we derive x=y, then we can merge this to just "blah blah q(x) blah blah" and the cost of q(x) is the minimum of the costs of q(x) and q(y).

    For example, we might have:
    • (forall x) car(x) / 0.8  AND  notop(x) / 0.4 IMPLIES convertible(x)
    The weights here say that if we know something is a car, it costs less to assume it's a convertible than it would if we had only known it had no top.
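
    A toy sketch of this cost computation (my own construction, not from the paper): known facts are free, assuming a literal outright costs the full price, and a rule lets you instead pay the weighted costs of its antecedents.

# Rules map a consequent to (antecedent, weight) pairs; facts are free.
rules = {
    "convertible": [("car", 0.8), ("notop", 0.4)],
}
facts = {"car"}

def cost(goal, base=1.0, depth=0):
    if goal in facts:
        return 0.0                                   # already known: free
    best = base                                      # option 1: assume it outright
    if goal in rules and depth < 10:                 # option 2: back-chain through a rule
        via_rule = sum(cost(a, w * base, depth + 1) for a, w in rules[goal])
        best = min(best, via_rule)
    return best

print(cost("convertible"))   # 0.4 -- car is known, so we only assume notop at 0.4 * 1.0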

    One kind of cool outcome of this is that you can make "reverse" inferences fairly easily.  For instance, maybe I want to be able to assume that sometimes when people talk about animals, they're actually talking about cats.  But they're being vague.  I can just add a rule:
    • (forall x) animal(x)  AND  etc(x)  IMPLIES  cat(x)
    Here, we're saying that if you're an animal, and if certain other unspecified things ("etc") happen, then you're a cat.  Without the etc(), this is obviously false.  But with the etc(), we can just add weights and allow ourselves to interpret sentences like "I sensed an animal in the shadows.  It meowed.", where I need to coerce an animal into a cat.

    It turns out (see section 6 in the paper) that you can actually handle all of syntax, semantics and pragmatics in this single framework!

    15 October 2010

    P2 deadline and ML midterm

    I've been informed that there's a midterm in Lise's ML class on the 19th, which coincides with the P2 deadline.

    To try to help you not lose too much sleep, I've pushed the deadline for P2 back to Friday night at 10p.  However, to be fair to those who don't have ML, or who have planned their schedule around the deadline on the 19th, I'll give you 15% extra credit if you hand it in by the original deadline (19th).  If you do hand it in early, please make a note in your writeup that you've done so.  Hopefully 15% is enough to balance out anything that you would lose by not being able to compete up to the last minute with your classmates on the competition parts.

    Also, a note on pruning:

    This is now in the .pdf writeup, but:
    • Be sure that you prune all cells, including the right-most ones.  Your labeled F measure for unpruned should be about 78%.
    • Don't forget to debinarize before you evaluate!
    • If you prune every cell, including the root, with K=2, your labeled F measure should be about 23%.  You should have about 83 unparsable sentences.
    • If you prune every cell except the root (which is a totally sensible thing to do -- thanks to several of you who have pointed this out), your labeled F should be about 59%, and you should have about 38 unparsable sentences.  (As you can see, it's a really good idea not to prune the root, especially since it doesn't make things any slower -- in fact, it might actually make things faster!)
    • You can implement it either of the above two ways -- just be sure to document which you're doing.
    • Be sure to prune before applying unary rules.  Thus, even with K=2, you will often end up with more than 2 items in a chart cell.  However, only 2 of these will have binary derivations; all the rest should have unary derivations.
    As I said in class, the danger with me trying to give you hints is that often you guys end up doing a better job implementing this stuff than I do!  But I'm pretty sure the above numbers are correct -- I've inspected my charts to make sure they look okay.

    14 October 2010

    Lecture 12: First Order Logic

    So far we've talked a lot about syntax. Of course, syntax isn't the end-all be-all of language processing: we want to "understand" language. Often this means mapping natural language sentences to some form of formal language sentences. This means we have to decide on a formal language. A common choice is first order logic (FOL).

    Expressions in FOL take one of the following forms:
    • Atomic symbols like "Hal"
    • Relation symbols like "Eats(.,.)", for instance in "Eats(Hal, Sandwich)".  Relations can have arbitrary arity (i.e., take arbitrarily many arguments)
    • Variables like "a" (see later for quantification)
    • Combinations via and, or, not, implies, iff, etc.  E.g., Eats(Hal, Sandwich) implies not OnThePlate(Sandwich)
    • Existential and universal quantifiers.  For example, "(exists a) Eats(Hal, a)" (which roughly means "Hal eats something.")
    • Equality symbol.  For example, "(exists a) Eats(Hal, a) and (a = Sandwich)"
    Of course a lot of this is redundant: you only really need negation, "and", one quantifier, relations, and equality; the rest can be derived.
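
    If you want to play with FOL expressions programmatically, NLTK's logic module can parse strings much like the ones above.  A small sketch (treat the exact syntax and API as approximate):

from nltk.sem.logic import Expression

read_expr = Expression.fromstring

# & for and, | for or, - for not, -> for implies, <-> for iff.
e1 = read_expr('Eats(Hal, Sandwich) -> -OnThePlate(Sandwich)')
e2 = read_expr('exists a. Eats(Hal, a)')
e3 = read_expr('exists a. (Eats(Hal, a) & (a = Sandwich))')

print(e2)          # prints something like: exists a.Eats(Hal,a)
print(e2.free())   # set() -- no free variables, since "a" is bound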

    FOL is a nice language to map into because we know a lot about how to reason in FOL.  Reasoning is, of course, hard (take an AI class), but sort of do-able.

    The general way in which FOL systems work is:
    • We have a bunch of background knowledge expressed as "axioms" -- basically a database of FOL sentences
    • We get a text, from which we derive new FOL sentences
    • Someone issues a query, which we will try to "prove" given our collection of sentences.  A proof of a query is an answer to the question
      • Someone might query "Who shot Lincoln?"
      • We map this to something like "(exists a) Shot(a, Lincoln)"
      • We try to prove this statement given our database, which might include some statement like "Shot(Booth, Lincoln)".  Thus, "a=Booth" is a "proof" of our query, and gives us the answer
    The big problem is often interpretation: how to get from natural language to formal language.  Syntax is a bridge that will help us get there, but it's not everything.
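
    A toy version of the query step (entirely my own construction): facts are ground tuples, a query may contain variables, and a "proof" is any binding of the variables that matches a fact.

# Variables are strings starting with "?"; everything else must match exactly.
facts = [
    ("Shot", "Booth", "Lincoln"),
    ("Shot", "Oswald", "Kennedy"),
    ("PresidentOf", "Lincoln", "USA"),
]

def prove(query):
    for fact in facts:
        if len(fact) != len(query):
            continue
        binding = {}
        for q, f in zip(query, fact):
            if q.startswith("?"):
                if binding.setdefault(q, f) != f:   # a variable must bind consistently
                    break
            elif q != f:
                break
        else:
            yield binding

# "Who shot Lincoln?"  ->  (exists a) Shot(a, Lincoln)
print(list(prove(("Shot", "?a", "Lincoln"))))   # [{'?a': 'Booth'}]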

    The common assumption (which is not always true) is that language is compositional.  Namely, the meaning of a sentence is derived from the meanings of its parts.  For example:
    • Hal ate a sandwich with mayo.
    • Parses to (Hal (ate (a sandwich (with mayo))))
    • Bottom up, we can look at (mayo) and derive "(exists a) Mayo(a)"
    • Then we look at (sandwich (with mayo)) and derive "(exists b) Sandwich(b) and Contains(b, ___)" where ___ comes from the interpretation of (with mayo), yielding "(exists a,b) Mayo(a) and Sandwich(b) and Contains(b,a)"
    • Then (a sandwich ...) takes what we had before and makes "b" unique, usually written as "(exists a, exists! b) ..."
    • Then we take Hal and get "Hal"
    • And finally, "x ate y" will give us something like "(exists e) Eat(e, x, y) and InThePast(e)" which when we plug in with x and y gives the final interpretation:
        (exists a, exists !b, exists e) Mayo(a) and Sandwich(b) and Contains(a,b) and Eat(e, Hal, b) and InThePast(e)
    The hope is that this interpretation is agnostic to annoyances of syntax.  For example, if we instead wrote "The sandwich containing mayo was eaten by Hal", we would end up with the same interpretation.
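
    A minimal, hand-rolled sketch of this bottom-up composition, with plain Python functions standing in for the semantic rules (the particular decomposition and helper names are illustrative only):

# Each NP meaning is a pair (variable, formula); modifiers are functions
# waiting for the variable of the head they attach to.
def np_mayo():
    return ("a", "Mayo(a)")

def pp_with(np):                      # (with NP)
    var, formula = np
    return lambda head: "%s & Contains(%s, %s)" % (formula, head, var)

def np_sandwich(modifier):            # (a sandwich (with mayo))
    return ("b", "Sandwich(b) & " + modifier("b"))

def s_ate(subject, obj):              # subject ate NP, with the event variable e
    var, formula = obj
    return ("exists a, exists! b, exists e. %s & Eat(e, %s, %s) & InThePast(e)"
            % (formula, subject, var))

print(s_ate("Hal", np_sandwich(pp_with(np_mayo()))))
# exists a, exists! b, exists e. Sandwich(b) & Mayo(a) & Contains(b, a)
#   & Eat(e, Hal, b) & InThePast(e)
# (same conjuncts as the interpretation above, modulo ordering)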

    In order to make this happen, we usually need a semantic grammar that tells us how to get from syntax to semantics.  The grammar will consist of rules that are akin to the type of rules we applied to do the semantic analysis of the above example sentence.

    Key challenges with FOL:
    • What are the right sets of symbols and relations?
    • Where does the necessary background knowledge come from?
    • How to get the semantic grammar
      • You can hand-code it (see VerbNet and the Unified Verb Index for a verb-centric view).
      • Or you can try to learn it from examples (see Ray Mooney's talk yesterday)

    • Sometimes need more than FOL
      • How to handle "Many students love NLP"?
      • Or "John did something yesterday" -- for this we may need second order logic, where you can quantify over relations -- "(exists R, e) R(e, John) and InThePast(e)"

    12 October 2010

    Project 2 warning on pruning results

    I had a bug in my pruning, and wasn't pruning the right-most cells. (It was an off-by-one error.) Since English is mostly right-branching, the cells that touch the right edge (i.e., end of sentence) are incredibly important, so not pruning them makes life a lot better than otherwise. I'm working on fixing it right now and checking to make sure there aren't other bugs. I'll post a comment to this post when it's done and I've updated the results in the .pdf, but your pruning results should be much worse than what is in the writeup (i.e., with K=2, you should get no parse on 80-something sentences, and your f-score should be in the 10s or 20s).

    Project 2 stuff...

    I fixed the P2 pdf (just the one linked on the web page, not the one in the .tar.gz file) so that it includes unary productions in the examples. This is only relevant for part 2 -- nothing else changed.

    Also, I'm not sure if I included them previously, but here are the links for the online test scripts:

    HW05 is posted

    (Sorry for the delay -- there's just one question :P)

    Lecture 11: Incorporating Context

    Many languages express some form of agreement.  In English it is mostly Subject/Verb agreement (i.e., "He eats a sandwich" versus "They eat a sandwich").  In other languages agreement can be more widespread: Adjective/Noun, Determiner/Noun, Verb/Object, Noun/Relative Clause, etc.

    Suppose we want to capture this with a (P)CFG.... the only real way to do this is to expand our set of non-terminals.  I.e., we might start with a grammar like:

    S -> NP VP
    NP -> Pro
    NP -> Det Noun
    VP -> Verb NP
    Pro -> he
    Pro -> they
    Verb -> eat
    Verb -> eats
    Det -> a
    

    Which will generate our good sentences, but also bad sentences like "He eat a sandwich" and "They eats a sandwich". We can fix this by adding labels to our non-terminals:

    S -> NP+sing VP+sing           S -> NP+plur VP+plur
    NP+sing -> Pro+sing            NP+plur -> Pro+plur
    NP+sing -> Det Noun+sing       NP+plur -> Det Noun+plur
    VP+sing -> Verb+sing NP        VP+plur -> Verb+plur NP
    Pro+sing -> he                 Pro+plur -> they
    Verb+sing -> eats              Verb+plur -> eat
    Det -> a
    

    Now life is great: we will no longer produce our bogus sentences. (Note that we would also have to account for "a" being singular, etc...)

    Of course there are other phenomena we might want to capture, for instance pronouns sitting in object position have to change case: "A sandwich eats him" not "A sandwich eats he". If we ignore the singular/plural distinction, we can solve this separately.

    S -> NP+nom VP
    VP -> Verb NP+acc
    NP+nom -> Pro+nom         NP+acc -> Pro+acc
    NP -> Det Noun
    Pro+nom -> he             Pro+acc -> him
    Pro+nom -> they           Pro+acc -> them
    

    And so on...

    We've now solved the nominative/accusative issue, so we can combine the sing/plur grammar with the nom/acc grammar. The problem is exponential blowup: handling one roughly doubles the size of the grammar, and handling the other doubles it again. In general, for every agreement-like phenomenon we want to handle, we'll double the size of our grammar again. This is bad (a) because it makes our grammar huge and (b) because we'll need tons and tons and tons of data to get reliable estimates of these probabilities from a treebank!

    (Other English phenomena you might want to capture are selectional preferences -- which are really more semantics issues than syntactic issues; or determiner/noun agreement.)

    One solution to this problem is to augment our grammar with features. That is, basically do what linguists do and write "+sing" on the pre-terminal, but not split the grammar into bits. Instead, we need to come up with ways of propagating these features up the tree, and checking for disagreement. This process is known as "unification".

    The idea of unification is that everything has a bunch of features with values. Values can be atomic (number=sing) or disjunctions (number={sing,plur}) or variables (number=A). Two values unify basically if they have a non-empty intersection (it's actually more complicated than this, but this is the basic idea). So sing and sing unify and produce sing. Similarly sing and {sing,plur} unify and produce sing. A and sing unify and produce sing with a note off on the side of the page that A=sing. sing and plur don't unify. Our grammar will now encode what unifications happen at each step (e.g., when producing an S, you want to unify the number feature of the NP and VP; or when producing a VP, you want to unify the object NP with "case=accusative" and likewise for subject NPs).
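
    A minimal sketch of just this value-unification step (atoms, sets of alternatives, and variables written as "?A"); the full feature-structure machinery is ignored:

def resolve(v, bindings):
    # Follow variable bindings ("?A") to their current value, if any.
    while isinstance(v, str) and v.startswith("?") and v in bindings:
        v = bindings[v]
    return v

def unify(x, y, bindings):
    x, y = resolve(x, bindings), resolve(y, bindings)
    for a, b in ((x, y), (y, x)):
        if isinstance(a, str) and a.startswith("?"):
            bindings[a] = b              # an unbound variable unifies with anything
            return b
    xs = x if isinstance(x, set) else {x}
    ys = y if isinstance(y, set) else {y}
    common = xs & ys
    if not common:
        return None                      # e.g. sing vs. plur: unification fails
    return next(iter(common)) if len(common) == 1 else common

bindings = {}
print(unify("sing", "sing", bindings))              # sing
print(unify("sing", {"sing", "plur"}, bindings))    # sing
print(unify("?A", "sing", bindings), bindings)      # sing {'?A': 'sing'}
print(unify("sing", "plur", bindings))              # None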

    Our grammar is now a set of context free productions and unification rules. One nice thing is that chart parsing basically still works, but our parse cells become more complicated because they have to remember the values of features. But at least our grammars don't blow up!

    07 October 2010

    Lecture 10: Statistical Parsing

    Context free grammars are great, but as with any X, if X is great, then probabilistic X is greater still!

    A probabilistic context free grammar (PCFG) is a CFG with "weights" attached to each of the rules.  These weights must be non-negative.  For any left-hand-side, the sum of weights must be one.  The following is a small PCFG:

    S   0.5 -> NP VP
    S   0.5 -> VP
    NP  0.7 -> Det Noun
    NP  0.2 -> Proper
    NP  0.1 -> NP PP
    VP  0.6 -> Verb NP
    VP  0.3 -> Verb
    VP  0.1 -> VP PP
    

    Note that for any given LHS (e.g., "NP") the sum of weights (probabilities) is one.

    The probability of a tree is just the product of probabilities of rules used in that tree.

    We can generate random sentences with a PCFG by starting at "S" and then picking a rule at random (weighted by the probabilities) and then recursively generating the corresponding children.
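
    A minimal sketch of that generation procedure for the grammar above (I've added some made-up lexical rules and a PP rule so generation bottoms out in actual words):

import random

# The PCFG above as {lhs: [(prob, rhs), ...]}.
grammar = {
    "S":      [(0.5, ["NP", "VP"]), (0.5, ["VP"])],
    "NP":     [(0.7, ["Det", "Noun"]), (0.2, ["Proper"]), (0.1, ["NP", "PP"])],
    "VP":     [(0.6, ["Verb", "NP"]), (0.3, ["Verb"]), (0.1, ["VP", "PP"])],
    "PP":     [(1.0, ["P", "NP"])],
    "Det":    [(1.0, ["a"])],
    "Noun":   [(0.5, ["sandwich"]), (0.5, ["telescope"])],
    "Proper": [(1.0, ["Hal"])],
    "Verb":   [(1.0, ["eats"])],
    "P":      [(1.0, ["with"])],
}

def generate(symbol="S"):
    if symbol not in grammar:                     # terminal: emit the word
        return [symbol]
    probs, rhss = zip(*grammar[symbol])
    rhs = random.choices(rhss, weights=probs)[0]  # pick a rule by its weight
    return [word for child in rhs for word in generate(child)]

print(" ".join(generate()))   # e.g. "Hal eats a sandwich with a telescope"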

    The probability of a sentence under this grammar is obtained by summing (marginalizing) out all the trees that can generate that sentence.

    To parse with a PCFG, we just parse with the corresponding CFG, but then weigh items in the chart by their probabilities.  If we find a higher-probability derivation of the same item in the chart, we throw out the old one and replace it with the new one.  In CKY this is easy.

    A treebank is a collection of human-labeled trees.  Given a treebank, we can extract the corresponding PCFG by counting productions and normalizing.  Most trees are not binary so you need to binarize your rules.
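
    A minimal sketch of the count-and-normalize step on a toy two-tree "treebank", using NLTK trees (chomsky_normal_form handles the binarization):

from collections import Counter, defaultdict
from nltk import Tree

# A tiny toy treebank; a real one has tens of thousands of trees.
treebank = [
    Tree.fromstring("(S (NP (Det a) (Noun cat)) (VP (Verb sleeps)))"),
    Tree.fromstring("(S (NP (Proper Hal)) (VP (Verb eats) (NP (Det a) (Noun sandwich))))"),
]

rule_counts = defaultdict(Counter)
for tree in treebank:
    tree.chomsky_normal_form()             # binarize any rules with more than 2 children
    for prod in tree.productions():        # one production per non-leaf node
        rule_counts[prod.lhs()][prod.rhs()] += 1

# Normalize so the weights for each LHS sum to one.
pcfg = {
    (lhs, rhs): count / sum(rhss.values())
    for lhs, rhss in rule_counts.items()
    for rhs, count in rhss.items()
}

for (lhs, rhs), p in sorted(pcfg.items(), key=lambda kv: str(kv[0])):
    print("%s -> %s   %.2f" % (lhs, " ".join(map(str, rhs)), p))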

    Note on smoothing: you have to smooth if you want your grammar to be able to parse out-of-sample trees.  Smoothing can be done just like in HMMs, but you want to be careful not to compute stuff that is irrelevant.

    Note on efficiency: The grammars we produce are huge (tens or hundreds of thousands of rules).  As a result, paying O(N^3|G|) is often prohibitive, even for N around 30 or 40.  A simple solution is pruning: after you process each cell in the chart, throw out its low-probability items.  You can either do count-based pruning (only keep the top 5 in each cell), ratio-based pruning (only keep the ones within a factor of 10 probability of the best one), or probability-based pruning (only keep the ones with probability at least 10e-6).  Usually the last sucks, but the first two work reasonably well.
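
    The three pruning strategies, written as one small function over a chart cell (the specific thresholds are just placeholders):

def prune(cell, k=5, ratio=10.0, floor=1e-6, strategy="count"):
    # cell: a list of (probability, item) pairs for one chart cell.
    if not cell:
        return cell
    cell = sorted(cell, key=lambda pi: pi[0], reverse=True)
    if strategy == "count":        # keep only the top k items
        return cell[:k]
    if strategy == "ratio":        # keep items within a factor of the best one
        best = cell[0][0]
        return [(p, item) for p, item in cell if p * ratio >= best]
    if strategy == "probability":  # keep items above an absolute floor
        return [(p, item) for p, item in cell if p >= floor]
    return cell

cell = [(0.02, "NP"), (0.015, "S"), (0.0001, "VP"), (1e-9, "PP")]
print(prune(cell, strategy="ratio"))   # [(0.02, 'NP'), (0.015, 'S')]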

    04 October 2010

    No class Tue 05 Oct 2010

    Due to Hal being sick again.  Sigh.

    So here's a puzzle to entertain you while you're not learning all about treebanking and PCFGs....

    Suppose Hal presents symptoms that make it 90% certain that he has some illness X.  And suppose that the rate of transmission of X is 5%.  Suppose that the average rate of incidence of X in a general population is 0.01%.  What is the probability that you would get sick if we had class tomorrow (and you showed up)?  There are 45 students in class: what's the probability that at least one would get sick?  Finally, suppose that your utility function is defined as U(learn about pcfgs and don't get sick) = +10, U(don't learn and don't get sick) = 0, U(learn and get sick) = -20, U(don't learn and get sick) = -100.  (How annoying would that be.)  What is the expected utility if I do teach class?  What is the expected utility if I cancel class?  Am I doing the right thing?

    Feel free to post solutions :).

    01 October 2010

    Project 2 is posted, due Oct 19

    It's all about parsing.  The writeup is really long, but don't be too intimidated by it.  It's long because I tried to give a lot of debugging help.  The project is all about grammars and parsing, so go off and do syntax!