To help you not lose too much sleep, I've pushed the deadline for P2 back to Friday night at 10pm. However, to be fair to those who don't have ML, or who have planned their schedule around the deadline on the 19th, I'll give you 15% extra credit if you hand it in by the original deadline (the 19th). If you do hand it in early, please make a note in your writeup that you've done so. Hopefully 15% is enough to balance out anything you would lose by not being able to compete with your classmates up to the last minute on the competition parts.
Also, a note on pruning:
This is now in the .pdf writeup, but:
- Be sure that you prune all cells, including the right-most ones. Your labeled F measure for unpruned should be about 78%.
- Don't forget to debinarize before you evaluate!
- If you prune every cell, including the root, with K=2, your labeled F measure should be about 23%. You should have about 83 unparsable sentences.
- If you prune every cell except the root (which is a totally sensible thing to do -- thanks to several of you who have pointed this out), your labeled F should be about 59%, and you should have about 38 unparsable sentences. (As you can see, it's a really good idea not to prune the root, especially since it doesn't make things any slower. In fact, it might actually make things faster!)
- You can implement it either of the above two ways -- just be sure to document which you're doing.
- Be sure to prune before applying unary rules. Thus, even with K=2, you will often end up with more than 2 items in a chart cell. However, only 2 of these will have binary derivations; all the rest should have unary derivations.
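To make the prune-then-unary ordering concrete, here is a minimal sketch of filling one chart cell. It is not the reference implementation: the function names (`prune_cell`, `apply_unary_rules`) and the dictionary layout are assumptions made for illustration, and scores are assumed to be log probabilities.

```python
import heapq

NEG_INF = float("-inf")

def prune_cell(binary_items, k):
    """Keep only the k best items that were built with binary rules.

    binary_items maps a nonterminal label to its best log probability
    (hypothetical layout, chosen for this sketch).
    """
    return dict(heapq.nlargest(k, binary_items.items(), key=lambda kv: kv[1]))

def apply_unary_rules(cell, unary_rules):
    """Add unary derivations on top of the (already pruned) cell.

    unary_rules maps a child label to a list of (parent_label, rule_log_prob).
    """
    result = dict(cell)
    agenda = list(cell.items())
    while agenda:
        child, score = agenda.pop()
        if result.get(child, NEG_INF) > score:
            continue  # a better score for this label was found later; skip the stale entry
        for parent, rule_lp in unary_rules.get(child, []):
            new_score = score + rule_lp
            if new_score > result.get(parent, NEG_INF):
                result[parent] = new_score
                agenda.append((parent, new_score))
    return result

# Toy example with made-up scores: three binary items, K=2.
binary_items = {"NP": -1.0, "VP": -2.0, "PP": -3.0}
unary_rules = {"NP": [("S", -0.5)], "VP": [("S", -0.1)], "S": [("ROOT", -0.2)]}

pruned = prune_cell(binary_items, 2)           # PP is dropped
cell = apply_unary_rules(pruned, unary_rules)  # S and ROOT are added afterwards
```

In this toy cell only NP and VP survive pruning, yet the finished cell holds four items, because S and ROOT come from unary rules applied after pruning.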
How do we handle an "unseen word" in dev.txt?
Does an "unseen word" simply mean a parse failure?
@Anonymous: Yes, unseen words will result in failed parses.
For the extra credit part, you will need to deal with unknown words, but of course that's not required. (You could, in theory, apply some of the same ideas from the part of speech assignment.)
So does it count for extra credit if we submit at any time on the 19th of October?
@Anonymous: sure, why not? :)
The output of the treebanker parser evaluation script looks like this:
not enough lines in /var/tmp/CGItemp19000
What does that mean? I debinarized and validated, but there were no errors.
How do I debinarize "dev.txt.parsed" if there are parsing failures? If there are fewer than 100 parse results, how can I evaluate that?
@Anonymous: My debinarization script should handle parsing errors okay... if you just print "no parse!" when there is a parsing error, it'll just repeat that line and debinarize everything else.
@Anonymous: The debinarize script should work fine even with parsing errors, so long as you print "no parse!" on such sentences.
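As a rough illustration of the advice above, the sketch below writes one output line per input sentence and uses the "no parse!" placeholder for failures. The function name `write_parses` and the convention of representing a failed parse as `None` are assumptions for this example, not part of the provided scripts.

```python
import io

def write_parses(parses, out):
    """Write one line per sentence; failed parses (None) become "no parse!"."""
    for tree in parses:
        out.write((tree if tree is not None else "no parse!") + "\n")

# Toy example: the second sentence failed to parse.
parses = ["(S (NP I) (VP sleep))", None, "(S (NP you) (VP parse))"]
buffer = io.StringIO()
write_parses(parses, buffer)
lines = buffer.getvalue().splitlines()
```

Writing the placeholder rather than skipping the sentence keeps the number of output lines equal to the number of input sentences.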
If this is a production:
ADJP -> ADJP PP
can we have the same nonterminal on both the right and left side of a production?
@Anonymous: Sure! For example, NP -> NP PP is a very frequent production!