22 December 2010
Summer internships at JHU/COE
... in case anyone is reading this, I just got the following email. I know this program and it's good.

Please share with graduate and undergraduate students looking for summer internships. The Johns Hopkins University Human Language Technology Center of Excellence (COE) is seeking candidates for our summer internship program as part of SCALE, our yearly summer workshop (Summer Camp for Advanced Language Exploration). Interns will work on research in speech, text, and graph processing as part of a larger workshop team. Internship positions are fully funded, including travel, living expenses, and stipend. This summer's workshop topic is "Combining Speech, Text and Graphs," which will draw from a number of research areas:

*** Natural Language Processing ***
- machine translation and NLP on informal texts
- opinion extraction from informal texts
- topic models
- information extraction
- domain adaptation and transfer learning
- multi-lingual learning

*** Speech ***
- topic detection
- keyword spotting
- resource constrained speech processing
- robust speech processing

*** Graphs ***
- finding communities in social networks
- anomaly detection
- learning on large scale graphs
- graph-based topic modeling

Candidates should be currently enrolled in an undergraduate or graduate degree program. Applications submitted by Friday, Jan 14, 2011 will receive full consideration. For more information: http://web.jhu.edu/HLTCOE/scaleinterns2011.html

Applicants will be required to obtain a US security clearance, which requires US citizenship. If you do not already have a clearance, we will work with you to obtain one.
21 December 2010
Grades (Almost) Posted, Semester Over
Hi all --
I figure you're likely to read to the end to find out about grades, so before I get to that, let me just take this chance to say that I really enjoyed this class this semester. You were all great. Everyone did awesome on both the final exam and final projects, and I'm really thrilled. If you couldn't tell already, I love language stuff and I encourage you all to continue on and learn everything there is to know. Philip is teaching the follow-on course in the Spring, which should be awesome. I'm also running an unofficial seminar on "Large Data" stuff in the spring; you can get more info here (sign up for the mailing list if you're interested). Anyway, I had a great time teaching; I hope you had fun in class.
Regarding grades, I need to submit them by midnight tonight. And since I don't plan on staying up until midnight, this really means tonight by about 11p.
I've posted "unofficial" grades on grades.cs.umd.edu, so you can see what your grade is. Note that the "total" column on that spreadsheet is completely misleading, since it doesn't include all the weirdo grading rules (dropping of worst projects/homeworks, inclusion of extra credit, etc.). I have all the numbers in a separate spreadsheet, so if something looks odd to you and you'd like the full computation, please let me know. It's of course possible to change grades later, but it's a pain, so I'd rather hear about any issues now.
That's it. Have a great break and I hope to see some of you again in the Spring!
-h
p.s., per the second comment below, I added an extra column, MyOverall. The grading is as follows:
98 = A+
95 = A
92 = A-
88 = B+
85 = B
82 = B-
78 = C+
75 = C
72 = C-
Note that your score will be exactly one of these numbers: this is just my way of encoding your letter grade, not what your actual numeric score was :).
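If you'd rather decode the MyOverall column programmatically, here's a minimal sketch; the cutoffs are exactly the table above, but the function name and everything else is just illustrative:

```python
# Map encoded MyOverall scores (see the table above) back to letter grades.
GRADE_CODES = {
    98: "A+", 95: "A", 92: "A-",
    88: "B+", 85: "B", 82: "B-",
    78: "C+", 75: "C", 72: "C-",
}

def decode_grade(score: int) -> str:
    """Return the letter grade for an encoded MyOverall score."""
    try:
        return GRADE_CODES[score]
    except KeyError:
        raise ValueError(f"{score} is not one of the encoded scores")

print(decode_grade(92))  # A-
```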
12 December 2010
Interested in Language Science?
Just got the following email from Colin Phillips in Linguistics / Cognitive Neuroscience, regarding language science. Please see below, and feel free to email me if you have questions:
I'm hoping that you can help us to reach students in the CS/iSchool universe who might be interested in taking advantage of our unique interdisciplinary language science opportunities. We're particularly interested in reaching 1st and 2nd year students. We'll be holding an informational meeting for students tomorrow at 1pm in 1108B Marie Mount, but I'd be happy to meet at another time with anybody who is interested but not available at that time. We'll tell students about the opportunities and benefits, and also talk about the resources that are available to help them, including new plans to help them to develop interdisciplinary training plans that are both innovative and feasible. Csilla Kajtar already circulated a message about this, but we know that people often just ignore messages sent to mailing lists.
As you know, the closer integration of NLP and cognitive work in language is right at the top of our list of opportunities-that-we'd-be-idiots-not-to-pursue, and student training is one of the best ways to achieve this.
09 December 2010
Final Exam, Due Dec 17, 3:30pm
Here's a copy of the final exam as well as the source LaTeX. Please feel free to either print it and do it by hand, or to do it in LaTeX and print the solution. You may turn it in any time between now and 3:30pm on Dec 17. (Because our official exam time is 1:30-3:30 on Dec 17.) Please hand it in in one of three ways: (1) give it to me in person in my office or otherwise; (2) slide it completely under my office door (AVW 3227); (3) give it to Amit in person.
If you have any clarification questions, please post them here.
06 December 2010
P4 deadline pushed back to Dec 14
The 9th is apparently the deadline for the ML project, so the P4 deadline is pushed back to Dec 14.
05 December 2010
P4: Small error in example numbers....
There are some sentences in the training data that contain a TAB character. The reasonable thing to do would be just to treat it as whitespace, but for some reason I didn't do this in my example of DF computation, which somewhat changes all the remaining numbers.
Instead of rerunning everything I'll just tell you what the updated top frequency words are if you do it "properly." In general, for this assignment, don't worry if your numbers are slightly different than mine -- it may have to do with how you handle the non-ascii characters that appear once in a while in the data.
2999 .
2999 ,
2998 of
2997 the
2997 and
2994 in
2989 to
2988 a
2885 as
2875 by
2862 for
2860 )
2859 (
2836 with
2832 that
2801 ''
2788 ``
2759 on
2717 from
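If you want to sanity-check your own numbers, here's a minimal sketch of the DF computation with TABs handled "properly": each line is treated as one document, and Python's str.split() with no argument splits on any whitespace, TABs included. The file name is just a placeholder:

```python
from collections import Counter

def document_frequencies(path):
    """For each token, count how many lines (documents) it appears in.
    line.split() splits on ALL whitespace, so TABs are handled correctly."""
    df = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            df.update(set(line.split()))  # set(): count each token once per line
    return df

# "train.txt" is a hypothetical file name; print the top-DF tokens.
for token, count in document_frequencies("train.txt").most_common(19):
    print(count, token)
```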
Last seminar of the semester: Michael Paul Dec 8, 11am
December 8: Michael Paul: Summarizing Contrastive Viewpoints in Opinionated Text
AVW 2120

Performing multi-document summarization of opinionated text has unique challenges because it is important to recognize that the same information may be presented in different ways from different viewpoints. In this talk, we will present a special kind of contrastive summarization approach intended to highlight this phenomenon and to help users digest conflicting opinions. To do this, we introduce a new graph-based algorithm, Comparative LexRank, to score sentences in a summary based on a combination of both representativeness of the collection and comparability between opposing viewpoints. We then address the issue of how to automatically discover and extract viewpoints from unlabeled text, and we experiment with a novel two-dimensional topic model for the task of unsupervised clustering of documents by viewpoint. Finally, we discuss how these two stages can be combined to both automatically extract and summarize viewpoints in an interesting way. Results are presented on two political opinion data sets. This project was joint work with ChengXiang Zhai and Roxana Girju.

Bio: Michael Paul is a first-year Ph.D. student of Computer Science at the Johns Hopkins University and a member of the Center for Language and Speech Processing. He earned a B.S. from the University of Illinois at Urbana-Champaign in 2009. He is currently a Graduate Research Fellow of the National Science Foundation and a Dean's Fellow of the Whiting School of Engineering.
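If the graph-based scoring idea in the abstract is unfamiliar, here's a minimal sketch of plain LexRank-style sentence scoring (power iteration on a cosine-similarity sentence graph). To be clear, this is generic background I'm adding for intuition, not Paul et al.'s Comparative LexRank, which extends the idea with a comparability term:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def lexrank(sentences, damping=0.85, iters=50):
    """Score sentences by power iteration on a row-normalized similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    rows = [sum(row) or 1.0 for row in sim]  # row sums, guarding against 0
    P = [[sim[i][j] / rows[i] for j in range(n)] for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(scores[i] * P[i][j] for i in range(n))
                  for j in range(n)]
    return scores

sents = ["the phone has a great battery",
         "battery life on this phone is great",
         "the screen scratches easily"]
print(lexrank(sents))  # the two battery sentences score highest
```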
02 December 2010
Lecture 25: Mapping Text to Actions
There has been a bunch of work recently on trying to automatically find relationships between language and the "real world", where "real world" actually often means some sort of simulated environment. Here are a few papers along these lines:
- Fleischman, M. B. and Roy, D. Intentional Context in Situated Language Learning. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL), Ann Arbor, MI, June 2005.
- Raymond J. Mooney. Learning to Connect Language and Perception. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), Senior Member Paper, Chicago, IL, pp. 1598-1601, July 2008.
- S.R.K. Branavan, Harr Chen, Luke Zettlemoyer, and Regina Barzilay. Reinforcement Learning for Mapping Instructions to Actions. In Proceedings of ACL, 2009. (Best Paper Award)
- Percy Liang, Michael I. Jordan, and Dan Klein. Learning Semantic Correspondences with Less Supervision. In Proceedings of ACL-IJCNLP, 2009.
- Adam Vogel and Dan Jurafsky. Learning to Follow Navigational Directions. In Proceedings of ACL 2010, Uppsala, Sweden.
In the first paper, which is the one we'll talk about most, the key idea is that of hierarchical plans, represented as a PCFG. For instance, we might have a rule "OfferCup -> PickUpCup MoveCup ReleaseCup", where each of the subactions might either be atomic (correspond to actual muscle movements) or might itself be broken down further. (Question: how context-free is this problem?) A toy sketch of such an action grammar is below.
The key ambiguity is due to the fact that actions do not select for exactly one interpretation, as in the blicket example.
In this paper, they hand-constructed a PCFG for actions, and the key learning question was whether you could figure out the level of ambiguity automatically. The basic idea is to look at relative frequencies of co-occurrence between lexical items and nodes in the PCFG tree for the actions.
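To make the hierarchical-plan idea concrete, here's a minimal sketch of an action PCFG built around the OfferCup rule above. The extra rules, probabilities, and sampler are all made up for illustration; they are not the paper's grammar:

```python
import random

# Toy action PCFG: nonterminal -> list of (expansion, probability).
# Any symbol missing from the table is atomic (an actual muscle movement).
GRAMMAR = {
    "OfferCup": [(("PickUpCup", "MoveCup", "ReleaseCup"), 1.0)],
    "PickUpCup": [(("ReachForCup", "GraspCup"), 0.8),
                  (("GraspCup",), 0.2)],  # hand is already at the cup
}

def sample_plan(symbol):
    """Recursively expand a plan symbol into a flat sequence of atomic actions."""
    if symbol not in GRAMMAR:
        return [symbol]  # atomic action: no further decomposition
    expansions, weights = zip(*GRAMMAR[symbol])
    chosen = random.choices(expansions, weights=weights)[0]
    return [atom for sub in chosen for atom in sample_plan(sub)]

print(sample_plan("OfferCup"))
# e.g. ['ReachForCup', 'GraspCup', 'MoveCup', 'ReleaseCup']
```

The learning problem in the paper is then to associate words with nodes in trees like these, which is hard precisely because a single observed action sequence is consistent with several (sub)plans.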
01 December 2010
P4, grading rules
So P4 has been posted for a while. It is "optional" in the sense that your project grades will be based on your best three out of four grades. In particular, here's what will happen.
Suppose that your grades on P1, ..., P4 are a,b,c,d. (If you don't do P4, then d=0.)
Let x = [ (a + b + c + d) - min { a, b, c, d } ] / 3
Then x is your average grade on your best three projects.
We will use x as your overall project grade (i.e., since each project is weighted equally, it will be like you got a score of x on all FOUR of them).
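In code, the rule is just the following (a minimal sketch; the variable names mirror the formula above):

```python
def project_grade(a, b, c, d=0.0):
    """Average of the best three of four project grades (skipping P4 means d=0)."""
    return (a + b + c + d - min(a, b, c, d)) / 3

print(project_grade(90, 80, 100, 0))   # 90.0 -- P4 skipped, the 0 is dropped
print(project_grade(90, 80, 100, 95))  # 95.0 -- the 80 is dropped
```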