22 December 2010

Summer internships at JHU/COE

In case anyone is reading this: I just got the following email. I know this program, and it's good.

Please share with graduate and undergraduate students looking for
summer internships.

The Johns Hopkins University Human Language Technology Center of
Excellence (COE) is seeking candidates for our summer internship
program as part of SCALE, our yearly summer workshop (Summer Camp for
Advanced Language Exploration). Interns will work on research in
speech, text and graph processing as part of a larger workshop team.

Internship positions are fully funded, including travel, living
expenses and stipend.

This summer's workshop topic is "Combining Speech, Text and Graphs,"
which will draw from a number of research areas:

*** Natural Language Processing ***
- machine translation and NLP on informal texts
- opinion extraction from informal texts
- topic models
- information extraction
- domain adaptation and transfer learning
- multi-lingual learning

*** Speech ***
- topic detection
- keyword spotting
- resource-constrained speech processing
- robust speech processing

*** Graphs ***
- finding communities in social networks
- anomaly detection
- learning on large-scale graphs
- graph-based topic modeling

Candidates should be currently enrolled in an undergraduate or
graduate degree program. Applications submitted by Friday Jan 14, 2011
will receive full consideration.

For more information: http://web.jhu.edu/HLTCOE/scaleinterns2011.html

Applicants will be required to obtain a US security clearance, which
requires US citizenship.  If you do not already have a clearance, we
will work with you to obtain one.

21 December 2010

Grades (Almost) Posted, Semester Over

Hi all --

I figure you're likely to read to the end to find out about grades, so before I get to that, let me just take this chance to say that I really enjoyed this class this semester.  You were all great.  Everyone did really well on both the final exam and the final projects, and I'm thrilled.  If you couldn't tell already, I love language stuff, and I encourage you all to keep going and learn everything there is to know.  Philip is teaching the follow-on course in the Spring, which should be awesome.  I'm also running an unofficial seminar on "Large Data" stuff in the Spring; you can get more info here (sign up for the mailing list if you're interested).  Anyway, I had a great time teaching; I hope you had fun in class.

Regarding grades, I need to submit them by midnight tonight.  And since I don't plan on staying up until midnight, this really means tonight by about 11pm.

I've posted "unofficial" grades on grades.cs.umd.edu, so you can see what your grade is.  Note that the "total" column on that spreadsheet is completely misleading, since it doesn't include all the weirdo grading rules (dropping of worst projects/homeworks, inclusion of extra credit, etc.).  I have all the numbers in a separate spreadsheet, so if something looks odd to you and you'd like the full computation, please let me know.  It's of course possible to change grades later, but it's a pain, so I'd rather hear about any issues now.

That's it.  Have a great break and I hope to see some of you again in the Spring!

 -h


P.S. Per the second comment below, I added an extra column, MyOverall. The grading is as follows:

98 = A+
95 = A
92 = A-
88 = B+
85 = B
82 = B-
78 = C+
75 = C
72 = C-

Note that your score will be exactly one of these numbers: this is just my way of encoding your grade, not what your actual score was :).
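Since MyOverall is just an encoding, decoding it is a plain table lookup. A minimal sketch, assuming the nine values listed above are the only ones that appear; the names MY_OVERALL_TO_LETTER and letter_grade are hypothetical and not part of grades.cs.umd.edu:

```python
# Hypothetical decoder for the MyOverall column: each value encodes one letter grade.
MY_OVERALL_TO_LETTER = {
    98: "A+", 95: "A", 92: "A-",
    88: "B+", 85: "B", 82: "B-",
    78: "C+", 75: "C", 72: "C-",
}


def letter_grade(my_overall: int) -> str:
    """Return the letter grade encoded by a MyOverall value.

    Raises KeyError for any value not in the table, since every MyOverall
    entry is supposed to be exactly one of these numbers.
    """
    return MY_OVERALL_TO_LETTER[my_overall]


if __name__ == "__main__":
    print(letter_grade(92))
```

Using a dict rather than score ranges makes the point explicit: these numbers are labels, not averages, so anything off the table is an error rather than something to round.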

12 December 2010

Interested in Language Science?

I just got the following email from Colin Phillips in Linguistics / Cognitive Neuroscience about interdisciplinary language science opportunities.  Please see below, and feel free to email me if you have questions:

I'm hoping that you can help us to reach students in the CS/iSchool universe who might be interested in taking advantage of our unique interdisciplinary language science opportunities. We're particularly interested in reaching 1st and 2nd year students. We'll be holding an informational meeting for students tomorrow at 1pm in 1108B Marie Mount, but I'd be happy to meet at another time with anybody who is interested but not available at that time. We'll tell students about the opportunities and benefits, and also talk about the resources that are available to help them, including new plans to help them to develop interdisciplinary training plans that are both innovative and feasible. Csilla Kajtar already circulated a message about this, but we know that people often just ignore messages sent to mailing lists.

As you know, the closer integration of NLP and cognitive work in language is right at the top of our list of opportunities-that-we'd-be-idiots-not-to-pursue, and student training is one of the best ways to achieve this.

09 December 2010

Final Exam, Due Dec 17, 3:30pm

Here's a copy of the final exam as well as the source LaTeX.  Please feel free to either print it and do it by hand, or to do it in LaTeX and print the solution.  You may turn it in any time between now and 3:30pm on Dec 17 (our official exam time is 1:30-3:30pm on Dec 17).  Please hand it in in one of three ways: (1) give it to me in person, in my office or otherwise; (2) slide it completely under my office door (AVW 3227); (3) give it to Amit in person.

If you have any clarification questions, please post them here.

06 December 2010

05 December 2010

P4: Small error in example numbers...

There are some sentences in the training data that contain a TAB character.  The reasonable thing to do would be to just treat it as whitespace.  For some reason I didn't do this.  In my example of DF computation, I did this, which somewhat changes all the remaining numbers.

Instead of rerunning everything, I'll just tell you what the updated top-frequency words are if you do it "properly."  In general, for this assignment, don't worry if your numbers are slightly different from mine: it may have to do with how you handle the non-ASCII characters that appear once in a while in the data.

   2999 .
   2999 ,
   2998 of
   2997 the
   2997 and
   2994 in
   2989 to
   2988 a
   2885 as
   2875 by
   2862 for
   2860 )
   2859 (
   2836 with
   2832 that
   2801 ''
   2788 ``
   2759 on
   2717 from
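For reference, here is one way to compute DF so that TAB counts as whitespace. This is a minimal sketch, not the project's reference code: it assumes each document is a single string (for example, one line of the training data), and the function names document_frequencies and print_top are hypothetical. The key detail is that Python's str.split() with no argument splits on any run of whitespace, TAB included.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def document_frequencies(docs: Iterable[str]) -> Counter:
    """Map each token to the number of documents that contain it.

    Assumption: each element of `docs` is one document. str.split() with no
    argument splits on any run of whitespace (spaces, TABs, newlines), so a
    TAB is treated exactly like a space.
    """
    df: Counter = Counter()
    for doc in docs:
        # set() so a token repeated within one document is counted only once
        df.update(set(doc.split()))
    return df


def print_top(df: Counter, k: int) -> List[Tuple[str, int]]:
    """Print the k highest-DF tokens as 'count token', like the list above."""
    top = df.most_common(k)
    for token, count in top:
        print(f"{count:7d} {token}")
    return top


if __name__ == "__main__":
    docs = ["the cat\tsat .", "the dog .", "a cat ."]
    print_top(document_frequencies(docs), 3)
```

If you split with doc.split(" ") instead, "cat\tsat" would survive as a single token, which is exactly the kind of discrepancy described above. Ties in most_common are not guaranteed to come out in any particular order here, so small ordering differences among equal counts are expected too.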