Dinner at the Columbia Cottage. Join us for food and conversation – all are welcome! RSVP on Facebook.
3 pm (location TBA)
A computational linguistics/natural language processing presentation by Nizar Habash of the Center for Computational Learning Systems:
Automatic Diacritization of Arabic Text
Arabic is written without certain orthographic symbols, called diacritics, which represent, among other things, short vowels. The restoration of diacritics to written Arabic is an important processing step for several computational linguistic applications, including training language models for automatic speech recognition and text-to-speech generation. We present here a new diacritization system for written Arabic based on a new combination of known techniques: a lexical resource for morphological analysis, a multi-classifier tagger, and a lexeme language model. This new diacritization system outperforms the best previously published results by reducing the word error rate to 14.9% and the diacritic error rate to 4.8%. The presentation includes a detailed error analysis classifying the types of errors resolved by each of the different modules used.
Time and location TBA
Peter Connor of Barnard College will give a lecture on translation. More details to come.