Chinese Natural Language Processing and Speech Processing
Overview
We work on a wide range of research in Chinese natural language processing and speech processing, including word segmentation, part-of-speech tagging, syntactic and semantic parsing, machine translation, disfluency detection, prosody, and other areas. We provide software for Chinese word segmentation, Chinese parsing, and Chinese part-of-speech tagging. More details on each topic:
Chinese Word Segmentation
Parsing and Grammatical Relations
The Chinese parser is based on the ACL 2003 paper: Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? ACL 2003.
In addition to PCFG parsing, the Stanford Chinese parser can also output
a set of Chinese grammatical relations that describes more
semantically abstract relations between words. An example Chinese sentence looks like:
[Figure: example Chinese sentence]
Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, and Christopher D. Manning. 2009. Discriminative Reordering with Chinese Grammatical Relations Features.
Part-of-Speech Tagging
The Stanford part-of-speech tagger takes word-segmented Chinese text as input and assigns a part of speech, such as noun or verb, to each word (and other tokens). This Chinese POS tagger is designed for LDC-style word-segmented text and adopts a subset of the features from:
Huihsin Tseng, Daniel Jurafsky, and Christopher Manning. 2005. Morphological features help POS tagging of unknown words across language varieties.
Its overall accuracy is 93.65%, and its accuracy on unknown words is 84.84%.
Named Entity Recognition
We have done extensive research on improving Chinese NER performance
using semi-supervised learning methods with bilingual parallel text.
Our results yield significant (~3% F1) improvements over strong CRF baselines
that are enhanced with distributional similarity features.
Mengqiu Wang and Christopher D. Manning. 2013. Cross-lingual Pseudo-Projected Expectation Regularization for Weakly Supervised Learning. Transactions of ACL 2013.
Software Instructions: Follow these instructions to reproduce experiments reported in these papers.
Speech Processing
Our Chinese speech research has focused on areas such as the study and detection of disfluencies (filled pauses like "uh" and word fragments), prosody, and the detection of speech acts.
People
Software
Publications
Cross-lingual Pseudo-Projected Expectation Regularization for Weakly Supervised Learning
[pdf]
Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
[pdf]
Effective Bilingual Constraints for Semi-supervised Learning of Named Entity Recognizers
[pdf]
Named Entity Recognition with Bilingual Constraints
[pdf]
Discriminative Reordering with Chinese Grammatical Relations Features
[pdf]
Disambiguating "DE" for Chinese-English Machine Translation
[pdf]
Optimizing Chinese Word Segmentation for Machine Translation Performance
[pdf]
Stanford University's Chinese-to-English Statistical Machine Translation System for the 2008 NIST Evaluation
[pdf]
Detection of Word Fragments in Mandarin Telephone Conversation
[pdf]
A Conditional Random Field Word Segmenter for SIGHAN Bakeoff 2005
[pdf]
Morphological features help POS tagging of unknown words across language varieties
[pdf]
Accent Detection and Speech Recognition for Shanghai-Accented Mandarin
[pdf]
A preliminary study of Mandarin filled pauses
[pdf]
Detection of Questions in Chinese Conversation
[pdf]
Parsing Arguments of Nominalizations in English and Chinese
[pdf]
Pradhan, Sameer, Honglin Sun, Wayne Ward, James H. Martin, and Daniel Jurafsky
Is it harder to parse Chinese, or the Chinese Treebank?
[pdf]