Search results: 1-15 of 32 records found for “Literature Corpora”. Query time: 0.031 seconds
Mining Parallel Corpora from Sina Weibo and Twitter
Mining Parallel Corpora Sina Weibo Twitter
2016/7/7
Microblogs such as Twitter, Facebook, and Sina Weibo (China’s equivalent of Twitter) are a
remarkable linguistic resource. In contrast to content from edited genres such as newswire,
microblogs cont...
Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation
Penn Discourse TreeBank Comparable Corpora Complementary Annotation
2015/9/14
The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either...
Evaluating Centering for Information Ordering Using Corpora
Evaluating Centering Information Ordering Using Corpora
2015/9/7
In this article we discuss several metrics of coherence defined using centering theory and investigate the usefulness of such metrics for information ordering in automatic text generation. We estimate...
Constructing Corpora for the Development and Evaluation of Paraphrase Systems
Paraphrase Systems Constructing Corpora
2015/9/6
Automatic paraphrasing is an important component in many natural language processing tasks.
In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition o...
Orthographic Errors in Web Pages: Toward Cleaner Web Corpora
Orthographic Errors Web Pages Cleaner Web Corpora
2015/9/1
Since the Web by far represents the largest public repository of natural language texts, recent experiments, methods, and tools in the area of corpus linguistics often use the Web as a corpus. For app...
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Machine Translation Performance Exploiting Non-Parallel Corpora
2015/8/31
We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether o...
Parallel Text Processing: Alignment and Use of Translation Corpora
Translation Corpora Alignment
2015/8/26
One can’t help but be fascinated by two sentences in parallel translation, the selfsame
meaning diffused, distributed, diverging across alternative expressions. In his Le Ton
beau de Marot: In Prais...
Exclamatives and heightened emotion: Extracting pragmatic generalizations from large corpora
corpus pragmatics exclamatives expressives logistic regression
2015/6/15
Exclamatives like What a dump!, Wow!, and Boy, you’ve grown! are, when uttered in context, rich in information about the speaker’s attitudes. Drawing on evidence from about 100,000 online product rev...
The pragmatics of expressive content: Evidence from large corpora
expressives intensives antihonorifics corpus pragmatics logistic regression Chinese English German Japanese
2015/6/15
We use large collections of online product reviews, in Chinese, English, German, and Japanese, to study the use conditions of expressives (swears, antihonorifics, intensives). The distributional eviden...
Developing linguistic theories using annotated corpora
Developing linguistic theories annotated corpora
2015/6/15
This paper aims to carve out a place for corpus research within theoretical linguistics and psycholinguistics. We argue that annotated corpora naturally complement native speaker intuitions and contro...
AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA
AUTOMATIC ACQUISITION LARGE SUBCATEGORIZATION DICTIONARY CORPORA
2015/6/12
This paper presents a new method for producing a dictionary of subcategorization frames from unlabelled text corpora. It is shown that statistical filtering of the results of a finite state parser run...
Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora
Labeled LDA supervised topic model credit attribution multi-labeled corpora
2015/6/12
A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the...
Who Leads Whom: Topical Lead-Lag Analysis across corpora
Who Leads Whom Topical Lead Lag Analysis across corpora
2015/6/10
Understanding the lead/lag of communities in the context of a given topic is an interesting problem in computational social science. In this work, we study the particular problem of whether research g...
Unsupervised morphological analysis of small corpora: First experiments with Kilivila
Unsupervised morphological analysis small corpora First experiments with Kilivila
2015/4/21
Language documentation involves linguistic analysis of the collected material, which is typically done manually. Automatic methods for language processing usually require large corpora. The method pre...
Prospects for e-grammars and endangered languages corpora
e-grammars endangered languages corpora
2015/4/21
This contribution explores the potentials of combining corpora of language use data with language description in e-grammars (or digital grammars). We present three directions of ongoing research and d...