Clustering by authorship within and across documents

E Stamatatos, M Tschuggnall… - … Notes Papers of …, 2016 - repository.uantwerpen.be
Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR …, 2016
Abstract
The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised attribution models that are able to estimate similarities/differences in personal style of authors. The shared tasks on author clustering and author diarization at PAN 2016 focus on such unsupervised authorship attribution problems. The former deals with single-author documents and aims at grouping documents by authorship and establishing authorship links between documents. The latter considers multi-author documents and attempts to segment a document into authorial components, a task strongly associated with intrinsic plagiarism detection. This paper presents an overview of the two tasks including evaluation datasets, measures, results, as well as a survey of a total of 10 submissions (8 for author clustering and 2 for author diarization).