How do the kids speak? Improving educational use of text mining with child-directed language models

P Organisciak, M Newman, D Eby, S Acar… - … and Learning Sciences, 2023 - emerald.com
Purpose Most educational assessments tend to be constructed in a close-ended format,
which is easier to score consistently and more affordable. However, recent work has …

Giving shape to large digital libraries through exploratory data analysis

P Organisciak, BM Schmidt… - Journal of the Association …, 2022 - Wiley Online Library
The emergence of large multi‐institutional digital libraries has opened the door to
aggregate‐level examinations of the published word. Such large‐scale analysis offers a new way to …

Research access to in-copyright texts in the humanities

P Organisciak, JS Downie - Information and Knowledge …, 2021 - library.oapen.org
In 2004, John Unsworth noted that the primary constraint to humanities in the digital age is
the current copyright landscape, limiting which primary sources can be accessed, shared …

Uncovering black fantastic: Piloting a word feature analysis and machine learning approach for genre classification

NN Parulian, R Dubnicek, G Worthey… - Proceedings of the …, 2022 - Wiley Online Library
Given the size of digital library collections and the inconsistencies in their genre‐related
bibliographic metadata, as digital libraries grow and their contents are opened for …

CH-Bench: a user-oriented benchmark for systems for efficient distant reading (design, performance, and insights)

J Willkomm, M Raster, M Schäler, K Böhm - International Journal on …, 2023 - Springer
Data science deals with the discovery of information from large volumes of data. The data
studied by scientists in the humanities include large textual corpora. An important objective …

Characterizing same work relationships in large-scale digital libraries

P Organisciak, S Shetenhelm, DFA Vasques… - … in Contemporary Society …, 2019 - Springer
As digital libraries grow, they are prompting new consideration into same-work relationships.
They provide unique opportunities for resource discovery, but their scale and aggregated …

"The library is open!": Open data and an open API for the HathiTrust Digital Library.

JA Walsh, G Layne-Worthey, J Jett, B Capitanu… - CHR, 2023 - jawalsh.github.io
This paper describes the history, policy, semantics, and uses of the HathiTrust Research
Center Extracted Features dataset, an open-access representation of the 17+ million volume …

Moving Past Metadata: Improving Digital Libraries with Content‐Based Methods

A VandenBosch, BM Schmidt… - Proceedings of the …, 2021 - Wiley Online Library
The growth of text mining and corpus analytic scholarship over large digital libraries brings
to light the issues created by text duplication and variation within collections that are not …

Improving text relationship modelling with artificial data

P Organisciak, M Ryan - Journal of Information Science, 2024 - journals.sagepub.com
Data augmentation uses artificially created examples to support supervised machine
learning, adding robustness to the resulting models and helping to account for limited …

Providing pin-point page-level precision to 1 trillion tokens of text for workset creation

D Bainbridge, JS Downie, B Capitanu - … of the 18th ACM/IEEE on Joint …, 2018 - dl.acm.org
We report on the work undertaken developing a web environment that allows users to
search over 1 trillion tokens of text--down to the page-level--of the HathiTrust Part-of-Speech …