The emergence of large multi‐institutional digital libraries has opened the door to aggregate‐level examinations of the published word. Such large‐scale analysis offers a new way to …
In 2004, John Unsworth noted that the primary constraint to humanities in the digital age is the current copyright landscape, limiting which primary sources can be accessed, shared …
Given the size of digital library collections and the inconsistencies in their genre‐related bibliographic metadata, as digital libraries grow and their contents are opened for …
J Willkomm, M Raster, M Schäler, K Böhm - International Journal on …, 2023 - Springer
Data science deals with the discovery of information from large volumes of data. The data studied by scientists in the humanities include large textual corpora. An important objective …
P Organisciak, S Shetenhelm, DFA Vasques… - … in Contemporary Society …, 2019 - Springer
As digital libraries grow, they are prompting new consideration into same-work relationships. They provide unique opportunities for resource discovery, but their scale and aggregated …
This paper describes the history, policy, semantics, and uses of the HathiTrust Research Center Extracted Features dataset, an open-access representation of the 17+ million volume …
A VandenBosch, BM Schmidt… - Proceedings of the …, 2021 - Wiley Online Library
The growth of text mining and corpus analytic scholarship over large digital libraries brings to light the issues created by text duplication and variation within collections that are not …
P Organisciak, M Ryan - Journal of Information Science, 2024 - journals.sagepub.com
Data augmentation uses artificially created examples to support supervised machine learning, adding robustness to the resulting models and helping to account for limited …
We report on the work undertaken developing a web environment that allows users to search over 1 trillion tokens of text--down to the page-level--of the HathiTrust Part-of-Speech …