Academia has traditionally faced a substantial gender gap in staff positions and career path progression. Women do not advance up the academic career ladder at the same rate as …
The majority of scientific papers are distributed as PDFs, which pose challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of …
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains 91 million cases of reused text passages found in …
Background Using reports of randomised trials of smoking cessation interventions as a test case, this study aimed to develop and evaluate machine learning (ML) algorithms for …
A large number of digitized lexical resources remain unexploited due to their unstructured content. Manually structuring such resources is a costly task given their …
We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML. SciA11y uses machine learning models to extract and understand the content of scientific …
In the past few decades, we saw a proliferation of scientific articles available online. This data-rich environment offers several opportunities but also challenges, since it is …
M Khemakhem, A Herold, L Romary - GLOBALEX workshop at LREC …, 2018 - hal.science
The last decade has seen a rapid increase in the number of NLP tools made available to the community. The usability of several e-lexicography tools represents a …
JP Naiman, PKG Williams, A Goodman - International Journal on Digital …, 2024 - Springer
Scientific articles published prior to the “age of digitization” in the late 1990s contain figures which are “trapped” within their scanned pages. While progress to extract figures and their …