Citation recommendation: approaches and datasets

M Färber, A Jatowt - International Journal on Digital Libraries, 2020 - Springer
Citation recommendation describes the task of recommending citations for a given text. Due
to the overload of published scientific works in recent years on the one hand, and the need …

Ungendered writing: Writing styles are unlikely to account for gender differences in funding rates in the natural and technical sciences

SPJM Horbach, JW Schneider, M Sainte-Marie - Journal of Informetrics, 2022 - Elsevier
Academia has traditionally faced a substantial gender gap in staff positions and career path
progression. Women do not advance up the academic career ladder at the same rate as …

Improving the accessibility of scientific documents: Current state, user needs, and a system solution to enhance scientific PDF accessibility for blind and low vision …

LL Wang, I Cachola, J Bragg, EYY Cheng… - arXiv preprint arXiv …, 2021 - arxiv.org
The majority of scientific papers are distributed in PDF, which poses challenges for
accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of …

A large dataset of scientific text reuse in Open-Access publications

L Gienapp, W Kircheis, B Sievers, B Stein, M Potthast - Scientific Data, 2023 - nature.com
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse
in Open-access publications. It contains 91 million cases of reused text passages found in …

Using machine learning to extract information and predict outcomes from reports of randomised trials of smoking cessation interventions in the Human …

R West, F Bonin, J Thomas, AJ Wright… - Wellcome Open …, 2023 - ncbi.nlm.nih.gov
Background: Using reports of randomised trials of smoking cessation interventions as a test
case, this study aimed to develop and evaluate machine learning (ML) algorithms for …

Automatic extraction of TEI structures in digitized lexical resources using conditional random fields

M Khemakhem, L Foppiano, L Romary - electronic lexicography, eLex …, 2017 - hal.science
An important number of digitized lexical resources remain unexploited due to their
unstructured content. Manually structuring such resources is a costly task given their …

SciA11y: Converting scientific papers to accessible HTML

LL Wang, I Cachola, J Bragg, EYY Cheng… - Proceedings of the 23rd …, 2021 - dl.acm.org
We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML.
SciA11y uses machine learning models to extract and understand the content of scientific …

R-classify: Extracting research papers' relevant concepts from a controlled vocabulary

T Aggarwal, A Salatino, F Osborne, E Motta - Software Impacts, 2022 - Elsevier
In the past few decades, we saw a proliferation of scientific articles available online. This
data-rich environment offers several opportunities but also challenges, since it is …

Enhancing usability for automatically structuring digitised dictionaries

M Khemakhem, A Herold, L Romary - GLOBALEX workshop at LREC …, 2018 - hal.science
The last decade has seen a rapid development of the number of NLP tools which have been
made available to the community. The usability of several e-lexicography tools represents a …

The digitization of historical astrophysical literature with highly localized figures and figure captions

JP Naiman, PKG Williams, A Goodman - International Journal on Digital …, 2024 - Springer
Scientific articles published prior to the “age of digitization” in the late 1990s contain figures
which are “trapped” within their scanned pages. While progress to extract figures and their …