An overview on machine translation evaluation

L Han - arXiv preprint arXiv:2202.11027, 2022 - arxiv.org
Since the 1950s, machine translation (MT) has become one of the important tasks of AI and
development, and has experienced several different periods and stages of development …

MedTem2.0: Prompt-based temporal classification of treatment events from discharge summaries

Y Cui, L Han, G Nenadic - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Discharge summaries are comprehensive medical records that encompass vital information
about a patient's hospital stay. A crucial aspect of discharge summaries is the temporal …

HOPE: A task-oriented and human-centric evaluation framework using professional post-editing towards more effective MT evaluation

S Gladkoff, L Han - arXiv preprint arXiv:2112.13833, 2021 - arxiv.org
Traditional automatic evaluation metrics for machine translation have been widely criticized
by linguists due to their low accuracy, lack of transparency, focus on language mechanics …

Investigating massive multilingual pre-trained machine translation models for clinical domain via transfer learning

L Han, G Erofeev, I Sorokina, S Gladkoff… - arXiv preprint arXiv …, 2022 - arxiv.org
Massively multilingual pre-trained language models (MMPLMs) are developed in recent
years demonstrating superpowers and the pre-knowledge they acquire for downstream …

Topic modelling of Swedish newspaper articles about coronavirus: a case study using Latent Dirichlet Allocation method

B Griciūtė, L Han, G Nenadic - 2023 IEEE 11th International …, 2023 - ieeexplore.ieee.org
Topic Modelling (TM) is a natural language processing (NLP) method for discovering topics
in a collection of documents. Being an unsupervised method, it is a valuable tool when trying …

Toward more effective human evaluation for machine translation

BS Fuentes, G Foster, M Freitag… - Proceedings of the 2nd …, 2022 - aclanthology.org
Improvements in text generation technologies such as machine translation have
necessitated more costly and time-consuming human evaluation procedures to ensure an …

cushLEPOR: customising hLEPOR metric using Optuna for higher agreement with human judgments or pre-trained language model LaBSE

L Han, I Sorokina, G Erofeev, S Gladkoff - arXiv preprint arXiv:2108.09484, 2021 - arxiv.org
Human evaluation has always been expensive while researchers struggle to trust the
automatic metrics. To address this, we propose to customise traditional metrics by taking …

Student's t-Distribution: On Measuring the Inter-Rater Reliability When the Observations are Scarce

S Gladkoff, L Han, G Nenadic - arXiv preprint arXiv:2303.04526, 2023 - arxiv.org
In natural language processing (NLP) we always rely on human judgement as the golden
quality evaluation method. However, there has been an ongoing debate on how to better …

A Comparative Study on Automatic Coding of Medical Letters with Explainability

J Glen, L Han, P Rayson, G Nenadic - arXiv preprint arXiv:2407.13638, 2024 - arxiv.org
This study aims to explore the implementation of Natural Language Processing (NLP) and
machine learning (ML) techniques to automate the coding of medical letters with visualised …

Readability metrics for machine translation in Dutch: Google vs. Azure & IBM

C van Toledo, M Schraagen, F van Dijk, M Brinkhuis… - Applied Sciences, 2023 - mdpi.com
This paper introduces a novel method to predict when a Google translation is better than
other machine translations (MT) in Dutch. Instead of considering fidelity, this approach …