Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

H Park, S Lee, G Gim, Y Kim, D Kim, C Park - arXiv preprint arXiv …, 2024 - arxiv.org
To address the challenges associated with data processing at scale, we propose Dataverse,
a unified open-source Extract-Transform-Load (ETL) pipeline for large language models …