Using GPT-3 as a Text Data Augmentator for a Complex Text Detector

Mario Romero-Sandoval, Saul Calderon-Ramirez, Martin Solis

Research output: Chapter in book/report/conference proceeding › Conference contribution › peer-reviewed

3 Citations (Scopus)

Abstract

In this work, we explore the problem of complex text detection, a frequent challenge when implementing text simplification pipelines. Identifying complex text segments makes it possible to trigger text simplification models only where they are needed, which makes better use of resources, since state-of-the-art Large Language Models are expensive to use. We focus on Spanish, which is under-represented given the lack of paired simple/complex datasets. We use a novel paired dataset of financial educational texts in Spanish to train and test our methods. To improve the performance of the classifier, we propose using text simplifications generated with GPT-3 as a data augmenter, alleviating the need to label a large number of text segments as simple or complex. We use Spanish BERT (BETO), a BERT model pre-trained on Spanish data, and explore the effect of augmenting the target data on model performance.
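
The two ideas in the abstract, using generated simplifications as extra training examples and calling the simplifier only for segments flagged as complex, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the label encoding, the `Example` type, and the toy `toy_simplify` and `toy_is_complex` functions, which merely stand in for GPT-3 and a fine-tuned BETO classifier and do not reproduce the authors' implementation. In particular, labelling each generated simplification as simple is one plausible reading of the abstract, not a confirmed detail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical label encoding; the paper does not specify one.
COMPLEX, SIMPLE = 1, 0


@dataclass(frozen=True)
class Example:
    text: str
    label: int


def augment_with_simplifications(
    complex_texts: Iterable[str],
    simplify: Callable[[str], str],
) -> List[Example]:
    """Build labelled pairs from complex segments and generated simplifications.

    Assumption: each original segment is labelled COMPLEX and its generated
    simplification SIMPLE, so the generated text needs no manual labelling.
    """
    examples: List[Example] = []
    for text in complex_texts:
        examples.append(Example(text, COMPLEX))
        examples.append(Example(simplify(text), SIMPLE))
    return examples


def simplify_if_complex(
    text: str,
    is_complex: Callable[[str], bool],
    simplify: Callable[[str], str],
) -> str:
    """Call the (expensive) simplifier only when the detector flags the text."""
    return simplify(text) if is_complex(text) else text


# Toy stand-ins so the sketch runs without any model or network access.
def toy_simplify(text: str) -> str:
    # Placeholder for a GPT-3 call: keep only the first three words.
    return " ".join(text.split()[:3])


def toy_is_complex(text: str) -> bool:
    # Placeholder for a BETO classifier: flag segments longer than three words.
    return len(text.split()) > 3


if __name__ == "__main__":
    data = augment_with_simplifications(["uno dos tres cuatro cinco"], toy_simplify)
    for example in data:
        print(example.label, example.text)
    print(simplify_if_complex("uno dos", toy_is_complex, toy_simplify))
```

In a real pipeline the augmented examples would be used to fine-tune the classifier, and `simplify_if_complex` would wrap the trained detector so that the language model is queried only for segments predicted to be complex.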

Original language: English
Title of host publication: 5th IEEE International Conference on BioInspired Processing, BIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350330052
DOI
Status: Published - 2023
Event: 5th IEEE International Conference on BioInspired Processing, BIP 2023 - San Carlos, Alajuela, Costa Rica
Duration: 28 Nov 2023 to 30 Nov 2023

Publication series

Name: 5th IEEE International Conference on BioInspired Processing, BIP 2023

Conference

Conference: 5th IEEE International Conference on BioInspired Processing, BIP 2023
Country/Territory: Costa Rica
City: San Carlos, Alajuela
Period: 28/11/23 to 30/11/23
