TY - GEN
T1 - Using GPT-3 as a Text Data Augmentator for a Complex Text Detector
AU - Romero-Sandoval, Mario
AU - Calderon-Ramirez, Saul
AU - Solis, Martin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In this work, we explore the problem of complex text detection, a frequent challenge when implementing text simplification pipelines. Identifying complex text segments first allows text simplification models to be triggered only where they are needed, which makes better use of resources, since state-of-the-art Large Language Models are expensive to use. We focus on Spanish, which is under-represented given the lack of simple/complex paired datasets. We use a novel paired dataset of financial educational texts in Spanish to train and test our methods. To improve the performance of the classifier, we propose using text simplifications generated with GPT-3 as a data augmenter, alleviating the need to label a large number of text segments as simple or complex. We use BETO (Spanish BERT), a BERT model pre-trained on Spanish data, and explore the effect of augmenting the target data on model performance.
AB - In this work, we explore the problem of complex text detection, a frequent challenge when implementing text simplification pipelines. Identifying complex text segments first allows text simplification models to be triggered only where they are needed, which makes better use of resources, since state-of-the-art Large Language Models are expensive to use. We focus on Spanish, which is under-represented given the lack of simple/complex paired datasets. We use a novel paired dataset of financial educational texts in Spanish to train and test our methods. To improve the performance of the classifier, we propose using text simplifications generated with GPT-3 as a data augmenter, alleviating the need to label a large number of text segments as simple or complex. We use BETO (Spanish BERT), a BERT model pre-trained on Spanish data, and explore the effect of augmenting the target data on model performance.
KW - GPT-3
KW - Text Complexity Detection
KW - Text Simplification
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85184349315&partnerID=8YFLogxK
U2 - 10.1109/BIP60195.2023.10379347
DO - 10.1109/BIP60195.2023.10379347
M3 - Conference contribution
AN - SCOPUS:85184349315
T3 - 5th IEEE International Conference on BioInspired Processing, BIP 2023
BT - 5th IEEE International Conference on BioInspired Processing, BIP 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on BioInspired Processing, BIP 2023
Y2 - 28 November 2023 through 30 November 2023
ER -