Harnessing Textbooks for High-Quality Labeled Data: An Approach to Automatic Keyword Extraction

Lorenzo Pozzi, Isaac Alpizar-Chacon, Sergey Sosnovsky

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

As textbooks evolve into digital platforms, they open a world of opportunities for Artificial Intelligence in Education (AIED) research. This paper delves into the novel use of textbooks as a source of high-quality labeled data for automatic keyword extraction, demonstrating an affordable and efficient alternative to traditional methods. By utilizing the wealth of structured information provided in textbooks, we propose a methodology for annotating corpora across diverse domains, circumventing the costly and time-consuming process of manual data annotation. Our research presents a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) fine-tuned on this newly labeled dataset. This model is applied to keyword extraction tasks, with the model’s performance surpassing established baselines. We further analyze the transformation of BERT’s embedding space before and after the fine-tuning phase, illuminating how the model adapts to specific domain goals. Our findings substantiate textbooks as a resource-rich, untapped well of high-quality labeled data, underpinning their significant role in the AIED research landscape.

Original language: English
Pages (from-to): 66-77
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 3444
Status: Published - 2023
Event: 5th International Workshop on Intelligent Textbooks, iTextbooks 2023 - Tokyo, Japan
Duration: 3 Jul 2023 → …
