Evolutionary learning of selection Hyper-Heuristics for choosing the right method in Text Classification Problems

JONATHAN DE JESUS ESTRELLA RAMIREZ

Please use this identifier to cite or link to this item: http://repositorio.ugto.mx/handle/20.500.12059/9604

Full metadata record

DC Field	Value	Language
dc.rights.license	http://creativecommons.org/licenses/by-nc-nd/4.0	es_MX
dc.contributor	JUAN CARLOS GOMEZ CARRANZA	es_MX
dc.creator	JONATHAN DE JESUS ESTRELLA RAMIREZ	es_MX
dc.date.accessioned	2023-10-05T16:31:52Z	-
dc.date.available	2023-10-05T16:31:52Z	-
dc.date.issued	2023-09	-
dc.identifier.uri	http://repositorio.ugto.mx/handle/20.500.12059/9604	-
dc.description.abstract	La clasificación de textos es una tarea común en diferentes áreas de aprendizaje de máquina y tiene muchas aplicaciones. Dentro de esta tarea, existen diversos problemas tales como filtrado de correo electrónico, detección de noticias falsas, detección de sentimientos, etc. Diferentes conjuntos de datos pueden ser usados dependiendo del problema, pero el método de clasificación óptimo puede ser específico para cada uno. Sin embargo, el proceso para encontrar este método óptimo es un problema complicado. En el ámbito de aprendizaje de máquina automatizado, diferentes enfoques han sido desarrollados para atacar este problema; los más recientes basados en aprendizaje profundo. En este proyecto de tesis, un modelo evolutivo, con el objetivo de generalizar la selección de métodos en problemas de clasificación de textos mediante hiper-heurísticas de selección, es presentado. Dado un conjunto de datos, este es caracterizado mediante un grupo de 16 meta-características estadísticas que representan su distribución de datos. Una hiper-heurística consta de un conjunto de reglas de la forma si-entonces, donde cada regla evalúa el grupo de meta-características para así determinar el método de clasificación adecuado para tal conjunto de datos. El modelo evolutivo parte de la creación de una población inicial de hiper-heurísticas, que con el paso de las generaciones, es evolucionada mediante operadores de cruza y mutación específicos. Durante cada generación, el desempeño de las hiper-heurísticas es evaluado mediante un grupo de conjuntos de datos de entrenamiento. En la última generación, la hiper-heurística con el mejor desempeño es seleccionada, y su generalización final es determinada con un grupo de conjuntos de datos independiente. Los resultados y análisis indican que la mejor hiper-heurística aprendida, además de contar con una buena generalización, es más eficiente que dos sistemas de aprendizaje de máquina automatizados del estado del arte, con desempeños generales muy similares.	es_MX
dc.language.iso	eng	es_MX
dc.publisher	Universidad de Guanajuato	es_MX
dc.rights	info:eu-repo/semantics/openAccess	es_MX
dc.subject.classification	CIS- Maestría en Ingeniería Eléctrica (Instrumentación y Sistemas Digitales)	es_MX
dc.title	Evolutionary learning of selection Hyper-Heuristics for choosing the right method in Text Classification Problems	es_MX
dc.type	info:eu-repo/semantics/masterThesis	es_MX
dc.creator.id	info:eu-repo/dai/mx/cvu/806210	es_MX
dc.subject.cti	info:eu-repo/classification/cti/7	es_MX
dc.subject.cti	info:eu-repo/classification/cti/33	es_MX
dc.subject.cti	info:eu-repo/classification/cti/3304	es_MX
dc.subject.keywords	Text classification	en
dc.subject.keywords	Machine learning	en
dc.subject.keywords	Deep learning	en
dc.subject.keywords	Hyper-heuristics	en
dc.subject.keywords	Data processing	en
dc.subject.keywords	Clasificación de textos	es_MX
dc.subject.keywords	Aprendizaje automático	es_MX
dc.subject.keywords	Aprendizaje profundo	es_MX
dc.subject.keywords	Hiperheurística	es_MX
dc.subject.keywords	Procesamiento de datos	es_MX
dc.contributor.id	info:eu-repo/dai/mx/cvu/37720	es_MX
dc.contributor.role	director	es_MX
dc.type.version	info:eu-repo/semantics/publishedVersion	es_MX
dc.description.abstractEnglish	Text classification is a common task in various areas of machine learning and has many applications. Within this task there are different problems, such as email filtering, fake news detection, and sentiment detection. Different datasets can be used depending on the problem, but the optimal classification method can be specific to each one, and finding that optimal method is itself a complicated problem. In the field of automated machine learning, different approaches have been developed to address this problem, the most recent ones based on deep learning. This thesis presents an evolutionary model that aims to generalize the selection of methods in text classification problems through selection hyper-heuristics. A given dataset is characterized by a group of 16 statistical meta-features that represent its data distribution. A hyper-heuristic consists of a set of if-then rules, where each rule evaluates the group of meta-features to determine the appropriate classification method for that dataset. The evolutionary model begins by creating an initial population of hyper-heuristics, which is evolved over the generations through specific crossover and mutation operators. During each generation, the performance of the hyper-heuristics is evaluated on a group of training datasets. In the last generation, the hyper-heuristic with the best performance is selected, and its final generalization is determined with an independent group of datasets. The results and analysis indicate that the best learned hyper-heuristic, in addition to generalizing well, is more efficient than two state-of-the-art automated machine learning systems, with very similar overall performance.	en
Appears in Collections:	Maestría en Ingeniería Eléctrica (Instrumentación y Sistemas Digitales)

Files in This Item:

File	Description	Size	Format
JONATHÁN DE JESÚS ESTRELLA RAMÍREZ_Tesis24.pdf		6.95 MB	Adobe PDF
