Educational Data Mining: Discovering Principal Factors for Better Academic Performance

Abstract

The past decades have witnessed the vigorous development of new technologies in the educational field. Among them, Educational Data Mining (EDM) has played an indispensable role in pedagogical improvement, enabling researchers to discover useful knowledge from education-oriented databases. This study groups student-related and parent-related variables into three categories: demographic and family background information (Demographic), self-perceived willingness for education (Willingness), and perceived family interaction (Interaction). Applying various EDM methodologies, such as linear regression, regression tree, random forest, and neural network, to the China Education Panel Survey (CEPS), a nationally representative dataset, it is the first attempt to conduct a comprehensive and quantitative investigation into the principal factors that influence Chinese junior high school students’ academic performance. The study further summarizes, explains, and compares the principal factors discovered by the different EDM techniques, and proposes two practical strategies for mitigating China’s educational inequality.

Publication
2021 the 3rd International Conference on Big Data Engineering and Technology