Egitim ve Bilim, vol.49, no.217, pp.201-223, 2024 (SSCI)
This study aimed to detect Differential Item Functioning (DIF) with several methods in items from the Program for International Student Assessment (PISA) 2018 reading test administered in Turkey, and to compare the performance of those methods. Given the individualized test design, the analyses used item packages from the same test form(s) in the core, first-stage, and second-stage sections. The second package (Core RC2), second package (Stage 1-R12H), and third package (Stage 2-R23H) were selected for the core, first, and second stages, respectively. Three partial-credit items in these packages were excluded, and the 33 common dichotomously scored (1-0) items were analyzed. The sample comprised 147 Turkish students who responded to these item packages. The variables of gender (ST004D01T), school location (SC001Q01TA), and the index of economic, cultural and social status (ESCS) were drawn from the student- and school-level data and merged with the cognitive test items. Prior to data analysis, the dataset was organized, missing data and outliers were examined, and the assumptions of the underlying theories were tested. The Mantel-Haenszel (MH), Logistic Regression (LR), SIBTEST, and Raju’s Area Measures methods were employed for variables with two categories, and the Generalized MH, Generalized LR, and Generalized Lord’s χ² methods were used for variables with three categories. For the gender variable, two, four, and three items showed DIF under the MH, SIBTEST, and LR methods, respectively, whereas in Raju’s Area Measures 17 items displayed DIF according to the unsigned area test and seven items according to the signed area test.
For the ESCS variable, two items and one item showed DIF under MH and LR, respectively, while in Raju’s Area Measures 15 items showed DIF according to the unsigned area test and eight items according to the signed area test. No item showed DIF under the SIBTEST method. For the school location variable, one, two, and 28 items showed DIF under the Generalized MH, Generalized LR, and Generalized Lord’s χ² methods, respectively. The results indicate that although the Classical Test Theory (CTT)-based and Item Response Theory (IRT)-based DIF methods are compatible, they differ in the level of DIF detected: IRT-based methods flag more DIF items than CTT-based methods. Additionally, the Generalized MH and Generalized LR methods produced similar results.
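To make the MH procedure named above more concrete, the following is a minimal Python sketch of the standard Mantel-Haenszel DIF statistics for one dichotomous item and a two-category grouping variable. It is an illustration only and not the software or code used in the study; the function name `mantel_haenszel_dif`, its arguments, and the coding of the reference group as 0 and the focal group as 1 are assumptions made for this example. Examinees are stratified on a matching score, a 2x2 table (group by correct/incorrect) is formed in each stratum, and the common odds ratio, the ETS delta value, and the continuity-corrected chi-square are computed.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(item, group, total):
    """Mantel-Haenszel DIF statistics for one dichotomous item (illustrative sketch).

    item  : 0/1 item scores
    group : 0 = reference group, 1 = focal group (assumed coding)
    total : matching score used to form the strata
    Returns (alpha_MH, delta_MH, chi2_MH).
    """
    # Per stratum: [A, B, C, D] = ref correct, ref incorrect,
    # focal correct, focal incorrect.
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, t in zip(item, group, total):
        idx = (0 if y == 1 else 1) + (0 if g == 0 else 2)
        strata[t][idx] += 1

    num = den = sum_a = sum_e = sum_var = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d
        # Skip strata that cannot contribute (one group missing or n < 2).
        if n < 2 or n_ref == 0 or n_foc == 0:
            continue
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m1 / n                      # E(A_k) under no DIF
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))

    if num == 0 or den == 0 or sum_var == 0:
        raise ValueError("statistics undefined for these data")

    alpha = num / den                                # common odds ratio
    delta = -2.35 * math.log(alpha)                  # ETS delta scale
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_var  # 1 df, continuity-corrected
    return alpha, delta, chi2
```

An odds ratio near 1 (delta near 0) suggests no DIF for the item; the chi-square value is compared against a chi-square distribution with one degree of freedom. Purification of the matching score and effect-size classification, which applied studies often add, are omitted here.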