The Potency of Consequential Validity Evidence in High-Stakes Assessment Practices

Youssef Oufela; Abdallah Ghaicha

doi:10.3968/13480

Abstract

Recent years have seen a growing body of research underscoring the importance of investigating the impact of test use (Imsa-ard, 2020; Pan & Roever, 2016; Saglam & Tsagari, 2022; Tsagari, 2011). In many contexts, the increased reliance of educational authorities and policymakers on high-stakes testing and standardized assessments has caused discontent and raised concerns about the consequences of these tests for different stakeholders. This is one of the leading factors behind the upsurge of studies that investigate and evaluate the impact and repercussions of test use. The present article discusses the role of consequential validity in high-stakes assessment practices. First, it briefly draws on the historical and theoretical background underpinning the concept of consequential validity. Second, it sheds light on the contentious debate surrounding the concept in the existing literature. Third, it briefly addresses the issue of bias and unfairness in test use. Fourth, it synthesizes findings from numerous studies on the unintended consequences of high-stakes assessments. Finally, it concludes with implications for different stakeholders: future researchers, policymakers, test designers, and classroom teachers.

Keywords

Consequential validity; High-stakes assessment practices; Validity; High-stakes tests; Test consequences; Test use; Washback; Stakeholders

References

Abbas, A., & Thaheem, S. S. (2018). Washback impact on teachers’ instruction resulting from students’ apathy. Research on Humanities and Social Sciences, 8(6), 45-54.

Aftab, A., Qureshi, S., & William, I. (2014). Investigating the washback effect of the Pakistani Intermediate English Examination. International Journal of English and Literature, 5(7), 149-154. https://doi.org/10.5897/ijel2013.0521

Al Amin, M., & Greenwood, J. (2018). The examination system in Bangladesh and its impact: on curriculum, students, teachers and society. Language Testing in Asia, 8(1), 1-18. https://doi.org/10.1186/s40468-018-0060-9

Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: a study of washback. Language Testing, 13(3), 280-297. https://doi.org/10.1177/026553229601300304

Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14(2), 115-129. https://doi.org/10.1093/applin/14.2.115

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association, American Psychological Association, & National Council on Measurement in Education.

Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.

Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford, UK: Oxford University Press.

Bachman, L. F., & Palmer, A. (2010). Language Assessment in Practice: Developing Language Assessments and Justifying Their Use in the Real World. Oxford: Oxford University Press.

Banerjee, H. L. (2016). Test Fairness in Second Language Assessment. Studies in Applied Linguistics and TESOL, 16(1), 54-59. https://doi.org/10.7916/d88g8z90

Barnes, M. (2016). The Washback of the TOEFL iBT in Vietnam. Australian Journal of Teacher Education, 41(7), 158-174. https://doi.org/10.14221/ajte.2016v41n7.10

Black, P. (1998). Formative assessment: raising standards inside the classroom. The School Science Review, 80(291), 39-46. https://eric.ed.gov/?id=EJ580558

Black, P. (1998). Testing, Friend or Foe? The Theory and Practice of Assessment and Testing. London: The Falmer Press.

Black, P., & Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102

Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2003). Assessment for Learning - Putting It into Practice. Maidenhead, UK: Open University Press.

Blazer, C. (2011). Unintended Consequences of High-Stakes Testing. Information Capsule. Research Services, 1008, 1-21. http://files.eric.ed.gov/fulltext/ED536512.pdf

Borsboom, D., & Wijsen, L. D. (2016). Frankenstein’s validity monster: the value of keeping politics and science separated. Assessment in Education: Principles, Policy & Practice, 23(2), 281-283. https://doi.org/10.1080/0969594x.2016.1141750

Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The Concept of Validity. Psychological Review, 111(4), 1061-1071. https://doi.org/10.1037/0033-295x.111.4.1061

Brookhart, S. M. (2001). Successful students’ formative and summative uses of assessment information. Assessment in Education: Principles, Policy & Practice, 8(2), 153-169. https://doi.org/10.1080/09695940123775

Burns, T. D., Brockmeier, L. L., Green, R. B., Tsemunhu, R., & Rieger, A. (2020). Special educators’ views about the effects of high stakes testing. Journal of Liberal Arts and Humanities, 1(8), 48-62.

Chang, C. H., & Seow, T. (2018). Geographical education that matters - A critical discussion of consequential validity in assessment of school geography. Geographical Education, 31, 31-40.

Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254-272. https://doi.org/10.1017/s0267190599190135

Cheng, L., & Curtis, A. (2004). Washback or washout: A review of the impact of testing on teaching and learning. In L. Cheng, Y. Watanabe, & A. Curtis (Eds.), Washback in Language Testing: Research Context and Methods (pp. 3-17). Mahwah, NJ: Lawrence Erlbaum Associates. https://doi.org/10.4324/9781410609731-9

Cheng, L., & Hong, W. (2004). Understanding professional challenges faced by Chinese teachers of English. TESL-EJ, 7(4). http://files.eric.ed.gov/fulltext/EJ1068090.pdf

Cizek, G. J. (2012). Defining and distinguishing validity: Interpretations of score meaning and justifications of test use. Psychological Methods, 17(1), 31-43. https://doi.org/10.1037/a0026975

Cizek, G. J., Bowen, D., & Church, K. (2010). Sources of Validity Evidence for Educational and Psychological Tests: A Follow-Up Study. Educational and Psychological Measurement, 70(5), 732-743. https://doi.org/10.1177/0013164410379323

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302. https://doi.org/10.1037/h0040957

Dayal, H. C., & Lingam, G. I. (2015). Fijian teachers’ conceptions of assessment. Australian Journal of Teacher Education, 40(8), 42-58. https://doi.org/10.14221/ajte.2015v40n8.3

Dong, M., & Liu, X. (2022). Impact of learners’ perceptions of a high-stakes test on their learning motivation and learning time allotment: A study on the washback mechanism. Heliyon, 8(12), 1-9. https://doi.org/10.1016/j.heliyon.2022.e11910

Dong, M., Fan, J., & Xu, J. (2021). Differential washback effects of a high-stakes test on students’ English learning process: evidence from a large-scale stratified survey in China. Asia Pacific Journal of Education, 43(1), 252-269. https://doi.org/10.1080/02188791.2021.1918057

Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. Routledge.

Ghaicha, A. (2016). Theoretical framework for educational assessment: A Synoptic review. Journal of Education and Practice, 7(24), 212-231. http://files.eric.ed.gov/fulltext/EJ1112912.pdf

Ghaicha, A., & Oufela, Y. (2020). Backwash in higher education: Calibrating assessment and swinging the pendulum From Summative Assessment. Canadian Social Science, 16(11), 1-6. https://doi.org/10.3968/11905

Ghaicha, A., & Oufela, Y. (2021). Moroccan EFL secondary school teachers’ current practices and challenges of formative assessment. Canadian Social Science, 17(1), 1-15. https://doi.org/10.3968/12015

Gipps, C. V. (1994). Beyond Testing: Towards a Theory of Educational Assessment. London: The Falmer Press.

Gunn, J., Al-Bataineh, A., & Al-Rub, M. A. (2016). Teachers’ perceptions of high-stakes testing. International Journal of Teaching and Education, 4(2), 49-62. https://doi.org/10.20472/te.2016.4.2.003

Hazaea, A. N., & Tayeb, A. Y. (2018). Washback effect of LOBELA on EFL teaching at preparatory year of Najran University. International Journal of Educational Investigations, 5(3), 1-14.

Hubley, A. M., & Zumbo, B. D. (2011). Validity and the consequences of test interpretation and use. Social Indicators Research, 103(2), 219-230. https://doi.org/10.1007/s11205-011-9843-4

Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.

Iliescu, D., & Greiff, S. (2021). On consequential validity [Editorial]. European Journal of Psychological Assessment, 37(3), 163-166. https://doi.org/10.1027/1015-5759/a000664

Im, G., Shin, D., & Cheng, L. (2019). Critical review of validation models and practices in language testing: their limitations and future directions for validation research. Language Testing in Asia, 9(1), 1-26. https://doi.org/10.1186/s40468-019-0089-4

Imsa-ard, P. (2020). Voices from Thai EFL teachers: perceptions and beliefs towards the English Test in the National Examination in Thailand. Language Education and Acquisition Research Network Journal, 13(2), 269-289.

Jaenes, P. V. (2017). Testing writing: the washback on “Cambridge English: first” preparation courses in southern Spain. Working Papers on English Studies, 24(4), 75-113.

Kane, M. T. (2013). Validating the Interpretations and Uses of Test Scores. Journal of Educational Measurement, 50(1), 1-73. https://doi.org/10.1111/jedm.12000

Kunnan, A. J. (2004). Test fairness. In M. Milanovic & C. Weir (Eds.), European Year of Languages Conference Papers, Barcelona (pp.27-48). Cambridge University Press.

Lane, S. (2014). Validity evidence based on testing consequences. Psicothema, 26(1), 127-135. https://doi.org/10.7334/psicothema2013.258

Larsson, M., & Olin-Scheller, C. (2020). Adaptation and resistance: washback effects of the national test on upper secondary Swedish teaching. The Curriculum Journal, 31(4), 687-703. https://doi.org/10.1002/curj.31

McNamara, T. (2000). Language Testing. Oxford University Press.

McNamara, T., & Roever, C. (2006a). Language Testing: The Social Dimension. Oxford: Blackwell.

McNamara, T., & Roever, C. (2006b). Language testing: the social dimension. International Journal of Applied Linguistics, 16(2), 242-258. https://doi.org/10.1111/j.1473-4192.2006.00117.x

McNamara, T., & Ryan, K. A. (2011). Fairness versus justice in language testing: The Place of English literacy in the Australian Citizenship Test. Language Assessment Quarterly, 8(2), 161-178. https://doi.org/10.1080/15434303.2011.565438

Mehrens, W. A. (1997). The Consequences of consequential validity. Educational Measurement: Issues and Practice, 16(2), 16-18. https://doi.org/10.1111/j.1745-3992.1997.tb00588.x

Meijer, H., Brouwer, J., Hoekstra, R., & Strijbos, J. (2022). Exploring Construct and Consequential Validity of Collaborative Learning Assessment in Higher Education. Small Group Research, 53(6), 891-925. https://doi.org/10.1177/10464964221095545

Messick, S. (1995). Standards of validity and the validity of standardizing performance assessment. Educational Measurement: Issues and Practice, 14(4), 5-8.

Messick, S. (1987). Validity. Educational Testing Service, Princeton, N. J., 1-209.

Messick, S. (1990). Validity of test interpretation and use. Educational Testing Service, Princeton, N. J., 1-33.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.

Messick, S. (1994). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Educational Testing Service, Princeton, N. J., 1-33.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741-749. https://doi.org/10.1037/0003-066x.50.9.741

Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3), 241-256. https://doi.org/10.1177/026553229601300302

Messick, S. (1998). Test validity: A Matter of consequence. Social Indicators Research, 45, 35-44.

Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and Assessment in Teaching (10th ed.). Upper Saddle River: Pearson Education Ltd.

Moss, P. (1992). Shifting conceptions of validity in educational measurement: implications for performance assessment. Review of Educational Research, 62(3), 229-258. https://doi.org/10.3102/00346543062003229

Moss, P. (2016). Shifting the focus of validity for test use. Assessment in Education: Principles, Policy & Practice, 23(2), 236-251. https://doi.org/10.1080/0969594x.2015.1072085

Onaiba, A. E. (2015). Impact of a public examination change on teachers’ perceptions and attitudes towards their classroom teaching practices. Journal of Research & Method in Education, 5(1), 70-78.

Pan, Y., & Roever, C. (2016). Consequences of test use: a case study of employers’ voice on the social impact of English certification exit requirements in Taiwan. Language Testing in Asia, 6(1), 1-21. https://doi.org/10.1186/s40468-016-0029-5

Popham, W. J. (1997). Consequential validity: Right concern-wrong concept. Educational Measurement: Issues and Practice, 16(2), 9-13. https://doi.org/10.1111/j.1745-3992.1997.tb00586.x

Reckase, M. D. (1998). Consequential validity from the test developer’s perspective. Educational Measurement: Issues and Practice, 17(2), 13-16. https://doi.org/10.1111/j.1745-3992.1998.tb00827.x

Saglam, A. L. G., & Tsagari, D. (2022). Evaluating perceptions towards the consequential validity of Integrated Language Proficiency Assessment. Languages, 7(1), 65. https://doi.org/10.3390/languages7010065

Salehi, H., Yunus, M. M., & Salehi, Z. (2012). Teachers’ Perceptions of High-Stakes Tests: A Washback Study. International Journal of Social Science and Humanity, 2(1), 70-74.

Segool, N., Carlson, J. E., Goforth, A. N., Von Der Embse, N., & Barterian, J. A. (2013). Heightened test anxiety among young children: Elementary school students’ anxious responses to high-stakes testing. Psychology in the Schools, 50(5), 489-499. https://doi.org/10.1002/pits.21689

Shaw, S., & Crisp, V. (2011). Tracing the evolution of validity in educational measurement: past issues and contemporary challenges. Research Matters, 11, 4-19.

Shaw, S., & Crisp, V. (2015). Reflections on a framework for validation - Five years on. Research Matters, 19, 31-37.

Shepard, L. A. (1997). The Centrality of test use and consequences for test validity. Educational Measurement: Issues and Practice, 16(2), 5-24. https://doi.org/10.1111/j.1745-3992.1997.tb00585.x

Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: washback effect over time. Language Testing, 13(3), 298-317. https://doi.org/10.1177/026553229601300305

Sireci, S. G. (1998). Gathering and analyzing content validity data. Educational Assessment, 5(4), 299-321. https://doi.org/10.1207/s15326977ea0504_2

Slomp, D., Corrigan, J. A., & Sugimoto, T. (2014). A Framework for using consequential validity evidence in evaluating large-scale writing assessments: A Canadian Study. Research in the Teaching of English, 48(3), 276-302. https://opus.uleth.ca/bitstream/10133/3660/1/David_H_Slomp.pdf

Spratt, M. (2005). Washback and the classroom: the implications for teaching and learning of studies of washback from exams. Language Teaching Research, 9(1), 5-29. https://doi.org/10.1191/1362168805lr152oa

Stobart, G., & Eggen, T. J. H. M. (2012). High-stakes testing - value, fairness and consequences. Assessment in Education: Principles, Policy & Practice, 19(1), 1-6. https://doi.org/10.1080/0969594x.2012.639191

Stoneman, B. W. H. (2006). The impact of an exit English test on Hong Kong undergraduates: A study investigating the effects of test status on students’ test preparation behaviours. ProQuest Dissertations and Theses. The Hong Kong Polytechnic University. Retrieved from: https://theses.lib.polyu.edu.hk/handle/200/5489

Sukyadi, D., & Mardiani, R. (2011). The Washback Effect of the English National Examination (ENE) on English Teachers’ Classroom Teaching and Students’ Learning. K@ta, 13(1), 96-111. https://doi.org/10.9744/kata.13.1.96-111

Sultana, N. (2018). Investigating the relationship between washback and curriculum alignment: A literature review. Canadian Journal for New Scholars in Education, 9(2), 151-158. https://journalhosting.ucalgary.ca/index.php/cjnse/article/download/53107/pdf

Tajeddin, Z., Khatib, M., & Mahdavi, M. (2022). Critical language assessment literacy of EFL teachers: Scale construction and validation. Language Testing, 39(4), 649-678. https://doi.org/10.1177/02655322211057040

Tsagari, D. (2011). Washback of a high-stakes English exam on teachers’ perceptions and practices. Selected Papers on Theoretical and Applied Linguistics, 19, 431-445. https://doi.org/10.26262/istal.v19i0.5521

Van Bao, N., & Cho, Y. (2022). How the High-Stakes and College Entrance Exam Affects Students’ Perception: Implication on Management Policy in Higher Education. East Asian Journal of Business Economics, 10(1), 83-94.

Van der Walt, J. L., & Steyn, H. S. (2008). The validation of language tests. Stellenbosch Papers in Linguistics, 38, 191-204. https://doi.org/10.5774/38-0-29

Volante, L. (2004). Teaching to the test: What every educator and policymaker should know. Canadian Journal of Educational Administration and Policy, 35, 1-6. http://files.eric.ed.gov/fulltext/EJ848235.pdf

Volante, L., & Beckett, D. (2011). Formative Assessment and the Contemporary Classroom: Synergies and Tensions between Research and Practice. Canadian Journal of Education, 34(2), 239-255. http://files.eric.ed.gov/fulltext/EJ936752.pdf

Von Der Embse, N. P., Jester, D., Roy, D., & Post, J. (2018). Test anxiety effects, predictors, and correlates: A 30-year meta-analytic review. Journal of Affective Disorders, 227, 483-493. https://doi.org/10.1016/j.jad.2017.11.048

Wall, D., & Alderson, J. C. (1993). Examining washback: the Sri Lankan Impact Study. Language Testing, 10(1), 41-69. https://doi.org/10.1177/026553229301000103

Wallace, M. P. (2018). Fairness and justice in L2 classroom assessment: Perceptions from test takers. Journal of Asia TEFL, 15(4), 1051-1064. https://doi.org/10.18823/asiatefl.2018.15.4.11.1051

Wei, W. (2017). A Critical review of washback Studies: Hypothesis and evidence. In Second language learning and teaching. Springer International Publishing, 49-67. https://doi.org/10.1007/978-3-319-32601-6_4

Weir, C. J. (2005). Language Testing and Validation: An Evidence-based Approach. Basingstoke: Palgrave Macmillan.

Zhang, X. (2021). Stakeholders’ test perceptions on test reform. Studies in Educational Evaluation, 70, 1-9.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Articles published in Higher Education of Social Science are licensed under Creative Commons Attribution 4.0 (CC-BY).
