Muehlhoff, Rainer; Henningsen, Marte
Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben Unpublished
Preprint on arXiv:2412.06651, 2024.
Keywords: AI, artificial intelligence, chatbots, correction, feedback, O
@unpublished{Muehlhoff2024,
title = {Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben},
author = {Rainer Muehlhoff and Marte Henningsen},
url = {https://media.ccc.de/v/38c3-chatbots-im-schulunterricht},
doi = {10.48550/arXiv.2412.06651},
year = {2024},
date = {2024-12-09},
urldate = {2024-12-09},
issue = {arXiv:2412.06651},
abstract = {This study examines the AI-powered grading tool "AI Grading Assistant" by the German company Fobizz, designed to support teachers in evaluating and providing feedback on student assignments. Against the societal backdrop of an overburdened education system and rising expectations for artificial intelligence as a solution to these challenges, the investigation evaluates the tool's functional suitability through two test series. The results reveal significant shortcomings: The tool's numerical grades and qualitative feedback are often random and do not improve even when its suggestions are incorporated. The highest ratings are achievable only with texts generated by ChatGPT. False claims and nonsensical submissions frequently go undetected, while the implementation of some grading criteria is unreliable and opaque. Since these deficiencies stem from the inherent limitations of large language models (LLMs), fundamental improvements to this or similar tools are not immediately foreseeable. The study critiques the broader trend of adopting AI as a quick fix for systemic problems in education, concluding that Fobizz's marketing of the tool as an objective and time-saving solution is misleading and irresponsible. Finally, the study calls for systematic evaluation and subject-specific pedagogical scrutiny of the use of AI tools in educational contexts.},
howpublished = {Preprint on arXiv:2412.06651},
keywords = {AI, artificial intelligence, chatbots, correction, feedback, O},
pubstate = {published},
tppubtype = {unpublished}
}
Hobert, Sebastian
How Are You, Chatbot? Evaluating Chatbots in Educational Settings – Results of a Literature Review Proceedings Article
In: Pinkwart, Niels; Konert, Johannes (Eds.): DELFI 2019, pp. 259–270, Gesellschaft für Informatik e.V., Bonn, 2019, ISBN: 978-3-88579-691-6.
Keywords: A, chatbots, evaluation, pedagogical conversational agents, technology-enhanced learning
@inproceedings{Hobert2019,
title = {How Are You, Chatbot? Evaluating Chatbots in Educational Settings – Results of a Literature Review},
author = {Sebastian Hobert},
editor = {Niels Pinkwart and Johannes Konert},
url = {https://dx.doi.org/10.18420/delfi2019_289},
doi = {10.18420/delfi2019_289},
isbn = {978-3-88579-691-6},
year = {2019},
date = {2019-09-16},
booktitle = {DELFI 2019},
pages = {259–270},
publisher = {Gesellschaft für Informatik e.V.},
address = {Bonn},
abstract = {Evaluation studies are essential for determining the utilization of technology-enhanced learning systems. Prior research often focuses on evaluating specific factors like the technology adoption or usability aspects. However, it needs to be questioned if evaluating only specific factors is appropriate in each case. The aim of this research paper is to outline which methods are suited for evaluating technology-enhanced learning systems in interdisciplinary research domains. Specifically, we focus our analysis on pedagogical conversational agents – i.e. learning systems that interact with learners using natural language. For instance, in addition to technology acceptance, further factors like learning success are more important in this case. Based on this assumption, we analyze the current state-of-the-art literature of pedagogical conversational agents to identify evaluation objectives, procedures and measuring instruments. Afterward, we use the results to propose a guideline for evaluations of pedagogical conversational agents.},
keywords = {A, chatbots, evaluation, pedagogical conversational agents, technology-enhanced learning},
pubstate = {published},
tppubtype = {inproceedings}
}