Möller, Moritz; Nirmal, Gargi; Fabietti, Dario; Stierstorfer, Quintus; Zakhvatkin, Mark; Sommerfeld, Holger; Schütt, Sven
Revolutionising Distance Learning: A Comparative Study of Learning Progress with AI-Driven Tutoring Miscellaneous
Preprint, 2024.
Abstract | Links | BibTeX | Keywords: A, artificial intelligence, higher education, large language models, university teaching
@misc{Möller2024,
title = {Revolutionising Distance Learning: A Comparative Study of Learning Progress with AI-Driven Tutoring},
author = {Moritz Möller and Gargi Nirmal and Dario Fabietti and Quintus Stierstorfer and Mark Zakhvatkin and Holger Sommerfeld and Sven Schütt},
url = {https://arxiv.org/abs/2403.14642v1},
doi = {10.48550/arXiv.2403.14642},
year = {2024},
date = {2024-03-21},
issue = {arXiv:2403.14642v1},
abstract = {Generative AI is expected to have a vast, positive impact on education; however, at present, this potential has not yet been demonstrated at scale at university level. In this study, we present first evidence that generative AI can increase the speed of learning substantially in university students. We tested whether using the AI-powered teaching assistant Syntea affected the speed of learning of hundreds of distance learning students across more than 40 courses at the IU International University of Applied Sciences. Our analysis suggests that using Syntea reduced their study time substantially--by about 27% on average--in the third month after the release of Syntea. Taken together, the magnitude of the effect and the scalability of the approach implicate generative AI as a key lever to significantly improve and accelerate learning by personalisation.},
howpublished = {Preprint},
keywords = {A, artificial intelligence, higher education, large language models, university teaching},
pubstate = {published},
tppubtype = {misc}
}
Balepur, Nishant; Ravichander, Abhilasha; Rudinger, Rachel
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? Miscellaneous
In-progress preprint, 2024.
Abstract | Links | BibTeX | Keywords: artificial intelligence, KI, large language models, LLM, multiple choice, O
@misc{balepur2024artifacts,
title = {Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?},
author = {Nishant Balepur and Abhilasha Ravichander and Rachel Rudinger},
url = {https://doi.org/10.48550/arXiv.2402.12483},
doi = {10.48550/arXiv.2402.12483},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
abstract = {Multiple-choice question answering (MCQA) is often used to evaluate large language models (LLMs). To see if MCQA assesses LLMs as intended, we probe if LLMs can perform MCQA with choices-only prompts, where models must select the correct answer only from the choices. In three MCQA datasets and four LLMs, this prompt bests a majority baseline in 11/12 cases, with up to 0.33 accuracy gain. To help explain this behavior, we conduct an in-depth, black-box analysis on memorization, choice dynamics, and question inference. Our key findings are threefold. First, we find no evidence that the choices-only accuracy stems from memorization alone. Second, priors over individual choices do not fully explain choices-only accuracy, hinting that LLMs use the group dynamics of choices. Third, LLMs have some ability to infer a relevant question from choices, and surprisingly can sometimes even match the original question. We hope to motivate the use of stronger baselines in MCQA benchmarks, the design of robust MCQA datasets, and further efforts to explain LLM decision-making.},
howpublished = {In-progress preprint},
keywords = {artificial intelligence, KI, large language models, LLM, multiple choice, O},
pubstate = {published},
tppubtype = {misc}
}
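To illustrate the "choices-only" probe described in this abstract, the following minimal Python sketch builds a prompt in which the question is withheld and only the answer options are shown. The instruction wording, option labels, and example choices are illustrative assumptions, not the authors' exact template or data.

# Minimal sketch of a "choices-only" MCQA prompt (question withheld), as probed in the paper.
# The instruction text and the example options below are assumptions for illustration only.
def choices_only_prompt(choices):
    lines = [
        "Answer this multiple-choice question. The question itself is hidden;"
        " pick the most plausible option from the choices alone."
    ]
    for label, text in zip("ABCDE", choices):
        lines.append(f"{label}. {text}")
    lines.append("Answer with a single letter:")
    return "\n".join(lines)

# Hypothetical usage: the resulting string would be sent to an LLM, and its accuracy over a
# dataset compared against a majority-class baseline, as in the study's 11/12 comparisons.
print(choices_only_prompt(["Paris", "Berlin", "Madrid", "Rome"]))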