Thanks for this - I enjoyed reading. My feeling is that until an LLM can continuously update, a sort of data flywheel as I think it's been referred to, user trust will be compromised. Systems need the agility to improve, just as you would expect a teacher to correct a mistake (and I've made a few mistakes in my time). Teachers/users also need to know this is the case & have evidence of that continuous updating. I think it also highlights that teachers aren't just the sage on the stage.
Sorry if this is just repeating what you've already said :)
Thanks again for a great post.
Love this perspective - you've really hit the nail on the head about our unrealistic zero tolerance for EdTech errors compared to how we treat human mistakes. As someone who teaches both math and CS, I see this tension daily; it's almost like we expect a perfect, bug-free initial commit every time, with no thought for the version control or rapid patch deployment needed in real-world software.
Lovely article. It reminds me of the Self-Serving Bias: we're more willing to excuse our own failures, but we don't extend that generosity to others - in this case, being overly critical of a mistake that any of us could make.
In addition, there's the Anchoring Effect: we believe that we rationally analyse every factor objectively when determining value, but in reality, initial perceptions are pervasive and will affect later perceptions and decisions.
This is not to say that the dissemination of knowledge via LLMs is without its issues.
Great post, and very much up my street as someone who spent a few years working in edtech before moving into assessment (including high-stakes computer-adaptive assessment).
One concern I have is that bad questions like your chemistry example can still perform well psychometrically, so they won't be thrown out by in-test trialling. The psychometric measures of validity are pretty limited, and the assumptions they make are not always valid for curriculum-based educational assessment, so there's still a need for subject-matter experts to rule on the validity of questions.
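To make that concrete, here's a toy simulation (entirely invented numbers, no real item bank) of how a conceptually flawed item can still show a healthy point-biserial discrimination, simply because stronger candidates are more likely to pick the keyed answer even when that answer is wrong in disciplinary terms:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Latent ability of 1000 hypothetical candidates.
ability = rng.normal(size=n)

# Flawed item: the keyed answer is conceptually wrong, but stronger
# candidates still pick it more often, so responses track ability.
p_keyed = 1.0 / (1.0 + np.exp(-1.5 * ability))
item = (rng.random(n) < p_keyed).astype(float)

# Total score on 30 other items, driven by the same ability.
p_rest = 1.0 / (1.0 + np.exp(-ability[:, None]))
rest_score = (rng.random((n, 30)) < p_rest).sum(axis=1)

# Point-biserial discrimination: correlation of the item response
# with the rest-of-test score. It comes out comfortably positive.
r_pb = np.corrcoef(item, rest_score)[0, 1]
print(f"discrimination (point-biserial): {r_pb:.2f}")
```

The statistic only sees covariation with total score; it's completely blind to whether the keyed answer is actually right.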
I loved this, thanks. I guess the one point to add is that the problem in the original Robbins question is that the basic disciplinary grammar is wrong. So we wouldn't "forgive" it if it were a human, in a way that we would forgive an error of omission, transcription or calculation. Even an error of accuracy could be forgiven at times, but the way this question is written strongly indicates the entity that wrote it fundamentally doesn't understand the subject. It's not a 90-99% error, it's a 0-10% one!
Indeed. I did skirt over the fundamental appropriateness of the question!