I'm starting to use agentic AI to stand in as my first audience when I build something. We build a lot now with AI and vibe-coding, and getting to first user interaction is a really important milestone that teaches us a lot. For many reasons, we can't always get real users. So I'm asking agentic AI basically to take on a certain role and work with whatever I've built, without much more instruction than that. It's interesting to see where it interacts in a way that you expect, and where it deviates.
In education, I like this especially for rubrics. Building good, objective rubrics is hard. It requires a lot of sample inputs and then scoring those inputs. Agentic AI as user stand-ins can get a better rubric ready for real users more quickly.
Thanks, John! So are you saying that once you create a rubric, you have one set of agents do the assignment and another set review and evaluate the assignment?
That's how I'm using it. My use case is optimizing for AI-powered assessment, so the AI doing the rubric scoring is an important part. I realize that isn't the case for everyone, so AI acting as the student but faculty or TAs doing the scoring is perfectly valid too.
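The two-agent workflow described above can be sketched in a few lines. This is a minimal, hypothetical harness: `call_model` is a stand-in for whatever LLM API or agent SDK you actually use, and the rubric criteria are invented for illustration.

```python
def call_model(prompt: str) -> str:
    # Placeholder for a real LLM call (swap in your agent SDK or chat API).
    return f"[model response to: {prompt[:40]}...]"

# Illustrative rubric; a real one would have weights and detailed descriptors.
RUBRIC = {
    "task_completion": "Did the student complete the assigned conversation?",
    "target_language": "Was the entire conversation in the target language?",
}

def simulate_student(assignment: str) -> str:
    # One set of agents plays the student and does the assignment.
    return call_model(f"You are a student. Complete this assignment:\n{assignment}")

def score_submission(submission: str) -> dict:
    # A second agent scores the submission against each rubric criterion.
    scores = {}
    for criterion, description in RUBRIC.items():
        scores[criterion] = call_model(
            f"Score 0-5 on '{description}'.\nSubmission:\n{submission}\nReply with a number."
        )
    return scores

submissions = [simulate_student("Hold a short conversation in Spanish.") for _ in range(3)]
results = [score_submission(s) for s in submissions]
```

The human-in-the-loop part comes after this loop: you read the score patterns, not the individual transcripts, to find weak criteria.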
The part I keep human-in-the-loop is analyzing the scores. At that point you see patterns of issues you might have missed at the start. Things like "this criterion is weighted too heavily", or "these two criteria overlap, making them hard to score; I'll separate them", or "this criterion is too vague; the scores aren't reflective of what I expect."
Concrete example: I was building a language app yesterday and found patterns of AI users diverting the app into English when we wanted them to speak entirely in the target language. But they were still getting full credit for "completing" a successful conversation, because that part of the rubric wasn't specific enough to say "successful conversation in the target language."
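The fix in that example amounts to making the criterion checkable. Here's an illustrative sketch of the tightened check; the English-detection heuristic is a toy stand-in for a real language-identification step, and the marker-word list is invented:

```python
# Toy heuristic: a turn containing two or more common English function words
# is treated as an English diversion. A real app would use a language detector.
ENGLISH_MARKERS = {"the", "and", "is", "you", "what", "how"}

def looks_english(turn: str) -> bool:
    words = set(turn.lower().split())
    return len(words & ENGLISH_MARKERS) >= 2

def conversation_score(turns: list[str]) -> float:
    # The tightened criterion: credit only the turns that stay in the
    # target language, instead of scoring mere "completion".
    in_target = [t for t in turns if not looks_english(t)]
    return len(in_target) / len(turns)

turns = [
    "Hola, ¿cómo estás?",
    "Muy bien, gracias.",
    "Sorry, what is the word for dog?",  # English diversion, loses credit
]
```

Under the original vague criterion all three turns would count; the tightened one scores this conversation 2/3.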
I've been using Undermind.ai a lot for targeted scientific literature searches, and I'm trying to come up with ways to use it to teach biomedical graduate students. It has a Search Agent for search, a Report Agent to write specific reports, and a Generalist. You have to interact extensively with the Search and Report agents to shape what you want, but it's a guided interaction that asks you to define your goals in detail. Just querying Claude or ChatGPT is very prompt-specific: you have to separate the wheat from the chaff, and then you have to worry about hallucinations, especially for small areas of research where there isn't much information. The Undermind agents guide you through the typical thought process of building a query, and I'm thinking it could be a good assistant for teaching students how to think through what is relevant or irrelevant for their work. I think well-structured agentic apps could be really helpful for education. A bonus for me is that Undermind queries help me design longer prompts for Claude Opus, which sometimes goes off on tangents to find really interesting papers I may not have found myself.
Lance! Great read! Have you heard of LLM Wikis?
https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Might be interesting for you, in a similar vein of organizing personal knowledge.