Assessing Multiple Student Skill Dimensions Using Large Language Models.
The Challenge: Traditional student assessments often focus narrowly on academic performance and grades, overlooking a broader array of skills crucial for overall student development, especially for children aged 6 to 12. The Need: Parents and educators seek a more holistic view, with insight into both cognitive and non-cognitive skills such as problem-solving, creative thinking, teamwork, and leadership.
Beyond A demo. [Figure: demo images and assessment video.]
Leveraging Large Language Models. Recent advances in LLMs and prompt engineering offer new opportunities to broaden student assessment. LLMs have been widely applied in education (e.g., AI-assisted writing, personalized learning), but their potential for evaluating multiple skill dimensions remains underexplored. Previous LLM applications in assessment have focused mainly on cognitive skills, overlooking creativity, leadership, teamwork, and social-emotional growth. A comprehensive evaluation requires analyzing open-ended inputs such as school reports, teacher comments, parent feedback, and self-evaluations.
Prompt Engineering Strategy. Prompt components: role, context, and dimension information. Output: dimension scores, explanations, strengths, improvements, and recommendations. The "think+rubric+justification" strategy guides the model to interpret the input, apply rubric-based reasoning, and justify the score, which reduces errors and encourages appropriate reasoning. LLMs were instructed to reason about the student's overall capabilities and deficiencies before outputting the final score.
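The prompt structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dimension` class, the `build_prompt` function, the step wording, and the sample rubric and teacher comment are all hypothetical and are not taken from the poster's actual prompts.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    """One skill dimension and its rubric (hypothetical structure)."""
    name: str
    rubric: str


def build_prompt(role: str, context: str, dimension: Dimension, student_input: str) -> str:
    """Assemble a think+rubric+justification prompt from role, context, and dimension info."""
    return "\n\n".join([
        f"Role: {role}",
        f"Context: {context}",
        f"Dimension: {dimension.name}\nRubric: {dimension.rubric}",
        f"Student input:\n{student_input}",
        # Reasoning steps come before the output step, so the score follows the analysis.
        "Steps:\n"
        "1. Think: summarize the student's overall capabilities and deficiencies.\n"
        "2. Rubric: match the evidence to the rubric levels.\n"
        "3. Justification: explain why the chosen level fits.\n"
        "4. Output: score, explanation, strengths, improvements, recommendations.",
    ])


# Illustrative values only; none of this text comes from the study's data.
prompt = build_prompt(
    role="You are an experienced primary-school assessor.",
    context="Students are aged 6 to 12; inputs are open-ended reports and comments.",
    dimension=Dimension("Teamwork", "1 = rarely cooperates; 5 = consistently supports peers"),
    student_input="Teacher comment: shares materials and helps classmates during group tasks.",
)
print(prompt)
```

Placing the "Think" step ahead of the "Output" step mirrors the instruction that the model reason about overall capabilities and deficiencies before giving the final score.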
Skill Dimension Assessment & Recommendation System (Fig. 1).
Dimension Score Assessment Accuracy. [Figure: chart not recoverable; one surviving data label reads "Physical Motor = 40".]
Methods.
Skill Dimensions: Based on established frameworks plus Chinese cultural factors; Leadership was added to reflect cultural values.
Expert Involvement: Experts refined the rubric and annotated the data, reaching substantial inter-rater agreement (Kappa 0.71).
LLM Selection: Models were selected for reasoning capability and deployment suitability, including GPT-4o (2024-08-06).
Dataset: Anonymized reports from 30 selected schools, with generated variations across skill dimensions, giving 980 student profiles. Profiles cover 1 to 6 skill dimensions, plus a Leadership with Teamwork subset.
Fine-tuning: Supervised Fine-Tuning (SFT) on GPT-4o, targeting Leadership dimension assessment. Dataset: 445 training and 115 validation records.
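For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how a kappa value can be computed. The poster does not say which kappa variant was used; this example assumes Cohen's kappa for two raters, and the function name `cohen_kappa` and the sample scores are made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items where the two labels match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Made-up rubric scores from two hypothetical annotators.
scores_a = [1, 1, 2, 2]
scores_b = [1, 1, 2, 1]
kappa = cohen_kappa(scores_a, scores_b)
print(kappa)  # 0.5
```

Under the commonly used Landis and Koch banding, values from 0.61 to 0.80 are described as substantial agreement, which is consistent with how the reported 0.71 is characterized.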