90 results on '"Automated grading"'
Search Results
2. Automated magnetic resonance imaging‐based grading of the lumbar intervertebral disc and facet joints.
- Author
Nikpasand, Maryam, Middendorf, Jill M., Ella, Vincent A., Jones, Kristen E., Ladd, Bryan, Takahashi, Takashi, Barocas, Victor H., and Ellingson, Arin M.
- Subjects
ZYGAPOPHYSEAL joint, CONVOLUTIONAL neural networks, JOINTS (Anatomy), PEARSON correlation (Statistics), INTERVERTEBRAL disk
- Abstract
Background: Degeneration of both intervertebral discs (IVDs) and facet joints in the lumbar spine has been associated with low back pain, but whether and how IVD/joint degeneration contributes to pain remains an open question. Joint degeneration can be identified by pairing T1 and T2 magnetic resonance imaging (MRI) with analysis techniques such as Pfirrmann grades (IVD degeneration) and Fujiwara scores (facet degeneration). However, these grades are subjective, prompting the need to develop an automated technique to enhance inter‐rater reliability. This study introduces an automated convolutional neural network (CNN) technique trained on clinical MRI images of IVD and facet joints obtained from public‐access Lumbar Spine MRI Dataset. The primary goal of the automated system is to classify health of lumbar discs and facet joints according to Pfirrmann and Fujiwara grading systems and to enhance inter‐rater reliability associated with these grading systems. Methods: Performance of the CNN on both the Pfirrmann and Fujiwara scales was measured by comparing the percent agreement, Pearson's correlation and Fleiss kappa value for results from the classifier to the grades assigned by an expert grader. Results: The CNN demonstrates comparable performance to human graders for both Pfirrmann and Fujiwara grading systems, but with larger errors in Fujiwara grading. The CNN improves the reliability of the Pfirrmann system, aligning with previous findings for IVD assessment. Conclusion: The study highlights the potential of using deep learning in classifying the IVD and facet joint health, and due to the high variability in the Fujiwara scoring system, highlights the need for improved imaging and scoring techniques to evaluate facet joint health. All codes required to use the automatic grading routines described herein are available in the Data Repository for University of Minnesota (DRUM). [ABSTRACT FROM AUTHOR]
- Published
- 2024
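The three agreement measures named in the abstract above (percent agreement, Pearson's correlation, and Fleiss' kappa) can be illustrated with a minimal sketch. This is not the authors' code, which the abstract says is deposited in DRUM. The function names and the example grades are hypothetical, and treating the classifier and the expert as two raters for Fleiss' kappa is an assumption made for illustration only.

```python
from collections import Counter
from math import sqrt


def percent_agreement(a, b):
    """Fraction of items on which two graders assign the same grade."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numeric grades."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list per item with one grade per rater."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({g for item in ratings for g in item})
    counts = [Counter(item) for item in ratings]
    # Observed agreement per item, averaged over all items.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items
    # Agreement expected by chance, from the overall share of each grade.
    shares = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_expected = sum(s * s for s in shares)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical Pfirrmann-style grades (1-5) for six discs.
cnn_grades = [1, 2, 3, 4, 5, 3]
expert_grades = [1, 2, 3, 4, 4, 3]

print(percent_agreement(cnn_grades, expert_grades))
print(pearson_r(cnn_grades, expert_grades))
print(fleiss_kappa([list(pair) for pair in zip(cnn_grades, expert_grades)]))
```

Percent agreement ignores how far apart two grades are, Pearson's correlation ignores systematic offsets, and kappa corrects for chance agreement, which may be why the abstract reports all three.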
3. Exploring an effective automated grading model with reliability detection for large‐scale online peer assessment.
- Author
Lin, Zirou, Yan, Hanbing, and Zhao, Li
- Subjects
*HIGH schools, *RESEARCH funding, *AFFINITY groups, *EDUCATIONAL outcomes, *HIGH school students, *EDUCATIONAL tests & measurements, *DESCRIPTIVE statistics, *TEACHERS, *MIDDLE school students, *DEEP learning, *ONLINE education, *ARTIFICIAL neural networks, *WEB development, *AUTOMATION, *COMPUTER assisted instruction, *SHORT-term memory, *MIDDLE schools, *COMPUTER assisted testing (Education)
- Abstract
Background: Peer assessment has played an important role in large‐scale online learning, as it helps promote the effectiveness of learners' online learning. However, with the emergence of numerical grades and textual feedback generated by peers, it is necessary to detect the reliability of the large amount of peer assessment data, and then develop an effective automated grading model to analyse the data and predict learners' learning results. Objectives: The present study aimed to propose an automated grading model with reliability detection. Methods: A total of 109,327 instances of peer assessment from a large‐scale teacher online learning program were tested in the experiments. The reliability detection approach included three steps: recurrent convolutional neural networks (RCNN) was used to detect grade consistency, bidirectional encoder representations from transformers (BERT) was used to detect text originality, and long short‐term memory (LSTM) was used to detect grade‐text consistency. Furthermore, the automated grading was designed with the BERT‐RCNN model. Results and Conclusions: The effectiveness of the automated grading model with reliability detection was shown. For reliability detection, RCNN performed best in detecting grade consistency with an accuracy rate of 0.889, BERT performed best in detecting text originality with an improvement of 4.47% compared to the benchmark model, and LSTM performed best with an accuracy rate of 0.883. Moreover, the automated grading model with reliability detection achieved good performance, with an accuracy rate of 0.89. Compared to the absence of reliability detection, it increased by 12.1%. 
Implications: The results strongly suggest that the automated grading model with reliability detection for large‐scale peer assessment is effective, with the following implications: (1) The introduction of reliability detection is necessary to help filter out low reliability data in peer assessment, thus promoting effective automated grading results. (2) This solution could assist assessors in adjusting the exclusion threshold of peer assessment reliability, providing a controllable automated grading tool to reduce manual workload with high quality. (3) This solution could shift educational institutions from labour‐intensive grading procedures to a more efficient educational assessment pattern, allowing for more investment in supporting instructors and learners to improve the quality of peer feedback. Lay Description: What is already known about this topic: Peer assessment has played an important role in large‐scale online learning, as it helps promote the effectiveness of learners' online learning. Issues such as disagreement between peer assessors, rough assessment, and plagiarism in large‐scale online learning can decrease peer assessment reliability. Incorporating extensive data into a training model may result in grading uncertainties. What this paper adds: Detecting the peer assessment reliability before grading is essential in the context of large‐scale online learning. This study aimed to propose and validate an automated grading model with reliability detection for the large‐scale online peer assessment, which will help improve the effectiveness of automated grading, combining the advantages of computer technology and human expertise.
Implications for practice and/or policy: The introduction of reliability detection is necessary to help filter out low reliability data in peer assessment, thus promoting effective automated grading results. This solution could assist assessors in adjusting the exclusion threshold of peer assessment reliability, providing a controllable automated grading tool to reduce manual workload with high quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. THE IMPACT OF AI ON TEACHERS: SUPPORT OR REPLACEMENT?
- Author
Nikitina, Iryna and Ishchenko, Tetyana
- Subjects
INTELLIGENT tutoring systems, CAREER changes, ARTIFICIAL intelligence, EMOTIONAL intelligence, LESSON planning
- Abstract
The article explores the growing role of AI in education, analyzing whether it serves as a supportive tool for teachers or poses a threat to their jobs. AI can automate routine tasks such as grading and administrative work, freeing up teachers' time for more meaningful activities like personalized student interaction and lesson planning. Additionally, AI-powered tools enhance data-driven teaching, providing teachers with valuable insights to improve student performance and offer individualized support. While AI offers significant advantages, the article emphasizes that it cannot replace the human qualities essential to teaching, such as emotional intelligence, empathy, and creativity. These are areas where AI falls short, making it unlikely to fully replace teachers. Instead, AI allows teachers to evolve their roles from traditional instruction to mentorship and guidance, focusing on fostering critical thinking and creativity in students. However, there are concerns that AI could reduce teaching jobs or fundamentally change the role of educators. Teachers may also resist AI due to fears of job displacement or lack of sufficient training to use these technologies effectively. The article concludes that AI, when used responsibly, acts as a valuable partner rather than a replacement. By taking over repetitive tasks, AI enables teachers to concentrate on more impactful educational activities, ensuring that the human element remains at the heart of learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Automated grading of anatomical objective structured practical examinations using decision trees: An artificial intelligence approach.
- Author
Bernard, Jason, Sonnadara, Ranil, Saraco, Anthony N., Mitchell, Josh P., Bak, Alex B., Bayer, Ilana, and Wainman, Bruce C.
- Abstract
An Objective Structured Practical Examination (OSPE) is an effective and robust, but resource‐intensive, means of evaluating anatomical knowledge. Since most OSPEs employ short answer or fill‐in‐the‐blank style questions, the format requires many people familiar with the content to mark the examinations. However, the increasing prevalence of online delivery for anatomy and physiology courses could result in students losing the OSPE practice that they would receive in face‐to‐face learning sessions. The purpose of this study was to test the accuracy of Decision Trees (DTs) in marking OSPE questions as a first step to creating an intelligent, online OSPE tutoring system. The study used the results of the winter 2020 semester final OSPE from McMaster University's anatomy and physiology course in the Faculty of Health Sciences (HTHSCI 2FF3/2LL3/1D06) as the data set. Ninety percent of the data set was used in a 10‐fold validation algorithm to train a DT for each of the 54 questions. Each DT was comprised of unique words that appeared in correct, student‐written answers. The remaining 10% of the data set was marked by the generated DTs. When the answers marked by the DT were compared to the answers marked by staff and faculty, the DT achieved an average accuracy of 94.49% across all 54 questions. This suggests that machine learning algorithms such as DTs are a highly effective option for OSPE grading and are suitable for the development of an intelligent, online OSPE tutoring system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Automated magnetic resonance imaging‐based grading of the lumbar intervertebral disc and facet joints
- Author
Maryam Nikpasand, Jill M. Middendorf, Vincent A. Ella, Kristen E. Jones, Bryan Ladd, Takashi Takahashi, Victor H. Barocas, and Arin M. Ellingson
- Subjects
automated grading, deep learning, facet joint, Fujiwara, intervertebral disc, machine learning, Orthopedic surgery, RD701-811
- Abstract
Background: Degeneration of both intervertebral discs (IVDs) and facet joints in the lumbar spine has been associated with low back pain, but whether and how IVD/joint degeneration contributes to pain remains an open question. Joint degeneration can be identified by pairing T1 and T2 magnetic resonance imaging (MRI) with analysis techniques such as Pfirrmann grades (IVD degeneration) and Fujiwara scores (facet degeneration). However, these grades are subjective, prompting the need to develop an automated technique to enhance inter‐rater reliability. This study introduces an automated convolutional neural network (CNN) technique trained on clinical MRI images of IVD and facet joints obtained from public‐access Lumbar Spine MRI Dataset. The primary goal of the automated system is to classify health of lumbar discs and facet joints according to Pfirrmann and Fujiwara grading systems and to enhance inter‐rater reliability associated with these grading systems. Methods: Performance of the CNN on both the Pfirrmann and Fujiwara scales was measured by comparing the percent agreement, Pearson's correlation and Fleiss kappa value for results from the classifier to the grades assigned by an expert grader. Results: The CNN demonstrates comparable performance to human graders for both Pfirrmann and Fujiwara grading systems, but with larger errors in Fujiwara grading. The CNN improves the reliability of the Pfirrmann system, aligning with previous findings for IVD assessment. Conclusion: The study highlights the potential of using deep learning in classifying the IVD and facet joint health, and due to the high variability in the Fujiwara scoring system, highlights the need for improved imaging and scoring techniques to evaluate facet joint health. All codes required to use the automatic grading routines described herein are available in the Data Repository for University of Minnesota (DRUM).
- Published
- 2024
7. A pilot cost-analysis study comparing AI-based EyeArt® and ophthalmologist assessment of diabetic retinopathy in minority women in Oslo, Norway
- Author
Mia Karabeg, Goran Petrovski, Silvia NW Hertzberg, Maja Gran Erke, Dag Sigurd Fosmark, Greg Russell, Morten C. Moe, Vallo Volke, Vidas Raudonis, Rasa Verkauskiene, Jelizaveta Sokolovska, Inga-Britt Kjellevold Haugen, and Beata Eva Petrovski
- Subjects
Screening, Diabetic retinopathy, Minority women, Norway, Manual grading, Automated grading, Ophthalmology, RE1-994
- Abstract
Background: Diabetic retinopathy (DR) is the leading cause of adult blindness in the working age population worldwide, which can be prevented by early detection. Regular eye examinations are recommended and crucial for detecting sight-threatening DR. Use of artificial intelligence (AI) to lessen the burden on the healthcare system is needed. Purpose: To perform a pilot cost-analysis study for detecting DR in a cohort of minority women with DM in Oslo, Norway, that have the highest prevalence of diabetes mellitus (DM) in the country, using both manual (ophthalmologist) and autonomous (AI) grading. This is the first study in Norway, as far as we know, that uses AI in DR- grading of retinal images. Methods: On Minority Women’s Day, November 1, 2017, in Oslo, Norway, 33 patients (66 eyes) over 18 years of age diagnosed with DM (T1D and T2D) were screened. The Eidon - True Color Confocal Scanner (CenterVue, United States) was used for retinal imaging and graded for DR after screening had been completed, by an ophthalmologist and automatically, using EyeArt Automated DR Detection System, version 2.1.0 (EyeArt, EyeNuk, CA, USA). The gradings were based on the International Clinical Diabetic Retinopathy (ICDR) severity scale [1] detecting the presence or absence of referable DR. Cost-minimization analyses were performed for both grading methods. Results: 33 women (64 eyes) were eligible for the analysis. A very good inter-rater agreement was found: 0.98 (P
- Published
- 2024
8. CodeBuddy: A Programming Assignment Management System for Short-Form Exercises
- Author
Stephen R. Piccolo, Emme Tuft, P. J. Tatlow, Zach Eliason, and Ashlie Stephenson
- Subjects
programming education, automated grading, pair programming, intelligent tutor, web application, automated assessment, Computer software, QA76.75-76.765
- Abstract
CodeBuddy is a software system for delivering computer-programming assignments to students. It is primarily used for short-form exercises, such as those delivered in introductory-programming courses and informal-learning settings. It provides a Web-based interface, the ability to execute code in a secure environment, support for custom testing logic, near-immediate feedback to students, and support for many programming languages. Other features include support for graphics-based programming exercises, pair programming, the ability for students to review the instructor’s solution after solving an exercise, and an intelligent tutor. Upon creating an account, each student is randomly assigned to an “A” or “B” cohort, thus enabling researchers to perform pedagogical research via online controlled experiments. These and other features offer opportunities for instructors to customize the learning experience, in diverse ways, for students learning to program.
- Published
- 2025
9. A pilot cost-analysis study comparing AI-based EyeArt® and ophthalmologist assessment of diabetic retinopathy in minority women in Oslo, Norway.
- Author
Karabeg, Mia, Petrovski, Goran, Hertzberg, Silvia NW, Erke, Maja Gran, Fosmark, Dag Sigurd, Russell, Greg, Moe, Morten C., Volke, Vallo, Raudonis, Vidas, Verkauskiene, Rasa, Sokolovska, Jelizaveta, Haugen, Inga-Britt Kjellevold, and Petrovski, Beata Eva
- Subjects
DIABETIC retinopathy, MINORITY women, ARTIFICIAL intelligence, OPHTHALMOLOGISTS, EYE examination
- Abstract
Background: Diabetic retinopathy (DR) is the leading cause of adult blindness in the working age population worldwide, which can be prevented by early detection. Regular eye examinations are recommended and crucial for detecting sight-threatening DR. Use of artificial intelligence (AI) to lessen the burden on the healthcare system is needed. Purpose: To perform a pilot cost-analysis study for detecting DR in a cohort of minority women with DM in Oslo, Norway, that have the highest prevalence of diabetes mellitus (DM) in the country, using both manual (ophthalmologist) and autonomous (AI) grading. This is the first study in Norway, as far as we know, that uses AI in DR- grading of retinal images. Methods: On Minority Women's Day, November 1, 2017, in Oslo, Norway, 33 patients (66 eyes) over 18 years of age diagnosed with DM (T1D and T2D) were screened. The Eidon - True Color Confocal Scanner (CenterVue, United States) was used for retinal imaging and graded for DR after screening had been completed, by an ophthalmologist and automatically, using EyeArt Automated DR Detection System, version 2.1.0 (EyeArt, EyeNuk, CA, USA). The gradings were based on the International Clinical Diabetic Retinopathy (ICDR) severity scale [1] detecting the presence or absence of referable DR. Cost-minimization analyses were performed for both grading methods. Results: 33 women (64 eyes) were eligible for the analysis. A very good inter-rater agreement was found: 0.98 (P < 0.01), between the human and AI-based EyeArt grading system for detecting DR. The prevalence of DR was 18.6% (95% CI: 11.4–25.8%), and the sensitivity and specificity were 100% (95% CI: 100–100% and 95% CI: 100–100%), respectively. The cost difference for AI screening compared to human screening was $143 lower per patient (cost-saving) in favour of AI. Conclusion: Our results indicate that The EyeArt AI system is both a reliable, cost-saving, and useful tool for DR grading in clinical practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
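The screening statistics quoted in the abstract above (prevalence, sensitivity, specificity) follow from a two-by-two comparison of the AI grade against the ophthalmologist's grade. The sketch below shows that arithmetic only. The function name and the per-eye labels are hypothetical, not the study's data, and the confidence intervals reported in the abstract are not reproduced.

```python
def screening_metrics(reference, predicted):
    """Compare binary screening calls (True = referable DR) against a reference grader.

    Assumes the reference contains at least one positive and one negative case.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # both flag disease
    tn = sum(1 for r, p in pairs if not r and not p)  # both clear the eye
    fp = sum(1 for r, p in pairs if not r and p)      # AI flags, reference clears
    fn = sum(1 for r, p in pairs if r and not p)      # AI misses a reference positive
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / len(pairs),
    }


# Hypothetical per-eye calls: ophthalmologist as reference, AI as prediction.
ophthalmologist = [True, True, False, False, False]
ai_system = [True, True, False, False, True]

print(screening_metrics(ophthalmologist, ai_system))
```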
10. Smart grading: A generative AI-based tool for knowledge-grounded answer evaluation in educational assessments
- Author
Samuel Tobler
- Subjects
Artificial intelligence, Test evaluation, Educational assessment, Automated grading, GPT, Large language model, Science
- Abstract
Evaluating text-based answers obtained in educational settings or behavioral studies is time-consuming and resource-intensive. Applying novel artificial intelligence tools such as ChatGPT might support the process. Still, currently available implementations do not allow for automated and case-specific evaluations of large numbers of student answers. To counter this limitation, we developed a flexible software and user-friendly web application that enables researchers and educators to use cutting-edge artificial intelligence technologies by providing an interface that combines large language models with options to specify questions of interest, sample solutions, and evaluation instructions for automated answer scoring. We validated the method in an empirical study and found the software with expert ratings to have high reliability. Hence, the present software constitutes a valuable tool to facilitate and enhance text-based answer evaluation. • Generative AI-enhanced software for customizable, case-specific, and automized grading of large amounts of text-based answers. • Open-source software and web application for direct implementation and adaptation.
- Published
- 2024
11. Examining the Efficacy of ChatGPT in Marking Short-Answer Assessments in an Undergraduate Medical Program
- Author
Leo Morjaria, Levi Burns, Keyna Bracken, Anthony J. Levinson, Quang N. Ngo, Mark Lee, and Matthew Sibbald
- Subjects
ChatGPT, artificial intelligence, short-answer assessment, automated grading, generative AI, undergraduate medical education, Special aspects of education, LC8-6691, Medicine
- Abstract
Traditional approaches to marking short-answer questions face limitations in timeliness, scalability, inter-rater reliability, and faculty time costs. Harnessing generative artificial intelligence (AI) to address some of these shortcomings is attractive. This study aims to validate the use of ChatGPT for evaluating short-answer assessments in an undergraduate medical program. Ten questions from the pre-clerkship medical curriculum were randomly chosen, and for each, six previously marked student answers were collected. These sixty answers were evaluated by ChatGPT in July 2023 under four conditions: with both a rubric and standard, with only a standard, with only a rubric, and with neither. ChatGPT displayed good Spearman correlations with a single human assessor (r = 0.6–0.7, p < 0.001) across all conditions, with the absence of a standard or rubric yielding the best correlation. Scoring differences were common (65–80%), but score adjustments of more than one point were less frequent (20–38%). Notably, the absence of a rubric resulted in systematically higher scores (p < 0.001, partial η2 = 0.33). Our findings demonstrate that ChatGPT is a viable, though imperfect, assistant to human assessment, performing comparably to a single expert assessor. This study serves as a foundation for future research on AI-based assessment techniques with potential for further optimization and increased reliability.
- Published
- 2024
12. Research on performance evaluation of higher vocational education informatization based on data envelopment analysis
- Author
Sergii Khrapatyi, Kseniia Tokarieva, Olena Hlushchenko, Oleksandra Paramonova, and Ielyzaveta Lvova
- Subjects
personalized learning, educational chatbots, ai-driven assessments, automated grading, educational technology, Theory and practice of education, LB5-3640, Science
- Abstract
This article highlights the multifaceted role of AI in modern education and offers insights into innovative ways to revolutionize educational practices through AI technologies. Since this article provides comprehension of the scope and depth of AI's impact on the education sphere, it appeals to a diverse readership, encompassing educators, policymakers, researchers, and the general public. This article explores key issues within the domain of AI in education, including personalized learning, AI-driven assessments, data analytics, and the integration of AI into learning management systems. The article highlights promises, potentials, and challenges accompanying this technological advancement. The authors emphasize the need for a balanced and informed approach to using AI to enhance the education system.
- Published
- 2024
13. Research on performance evaluation of higher vocational education informatization based on data envelopment analysis.
- Author
Khrapatyi, Sergii, Tokarieva, Kseniia, Hlushchenko, Olena, Paramonova, Oleksandra, and Lvova, Ielyzaveta
- Subjects
VOCATIONAL education, HIGHER education, DATA envelopment analysis, ARTIFICIAL intelligence in education, PERFORMANCE evaluation, LEARNING management system
- Abstract
This article highlights the multifaceted role of AI in modern education and offers insights into innovative ways to revolutionize educational practices through AI technologies. Since this article provides comprehension of the scope and depth of AI's impact on the education sphere, it appeals to a diverse readership, encompassing educators, policymakers, researchers, and the general public. This article explores key issues within the domain of AI in education, including personalized learning, AI-driven assessments, data analytics, and the integration of AI into learning management systems. The article highlights promises, potentials, and challenges accompanying this technological advancement. The authors emphasize the need for a balanced and informed approach to using AI to enhance the education system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Automated grading software tool with feedback process to support learning of hardware description languages.
- Author
Corso Pinzón, Andrés Francisco, Ramírez-Echeverry, Jhon J., and Restrepo-Calle, Felipe
- Subjects
LEARNING, SOFTWARE development tools, DIGITAL electronics, GRAPHICAL user interfaces, EDUCATIONAL intervention
- Abstract
Hardware Description Languages (HDL) have gained popularity in the field of digital electronics design, driven by the increasing complexity of modern electronic circuits. Consequently, supporting students in their learning of these languages is crucial. This work aims to address this need by developing an automated assessment software tool with feedback process to support the learning of HDL and making an educational intervention to support the learning process of students. The tool's features were selected based on similar developments, and a prototype was designed and implemented. Additionally, an educational intervention was conducted over a five-week period in a Digital Electronics course at the National University of Colombia. Through analyzing students' interactions with the tool and their perceptions of its usage, the study examined their learning experiences. Among the features highlighted by students as most beneficial for their HDL learning process were the online availability of the tool, the feedback system that helped them identify and correct errors in their code, the provision of immediate feedback, the online editor with syntax highlighting, and the graphical user interface. This work makes two significant contributions to the field of HDL teaching in engineering. Firstly, a publicly accessible HDL grading tool has been developed, offering students immediate formative and summative feedback through an automated grader. Secondly, empirical evidence has been provided regarding the benefits of using such a tool in enhancing students' learning process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. GPT-4 in Education: Evaluating Aptness, Reliability, and Loss of Coherence in Solving Calculus Problems and Grading Submissions
- Author
Gandolfi, Alberto
- Published
- 2024
16. Improving the network architecture of YOLOv7 to achieve real-time grading of canola based on kernel health
- Author
Angshuman Thakuria and Chyngyz Erkinbaev
- Subjects
YOLOv7, Convolutional Block Attention Mechanism, Multi-Object Tracking, Canola, Quality Control, Automated Grading, Agriculture (General), S1-972, Agricultural industries, HD9000-9495
- Abstract
The occurrence of heated and immature canola kernels caused by excessive drying and frost damage is undesired by grain buyers due to lower oil yield and diminished market value. The current grading process is visually examining each kernel's endosperm color and counting the damaged seeds. As this process is time-consuming, laborious, and prone to errors, this study proposes an automated grading technique based on object detection, multi-object tracking, and counting. The detection task was achieved via an improved YOLOv7 network (YOLOv7_ours) that was modified to increase its performance in accurately identifying small objects by adding two convolutional block attention modules in the neck region and decrease its computational complexity (cost) and size by substituting convolutional layers with ghost layers in all the Efficient Layer Aggregation Networks modules, and in the Spatial Pyramid Pooling Cross Stage Partial module present in YOLOv7. The weights of the trained network were fed to the ByteTrack multiple object tracker to track the detections frame-by-frame in a video feed. The unique identities generated by the tracker for each detected object of interest were then used to count the number of defects using a line cross algorithm. The mean average precision (mAP@0.5) obtained after training the YOLOv7_ours model was 1.02% better and its cost and size were 32.1% and 37.1% lower than the baseline YOLOv7 model. In a test video, the overall model achieved a multi-object tracking accuracy and counting accuracy of 84.8% and 93.9%, respectively. This three-stage model can be readily deployed in an edge device for accurate and real-time grading of canola kernels by grain buyers.
- Published
- 2023
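The counting stage described in the abstract above, where tracker identities are counted as they cross a line, can be sketched as follows. The abstract does not specify the authors' line-cross algorithm, so this is only one plausible reading: a horizontal line, a single crossing direction, and each identity counted once. The tracks are hypothetical centroid lists rather than ByteTrack output, and no detector or tracker library is called.

```python
def count_line_crossings(tracks, line_y):
    """Count track IDs whose centroid moves from above `line_y` to on or below it.

    `tracks` maps a track ID to its centroids (x, y) in frame order, with image
    coordinates where y grows downward. Each ID is counted at most once.
    """
    counted = set()
    for track_id, centroids in tracks.items():
        # Compare each centroid with the one in the next frame.
        for (_, y_prev), (_, y_curr) in zip(centroids, centroids[1:]):
            if y_prev < line_y <= y_curr:
                counted.add(track_id)
                break
    return len(counted)


# Hypothetical tracks for kernels flagged as defective by a detector.
defect_tracks = {
    1: [(10, 5), (10, 15), (10, 25)],            # crosses the line once
    2: [(20, 5), (20, 8)],                       # never reaches the line
    3: [(30, 18), (30, 22), (30, 19), (30, 23)], # jitters across; counted once
}

print(count_line_crossings(defect_tracks, line_y=20))
```

Counting by identity rather than by detection is what keeps a kernel that stays in view for many frames, or jitters around the line, from being counted more than once.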
17. CAN AI-ASSISTED ESSAY ASSESSMENT SUPPORT TEACHERS? A CROSS-SECTIONAL MIXED-METHODS RESEARCH CONDUCTED AT THE UNIVERSITY OF MONTENEGRO.
- Author
IVANOVIČ, Igor
- Subjects
*ARTIFICIAL intelligence, *LANGUAGE models, *CROSS-sectional method, *NATURAL language processing, *CHATGPT
- Abstract
In this study, we will try to answer the question if an AI language model can provide teachers with essay assessment solutions that are on a par with the solutions provided by experienced professors. We designed a study with the aim of comparing the essay assessment outputs of the AI language model and three of our colleagues working at the University of Montenegro. The main aim of this paper is to investigate if this AI language model can be a viable teachers' assistance tool that provides immediate and meaningful feedback to teachers and students. Our hypothesis is, with some caveats, that the AI language model is more than a viable and useful tool, capable of providing meaningful and immediate feedback, greatly reducing the assessment time, and thus helping the teachers become more efficient and consistent. We will compare the results of 78 essays assessed by three teachers with the results provided by ChatGPT and see where the two sets of results converge or diverge in terms of their individual and overall scores. [ABSTRACT FROM AUTHOR]
- Published
- 2023
18. Automated grading system for quantifying KOH microscopic images in dermatophytosis.
- Author
KV, Rajitha, Govindan, Sreejith, PY, Prakash, Kamath, Asha, Rao, Raghavendra, and Prasad, Keerthana
- Subjects
*RECEIVER operating characteristic curves, *IMAGE segmentation, *SKIN imaging, *RINGWORM, *DEEP learning
- Abstract
• Quantifying fungal load supports efficient clinical management of tinea infections. • Automated dermatophyte quantification is hitherto unexplored. • Rising tinea infections and drug resistance support automated quantification. • U-Net applied segmentation and pixel-based grading were the main steps followed. Concerning the progression of dermatophytosis and its prognosis, quantification studies play a significant role. The present work aims to develop an automated grading system for quantifying fungal loads in KOH microscopic images of skin scrapings collected from dermatophytosis patients. Fungal filaments in the images were segmented using a U-Net model to obtain the pixel counts. In the absence of any threshold value for pixel counts to grade these images as low, moderate, or high, experts were assigned the task of manual grading. Grades and corresponding pixel counts were subjected to statistical procedures involving cumulative receiver operating characteristic curve analysis for developing an automated grading system. The model's specificity, accuracy, precision, and sensitivity metrics crossed 92%, 86%, 82%, and 76%, respectively. 'Almost perfect agreement' with Fleiss kappa of 0.847 was obtained between automated and manual gradings. This pixel count-based grading of KOH images offers a novel, cost-effective solution for quantifying fungal load. [ABSTRACT FROM AUTHOR]
- Published
- 2025
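The pixel-count grading step described in the abstract of entry 18 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cut-off values in `HYPOTHETICAL_THRESHOLDS`, the function names, and the convention that a count equal to a cut-off falls into the higher grade are all assumptions, since the abstract does not report the thresholds obtained from the ROC analysis.

```python
from bisect import bisect_right

GRADES = ("low", "moderate", "high")

# Hypothetical cut-offs: the abstract does not state the thresholds
# derived from the ROC analysis, so these numbers are placeholders.
HYPOTHETICAL_THRESHOLDS = (1_000, 5_000)


def count_fungal_pixels(mask):
    """Count foreground (1) pixels in a binary segmentation mask (nested lists)."""
    return sum(sum(row) for row in mask)


def grade_from_pixel_count(pixel_count, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Map a pixel count to a grade; counts at or above a cut-off move up one grade."""
    return GRADES[bisect_right(thresholds, pixel_count)]


# Toy 3x3 mask standing in for a segmentation output.
toy_mask = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
toy_count = count_fungal_pixels(toy_mask)
toy_grade = grade_from_pixel_count(toy_count)
print(toy_count, toy_grade)
```

Keeping the thresholds as a sorted tuple means a different number of grades only requires changing `GRADES` and the cut-offs together.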
19. Mass and volume estimation of diverse kimchi cabbage forms using RGB-D vision and machine learning.
- Author
-
Yang, Hae-Il, Min, Sung-Gi, Yang, Ji-Hee, Eun, Jong-Bang, and Chung, Young-Bae
- Subjects
- *
MACHINE learning , *COMPUTER vision , *WESTERN diet , *FEATURE extraction , *CABBAGE - Abstract
This study introduces a custom-built RGB-D-based machine vision system designed to accurately estimate the mass and volume of whole kimchi cabbage (WC) and longitudinally cut kimchi cabbage (LCC). Given the pivotal role of kimchi cabbage (KC) in both Asian and Western diets, accurate post-harvest assessment of its mass and volume is critical for quality control, sorting, and pricing. Conventional manual measurements and visual estimations are laborious and inaccurate. Our research leveraged RGB-D data to refine machine learning models and enhance the extraction and analysis of 2D, 3D, and colorimetric features for a more reliable estimation approach. The results demonstrate that integrating 3D and colorimetric features markedly improves the estimation accuracy, with notable success in mass estimation for LCC (R² = 0.913, ratio of performance to deviation (RPD) = 3.38) and robust volume predictions for both cabbage types (R² > 0.90, RPD > 3). However, challenges such as potential over-exclusion of outer leaves in LCC and the need for more advanced WC mass estimation techniques have been identified. Future work will focus on refining the feature extraction methods and assessing various imaging environments to enhance the precision of mass and volume predictions across different forms of KC. • Novel RGB-D-based system accurately estimates mass and volume of kimchi cabbage. • Integration of 3D and colorimetric features enhances estimation accuracy. • Significant success in mass estimation for longitudinally cut cabbage (R² = 0.913). • Robust volume predictions achieved for both whole and cut cabbage (R² > 0.90). • Study addresses critical need for reliable post-harvest assessment methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
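For readers unfamiliar with the two metrics quoted in the abstract of entry 19, the sketch below shows how R² and the ratio of performance to deviation (RPD) relate. It assumes RPD is the standard deviation of the reference values divided by the RMSE of prediction, with both computed using the same divisor n on the same data; under that assumption RPD equals 1/sqrt(1 − R²). The sample values are hypothetical, and the abstract does not say which exact conventions the authors used.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predictions and reference values."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


def population_sd(values):
    """Standard deviation with divisor n (an assumption; some authors use n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def r_squared(predicted, reference):
    """Coefficient of determination, 1 - SSE / SST."""
    mean = sum(reference) / len(reference)
    sse = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    sst = sum((r - mean) ** 2 for r in reference)
    return 1.0 - sse / sst


def rpd(predicted, reference):
    """Ratio of performance to deviation: SD of reference values over RMSE."""
    return population_sd(reference) / rmse(predicted, reference)


# Hypothetical masses (kg), for illustration only.
reference_mass = [1.0, 2.0, 3.0, 4.0]
predicted_mass = [1.1, 1.9, 3.2, 3.8]

# RPD implied by the reported R² = 0.913 under the stated assumptions.
implied_rpd = 1.0 / math.sqrt(1.0 - 0.913)
print(round(implied_rpd, 2))
```

Under these assumptions the implied value is close to the reported RPD of 3.38; a small gap would be expected from rounding or from a different standard-deviation divisor.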
20. Automatic Short Answer Grading on High School's E-Learning Using Semantic Similarity Methods.
- Author
-
Wilianto, Daniel and Girsang, Abba Suganda
- Subjects
- *
STANDARD deviations , *ELEMENTARY schools , *HIGH schools , *DIGITAL learning - Abstract
Grading students' answers has always been a daunting task which takes a lot of teachers' time. The aim of this study is to grade students' answers automatically in a high school's e-learning system. The grading process must be fast, and the result must be as close as possible to the teacher-assigned grades. We collected a total of 840 answers from 40 students for this study, each already graded by their teachers. We used the Python library sentence-transformers and three of its latest pre-trained machine learning models (all-mpnet-base-v2, all-distilroberta-v1, all-MiniLM-L6-v2) for sentence embeddings. Computer grades were calculated using cosine similarity. These grades were then compared with teacher-assigned grades using both Mean Absolute Error and Root Mean Square Error. Our results showed that all-MiniLM-L6-v2 gave the most similar grades to teacher-assigned grades and had the fastest processing time. Further study may include testing these models on more answers from more students, and fine-tuning these models using more school materials. [ABSTRACT FROM AUTHOR]
- Published
- 2023
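The grading pipeline summarised in the abstract of entry 20 (embed, compare by cosine similarity, score against teacher grades with MAE and RMSE) can be sketched in plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors stand in for real sentence embeddings, the linear mapping from similarity to a 0–100 grade in `similarity_to_grade` is hypothetical (the abstract does not say how similarities were converted to grades), and the teacher grades are invented.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_to_grade(similarity, max_grade=100.0):
    """Hypothetical mapping: clip negative similarity to 0 and scale linearly."""
    return max(0.0, similarity) * max_grade


def mean_absolute_error(predicted, reference):
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def root_mean_square_error(predicted, reference):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))


# Toy vectors standing in for embeddings of a reference answer and student answers.
reference_embedding = [1.0, 0.0, 1.0]
student_embeddings = [
    [1.0, 0.0, 1.0],  # same direction as the reference
    [1.0, 1.0, 0.0],  # partial overlap
    [0.0, 1.0, 0.0],  # orthogonal to the reference
]
teacher_grades = [100.0, 60.0, 0.0]  # invented for illustration

computer_grades = [
    similarity_to_grade(cosine_similarity(reference_embedding, e))
    for e in student_embeddings
]
mae = mean_absolute_error(computer_grades, teacher_grades)
rmse = root_mean_square_error(computer_grades, teacher_grades)
print(computer_grades, mae, rmse)
```

In a real setting the toy vectors would be replaced by embeddings from one of the models named in the abstract; the comparison metrics stay the same.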
21. Towards a Reliable and Rapid Automated Grading System in Facial Palsy Patients: Facial Palsy Surgery Meets Computer Science.
- Author
-
Knoedler, Leonard, Baecher, Helena, Kauke-Navarro, Martin, Prantl, Lukas, Machens, Hans-Günther, Scheuermann, Philipp, Palm, Christoph, Baumann, Raphael, Kehrer, Andreas, Panayi, Adriana C., and Knoedler, Samuel
- Subjects
- *
FACIAL paralysis , *COMPUTER science , *BELL'S palsy , *MODULAR forms , *PLASTIC surgery , *TUMOR grading - Abstract
Background: Reliable, time- and cost-effective, and clinician-friendly diagnostic tools are cornerstones in facial palsy (FP) patient management. Different automated FP grading systems have been developed but revealed persisting downsides such as insufficient accuracy and cost-intensive hardware. We aimed to overcome these barriers and programmed an automated grading system for FP patients utilizing the House and Brackmann scale (HBS). Methods: Image datasets of 86 patients seen at the Department of Plastic, Hand, and Reconstructive Surgery at the University Hospital Regensburg, Germany, between June 2017 and May 2021, were used to train the neural network and evaluate its accuracy. Nine facial poses per patient were analyzed by the algorithm. Results: The algorithm showed an accuracy of 100%. Oversampling did not result in altered outcomes, while the direct form displayed superior accuracy levels when compared to the modular classification form (n = 86; 100% vs. 99%). The Early Fusion technique was linked to improved accuracy outcomes in comparison to the Late Fusion and sequential method (n = 86; 100% vs. 96% vs. 97%). Conclusions: Our automated FP grading system combines high-level accuracy with cost- and time-effectiveness. Our algorithm may accelerate the grading process in FP patients and facilitate the FP surgeon's workflow. [ABSTRACT FROM AUTHOR]
- Published
- 2022
22. Building a Corpus of Task-Based Grading and Feedback Systems for Learning and Teaching Programming.
- Author
-
Strickroth, Sven and Striewe, Michael
- Subjects
INSTRUCTIONAL systems ,SCIENTIFIC community ,CORPORA ,COMMUNITIES ,EDUCATIONAL support ,VIRTUAL communities - Abstract
Using grading and feedback systems in the context of learning and teaching programming is quite common. During the last 20 to 40 years research results on several hundred systems and approaches have been published. Existing papers may tell researchers what works well in terms of educational support and how to make a grading and feedback system stable, extensible, secure, or sustainable. However, finding a solid basis for such kind of research is hard due to the vast amount of publications from a very diverse community. Hardly any recent systematic review includes data from more than 100 systems (most include less than 30). Hence, the authors started an endeavor to build a corpus of all task-based grading and feedback systems for learning and teaching programming that deal with source code and have been published in recent years. The intention is to provide the community with a solid basis for their research. The corpus is also designed to be updated and extended by the community with future systems. This paper describes the process of building the corpus and presents some meta-analysis that shed light on the involved research communities. [ABSTRACT FROM AUTHOR]
- Published
- 2022
23. Automated Short-Answer Grading using Semantic Similarity based on Word Embedding
- Author
-
Fetty Fitriyanti Lubis, Mutaqin, Atina Putri, Dana Waskita, Tri Sulistyaningtyas, Arry Akhmad Arman, and Yusep Rosmansyah
- Subjects
automated grading ,short answer ,semantic similarity ,syntax analysis ,word embeddings ,Technology ,Technology (General) ,T1-995 - Abstract
Automatic short-answer grading (ASAG) is a system that aims to help speed up the assessment process without an instructor’s intervention. Previous research had successfully built an ASAG system whose performance had a correlation of 0.66 and mean absolute error (MAE) starting from 0.94 with a conventionally graded set. However, this study had a weakness in the need for more than one reference answer for each question. It used a string-based equation method and keyword matching process to measure the sentences’ similarity in order to produce an assessment rubric. Thus, our study aimed to build a more concise short-answer automatic scoring system using a single reference answer. The mechanism used a semantic similarity measurement approach through word embedding techniques and syntactic analysis to assess the learner’s accuracy. Based on the experiment results, the semantic similarity approach showed a correlation value of 0.70 and an MAE of 0.70 when compared with the grading reference.
- Published
- 2021
24. Measuring the effectiveness of online problem solving for improving academic performance in a probability course.
- Author
-
González, José Antonio, Giuliano, Mónica, and Pérez, Silvia N.
- Subjects
PROBLEM solving ,ACADEMIC achievement ,PROBABILITY theory ,STATISTICS education ,INSTRUCTIONAL systems ,EDUCATIONAL outcomes - Abstract
Research on the impact of online homework systems on student achievement, compared to traditional methods, is ambivalent. Methodological issues in the study design, besides technological diversity, can account for this uncertainty. Hypothesis: This study aims to estimate the effect size of homework practice with exercises automatically provided by the 'e-status' platform, in students from five Engineering programs. Instead of comparing students using the platform with others not using it, we distributed the subject topics into two blocks, and created nine probability problems for each block. After that, the students were randomly assigned to one block and could solve the related exercises through e-status. Teachers and evaluators were masked to the assignment. Five weeks after the assignment, all students answered a written test with questions regarding all topics. The study outcome was the difference between both blocks' scores obtained from the test. The two groups comprised 163 and 166 students. Of these, 103 and 107 respectively attended the test, while the remainder were imputed with 0. Those assigned to the first block obtained an average outcome of −1.85, while the average in the second block was −3.29 (95% confidence interval of difference, −2.46 to −0.43). During the period in which they had access to the platform before the test, the average total time spent solving problems was less than three hours. Our findings provide evidence that a small amount of active online work can positively impact student performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
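The figures reported in the abstract of entry 24 can be cross-checked with a few lines of arithmetic. The sketch below assumes the difference is taken as second block minus first block and that the interval is symmetric around the point estimate; both are inferences from the numbers, not statements from the abstract.

```python
# Values as reported in the abstract.
mean_block_1 = -1.85
mean_block_2 = -3.29
ci_low, ci_high = -2.46, -0.43

# Assumed direction of the comparison: second block minus first block.
difference = mean_block_2 - mean_block_1

# For a symmetric interval the midpoint should sit at the point estimate.
ci_midpoint = (ci_low + ci_high) / 2
ci_half_width = (ci_high - ci_low) / 2

# The interval excludes zero when both bounds share the same sign.
excludes_zero = (ci_low < 0 and ci_high < 0) or (ci_low > 0 and ci_high > 0)

print(round(difference, 2), round(ci_midpoint, 3), round(ci_half_width, 3), excludes_zero)
```

The difference of about −1.44 falls within rounding of the interval midpoint, which suggests the reported interval refers to this direction of comparison.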
25. Validity of a graph-based automatic assessment system for programming assignments: human versus automatic grading.
- Author
-
Zougari, Soundous, Tanana, Mariam, and Lyhyaoui, Abdelouahid
- Subjects
COMPUTERS in education ,TEACHERS' workload ,COMPUTER programming ,LEARNING goals ,COVID-19 ,TEACHERS - Abstract
Programming is a very complex and challenging subject to teach and learn. A strategy guaranteed to deliver proven results has been intensive and continual training. However, this strategy holds an extra workload for the teachers with huge numbers of programming assignments to evaluate in a fair and timely manner. Furthermore, under the current coronavirus (COVID-19) distance teaching circumstances, regular assessment is a fundamental feedback mechanism. It ensures that students engage in learning as well as determines the extent to which they reached the expected learning goals, in this new learning reality. In sum, automating the assessment process will be particularly appreciated by the instructors and highly beneficial to the students. The purpose of this paper is to investigate the feasibility of automatic assessment in the context of computer programming courses. Thus, a prototype based on merging static and dynamic analysis was developed. Empirical evaluation of the proposed grading tool within an introductory C-language course has been presented and compared to manually assigned marks. The outcomes of the comparative analysis have shown the reliability of the proposed automatic assessment prototype. [ABSTRACT FROM AUTHOR]
- Published
- 2022
26. A deep-learning-based grading system (ASAG) for reading comprehension assessment by using aphorisms as open-answer-questions
- Author
-
Mardini G., Ivan D., Quintero M., Christian G., Viloria N., César A., Percybrooks B., Winston S., Robles N., Heydy S., and Villalba R., Karen
- Published
- 2024
27. Colon Cancer Grading Using Infrared Spectroscopic Imaging-Based Deep Learning.
- Author
-
Tiwari, Saumya, Falahkheirkhah, Kianoush, Cheng, Georgina, and Bhargava, Rohit
- Subjects
- *
COLON cancer , *COLORECTAL cancer , *HEMATOXYLIN & eosin staining , *TUMOR grading , *FOURIER transforms , *DEEP learning - Abstract
Tumor grade assessment is critical to the treatment of cancers. A pathologist typically evaluates grade by examining morphologic organization in tissue using hematoxylin and eosin (H&E) stained tissue sections. Fourier transform infrared spectroscopic (FT-IR) imaging provides an alternate view of tissue in which spatially specific molecular information from unstained tissue can be utilized. Here, we examine the potential of IR imaging for grading colon cancer in biopsy samples. We used a 148-patient cohort to develop a deep learning classifier to estimate the tumor grade using IR absorption. We demonstrate that FT-IR imaging can be a viable tool to determine colorectal cancer grades, which we validated on an independent cohort of surgical resections. This work demonstrates that harnessing molecular information from FT-IR imaging and coupling it with morphometry is a potential path to develop clinically relevant grade prediction models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
28. Could automated machine-learned MRI grading aid epidemiological studies of lumbar spinal stenosis? Validation within the Wakayama spine study
- Author
-
Yuyu Ishimoto, Amir Jamaludin, Cyrus Cooper, Karen Walker-Bone, Hiroshi Yamada, Hiroshi Hashizume, Hiroyuki Oka, Sakae Tanaka, Noriko Yoshimura, Munehito Yoshida, Jill Urban, Timor Kadir, and Jeremy Fairbank
- Subjects
Lumbar spinal stenosis ,MRI scans ,Automated grading ,Repeatability ,Validation ,Diseases of the musculoskeletal system ,RC925-935 - Abstract
Background: MRI scanning has revolutionized the clinical diagnosis of lumbar spinal stenosis (LSS). However, there is currently no consensus as to how best to classify MRI findings, which has hampered the development of robust longitudinal epidemiological studies of the condition. We developed and tested an automated system for grading lumbar spine MRI scans for central LSS for use in epidemiological research. Methods: Using MRI scans from the large population-based cohort study (the Wakayama Spine Study), all graded by a spinal surgeon, we trained an automated system to grade central LSS in four gradings of the bone and soft tissue margins: none, mild, moderate, severe. Subsequently, we tested the automated grading against the independent readings of our observer in a test set to investigate reliability and agreement. Results: Complete axial views were available for 4855 lumbar intervertebral levels from 971 participants. The machine used 4365 axial views to learn (training set) and graded the remaining 490 axial views (testing set). The agreement rate for gradings was 65.7% (322/490) and the reliability (Lin’s correlation coefficient) was 0.73. In 2.2% of scans (11/490) there was a difference in classification of 2 and in only 0.2% (1/490) was there a difference of 3. When classified into 2 groups as ‘severe’ vs ‘no/mild/moderate’, the agreement rate was 94.1% (461/490) with a kappa of 0.75. Conclusions: This study showed that an automated system can “learn” to grade central LSS with excellent performance against the reference standard. Thus SpineNet offers potential to grade LSS in large-scale epidemiological studies involving a high volume of MRI spine data with a high level of consistency and objectivity.
- Published
- 2020
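The counts and percentages quoted in the abstract of entry 28 are internally consistent, as a short check shows. The helper name `percent` is introduced here for illustration; all numbers come from the abstract.

```python
def percent(count, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * count / total, 1)


participants = 971
total_levels = 4855
training_views = 4365
testing_views = 490

# Training and testing views together account for every graded level.
split_is_complete = training_views + testing_views == total_levels

# Number of lumbar levels contributed per participant.
levels_per_participant = total_levels / participants

exact_agreement = percent(322, testing_views)   # four-grade agreement
off_by_two = percent(11, testing_views)         # disagreements of 2 grades
off_by_three = percent(1, testing_views)        # disagreements of 3 grades
binary_agreement = percent(461, testing_views)  # 'severe' vs the rest

print(split_is_complete, levels_per_participant,
      exact_agreement, off_by_two, off_by_three, binary_agreement)
```

The check confirms five graded levels per participant and reproduces each reported percentage from its fraction.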
29. Possibilities for automated programming style assessment in informatics olympiads
- Author
-
Jūratė Skūpienė
- Subjects
teaching programming ,algorithms ,programming style ,automated grading ,Mathematics ,QA1-939 - Abstract
Programming style is an important part of program quality, and it should be taken into account when assessing programs designed by competitors in informatics. In the International Olympiad in Informatics, grading is automated and based on testing results only, while programming style is not taken into account. However, some university programming courses already evaluate the programming style of submitted programs automatically. The paper reviews existing experience and discusses possibilities for automated grading of programming style in informatics olympiads.
- Published
- 2021
30. Automated Short-Answer Grading using Semantic Similarity based on Word Embedding.
- Author
-
Lubis, Fetty Fitriyanti, Mutaqin, Putri, Atina, Waskita, Dana, Sulistyaningtyas, Tri, Arman, Arry Akhmad, and Rosmansyah, Yusep
- Subjects
SCORING rubrics ,VOCABULARY - Abstract
Automatic short-answer grading (ASAG) is a system that aims to help speed up the assessment process without an instructor's intervention. Previous research had successfully built an ASAG system whose performance had a correlation of 0.66 and mean absolute error (MAE) starting from 0.94 with a conventionally graded set. However, this study had a weakness in the need for more than one reference answer for each question. It used a string-based equation method and keyword matching process to measure the sentences' similarity in order to produce an assessment rubric. Thus, our study aimed to build a more concise short-answer automatic scoring system using a single reference answer. The mechanism used a semantic similarity measurement approach through word embedding techniques and syntactic analysis to assess the learner's accuracy. Based on the experiment results, the semantic similarity approach showed a correlation value of 0.70 and an MAE of 0.70 when compared with the grading reference. [ABSTRACT FROM AUTHOR]
- Published
- 2021
31. An AI-Based System for Formative and Summative Assessment in Data Science Courses.
- Author
-
Vittorini, Pierpaolo, Menini, Stefano, and Tonelli, Sara
- Subjects
SUMMATIVE tests ,DATA science ,FORMATIVE evaluation ,ARTIFICIAL intelligence ,MASSIVE open online courses - Abstract
Massive open online courses (MOOCs) provide hundreds of students with teaching materials, assessment tools, and collaborative instruments. The assessment activity, in particular, is demanding in terms of both time and effort; thus, the use of artificial intelligence can be useful to address and reduce the time and effort required. This paper reports on a system and related experiments finalised to improve both the performance and quality of formative and summative assessments in specific data science courses. The system is developed to automatically grade assignments composed of R commands commented with short sentences written in natural language. In our opinion, the use of the system can (i) shorten the correction times and reduce the possibility of errors and (ii) support the students while solving the exercises assigned during the course through automated feedback. To investigate these aims, an ad-hoc experiment was conducted in three courses containing the specific topic of statistical analysis of health data. Our evaluation demonstrated that automated grading has an acceptable correlation with human grading. Furthermore, the students who used the tool did not report usability issues, and those that used it for more than half of the exercises obtained (on average) higher grades in the exam. Finally, the use of the system reduced the correction time and assisted the professor in identifying correction errors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
32. Could automated machine-learned MRI grading aid epidemiological studies of lumbar spinal stenosis? Validation within the Wakayama spine study.
- Author
-
Ishimoto, Yuyu, Jamaludin, Amir, Cooper, Cyrus, Walker-Bone, Karen, Yamada, Hiroshi, Hashizume, Hiroshi, Oka, Hiroyuki, Tanaka, Sakae, Yoshimura, Noriko, Yoshida, Munehito, Urban, Jill, Kadir, Timor, and Fairbank, Jeremy
- Subjects
SPINAL stenosis ,SPINE ,SPINAL canal diseases ,LUMBAR vertebrae ,BONES ,TEST systems ,RESEARCH evaluation ,WEIGHTS & measures ,MAGNETIC resonance imaging ,RESEARCH funding ,RESEARCH bias ,LONGITUDINAL method - Abstract
Background: MRI scanning has revolutionized the clinical diagnosis of lumbar spinal stenosis (LSS). However, there is currently no consensus as to how best to classify MRI findings, which has hampered the development of robust longitudinal epidemiological studies of the condition. We developed and tested an automated system for grading lumbar spine MRI scans for central LSS for use in epidemiological research. Methods: Using MRI scans from the large population-based cohort study (the Wakayama Spine Study), all graded by a spinal surgeon, we trained an automated system to grade central LSS in four gradings of the bone and soft tissue margins: none, mild, moderate, severe. Subsequently, we tested the automated grading against the independent readings of our observer in a test set to investigate reliability and agreement. Results: Complete axial views were available for 4855 lumbar intervertebral levels from 971 participants. The machine used 4365 axial views to learn (training set) and graded the remaining 490 axial views (testing set). The agreement rate for gradings was 65.7% (322/490) and the reliability (Lin's correlation coefficient) was 0.73. In 2.2% of scans (11/490) there was a difference in classification of 2 and in only 0.2% (1/490) was there a difference of 3. When classified into 2 groups as 'severe' vs 'no/mild/moderate', the agreement rate was 94.1% (461/490) with a kappa of 0.75. Conclusions: This study showed that an automated system can "learn" to grade central LSS with excellent performance against the reference standard. Thus SpineNet offers potential to grade LSS in large-scale epidemiological studies involving a high volume of MRI spine data with a high level of consistency and objectivity. [ABSTRACT FROM AUTHOR]
- Published
- 2020
33. Automated grading of acne vulgaris by deep learning with convolutional neural networks.
- Author
-
Lim, Ziying Vanessa, Akram, Farhan, Ngo, Cuong Phuc, Winarto, Amadeus Aristo, Lee, Wei Qing, Liang, Kaicheng, Oon, Hazel Hweeboon, Thng, Steven Tien Guan, and Lee, Hwee Kuan
- Subjects
- *
ARTIFICIAL neural networks , *ACNE , *DEEP learning , *LABELS , *IMAGE analysis - Abstract
Background: The visual assessment and severity grading of acne vulgaris by physicians can be subjective, resulting in inter‐ and intra‐observer variability. Objective: To develop and validate an algorithm for the automated calculation of the Investigator's Global Assessment (IGA) scale, to standardize acne severity and outcome measurements. Materials and Methods: A total of 472 photographs (retrieved 01/01/2004‐04/08/2017) in the frontal view from 416 acne patients were used for training and testing. Photographs were labeled according to the IGA scale in three groups of IGA clear/almost clear (0‐1), IGA mild (2), and IGA moderate to severe (3‐4). The classification model used a convolutional neural network, and models were separately trained on three image sizes. The photographs were then subjected to analysis by the algorithm, and the generated automated IGA scores were compared to clinical scoring. The prediction accuracy of each IGA grade label and the agreement (Pearson correlation) of the two scores were computed. Results: The best classification accuracy was 67%. Pearson correlation between machine‐predicted score and human labels (clinical scoring and researcher scoring) for each model and various image input sizes was 0.77. Correlation of predictions with clinical scores was highest when using Inception v4 on the largest image size of 1200 × 1600. Two sets of human labels showed a high correlation of 0.77, verifying the repeatability of the ground truth labels. Confusion matrices show that the models performed sub‐optimally on the IGA 2 label. Conclusion: Deep learning techniques harnessing high‐resolution images and large datasets will continue to improve, demonstrating growing potential for automated clinical image analysis and grading. [ABSTRACT FROM AUTHOR]
- Published
- 2020
34. Analyzing the decline of student scores over time in self‐scheduled asynchronous exams.
- Author
-
Chen, Binglin, West, Matthew, and Zilles, Craig
- Subjects
- *
EXAMINATIONS , *DATA analysis , *TALLIES , *STUDENTS , *SCHOOL food - Abstract
Background: When students are given a choice of when to take an exam in engineering and computing courses, it has been previously observed that average exam scores generally decline over the exam period. This trend may have implications both for the design of interventions to improve student learning and for data analysis to detect collaborative cheating. Purpose/Hypothesis: We hypothesize that average exam scores decline over the exam period primarily due to self‐selection effects, where weaker students tend to choose exam times later in the exam period, while stronger students are more likely to choose earlier times. Design/Method: We collected 31,673 exam records over four semesters from six undergraduate engineering and computing courses that had both synchronous exams (all students at the same time) and asynchronous exams (students choose a time). We analyzed student exam time choice and asynchronous exam scores, using synchronous exam scores in the same course as a control variable. Results: We find that students with lower scores on synchronous exams generally elect to take asynchronous exams later and that controlling for student ability (via synchronous exams) removes 70% of the decline observed in average asynchronous exam scores over the exam period but does not eliminate the downward trend with time. Conclusions: We conclude that self‐selection effects are primarily responsible for exam score declines over time, that exam time selection is unlikely to be a useful target for interventions to improve performance, and that there is no evidence for widespread collaborative cheating in the dataset used in this research. [ABSTRACT FROM AUTHOR]
- Published
- 2019
35. Automated Assessment of Complex Programming Tasks Using SIETTE.
- Author
-
Conejo, Ricardo, Barros, Beatriz, and Bertoa, Manuel F.
- Abstract
This paper presents an innovative method to tackle the automatic evaluation of programming assignments with an approach based on well-founded assessment theories (Classical Test Theory (CTT) and Item Response Theory (IRT)) instead of heuristic assessment as in other systems. CTT and/or IRT are used to grade the results of different items of evidence obtained from students’ results. The methodology consists of considering program proofs as items, calibrating them, and obtaining the score using CTT and/or IRT procedures. These procedures measure overall validity and reliability as well as diagnose the quality of each proof (item). The evidence is obtained through program proofs. The SIETTE system collects and processes all data to calculate the student knowledge level. This innovative method for programming task evaluation makes it possible to deploy the whole artillery developed in this research field over the last few decades. To the best of our knowledge, this is a new and original contribution in the area of programming assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2019
36. Developing and Assessing MATLAB Exercises for Active Concept Learning.
- Author
-
Song, S. H., Antonelli, Marco, Fung, Tony W. K., Armstrong, Brandon D., Chong, Amy, Lo, Albert, and Shi, Bertram E.
- Subjects
- *
CONCEPT learning , *CURRICULUM , *TEACHER-student relationships , *ENGINEERING education , *EXPERIENTIAL learning - Abstract
Contribution: A systematic approach to MATLAB problem design and automated assessment is described, based on the experience working with the MATLAB server provided by MathWorks and integrated with the edX massive online open class (MOOC) platform. Background: New technologies, such as MOOCs, provide innovative methods to tackle new challenges in teaching and learning. However, they also bring challenges in course delivery and assessment, due to factors such as less direct student–instructor interaction. These challenges are especially severe in engineering education, which relies heavily on experiential learning, such as laboratory exercises and computer simulations, to assist students in understanding concepts. As a result, effective design of experiential learning components is extremely critical for engineering MOOCs. Intended Outcomes: This paper shares the experience gained through developing and offering an MOOC on communication systems, with special focus on the development and the automated assessment of MATLAB exercises for active concept learning. Application Design: The proposed approach introduced students to concepts by using learning components commonly provided by many MOOC platforms (e.g., online lectures and quizzes), and augmented the student experience with MATLAB-based computer simulations and exercises to enable more concrete and detailed understanding of the material. Findings: The effectiveness of the instructional methods was supported by evaluation of students’ learning performance. [ABSTRACT FROM AUTHOR]
- Published
- 2019
37. Statistical semi-supervised system for grading multiple peer-reviewed open-ended works.
- Author
-
Rico-Juan, Juan Ramón, Gallego, Antonio-Javier, Valero-Mas, Jose J., and Calvo-Zaragoza, Jorge
- Subjects
- *
STATISTICS , *LEARNING , *ROBUST statistics , *ACADEMIC workload of students , *TEACHERS' workload - Abstract
In the education context, open-ended works generally entail a series of benefits, such as the possibility of developing original ideas and a more productive learning process for the student, compared with closed-answer activities. Nevertheless, such works impose a significant correction workload on the teacher, in contrast to the latter, which can be self-corrected. Furthermore, this workload becomes intractable with large groups of students. In order to maintain the advantages of open-ended works with a reasonable amount of correction effort, this article proposes a novel methodology: students perform the corrections using a rubric (closed Likert scale) as a guideline in a peer-review fashion; then, their markings are automatically analyzed with statistical tools to detect possible biased scorings; finally, in the event the statistical analysis detects a biased case, the teacher is required to intervene to manually correct the assignment. This methodology has been tested on two different assignments with two heterogeneous groups of people to assess the robustness and reliability of the proposal. As a result, we obtain values over 95% in the confidence of the intra-class correlation test (ICC) between the grades computed by our proposal and those directly resulting from the manual correction of the teacher. These figures confirm that the evaluation obtained with the proposed methodology is statistically similar to that of the manual correction of the teacher with a remarkable decrease in terms of effort. [ABSTRACT FROM AUTHOR]
- Published
- 2018
38. Automated neural foraminal stenosis grading via task-aware structural representation learning.
- Author
-
He, Xiaoxu, Leung, Stephanie, Warrington, James, Shmuilovich, Olga, and Li, Shuo
- Subjects
- *
STENOSIS , *OLDER patients , *PHYSICIANS , *QUALITY of life , *AUTOMATION , *CLASSIFICATION algorithms - Abstract
Neural foraminal stenosis (NFS) is the most common spinal disease in elderly patients, greatly affecting their quality of life. Efficient and accurate grading of NFS is vital for physicians, as it allows them to offer patients timely and targeted treatment according to different grading levels. However, current clinical practice relies on physicians’ visual inspection and manual grading of neural foramina (NF), which is inefficient and inconsistent. A fully automated system is highly desirable but faces many technical challenges (e.g., the inefficiency in localizing NF candidates, and the severe ambiguities in grading). In this paper, an automated and accurate localization and grading clinical framework is proposed. In our framework, both the localization and grading tasks are handled as classification problems: two-class classification (NF/non-NF) and four-class classification (normal/slight/marked/severe). To achieve this, a newly proposed saliency-biased Ncuts (SBNcuts) is utilized for efficient localization, and a novel task-aware structural representation learning (TASRL) model is developed for accurate localization and grading. Specifically, SBNcuts incorporates a saliency map as a preliminary guess of NF locations to refine the generated NF candidates while preserving the intact structure of NF. TASRL incorporates task labels (e.g., NF object label and four NFS grade labels) into manifold learning to obtain a discriminative, low-dimensional, and structural image representation, which enables similar appearance sharing among images with the same task label and different appearance among images with different task labels. The superior performance in localization and grading, with very high ( > 0.89) accuracy, specificity, sensitivity, and F-measure, has been demonstrated by experiments on 110 subjects. With our method, physicians could offer an efficient and consistent clinical grading for NFS. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
39. A Comparison between Two Automatic Assessment Approaches for Programming: An Empirical Study on MOOCs.
- Author
-
Bey, Anis, Jermann, Patrick, and Dillenbourg, Pierre
- Subjects
- *
MASSIVE open online courses , *COMPUTER programming education , *TEACHING methods , *EDUCATIONAL technology , *LEARNING - Abstract
Computer-graders have been in regular use in the context of MOOCs (Massive Open Online Courses). The automatic grading of programs presents an opportunity to assess and provide tailored feedback to large classes, while featuring at the same time a number of benefits like: immediate feedback, unlimited submissions, as well as low cost of feedback. The present paper compares Algo+, an automatic assessment tool for computer programs, to an automatic grader used in a MOOC course at EPFL (Ecole Polytechnique Fédérale de Lausanne, Switzerland). This empirical study explores the practicability and behavior of Algo+ and analyzes whether it can be used to evaluate a large scale of programs. Algo+ is a prototype based on a static analysis approach for automated assessment of algorithms where programs are not executed but analyzed by looking at their instructions. The second tool, EPFL grader, is used to grade programs submitted by students in MOOCs of Introductory programming with C++ at EPFL and is based on a compiler approach (Dynamic Analysis approach). In this technique submissions are assessed via a battery of unit tests where the student programs are run with standard input and assessed on whether they produced the correct output. In this study results showed the advantages and limits of each approach and pointed out how the two tools can be used to get a benefit assessment of students' learning in MOOCs of computer programming. This study led to the proposition of a model for the relationship between the number of submissions and the appearance of the most frequent submitted programs. This technique is used by Algo+ for giving feedback and it is based only on the n most redundant submissions that have been annotated by the instructor. [ABSTRACT FROM AUTHOR]
- Published
- 2018
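The dynamic analysis approach described in entry 39 (running a submission against a battery of tests and comparing its output with the expected output) can be illustrated with a minimal sketch. This is not the EPFL grader or Algo+; the names `grade`, `TestCase`, and `student_sum` are hypothetical, and a submission is modeled here as a Python function from standard-input text to standard-output text rather than as a compiled C++ program.

```python
from typing import Callable, List, Tuple

# Hypothetical test-case shape: (standard input, expected standard output).
TestCase = Tuple[str, str]


def grade(program: Callable[[str], str], tests: List[TestCase]) -> float:
    """Return the fraction of test cases whose output matches the expected output."""
    passed = 0
    for stdin_text, expected in tests:
        try:
            actual = program(stdin_text)
        except Exception:
            continue  # a crash counts as a failed test
        # Compare ignoring leading/trailing whitespace (an assumption of this sketch).
        if actual.strip() == expected.strip():
            passed += 1
    return passed / len(tests) if tests else 0.0


def student_sum(stdin_text: str) -> str:
    """Example submission: print the sum of the whitespace-separated integers."""
    return str(sum(int(tok) for tok in stdin_text.split()))


tests = [("1 2 3", "6"), ("10 -4", "6"), ("", "0")]
score = grade(student_sum, tests)
print(f"score = {score:.2f}")
```

A grader of this kind only observes behavior on the chosen inputs, which is one reason the abstract contrasts it with static analysis of the program's instructions.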
40. Validation of automated screening for referable diabetic retinopathy with the IDx‐DR device in the Hoorn Diabetes Care System.
- Author
-
Van Der Heijden, Amber A., Abramoff, Michael D., Verbraak, Frank, Van Hecke, Manon V., Liem, Albert, and Nijpels, Giel
- Subjects
- *
DIABETIC retinopathy , *TYPE 2 diabetes complications , *BODY mass index , *STANDARD deviations , *DIAGNOSIS ,DIABETIC retinopathy treatment - Abstract
Abstract: Purpose: To increase the efficiency of retinal image grading, algorithms for automated grading have been developed, such as the IDx‐DR 2.0 device. We aimed to determine the ability of this device, incorporated in clinical work flow, to detect retinopathy in persons with type 2 diabetes. Methods: Retinal images of persons treated by the Hoorn Diabetes Care System (DCS) were graded by the IDx‐DR device and independently by three retinal specialists using the International Clinical Diabetic Retinopathy severity scale (ICDR) and EURODIAB criteria. Agreement between specialists was calculated. Results of the IDx‐DR device and experts were compared using sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), distinguishing between referable diabetic retinopathy (RDR) and vision‐threatening retinopathy (VTDR). Area under the receiver operating characteristic curve (AUC) was calculated. Results: Of the included 1415 persons, 898 (63.5%) had images of sufficient quality according to the experts and the IDx‐DR device. Referable diabetic retinopathy (RDR) was diagnosed in 22 persons (2.4%) using EURODIAB and 73 persons (8.1%) using ICDR classification. Specific intergrader agreement ranged from 40% to 61%. Sensitivity, specificity, PPV and NPV of IDx‐DR to detect RDR were 91% (95% CI: 0.69–0.98), 84% (95% CI: 0.81–0.86), 12% (95% CI: 0.08–0.18) and 100% (95% CI: 0.99–1.00; EURODIAB) and 68% (95% CI: 0.56–0.79), 86% (95% CI: 0.84–0.88), 30% (95% CI: 0.24–0.38) and 97% (95% CI: 0.95–0.98; ICDR). The AUC was 0.94 (95% CI: 0.88–1.00; EURODIAB) and 0.87 (95% CI: 0.83–0.92; ICDR). For detection of VTDR, sensitivity was lower and specificity was higher compared to RDR. AUC's were comparable. Conclusion: Automated grading using the IDx‐DR device for RDR detection is a valid method and can be used in primary care, decreasing the demand on ophthalmologists. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
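The low PPV and high NPV reported in entry 40 for the EURODIAB criteria follow from the low prevalence of referable retinopathy in the sample. The sketch below recomputes both values from the reported sensitivity (91%), specificity (84%), and prevalence (22 of 898 persons) using Bayes' rule. The function name `predictive_values` is hypothetical, and the inputs are the rounded figures from the abstract, so the results are approximate.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures taken from the abstract (EURODIAB criteria): 22 RDR cases among 898 persons.
prevalence = 22 / 898
ppv, npv = predictive_values(0.91, 0.84, prevalence)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With a prevalence of about 2.4%, false positives outnumber true positives even at 84% specificity, which is consistent with the reported PPV of 12% and NPV of 100%.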
41. ISSLS PRIZE IN BIOENGINEERING SCIENCE 2017: Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist.
- Author
-
Jamaludin, Amir, Lootus, Meelis, Kadir, Timor, Zisserman, Andrew, Urban, Jill, Battié, Michele, Fairbank, Jeremy, McCall, Iain, Battié, Michele C, and Genodisc Consortium
- Subjects
- *
SPINE abnormalities , *LUMBAR vertebrae abnormalities , *INTERVERTEBRAL disk abnormalities , *BACKACHE diagnosis , *MAGNETIC resonance imaging , *BONE marrow , *COMPARATIVE studies , *INTERVERTEBRAL disk , *LUMBAR vertebrae , *SPINE diseases , *RESEARCH methodology , *MEDICAL cooperation , *ARTIFICIAL neural networks , *RESEARCH , *SPINAL stenosis , *SPONDYLOLISTHESIS , *EVALUATION research - Abstract
Study Design: Investigation of the automation of radiological features from magnetic resonance images (MRIs) of the lumbar spine. Objective: To automate the process of grading lumbar intervertebral discs and vertebral bodies from MRIs. MR imaging is the most common imaging technique used in investigating low back pain (LBP). Various features of degradation, based on MRIs, are commonly recorded and graded, e.g., Modic change and Pfirrmann grading of intervertebral discs. Consistent scoring and grading is important for developing robust clinical systems and research. Automation facilitates this consistency and reduces the time of radiological analysis considerably and hence the expense. Methods: 12,018 intervertebral discs, from 2009 patients, were graded by a radiologist and were then used to train: (1) a system to detect and label vertebrae and discs in a given scan, and (2) a convolutional neural network (CNN) model that predicts several radiological gradings. The performance of the model, in terms of class average accuracy, was compared with the intra-observer class average accuracy of the radiologist. Results: The detection system achieved 95.6% accuracy in terms of disc detection and labeling. The model is able to produce predictions of multiple pathological gradings that consistently matched those of the radiologist. The model identifies 'Evidence Hotspots' that are the voxels that most contribute to the degradation scores. Conclusions: Automation of radiological grading is now on par with human performance. The system can be beneficial in aiding clinical diagnoses in terms of objectivity of gradings and the speed of analysis. It can also draw the attention of a radiologist to regions of degradation. This objectivity and speed is an important stepping stone in the investigation of the relationship between MRIs and clinical diagnoses of back pain in large cohorts. Level Of Evidence: Level 3. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
42. Personalized Assessment as a Means to Mitigate Plagiarism.
- Author
-
Manoharan, Sathiamoorthy
- Subjects
- *
PLAGIARISM , *INDIVIDUALIZED instruction , *PROBLEM solving , *STUDENT cheating , *SERVICE-oriented architecture (Computer science) - Abstract
Although every educational institution has a code of academic honesty, they still encounter incidents of plagiarism. These are difficult and time-consuming to detect and deal with. This paper explores the use of personalized assessments with the goal of reducing incidents of plagiarism, proposing a personalized assessment software framework through which each student receives a unique problem set. The framework not only auto-generates the problem set but also auto-marks the solutions when submitted. The experience of using this framework is discussed, from the perspective of both students and staff, particularly with respect to its ability to mitigate plagiarism. A comparison of personalized and traditional assignments in the same class confirms that the former had far fewer observed plagiarism incidents. Although personalized assessment may not be cost-effective in all courses (such as language courses), it still can be effective in areas such as mathematics, engineering, science, and computing. This paper concludes that personalized assessment is a promising approach to counter plagiarism. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
43. Cellular Automata as an Example for Advanced Beginners' Level Coding Exercises in a MOOC on Test Driven Development: Lessons Learned and Suggestions for Improvement.
- Author
-
Staubitz, Thomas, Teusner, Ralf, Meinel, Christoph, and Prakash, Nishanth
- Subjects
CELLULAR automata ,COMPUTER programming ,PARALLEL processing - Abstract
Programming tasks are an important part of teaching computer programming as they encourage students to develop essential programming skills and techniques through practice. The design of educational problems plays a crucial role in the extent to which experiential knowledge is imparted to the learner, both in terms of quality and quantity. Badly designed tasks have been known to put students off practicing programming. Hence, there is a need for carefully designed problems. Cellular Automata programming is a very suitable candidate among problems designed for programming practice. In this paper, we describe how various types of problems can be designed using concepts from Cellular Automata and discuss the features which make them good practice problems with regard to instructional pedagogy. We also present a case study on a Cellular Automata programming exercise used in a MOOC on Test Driven Development using JUnit, and discuss the automated evaluation of code submissions and the feedback about the reception of this exercise by participants in this course. Finally, we suggest two ideas to make such programming exercises easier to create. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
44. Assessing ocular bulbar redness: a comparison of methods.
- Author
-
Downie, Laura E., Keller, Peter R., and Vingrys, Algis J.
- Subjects
- *
DRY eye syndromes , *KERATOCONJUNCTIVITIS , *DIGITAL image processing , *EYE diseases , *COEFFICIENTS (Statistics) - Abstract
Purpose We consider whether quantification of ocular bulbar redness, using image processing of relative Red-channel activity (Red-value), can be applied to a clinical sample and how this approach compares to an automated bulbar redness grading technique (Oculus Keratograph 5M, R-scan). Methods Red-values from dry eye patients ( n = 25) were determined using image processing of digital photographs over the nasal bulbar conjunctiva. Red-values were compared with subjective grades from six clinicians who graded the images using the IER scale. We considered the level of agreement between the Red-value and automated bulbar redness scores from the commercial instrument (R-scan). Scoring variability for each technique was assessed using the geometric coefficient of variation ( gCoV, %). Agreement between techniques was considered with Bland-Altman analyses. Results Red-values showed a strong linear relationship ( R2 = 0.99) to the R-scan. The Red-value had least variability ( gCoV = 0.97%, 95% CI: 0.76-1.35%). The IER grade showed a linear relationship with Red-value ( R2 = 0.99), bound by a floor effect; it did not discriminate changes in redness below a threshold of 1.75 units (Red-value = 33.0%), after which it paralleled the redness returned by the R-scan. Intra-method variability for the redness returned by the R-scan ( gCoV = 9.84%, 95% CI: 7.60-13.94%) and IER grades ( gCoV = 7.30%, 95% CI: 1.73-10.31%) was similar ( p > 0.05). Bland-Altman analysis showed the R-scan was consistently biased towards lower absolute redness scores than the IER. Conclusions Digital imaging processing, using relative Red-channel activity, was the least variable of the three techniques. The R-scan and IER showed similar intra-observer variability. The linear relationship between R-scan and Red-value suggests that the R-scan could be derived using similar methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
45. Support for management of programming assignments - Automated grading.
- Author
-
Paralic, M. and Martoncik, J.
- Abstract
In this paper we address the question of how practical skills in programming complex, possibly distributed, systems can be acquired in an effective way. Based on our previous experience with the highly specialized system ShareMe, the generic system GLab is introduced. It offers a learning environment that not only provides the description of the laboratory tasks to be implemented in Java, but also covers electronic submission, automatic grading, and online access to grading and test results of any Java-based application. In contrast to ShareMe, it is an open learning environment that could potentially be used for any software project implemented in Java. We focus on the subsystem called GLabTest, which supports automated grading; ShareMe, a peer-to-peer system for sharing files, is used for its validation. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
46. Validity of a graph-based automatic assessment system for programming assignments: human versus automatic grading
- Author
-
Soundous Zougari, Mariam Tanana, and Abdelouahid Lyhyaoui
- Subjects
Programming assignments ,General Computer Science ,Computer aided education ,ComputingMilieux_COMPUTERSANDEDUCATION ,Graph-based assessment system ,Electrical and Electronic Engineering ,Automatic assessment ,Automated grading - Abstract
Programming is a very complex and challenging subject to teach and learn. A strategy guaranteed to deliver proven results has been intensive and continual training. However, this strategy holds an extra workload for the teachers with huge numbers of programming assignments to evaluate in a fair and timely manner. Furthermore, under the current COVID-19 distance teaching circumstances, regular assessment is a fundamental feedback mechanism. It ensures that students engage in learning as well as determines the extent to which they reached the expected learning goals, in this new learning reality. In sum, automating the assessment process will be particularly appreciated by the instructors and highly beneficial to the students. The purpose of this paper is to investigate the feasibility of automatic assessment in the context of computer programming courses. Thus, a prototype based on merging static and dynamic analysis was developed. Empirical evaluation of the proposed grading tool within an introductory C-language course has been presented and compared to manually assigned marks. The outcomes of the comparative analysis have shown the reliability of the proposed automatic assessment prototype.
- Published
- 2022
- Full Text
- View/download PDF
47. Screening for Diabetic Retinopathy in the Central Region of Portugal. Added Value of Automated 'Disease/No Disease' Grading.
- Author
-
Ribeiro, Luisa, Oliveira, Carlos Manta, Neves, Catarina, Ramos, João Diogo, Ferreira, Hélder, and Cunha-Vaz, José
- Subjects
- *
DIABETIC retinopathy , *DIABETES complications , *RETINAL disease diagnosis , *RETINAL diseases , *TREATMENT of eye diseases , *PEOPLE with diabetes , *DIAGNOSIS - Abstract
Purpose: To describe the procedures of a nonmydriatic diabetic retinopathy (DR) screening program in the Central Region of Portugal and the added value of the introduction of an automated disease/no disease analysis. Methods: The images from the DR screening program are analyzed in a central reading center using first an automated disease/no disease analysis followed by human grading of the disease cases. The grading scale used is as follows: R0 - no retinopathy, RL - nonproliferative DR, M - maculopathy, RP - proliferative DR and NC - not classifiable. Results: Since the introduction of automated analysis in July 2011, a total of 89,626 eyes (45,148 patients) were screened with the following distribution: R0 - 71.5%, RL - 22.7%, M - 2.2%, RP - 0.1% and NC - 3.5%. The implemented automated system showed the potential for human grading burden reduction of 48.42%. Conclusions: Screening for DR using automated analysis allied to a simplified grading scale identifies DR vision-threatening complications well while decreasing human burden. © 2014 S. Karger AG, Basel [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
48. Massive Open Online Courses (MOOCS): Emerging Trends in Assessment and Accreditation.
- Author
-
Chauhan, Amit
- Subjects
CURRICULUM ,EDUCATIONAL accreditation ,VIRTUAL universities & colleges ,COMPUTERS in education ,PSYCHOLOGICAL feedback ,INTERNET in education - Abstract
In 2014, Massive Open Online Courses (MOOCs) are expected to witness phenomenal growth in student registration compared to previous years (Lee, Stewart, & Claugar-Pop, 2014). As MOOCs continue to grow in number, there has been an increasing focus on assessment and evaluation. Because of the huge enrollments in a MOOC, it is impossible for the instructor to grade homework and evaluate each student. The enormous data generated by learners in a MOOC can be used for developing and refining automated assessment techniques. As a result, "Smart Systems" are being designed to track and predict learner behavior while completing MOOC assessments. These automated assessments for MOOCs can automatically score, and provide feedback to students on, multiple-choice questions, mathematical problems, and essays. Automated assessments help teachers with grading and also support students in the learning processes. These assessments are prompt, consistent, and support objectivity in assessment and evaluation (Ala-Mutka, 2005). This paper reviews the emerging trends in MOOC assessments and their application in supporting student learning and achievement. The paper concludes by describing how assessment techniques in MOOCs can help to maximize learning outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2014
49. A new automated grading approach for computer programming.
- Author
-
Liu, Xiong'en
- Subjects
COMPUTER programming ,GRADING of students -- Universities & colleges ,APPLICATION software ,PROGRAMMING languages ,COMPUTER science education ,TEACHING machines - Abstract
The current grading systems for computer programming assignments have taken correctness, efficiency, complexity, and maintainability into account. Of these four components, the most important measurement is the correctness. However, the existing grading systems still have some drawbacks. It is hard to measure college students' overall programming skills based only on their answers to a single form of programming questions. The author proposes a new approach by presenting multiple forms of computer programming questions, such as statement filling-in, program modifying and algorithm designing, and by providing an automated grading algorithm to measure the correctness, time efficiency, space efficiency, complexity, and robustness. This proposed automated grading method has been employed successfully in the development of C programming and Delphi programming exam systems for Computer Application Ability Exam for College Students in Fujian. It has also been applied to the development of an online programming self-testing system for Data Structure course which is offered by Fujian Agriculture and Forestry University. © 2010 Wiley Periodicals, Inc. Comput Appl Eng Educ 21: 484-490, 2013 [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
50. Comparison of oral examination and electronic examination using paired multiple-choice questions
- Author
-
Ventouras, Errikos, Triantis, Dimos, Tsiakas, Panagiotis, and Stergiopoulos, Charalampos
- Subjects
- *
EDUCATIONAL tests & measurements , *EDUCATIONAL evaluation , *EDUCATIONAL technology case studies , *TEST design , *TEST scoring , *COMPUTER assisted instruction , *ORAL examinations (Education) , *MULTIPLE choice examinations , *HIGH technology & education , *TEST interpretation - Abstract
The aim of the present research was to compare the use of multiple-choice questions (MCQs) as an examination method against the oral examination (OE) method. MCQs are widely used and their importance seems likely to grow, due to their inherent suitability for electronic assessment. However, MCQs are influenced by the tendency of examinees to guess answers, warranting research concerning scoring rules different from the simple positive-grades-only scoring rule. Alternatively, OE is used in tertiary education, since it enables the assessment of intellectual capabilities and personal traits to a level not found in most other examination formats. However, the significant resource requirements of OE, especially in structured forms, might excessively strain the resources of academic institutions. In the present study, an MCQ test was given to examinees, in the framework of a computer-based learning system. The same examinees also took an OE possessing elements of structure, with three examiners concurrently and independently grading each of the examinees. In the MCQ examination, a set of pairs of MCQs was composed. The MCQs in each pair were similar, concerning the same topic, but this similarity was not evident to an examinee who did not possess adequate knowledge of the topic addressed in the questions of the pair. The scoring of the paired questions avoided the procedure of mixed scoring, i.e., both positive and negative markings, while at the same time a pair-wise bonus/penalty scoring rule was adopted. The results of the “paired” MCQ examination, when using the pair-wise scoring rule, were statistically indistinguishable from the grades produced by the OE, when given to the same sample of students, on the same topics and with the same levels of difficulty. Both the results of the paired MCQ examination, when using the pair-wise scoring rule, and the OE results differed significantly from those obtained by scoring the same MCQs using a positive-grades-only scoring rule that ignored the pairing of MCQs. [Copyright © Elsevier]
- Published
- 2011
- Full Text
- View/download PDF