1. From F to A on the New York Regents Science Exams -- An Overview of the Aristo Project.
- Author
Clark, Peter, Etzioni, Oren, Khashabi, Daniel, Khot, Tushar, Mishra, Bhavana Dalvi, Richardson, Kyle, Sabharwal, Ashish, Schoenick, Carissa, Tafjord, Oyvind, Tandon, Niket, Bhakthavatsalam, Sumithra, Groeneveld, Dirk, Guerquin, Michal, and Schmitz, Michael
- Subjects
ARTIFICIAL intelligence, EXAMINATIONS, NATURAL language processing
- Abstract
Artificial intelligence has achieved remarkable mastery over games such as Chess, Go, and poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even as recently as 2016, the best artificial intelligence system could only achieve 59.3 percent on an eighth-grade science exam (Schoenick et al. 2017). This article reports success on the Grade 8 New York Regents Science Exam, where, for the first time, a system scores more than ninety percent on the exam's non-diagram, multiple-choice questions. In addition, our Aristo system, building upon the success of recent language models, exceeded eighty-three percent on the corresponding Grade 12 Science Exam's non-diagram, multiple-choice questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern natural language processing methods can result in mastery on this task. While not a full solution to general question answering (the questions are limited to eighth-grade multiple-choice science), it represents a significant milestone for the field. [ABSTRACT FROM AUTHOR]
- Published
- 2020