Jan 22, 2023

Ideas in Education: AI-Assisted Assessment

Introduction

I'm always interested in reading and exploring new research, so I was excited when I came across a new paper, Assessment in the age of artificial intelligence (Swiecki et al., 2022). A long time ago, I did science research for my PhD, and part of that meant reading papers. Papers are written in a very formal way, which means it is not easy to extract information from them. Mostly, we rely on others to interpret the contents. Historically, this has meant the press mischaracterising findings, if we hear about them at all. To help, for the past few months I have been using OpenAI's tools, and now ChatGPT, to simplify papers.

Apart from the benefit to my own education, I can see that this would be very useful for my students. Being able to read a paper quickly is a good skill to pick up, so I am also using it to explain historical science papers for my classes. I decided to try this with the Swiecki et al. paper and use its assistance in this week's post. That way, you can see another big benefit of AI text generation in practice.

This post will go through the paper's main sections and explain simply what it says. I will edit and alter it as appropriate to make it clearer. If you want to check the accuracy of the AI, I suggest that you download the paper here; it is open-access, so free to download.

Why this paper?

Assessments are important for checking student learning, but traditional methods like multiple-choice questions and essays have limitations. They can be difficult and time-consuming to create, provide only limited information, and may not be tailored to individual students. We also need to accept that teachers have limited time and won’t necessarily update resources yearly, so assessments may not accurately reflect real-world situations. In their paper, Swiecki et al. explore how artificial intelligence (AI) can address these issues whilst also acknowledging that traditional methods have their own value and challenges.

Background: The Standard Assessment Paradigm

The standard assessment paradigm (SAP) is an approach to educational assessment that uses a set of predefined items (such as questions or problems) to infer students' proficiency in certain traits. Traditional methods like multiple-choice questions, essays, and short answer questions are examples of the SAP. However, this approach has several potential issues.

- It can be difficult and time-consuming for educators to design and implement these assessments.
- The assessments only provide snapshots of student performance at a single point in time and do not give a full picture of learning.
- The assessments may not consider individual students' prior knowledge, abilities, experiences, and cultural backgrounds.

Additionally, there has been a shift in the literature on learning that emphasizes the importance of studying learning processes over time and understanding how learning happens to improve student progress. Swiecki et al. argue that the SAP has these limitations and that AI can be used to address them.

How can AI help with Assessment?

AI can help make assessment practices more efficient by automating tasks such as generating assessment tasks, finding peers to grade work, and scoring student work. One way AI is used in assessment is by automating the generation of multiple-choice and open-answer questions through the use of deep neural networks. Studies have shown that the success of these approaches relies on the availability of large-scale and relevant datasets used to train the models. This is what we are now getting with the advent of GPT-3.5; however, most existing datasets are not directly related to teaching and learning. Metrics can measure the quality of generated questions, but they do not ensure the questions are helpful for teaching.

AI-assisted peer assessment is a method in which students evaluate each other's work, with AI helping to organise the process, for example by finding suitable peers to grade a piece of work. Peer assessment has been shown to be a sustainable and effective way of providing feedback, especially in large classes. A number of educational platforms have been developed to support it (presumably at a cost to schools).

However, more research is needed to fully understand the impact of AI on assessment practices. If you need to buy software to achieve this, it won’t be easy to use in school. However, you could potentially use AI to mark a student’s work and suggest

a) what mark it would get and
b) what to do to improve!

This would be done by the student interacting with the AI directly.

A more continuous view of student performance can be achieved through AI assessment, allowing for a deeper understanding of their learning process. One way is through electronic assessment platforms (EAPs). These allow for exams to be taken online and provide more detailed data on the student's behaviour. This data can be used to understand their behaviour during the exam, such as their test-taking effort and answering and revising habits.

Another approach is stealth assessment, which involves collecting data from students engaging in digital activities such as playing games. Shute and Ventura (2013) developed measures of:

- conscientiousness,
- creativity, and
- physics ability

by collecting data generated in a digital physics game commonly used in schools. They created models of how students' abilities in a game should improve over time, called construct maps. They then used this data to track each student's progress and give a dynamic assessment of their growing skills as they played.

Latent knowledge estimation is an AI technique that can be used to continuously track student actions and incorporate them into models of performance and learning. This technique is used in intelligent tutoring systems to collect data about students' responses to specific learning opportunities and whether they can correctly apply knowledge components. I suspect that the software provided by Education Perfect does this (as do a number of other providers).

A widely used technique for latent knowledge estimation is Bayesian knowledge tracing (BKT). BKT uses four parameters to estimate whether a student can apply a knowledge component:
- the probability that the student already knows the component,
- the probability of learning the component after a learning opportunity,
- the probability of correctly applying the component even if the student doesn't know it (a guess), and
- the probability of incorrectly applying the component even though the student does know it (a slip).
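
To make the four parameters above more concrete, here is a minimal sketch in Python of the standard BKT update as I understand it. It is not code from the paper or from any particular tutoring product; the class and function names, and the parameter values in the usage note, are my own illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """The four BKT parameters (names are illustrative assumptions)."""
    p_init: float   # probability the student already knows the component
    p_learn: float  # probability of learning it after one opportunity
    p_guess: float  # probability of a correct answer without knowing it
    p_slip: float   # probability of an incorrect answer despite knowing it


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next answer is correct, given current mastery."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Update the mastery estimate after observing one answer."""
    if correct:
        evidence = p_known * (1 - params.p_slip)
        total = evidence + (1 - p_known) * params.p_guess
    else:
        evidence = p_known * params.p_slip
        total = evidence + (1 - p_known) * (1 - params.p_guess)
    # Bayes' rule: probability the student knew the component for this answer.
    posterior = evidence / total if total > 0 else p_known
    # The student may also have learned the component from this opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: list[bool], params: BKTParams) -> list[float]:
    """Return the mastery estimate after each answer in a sequence."""
    p_known = params.p_init
    history = []
    for correct in responses:
        p_known = update(p_known, correct, params)
        history.append(p_known)
    return history
```

For example, with made-up values such as BKTParams(p_init=0.3, p_learn=0.2, p_guess=0.2, p_slip=0.1), calling trace([True, True, False, True], params) gives an estimate that rises after each correct answer and drops after the incorrect one. A real system would fit the four parameters to student data rather than choosing them by hand.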

Recently, new techniques based on deep learning have been proposed, including the use of recurrent neural networks and transformers. Recurrent neural networks process sequential data, such as speech or text, one step at a time. Transformers also handle sequences, but they use an attention mechanism to weigh every part of a sequence at once, which makes them practical to train on very large amounts of data. AI can also help us understand how people learn by looking at data such as clicks, mouse movements, and eye-tracking captured while students work in a learning environment. Obviously, this requires additional software and hardware, so it is probably more for researchers checking the effectiveness of learning strategies.

One of the more interesting parts of the paper for me was the section on Authentic Assessments. Authentic assessments simulate real-life tasks that professionals do in their field. If you look at recent learning models (e.g. 21st Century Learning Design), authentic assessments are up there in terms of importance. AI is now being used to create and analyze these tasks. For example, virtual internships are simulations where learners pretend to work at a fictional company and do tasks like conducting research and designing a product. AI creates a safe and effective environment for students to act like professionals by providing simulated tools, automated feedback, and messages from supervisors.

In physical simulations, such as those used in healthcare training, students work with simulated patients that use AI to behave like actual patients. AI is also used to collect, represent, and assess data from these authentic assessments because it can be challenging for educators to understand everything happening in a simulation and provide detailed feedback.

For example, in virtual internships, AI is used to analyze student chat messages and create a dashboard for educators to monitor group interaction and plan interventions. In offline simulations, AI is used to capture millions of data points, such as system logs, speech, and physiological traces, and create interfaces for educators to understand the data and focus on one aspect of learning or reflection at a time. This is a fascinating potential classroom use of AI. I can see educators training an AI using the OpenAI API and creating training scenarios.

Computers, calculators, and software are used to help with tasks like writing. Digital word processors have been around since the 1970s, and modern ones like Microsoft Word and Google Docs can help with tasks like editing and checking spelling and grammar. Today, these tools use AI to do even more advanced tasks, such as suggesting word and sentence completions, understanding tone and style (Grammarly), and even generating new sections of text (ChatGPT). This is becoming more common in everyday practice, and it is important to consider how these tools will impact assessment design.

The Potential Downsides to AI in assessment

AI-enabled assessment is designed to support and guide teachers in making decisions, but there is a risk that it could lead to the sidelining of professional expertise. For example, in the past, teachers would decide if student submissions were too similar to one another or available sources, but now AI can handle this task quickly and effectively.

However, if teachers rely too heavily on these automated decisions, it could lead to their decision-making capacity being hollowed out. To prevent this, researchers are developing systems that make the decision-making process explainable to teachers. More work is needed to find the best balance between AI and teacher decision-making for teaching, learning, and assessment.

Much of the anti-ChatGPT commentary in the press is about just this. We need to remember that copying, plagiarism, and buying essays have been around for a long time. I think teachers can be unaware of their own biases (assuming certain students would never cheat, or that they can easily spot work that is not the student's own). I have said in another post that we need to be at least using milestones.

AI-enabled assessment can be an appealing option for many stakeholders in education because it can produce data at scale and avoid inconsistencies. However, AI-enabled assessment may not be as neutral and objective as it seems. It shifts the decision-making responsibility to programmers, learning engineers, and others who may not have direct knowledge of the students being assessed. This raises concerns about biases and assumptions of these distant others. To address these concerns, there needs to be clear oversight and accountability for the decisions made by AI-enabled assessment systems and software.

The new tools have the potential to change the way teachers use assessment in their classrooms. They can limit the ability of teachers to use assessment as a tool for motivation and personal relationship-building with students. They can also prevent teachers from using alternative forms of assessment, such as peer assessment and self-assessment, which are used to support student reflection and engagement. AI-based assessments may also limit the use of assessment for promoting fairness and equality in education. It is important to think about how using AI in assessments might affect the role of assessment in education and to consider the values and beliefs that it promotes.

When AI-enabled assessment tools are used, they may only focus on certain types of learning and may not be able to recognize other forms of learning that are important. This can limit the forms of teaching and learning that are being recognized. Additionally, AI-enabled assessment may not be able to detect certain nuances in language or understand certain forms of creativity and originality. It is important to consider these limitations and acknowledge that there are certain aspects of learning that machines may not be able to assess. This is one of the reasons why I think that AI is not going to replace teachers. We still need to know our learners.

AI-enabled assessment can use data to monitor students' progress and provide feedback continuously, but this can also lead to surveillance of students and a lack of privacy. This can affect trust between educators and students and change how learning is approached. Instead of using assessment to evaluate or judge, it should be used for development and improvement.

Using AI-based assessment tools can make it more difficult to design and implement assessments. It can also change the traits, skills, and abilities that are assessed and make it harder to separate the contributions of human and AI in the assessment. Despite these challenges, AI-based assessments can provide a more nuanced view of learning and be more adapted to the individual student. However, it's important to consider these challenges when designing and implementing AI-based assessments to improve overall assessment practices.

My conclusions

In terms of the paper - it is well-researched and has a lot of ideas that are getting me thinking about how to use AI with assessment in my school. Some of the suggestions look like they will be expensive, so I need to find affordable ways of using AI to help. It's a journey I am relishing.

With the ChatGPT simplification - I found it faster to understand the content than by reading the paper alone. The paper is ten pages long and has over 12,000 words, whereas this entire blog post spends around 1,900 words on the paper itself. So, I call that a win.

What do you think?

Until next time...