Generative artificial intelligence (AI) has broken higher education assessment, with implications from the classroom to institutional accreditation. We are advocating a one-year pause on assessment requirements from institutions and accreditation bodies, with the time we would normally spend on assessment diverted toward reevaluating how we measure student learning. This could also be the start of a conversation about what students need to learn in this new age.
The general approach to assessment in higher education is to aggregate student completion of tasks (essays, exams, code writing, etc.) and evaluate the work to see whether students have met benchmarks across sections, programs, institutions and generations. We were not particularly good at assessment to start with; complicating factors such as academic integrity and the subjectivity of interpretation already clouded the field. With the release of ChatGPT and other AI tools that can complete these tasks outright or in combination with students’ own efforts, we are now analyzing some mix of student work, student/AI work and AI work, yet still treating it as evidence of student learning in the same way we did before.
What makes this problem more complicated at the institution or program level than in any individual class is that no single instructor can solve it. Adapting courses to generative AI has most often fallen to instructors, with few examples of institutional leadership. Orientation toward AI varies considerably between institutions, programs, courses and even individual sections. Take an introductory writing class as an example: one instructor may have no AI policy, another may have a “ban” in place and be using AI detection software, and a third may love the technology and require students to use it. These varied policies render the aggregated data worthless as evidence of student learning. The inconsistencies also complicate cross-sectional comparisons meant to identify trends like failure rates or equity gaps. Right now, these data tell us nothing about instruction or student learning.
A Specific Example
To make this point clearer, we will use an extended example from a capstone writing class taught by one of us, Nik Janos. Sociology 441: Public Sociology is listed as a Graduation Writing Assessment Requirement (GWAR) course at our institution. A GWAR class requires students to pass with a C- or better and to demonstrate both general writing competency and disciplinary writing skills specific to each major. Using data and theory, students practice sociological storytelling aimed at different public audiences.
Writing frequently, students in this course practice general writing competency and discipline-specific skills in writing for nonacademic audiences. Examples include letters to the editor, blog posts, encyclopedia entries and mock speeches to a city council. Many of these assignments help students practice writing that shows general proficiency in grammar, syntax and style.
To practice sociology-specific writing skills, students complete a series of exercises structured around a foundational concept in sociology called the sociological imagination (SI), developed by C. Wright Mills. The concept helps people think sociologically by recognizing that what we think of as personal troubles, say an individual being homeless, are really social problems, such as widespread homelessness.
One of the core skills that sociology programs develop in their students is the ability to read, comprehend, and then write to demonstrate comprehension. In the sociological imagination exercises, for example, students are asked to provide, in their own words and without quotations, a definition of the SI. Students then role-play a conversation among friends or family in which the topic is a current social problem, such as homelessness, and explain how the SI helps illuminate that problem. Finally, students must craft a script of 75 words or fewer that defines the SI and uses it to shed light on the social problem. The script has to be written in everyday language, be set in a gathering of friends or family, use and define the concept, and make one point about the topic.
Generative AI, like ChatGPT, has broken the assessment of student learning in an assignment like this. ChatGPT can meet or exceed students’ outcomes in mere seconds. Before the release of ChatGPT in fall 2022, students struggled to define the sociological imagination; a standard response was to copy and paste boilerplate feedback to a majority of the students and follow up with discussion in class. This spring, in a section of 27 students, 26 nailed the definition perfectly. There is no way to know whether students used ChatGPT, but the outcomes are strikingly different between the pre- and post-AI eras.
The work students do in this course is assessed every year by the sociology department for the College of Behavioral and Social Sciences and is also, as mentioned, used to determine whether Chico State students have demonstrated the writing proficiency required to graduate. As the class comes up for assessment this spring, in the first full academic year after ChatGPT’s release, there is no way to verify whether the data the department collects reflects work produced by a human, by generative AI, or by some combination of the two. It’s a perfect example of “garbage in, garbage out.”
Implications and Options
The data we are collecting right now are worthless. The same holds for all data gathered from December 2022 through the present. If you are conducting a five-year program review for institutional accreditation, for instance, you should separate the data from before the fall 2022 term and evaluate it independently. Whether you are evaluating writing, STEM outputs, coding or anything else, you are now looking at some combination of student and AI work. This will become even harder to untangle as AI tools grow more powerful and are integrated into existing production platforms like Microsoft Office and Google Workspace.
This issue and others require a wholesale reevaluation of how to measure student learning. As such, we are calling for two major changes.
First, accreditation bodies, whether they are regional or discipline-specific, need to show leadership. They are supposed to be the connective tissue between employers, governments, and higher education. We need leadership, support, and conversation with these partners. The burden of adapting to artificial intelligence has fallen to faculty, but we are not positioned or equipped to lead these conversations across stakeholder groups.
Second, we should have a one-year pause in which no assessment is expected or collected. There is no risk in this, since we are no longer really assessing student learning anyway. This would be a break from the routine practice of assessment, not from engagement with assessment itself. Instead of the regular work, we should be meeting, thinking and working out what assessment of student learning looks like in this new world. We don’t know what we will find, but continuing to do the same thing in a world that has radically changed is naïve and a waste of resources. The units on campuses responsible for doing or using assessment, such as deans’ offices, program review committees and curriculum committees, need to lead this conversation and begin the work of retooling assessment.