Anatomy of an AI Essay


By Elizabeth Steere

Since OpenAI launched ChatGPT in 2022, educators have been grappling with the problem of how to recognize and address AI-generated writing. The host of AI-detection tools that have emerged over the past year varies greatly in capability and reliability. For example, mere months after OpenAI launched its own AI detector, the company shut it down due to its low accuracy rate.

Understandably, students have expressed concerns over the possibility of their work receiving false positives as AI-generated content. Some institutions have disabled Turnitin’s AI-detection feature due to concerns over potential false allegations of AI plagiarism that may disproportionately affect English-language learners. At the same time, tools that rephrase AI writing—such as text spinners, text inflators or text “humanizers”—can effectively disguise AI-generated text from detection. There are even tools that mimic human typing to conceal AI use in a document’s metadata.

While the capabilities of large language models such as ChatGPT are impressive, they are also limited, as they strongly adhere to specific formulas and phrasing. Turnitin’s website explains that its AI-detection tool relies on the fact that “GPT-3 and ChatGPT tend to generate the next word in a sequence of words in a consistent and highly probable fashion.” I am not a computer programmer or statistician, but I have noticed certain attributes in text that point to the probable involvement of AI, and in February I collected and quantified some of those characteristics in the hope of better recognizing AI essays and sharing what I found with students and other faculty members.
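For readers curious about the mechanics, the intuition behind that kind of detector can be sketched in a few lines of Python. The snippet below is my own illustration of the general idea, not Turnitin’s actual method: it uses the small open-source GPT-2 model to score how predictable a passage is, on the assumption that AI-generated prose tends to look more predictable (lower perplexity) to a language model than human prose does.

```python
# A rough sketch of the "predictability" intuition behind some AI detectors.
# Illustration only -- not Turnitin's actual method.
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Lower perplexity means the model finds the text more predictable."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return float(torch.exp(loss))

# Formulaic prose usually scores lower (more predictable) than idiosyncratic prose:
print(perplexity("In conclusion, the setting serves as a powerful symbol."))
print(perplexity("Her wild heart lay out there, half invoked, half spontaneous."))
```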

I asked ChatGPT 3.5 and the generative AI tool included in the free version of Grammarly each to generate more than 50 analytical essays on early American literature, using texts and prompts from classes I have taught over the past decade. I took note of the characteristics of AI essays that differentiated them from what I have come to expect from their human-composed counterparts. Here are some of the key features I noticed.


AI essays tend to get straight to the point.

Human-written work often gradually leads up to its topic, offering personal anecdotes, definitions or rhetorical questions before getting to the topic at hand.

AI-generated essays are often list-like.

They may feature numbered body paragraphs or multiple headings and subheadings.


The paragraphs of AI-generated essays also often begin with formulaic transitional phrases.

As an example, here are the first words of each paragraph in one essay that ChatGPT produced:

“Firstly”
“In contrast”
“Furthermore”
“On the other hand”
“In conclusion.”

Notably, AI-generated essays were far more likely than human-written essays to begin paragraphs with “Furthermore,” “Moreover” and “Overall.”
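A tally like this is easy to automate for anyone who wants to check their own sample of essays. The short script below is a minimal sketch; the file name and phrase list are illustrative choices, not a fixed standard.

```python
# Minimal sketch: tally stock transitional openers across an essay's paragraphs.
# The file name and phrase list here are illustrative.
from collections import Counter

TRANSITIONS = ("Firstly", "Furthermore", "Moreover", "Overall",
               "In contrast", "On the other hand", "In conclusion")

with open("essay.txt", encoding="utf-8") as f:
    paragraphs = [p.strip() for p in f.read().split("\n\n") if p.strip()]

counts = Counter()
for paragraph in paragraphs:
    for phrase in TRANSITIONS:
        if paragraph.startswith(phrase):
            counts[phrase] += 1
            break

for phrase, n in counts.most_common():
    print(f"{phrase}: {n}")
```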

AI-generated work is often banal.

It does not break new ground or demonstrate originality; its assertions sound familiar.

AI-generated text tends to remain in the third person.

That’s the case even when asked a reader response–style question. For example, when I asked ChatGPT what it personally found intriguing, meaningful or resonant about one of Edgar Allan Poe’s poems, it produced six paragraphs, but the pronoun “I” was included only once. The rest of the text described the poem’s atmosphere, themes and use of language in dispassionate prose. Grammarly prefaced its answer with “I’m sorry, but I cannot have preferences as I am an AI-powered assistant and do not have emotions or personal opinions,” followed by similarly clinical observations about the text.

AI-produced text tends to discuss “readers” being “challenged” to “confront” ideologies or being “invited” to “reflect” on key topics.

In contrast, I have found that human-written text tends to focus on what “the reader” might hypothetically “see,” “feel” or “learn.”

AI-generated essays are often confidently wrong.

Human writing is more prone to hedging, using phrases like “I think,” “I feel,” “this might mean …” or “this could be a symbol of …”

AI-generated essays are often repetitive.

An essay that ChatGPT produced on the setting of Rebecca Harding Davis’s short story “Life in the Iron Mills” contained the following assertions among its five brief paragraphs: “The setting serves as a powerful symbol,” “the industrial town itself serves as a central aspect of the setting,” “the roar of furnaces serve as a constant reminder of the relentless pace of industrial production,” “the setting serves as a catalyst for the characters’ struggles and aspirations,” “the setting serves as a microcosm of the larger societal issues of the time,” and “the setting … serves as a powerful symbol of the dehumanizing effects of industrialization.”


AI writing is often hyperbolic or overreaching.

The quotes above describe a “powerful symbol,” for example. AI essays frequently describe even the most mundane topics as “groundbreaking,” “vital,” “esteemed,” “invaluable,” “indelible,” “essential,” “poignant” or “profound.”

AI-produced texts frequently use metaphors, sometimes awkwardly.

ChatGPT produced several essays that compared writing to “weaving” a “rich” or “intricate tapestry” or “painting” a “vivid picture.”

AI-generated essays tend to overexplain.

They often use appositives to define people or terms, as in “Margaret Fuller, a pioneering feminist and transcendentalist thinker, explored themes such as individualism, self-reliance and the search for meaning in her writings …”

AI-generated academic writing often employs certain verbs.

They include “delve,” “shed light,” “highlight,” “illuminate,” “underscore,” “showcase,” “embody,” “transcend,” “navigate,” “foster,” “grapple,” “strive,” “intertwine,” “espouse” and “endeavor.”
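Because most of these verbs appear in predictable inflections, they are easy to scan for. The sketch below is my own rough illustration: the stem list and regular expression are approximate, and a hit count is a signal, not a verdict.

```python
# Sketch: count stock "AI-ish" verbs in a text by matching word stems,
# so simple inflections (delve/delves/delving) are caught. Approximate only.
import re
from collections import Counter

STEMS = ["delv", "shed light", "highlight", "illuminat", "underscor",
         "showcas", "embod", "transcend", "navigat", "foster",
         "grappl", "striv", "intertwin", "espous", "endeavor"]

def stock_verb_counts(text: str) -> Counter:
    lowered = text.lower()
    return Counter({stem: len(re.findall(r"\b" + re.escape(stem), lowered))
                    for stem in STEMS})

sample = "The author delves into themes that underscore and illuminate loss."
for stem, n in stock_verb_counts(sample).most_common(3):
    print(stem, n)
```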

AI-generated essays tend to end with a sweeping, broad-scale statement.

They talk about “the human condition,” “American society,” “the search for meaning” or “the resilience of the human spirit.” Texts are often described as a “testament to” variations on these concepts.

AI-generated writing often invents sources.

ChatGPT can compose a “research paper” using MLA-style in-text parenthetical citations and Works Cited entries that look correct and convincing, but the supposed sources are often nonexistent. In my experiment, ChatGPT referenced a purported article titled “Poe, ‘The Fall of the House of Usher,’ and the Gothic’s Creation of the Unconscious,” which it claimed was published in PMLA, vol. 96, no. 5, 1981, pp. 900–908. The author cited was an actual Poe scholar, but this particular article does not appear on his CV, and while volume 96, number 5 of PMLA did appear in 1981, the pages cited in that issue of PMLA actually span two articles: one on Frankenstein and one on lyric poetry.
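One quick way to vet a suspicious citation, beyond checking an author’s CV, is to search a bibliographic database. The sketch below is my own illustration using the public CrossRef API; a missing match is suggestive rather than conclusive, since not every real article is indexed.

```python
# Sketch: look up a cited title in the public CrossRef database.
# Absence of a match is a red flag, not proof of fabrication.
# Requires: pip install requests
import requests

def crossref_titles(query: str, rows: int = 5) -> list:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": query, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [title for item in items for title in item.get("title", [])]

suspect = ("Poe, 'The Fall of the House of Usher,' and the Gothic's "
           "Creation of the Unconscious")
for title in crossref_titles(suspect):
    print(title)
```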

AI-generated essays include hallucinations.

Ted Chiang’s article on this phenomenon offers a useful explanation of why large language models such as ChatGPT generate fabricated facts and incorrect assertions. My AI-generated essays included references to nonexistent events, characters and quotes. For example, ChatGPT attributed the dubious quote “Half invoked, half spontaneous, full of ill-concealed enthusiasms, her wild heart lay out there” to a lesser-known short story by Herman Melville, yet nothing resembling that quote appears in the actual text. Hallucinations were more evident when the AI generated text about less canonical or more recently published literary works.


This is not an exhaustive list, and I know that AI-generated text in other formats or relating to other fields probably features different patterns and tendencies. I also used only very basic prompts and did not delineate many specific parameters for the output beyond the topic and the format of an essay.

It is also important to remember that the attributes I’ve described are not exclusive to AI-generated texts. In fact, I noticed that the phrase “It is important to … [note/understand/consider]” was a frequent sentence starter in AI-generated work, but, as evidenced in the previous sentence, humans use these constructions, too. After all, large language models train on human-generated text.

And none of these characteristics alone definitively point to a text having been created by AI. Unless a text begins with the phrase “As an AI language model,” it can be difficult to say whether it was entirely or partially generated by AI. Thus, if the nature of a student submission suggests AI involvement, my first course of action is always to reach out to the student themselves for more information. I try to bear in mind that this is a new technology for both students and instructors, and we are all still working to adapt accordingly.

Students may have received mixed messages on what degree or type of AI use is considered acceptable. Since AI is also now integrated into tools their institutions or instructors have encouraged them to use—such as Grammarly, Microsoft Word or Google Docs—the boundaries of how they should use technology to augment human writing may be especially unclear. Students may turn to AI because they lack confidence in their own writing abilities. Ultimately, however, I hope that by discussing the limits and the predictability of AI-generated prose, we can encourage them to embrace and celebrate their unique writerly voices.


Elizabeth Steere is a lecturer in English at the University of North Georgia.
