
'ChatGPT keeps getting flagged over and over again' — Gemini is the best AI at mimicking human writing and evading detection | TechRadar

New research shows how AI-written content is spreading and getting harder to spot online.



  • Gemini produces the most human-like writing among major AI tools, according to researchers.

  • AI-written content has become increasingly difficult for many detectors to flag.

  • AI-detection tools vary widely in accuracy, leading to inconsistent results for the same piece of content.

Google Gemini outstrips its AI chatbot peers at convincing people that the content it generates comes from a human, researchers have found.

Articles and stories composed using Gemini slip past detection tools more often than those produced by rivals like ChatGPT or Grok, a dubious honor as the internet fills with poorly generated AI slop.

The findings come from an analysis by Open Resource Applications, which tested a dozen widely used AI systems by giving each the same assignment. Every model was asked to produce a long, human-sounding article. Those pieces were then run through three detection platforms: Grammarly, QuillBot, and GPTZero, to see how easily they could be identified as machine-generated. Gemini came out ahead, with the lowest overall detection rate among the group.
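The scoring behind a test like this is simple to sketch: each model's sample is checked by several detectors, and the per-model "detection rate" is the fraction of checks that flagged it as machine-generated. The sketch below is a minimal illustration of that setup; the model names and verdicts are invented for the example, not the study's actual data.

```python
# Illustrative sketch of the study's scoring setup (not ORA's code).
# Each model's writing sample is run through several detectors, and the
# detection rate is the share of checks that flagged it as AI-generated.

def detection_rate(verdicts):
    """verdicts: list of booleans, True = flagged as AI-generated."""
    return sum(verdicts) / len(verdicts)

# Hypothetical verdicts from three detectors for one sample per model.
results = {
    "Gemini":  [False, False, True],   # rarely flagged in this example
    "ChatGPT": [True, True, True],     # flagged by every detector here
}

rates = {model: detection_rate(v) for model, v in results.items()}
for model, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{model}: flagged in {rate:.0%} of checks")
```

A lower rate means the model's output passed as human more often, which is the sense in which Gemini "came out ahead."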


That result is less about one model winning and more about what happens next. For readers, writers, and anyone who spends time online, the distinction between human and AI writing is becoming less reliable, even when tools are designed specifically to make that distinction clear.

The study’s numbers tell a straightforward story. Gemini’s output was flagged far less often by Grammarly and not at all by QuillBot, while GPTZero still identified most AI text across the board. Still, the gap between those tools is significant. It means the same piece of writing can be judged entirely human or clearly artificial depending solely on which app checks it, a verdict the writer has no way of influencing.

A student submitting coursework might pass one detector and fail another. A paralegal could have their work questioned depending on which software their boss chooses to use. For the average person, the result is growing uncertainty about how writing is judged and understood.

Gemini proved to be the most convincing at mimicking human writing: its output was rarely flagged by Grammarly and never by QuillBot. Grammarly showed the weakest detection ability overall, identifying just 43.5% of AI-generated content, while GPTZero stood out as the most effective tool, correctly recognizing AI text 98.8% of the time.

Part of Gemini’s advantage appears to come from how it differs from its rivals in putting sentences together. Detection tools often rely on patterns, looking for predictable structures or familiar phrasing. Models that vary their structure and develop ideas in less uniform ways are harder to catch because they do not follow the same recognizable rhythms.

“Tools like GPTZero flag predictability and overall structure, too, so a model that actually reasons through ideas rather than recycling familiar phrases is going to be a lot harder to catch,” a spokesperson for ORA said.

“That gap between models is already wide enough that the same prompt produces completely different results depending on which tool you use. Most people choose an AI writing tool by grabbing whatever is most popular, which is exactly why ChatGPT keeps getting flagged over and over again.”


That would help explain why ChatGPT, despite its enormous reach, performed relatively poorly in the same test. With hundreds of millions of users, it has become the most familiar voice in AI writing. That familiarity has made it easier to recognize.

“ChatGPT ranks so low because it was the first big AI on the market, and everyone knows what it sounds like,” explains a spokesperson from Open Resource Applications. “Many models that came after it sounded like ChatGPT first, before they became more unique. That’s why AI detectors flag it so easily.”

In a sense, ChatGPT’s influence has worked against it. By shaping early expectations of what AI writing sounds like, it gave detection tools a template to follow. Newer models like Gemini have moved beyond that template, introducing more variation and less predictability.

These kinds of tests matter a lot as millions more people keep trying AI tools and producing AI slop for publication. Some studies suggest that around half of online content is now generated by AI in some form.

Platforms have started to respond by filtering out content that appears overly artificial, but that approach depends on detection tools that are far from consistent. The problem is not false alarms but missed detections, especially as models improve.

The larger pattern is difficult to ignore. AI writing is not just improving; it's diversifying. Different models now produce distinct styles, making it harder to define a single 'AI voice.' That diversity complicates detection while also making the technology more useful.

Gemini’s performance in this study might suggest that it's better at writing, but what it's really successful at is avoiding the patterns that give AI away. That may be a temporary advantage, as detection tools adapt and other models follow suit, but it highlights how quickly the landscape is changing.

For readers, the takeaway is less about choosing sides and more about adjusting expectations. The internet is no longer a space where human and machine writing can be easily separated. It's a blend, and that blend is becoming more seamless.

In that environment, the question is no longer whether something sounds human — increasingly, everything does.


Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on the products of generative AI models, such as OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, and every other synthetic media tool. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.


TechRadar is part of Future US Inc, an international media group and leading digital publisher.

© Future US, Inc. Full 7th Floor, 130 West 42nd Street, New York, NY 10036.

