I Built an SEO Audit Tool. Then I Pointed It at My Own Sites.

The most uncomfortable demo I've ever run.

A few weeks ago I built a content audit tool called Revylo. The pitch is simple: every SEO tool out there scores your content on keyword density, meta tags, and readability — the stuff search engines used to care about. None of them measure what Google's quality systems actually score in 2026: search intent match, fact accuracy, E-E-A-T, originality, helpful-content signals.

Revylo does. Eight checks, run against any article, scored on the same dimensions Google's quality classifier has used to demote AI content farms since the helpful content update rolled out in 2022.

I built it. I tested it on fixture data. I made sure it could correctly flag obviously bad content as red and obviously good content as green. The calibration held.

Then I made the mistake — or possibly the smartest decision — of pointing it at my own sites.

What I expected

I run six properties. I write the articles on some of them. AI writes the articles on others. Agencies write the articles on a third group. I figured the agency content would score worst. AI content would score middling. My own writing would score highest, because of course I'd rate my own work generously.

I was wrong about almost all of that.

The setup

I picked three sites for the first audit pass: ChiliStation (recipes, agency-written), Onaro (AI governance, mostly written by me), and day9.coffee (coffee freshness tracking, with articles drafted by AI). Five articles per site, audited against all eight Revylo checks.

Took about three minutes per article.

Here are the cross-site averages:

Check	ChiliStation	Onaro	day9.coffee
Search Intent	94.0	94.0	100.0
Fact Grounding	82.0	88.6	81.0
Helpful Content	76.8	75.6	83.0
E-E-A-T	39.8	42.0	42.0
Originality	55.0	65.0	56.2
Internal Linking	85.2	86.2	76.5
Brand Voice	92.0	92.6	85.0
Technical SEO	100.0	80.0	87.5
Overall	73.4	75.6	73.8

A few things jumped out.

The three sites scored within 2.2 points of each other. Agency-written, founder-written, AI-written — all yellow, all roughly the same. I'd assumed there'd be obvious gaps. There weren't.

Every single article on every single site failed E-E-A-T. Red across the board. The universal failure mode of AI-assisted publishing in 2026.
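Both observations are easy to check against the table. Here is a minimal Python sketch, with the averages typed in by hand from the table above. The variable names are mine, for illustration only, and are not part of Revylo.

```python
# Cross-site averages, typed in by hand from the table above.
averages = {
    "ChiliStation": {
        "Search Intent": 94.0, "Fact Grounding": 82.0, "Helpful Content": 76.8,
        "E-E-A-T": 39.8, "Originality": 55.0, "Internal Linking": 85.2,
        "Brand Voice": 92.0, "Technical SEO": 100.0,
    },
    "Onaro": {
        "Search Intent": 94.0, "Fact Grounding": 88.6, "Helpful Content": 75.6,
        "E-E-A-T": 42.0, "Originality": 65.0, "Internal Linking": 86.2,
        "Brand Voice": 92.6, "Technical SEO": 80.0,
    },
    "day9.coffee": {
        "Search Intent": 100.0, "Fact Grounding": 81.0, "Helpful Content": 83.0,
        "E-E-A-T": 42.0, "Originality": 56.2, "Internal Linking": 76.5,
        "Brand Voice": 85.0, "Technical SEO": 87.5,
    },
}
overall = {"ChiliStation": 73.4, "Onaro": 75.6, "day9.coffee": 73.8}

# Gap between the best and worst overall score.
spread = round(max(overall.values()) - min(overall.values()), 1)

# Lowest-scoring check on each site.
weakest = {site: min(scores, key=scores.get) for site, scores in averages.items()}

print(spread)   # 2.2
print(weakest)  # E-E-A-T for all three sites
```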

And then there were the specifics. The contradicted claims. The unsupported facts. The places where confident statements turned out to be wrong.

That's where it got uncomfortable.

Strike one: I published an article about the SCA standard that got the SCA standard wrong

day9.coffee publishes articles about coffee freshness — when beans peak, why they degrade, how to brew them. The content's drafted by AI (Claude, mostly) and reviewed by me. I'd been pretty happy with it.

Revylo ran the fact-grounding check on the field-notes archive. One article was titled Coffee-to-Water Ratio: The 1:16 Baseline and When to Deviate. The article opens by establishing the baseline: "The Specialty Coffee Association's Gold Cup ratio for filter brewing is one gram of coffee per sixteen grams of water."

Revylo's verdict on that claim: Contradicted.

Authoritative SCA materials describe the Gold Cup standard for filter brewing as 55 g of coffee per 1,000 g of water, which is approximately a 1:18 ratio, not 1:16.

The article was literally about the SCA standard. The first paragraph stated the standard. The stated number was wrong.

I went and checked. The audit was right. The SCA's actual Golden Cup specification calls for 55 grams of coffee per liter of water, which works out to about 1:18, with a range of roughly 49.5 to 60.5 g/L. The 1:16 number is a common home-brewing baseline you'll see in coffee subreddits and YouTube videos — but it isn't the SCA standard. The article confused a popular community ratio with an established institutional specification.
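The arithmetic is worth spelling out, because it shows that 1:16 isn't just off target but outside the range quoted above. A small sketch, assuming one liter of water weighs 1,000 g (the function names are illustrative):

```python
def dose_to_ratio(grams_per_liter: float) -> float:
    """Grams of water per gram of coffee, treating 1 L of water as 1,000 g."""
    return 1000.0 / grams_per_liter


def ratio_to_dose(water_per_gram: float) -> float:
    """Grams of coffee per liter of water for a 1:N brew ratio."""
    return 1000.0 / water_per_gram


# Figures from the paragraph above: 55 g/L target, roughly 49.5 to 60.5 g/L range.
target_ratio = dose_to_ratio(55.0)
weakest_ratio = dose_to_ratio(49.5)    # least coffee, most water per gram
strongest_ratio = dose_to_ratio(60.5)  # most coffee, least water per gram

# The article's 1:16 claim, converted back to grams per liter.
article_dose = ratio_to_dose(16.0)
within_band = 49.5 <= article_dose <= 60.5

print(f"target 1:{target_ratio:.1f}")                                # target 1:18.2
print(f"range 1:{strongest_ratio:.1f} to 1:{weakest_ratio:.1f}")      # range 1:16.5 to 1:20.2
print(f"1:16 is {article_dose} g/L, within range: {within_band}")     # 62.5 g/L, False
```

At 62.5 g/L, a 1:16 brew is stronger than the top of the 49.5 to 60.5 g/L range.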

It got worse. Revylo flagged the same error in a second article (Field Notes), where the same misattribution to the SCA appears. The AI hallucinated a published standard with a precise number, confidently, in two places, in articles that are specifically about coffee brewing ratios. I'd reviewed both and hadn't caught it.

That's strike one. The tool I built caught me being publicly wrong about a fact in my own publishing pipeline. Twice.

Strike two: a confident timing range that wasn't

Same site, different article. day9's content frequently makes claims about coffee degassing, the period after roasting when beans release CO2 and their flavor is still changing. A foundational concept for the whole product.

The audit hit on this claim: "Coffee beans release CO2 for about three to four days after roasting."

Verdict: Unsupported.

Authoritative coffee education sources say beans release CO2 for days to weeks after roasting, with the most intense release in the first 24 hours and typical brew-ready windows ranging from about 2–12 days, so a flat claim of only 3–4 days is too narrow.

This one isn't a wrong fact so much as a confidently asserted narrow range where the actual literature shows wide variation. The most intense CO2 release does happen in the first 24 hours; degassing tapers over the following weeks. "Three to four days" is a clean-sounding number that doesn't match what published coffee research says.

The same article asserted that "washed coffees hit peak flavor around day seven according to standard practitioner consensus backed by published degassing research." Revylo's verdict: also unsupported. There is no standard consensus on day seven specifically. The community ranges from day five through day ten, with serious variation based on roast level, origin, and processing method. The phrase "standard practitioner consensus backed by published degassing research" was load-bearing confidence on a claim that couldn't carry the weight.

I had built a coffee tracking product whose marketing content was making up its specifics.

Strike three: a recipe whose measurements weren't measurements

I'm equal-opportunity about the bad news here. ChiliStation's content is written by an agency I pay. They produce recipe articles in bulk. I'd been pretty happy with them too — the articles read well, they hit good keywords, they look like what you'd expect a recipe site to publish.

Revylo audited five of them. One — Award Winning Chili Recipe That Delivers — scored 40/red on fact-grounding alone, dragging the article's overall to 52/red.

The audit found four unsupported or contradicted claims in a single recipe article.

The headline finding: "The recipe requires simmering uncovered for 45 minutes, then adding beans and continuing for another 30 to 45 minutes." Contradicted.

Authoritative recipe sources show chili is simmered after all ingredients are added, with no step matching "45 minutes uncovered, then beans for another 30 to 45 minutes"; instead, beans are typically added with the rest of the ingredients and the chili is simmered for about 30 minutes to 2 hours depending on style.

Competition chili, which the article claims to teach, uses a combined simmer time of an hour or more after all ingredients are in. The two-stage timing in the article doesn't match how award-winning chili is actually made.

Then the audit got specific about ingredients. Three more claims came back unsupported:

"1 28-ounce can crushed tomatoes and 1 15-ounce can tomato sauce" — no authoritative recipe source uses that exact tomato split
"3 tablespoons chili powder and 1 tablespoon ground ancho chile" — no published recipe matches that proportion
"1 1/2 cups beef stock and 1 tablespoon Worcestershire sauce" — couldn't be backed against current sources

Read separately, each looks minor. Read together, the article's specific measurements appear to be inventions — assembled to look like a recipe rather than be one. The agency was filling in plausible-looking numbers without grounding them.

I'd been paying them by the article.

The universal failure: E-E-A-T

The factual problems were the most visible findings. The structural problem was everywhere, on every article, on every site.

E-E-A-T — Google's framework for Experience, Expertise, Authoritativeness, and Trustworthiness — was the lowest-scoring check on all three sites by a wide margin:

ChiliStation: 39.8 avg
Onaro: 42.0 avg
day9: 42.0 avg

Every article on every site scored red on this check. Not yellow. Red.

When I dug into the evidence, the same gaps appeared everywhere:

No named author. None of the articles had a byline. They were published as if by the site itself.
No author bio. No "About the author" block. No credentials anywhere on the page.
No first-person experience markers. No "in our testing," "we measured," "I tried." Just claims.
Zero external authoritative citations. Plenty of internal links — ChiliStation averaged 37 internal links per article — but not a single outbound link to a peer-reviewed paper, government source, or recognized industry authority.
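Two of these gaps, the missing byline and the missing outbound links, are simple enough to spot-check yourself. Below is a rough sketch using only Python's standard library. It is not how Revylo runs its check. The class and function names are hypothetical, the author detection only looks for a meta author tag or a rel="author" link, and the example.com page is a stand-in.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageSignals(HTMLParser):
    """Counts internal vs. external links and looks for basic author markup."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc.lower()
        self.internal = 0
        self.external = 0
        self.has_author = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            if (attr.get("name") or "").lower() == "author" and attr.get("content"):
                self.has_author = True
            return
        if tag != "a":
            return
        if "author" in (attr.get("rel") or "").lower().split():
            self.has_author = True
        href = attr.get("href")
        if not href:
            return
        target = urlparse(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # skip mailto:, tel:, javascript:, and similar
        # Exact host match only: subdomains count as external in this sketch.
        if target.netloc.lower() == self.site_host:
            self.internal += 1
        else:
            self.external += 1


def audit_page(html: str, page_url: str) -> dict:
    parser = PageSignals(page_url)
    parser.feed(html)
    return {"internal": parser.internal, "external": parser.external,
            "has_author": parser.has_author}


sample = (
    '<html><head><title>Post</title></head><body>'
    '<a href="/recipes">Recipes</a> <a href="https://example.com/about">About</a>'
    '</body></html>'
)
print(audit_page(sample, "https://example.com/post"))
# {'internal': 2, 'external': 0, 'has_author': False}
```

A page that comes back with dozens of internal links, zero external ones, and no author markup shows the same pattern the audit flagged.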

This is the systemic finding. It doesn't matter whether content is written by humans, AI, or an agency. If the publishing infrastructure doesn't provide author identity, expertise signals, methodology disclosure, and authoritative citations, Google has nothing to grade on.

Most AI content debates focus on the writer: is the article human or machine? The data I have suggests that's the wrong binary. What matters is the publishing system around the content. A human-written article without a byline scores red on E-E-A-T. An AI-written article with a byline, bio, methodology, and citations scores green. The variable that moves the score isn't the writer. It's the infrastructure.

Which means the fix isn't to stop using AI. The fix is to publish like Google can tell the difference — because it can.

What I'm doing about it

I'm rolling out a consistent author template across all six of my properties. Brian Diamond, named, with a bio specific to each site's context — chili experience on ChiliStation, coffee experience on day9, AI consulting credentials on Onaro and BrianOnAI. One author identity, six properties, schema.org Person markup linking them all together via my LinkedIn, my company (LanStatus, an MSP I founded in 2001), and the cross-site footprint. Google can verify all of this.
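As a sketch of what that markup could look like, here is a small Python helper that emits a schema.org Person block as JSON-LD. The property names (name, description, url, sameAs, worksFor, foundingDate) are real schema.org terms. The function name, the bio text, and every example.com URL are placeholders I made up for illustration, not the live values.

```python
import json


def person_jsonld(bio: str, author_page_url: str, same_as: list[str]) -> str:
    """Build a <script> tag holding schema.org Person markup for one site."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": "Brian Diamond",
        "description": bio,       # site-specific bio text
        "url": author_page_url,   # this site's author page
        "sameAs": same_as,        # the same person's profiles elsewhere
        "worksFor": {
            "@type": "Organization",
            "name": "LanStatus",
            "foundingDate": "2001",
        },
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values only; swap in the real bio and profile URLs per site.
snippet = person_jsonld(
    bio="Placeholder bio describing coffee experience.",
    author_page_url="https://example.com/authors/brian-diamond",
    same_as=["https://example.com/linkedin-profile", "https://example.com/other-site-author-page"],
)
print(snippet)
```

Keeping name and sameAs identical across sites while varying only description and url is what ties the six author pages to one identity.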

I'm fixing the factual errors. The simmering technique on ChiliStation gets corrected with an inline citation to authoritative chili sources. The SCA ratio on day9 gets corrected. The degassing windows get reframed as ranges, not points. Every correction comes with a "last updated" date and a brief editor's note about what changed.

I'm going to run Revylo against the same articles again after the fixes deploy. The audit JSON is structured to make before/after comparison trivial. The lift should be measurable: E-E-A-T should move from red to yellow or green on every article. Fact-grounding should move from yellow/red to green on the corrected ones. Overall scores should rise from the 70s to the 80s.
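The comparison itself is only a few lines. This sketch assumes each audit has already been reduced to a plain mapping of check name to score. The real audit JSON layout isn't shown in this post, so treat the shape, the function name, and the numbers below as placeholders rather than Revylo's format or real results.

```python
def compare_audits(before: dict, after: dict) -> list[tuple]:
    """Return (check, before, after, change) rows, biggest improvement first."""
    rows = []
    for check, old in before.items():
        new = after.get(check)
        if old is None or new is None:
            continue  # check missing or unscored in one of the two runs
        rows.append((check, old, new, round(new - old, 1)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Placeholder scores, invented only to exercise the function. Not real results.
before = {"E-E-A-T": 40.0, "Fact Grounding": 60.0, "Brand Voice": None}
after = {"E-E-A-T": 70.0, "Fact Grounding": 85.0, "Brand Voice": None}

for check, old, new, change in compare_audits(before, after):
    print(f"{check}: {old} -> {new} ({change:+})")
```

Unscored checks are skipped rather than treated as zero, so a check with no score in either run doesn't show up as a fake drop or gain.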

I'll publish those numbers — good or bad — in a follow-up post.

Why I'm publishing this before I've fixed anything

The temptation when you build a tool like this is to show off only the perfect cases. Run it on a great article, watch it score 95, ship the screenshot. That's the version of the demo where the founder looks smart.

I think it's the wrong version. Tools that only catch problems in other people's work are easy to ignore. Tools that catch problems in their own creator's work prove they're worth paying attention to.

I built Revylo to find this kind of thing. It found it in me. The numbers above aren't the version where I look good — they're the version where the tool looks honest. That's a trade I'll take, because what I want from the launch isn't to seem credible. It's to be credible.

The follow-up post will have the lift data. The same articles, audited the same way, after the fixes. If the methodology works, the numbers will move. If it doesn't, you'll see that too.

What this means for your content

If you've read this far, you probably publish content somewhere. Maybe a company blog. Maybe a personal site. Maybe a portfolio of properties like mine.

Three things you can do this week, regardless of whether you ever try Revylo:

1. Audit your author template. If your articles don't have named authors with bios, that's the single largest E-E-A-T win available. It's a template change, not a per-article rewrite. One change lifts every article on your site.

2. Fact-check your most important pieces. Pick the five articles that drive your highest organic traffic. Read each claim that looks like a number, a standard, a date, or a specific measurement. Ask whether you could defend it against an authoritative source. The ones you can't defend are the ones to fix.

3. Stop assuming AI content is the problem. It isn't, by itself. AI content without authorship infrastructure is the problem. AI content with named authors, transparent methodology, real citations, and consistent voice can score green on the same audit as human content. The writer isn't the variable. The system is.
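For step 2, a crude filter can build the reading list for you. The sketch below pulls out sentences that contain a digit, a unit, a spelled-out number, or an appeal to authority. The word lists are my own guesses and it over-flags on purpose. It only tells you what to verify, not whether the claim is true.

```python
import re

NUMBER_WORDS = r"two|three|four|five|six|seven|eight|nine|ten|hundred|thousand"
UNITS = r"grams?|ounces?|liters?|cups?|tablespoons?|minutes?|hours?|days?|weeks?|percent"
AUTHORITY = r"standard|specification|consensus|research|study|studies|according to"

# A sentence is "claim-like" if it has any digit or any of the words above.
CLAIM_PATTERN = re.compile(
    rf"\d|\b(?:{NUMBER_WORDS}|{UNITS}|{AUTHORITY})\b",
    re.IGNORECASE,
)


def flag_claims(text: str) -> list[str]:
    """Return the sentences that look like checkable factual claims."""
    # Split after ., !, or ? followed by whitespace, so "1.5" stays in one piece.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CLAIM_PATTERN.search(s)]


sample = (
    "Our readers love chili. "
    "The ratio is one gram of coffee per sixteen grams of water. "
    "Coffee beans release CO2 for about three to four days after roasting. "
    "Brewing is fun."
)
for sentence in flag_claims(sample):
    print("CHECK:", sentence)
```

On the sample text it flags the two middle sentences, which are the kind of claim the audit caught on day9.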

A note on the tool

Revylo is at revylo.app. You can audit one URL free, no signup. The eight checks are documented in our glossary and explained in depth in What Google's Helpful Content Classifier Actually Looks At. For the full audit process — crawl, indexation, content quality, links, and tracking — see How to Do an SEO Audit.

If you want to see the actual ChiliStation, Onaro, and day9 scorecards referenced in this post — the full evidence, all the verdict explanations, the exact source citations — they're available as sample audits from the homepage. For more case studies in the same format — diagnosis, scores, and what the failures teach — see SEO content audit case studies.

The follow-up post comes when the fixes are deployed and the re-audit data is in.

In the meantime, I have some corrections to ship.

This article was audited by Revylo.

Check	Score	Status
Search Intent	70	yellow
Fact Grounding	65	yellow
Helpful Content	87	green
E-E-A-T	72	yellow
Originality	100	green
Internal Linking	87	green
Brand Voice	—	—
Technical SEO	85	green