
Jacek Białas

Holds a Master’s degree in Public Finance Administration and is an experienced SEO and SEM specialist with over eight years of professional practice. His expertise includes creating comprehensive digital marketing strategies, conducting SEO audits, managing Google Ads campaigns, content marketing, and technical website optimization. He has successfully supported businesses in Poland and international markets across diverse industries such as finance, technology, medicine, and iGaming.

How one robots.txt mistake cost us $47,000 monthly

Sep 17, 2025 | SEO

It was 10:30 AM when my boss walked into the office looking like someone had stolen his coffee. “Organic traffic dropped 74% this month. What the hell happened?” Based on his expression, I had about 24 hours to find the answer.

This was one of those moments where you feel your blood pressure spike. Our e-commerce site was generating around 185,000 organic sessions monthly, bringing in roughly $79,000 in revenue. Now I was staring at Google Analytics showing just 47,000 sessions over the past 30 days.

The math was brutal: we were bleeding $47,000 monthly.

First diagnosis – when everything looks “normal”

I started with the SEO checklist every professional knows by heart:

  • Google Search Console – no critical errors showing
  • Keyword rankings – stable in SEMrush
  • Competition – no major moves detected
  • Site changes – developer swore nothing was touched

Everything appeared normal, but the numbers told a different story. It’s like searching for a needle in a haystack when you’re not even sure there’s a needle.
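Had I wanted a harder check than eyeballing the GSC interface, pulling daily clicks from the Search Console API makes the exact start of a drop hard to miss. A minimal sketch, assuming a service-account key that already has read access to the property (the key file name, dates, and site URL are placeholders):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholders: a service-account key with Search Console read access
# and the verified property URL.
KEY_FILE = "service-account.json"
SITE_URL = "https://www.example.com/"

creds = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
)
gsc = build("searchconsole", "v1", credentials=creds)

# Daily clicks and impressions across the window of the suspected drop.
report = gsc.searchanalytics().query(
    siteUrl=SITE_URL,
    body={"startDate": "2025-08-15", "endDate": "2025-09-15", "dimensions": ["date"]},
).execute()

for row in report.get("rows", []):
    print(row["keys"][0], int(row["clicks"]), int(row["impressions"]))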

Screaming Frog as the digital detective

I fired up a comprehensive crawl in Screaming Frog – 52,000 pages with all APIs connected. The setup looked standard:

  • JavaScript rendering enabled
  • Google Search Console integrated with 3 months of performance data
  • PageSpeed Insights API configured for mobile analysis
  • Unlimited crawl depth for complete site mapping

After two hours of crawling, I exported the data and began my analysis. At first glance, everything seemed fine – 200 status codes, meta tags in place, logical URL structure.

The eureka moment – the detail that changed everything

While reviewing the Response Codes tab, I noticed something bizarre. Pages under /products/* were showing 200 status codes, but in the “Crawl Depth” column I saw “Not Found”. This made no sense – how could a page be accessible to Screaming Frog but unreachable by bots?

I checked the “Blocked by Robots.txt” tab. Here’s what I found:

User-agent: *
Disallow: /products/
Disallow: /categories/  
Disallow: /blog/

My heart stopped. Our entire product section was blocked from Google.
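Before touching anything else, it's worth confirming a suspected block outside of any crawler UI. Here's a minimal sketch using Python's standard-library robots.txt parser; the domain and paths are placeholders, and Googlebot is tested explicitly alongside the wildcard agent:

from urllib.robotparser import RobotFileParser

# Placeholder site and money pages - swap in your own.
SITE = "https://www.example.com"
PATHS = ["/products/blue-widget", "/categories/widgets", "/blog/widget-guide"]

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()  # fetches and parses the live robots.txt

for path in PATHS:
    url = f"{SITE}{path}"
    for agent in ("Googlebot", "*"):
        verdict = "ALLOWED" if rp.can_fetch(agent, url) else "BLOCKED"
        print(f"{agent:10} {verdict:8} {url}")

With the directives above, every /products/, /categories/, and /blog/ URL would come back BLOCKED for both agents.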

How did this happen? Anatomy of a disaster

It turned out that a month earlier, our developer had been testing a new version of the site under test directories like /test-products/, /test-categories/ and /test-blog/. When the tests were finished, instead of removing the blocks for those test folders, he accidentally copy-pasted the wrong directives and blocked the main site sections instead.
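The file was only ever meant to block the test copies, so roughly this:

User-agent: *
Disallow: /test-products/
Disallow: /test-categories/
Disallow: /test-blog/

Dropping the test- prefix turned a harmless staging block into the site-wide block shown earlier.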

One copy-paste mistake = $47,000 monthly losses.

The recovery process step by step

Day 1. Immediate action

  • Fixed robots.txt within 30 minutes
  • Resubmitted sitemap through Google Search Console
  • Requested recrawling of 50 most important product pages
  • Set up real-time monitoring through GSC

Week 1. First signs of recovery
Screaming Frog showed pages were being crawled properly again. In GSC, the first signs of reindexing appeared – some product pages returned to the index.

Month 1. Partial recovery
Organic traffic climbed to 97,000 sessions (52% of original). Keyword positions began stabilizing, but some pages still hadn’t recovered their full rankings.

What this experience taught me

Always verify robots.txt in full site context

Standard robots.txt validation in GSC isn’t enough. Screaming Frog shows the complete picture – exactly which pages are blocked and how it affects crawl budget. Use filters in the Response Codes tab to spot discrepancies between accessibility and crawlability.

Integrate data from multiple sources

If I had relied only on GSC, I probably wouldn’t have found the problem for weeks. Only the combination of Screaming Frog data (crawl structure) + GSC (indexing history) + Analytics (traffic drop) + PageSpeed API (no performance issues) painted the full picture.

Monitor changes to critical files

We now run a change-detection job in Jenkins that sends an alert whenever someone modifies robots.txt, .htaccess, or sitemap.xml. Cost of this system? Two hours of developer time. Cost of not having it? $47,000 monthly.
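We happen to run it from Jenkins, but the core is a few lines of Python on a schedule. A minimal standalone sketch of the same idea, assuming a Slack incoming-webhook URL and a small JSON file for the last known hashes (the URLs and file names are placeholders; .htaccess isn't fetchable over HTTP, so it has to be watched in the repository instead):

import hashlib
import json
import pathlib
import urllib.request

SITE = "https://www.example.com"                                 # placeholder
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"   # placeholder
WATCHED = ["/robots.txt", "/sitemap.xml"]
STATE_FILE = pathlib.Path("critical_file_hashes.json")

state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

for path in WATCHED:
    body = urllib.request.urlopen(SITE + path, timeout=10).read()
    digest = hashlib.sha256(body).hexdigest()
    if path in state and state[path] != digest:
        # Hash changed since the last run - ping the team before Google recrawls.
        alert = {"text": f"Change detected: {SITE}{path} was modified."}
        urllib.request.urlopen(
            urllib.request.Request(
                SLACK_WEBHOOK,
                data=json.dumps(alert).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            ),
            timeout=10,
        )
    state[path] = digest

STATE_FILE.write_text(json.dumps(state))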

Checklist of errors GSC won’t catch

  1. Robots.txt conflicts vs page accessibility
  2. Hidden canonicalization issues
  3. JavaScript rendering problems

My current audit configuration in Screaming Frog

API integrations:

  • GSC – 6 months data, mobile/desktop segmentation
  • PageSpeed – mobile-first, all Core Web Vitals metrics
  • Analytics – organic traffic + conversions, last-click attribution
  • Ahrefs API – backlink data for prioritizing fixes

Custom filters:

  • High-traffic pages with serious technical issues (cross-referencing GSC and Response Codes data)
  • Product URLs missing schema markup
  • Thin blog content (<300 words) that still earns high impressions in GSC

What happened next?

After 3 months, traffic returned to 179,000 monthly sessions – 97% of pre-disaster levels. Some long-tail keywords never fully recovered their positions, permanently costing us about $2,800 in monthly revenue.

Lesson learned: Google doesn’t forget. Even if you fix errors quickly, some consequences can be permanent.

But most importantly, we gained an early warning system. Now when anything changes in critical SEO files, our entire team gets a Slack notification within 5 minutes.

Your homework for today

Check your robots.txt. Not in GSC’s Robots Testing Tool, but in reality:

  1. Run a crawl in Screaming Frog with “Follow Robots.txt” enabled
  2. Export the “Blocked by Robots.txt” tab
  3. Cross-reference with traffic data from Analytics
  4. If you find traffic-generating pages that are blocked – you have a problem
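Steps 2 and 3 are easy to automate once both exports are on disk. A minimal sketch with pandas, assuming a Screaming Frog export called blocked_by_robots.csv with an "Address" column and an Analytics landing-page export called organic_landing_pages.csv with "Landing Page" and "Sessions" columns (the file and column names are assumptions, so match them to your own exports):

import pandas as pd
from urllib.parse import urlparse

# Assumed export names and columns - adjust to your own files.
blocked = pd.read_csv("blocked_by_robots.csv")       # Screaming Frog export
landing = pd.read_csv("organic_landing_pages.csv")   # Analytics export

# Normalise both sources to a plain URL path so they can be joined.
blocked["path"] = blocked["Address"].map(lambda u: urlparse(str(u)).path)
landing["path"] = landing["Landing Page"].map(lambda u: urlparse(str(u)).path)

# Pages that earn organic sessions but are disallowed in robots.txt.
at_risk = landing.merge(blocked[["path"]].drop_duplicates(), on="path", how="inner")
at_risk = at_risk.sort_values("Sessions", ascending=False)

print(f"{len(at_risk)} traffic-generating pages are blocked by robots.txt")
print(at_risk[["path", "Sessions"]].head(20).to_string(index=False))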

One mistake can cost a fortune. One good audit can save it.

How much is your website worth? And what would a month-long outage due to a robots.txt error cost you?

Sometimes the most expensive lessons are the most valuable. This one cost $47,000 but taught me systematic SEO auditing approaches that have since saved several other projects from similar disasters.
