
Jacek Białas

Holds a Master’s degree in Public Finance Administration and is an experienced SEO and SEM specialist with over eight years of professional practice. His expertise includes creating comprehensive digital marketing strategies, conducting SEO audits, managing Google Ads campaigns, content marketing, and technical website optimization. He has successfully supported businesses in Poland and international markets across diverse industries such as finance, technology, medicine, and iGaming.

Server log analysis for SEO

Sep 26, 2025 | SEO

The reports in Google Search Console are useful, but they only show Google’s interpretation of the data. If you want the uncensored truth about how Googlebot (and other bots) sees and crawls your site, you have to go to the source: raw server logs. This is the only place you’ll find every single bot request, every server response it received, and every wasted byte of your crawl budget.

This 5-step guide will show you exactly how to conduct such an analysis to make SEO decisions based on hard data, not guesswork.

Step 1. Get access to raw server logs

Server logs are text files that record every single request made to the server. Before you can begin your analysis, you need to get these files.

  • Where to find them? The location depends on your server configuration, but on Apache and Nginx servers the most common paths are /var/log/apache2/access.log and /var/log/nginx/access.log. On shared hosting, look for a “Raw Access Logs” section in your hosting panel.
  • How to download them? Over SSH/SCP or FTP, or directly from your hosting panel’s log download feature.

You need the raw access logs (access.log), not the error logs (error.log). These files can be very large (several gigabytes for popular sites), so ensure you have enough disk space.
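Before committing to a multi-gigabyte download and analysis, it helps to peek at the first few lines and confirm you actually have an access log in the expected format. A small sketch (the filename access.log is just a placeholder):

```python
from pathlib import Path

def peek(path, n=5):
    """Return the first n lines of a (possibly huge) log file
    without loading the whole file into memory."""
    lines = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for _ in range(n):
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip("\n"))
    return lines

# Example: confirm the file looks like an access log, not an error log.
if Path("access.log").exists():
    for line in peek("access.log"):
        print(line)
```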

Step 2. Choose and configure an analysis tool

Manually reading millions of lines in a text file is impossible. You need specialized software to process this data and present it in a readable format.

  • Screaming Frog SEO Log File Analyser (paid, with a free version) – the industry standard. It’s relatively inexpensive and incredibly powerful; the free version allows you to analyze up to 1,000 log lines, which is enough to get familiar with the tool.
  • Other options: open-source analyzers such as GoAccess, a log pipeline like the ELK stack (Elasticsearch, Logstash, Kibana), or a custom script – for example in Python, as shown below.

Example script in Python

Here’s the Python code. You can save it as a .py file, e.g. seo_log_analyzer.py.

How to use the script

Prerequisites:

  • Python – ensure Python 3.x is installed on your system.
  • Log file – obtain your access.log file(s) from your web server (Apache, Nginx). You can usually find these in /var/log/apache2/ or /var/log/nginx/ via SSH/FTP, or download them from your hosting panel’s “Raw Access Logs” section.
  • Placement – place your access.log file(s) in the same directory as your Python script, or update the LOG_FILE_PATH variable to point to its exact location.

Save the Script – save the code above as a Python file (e.g., seo_log_analyzer.py).

Prepare your log file:

  • If your log file is compressed (e.g., access.log.gz), you’ll need to decompress it first. You can use tools like gunzip on Linux/macOS or 7-Zip on Windows.
  • If you have multiple log files (e.g., access.log.1, access.log.2), you can concatenate them into one large file for a comprehensive analysis using cat access.log.* > combined_access.log (Linux/macOS) or manually combine them. Then, update LOG_FILE_PATH to point to this combined file.
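If you prefer to do this in Python (handy on Windows, where cat and gunzip aren’t available by default), a small helper can merge plain and gzipped rotated logs in one pass; the file names below are just examples:

```python
import glob
import gzip
import shutil

def combine_logs(pattern, out_path):
    """Concatenate plain and gzip-compressed rotated logs
    (e.g. access.log.1, access.log.2.gz) into one file."""
    with open(out_path, 'wb') as out:
        for name in sorted(glob.glob(pattern)):
            opener = gzip.open if name.endswith('.gz') else open
            with opener(name, 'rb') as f:
                shutil.copyfileobj(f, out)

# Example: combine_logs('access.log.*', 'combined_access.log')
```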

Configure LOG_FILE_PATH:

  • Open seo_log_analyzer.py in a text editor.
  • Locate the LOG_FILE_PATH variable.
  • Change 'access.log' to the actual name/path of your log file. For example: LOG_FILE_PATH = 'my_website_access_logs.log' or LOG_FILE_PATH = '/path/to/your/logs/access.log'.

Remove the dummy data block:

  • The if __name__ == "__main__": block at the bottom of the script contains dummy-data creation for testing. When analyzing your real logs, you must remove or comment out the lines that create dummy_log_data and write it to LOG_FILE_PATH. Keep only the analysis_results = analyze_log_file(LOG_FILE_PATH) and print_report(analysis_results) lines.
  • Also remove the os.remove(LOG_FILE_PATH) cleanup at the end when working with your actual log files: you don’t want to delete them.

Run the script:

  • Open your terminal or command prompt.
  • Navigate to the directory where you saved seo_log_analyzer.py and your log file.
  • Execute the script using: python seo_log_analyzer.py

Interpreting the report & taking action

The script will output a detailed report directly to your console, similar to this:

Starting analysis of file: access.log...
Found 8 potential Googlebot requests. Starting IP verification (this may take some time for large files)...
Successfully verified 7 requests from authentic Googlebot IPs.

========================================
        SEO LOG ANALYSIS REPORT        
========================================
Total 'Googlebot' User-Agent hits: 8
Number of **verified** Googlebot hits: 7
----------------------------------------

## Googlebot Response Status Codes Breakdown:
  - Code 200: 5 times
  - Code 404: 1 times
  - Code 503: 1 times

Status code category summary:
  - Category 2xx: 5 times
  - Category 4xx: 1 times
  - Category 5xx: 1 times
----------------------------------------

## TOP 20 Most Crawled URLs by Googlebot:
 1. /home-page (2 times)
 2. /products/new-model (2 times)
 3. /old-article (1 times)
 4. /assets/style.css (1 times)
 5. /api/data (1 times)
----------------------------------------

Actionable Insights:
  - High 4xx/5xx codes suggest broken internal links or server issues. Investigate these URLs.
  - If Googlebot frequently crawls low-value URLs (e.g., filtered results, old content), consider using robots.txt or canonical tags to manage crawl budget.
  - Pages with low crawl frequency but high importance might need improved internal linking or sitemap updates.
========================================
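To quantify crawl waste from the “Most Crawled URLs” list, you can bucket each URL into a rough category. The extension list and rules below are illustrative assumptions; adjust them to your own site structure:

```python
from urllib.parse import urlparse

# Assumption: these extensions and path prefixes mark low-value crawl targets.
STATIC_EXTENSIONS = ('.css', '.js', '.png', '.jpg', '.gif', '.svg', '.woff2')

def classify_url(url):
    """Rough crawl-waste bucket for a crawled URL path."""
    parsed = urlparse(url)
    if parsed.path.lower().endswith(STATIC_EXTENSIONS):
        return 'static asset'
    if parsed.query:
        return 'parameterized'
    if parsed.path.startswith('/api/'):
        return 'API endpoint'
    return 'content page'

crawled = ['/home-page', '/products/new-model', '/old-article',
           '/assets/style.css', '/api/data']
for url in crawled:
    print(f'{url}: {classify_url(url)}')
```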

Configuration in Screaming Frog SEO Log File Analyser:

  1. Launch the program and create a new project.
  2. Drag and drop your .log or .gz file(s) into the program window.
  3. The tool will automatically start processing the data. In the “User Agents” tab, you’ll see a list of all bots that have visited your site.

Step 3. Identify and verify Googlebot’s activity

Not every request with a “Googlebot” User-Agent actually comes from Google. Malicious bots often spoof their user agent to bypass security measures. That’s why verification is critical.

In the Screaming Frog Log File Analyser, navigate to the “Bots” tab. The tool automatically performs a reverse DNS lookup to verify if the request’s IP address truly belongs to Google. You will see a breakdown of “Verified Bots” and “Spoofed Bots.” For your analysis, only consider the verified bots.

Step 4. Analyze key data – what is Googlebot actually doing?

This is the core of the entire process. You must now interpret the data to understand the bot’s behavior. Focus on the following reports and metrics:

  • Most Crawled URLs (URLs -> All URLs): check whether the bot spends its time on your most important pages or on low-value URLs.
  • Server Response Codes (Response Codes): the share of 200 responses versus 3xx redirects, 4xx errors, and 5xx server errors.
  • Crawl Waste: requests spent on parameterized URLs, static assets, or thin pages that don’t deserve crawl budget.
  • Crawl Frequency (Events): how often Googlebot returns to key sections over time; a sudden drop can signal a problem.
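Crawl frequency can also be computed straight from the raw logs: group Googlebot hits by day and watch the trend. A sketch, assuming common/combined log format dates like 26/Sep/2025 (the sample lines are made up for illustration):

```python
import re
from collections import Counter

def crawl_frequency(log_lines):
    """Count Googlebot User-Agent hits per day from raw access-log lines."""
    per_day = Counter()
    date_re = re.compile(r'\[(\d{2}/\w{3}/\d{4})')
    for line in log_lines:
        if 'Googlebot' not in line:
            continue
        match = date_re.search(line)
        if match:
            per_day[match.group(1)] += 1
    return per_day

sample = [
    '66.249.66.1 - - [25/Sep/2025:09:00:00 +0000] "GET /a HTTP/1.1" 200 1 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [26/Sep/2025:10:00:00 +0000] "GET /b HTTP/1.1" 200 1 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [26/Sep/2025:11:00:00 +0000] "GET /c HTTP/1.1" 200 1 "-" "Mozilla/5.0"',
]
for day, hits in sorted(crawl_frequency(sample).items()):
    print(day, hits)
```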

Step 5. Take concrete actions based on your analysis

The analysis itself is worthless without implementation. Here is a table of common problems and their solutions:

Problem identified in logs → specific action to take:

  • Googlebot frequently hits pages that return a 404 error. → 1. Identify these URLs. 2. If they have valuable replacements, set up 301 redirects. 3. Fix the internal links that point to these broken pages.
  • The bot wastes time on URLs with parameters (e.g., ?sort=price). → 1. Block these parameters in your robots.txt file using the Disallow directive. 2. Use the rel="canonical" tag to point to the “clean” version of the URL.
  • The most important business pages are rarely crawled. → 1. Increase the number of internal links pointing to these pages. 2. Ensure they are in your sitemap.xml with a high priority.
  • 5xx server errors appear in the logs. → 1. Immediately contact your server administrator or hosting company. 2. Analyze the error.log files to diagnose the cause.
  • The bot is crawling non-canonical versions of pages (e.g., with and without www). → 1. Implement server-level 301 redirects to force a single, preferred version of your domain.
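As an illustration of the parameter-blocking action above, the robots.txt rules might look like this (the sort parameter is just an example; Google and most major crawlers support the * wildcard in Disallow):

```
User-agent: *
# Block sort/filter parameters that waste crawl budget
Disallow: /*?sort=
Disallow: /*&sort=
```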

Server log analysis is one of the most powerful techniques in a technical SEO’s toolkit. It allows you to stop guessing and start acting based on hard, undeniable evidence. It shows you where you’re losing money, where Google is encountering problems, and which elements of your site require immediate attention. Dedicate one day to this analysis, and you’ll get a concrete to-do list that will yield far better results than months of “creative” marketing.
