Jacek Białas

Holds a Master’s degree in Public Finance Administration and is an experienced SEO and SEM specialist with over eight years of professional practice. His expertise includes creating comprehensive digital marketing strategies, conducting SEO audits, managing Google Ads campaigns, content marketing, and technical website optimization. He has successfully supported businesses in Poland and international markets across diverse industries such as finance, technology, medicine, and iGaming.

The resurrection protocol of toxic expired domains

Nov 29, 2025 | SEO

The digital economy is littered with the remnants of abandoned web properties, often referred to in the cybersecurity sector as zombie domains. These are domain names that have expired, been dropped by their original registrants, and subsequently re-registered or hijacked by malicious actors. The primary objective of these actors is to exploit the residual trust, authority, and backlink equity previously accrued by the legitimate domain. By repurposing these assets, attackers inject spam content, ranging from counterfeit merchandise to illicit pharmaceuticals, into the search index. This practice fundamentally alters the semantic identity of the web property.

This report provides an exhaustive technical analysis of the lifecycle of a Zombie Domain, with a specific focus on the Japanese Keyword Hack (JKH). This prevalent form of SEO poisoning creates millions of spam pages containing auto-generated Japanese text, monetized through affiliate links. The analysis extends beyond mere cleanup; it addresses the complex challenge of “Entity Reconciliation” within Google’s Knowledge Graph. When a domain changes hands, the semantic web often retains the memory of the previous entity. Dissociating a new legitimate business from a compromised or reputationally damaged predecessor requires precise manipulation of structured data, strategic signaling to search algorithms, and rigorous server-side hardening.   

Furthermore, we examine the implications of these compromises in the era of Large Language Models (LLMs). As crawlers like GPTBot and CCBot ingest web content to train generative AI, the presence of spam on Zombie Domains introduces “data poisoning.” This report details the protocols for blocking these agents and cleansing the digital footprint to ensure that the domain does not contribute to the degradation of AI training sets.   

Anatomy of the Japanese Keyword Hack (JKH)

The Japanese Keyword Hack is a distinct classification of SEO spam where attackers gain unauthorized access to a web server, typically via vulnerabilities in Content Management Systems (CMS) like WordPress, and inject scripts that generate thousands of pages targeting Japanese search terms. These terms usually relate to “replica brands” or “fake merchandise.”

The infection vector and cloaking mechanisms

The sophistication of JKH lies in its ability to remain undetected by the site administrator while serving malicious content to search engine crawlers. This technique, known as “cloaking,” relies on server-side logic that differentiates requests based on the User-Agent string or IP address.

When a human user or the site administrator navigates to the homepage, the server returns the expected content. However, when a crawler such as Googlebot or Bingbot requests the site, the malicious scripts intercept the request and serve a page filled with Japanese characters and affiliate links. This bifurcation ensures that the hack can persist for weeks or months, allowing the spam pages to be indexed deeply before the site owner receives a manual action notification or observes a collapse in organic traffic.   

The technical implementation often involves modifying the .htaccess file on Apache servers or injecting PHP code into core CMS files such as header.php or index.php. The injected code typically checks the HTTP_USER_AGENT header. If the string matches a known bot, the script dynamically generates HTML content populated with keywords retrieved from a remote command-and-control (C2) server or a local hidden database file.   

File system compromise and directory structure

Forensic examination of infected servers reveals a predictable pattern of file system modification. Attackers frequently create directories with random names in the web root or within public-facing folders like /wp-content/uploads/. Examples include directories such as /ltjmnjp/ or /341/. Inside these directories, one might find scripts that act as doorway page generators.

In many instances, the attackers do not create static HTML files for every spam page. Instead, they use a single PHP script and RewriteRules to map thousands of URL patterns to that script. For example, a request for example.com/japan-store-replica-watch is internally rewritten to spam_generator.php?keyword=replica-watch. This technique allows the attacker to generate an infinite number of pages without exhausting the server’s inode limit, although it places a significant load on the CPU and database.   
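For illustration only, an injected rule of roughly this shape (the filename spam_generator.php is hypothetical, taken from the example above) maps every matching path onto the single generator script:

```apache
RewriteEngine On
# Any /japan-store-* URL is silently served by one PHP generator; no static files exist
RewriteRule ^japan-store-(.+)$ spam_generator.php?keyword=$1 [L,QSA]
```

The [QSA] flag preserves any existing query string, and because the mapping is internal, the spammy URL is what gets indexed.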

Character encoding and database injection

The JKH specifically utilizes Japanese characters (Kanji, Hiragana, and Katakana). This introduces specific challenges regarding character encoding. Malicious tables injected into the database often use utf8mb4 or Shift_JIS encodings to store the spam content. During remediation, standard ASCII grep searches may fail to identify these strings if the encoding is not explicitly handled.

The attackers may also create new user accounts with administrative privileges to maintain persistence. These accounts often mimic legitimate system names, such as wp_update_user or system_admin, to avoid detection during a cursory review of the user list.   

Forensic detection strategies

Detecting the presence of a Zombie Domain infection requires a multi-layered approach involving external scanning, server log analysis, and file system auditing.

External verification via search operators

The most immediate confirmation of a JKH infection comes from the search engine results page (SERP). Using the site: operator allows investigators to see what the search engine has indexed.

Command: site:example.com

If the results include pages with Japanese titles or descriptions on a site that should be English-only, the diagnosis is confirmed. Additionally, searching for common spam terms combined with the domain can reveal the extent of the infection.   

Command: site:example.com "japan" OR "replica" OR "cheap"

Server log analysis

Access logs provide the definitive record of the cloaking mechanism. Investigators should filter logs for requests made by known bot User-Agents and compare the response codes and content lengths to those of regular user traffic.

Indicator of Compromise (IoC) 1 – a discrepancy in response size. If requests from Googlebot for the homepage return a significantly different byte size than requests from Mozilla/5.0 (a standard browser), cloaking is likely occurring.

Indicator of Compromise (IoC) 2 – a high frequency of 404s or 200s on nonsense URLs. A sudden spike in traffic to nonexistent subdirectories, such as /a7b3/, returning 200 OK status codes indicates that the malicious script is successfully generating content for these paths.

Grep Command for Log Analysis:
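The original pipeline was not preserved here; a minimal sketch, assuming a Common Log Format access log (status code in field 9, request path in field 7). The sample log lines below are fabricated for the demo; point the pipeline at your real access log instead.

```shell
# Fabricated Common Log Format sample standing in for the real access log
mkdir -p /tmp/jkh_logs
cat > /tmp/jkh_logs/access.log <<'EOF'
66.249.66.1 - - [29/Nov/2025:10:00:00 +0000] "GET /a7b3/replica-watch HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
66.249.66.1 - - [29/Nov/2025:10:00:05 +0000] "GET /a7b3/replica-watch HTTP/1.1" 200 5120 "-" "Googlebot/2.1"
66.249.66.1 - - [29/Nov/2025:10:00:09 +0000] "GET /ltjmnjp/cheap-bags HTTP/1.1" 200 4096 "-" "Googlebot/2.1"
203.0.113.5 - - [29/Nov/2025:10:01:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
EOF

# Googlebot hits with 200 status -> request path ($7) -> unique counts -> top offenders
grep "Googlebot" /tmp/jkh_logs/access.log \
  | awk '$9 == 200 {print $7}' \
  | sort | uniq -c | sort -rn | head
```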

This command isolates Googlebot requests returning a success status, extracts the requested URL path, counts unique occurrences, and lists the top hits. A high volume of random paths confirms the attack.   

File system auditing with regex

To locate the malicious files, forensic analysts employ grep with regular expressions designed to catch obfuscated code and specific spam signatures.

Pattern 1. Base64 decoding and eval – attackers often obfuscate their PHP code using base64_decode and eval. While legitimate plugins sometimes use these functions, their presence in core files or uploads directories is highly suspicious.

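The exact invocation was not preserved; a minimal sketch, using a throwaway demo directory in place of the real web root (replace /tmp/jkh_p1 with your document root):

```shell
# Throwaway demo tree; point the grep at your real web root instead
mkdir -p /tmp/jkh_p1
printf '<?php eval(base64_decode("ZWNobyAxOw==")); ?>' > /tmp/jkh_p1/infected.php
printf '<?php echo "hello"; ?>' > /tmp/jkh_p1/clean.php

# List PHP files that pair eval with base64_decode, a classic obfuscation signature
grep -rlE 'eval[[:space:]]*\([[:space:]]*base64_decode' /tmp/jkh_p1 --include="*.php"
```

Review each hit manually before deleting; some security and caching plugins legitimately use these functions.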

Pattern 2. Japanese character ranges – searching for the Unicode ranges of Japanese scripts can identify files containing the spam text.

  • Hiragana: 3040-309F
  • Katakana: 30A0-30FF
  • Kanji (Common): 4E00-9FBF

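The exact invocation was not preserved; a minimal sketch covering all three ranges listed above, again with a throwaway demo directory standing in for the real web root. A UTF-8 locale is required for the Unicode escapes to work.

```shell
# Throwaway demo tree; point the grep at your real web root instead
mkdir -p /tmp/jkh_p2
printf '<?php echo "レプリカ時計"; ?>' > /tmp/jkh_p2/doorway.php
printf '<?php echo "hello"; ?>' > /tmp/jkh_p2/clean.php

# -P enables PCRE so the \x{...} Unicode escapes are understood (GNU grep)
LC_ALL=C.UTF-8 grep -rlP '[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FBF}]' \
  /tmp/jkh_p2 --include="*.php"
```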

Note: The -P flag enables Perl-compatible regular expressions (PCRE), which is necessary for handling Unicode hex ranges in many grep implementations.

Pattern 3. Cloaking signatures – searching for code that checks for User-Agents is critical.
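The exact invocation was not preserved; a minimal sketch that surfaces any PHP file referencing the User-Agent header or known crawler names, using a throwaway demo directory in place of the real web root:

```shell
# Throwaway demo tree; point the grep at your real web root instead
mkdir -p /tmp/jkh_p3
printf '<?php if (stripos($_SERVER["HTTP_USER_AGENT"], "googlebot") !== false) { include "doorway.php"; } ?>' \
  > /tmp/jkh_p3/header.php

# Surface any code that branches on the crawler User-Agent
grep -rniE 'HTTP_USER_AGENT|googlebot|bingbot' /tmp/jkh_p3 --include="*.php"
```

Legitimate code rarely needs to branch on Googlebot specifically, so any hit outside an analytics or SEO plugin deserves scrutiny.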

Technical remediation and server hardening

Once the infection is mapped, the remediation phase involves surgical removal of malicious assets and the implementation of strict blocking rules.

Cleaning the file system and database

Step 1. Core file replacement – for CMS platforms like WordPress, the most reliable method to ensure core integrity is to replace all core system files (e.g., wp-admin, wp-includes) with fresh copies downloaded from the official repository.

Step 2. The wp-content audit – the wp-content directory must be audited manually.

  • Plugins/themes – update everything. Reinstall active plugins from the source to overwrite infected files.
  • Uploads – executable files (PHP, Perl, Python) should typically not exist here.

Step 3. Database scrubbing – attackers often inject spam posts directly into the wp_posts table.

  • SQL query – SELECT * FROM wp_posts WHERE post_content REGEXP '[\\x{3040}-\\x{309F}]'; (note: syntax varies by MySQL/MariaDB version; the \\x{...} Unicode escapes require the ICU regex engine shipped with MySQL 8.0+).
  • User Check – review the wp_users table for unrecognized administrators and delete them immediately.

Strategic index cleanup – The 410 Gone protocol

When a resource is removed, the server can return a 404 Not Found or a 410 Gone status code. In the context of a massive spam infection, the distinction is critical.

  • 404 Not Found – tells Google the page is missing for now. The crawler may return later to verify, delaying de-indexing.
  • 410 Gone – explicitly tells the crawler the resource is permanently removed. Google engineers have stated that 410 accelerates the removal of URLs from the index.

For Zombie Domains recovering from JKH, we must configure the server to return 410 Gone for all identified spam patterns to force a rapid purge of the index, while strictly blocking PHP execution in upload folders to prevent reinfection.

Server configuration (Nginx & Apache)

Nginx Configuration

1. Security – block PHP execution in uploads; this prevents uploaded backdoors from running.

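The original snippet was not preserved; a minimal sketch, assuming a standard WordPress layout, to be placed inside the site's server {} block:

```nginx
# Refuse to execute any PHP file under the uploads tree
location ~* ^/wp-content/uploads/.*\.php$ {
    deny all;
}
```

Note that regex locations are evaluated in file order, so this block must appear before the generic PHP handler location.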

2. SEO – force 410 Gone for spam patterns; block requests containing Japanese characters or known spam directories.

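The original snippet was not preserved; a minimal sketch using the example spam directories named earlier in this report (substitute the patterns from your own audit):

```nginx
# 410 Gone for spam paths found in the audit; /a7b3/, /ltjmnjp/ and /341/ are
# the example directories from this report; replace them with your own findings
location ~* ^/(a7b3|ltjmnjp|341)/ {
    return 410;
}
```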

Apache .htaccess configuration

1. Security – disable PHP in uploads by creating a .htaccess file inside /wp-content/uploads/ with the following content:

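The original snippet was not preserved; a minimal sketch for Apache 2.4 (on Apache 2.2, the equivalent is Order allow,deny plus Deny from all):

```apache
# /wp-content/uploads/.htaccess – block execution of PHP in this tree
<FilesMatch "\.ph(p[0-9]?|tml)$">
    Require all denied
</FilesMatch>
```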

2. SEO – force 410 Gone for spam patterns; place this in the root .htaccess file.

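The original snippet was not preserved; a minimal sketch using mod_rewrite's [G] flag, which answers with 410 Gone. The directory names are the examples from this report's audit:

```apache
RewriteEngine On
# Permanently gone: known spam directories (substitute the patterns you found)
RewriteRule ^(a7b3|ltjmnjp|341)/ - [G]
```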

Post-remediation protocol – Accelerating recovery

Cleaning the server is only the first phase. The second phase is “rehabilitation”: convincing search engines that the domain is no longer a threat. Without these steps, a technically clean site may remain penalized or de-indexed for months.

The “Death Sitemap” strategy

Waiting for Googlebot to naturally recrawl millions of spam URLs can take an eternity. To accelerate this, you must force the crawler to visit the spam URLs so it encounters the 410 Gone status codes immediately.

Generate a temporary XML sitemap containing a list of the spam URLs; you can extract these from your access logs or from index statistics in Google Search Console.

Submit this “spam sitemap” to Google Search Console.

Googlebot prioritizes crawling these links, hits the 410 wall, and drops them from the index rapidly.

Once the index count drops, delete this sitemap.
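The temporary sitemap is plain XML in the standard sitemap protocol; a minimal sketch with placeholder spam URLs on the hypothetical example.com:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- every URL listed here now answers 410 Gone -->
  <url><loc>https://example.com/a7b3/replica-watch</loc></url>
  <url><loc>https://example.com/ltjmnjp/cheap-bags</loc></url>
</urlset>
```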

Manual action reconciliation

Check GSC for a “Manual Action” notification (e.g., “Hacked Site”).

  • If present – you must file a reconsideration request. Be concise and technical. Explicitly state: “We have removed the malware, updated the CMS, rotated all credentials, and implemented 410 headers for the spam resources.”
  • If absent – the penalty is likely algorithmic. No reconsideration request is needed; the focus should be on the technical signals (410s) described above.

Credential rotation and session termination

Attackers often leave “keys” behind. It is imperative to invalidate all access points.

  • Database – change the database user password in your hosting panel and update wp-config.php.
  • Force logout – for WordPress, update the authentication keys and salts in wp-config.php. This immediately invalidates all active login cookies, forcing every user (including potential attackers with stolen sessions) to log in again.
  • User audit – verify that no hidden administrative accounts exist.

Perimeter Defense (WAF)

The bots that infected the site will likely return to check if the backdoor has been reopened.

  • Implement a Web Application Firewall (WAF) like Cloudflare or a host-based solution like Wordfence.
  • Configure the WAF to block aggressive crawling from unknown user agents to reduce server load during the recovery phase.