
Understanding Anti-Bot Protection: What Works in 2026

5 min read by Richard Feng

Anti-bot protection is an arms race. I build production scrapers that contend with these systems daily, so here’s a practitioner’s view of the landscape: what the protections actually check, and what legitimate bypass techniques look like.

The Detection Layers

Modern anti-bot systems operate in layers. Understanding these layers is the key to reliable bypass:

Layer 1: IP Reputation

The simplest check. Anti-bot services maintain databases of known datacenter IP ranges, VPN exits, and previously flagged IPs.

What they check:

  • Is this IP from AWS, GCP, Azure, or a known hosting provider?
  • Has this IP been flagged for bot activity before?
  • How many requests have come from this IP recently?

Counter-approach: Residential proxies from services like Apify Proxy or Bright Data provide IP addresses that belong to real ISPs, making them indistinguishable from regular users at the IP level.
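Wiring a residential pool into a client is mostly plumbing. Here’s a minimal sketch of per-request proxy rotation in the style of Python `requests`; the hostnames and credentials are placeholders, not real provider endpoints:

```python
import random

# Hypothetical residential proxy pool -- URLs are placeholders,
# not real provider endpoints.
PROXY_POOL = [
    "http://user:pass@residential-1.example.com:8000",
    "http://user:pass@residential-2.example.com:8000",
    "http://user:pass@residential-3.example.com:8000",
]

def pick_proxies() -> dict:
    """Return a requests-style proxies mapping using a random pool member."""
    proxy = random.choice(PROXY_POOL)
    return {"http": proxy, "https": proxy}

# Usage with requests (not executed here):
#   requests.get(url, proxies=pick_proxies(), timeout=30)
```

Managed services like Apify Proxy handle the rotation for you; the point is that each session should stick to one exit IP rather than hopping mid-session.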

Layer 2: TLS Fingerprinting

This is where it gets interesting. Every HTTP client has a unique TLS handshake signature based on:

  • Supported cipher suites and their order
  • TLS extensions and their order
  • Supported TLS versions
  • ALPN protocols

A default axios or Python requests client has a TLS fingerprint that screams “bot” because it matches no real browser. Services like Akamai and Cloudflare maintain fingerprint databases covering every major browser version.

Counter-approach: Libraries like got-scraping (which my Shopify scraper uses) and specialized TLS clients can mimic browser-grade TLS fingerprints. My Sephora EU scraper uses browser-grade TLS fingerprinting to bypass Akamai WAF.
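To make the ordering point concrete, here’s a sketch of a JA3-style digest, the de facto format for TLS fingerprints: the handshake fields are joined in order and MD5-hashed, so even reordering two cipher suites produces a different fingerprint. The numeric values below are illustrative, not a real browser’s ClientHello:

```python
import hashlib

def ja3_digest(tls_version: int, ciphers: list, extensions: list,
               curves: list, point_formats: list) -> str:
    """JA3-style hash: fields joined with commas, values with dashes,
    then MD5 -- so field *order* is part of the fingerprint."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Same cipher suites, different order -> different fingerprint.
fp_a = ja3_digest(771, [4865, 4866], [0, 23, 65281], [29, 23], [0])
fp_b = ja3_digest(771, [4866, 4865], [0, 23, 65281], [29, 23], [0])
```

This is why naive header spoofing fails: the fingerprint is computed from the handshake itself, before a single HTTP header is sent.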

Layer 3: HTTP/2 Fingerprinting

Beyond TLS, HTTP/2 settings reveal the client type:

  • SETTINGS frame parameters (header table size, max concurrent streams)
  • WINDOW_UPDATE frame values
  • Priority tree structure
  • Header compression (HPACK) patterns

Each browser has characteristic HTTP/2 settings. Chrome, Firefox, and Safari all look different at this level.
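A detection system can classify clients by matching observed SETTINGS against known browser profiles. The values below are approximate and version-dependent (treat them as illustrative, not authoritative), but the matching logic is the real idea:

```python
# Approximate SETTINGS values observed for desktop browsers; exact
# numbers vary by version -- illustrative, not authoritative.
KNOWN_PROFILES = {
    "chrome":  {"HEADER_TABLE_SIZE": 65536, "ENABLE_PUSH": 0,
                "INITIAL_WINDOW_SIZE": 6291456,
                "MAX_HEADER_LIST_SIZE": 262144},
    "firefox": {"HEADER_TABLE_SIZE": 65536,
                "INITIAL_WINDOW_SIZE": 131072,
                "MAX_FRAME_SIZE": 16384},
}

def classify_client(settings: dict) -> str:
    """Return the first known profile whose SETTINGS all match, else 'unknown'."""
    for name, profile in KNOWN_PROFILES.items():
        if all(settings.get(k) == v for k, v in profile.items()):
            return name
    return "unknown"
```

A client whose TLS says “Chrome” but whose SETTINGS frame matches no Chrome profile gets flagged, which is why the HTTP/2 layer must be spoofed together with TLS, not separately.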

Layer 4: JavaScript Challenges

Cloudflare’s “checking your browser” page and similar challenges execute JavaScript that:

  • Checks for browser APIs (canvas, WebGL, AudioContext)
  • Measures execution timing
  • Validates DOM properties
  • Sends challenge responses back to the server

Counter-approach: Headless browsers (Playwright, Puppeteer) execute these challenges natively. The key is ensuring your headless browser doesn’t leak automation signals (more on this below).
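A common way to plug the most obvious leaks is an init script injected before any page code runs (Playwright’s `add_init_script`, Puppeteer’s `evaluateOnNewDocument`). This is a minimal sketch covering three well-known signals; real anti-bot systems check far more, and the context options must agree with what the script fakes:

```python
# Minimal stealth init script in the style of what Playwright's
# add_init_script receives. Real systems check far more signals.
STEALTH_INIT_SCRIPT = """
// Hide the standard automation flag.
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
// Headless Chrome ships with an empty plugin list; fake a non-empty one.
Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3] });
// Align languages with the Accept-Language header being sent.
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
"""

def context_options(user_agent: str) -> dict:
    """Browser-context options to pass alongside the init script."""
    return {
        "user_agent": user_agent,
        "locale": "en-US",          # must match navigator.languages above
        "viewport": {"width": 1366, "height": 768},
    }

# With Playwright (not executed here):
#   context = browser.new_context(**context_options(ua))
#   context.add_init_script(STEALTH_INIT_SCRIPT)
```

Note the consistency requirement: if the script reports `en-US` but the `Accept-Language` header says something else, you’ve traded one detection signal for another.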

Layer 5: Behavioral Analysis

The most sophisticated layer. These systems analyze:

  • Mouse movement patterns (too linear = bot)
  • Scroll behavior (instant scroll to bottom = bot)
  • Time between actions (too consistent = bot)
  • Navigation patterns (going directly to product pages without browsing = suspicious)
  • Request cadence (perfectly uniform intervals = bot)

Protection Profiles: Know What You’re Facing

Cloudflare

Common on: Small to medium sites, blogs, APIs

Cloudflare offers several protection levels:

  • Basic — IP reputation + rate limiting. Datacenter proxies with rate respect usually work.
  • Managed Challenge — JavaScript challenge + Turnstile. Needs a browser or a challenge solver.
  • Enterprise/Bot Management — Full behavioral analysis + fingerprinting. Needs residential proxy + proper fingerprinting.

Akamai Bot Manager

Common on: Enterprise e-commerce (Sephora EU, major retailers)

Akamai is one of the toughest to bypass because of:

  • Aggressive TLS fingerprinting
  • Sensor data collection via client-side JavaScript
  • Session-level behavioral analysis
  • Cookie integrity verification

My approach for Akamai: browser-grade TLS fingerprinting + guest token management + request pacing that mimics human browsing.

Datadome

Common on: E-commerce, ticketing

Datadome focuses on:

  • Device fingerprinting via JavaScript
  • CAPTCHA challenges for suspicious traffic
  • Real-time behavioral scoring

PerimeterX (now HUMAN)

Common on: Retail, financial services

Known for aggressive JavaScript challenges and behavioral analysis.

Legitimate Bypass Architecture

For production systems that need reliable, ongoing data extraction, here’s the architecture pattern I use:

1. API-First Approach

Before attempting to bypass any protection, check if there’s an API path that avoids the WAF entirely. Many protections only apply to browser-facing endpoints, not API routes.

My Sephora scraper converts every web URL to an API call. The API endpoints have lighter protection than the website because they’re designed for mobile apps.
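The conversion itself is usually a deterministic URL rewrite. Here’s a sketch of the pattern; the path structure and API host are invented for illustration (real mappings come from inspecting the site’s mobile-app traffic), not Sephora’s actual endpoints:

```python
from urllib.parse import urlparse

def web_url_to_api(url: str) -> str:
    """Map a product page URL to a hypothetical mobile-API endpoint.
    Path pattern and API host are invented for illustration."""
    path = urlparse(url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1]         # e.g. 'some-product-P12345'
    product_id = slug.rsplit("-", 1)[-1]   # trailing token as product ID
    return f"https://api.example.com/v2/products/{product_id}?format=json"
```

Once you have this mapping, the scraper never renders a page at all: it extracts the product ID from the public URL and hits the lighter-protected JSON endpoint directly.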

2. Session Warming

Don’t jump straight to the data page. Build a realistic browsing session:

Visit homepage → Browse categories → View product listing → Access product detail

Each step builds session credibility. The anti-bot system sees a pattern that matches real user behavior.
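In code, warming is just an ordered URL plan executed with one session. A small sketch (URLs illustrative; each step must reuse the same cookies, fingerprint, and exit IP):

```python
def warming_plan(base: str, category: str, product_path: str) -> list:
    """Ordered warm-up URLs; fetch each with the SAME session
    (cookies, fingerprint, IP) before the data request. Paths illustrative."""
    return [
        base,                                # 1. homepage
        f"{base}/categories",                # 2. browse categories
        f"{base}/categories/{category}",     # 3. product listing
        f"{base}{product_path}",             # 4. product detail (the target)
    ]
```

The plan is cheap insurance: three extra requests per session in exchange for a much lower challenge rate on the request that matters.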

3. Fingerprint Consistency

This is critical: your fingerprint must be internally consistent. If your TLS says “Chrome 120” but your User-Agent says “Chrome 118”, that’s a detection signal.

Align:

  • TLS fingerprint
  • HTTP/2 settings
  • User-Agent header
  • Accept-Language and other headers
  • JavaScript browser properties (if using headless)

4. Request Pacing

Real humans don’t make requests at precisely 1-second intervals. Introduce realistic variance:

  • Base delay between requests (2-5 seconds)
  • Random jitter (+/- 30%)
  • Longer pauses after navigation events
  • Occasional “idle” periods
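The list above translates into a few lines of code. A sketch of a delay generator with the stated base, jitter, navigation pause, and occasional idle period (the specific ranges are my defaults, not magic numbers):

```python
import random

def next_delay(base: float = 3.5, jitter: float = 0.30,
               after_navigation: bool = False) -> float:
    """Delay with +/-30% jitter around a 2-5s base; navigation events
    get a longer pause, and ~5% of calls insert an 'idle' period."""
    delay = base * random.uniform(1 - jitter, 1 + jitter)
    if after_navigation:
        delay += random.uniform(2.0, 6.0)
    if random.random() < 0.05:          # occasional idle period
        delay += random.uniform(15.0, 45.0)
    return delay

# In the request loop: time.sleep(next_delay())
```

The idle periods matter more than they look: a scraper that never pauses for a minute is trivially distinguishable from a human who stops to read a page.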

5. Graceful Degradation

When you encounter a challenge or block:

  1. Don’t immediately retry — this confirms bot behavior
  2. Back off exponentially
  3. Rotate to a fresh session (new IP + new cookies)
  4. Try a different proxy region
  5. If persistent, switch to a browser-based approach
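Steps 2 and 3 above can be sketched as a backoff schedule plus a session-rotation helper (the session fields and region names here are illustrative):

```python
import random

def backoff_schedule(attempts: int, base: float = 10.0,
                     cap: float = 600.0) -> list:
    """Exponential backoff delays with jitter, capped; one entry per retry."""
    return [min(cap, base * (2 ** i)) * random.uniform(0.8, 1.2)
            for i in range(attempts)]

def rotate_session(session: dict, proxy_regions: list) -> dict:
    """Fresh session state: new cookies, and a proxy region different
    from the one that got blocked. Fields are illustrative."""
    candidates = [r for r in proxy_regions if r != session.get("region")]
    return {"cookies": {}, "region": random.choice(candidates or proxy_regions)}
```

The key design choice is that a block invalidates the whole session, not just the request: retrying the same cookies from the same IP only confirms to the anti-bot system that its verdict was right.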

What Doesn’t Work (Anymore)

  • Just changing User-Agent — detection systems check dozens of signals, not just one header
  • Random delays alone — without proper fingerprinting, timing doesn’t help
  • Headless Chrome with default settings — automation signals leak everywhere (navigator.webdriver, missing plugins, Chrome DevTools Protocol artifacts)
  • Cookie replay — modern systems tie cookies to TLS fingerprints and IP ranges

Ethical Considerations

Anti-bot bypass is a tool. Like any tool, it can be used responsibly or irresponsibly.

Legitimate use cases:

  • Price comparison for consumer benefit
  • Market research with public data
  • Accessibility (making data available in structured formats)
  • Academic research
  • Quality assurance and monitoring

Always respect:

  • robots.txt directives
  • Rate limits (even if you can exceed them, don’t)
  • Personal data regulations (GDPR, CCPA)
  • Terms of service (understand the legal landscape in your jurisdiction)

All my tools are designed for legitimate data extraction with built-in rate limiting and proxy best practices.


Understanding anti-bot systems makes you a better scraping engineer. If you need production-grade scrapers that handle these challenges reliably, check out my tools or get in touch for custom work.

Richard Feng
Web scraping engineer with 12+ years of experience. Building production-grade data extraction tools.
