Crawl budget, the limited attention Googlebot allocates to your site per visit, matters most for large German websites (10,000+ indexable pages). Below that threshold, Google generally crawls everything; above it, Google crawls selectively, and your site structure determines where the budget goes. Log file analysis reveals what Googlebot actually crawls (vs. what you want it to crawl) and surfaces optimization opportunities most SEO tools miss.
This guide walks through what crawl budget SEO and log file analysis actually require in 2026: when to care, optimization tactics, tools for log file analysis, common findings on German sites, and a prioritization framework.
What is crawl budget?
Two concepts together:
Crawl rate limit
How much crawling your server can handle without slowing down. Slower server or more server errors = lower crawl rate.
Crawl demand
How interested Google is in crawling your site. More popular + frequently updated sites get higher crawl demand.
Combined effect
Googlebot allocates limited time + requests per visit. Crawl budget = how many pages get crawled per visit.
When does crawl budget matter?
Doesn’t matter much for
- Sites under 1,000 indexable pages — Google crawls everything
- Sites under 10,000 with low duplicate/thin content
- Most small business websites
Matters for
- Sites with 10,000+ indexable pages
- E-commerce with large catalogs + parameter URLs
- Sites with significant duplicate / thin content
- Recently migrated sites with lots of new URLs
Critical for
- Sites with 100,000+ pages
- Marketplace / directory sites
- Large news / publishing sites
- Faceted-navigation-heavy e-commerce
For most German Mittelstand businesses: crawl budget is not a concern. For large German e-commerce, marketplaces, or content sites: critical.
What does log file analysis reveal?
Server log files record every request, including those from Googlebot. Analysis surfaces:
Which pages Googlebot crawls
Reality vs. expectations. Surprises are common.
Crawl frequency per page type
Top-traffic pages crawled often. Long-tail pages crawled rarely. Logs reveal the actual allocation.
Wasted crawl on low-value URLs
Faceted nav combinations, parameter URLs, session IDs all consume budget.
404s and redirect chains
Googlebot hitting 404s wastes budget. Redirect chains consume multiple requests.
Crawl response times
Slow pages = less crawling. Performance affects crawl budget.
Mobile vs desktop crawl
Mobile-first indexing means mobile crawl dominates. Track separately.
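The breakdowns above reduce to a few counters once Googlebot requests have been parsed. A minimal sketch, assuming each hit is already a (path, status, user agent) tuple, that the first path segment is a usable proxy for page type, and that the word "Mobile" in the user agent marks the smartphone crawler. The sample hits, the abbreviated user agent strings, and the `page_type` and `summarize` helpers are hypothetical.

```python
from collections import Counter

# Abbreviated, illustrative user agent strings (not the full real ones).
MOBILE_UA = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X) Mobile Safari/537.36 "
             "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical, already-parsed Googlebot hits: (path, HTTP status, user agent).
HITS = [
    ("/", 200, MOBILE_UA),
    ("/produkte/schuhe-123", 200, MOBILE_UA),
    ("/produkte/?farbe=rot&groesse=m", 200, MOBILE_UA),
    ("/blog/alt-2011", 404, DESKTOP_UA),
]


def page_type(path):
    """Use the first path segment as a rough page-type label."""
    segment = path.split("?", 1)[0].strip("/").split("/", 1)[0]
    return segment or "home"


def summarize(hits):
    """Count hits per page type, per status code, and per crawler type."""
    by_type, by_status, by_device = Counter(), Counter(), Counter()
    for path, status, agent in hits:
        by_type[page_type(path)] += 1
        by_status[status] += 1
        by_device["mobile" if "Mobile" in agent else "desktop"] += 1
    return by_type, by_status, by_device


by_type, by_status, by_device = summarize(HITS)
print(by_type.most_common())
print(by_status.most_common())
print(by_device.most_common())
```

On a real site the page-type rule usually needs to follow the URL scheme (for example, treating any URL with filter parameters as its own type), so `page_type` is the part most likely to change.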
What tools analyze log files?
Screaming Frog Log File Analyser
Desktop tool. Imports log files, analyzes Googlebot activity. ~€100/year.
Sistrix Logfile
Cloud-based. Visual + actionable. Subscription tier.
JetOctopus
Cloud-based log file analyzer + SEO crawler. Mid-tier pricing.
Botify
Enterprise log file + crawl tool. €€€€.
Custom analysis (grep + Excel)
For DIY: extract rows with a Googlebot user agent + analyze in Excel / Google Sheets. Free. User agents can be spoofed, so verify suspicious IPs via reverse DNS.
For most large German sites: Screaming Frog Log File Analyser is the right starting point.
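For the DIY route, the extraction step can be scripted instead of done with grep. A minimal sketch, assuming logs in combined log format and matching on the user agent string only; the sample lines, the regular expression, and the `googlebot_rows` and `to_csv` helpers are illustrative, not part of any tool named above.

```python
import csv
import io
import re

# Combined log format: host ident user [time] "request" status size "referer" "agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines; real input would come from the access log file.
SAMPLE = [
    '66.249.66.1 - - [10/Mar/2026:06:25:14 +0100] "GET /produkte/schuhe HTTP/1.1" '
    '200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Mar/2026:06:25:15 +0100] "GET /kontakt HTTP/1.1" '
    '200 2048 "https://www.example.de/" "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"',
]


def googlebot_rows(lines):
    """Return (time, path, status) for lines whose user agent mentions Googlebot."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and "Googlebot" in match.group("agent"):
            rows.append((match.group("time"), match.group("path"), int(match.group("status"))))
    return rows


def to_csv(rows):
    """Serialize rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["time", "path", "status"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(googlebot_rows(SAMPLE)))
```

Because anyone can send a Googlebot user agent, a filter like this overcounts on sites that attract scrapers; checking the requesting IPs against Google's published verification method is a sensible follow-up step.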
How do you access log files in Germany?
From your hosting provider
- Hetzner Cloud: SSH to server, /var/log/nginx/ or /var/log/apache2/
- Mittwald: admin panel access
- AWS: CloudWatch Logs or S3-stored logs
- Cloudflare: Logpush to your S3/storage
What you need
- Server access logs (not just app logs)
- Format: combined log format ideal
- At least 30 days of logs for analysis
- 90+ days better for pattern detection
Privacy considerations
Logs contain IP addresses. Treat them as personal data under the DSGVO (GDPR). Anonymize them before analysis if you share logs with third parties.
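One common way to anonymize is to truncate each IP before the logs leave your systems. A minimal sketch, assuming that zeroing the last octet of IPv4 addresses and everything past the first 48 bits of IPv6 addresses is acceptable for your case; those prefix lengths and the `anonymize_ip` helper are assumptions, and whether truncation is sufficient is a question for your data protection officer.

```python
import ipaddress


def anonymize_ip(ip):
    """Zero the host part of an address: keep /24 for IPv4 and /48 for IPv6."""
    address = ipaddress.ip_address(ip)
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.77"))
print(anonymize_ip("2001:db8:abcd:12:34:56:78:9a"))
```

Truncated addresses can no longer be checked via reverse DNS, so any Googlebot verification has to happen before this step.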
What does crawl budget optimization involve?
Five tactics:
1. Reduce crawl on low-value URLs
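Block crawling via robots.txt. Noindex keeps a page out of the index but does not save crawl budget, because Googlebot must fetch the page to see the tag, and Googlebot cannot see a noindex on a URL that robots.txt blocks. Candidates: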
- Parameter URLs (?utm_source=, ?sort=)
- Tag archive pages
- Search result pages
- Test/staging URLs
- Old paginated archives
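Before deploying new robots.txt rules, it helps to test them against URLs you do and do not want crawled. A minimal sketch using Python's standard `urllib.robotparser`; the rules, the example.de URLs, and the `is_crawlable` helper are hypothetical. Only plain path prefixes are used, because this parser does prefix matching and does not implement the `*` and `$` wildcards that Googlebot supports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules covering search results, tag archives, and staging URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tag/
Disallow: /staging/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "https://www.example.de/produkte/sneaker-42",
    "https://www.example.de/tag/sneaker/",
    "https://www.example.de/search/?q=sneaker",
):
    print(url, is_crawlable(ROBOTS_TXT, url))
```

Parameter patterns such as `?sort=` typically need wildcard rules, which this sketch cannot evaluate; for those, test with a tool that follows Google's robots.txt matching rules.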
2. Eliminate duplicate content
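Canonical tags consolidate ranking signals onto one URL, but the duplicates still get crawled. Better: avoid generating duplicate URLs in the first place.
3. Fix 404s
Either redirect (if the URL has equity) or remove it from the sitemap and internal links. Don’t let Googlebot hit dead ends.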
4. Shorten redirect chains
Page A → Page B → Page C → Final = wastes budget. Should be Page A → Final.
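Flattening chains is mechanical once the redirect rules are available as a source-to-target mapping. A minimal sketch, assuming the rules have been exported into a Python dict; the `final_target` and `flatten` helpers and the hop limit of 10 are illustrative choices.

```python
def final_target(url, redirects, max_hops=10):
    """Follow `redirects` (source -> target) until reaching a URL that does not redirect."""
    seen = {url}
    hops = 0
    while url in redirects:
        hops += 1
        if hops > max_hops:
            raise ValueError(f"more than {max_hops} hops starting from {url}")
        url = redirects[url]
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
    return url


def flatten(redirects):
    """Point every redirect source directly at its final target."""
    return {source: final_target(source, redirects) for source in redirects}


chain = {"/page-a": "/page-b", "/page-b": "/page-c", "/page-c": "/final"}
print(flatten(chain))
```

The loop check matters in practice: redirect rules accumulated over several migrations sometimes point back at each other, and those cases need manual review rather than flattening.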
5. Improve page speed
Faster pages = more pages crawled per visit. See our Shopify speed optimization guide for general performance principles.
What patterns waste crawl budget?
Six common patterns:
Faceted navigation indexed
Filter combinations creating millions of URLs. “All red size M cotton shirts under €50 in stock” as unique URL. Crawl waste.
Session IDs in URLs
?sessionid=abc123 appended to every URL. Each gets crawled as unique.
Tracking parameter URLs
?utm_source=facebook&utm_medium=cpc&utm_campaign=spring URLs indexed.
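When analyzing logs, the session ID and tracking parameter variants above are easier to quantify if every URL is reduced to its clean form first. A minimal sketch using `urllib.parse`; the parameter names in `STRIP_PARAMS` and `STRIP_PREFIXES` and the `canonical_url` helper are assumptions to adapt to your own URL scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; extend to match what your logs actually contain.
STRIP_PARAMS = {"sessionid"}
STRIP_PREFIXES = ("utm_",)


def canonical_url(url):
    """Drop session and tracking parameters, keeping all other query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in STRIP_PARAMS and not key.lower().startswith(STRIP_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(canonical_url(
    "https://www.example.de/schuhe?utm_source=facebook&utm_medium=cpc&utm_campaign=spring"
))
print(canonical_url("https://www.example.de/schuhe?farbe=rot&sessionid=abc123"))
```

Grouping Googlebot hits by `canonical_url` shows how many requests went to variants of the same page, which is a direct measure of this kind of waste.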
Test pages forgotten
/test-page-do-not-index/ indexed by accident.
Calendar pages going to year 2099
Some calendar apps generate URLs through year 2099. Googlebot crawls them all.
Archive pages too deep
Blog archive going back 15 years. Bottom-of-pile archives crawled but never indexed.
How do you prioritize fixes from log analysis?
A priority framework:
1: Critical (week 1-2)
- Block obviously-wasteful URL patterns via robots.txt
- Fix 404 errors that Googlebot hits frequently
- Resolve canonical conflicts on important pages
2: High-impact (week 2-4)
- Eliminate redirect chains
- Reduce parameter URL bloat
- Fix mobile-vs-desktop crawl differences (e.g. pages the mobile crawler rarely reaches)
3: Optimization (week 4-8)
- Page speed improvements where Googlebot is slow
- Structural changes to reduce wasted crawl
4: Strategic (month 2-3)
- Architectural reorganization for crawl efficiency
- Tagging strategy that doesn’t bloat URL space
For broader technical SEO see our technical SEO audit Germany guide.
What does Googlebot actually crawl on German sites?
Common patterns observed in log analysis:
High crawl
- Homepage (multiple times per day)
- Top-traffic pages
- Recently published content
- Sitemap-listed URLs
Medium crawl
- Older but still-trafficked pages
- Category / collection pages
- Top-level navigation pages
Low crawl
- Deep / older blog posts
- Long-tail product pages
- Pages with few internal links
Variable
- New pages — crawl spike on publication, then settles
If your high-value pages get low crawl: something’s wrong. Investigate internal linking + sitemap inclusion.
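Finding those under-crawled pages is a comparison between the list of pages you care about and the paths Googlebot actually requested. A minimal sketch, assuming both lists are already available (for example, from the sitemap and from parsed logs); the sample paths, the `min_hits` threshold, and the `undercrawled` helper are hypothetical.

```python
from collections import Counter


def undercrawled(important_paths, crawled_paths, min_hits=1):
    """Return important paths that Googlebot requested fewer than `min_hits` times."""
    hits = Counter(crawled_paths)
    return sorted(path for path in important_paths if hits[path] < min_hits)


# Hypothetical inputs: high-value pages vs. paths seen in Googlebot log rows.
important = ["/", "/produkte/bestseller", "/ratgeber/groessen"]
crawled = ["/", "/", "/produkte/bestseller", "/tag/rot"]

print(undercrawled(important, crawled))
print(undercrawled(important, crawled, min_hits=2))
```

The right `min_hits` depends on the log window: a page crawled once in 30 days may be fine for an evergreen guide and a problem for a frequently updated category page.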
What’s the relationship between crawl + indexation?
Three states:
Crawled but not indexed
Google saw the page but decided not to index it. Often due to: thin content, duplicate content, low perceived value.
Crawled + indexed
Page in Google’s index. Eligible to rank.
Not crawled
Google hasn’t visited the page. Usually because: no internal links point to it, robots.txt blocks it, or crawl budget is exhausted.
For large sites: the goal is “all important pages crawled + indexed”.
What about Search Console crawl data?
Search Console provides limited crawl data:
Crawl stats report
Settings → Crawl stats. Aggregate Googlebot activity. Not URL-level detail.
Page indexing report (formerly Index coverage)
Indexing → Pages. Shows which pages are indexed vs. not indexed + why.
URL Inspection tool
Per-URL: when Googlebot last crawled, indexation status.
For URL-level + frequency-level detail: you need log files. Search Console alone is insufficient for crawl budget work.
Frequently asked questions
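What is crawl budget?
Limited Googlebot attention per site visit.
When does crawl budget matter?
Above 10,000 indexable pages; critical above 100,000.
How do you measure crawl activity?
Search Console crawl stats plus log file analysis.
What tools analyze log files?
Screaming Frog Log File Analyser, Sistrix, JetOctopus, Botify.
How do you access log files?
Via hosting provider, Cloudflare Logpush, or AWS CloudWatch.
What wastes crawl budget?
Faceted nav, session IDs, tracking params, 404s, redirect chains.
How do you optimize crawl budget?
Block waste; fix 404s; eliminate redirect chains; optimize speed.
Do small sites need to worry about crawl budget?
No; below 10,000 pages Google crawls everything.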
Need help with crawl budget analysis?
If your German site is 10,000+ pages and you want a 30-minute scoping conversation about log file analysis + crawl optimization, book a meeting or send details via our contact page.