Crawlability & Indexing: Make Sure Google Can Find Your Pages
Crawling vs indexing: what's the difference?
Crawling is when search engine bots visit your pages and fetch the HTML. Indexing is when the search engine stores your page in its index, making it eligible to appear in search results.
A page can be crawled but not indexed (Google decides it's low quality or duplicate). A page that's not crawled can never be indexed. This is why crawlability is the foundation — if Google can't reach your pages, nothing else matters.
Robots.txt: controlling crawler access
The robots.txt file lives at your domain root (e.g., yoursite.com/robots.txt) and tells bots which pages they can and cannot crawl.
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://yoursite.com/sitemap.xml
Common mistakes:
- Accidentally blocking CSS/JS files — this prevents Google from rendering your page
- Blocking entire sections you want indexed
- Using Disallow: / which blocks everything (sometimes left over from staging)
- Forgetting to reference your sitemap
Tip: Use Google Search Console's URL Inspection tool to check if a specific page is blocked by robots.txt.
XML sitemaps
An XML sitemap is a file listing all the important pages on your site. It helps search engines discover pages they might miss through normal crawling.
- Include only pages you want indexed (no noindex pages, no redirects, no 404s)
- Add <lastmod> dates to help Google prioritize recently updated content
- Keep sitemaps under 50,000 URLs and 50 MB (use a sitemap index for larger sites)
- Submit your sitemap in Google Search Console
- Reference it in your robots.txt
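As a sketch of what the output should look like, here is a minimal sitemap generator using only the standard library. The URLs and dates are hypothetical placeholders.

```python
# Build a minimal XML sitemap with <loc> and <lastmod> entries.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (url, lastmod) tuples, lastmod as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    ("https://yoursite.com/", "2024-05-01"),
    ("https://yoursite.com/blog/post", "2024-04-18"),
])
print(sitemap)
```

In practice you would generate this from your CMS or router so it only ever contains live, indexable URLs, then write it to /sitemap.xml at the domain root.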
Common indexing issues
- Noindex tag left on — Pages with <meta name="robots" content="noindex"> won't be indexed. Check after migrating from staging.
- Orphan pages — Pages not linked from anywhere on your site are hard for crawlers to find.
- Redirect chains — Multiple redirects (A→B→C→D) waste crawl budget. Link directly to the final URL.
- Soft 404s — Pages that return a 200 status but display error content confuse Google.
- JavaScript rendering issues — If critical content is only rendered via JavaScript, Google may miss it. Use SSR or SSG when possible.
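The leftover-noindex mistake in particular is easy to catch with an automated check. Here is a sketch using Python's standard-library HTML parser; the page snippets are hypothetical examples.

```python
# Detect a leftover noindex robots meta tag in an HTML page.
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Match <meta name="robots" content="...noindex...">
        if attr.get("name", "").lower() == "robots" \
                and "noindex" in (attr.get("content") or "").lower():
            self.noindex = True

def has_noindex(html):
    parser = NoindexDetector()
    parser.feed(html)
    return parser.noindex

print(has_noindex('<head><meta name="robots" content="noindex"></head>'))  # True
print(has_noindex('<head><title>Indexable page</title></head>'))           # False
```

Running a check like this against your key URLs after every deploy catches staging settings before they cost you rankings.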