
Did you know that the average website has 25% more pages than its owners realize? I've audited over 200 websites in my SEO career, and hidden pages - orphans, duplicates, and forgotten test pages - are among the most common issues I see draining crawl budget and confusing search engines.
Finding all pages on a website isn't just a technical exercise. It's the foundation of effective SEO. Whether you're conducting a site audit, planning a redesign, or analyzing a competitor, knowing exactly what pages exist gives you the complete picture you need to make smart decisions.
In this guide, I'll walk you through every method I use to uncover pages - from quick Google searches to command-line tools that reveal what crawlers miss. You'll learn which approach works best for different scenarios and what to do once you've mapped your site's full inventory.
You might wonder if this matters for smaller sites. After all, don't you already know what's on your website? In my experience, the answer is almost always no - at least not completely.
According to Botify's research, search engines leave up to 50% of pages on large websites uncrawled. For mid-sized sites that number drops, but orphan pages and crawl inefficiencies remain common. Here's why a complete page inventory matters:
Orphan pages have no internal links pointing to them. They're essentially invisible to both users and search engines. I recently audited a site where 15% of their blog posts were orphaned - that's valuable content that wasn't ranking because Google couldn't find it through normal crawling.
When you identify orphans, you can integrate them into your site structure with strategic internal links, immediately improving their discoverability.
Sometimes you'll find pages ranking for keywords they weren't optimized for. These "odd rankers" represent quick wins. With minor content adjustments, you can align them with user intent and boost their positions.
Duplicate pages compete with each other for rankings, splitting link equity and confusing search engines about which version to show. A complete inventory reveals these duplicates so you can consolidate them with canonical tags or 301 redirects.
Planning a redesign without knowing your full site structure is like renovating a house without blueprints. When you map every page, you can plan URL migrations properly, preserve SEO equity, and design a navigation that actually reflects your content.
According to Ahrefs' analysis of 23,000 websites, proper internal linking is one of the most underutilized SEO techniques. You can't build an effective internal linking strategy without knowing what pages you have to link between.
This isn't just about your own site. Learning to find all pages on any website lets you analyze competitors' content strategies, identify gaps in your own coverage, and discover content opportunities they're targeting.
Let's start with the approaches that work for most situations. I use these daily when beginning any site audit.
The fastest way to see what Google has indexed is the site: operator. Here's how to use it:
Type site:example.com into Google (replace example.com with the domain you're checking). Add a keyword, such as site:example.com blog, to filter the results.
Limitation: This only shows indexed pages. Non-indexed pages, those blocked by robots.txt, or pages Google chose to ignore won't appear.
The robots.txt file reveals what the site owner wants to hide from crawlers. Access it at example.com/robots.txt.
Look for Disallow: directives - these point to directories and pages hidden from search engines. While you won't find every page this way, you'll discover sections that other methods might miss.
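If you want a quick list of those directives without opening a browser, a one-line sketch with curl and grep works from any terminal (assuming the site is example.com):

curl -s https://example.com/robots.txt | grep -i '^Disallow:'

The -s flag keeps curl quiet, and grep pulls out only the Disallow lines. Keep in mind this only surfaces the paths the owner chose to declare, not every hidden page.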
Sitemaps are designed to list all important pages. Check example.com/sitemap.xml or use SEOmator's Sitemap Finder to locate it.
Well-maintained sitemaps provide a clean list of pages the site owner wants indexed. However, I've found that sitemaps are often outdated or incomplete, so don't rely on them exclusively.
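To turn a sitemap into a plain list of URLs you can compare against other sources, a rough sketch along these lines works for a standard, uncompressed sitemap (the sitemap-urls.txt filename is just an example):

curl -s https://example.com/sitemap.xml | grep -oE '<loc>[^<]+' | sed 's/<loc>//' > sitemap-urls.txt

Note that large sites often use a sitemap index that points to several child sitemaps, so you may need to repeat this for each one.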
Tools like Screaming Frog or Lumar (formerly DeepCrawl) crawl websites the same way search engines do. They start from the homepage and follow every internal link, building a complete map of discoverable pages.
These tools are my go-to for comprehensive audits. They reveal every URL a crawler can reach, along with status codes, redirects, and on-page issues for each one.
Pro tip: Run a crawl AND import your sitemap into the same project. Comparing the two datasets instantly reveals orphan pages.
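If your crawler can export its URL list to a text file, the comparison itself is trivial in a bash shell. A minimal sketch, assuming one URL per line in crawl-urls.txt (the crawl export) and sitemap-urls.txt (from the sitemap step above):

comm -13 <(sort -u crawl-urls.txt) <(sort -u sitemap-urls.txt)

Anything this prints appears in the sitemap but was never reached by the crawl - in other words, a likely orphan.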
For sites you own or manage, Google Search Console provides the most authoritative view of how Google sees your site. Navigate to Pages under the Indexing section to see which URLs are indexed, which are excluded, and the reason Google gives for each exclusion.
This data comes directly from Google's crawl, making it invaluable for understanding your site's actual search visibility.
If the site uses Google Analytics, check Reports > Engagement > Pages and screens to see which pages receive traffic. While this won't show zero-traffic pages, it reveals the pages that actually matter for user engagement.
When basic methods aren't enough, these advanced approaches help uncover pages that slip through standard crawls.
For technical users, wget provides powerful crawling capabilities. Here's the command I use:
wget --spider --no-parent -r https://example.com

This recursively crawls the site and logs every URL it finds. The --spider flag means it won't download files, just record the links.
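To turn that output into a clean, deduplicated URL list, one approach is to log the run and extract the URLs afterwards (the filenames here are just examples):

wget --spider --no-parent -r https://example.com -o crawl.log
grep -oE 'https?://example\.com[^ ]*' crawl.log | sort -u > crawl-urls.txt

The -o flag writes wget's messages to crawl.log instead of the screen, and the grep pattern pulls every example.com URL mentioned in that log.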
Requirements: wget comes pre-installed on most Linux distributions. Mac users can install it via Homebrew; Windows users can use WSL or install it via Chocolatey.
Server logs record every request to your website, including requests from search engine bots. Analyzing these logs reveals which URLs bots actually request - including pages that no longer have any internal links pointing to them - and how often each one gets crawled.
Tools like Screaming Frog Log File Analyser make parsing these logs manageable.
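If you just want a quick look before reaching for dedicated software, standard Unix tools go a long way. A rough sketch, assuming a combined-format access.log where the requested path is the seventh field:

grep -i googlebot access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

This lists the twenty paths that Googlebot-identified requests hit most often; URLs that show up here but not in your own crawl are worth a closer look. (User-agent strings can be spoofed, so treat this as a rough signal rather than verified bot traffic.)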
WordPress sites can leverage plugins for page discovery: most SEO plugins generate an XML sitemap that lists every published page and post, giving you a ready-made inventory.
Most content management systems provide page listings. Check your CMS documentation - whether it's Webflow, Shopify, Squarespace, or another platform - for built-in site inventory features.
Comprehensive SEO platforms like SEOmator, Semrush, and Ahrefs include site crawlers that map your pages while simultaneously checking for SEO issues. These tools combine discovery with actionable insights.
Discovery is just the beginning. Here's my workflow for turning a page inventory into SEO improvements.
I categorize pages into four buckets: orphans to reconnect, duplicates to consolidate, outdated content to refresh, and healthy pages that need stronger internal links.
Add internal links from relevant pages to your orphans. This passes link equity and helps search engines discover the content. Contextual links within body content work better than footer or sidebar links.
For duplicate content, decide which version should be canonical. Then either point the duplicates at the primary version with rel="canonical" tags, or apply noindex to pages that need to exist but shouldn't rank.

According to Backlinko's ranking factors study, content freshness influences rankings for time-sensitive queries. Update statistics, remove outdated information, and add current best practices.
With your complete page inventory, create a linking plan that connects every orphan to a relevant page and channels link equity from your strongest URLs to the pages you most want to rank.
Page inventory isn't a one-time task. Set quarterly reminders to re-crawl your site and compare results. New orphans, duplicates, and issues emerge as content grows.
Mastering page discovery transforms how you approach SEO. Instead of guessing what's on your site, you'll have complete visibility into your content inventory. That visibility powers smarter decisions about optimization, content strategy, and site architecture.
The tools and techniques in this guide work for sites of any size. Start with the quick methods, add advanced techniques as needed, and build page auditing into your regular SEO workflow. Your rankings - and your sanity - will thank you.
For most sites, quarterly audits work well. Large sites with frequent content updates benefit from monthly crawls. Set up automated crawl schedules in tools like Screaming Frog to catch issues early.
Crawl your site with an SEO spider tool, then import your XML sitemap. Any URLs in the sitemap that weren't discovered during the crawl are likely orphans - they exist but have no internal links pointing to them.
Yes, most methods work on any public website. Google's site: operator, sitemap checks, and third-party crawlers don't require site ownership. Server logs and CMS access obviously require permissions.
Compare your crawl data with Google Search Console's indexed pages report. URLs that exist on your site but don't appear in GSC either weren't crawled, were crawled but not indexed, or are blocked. GSC tells you exactly why each page isn't indexed.
The Art of SEO: Mastering Search Engine Optimization - Comprehensive guide covering crawling, indexation, and website architecture from Eric Enge, Stephan Spencer, and Jessie Stricchiola.
What is SEO? How Does It Work? - Beginner-friendly introduction to search engine optimization fundamentals.
Best CMS Platforms for SEO - Comparison of content management systems and their SEO capabilities.
Content Optimization: The Complete Guide - Learn how to optimize the pages you discover for better search rankings.
