SEO · 3 min read

How Google finds and indexes your pages — sitemaps and robots explained simply

Before Google can show your page, it has to find and understand it. We explain crawling, indexing, sitemaps and robots.txt in plain English — no jargon.

By Mediseo

You can have the best page in your industry, but if Google can't find it, it doesn't exist in search. Before anything can rank, Google first has to discover the page, read it and decide to include it. Here's how that happens — without the jargon.

Two steps: finding and understanding

Google does the work in two stages, and it helps to keep them apart:

  • Crawling is when Google "visits" your page. An automated robot follows links from page to page, much like a reader clicking onwards.
  • Indexing is when Google reads the content, works out what the page is about, and files it in the enormous catalogue it searches through.

A page can be crawled without being indexed. That means Google has seen it but chosen not to include it, often because the content is thin, duplicates another page, or carries a "noindex" setting that asks Google to leave it out.
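To make "following links" concrete, here is a minimal sketch of how a crawler discovers pages. It is an illustration only, not how Google's crawler is built: the site map of links (SITE_LINKS) and the discover function are hypothetical names made up for this example.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/services", "/blog"],
    "/services": ["/contact"],
    "/blog": ["/blog/first-post"],
    "/blog/first-post": [],
    "/contact": [],
    "/old-landing-page": [],  # exists, but no page links to it
}


def discover(start, links):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = discover("/", SITE_LINKS)
orphans = sorted(set(SITE_LINKS) - found)
print(orphans)  # ['/old-landing-page']
```

The page nobody links to is never reached, which is why a page without incoming links can stay invisible.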

What a sitemap is

A sitemap is simply a list of all the pages you want Google to know about. It lives as a file on your website, usually at yourdomain.com/sitemap.xml, and works like a map you hand to Google.

You don't need a sitemap for Google to find you — it follows links regardless. But a sitemap helps in two cases:

  • When your site is new and has few links pointing to it.
  • When it's large, so some pages are buried deep and hard to reach.

Most website tools build a sitemap automatically. If you use WordPress, Shopify or a similar system, it usually already exists.
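If you are curious what the file contains, the sketch below builds a very small sitemap in Python. It assumes the standard sitemap XML format and uses placeholder addresses; the build_sitemap function is a hypothetical helper for this example, and your own website tool will normally do this for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages; replace with the pages you want Google to know about.
pages = [
    "https://yourdomain.com/",
    "https://yourdomain.com/services",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Each page you want found gets one entry. Nothing more is required for a basic sitemap.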

What robots.txt is

robots.txt is a small text file that lives at yourdomain.com/robots.txt. It tells search robots which parts of the site they should not visit — a login page or a shopping basket, for example.

This is where it's easy to make an expensive mistake. A single wrong line in this file, such as "Disallow: /" under "User-agent: *", asks Google to leave your entire website alone. It happens more often than you'd think, typically after a relaunch where a temporary block was forgotten.

The rule of thumb: don't touch robots.txt unless you know exactly what each line does. And if your traffic suddenly disappears, this file is the first thing to check.
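To see how much one line matters, the sketch below compares two example robots.txt files using Python's built-in robots.txt reader. Both files and the allowed helper are made up for illustration, and Python's reader is only an approximation of how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may visit `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Sensible: keeps robots out of the login page and the basket only.
TARGETED = """\
User-agent: *
Disallow: /login
Disallow: /basket
"""

# The forgotten temporary block: one line that covers the whole site.
LEFTOVER_BLOCK = """\
User-agent: *
Disallow: /
"""

home = "https://yourdomain.com/"
print(allowed(TARGETED, home))                            # True
print(allowed(TARGETED, "https://yourdomain.com/login"))  # False
print(allowed(LEFTOVER_BLOCK, home))                      # False
```

The two files differ by a few characters, yet the second one shuts robots out of every page, including the front page.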

Search Console is your control panel

Google offers a free tool called Search Console. This is where you actually see what Google has done with your pages. With it you can:

  • See which pages are indexed and which aren't.
  • Submit your sitemap.
  • Ask Google to crawl a new page sooner, rather than wait for it to be discovered.
  • Get alerts when something is wrong.

If you do only one technical thing for SEO, make it setting up Search Console. It's free, it doesn't take long to set up, and it gives you visibility you otherwise don't have.

Common reasons a page doesn't appear

When a page is missing from search, it's usually one of these:

  • It's too new, and Google hasn't got round to indexing it yet.
  • It's accidentally blocked by robots.txt or a "noindex" setting.
  • The content is too thin or too similar to another page.
  • No links point to it, so Google never finds it.

Most of these are simple to fix once you know where to look.
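The "noindex" setting is usually a small tag in the page's HTML. As a rough sketch, the Python below looks for that tag in a page's source. The two sample pages and the names RobotsMetaFinder and has_noindex are invented for this example, and it only checks the meta tag; a "noindex" can also be sent as an X-Robots-Tag HTTP header, which this does not cover.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """Return True if the page source asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Made-up page sources for illustration.
blocked_page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Hi</body></html>"
)
open_page = "<html><head><title>Services</title></head><body>Hi</body></html>"

print(has_noindex(blocked_page))  # True
print(has_noindex(open_page))     # False
```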

How to check your own site

You can test a lot yourself, completely free:

  1. Search Google for site:yourdomain.com. This shows roughly which pages Google has included.
  2. Open yourdomain.com/robots.txt in your browser and check it isn't blocking anything important.
  3. Log in to Search Console and look for pages marked as not indexed.

If the numbers look wrong, you've just found something worth fixing.
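One more check is easy to automate: making sure that no page listed in your sitemap is also blocked by robots.txt. The sketch below does this with Python's standard library. The sitemap and robots.txt contents are placeholders, blocked_sitemap_urls is a hypothetical helper, and Python's robots.txt reader may differ from Google's in edge cases.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_txt, agent="Googlebot"):
    """Return sitemap URLs that robots.txt tells `agent` not to visit."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Placeholder file contents; paste in your own to try it.
SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://yourdomain.com/</loc></url>"
    "<url><loc>https://yourdomain.com/services</loc></url>"
    "<url><loc>https://yourdomain.com/basket</loc></url>"
    "</urlset>"
)
ROBOTS = "User-agent: *\nDisallow: /basket\n"

conflicts = blocked_sitemap_urls(SITEMAP, ROBOTS)
print(conflicts)  # ['https://yourdomain.com/basket']
```

A page that appears in this list is sending Google mixed signals: the sitemap asks for a visit, and robots.txt forbids it.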

It's the foundation under everything else

Indexing isn't the most exciting part of SEO, but it comes first. Images, content and internal linking mean little if Google never sees the pages. Get this right and you've laid the groundwork everything else builds on.

Most of this you can check yourself in an afternoon. If you'd rather have someone walk through it with you, have a chat with us.
