If you’ve ever managed SEO for a large website, you already know how painful it can be to manually check whether all your important URLs are indexed in Google. For smaller sites, running the occasional “site:” search works fine. But when you’re handling hundreds or even thousands of URLs across different projects, it becomes nearly impossible to stay on top of indexing status without automation.

Indexation is one of the most fundamental SEO signals. If your page isn’t indexed, it simply cannot rank—no matter how well-optimized the content might be. That’s why agencies, in-house SEO teams, and site owners constantly monitor index coverage. Traditionally, you would rely on tools like Google Search Console or perform manual “site:domain.com/page” checks, but these methods are limited. They require repetitive work, don’t provide historical tracking, and fail to scale with the size of modern content operations.

This is exactly the problem we set out to solve with an automated n8n workflow. Instead of manually verifying each URL, the workflow periodically runs checks, simulates human browsing with randomized delays and browser fingerprints, fetches Google’s response for each URL, and then determines whether the page is indexed. Most importantly, it automatically logs these results into Google Sheets, so you have a living record of every URL’s status.

By combining n8n’s powerful automation nodes with simple JavaScript functions, we can replicate the manual process of checking indexation, but in a way that is repeatable, scalable, and resilient against being blocked by Google. This means you can focus on fixing indexation issues rather than wasting hours trying to find them.

In this step-by-step guide, I’ll walk you through the exact workflow I built in n8n to automate Google indexing checks. You’ll see how we:

  • Trigger the workflow automatically every 8 hours.
  • Maintain a clean list of target URLs.
  • Generate Google “site:” queries dynamically.
  • Loop through each URL one by one with random delays.
  • Rotate User-Agent headers to mimic real users.
  • Parse the search results to determine if a page is indexed.
  • Log everything neatly in Google Sheets for analysis.

By the end of this tutorial, you’ll have a reliable system that can monitor indexing status for hundreds of URLs without lifting a finger.

Step 1: Setting Up the Trigger

The first building block of this workflow is the trigger. Without it, you’d have to manually start the process every time, which defeats the purpose of automation. Instead, we want our indexing checks to run on a schedule—quietly in the background—so that our Google Sheet is always up to date with the latest results.

In n8n, this is handled by the Cron Trigger node. The Cron node is incredibly flexible; it lets you schedule workflows to run at fixed intervals, specific times of day, or even just once on a certain date. For our indexing checker, we set it to run every 8 hours. This ensures that if Google’s index changes—whether pages get added or drop out—we’ll detect it within a reasonable timeframe without constantly bombarding Google with requests.

Here’s how the Cron node was configured:

  • Trigger Type: Every X hours
  • Value: 8
  • Time Zone: Your server or project’s preferred time zone (important for consistency).

This means the workflow automatically fires three times per day (every 8 hours). That frequency is a sweet spot: it’s frequent enough to catch indexation changes quickly, but not so frequent that it risks unnecessary load or triggering anti-bot defenses.
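
If you prefer the Cron node’s Custom mode over the “Every X hours” preset, the equivalent standard cron expression is shown below (a minimal example; both approaches behave the same):

// Custom cron expression equivalent to "every 8 hours"
// Fires at minute 0 of hours 0, 8, and 16 in the configured time zone
0 */8 * * *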

Why periodic scheduling matters

Indexing is not static. A URL that is indexed today might be deindexed tomorrow due to crawl budget issues, canonical conflicts, noindex tags, or algorithmic changes. By running checks automatically, you create a historical log of indexation health, which is incredibly valuable for diagnosing long-term SEO problems. For example:

  • You can spot patterns (e.g., blog posts drop out of the index after 30 days).
  • You can catch sudden deindexing (e.g., a technical change causing mass removals).
  • You can provide clients or stakeholders with transparent reporting (“Here’s the day Google dropped 40 of our URLs”).

Once this node is in place, the workflow doesn’t need any manual input to start. It becomes a hands-free system that wakes itself up every 8 hours, prepares the next batch of URLs, and checks their status.

Step 2: Defining the URL List

With the trigger in place, the next step is to tell our workflow which URLs we want to monitor. In n8n, the easiest way to do this is with the Set Node. This node lets us create and store static values that can be passed down to other nodes in the workflow. In our case, we’re going to define a list of URLs that need their indexing status checked.

Inside the Set Node, I created a new field called urlList. Under the Value section, I entered all the URLs I want to track, separated by commas. For example:

https://www.vesbhusha.com/types-of-uniforms-in-the-hotel-industry/, https://www.vesbhusha.com/common-mistakes-to-avoid-when-choosing-corporate-uniforms/, https://www.vesbhusha.com/importance-of-uniforms-in-branding/

When this node executes, the output is a JSON object containing that single field:

{ "urlList": "https://www.vesbhusha.com/types-of-uniforms-in-the-hotel-industry/, https://www.vesbhusha.com/common-mistakes-to-avoid-when-choosing-corporate-uniforms/, https://www.vesbhusha.com/importance-of-uniforms-in-branding/" }

Why use the Set Node?

  1. Flexibility: You can easily update this list without touching any code.
  2. Scalability: If you have different workflows for different sites, each one can maintain its own list.
  3. Simplicity: This avoids hardcoding URLs into scripts. Everything is centralized and clean.

Step 3: Preparing the Google Search Query

Once we have a list of URLs defined in the Set Node, the next step is to prepare them in a format that can actually be queried on Google. If we were to paste all of them directly into Google at once, the workflow would quickly fail. That’s because Google needs one query at a time to return clear results, and sending multiple URLs in bulk will either break the process or return mixed results that are hard to analyze.

To solve this, we use a Function Node in n8n that runs a short JavaScript script. The purpose of this node is simple: take the comma-separated list of URLs we created earlier and split them into individual items. Each item is then transformed into a properly formatted Google search query using the site: operator. This operator is one of the most useful tricks for SEOs—it allows us to check if a specific page exists in Google’s index. For example, searching site:example.com/page will either show the indexed page or display a message that no results were found.

The Function Node code does three things:

  1. Splits the urlList into individual URLs.
  2. Cleans each URL by trimming whitespace.
  3. Creates a JSON object for each URL with three fields:
    • searchUrl → the Google query with site:.
    • originalUrl → the actual page we are testing.
    • index → the position of the URL in the list (for tracking).

By doing this, the workflow ensures that each URL is ready to be processed one by one in the upcoming loop. The output of this node is a structured list where every item contains both the original URL and its corresponding Google search query.

This is a crucial step because it transforms our simple static list into actionable search requests that Google can understand. Without this preparation, the following nodes would not know how to handle multiple URLs in a systematic and reliable way.
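
Here is a minimal sketch of what that Function Node’s JavaScript might look like, assuming the field from the Set Node is named urlList as configured in Step 2 (treat it as a starting point rather than the only correct implementation):

// Function Node: turn the comma-separated urlList into one item per URL
const urlList = $json["urlList"] || "";

// 1. Split the list and 2. trim whitespace, dropping any empty entries
const urls = urlList.split(",").map(u => u.trim()).filter(u => u.length > 0);

// 3. Build one item per URL with the site: query, the original URL, and its position
return urls.map((url, i) => ({
  json: {
    searchUrl: "https://www.google.com/search?q=site:" + url,
    originalUrl: url,
    index: i,
  },
}));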

Step 4: Looping Over Items Safely

Now that we have a clean list of prepared Google search queries, the next step is to make sure they are executed one by one in a controlled manner. This is where the Loop Over Items node in n8n comes in. By default, n8n can process multiple items in parallel, but when we’re dealing with Google requests, parallel execution can easily cause issues. Too many simultaneous requests look suspicious to Google, and that may trigger captchas or even temporary blocks.

To avoid this, we configure the workflow to process only one URL at a time. In the Loop Over Items node, we set the Batch Size to 1. This ensures that each Google query is executed individually, giving us full control over the sequence of events. With this setup, each URL will go through the next steps in isolation: randomized User-Agent assignment, timed delay, request execution, and result parsing.
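
The node configuration itself is minimal (the node is called Loop Over Items in current n8n versions and Split In Batches in older ones):

// Loop Over Items Node Configuration
Batch Size: 1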

This design also helps in debugging. If something goes wrong with a specific URL, it is much easier to track the error when URLs are processed sequentially rather than in parallel. For large websites with hundreds of URLs, sequential looping might take longer, but it’s safer and much more reliable.

Another benefit of using the Loop node is scalability. Whether you are testing 10 URLs or 1,000, the logic remains the same—one at a time, steady, and predictable. You can even extend this logic with conditional branches. For example, you could skip certain URLs under specific conditions or route failed URLs into a separate log for review.

In short, the Loop Over Items node ensures that our indexing checker behaves responsibly, reduces the risk of being flagged by Google, and maintains a professional-grade level of reliability. Without this step, the workflow could overwhelm Google’s systems or produce incomplete results.

Step 5: Adding Random User-Agent and Delay

One of the key challenges when automating Google queries is avoiding detection. If every request comes from the same “browser fingerprint” and occurs at the same interval, Google can quickly recognize the activity as automated. This might result in temporary IP blocks, captchas, or incomplete search results. To make our workflow behave more like a human user, we need to introduce randomness in two critical areas: the browser identity and the timing of requests.

This is handled through a Function Node in n8n. Inside the node, we define an array of common User-Agent strings representing different browsers and devices. This includes Chrome on Windows, Firefox on Linux, Safari on iOS, Edge on macOS, and more. For each URL request, the workflow randomly selects one of these User-Agents and attaches it to the HTTP header. As a result, one request may look like it came from a Windows desktop, while the next might appear as if it was sent from an iPhone.

In addition to rotating User-Agents, the code also generates a random delay between 12 and 30 seconds. This ensures that requests are not sent at fixed intervals, which would otherwise look suspicious. Instead, the workflow pauses unpredictably, much like how a real person might pause while browsing.
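
A condensed sketch of that Function Node is shown below; the User-Agent strings are only illustrative examples, and you can expand the list with as many real browser signatures as you like:

// Function Node: attach a random User-Agent and a random 12-30 second delay to each item
const userAgents = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
  "Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0",
];

return items.map(item => ({
  json: {
    ...item.json, // keep searchUrl, originalUrl, index from the previous step
    userAgent: userAgents[Math.floor(Math.random() * userAgents.length)],
    delay: Math.floor(Math.random() * (30 - 12 + 1)) + 12, // random 12-30 seconds
  },
}));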

This step is critical because it adds a layer of authenticity to the automation. Without it, even the best-designed workflow could be flagged by Google after only a few runs. With User-Agent rotation and randomized delays, however, the workflow remains stable, resilient, and safe to run multiple times a day without issue.

This combination of tactics—browser rotation and natural timing—transforms the workflow from a simple script into a robust SEO tool that can operate at scale without tripping Google’s defenses.

Step 6: Implementing the Wait Node

Once the workflow has selected a random User-Agent and generated a randomized delay, we need to actually apply that pause before sending the request to Google. This is where the Wait Node in n8n comes into play.

The Wait Node is designed to temporarily hold the workflow until certain conditions are met. In our case, the condition is simply time. Instead of sending requests instantly back-to-back, the Wait Node forces each iteration of the loop to pause for a number of seconds. By using the delay value generated in the previous step, the workflow introduces realistic timing between queries.

This delay is absolutely crucial. Think of how a human behaves when searching Google: nobody enters ten different “site:” searches per second. There are always pauses—typing the query, reading results, clicking through. By inserting these pauses, our automation mimics real-world behavior and avoids looking like a bot.

In the Wait Node, the configuration is set to Resume After Time Interval. The input value is dynamic, meaning we don’t hardcode “20 seconds” or “30 seconds.” Instead, the node reads {{$json.delay}} directly from the output of the Random Agent step. That means every iteration can have a different delay—sometimes 18 seconds, sometimes 25, sometimes 30. This unpredictability is what makes the workflow resilient.
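
In the node settings, that looks roughly like this (field labels may differ slightly between n8n versions):

// Wait Node Configuration
Resume: After Time Interval
Wait Amount: {{$json.delay}}
Wait Unit: Seconds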

Without this step, even rotating User-Agents would not be enough. Google could still detect an unnatural pattern if requests were being sent at perfectly regular intervals. With the Wait Node in place, every request becomes staggered and inconsistent—just like genuine human searches.

By the end of this step, each URL in your list is ready to be queried in a safe and natural way. The workflow now behaves almost indistinguishably from a person manually performing “site:” searches throughout the day. This makes the entire system both effective and sustainable for long-term use.


// Wait Google Search Node
// Uses dynamic delay value from previous step
// Example: delay = 20 seconds

{
  "searchUrl": "https://www.google.com/search?q=site:https://www.vesbhusha.com/types-of-uniforms-in-the-hotel-industry/",
  "originalUrl": "https://www.vesbhusha.com/types-of-uniforms-in-the-hotel-industry/",
  "userAgent": "Mozilla/5.0 (iPad; CPU OS 17_2 like Mac OS X)...Safari/604.1",
  "delay": 20
}

Step 7: Executing the Google Search

With delays and User-Agent rotation in place, the workflow is now ready to perform the actual search on Google. This is handled using the HTTP Request Node in n8n. The purpose of this node is straightforward: it sends a GET request to Google with the prepared site: query, and captures the raw HTML response.

The most important part of this step is ensuring that Google receives the request in a way that looks natural. To do this, we configure the HTTP Request node with the following key settings:

  • Request Method: GET (we are only fetching results, not posting anything).
  • URL: Dynamically mapped from the prepared queries, e.g., {{$json["searchUrl"]}}.
  • Authentication: None (Google Search is public, no authentication required).
  • Response Format: String, so the node stores the entire HTML page as text.
  • Property Name: data, which means the response will be available in that property.
  • Headers: A randomized User-Agent value injected from the previous step, so every request looks like it’s coming from a different browser/device.

For example, if your URL is:

https://www.google.com/search?q=site:https://www.vesbhusha.com/types-of-uniforms-in-the-hotel-industry/

The HTTP Request node fetches the full HTML of the search results page for that query. This raw HTML doesn’t look readable at first, but it contains the clues we need to determine whether the page is indexed or not.

By capturing the Google search result this way, we have complete control. We are no longer relying on external SEO tools or even the Search Console API. Instead, we’re directly replicating what a human SEO would do—search for a URL using the site: operator—but in an automated, scalable way.

This step is the gateway between preparation and analysis. Once the HTML is captured, the next node can parse it and decide whether the target URL is indexed.


// HTTP Request Node Configuration
Request Method: GET
URL: {{$json["searchUrl"]}}
Response Format: String
Property Name: data
Headers:
  User-Agent: {{$json.userAgent}}

Step 8: Parsing Results and Checking Indexing Status

At this stage, we already have the raw HTML from Google Search for each site: query. The next challenge is to interpret that response and decide if the page is indexed or not. Google doesn’t provide a simple “yes/no” answer in the search result HTML—it has to be inferred from the content.

In our workflow, we solve this problem using a Function Node with JavaScript. The logic is simple but effective: search the returned HTML for a specific phrase that Google displays when no results are found. Typically, this message looks like “did not match any documents”. If this phrase is present in the response, it means Google couldn’t find the URL—so the page is not indexed. If the phrase is missing, the page is assumed to be indexed.

The function performs three main tasks:

  1. Extract the HTML: It reads the data field from the HTTP Request node output (the Property Name configured in Step 7), which contains the full search result page in text form.
  2. Check for the phrase: It uses html.includes("did not match any documents") to see if the error message is present.
    • If true → mark as isIndexed: false.
    • If false → mark as isIndexed: true.
  3. Return clean output: It generates a simplified JSON object with two fields:
    • searchUrl → the original URL being checked.
    • isIndexed → a boolean (true or false).

This makes the workflow extremely efficient. Instead of dealing with messy HTML or regex parsing, we rely on a single reliable text check. Of course, this logic can be extended later to handle edge cases or localized Google messages, but for most scenarios, it works perfectly.

The output from this node is now structured, human-readable, and ready to be logged. With every URL clearly marked as Indexed or Not Indexed, we can confidently pass the data into our reporting system—in this case, Google Sheets.


// Function Node: determine indexing status from the raw search result HTML
// The HTTP Request node stores the HTML in the "data" property (see Step 7);
// fall back to "body" in case your node uses a different property name.
const html = $json["data"] || $json["body"] || "";

// Google shows this phrase when a site: query returns no results,
// so its presence means the page is NOT indexed.
const isIndexed = !html.includes("did not match any documents");

// Pull the original URL back in from the item that entered the Wait node
const allPreviousData = $node["Wait Google Search"].json;

return [{
  json: {
    searchUrl: allPreviousData.originalUrl || 'TEST_URL', // fallback placeholder if the URL is missing
    isIndexed: isIndexed,
  }
}];
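
If you query localized Google domains, the single includes() check above can be swapped for a list of “no results” phrases. This is only a sketch; fill in the exact messages you see for each locale:

// Extension sketch: check several "no results" phrases instead of only the English one
const noResultPhrases = [
  "did not match any documents", // English
  // add the equivalent message for each locale you query, e.g. google.de or google.fr
];
const isIndexed = !noResultPhrases.some(phrase => html.includes(phrase));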

Step 9: Logging Results in Google Sheets

Once the indexing status has been determined for each URL, the final step is to save these results somewhere we can easily monitor them over time. For this workflow, we use the Google Sheets node in n8n. Sheets is a great choice because it provides an accessible, shareable, and visual way to track indexation changes without requiring any additional infrastructure.

In the Google Sheets node, we configure the workflow to either append a new row or update an existing row depending on whether the URL already exists in the sheet. This makes the system flexible: if a URL is being checked for the first time, it’s added as a new entry. If it’s already there, its isIndexed status simply updates with the latest result.

Typical Configuration:

  • Credentials: Connect your Google account to n8n.
  • Resource: Sheet within your Google document.
  • Operation: “Append or Update Row.”
  • Document: The target Google Sheet (for example, n8nsheet).
  • Sheet: The sheet/tab inside the document (also named n8nsheet).
  • Column to Match On: url (so that rows update based on URL).
  • Mapping Mode: Manual mapping, where you map the originalUrl field to url and the isIndexed field to a column like status.
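
As a rough illustration, assuming your sheet has columns named url and status, the manual mapping would use the fields produced by the parsing node in Step 8 (remember that its searchUrl field carries the original page URL):

// Google Sheets node - manual column mapping (illustrative)
url:    {{$json.searchUrl}}   // the page URL coming out of the parsing step
status: {{$json.isIndexed}}   // true or false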
Over time, the sheet ends up looking something like this:

URL | isIndexed | Last Checked
https://www.vesbhusha.com/types-of-uniforms-in-the-hotel-industry/ | true | 2025-09-11
https://www.vesbhusha.com/common-mistakes-to-avoid-when-choosing-corporate-uniforms/ | false | 2025-09-11

The beauty of this approach is that it provides long-term visibility into indexation trends. Over time, you can see which URLs frequently drop out of Google’s index, which remain stable, and how changes to your site may affect crawling and indexing. This data can also be exported, filtered, or visualized in Google Data Studio for deeper reporting.

By storing results in Google Sheets, we make the workflow not just automated, but also actionable. The information is always there, organized, and ready for SEO decision-making.

Step 10: Final Node (No Operation)

After processing each URL, applying delays, running Google searches, parsing the results, and saving everything into Google Sheets, the workflow needs a clear stopping point. This is where the No Operation (Do Nothing) node comes in.

At first glance, this step may seem unnecessary because it doesn’t actively perform any task. However, in practice, it serves an important role in keeping the workflow clean and organized. The No Operation node acts as a placeholder that signals the official end of the process. This makes the entire sequence easier to read and maintain, especially in larger workflows where many branches of logic may converge.

Why use a No Operation node?

  1. Clarity: It visually marks the end of your indexing workflow, making it clear to you or any team member where the process concludes.
  2. Flexibility: You can later replace or extend this node with new functionality—for example, sending a Slack message, an email notification, or a webhook to another system summarizing the results.
  3. Error Handling: In some workflows, ending abruptly on a Google Sheets update could leave things unclear if you later debug issues. Having a dedicated endpoint node keeps the workflow structured.

Conclusion

Managing SEO for large websites can be overwhelming, especially when it comes to monitoring whether hundreds or even thousands of URLs remain indexed in Google. Manually checking these with “site:” searches is time-consuming, inconsistent, and simply not practical at scale. That’s why automation is no longer a luxury—it’s a necessity.

By leveraging the power of n8n, we built a fully automated workflow that performs all the heavy lifting for you. From triggering the workflow on a schedule, splitting URLs into manageable queries, introducing random delays and User-Agent rotation, fetching results with HTTP requests, parsing responses for indexing status, and finally logging everything neatly into Google Sheets—this workflow covers the entire lifecycle of indexation monitoring.

The beauty of this setup is its simplicity and scalability. Once configured, it runs in the background, updating your Sheets automatically with fresh data every few hours. You’ll always have a historical record of indexation trends that can help diagnose problems early, spot sudden drops, or confirm improvements after implementing SEO changes.

More importantly, the workflow is flexible. The No Operation node at the end is a deliberate design choice—it allows you to extend the system whenever you’re ready. For example, you could add Slack alerts for URLs that remain unindexed after several runs, email reports for clients, or even connect it to a database or BI dashboard.

This isn’t just about saving time (though it will save hours each week). It’s about creating a reliable, repeatable, and scalable process for one of SEO’s most fundamental needs: making sure your content is actually findable in Google.

If you’re an agency, this workflow can be a game-changer for client reporting. If you’re an in-house SEO, it can give you peace of mind knowing nothing slips through the cracks.