The ultimate guide to bot herding and spider wrangling — Part Two

Next up in a series on bots and why crawl budgets are important, Columnist Stephan Spencer explains how to direct search engine bots to what's important on your site and how to avoid common coding issues.

Stephan Spencer on May 4, 2018 at 11:01 am

In Part One of our three-part series, we learned what bots are and why crawl budgets are important. Let’s take a look at how to let the search engines know what’s important and some common coding issues.

How to let search engines know what’s important

When a bot crawls your site, there are a number of cues that direct it through your files.

Like humans, bots follow links to get a sense of the information on your site. But they’re also looking through your code and directories for specific files, tags and elements. Let’s take a look at a number of these elements.

Robots.txt

The first thing a bot will look for on your site is your robots.txt file.

For complex sites, a robots.txt file is essential. For smaller sites with just a handful of pages, a robots.txt file may not be necessary — without it, search engine bots will simply crawl everything on your site.
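
If you do keep an explicit robots.txt file on a small site, a minimal, fully permissive version might look like this (an empty Disallow value blocks nothing):

User-agent: *
Disallow: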

There are two main ways you can guide bots using your robots.txt file.

1. First, you can use the “disallow” directive. This will instruct bots to ignore specific uniform resource locators (URLs), files, file extensions, or even whole sections of your site:

User-agent: Googlebot
Disallow: /example/
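
The same directive can also target individual files or whole file types. For instance, a hypothetical block list might look like this (the paths are placeholders, and the * and $ wildcard patterns are extensions honored by major crawlers such as Googlebot and Bingbot rather than part of the original robots.txt standard):

User-agent: Googlebot
# Block one specific file
Disallow: /old-press-release.html
# Block every URL ending in .pdf
Disallow: /*.pdf$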

Although the disallow directive will stop bots from crawling particular parts of your site (thereby saving crawl budget), it will not necessarily stop those pages from being indexed and showing up in search results.

The cryptic and unhelpful “no information is available for this page” message is not something that you’ll want to see in your search listings.

The above example came about because of this disallow directive in census.gov/robots.txt:

User-agent: Googlebot
Crawl-delay: 3

Disallow: /cgi-bin/
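
If you want to sanity-check how rules like these will be interpreted, Python's standard-library urllib.robotparser offers a quick (if simplified) approximation; the census.gov paths below are purely illustrative:

import urllib.robotparser

# The Googlebot group from census.gov/robots.txt quoted above
rules = [
    "User-agent: Googlebot",
    "Crawl-delay: 3",
    "Disallow: /cgi-bin/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# Anything under /cgi-bin/ is off-limits to Googlebot...
print(parser.can_fetch("Googlebot", "https://www.census.gov/cgi-bin/example"))     # False
# ...while other paths remain crawlable.
print(parser.can_fetch("Googlebot", "https://www.census.gov/topics/population/"))  # True
# The declared crawl delay is also exposed (Python 3.6+).
print(parser.crawl_delay("Googlebot"))                                             # 3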

2. Another way is to use the noindex directive. Noindexing a certain page or file will not stop it from being crawled; however, it will stop it from being indexed (or remove it from the index). This robots.txt directive is unofficially supported by Google and is not supported at all by Bing (so be sure to have a User-agent: * set of disallows for Bingbot and any bots other than Googlebot):

User-agent: Googlebot
Noindex: /example/
User-agent: *
Disallow: /example/

Obviously, since these pages are stil…

[Read the full article on Search Engine Land.]


Opinions expressed in this article are those of the guest author and not necessarily Marketing Land.



About The Author

Stephan Spencer
Stephan Spencer is the creator of the 3-day immersive SEO seminar Traffic Control; an author of the O’Reilly books The Art of SEO, Google Power Search, and Social eCommerce; founder of the SEO agency Netconcepts (acquired in 2010); inventor of the SEO proxy technology GravityStream; and the host of two podcast shows The Optimized Geek and Marketing Speak.

Related Topics

Channel: SEO | Google: Marketing | Google: SEO | Search Marketing | Search Marketing Column | SEO -- Search Engine Optimization
