Estimated reading time: 5 minutes
It sounds like something out of a sci-fi movie, but the robots.txt file is a valuable component of every e-commerce website.
Briefly, this is a text-only file created by e-store owners that helps search bots to crawl the content on their pages. In essence, the items within the file dictate the areas of a website that bots are allowed (or not allowed) to crawl.
This guide covers everything you need to know about robots.txt files and how they can help optimize your website for search engines.
An Overview of Robots.txt Files
The role of web robots, particularly search engine robots, is to discover the type of content contained on a website or web page so that they are able to serve up that information in relevant searches. The more they know about your website, the better they can match your content to user queries.
The robots.txt file is designed to provide these bots with directives on how to crawl the content on your website. Also known as the Robots Exclusion Protocol, this file is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.
Though designed for search bots, the file can be accessed by human visitors, too. However, unless they’re looking for it, your shoppers will never see it, nor will it change their browsing experience.
Within the file, robots can see whether or not they may crawl a given web page, and store owners can even set different crawling directives for different user agents (e.g. Googlebot, msnbot). Link-level instructions such as nofollow, by contrast, live in meta tags on the pages themselves.
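For example, a minimal file with different rules for different crawlers might look like this (the bot names and paths are purely illustrative):

```
User-agent: Googlebot
Disallow: /checkout/

User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
```

Here Googlebot is blocked only from the checkout pages, msnbot is blocked from the whole site, and every other bot may crawl everything (an empty Disallow blocks nothing).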
Robots.txt Use Cases
Closely related to robots.txt files are Robots Meta Directives, which are pieces of code placed on individual pages that tell robots how to crawl and index the content. These tags are especially valuable for E-commerce stores that may have duplicate content on their web pages, though this is less of a concern today than it was several years ago.
There is a downside to this method, however. If you’re blocking URLs to avoid crawling duplicate content, you may be blocking pages that others are backlinking to, which can thwart any SEO benefits of those backlinks.
To skirt this issue, you can assign a Meta Robots tag of “noindex,follow”, which looks like this:
<meta name="robots" content="noindex,follow" />
It signals to the robot not to index the page, but to still follow the links within the page to add to your SEO value.
Another (and better) option is to include a canonical link tag, which looks like this:
<link rel="canonical" href="http://(main content page)" />
This tag is treated similarly to 301 redirects but has a different impact on the user experience. Using the canonical link tag, users continue their browsing experience along the same path. For example, if someone is looking for golf shoes in a sporting goods category, they’d remain in the sporting goods category rather than being directed to the shoes category.
On the flip side, a 301 redirect may take the same shopper to the shoes category, even though the shopper started searching in the sporting goods category. Even though it’s essentially the same product, the shopper ends up in two different areas of the website while browsing.
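To make the golf-shoes example concrete (the URLs below are hypothetical), both category versions of the product page would declare the same canonical URL in their head section:

```
<!-- On /sporting-goods/golf-shoes AND on /shoes/golf-shoes -->
<link rel="canonical" href="https://example.com/shoes/golf-shoes" />
```

Search engines consolidate ranking signals onto the canonical URL, while shoppers keep browsing whichever category path they arrived through.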
How to Create a Robots.txt File for Your E-Commerce Store
Your store’s robots.txt file should go in the top-level (root) directory of your website.
To create it, you’ll need to understand the correct language, directory names, and any specific bots that will need to follow special instructions. There’s less work involved if you want all bots to follow the same protocol.
Let’s look at a few definitions and symbols you’ll need to know:
- User Agent – The specific web crawler you’re giving directions to. For example, you may want to give Googlebot different permissions than msnbot.
- Allow – This directive lets a subfolder or subpage be crawled even if its parent folder or page was disallowed. It began as a Google extension but is now widely supported.
- Disallow – This term restricts bots from crawling a particular URL.
- Crawl Delay – This tells bots how many seconds to wait between requests. Googlebot ignores Crawl Delay, but you can manage crawl rates through Google Search Console.
- Asterisk (*) – This is a wildcard that matches any sequence of characters, which keeps your robots.txt file from getting overly complex. In the User Agent line, the asterisk means ALL bots.
- Dollar Sign ($) – This refers to all URLs that end in a specific sequence. For example, you might want to exclude all URLs that end in .gif, which would look like this: Disallow: /*.gif$
Each rule begins with a User-agent line, then indicates which items to Allow or Disallow. An example may look like this:
User-agent: *
Allow: /
Crawl-delay: 3
Sitemap: https://www.example.com/sitemap.xml
In the above example, the asterisk indicates that the rules apply to all bots. The Allow rule indicates that all pages may be crawled, there’s a Crawl Delay of 3 seconds, and the last line provides the location of the sitemap.
Robots.txt files aren’t always this simple or straightforward. Every website has different requirements and specifications. You can find a complete list of robots.txt syntax here.
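If you want to sanity-check your rules before deploying them, Python’s standard library includes a parser that approximates how a well-behaved crawler interprets a robots.txt file. A minimal sketch, with rules and URLs made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block all bots from /checkout/, allow everything else.
rules = [
    "User-agent: *",
    "Disallow: /checkout/",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts the file contents as a list of lines

# can_fetch(user_agent, url) reports whether that agent may crawl the URL.
print(parser.can_fetch("*", "https://example.com/products/golf-shoes"))  # True
print(parser.can_fetch("*", "https://example.com/checkout/cart"))        # False
```

Note that individual search engines may resolve conflicting rules slightly differently (Google, for instance, prefers the most specific matching rule), so treat this as a quick check rather than a guarantee.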
A few additional reminders when creating the robots.txt file:
- The file MUST be named robots.txt. No capitals, no spaces, no other changes.
- Not all bots will follow the instructions in the file. This is especially true of malware bots and email address scrapers.
- If you have a sitemap associated with your website, include it at the bottom of your robots.txt file for better crawling.
Are Robots.txt Files Mandatory for E-Commerce Stores?
You won’t be fined by the E-commerce police for not having a robots.txt file, but you could be suffocating your own potential by choosing not to have one.
While not mandatory, Glendale Designs always recommends creating a robots.txt file to give you a competitive advantage of sorts over other stores that may not have one. Because the files are designed to provide more information about your content, you gain an edge by having your content displayed in relevant searches.
Creating a robots.txt file isn’t exactly difficult, but it does require specialized knowledge and a unique syntax to properly include or exclude your content.
Our team at Glendale Designs is well versed in creating robots.txt files for our E-commerce clients. Reach out today to schedule a consultation to ensure your store is optimized for search engines to its highest potential.