P17 - Indexing Allowed By Robots.txt But Restricted by Noindex Attribute

Alert Properties	Alert Description
Alert Name	Indexing allowed by robots.txt but restricted by noindex attribute
Code	P17
Description	Found web pages that robots.txt allows to be indexed but that are restricted by a noindex attribute in the meta tag of the pages
Level	Warning

If you want search engines such as Google to keep a page out of their search results, use the noindex meta tag. The search engine's bots still crawl the page, but the page is not shown in results. If you want to stop search engines from crawling the page altogether, use a Disallow rule in the robots.txt file instead. Keep in mind that Disallow blocks crawling, not indexing, so a disallowed URL can still appear in results if other sites link to it.

robots.txt

User-agent: *
Disallow: /products/
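
To make the alert condition concrete, here is a minimal Python sketch of how a P17-style check could work: the URL is allowed by robots.txt, yet the page carries a noindex directive in its robots meta tag. This is not the tool's actual implementation. The function names (allowed_by_robots, has_noindex, is_p17) and the example.com URLs are assumptions made for illustration, and only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. a bare attribute), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """True if the given robots.txt text lets user_agent fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def has_noindex(html):
    """True if the page's robots meta tag contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def is_p17(robots_txt, url, html):
    """Hypothetical P17 check: allowed by robots.txt but marked noindex."""
    return allowed_by_robots(robots_txt, url) and has_noindex(html)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /products/\n"
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(is_p17(robots, "https://example.com/about", page))
    print(is_p17(robots, "https://example.com/products/shoes", page))
```

With the robots.txt rules shown above, a noindex page under /about would be flagged, while the same page under /products/ would not, because robots.txt already blocks it from being crawled.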

How to resolve the problem?

Ideally, you should not block search engines from indexing your pages, because they will drop those pages and their associated links from search results. Before adding the noindex tag, be clear about why you want to block the page.

Learn about noindex here: Block search indexing with ‘noindex’

Why can’t I use robots.txt to block my file from Google?

If you use a robots.txt file on your website, you can tell Google not to crawl a page. However, if Google finds a link to your page on another site, with descriptive text, it could still generate a search result from that link. If you have included a “noindex” tag on the page, Google won’t see it, because the robots.txt file prevents Google from fetching the page in the first place. Therefore, you should let Google crawl the page and see the “noindex” tag or header. It sounds counterintuitive, but you need to let Google try to fetch the page and either fail (because of password protection) or see the “noindex” tag to ensure the page is omitted from search results.
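
The interaction described above can be sketched in Python for the header form of noindex. This is a simplified illustration under stated assumptions: the function names and the three outcome labels are hypothetical, the X-Robots-Tag value is treated as a plain comma-separated list (user-agent-prefixed values are not handled), and the headers are assumed to come from your own request to the page.

```python
from urllib.robotparser import RobotFileParser


def header_has_noindex(headers):
    """True if an X-Robots-Tag response header carries a noindex directive."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [part.strip().lower() for part in value.split(",")]
            if "noindex" in directives:
                return True
    return False


def noindex_outcome(robots_txt, url, headers, user_agent="Googlebot"):
    """Classify whether a noindex header can actually take effect.

    Returns one of three hypothetical labels:
      "noindex-hidden"    - noindex is set, but robots.txt blocks the crawl,
                            so the crawler never gets to see the directive
      "noindex-effective" - noindex is set and the page is crawlable
      "no-noindex"        - the response has no noindex header
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)

    if not header_has_noindex(headers):
        return "no-noindex"
    return "noindex-effective" if crawlable else "noindex-hidden"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(noindex_outcome(robots, "https://example.com/private/a",
                          {"X-Robots-Tag": "noindex"}))
    print(noindex_outcome(robots, "https://example.com/public/a",
                          {"X-Robots-Tag": "noindex, nofollow"}))
```

A "noindex-hidden" result is the case to fix: remove the Disallow rule for that URL so the crawler can fetch the page and read the directive.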

This does not apply to images; for images, robots.txt is the correct way to block them from search results.