Alert Properties | Alert Description |
---|---|
Alert Name | Indexing allowed by robots.txt but restricted by noindex attribute |
Code | P17 |
Description | Found web pages that robots.txt allows to be indexed but that are restricted by a noindex attribute in the meta tag of the pages |
Level | Warning |
If you want search engines such as Google to keep a page out of their search results, use the noindex meta tag; their bots still crawl the page, but they do not index it. If you want to keep search engine bots off the page entirely, so that they do not crawl it at all, use a Disallow rule in the robots.txt file.
robots.txt
User-agent: *
Disallow: /products/
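To see which URLs a rule like this affects, the two lines above can be fed to Python's standard-library `urllib.robotparser`. This is a minimal sketch: the `example.com` URLs and the helper name `is_crawl_allowed` are assumptions made for illustration, and Python's matching is simpler than Google's (no wildcard support), so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as the robots.txt sample above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def is_crawl_allowed(url, agent="*"):
    """Return True if the parsed rules let `agent` crawl `url`."""
    return rules.can_fetch(agent, url)


# Hypothetical URLs, used only to exercise the rule.
print(is_crawl_allowed("https://example.com/products/shoes"))  # False
print(is_crawl_allowed("https://example.com/about"))           # True
```

Only paths that start with `/products/` are blocked from crawling; every other path stays crawlable.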
Ideally, you should not block search engines from indexing your pages, because those pages and their associated links will drop out of search results. Before adding the noindex tag to a page, be clear about your purpose in blocking it.
Learn about noindex here: Block search indexing with ‘noindex’
If you use a robots.txt file on your website, you can tell Google not to crawl a page. However, if Google finds a link to your page on another site, with descriptive text, the link could still generate search results. If you have also included a “noindex” tag on the page, Google won’t see it, because the robots.txt file blocks Google from fetching the page in the first place. Therefore, you should let Google crawl the page and see the “noindex” tag or header. It sounds counterintuitive, but you need to let Google try to fetch the page, and either fail (for example, because of password protection) or see the “noindex” tag, to ensure the page is omitted from search engine results.
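The interaction described above can be sketched as a small offline check that combines the two signals: whether robots.txt allows crawling, and whether the page carries a noindex directive in a robots meta tag or an `X-Robots-Tag` header. The names `RobotsMetaParser`, `has_noindex`, and `classify` are hypothetical, the header check is a simple substring match, and the robots.txt matching comes from Python's standard library rather than from Google, so this is an approximation and not how any particular crawler or audit tool works.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html, headers=None):
    """True if the page HTML or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for key, value in (headers or {}).items():
        if key.lower() == "x-robots-tag":
            header_value = value.lower()
    return "noindex" in parser.directives or "noindex" in header_value


def classify(robots_txt, url, html, headers=None, agent="*"):
    """Describe how robots.txt and noindex combine for one page."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(agent, url)
    noindex = has_noindex(html, headers)
    if crawlable and noindex:
        # The situation this alert reports.
        return "allowed by robots.txt, restricted by noindex"
    if noindex:
        # The crawler never fetches the page, so the tag goes unseen.
        return "blocked by robots.txt, noindex cannot be seen"
    if crawlable:
        return "allowed and indexable"
    return "blocked by robots.txt"


# Hypothetical inputs for illustration.
ROBOTS = "User-agent: *\nDisallow: /products/\n"
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print(classify(ROBOTS, "https://example.com/about", PAGE))
print(classify(ROBOTS, "https://example.com/products/a", PAGE))
```

The first call reports the combination this alert describes; the second shows the case where a noindex tag is present but can never be read because crawling is disallowed.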
This does not apply to images; for images, robots.txt is the correct way to block them from search results.