15 Crawlability Problems & How to Fix Them


Wondering why some of your pages don’t show up in Google’s search results?

Crawlability problems could be the culprits.

In this guide, we’ll cover what crawlability problems are, how they affect SEO, and how to fix them.

Let’s get started.

What Are Crawlability Problems?

Crawlability problems are issues that prevent search engines from accessing your website’s pages.

Search engines like Google use automated bots to read and analyze your pages. This is called crawling.

But these bots may encounter obstacles that hinder their ability to properly access your pages if there are crawlability problems.

Common crawlability problems include:

  • Nofollow links (which tell Google not to follow the link or pass ranking strength to that page)
  • Redirect loops (when two pages redirect to each other to create an infinite loop)
  • Bad site structure
  • Slow site speed

How Do Crawlability Issues Affect SEO?

Crawlability problems can drastically affect your SEO game.

That’s because crawlability problems make some (or all) of your pages practically invisible to search engines.

They can’t find them. Which means they can’t index them, i.e., save them in a database to display in relevant search results.

infographic explaining "How search engines work"

This means a potential loss of search engine (organic) traffic and conversions.

Your pages must be both crawlable and indexable to rank in search engines.

15 Crawlability Problems & How to Fix Them

1. Pages Blocked In Robots.txt

Search engines first look at your robots.txt file. This tells them which pages they should and shouldn’t crawl.

If your robots.txt file looks like this, it means your entire website is blocked from crawling:

User-agent: *
Disallow: /

Fixing this problem is simple. Just replace the “disallow” directive with “allow.” Which should enable search engines to access your entire website.

Like this:

User-agent: *
Allow: /

In other cases, only certain pages or sections are blocked. For instance:

User-agent: *
Disallow: /products/

Here, all the pages in the “products” subfolder are blocked from crawling.

Solve this problem by removing the subfolder or page specified. Search engines ignore an empty “disallow” directive.

User-agent: *
Disallow:

Or you could use the “allow” directive instead of “disallow” to instruct search engines to crawl your entire website like we did earlier.
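If you want to double-check a robots.txt rule before publishing it, here is a minimal Python sketch using the standard library’s urllib.robotparser. The helper name is_crawlable and the example.com URLs are assumptions made for illustration, and Python’s parser may not match every rule exactly the way Google’s crawler does.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two rule sets discussed above (URLs are hypothetical examples).
blocked_section = "User-agent: *\nDisallow: /products/"
empty_disallow = "User-agent: *\nDisallow:"

print(is_crawlable(blocked_section, "https://www.example.com/products/shoes"))
print(is_crawlable(blocked_section, "https://www.example.com/blog/"))
print(is_crawlable(empty_disallow, "https://www.example.com/products/shoes"))
```

The first check should report the “products” page as blocked, while the other two should report the URLs as crawlable.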

2. Nofollow Links

The nofollow tag tells search engines not to crawl the links on a webpage.

And the tag looks like this:

<meta name="robots" content="nofollow">

If this tag is present on your pages, the other pages that they link to may not get crawled. Which creates crawlability problems on your site.
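For a quick spot check of a single page, a sketch like the one below can flag both the page-level meta tag and individual links marked with rel="nofollow" (the link-level form of the same instruction). It uses Python’s built-in html.parser. The class name NofollowFinder, the helper find_nofollow, and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collect page-level robots directives and links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.meta_directives = set()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            content = attrs.get("content") or ""
            self.meta_directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "a" and "nofollow" in (attrs.get("rel") or "").lower().split():
            self.nofollow_links.append(attrs.get("href"))


def find_nofollow(html: str):
    """Return (page_has_nofollow_meta, list_of_nofollow_link_hrefs)."""
    finder = NofollowFinder()
    finder.feed(html)
    return "nofollow" in finder.meta_directives, finder.nofollow_links


# Hypothetical sample page used only for demonstration.
sample = (
    '<html><head><meta name="robots" content="nofollow"></head>'
    '<body><a href="/a" rel="nofollow">A</a><a href="/b">B</a></body></html>'
)
print(find_nofollow(sample))
```

This only inspects HTML you pass in. A full audit still needs a crawler to fetch every page on the site.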

Check for nofollow links like this with Semrush’s Site Audit tool.

Open the tool, enter your website, and click “Start Audit.”

Site Audit tool with "Start audit" button highlighted

The “Site Audit Settings” window will appear.

From here, configure the basic settings and click “Start Site Audit.”

“Site Audit Settings” window

Once the audit is complete, navigate to the “Issues” tab and search for “nofollow.”

“Issues” tab with “nofollow” search

If nofollow links are detected, click “# outgoing internal links contain nofollow attribute” to view a list of pages that have a nofollow tag.

page with “902 outgoing internal links contain nofollow attribute”

Review the pages and remove the nofollow tags if they shouldn’t be there.

3. Bad Site Architecture

Site architecture is how your pages are organized across your website.

A good site architecture ensures every page is just a few clicks away from the homepage, and that there are no orphan pages (i.e., pages with no internal links pointing to them). This helps search engines easily access all pages.

Site architecture infographic

But a bad site architecture can create crawlability issues.

Notice the example site structure depicted below. It has orphan pages.

"Orphan pages" infographic

Because there’s no linked path to them from the homepage, they may go unnoticed when search engines crawl the site.

The solution is straightforward: Create a site structure that logically organizes your pages in a hierarchy through internal links.

Like this:

"SEO-friendly site architecture" infographic

In the example above, the homepage links to category pages, which then link to individual pages on your site.

And this provides a clear path for crawlers to find all your important pages.
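To make “a few clicks away from the homepage” measurable, here is a rough Python sketch that runs a breadth-first search over an internal-link map. It reports each page’s click depth and lists pages with no linked path from the homepage. The link map, the page paths, and the function names are all hypothetical. In practice you would build the map from a crawl of your own site.

```python
from collections import deque


def click_depths(links: dict, homepage: str) -> dict:
    """Breadth-first search: fewest clicks from the homepage to each reachable page."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def unreachable_pages(links: dict, homepage: str, all_pages: set) -> set:
    """Pages that have no linked path from the homepage."""
    return all_pages - set(click_depths(links, homepage))


# Hypothetical site: page -> internal links found on that page.
site_links = {
    "/": ["/shoes/", "/bags/"],
    "/shoes/": ["/shoes/black-boots"],
    "/bags/": [],
    "/old-landing-page": ["/"],
}
known_pages = set(site_links) | {"/shoes/black-boots"}

print(click_depths(site_links, "/"))
print(unreachable_pages(site_links, "/", known_pages))
```

In this sample, “/old-landing-page” links out to the homepage but nothing links to it, so it is the one page a crawler starting at the homepage would miss.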

4. Lack of Internal Links

Pages without internal links can create crawlability problems.

Search engines will have trouble discovering those pages.

So, identify your orphan pages. And add internal links to them to avoid crawlability issues.

Find orphan pages using Semrush’s Site Audit tool.

Configure the tool to run your first audit.

Then, go to the “Issues” tab and search for “orphan.”

You’ll see whether there are any orphan pages present on your site.

“Issues” tab with “orphan” search

To solve this problem, add internal links to orphan pages from other relevant pages on your site.

5. Bad Sitemap Management

A sitemap provides a list of pages on your site that you want search engines to crawl, index, and rank.

If your sitemap excludes any pages you want to be found, they might go unnoticed. And create crawlability issues. A tool such as XML Sitemaps Generator can help you include all pages meant to be crawled.

Enter your website URL, and the tool will generate a sitemap for you automatically.

XML Sitemaps Generator search bar

Then, save the file as “sitemap.xml” and upload it to the root directory of your website.

For example, if your website is www.example.com, then your sitemap should be accessible at www.example.com/sitemap.xml.
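If you’d rather build the file yourself than use a generator, the following is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. It writes only the required urlset, url, and loc elements of the sitemap format. The build_sitemap helper and the https://www.example.com URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a minimal XML sitemap listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages you want search engines to crawl.
pages = [
    "https://www.example.com/",
    "https://www.example.com/shoes/",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Saving the returned string as “sitemap.xml” gives you a file you can upload as described above.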

Finally, submit your sitemap to Google in your Google Search Console account.

To do that, access your account.

Click “Sitemaps” in the left-hand menu. Then, enter your sitemap URL and click “Submit.”

"Add a new sitemap" in Google Search Console

6. ‘Noindex’ Tags

A “noindex” meta robots tag instructs search engines not to index a page.

And the tag looks like this:

<meta name="robots" content="noindex">

Although the noindex tag is intended to control indexing, it can create crawlability issues if you leave it on your pages for a long time.

Google treats long-term “noindex” tags as nofollow tags, as confirmed by Google’s John Mueller.

Over time, Google will stop crawling the links on those pages altogether.

So, if your pages aren’t getting crawled, long-term noindex tags could be the culprit.

Identify these pages using Semrush’s Site Audit tool.

Set up a project in the tool to run your first crawl.

Once it’s complete, head over to the “Issues” tab and search for “noindex.”

The tool will list pages on your site with a “noindex” tag.

“Issues” tab with “noindex” search

Review these pages and remove the “noindex” tag where appropriate.

7. Slow Site Speed

When search engine bots visit your site, they have limited time and resources to devote to crawling, commonly referred to as a crawl budget.

Slow site speed means it takes longer for pages to load. And reduces the number of pages bots can crawl within that crawl session.

Which means important pages could be excluded.

Work to solve this problem by improving your overall website performance and speed.

Start with our guide to page speed optimization.

8. Internal Broken Links

Internal broken links are hyperlinks that point to dead pages on your site.

They return a 404 error like this:

example of “404 error” page

Broken links can have a significant impact on website crawlability. Because they prevent search engine bots from accessing the linked pages.

To find broken links on your site, use the Site Audit tool.

Navigate to the “Issues” tab and search for “broken.”

“Issues” tab with “broken” search

Next, click “# internal links are broken.” And you’ll see a report listing all your broken links.

report listing “4 internal links are broken”

To fix these broken links, substitute a different link, restore the missing page, or add a 301 redirect to another relevant page on your site.
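For a small list of links, you could also script a basic check yourself. The sketch below uses Python’s urllib.request to read each link’s HTTP status and reports anything that returns a 4xx or 5xx code. The function names and URLs are assumptions made for illustration. The demo at the bottom uses canned status codes instead of real network requests, and connection failures (such as DNS errors) are deliberately left unhandled here.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is what we want.
        return error.code


def find_broken_links(links, get_status=fetch_status):
    """Return {url: status} for links whose status is 4xx or 5xx."""
    broken = {}
    for url in links:
        status = get_status(url)
        if status >= 400:
            broken[url] = status
    return broken


# Demo with canned statuses (hypothetical URLs, no network access needed).
canned = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page": 404,
}
print(find_broken_links(canned, get_status=canned.get))
```

Because urlopen follows redirects, a link that redirects to a working page is reported with the final page’s status rather than the 301 itself.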

9. Server-Side Errors

Server-side errors (like 500 HTTP status codes) disrupt the crawling process because they mean the server couldn’t fulfill the request. Which makes it difficult for bots to crawl your website’s content.

Semrush’s Site Audit tool can help you find and resolve server-side errors.

Search for “5xx” in the “Issues” tab.

“Issues” tab with “5xx” in the search bar

If errors are present, click “# pages returned a 5XX status code” to view a complete list of affected pages.

Then, send this list to your developer to configure the server properly.

10. Redirect Loops

A redirect loop is when one page redirects to another, which then redirects back to the original page. And forms a continuous loop.

"What is a redirect loop" infographic

Redirect loops prevent search engine bots from reaching a final destination by trapping them in an endless cycle of redirects between two (or more) pages. Which wastes crucial crawl budget that could be spent on important pages.
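If you keep your redirects in a simple source-to-target map, a loop is easy to detect programmatically. Here is a minimal Python sketch that follows a redirect chain and returns the URLs that form a cycle. The redirect map, the paths, and the function name find_redirect_loop are hypothetical.

```python
def find_redirect_loop(redirects: dict, start: str):
    """Follow redirects from start; return the looping URLs, or None if the chain ends."""
    path = []
    seen = set()
    current = start
    while current in redirects:
        if current in seen:
            # We came back to a URL already visited: everything from there on loops.
            return path[path.index(current):]
        seen.add(current)
        path.append(current)
        current = redirects[current]
    return None


# Hypothetical redirect rules: source URL -> target URL.
redirect_map = {
    "/page-a": "/page-b",
    "/page-b": "/page-a",
    "/old": "/new",
}
print(find_redirect_loop(redirect_map, "/page-a"))
print(find_redirect_loop(redirect_map, "/old"))
```

The first call finds the two-page loop, while the second returns None because “/old” resolves to a final destination.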

Solve this by identifying and fixing redirect loops on your site with the Site Audit tool.

Search for “redirect” in the “Issues” tab.

“Issues” tab with “redirect” search

The tool will display redirect loops. And offer advice on how to address them when you click “Why and how to fix it.”

results show redirect loops with advice on how to fix them

11. Access Restrictions

Pages with access restrictions (like those behind login forms or paywalls) can prevent search engine bots from crawling them.

As a result, those pages may not appear in search results, limiting their visibility to users.

It makes sense to have certain pages restricted.

For example, membership-based websites or subscription platforms often have restricted pages that are accessible only to paying members or registered users.

This allows the site to provide exclusive content, special offers, or personalized experiences. To create a sense of value and incentivize users to subscribe or become members.

But if significant portions of your website are restricted, that’s a crawlability mistake.

So, assess the need for restricted access for each page and keep restrictions only on pages that truly require them. Remove restrictions on those that don’t.

12. URL Parameters

URL parameters (also known as query strings) are parts of a URL that help with tracking and organization and follow a question mark (?). Like example.com/shoes?color=blue

And they can significantly impact your site’s crawlability.

That’s because URL parameters can create an almost infinite number of URL variations.

You’ve probably seen that on ecommerce category pages. When you apply filters (size, color, brand, etc.), the URL often changes to reflect those selections.

And if your website has a large catalog, suddenly you have thousands or even millions of URLs across your site.

If they aren’t managed well, Google will waste the crawl budget on the parameterized URLs. Which may result in some of your other important pages not being crawled.

So, you need to decide which URL parameters are helpful for search and should be crawled. Which you can do by understanding whether people are searching for the specific content the page generates when a parameter is applied.

For example, people often like to search by the color they’re looking for when shopping online.

Such as “black shoes.”

Keyword Overview tool's dashboard showing metrics for "black shoes"

This means the “color” parameter is helpful. And a URL like example.com/shoes?color=black should be crawled.

But some parameters aren’t helpful for search and shouldn’t be crawled.

For example, the “rating” parameter that filters the products by their customer ratings. Such as example.com/shoes?rating=5.

Almost nobody searches for shoes by the customer rating.

Keyword Overview tool's dashboard for "5 star rated shoes" shows no results

That means you should prevent URLs that aren’t helpful for search from being crawled. Either by using a robots.txt file or using the nofollow tag for internal links to those parameterized URLs.

Doing so will ensure your crawl budget is being spent efficiently. And on the right pages.
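Once you’ve decided which parameters are helpful, a short script can sort a list of URLs accordingly. The sketch below uses Python’s urllib.parse and treats “color” as the only helpful parameter, mirroring the example above. The HELPFUL_PARAMS set and both function names are assumptions, so adjust them to match your own keyword research.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for this sketch: only "color" is worth crawling.
HELPFUL_PARAMS = {"color"}


def should_crawl(url: str, helpful=HELPFUL_PARAMS) -> bool:
    """True when every query parameter in url is on the helpful list."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return all(name in helpful for name, _ in params)


def strip_unhelpful(url: str, helpful=HELPFUL_PARAMS) -> str:
    """Return url with every parameter that is not on the helpful list removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in helpful
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(should_crawl("https://example.com/shoes?color=black"))
print(should_crawl("https://example.com/shoes?rating=5"))
print(strip_unhelpful("https://example.com/shoes?color=black&rating=5"))
```

URLs that fail should_crawl are the candidates for a robots.txt rule or nofollowed internal links.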

13. JavaScript Resources Blocked in Robots.txt

Many modern websites are built using JavaScript (a popular programming language). And that code is contained in .js files.

But blocking access to these .js files via robots.txt can inadvertently create crawlability issues. Especially if you block essential JavaScript files.

For example, if you block a JavaScript file that loads the main content of a page, the crawlers may not be able to see that content.

So, review your robots.txt file to ensure that you’re not blocking anything important.

Or use Semrush’s Site Audit tool.

Go to the “Issues” tab and search for “blocked.”

If issues are detected, click on the blue links.

Issues with blocked internal and external resources in robots.txt found in Site Audit tool

And you’ll see the specific resources that are blocked.

A list of blocked resources in Site Audit tool

At this point, it’s best to get help from your developer.

They can tell you which JavaScript files are critical for your website’s functionality and content visibility. And shouldn’t be blocked.

14. Duplicate Content

Duplicate content refers to identical or nearly identical content that appears on multiple pages across your website.

For example, imagine you publish a blog post on your site. And that post is accessible via multiple URLs:

  • example.com/blog/your-post
  • example.com/news/your-post
  • example.com/articles/your-post

Even though the content is the same, the URLs are different. And search engines will aim to crawl them all.
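One simple way to spot exact duplicates like these is to fingerprint each page’s text and group URLs that share a fingerprint. The Python sketch below does that with a SHA-256 hash after normalizing case and whitespace. The function names and sample pages are hypothetical, and this approach only catches identical text, not nearly identical pages.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict) -> list:
    """Return lists of URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical page texts keyed by URL.
page_texts = {
    "example.com/blog/your-post": "How to lace your shoes",
    "example.com/news/your-post": "How to  lace your shoes\n",
    "example.com/articles/your-post": "how to lace your shoes",
    "example.com/about": "About our store",
}
print(group_duplicates(page_texts))
```

Here the three “your-post” URLs end up in one group, while the “about” page is left out because its text is unique.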

This wastes crawl budget that could be better spent on other important pages on your site. Use Semrush’s Site Audit to identify and eliminate these problems.

Go to the “Issues” tab and search for “duplicate content.” And you’ll see whether there are any errors detected.

4 pages with duplicate content issues found in Site Audit

Click the “# pages have duplicate content issues” link to see a list of all the affected pages.

A list of pages that have duplicate content issues

If the duplicates are mistakes, redirect those pages to the main URL that you want to keep.

If the duplicates are necessary (like if you’ve intentionally placed the same content in multiple sections to address different audiences), you can implement canonical tags. Which help search engines identify the main page you want to be indexed.

15. Poor Mobile Experience

Google uses mobile-first indexing. This means they look at the mobile version of your site over the desktop version when crawling and indexing your site.

If your site takes a long time to load on mobile devices, it can affect your crawlability. And Google may need to allocate more time and resources to crawl your entire site.

Plus, if your site isn’t responsive (meaning it doesn’t adapt to different screen sizes or work as intended on mobile devices), Google may find it harder to understand your content and access other pages.

So, review your site to see how it works on mobile. And find slow-loading pages on your site with Semrush’s Site Audit tool.

Navigate to the “Issues” tab and search for “speed.”

The tool will show the error if you have affected pages. And offer advice on how to improve their speed.

An example of why and how to fix a slow page load speed issue

Stay Ahead of Crawlability Issues

Crawlability problems aren’t a one-time thing. Even if you solve them now, they might recur in the future. Especially if you have a large website that undergoes frequent changes.

That’s why regularly monitoring your site’s crawlability is so important.

With our Site Audit tool, you can perform automated checks on your site’s crawlability.

Just navigate to the audit settings for your site and turn on weekly audits.

Schedule weekly audits under "Site Audit Settings" window

Now, you don’t have to worry about missing any crawlability issues.

