
Use Log Hero to Find and Fix HTTP Error Codes

Finding and fixing HTTP error codes such as 404s is painstaking and complicated, right? Wrong! Log Hero enables you to find problem URLs by piping server log file data straight into Google Analytics.

Log Hero allows you to see HTTP error codes right alongside your Google Analytics metrics. You can see HTTP error codes at a URL level, which makes finding problem pages easy. Not only that, but by combining this with Log Hero’s other custom dimensions, you can also discover whether search engine crawlers are encountering 404s or redirects that drain your crawl budget (the amount of time Google allocates to crawling your site).

Overview of HTTP Custom Dimension

Overview of the most critical HTTP error codes

When a client communicates with a server, status codes indicate the success (or failure) of the particular request. Server log files store this critical information. If anything goes wrong, the log file keeps a record of it in the form of an HTTP error code.

(If you are just getting started with log files check out our beginners’ guide to log file analysis.)
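To make this concrete, here is a minimal sketch of how a status code can be pulled out of a single log line. It assumes your server writes the widely used "combined" access log format; the sample line, the LOG_PATTERN name and the parse_line helper are illustrative and are not part of Log Hero.

```python
import re

# Assumed layout: Apache/Nginx "combined" access log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])  # e.g. 200, 301, 404
    return entry


# Hypothetical log line, made up for illustration.
sample = (
    '203.0.113.7 - - [13/Sep/2018:10:15:32 +0000] '
    '"GET /images/logo.png HTTP/1.1" 404 512 '
    '"https://example.com/" "Mozilla/5.0"'
)
entry = parse_line(sample)
print(entry["path"], entry["status"])
```

If your server uses a custom log format, the pattern would need to be adjusted to match it.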

Here is a list of some of the most common HTTP status codes.

Each code is grouped by the first digit as follows:

  • 2xx indicates success
  • 3xx denotes redirects
  • 4xx signifies client errors
  • 5xx shows server errors
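The grouping above boils down to looking at the first digit. As a small sketch (the status_class function name is just an illustration):

```python
def status_class(code):
    """Map an HTTP status code to its class based on the first digit."""
    classes = {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 leaves only the first digit of a 3-digit code.
    return classes.get(code // 100, "other")


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```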

HTTP Status Code – 200 OK

The request has succeeded.

HTTP Status Code – 301 Moved Permanently

The requested resource has been assigned a new permanent URI. This status code is the classic redirect.

HTTP Status Code – 307 Temporary Redirect

The requested resource resides temporarily under a different URI.

HTTP Status Code – 403 Forbidden

The server understood the request, but access was denied. Usually, the client does not have the correct permissions.

HTTP Status Code – 404 Not Found

The server has not found anything matching the URL request. This is one of the most common error codes.

404-LogHero-Screen

HTTP Status Code – 500 Internal Server Error

The server encountered an unexpected condition which prevented it from fulfilling the request.

HTTP Status Code – 503 Service Unavailable

Your web server is unable to handle the HTTP request at this time. This can be due to server overload, a crash or a malicious attack.

How to Find and Fix Typical Errors in Log Hero

Now, it’s time to see how you can fix these errors in Log Hero.

Fixing 4xxs with Log Hero

Recall that 4xx errors signify that a resource could not be served. Typically, these are 404s where the resource could not be found or 403s where access was forbidden. In general, you’ll want to keep these errors to an absolute minimum.

For human visitors, it’s a bad user experience if they try to access a page and they cannot. Worse still, if many visitors bounce from your site, it could be sending search engines like Google negative metrics as you are not meeting user intention. Too many of these and other pages on your site could begin to suffer. All this because of a few lousy 404s.

For bots like Googlebot, it makes it difficult to crawl and index your site correctly. Again, to ensure your website is correctly indexed, it’s essential to make it crawler-friendly.

To find your 4xx status codes, you can either use this custom report by adding it to your Log Hero property in Google Analytics or click on the advanced filter section next to the magnifying glass and use the following filter settings:

GA_AdvanceFilter_4xx

Add the custom report to the correct view within your Log Hero property.

Add_CustomReport_Screen

Once you have selected the correct view, click Create. Under CUSTOMIZATION > Custom Reports, you can now view your new custom report. Click on the 4xx Redirect Source report.

CustomReports_Overview

This custom report allows you to see all 4xx errors (mainly 404s, followed by 403s) on a URL basis. Note this report is ordered by the number of hits, i.e. how many times this error occurred, and also includes Previous Page Path.
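If you want to reproduce the idea behind this report outside Google Analytics, a rough sketch could look like the following. It assumes you already have log entries as dictionaries; the keys previous_page, page and status, and the sample data, are hypothetical.

```python
from collections import Counter


def top_client_errors(entries, limit=10):
    """Count 4xx hits per (previous page, page, status), most frequent first."""
    hits = Counter(
        (e["previous_page"], e["page"], e["status"])
        for e in entries
        if 400 <= e["status"] <= 499
    )
    return hits.most_common(limit)


# Made-up entries for illustration only.
entries = [
    {"previous_page": "/", "page": "/img/banner.png", "status": 404},
    {"previous_page": "/", "page": "/img/banner.png", "status": 404},
    {"previous_page": "/blog", "page": "/admin", "status": 403},
    {"previous_page": "/", "page": "/pricing", "status": 200},
]

for (previous_page, page, status), count in top_client_errors(entries):
    print(previous_page, page, status, count)
```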

404_CustomReport_Results

Including Previous Page Path is very useful because it shows you from which page the error occurred. For example, result number 2 shows you the “/” character for Previous Page Path which indicates the homepage. Under Page you can see that someone tried to access a .png image from the homepage, but this image was missing, resulting in the 404. Now you have three main courses of action:

  1. The resource is old or deprecated, and you don’t want people to access it. In this case, you should remove all links to this resource, in this example from the home page.
  2. The resource has been moved somewhere else. In this case, you should consider adding a redirect to the new URL. Check out the redirects section below for more details.
  3. The resource hasn’t moved, and you want people to access it. Then you need to re-upload the resource and check that it can be accessed.

Following these steps can significantly reduce the number of 404s your site serves.

Note that in this example many of the 404s are caused by image files that seem to be missing. CSS, image and JavaScript data is not available in every integration, so please read this resource for an overview of what information you can access.

403s are usually to do with a permission error. The server does not allow a user to access a specific page.

For WordPress users, this can be caused by a plugin error, a faulty .htaccess file or a file permissions error. If you find a 403 error, we recommend getting in touch with WordPress support with a screenshot of the error. For users who have full access to their .htaccess files, check that the correct permissions are set for the resource serving 403s.

How many 404s do you serve to Google’s bots?

This is an important question, but one that can be quite difficult to answer. To determine whether you are serving error codes to bots, and if there are any patterns, typically you would need to check the user agent string to see if it matches that of Googlebot.

This has quite a few drawbacks. Firstly, even if you find a matching user agent string, you don’t know whether it is a genuine Googlebot or a fake. Up to 25% of visits claiming to be Googlebot come from users who fake the user agent string. It is also very time-consuming to do this manually because you need to match each URL and user agent to an error code.
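To give a feel for what the manual check involves, here is a hedged sketch. It follows the approach Google documents for verifying its crawlers: a reverse DNS lookup on the IP address, a check that the host name belongs to googlebot.com or google.com, and a forward lookup to confirm the host resolves back to the same IP. This is not a description of how Log Hero detects bots; the function names are illustrative, and the lookup functions are passed in as parameters so the logic can be tried without network access.

```python
import socket

# Domains Google documents for Googlebot host names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def claims_googlebot(user_agent):
    """True if the user agent string says it is Googlebot (easy to fake)."""
    return "googlebot" in user_agent.lower()


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Verify an IP with a reverse DNS lookup followed by a forward lookup."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # The host name must resolve back to the original IP address.
        return ip in forward_lookup(host)
    except OSError:
        # Covers failed DNS lookups (socket.herror, socket.gaierror).
        return False
```

A request counts as a fake Googlebot when claims_googlebot returns True for its user agent but is_verified_googlebot returns False for its IP address. Doing this for every URL and error code in a log file is exactly the tedious part.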

Log Hero allows you to do this in a couple of clicks using the custom dimensions Is Bot and Bot Name. These dimensions can automatically detect if a user visiting your site was a human or a bot. Not only that, but they can also identify whether the bot was genuine or fake.

To benefit from these custom dimensions, you’ll need to upgrade to a paid plan.

Let’s make a copy of the report you just created. Enter Custom Reports and click Copy from the drop-down menu.

CopyReport

Give the report a name and add the custom dimensions Is Bot and Bot Name.

Create_Copy_Custom_Report

Next, you can add a filter using regex. As you copied the custom report, this is already set up for you. (Here’s a handy resource for regex in Google Analytics for further reading).

Custom_Report_404s_Add_Filter

By default, Google will sort Is Bot so that results with the value true, i.e. visits from a bot, appear first. Click Save and let’s view the results.

Custom_Report_Bot_404s

In the bottom right corner you can see the total number of results: in this case 65. This means there were 65 times when a bot encountered a 4xx error code. By comparing this to the total number of results in the first 4xx error report (around 1800), you can calculate the percentage of errors served to bots: roughly 3.6%. Therefore, the more significant problem is the 4xx codes served to humans. However, if you are serving a large percentage of errors to crawlers, then this can hurt your SEO and needs to be managed.
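The calculation itself is a simple ratio. As a quick sketch using the figures from this example (the bot_share function name is illustrative, and 1800 is the approximate total quoted above):

```python
def bot_share(bot_hits, total_hits):
    """Percentage of all error hits that were served to bots."""
    if total_hits == 0:
        return 0.0
    return bot_hits / total_hits * 100


# 65 bot hits out of roughly 1800 4xx hits in total.
print(f"{bot_share(65, 1800):.1f}%")
```

With these numbers the share works out to roughly 3.6%.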

In the column Bot Name, you can see the name of the bot. This is granular and can even identify different types of bots, for example, mobile Googlebot. This information makes it easy to resolve 404 errors on a granular level.

Note that Unknown Bot signifies a fake Googlebot (or fake bot from another search engine).

These 4xx error codes can be resolved using the steps outlined above. This will keep the crawlers happy.

Phew. Next up, redirects.

Fixing Redirects with Log Hero

Redirects, typically 301s, occur when one URL automatically sends a user to another. Google is actually ok with 301s, but in general, it is best practice to keep them to a minimum because they can negatively impact your crawl budget and your SEO. If Googlebot is redirected, it typically spends a little less time crawling and indexing your site, which is not good. Also, link juice from backlinks that get redirected is usually less powerful (85% is the figure commonly estimated) than link juice from direct backlinks. This loss of link juice can also influence your page ranking.

Redirect chains

Redirect chains should be avoided at all costs. A redirect chain is simply more than one redirect linked together:

Page A > Page B > Page C

Common reasons for redirect chains:

  • Built over time: old pages were updated and redirected, and this process was repeated without correcting the redirect logic.
  • Site migrations: the website was migrated, and during the process, multiple redirects occurred.
  • HTTP vs HTTPS: sites often have both an HTTP and an HTTPS version. Redirect chains can occur when redirecting from one to the other, and then to another page.

More than one redirect can result in even more link equity loss which can negatively impact page rankings. Googlebot may have problems crawling your site, and it could also result in slightly slower site speed.

To get started finding redirects with Log Hero you can use this custom report or search using the advanced filter as above; you only need to replace the 4 with a 3.

This custom report is set up in the same way as the 4xx custom report, from left to right: Previous Page Path, Page, HTTP Code and finally Hits.

3xx_Custom_Report_Results

This report returns all 3xx status codes on a URL basis ordered by volume. Firstly, it is essential to check whether the redirect is strictly necessary. Do you need the old page? Or can you delete it?

Next, it is important to check that this redirect is not a chain. By checking Previous Page Path you can investigate where the redirect occurred. For every redirect chain you discover, you should follow the steps below to resolve it:

  • If you want to keep all pages in the redirect chain: Redirect all old pages directly to the newest page, i.e. Page A > Page C and Page B > Page C (but not: Page A > Page B > Page C).
  • If you want to keep only two pages in the chain: Delete the oldest asset and use a 301 redirect to the newer page, i.e. Page B > Page C.
  • If you don’t need to keep any page in the redirect chain except the newest version: Delete all pages that redirect to the newer page.
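If you keep your redirects in a simple source-to-target list, flattening chains can be sketched in a few lines. This is only an illustration: the redirects dictionary and the function names are hypothetical, and you would still apply the resulting rules in your own server or CMS configuration.

```python
def resolve_final_targets(redirects):
    """Follow each redirect to its final destination, e.g. A > B > C becomes A > C."""
    resolved = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Keep following while the target is itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        resolved[source] = target
    return resolved


def find_chains(redirects):
    """Return only the sources whose redirect passes through another redirect."""
    final = resolve_final_targets(redirects)
    return {src: dst for src, dst in final.items() if redirects[src] != dst}


# Hypothetical example: Page A > Page B > Page C.
redirects = {"/page-a": "/page-b", "/page-b": "/page-c"}
print(find_chains(redirects))            # sources that are part of a chain
print(resolve_final_targets(redirects))  # every old page pointing straight at the newest
```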

Great! So this goes a long way to keeping humans and Googlebots happier. But how do you know which is which?

As in the 4xx error code example, Log Hero’s custom dimensions allow users to detect search engine crawlers automatically. To benefit from these custom dimensions, you’ll need to upgrade to a paid plan.

Let’s take another look at our example. Create a custom report precisely as above. This time we will copy the 3xx redirect report.

Give the report a name and add the custom dimensions Is Bot and Bot Name.

Make sure the filter is set to “^3” to pick up all the 3xx error codes. Save the report to view the results.
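To see why the caret matters in this filter, here is a small sketch that applies the same kind of pattern to a list of status codes. It uses Python's re module as a stand-in for the Google Analytics filter, and the filter_codes helper is illustrative.

```python
import re


def filter_codes(codes, pattern):
    """Keep the status codes (as strings) that the regex matches anywhere."""
    regex = re.compile(pattern)
    return [code for code in codes if regex.search(code)]


codes = ["200", "301", "307", "403", "404", "503"]
print(filter_codes(codes, "^3"))  # anchored: only codes beginning with 3
print(filter_codes(codes, "3"))   # unanchored: also picks up 403 and 503
```

Without the caret, a pattern like 3 would also match codes that merely contain a 3, which is why the anchored form is the safer choice.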

Custom_Report_3xx_Errors

Looking at the results, we can see that we have 168 hits for redirect errors served to crawlers. By comparing this number to the first result, we can calculate that this is around 36% of all redirects! This is a significant amount and could be draining valuable crawl budget.

Custom_Bot_3xx_Report_Results

Interestingly, we can also see that some of the redirects are for Googlebot mobile. Note that the results for redirects and 4xx errors were different. Log Hero allows us to zoom in on the most critical issues. In this case, we could check that the redirects on the mobile version of the site are correct.

Ok, you’ve made it this far. We only have one type of error left to go: the 5xx errors.

Dealing with 5xx Errors

Most 5xx errors are server-side errors. In this post, we will only look at 500s and 503s, which represent internal server errors and service unavailable errors. Here, it is important to note that if you are using a web hosting solution such as WordPress, these errors are likely to occur due to an error on their side, and they will need to resolve it for you. If you are running or have access to your own server, then you may need to fix the error yourself.

The most common reasons for 5xx errors are:

  • A Permissions Error. In most cases, a 500 Internal Server Error results from an incorrect permission. This could be file or folder access or a PHP script.
  • A PHP Timeout. If a PHP script accesses resources and those resources time out, this can cause a 500 error.
  • A Coding Error in .htaccess. An error in the .htaccess file is not very common but can cause a 5xx error.

To view your 5xx errors using Log Hero, add HTTP Code as a secondary dimension. Then click on the advanced filter and filter for begins with 5.

Filter_5xx_errors

In our results we only have one 500 error occurring on one page which is related to a WordPress comments PHP script.

5xx_Error_Results

If you find these errors and you are using a web hosting solution, Log Hero enables you to report them quickly and easily. Web hosting support teams should be able to help you resolve them.

Upgrade Your Log Hero Plan Today

This post covered the basics for detecting the most common HTTP errors automatically with Log Hero. Log Hero makes it quick and painless to fix HTTP error codes and identify problem URLs. Protect your search engine crawl budget and your page rankings using Log Hero’s custom dimensions. Benefit from the full array of Log Hero features by upgrading to a paid plan today.

If you have any further questions about Log Hero get in touch with us.

Updated on September 13, 2018
