Broken Link Checker for WordPress

Sometimes, links get broken. A page is deleted, a subdirectory forgotten, a site moved to a different domain. Most likely many of your blog posts contain links. It is almost inevitable that over time some of them will lead to a “404 Not Found” error page. Obviously you don’t want your readers to be annoyed by clicking a link that leads nowhere. You can check the links yourself but that might be quite a task if you have a lot of posts. You could use your webserver’s stats but that only works for local links.

So I’ve made a plugin for WordPress that will check your posts (and pages), looking for broken links, and let you know if any are found.

Features

  • Detects links that don’t work, missing images, deleted YouTube videos and other problems.
  • Periodically checks links in posts, pages, comments, custom fields and the blogroll.
  • New and modified entries are checked ASAP.
  • Notifies you on the Dashboard if any problems are found.
  • Lets you edit all instances of a specific link at once.
  • Gives you a list of all links ever posted on your site, with the ability to search and filter it.
  • Lets you apply custom CSS styles to broken and removed links.
  • Highly configurable.

The broken links show up in the Tools -> Broken Links tab. If any invalid URLs are found, a notification will also show up in the Dashboard widget. To save screen real estate, the widget can be configured to stay closed most of the time and automatically expand when broken links are detected.

Download

broken-link-checker.zip (412 KB)

    Requirements

    • WordPress 3.0 or later
    • MySQL 4.1 or later

    The current version of this plugin is only compatible with WordPress 3.0 and up. If you have an older version of WP, try one of the older releases. Specifically, version 0.8.1 is the last one that’s still compatible with the WP 2.8 branch, and version 0.4.14 is the last one compatible with WP 2.1 – 2.6.x.

    Installation

    Install “Broken Link Checker” just like any other WordPress plugin:

    1. Download the .zip file (see above).
    2. Unzip.
    3. Upload the broken-link-checker folder to your /wp-content/plugins directory.
    4. Activate the plugin in the Plugins tab.

    2,295 Responses to “Broken Link Checker for WordPress”

    1. Gary Gordon says:

      A couple of questions. 1. Does Broken Link Checker work with Multisites? 2. I’ve seen numerous complaints on WordPress.org. Can you address the most common ones here and just provide your feedback to the negative comments, which are at about 25% of all reviews that appear there (e.g., heavy load experienced when using BLC, etc.)? Thanks. Gary

    2. Jānis Elsts says:

      Does Broken Link Checker work with Multisites?

      No, at least not officially. It might work if you activate it separately on each site, but no guarantees.

      2. I’ve seen numerous complaints on WordPress.org. Can you address the most common ones here and just provide your feedback to the negative comments, which are at about 25% of all reviews that appear there (e.g., heavy load experienced when using BLC, etc.)?

      Heavy server load appears to be a problem in some cases, but I’ve never been able to pin down the exact cause. The reported performance problems do not show up in my tests, and users who reported them are (understandably) not too keen on keeping the plugin active just to collect more debugging data.

      At a guess, it could be a combination of large sites and servers that are already under significant load. When the plugin is activated, the first few things it does – creating link database entries for all posts, parsing content for links, etc – can be fairly heavy on the database. Unfortunately, there doesn’t seem to be a better way to do that – the plugin has to find all links to be able to check them, and scanning thousands of posts is always going to have a performance impact.

      The second most common complaint is false positives – links that are reported as broken, but actually work fine. In most cases, these are caused by one of these two:

      • Temporary downtime of the linked site.
      • Servers deliberately blocking crawlers and bots, but allowing human visitors. This can range from something as simple as hotlink protection scripts that block direct access to images, to poorly configured firewalls and intrusion detection systems.
    3. ajale says:

      Since I started my blog, your blog has been a helpful one for me. I am a daily visitor of your blog.
      Thank you.

    4. ajale says:

      your site is super.

    5. Dennis says:

      Jānis, the latest version of the plugin slowed down processing of links and unfortunately that now has my site unable to completely process. It seems to always be processing old links. I have it set to check every 14 days. In the past this has never been an issue.

      It is not server load which has throttled the link checking as I never hit the amount I allow. How many links are able to be processed in an hour with the newer version?

      Thanks, Dennis

    6. Jānis Elsts says:

      There is no hard limit on links processed per hour, but there will obviously be a physical limit to how many links your server can actually handle. Also, there is a limit on HTTP requests per domain name.

      The latest version will wait about 5 seconds between each request to the same site. It will also wait a short while after certain SQL queries. This behaviour was introduced in an attempt to mitigate repeated complaints that the plugin was “using too many resources” and “breaking sites”. Note that requests to different sites are not throttled.

      How many links do you have, and what kind of server are you hosting your site on? I’m asking this so I can get a rough idea of how many links the plugin would need to process per second/minute/day to check each of them every 14 days.

    7. Dennis says:

      Thanks for the reply, Janis. It is likely the new throttling that is affecting the ability of the software to work its way through the link count. My website currently shows:

      Detected 164116 unique URLs in 286550 links

      I am hosted on a single server instance with 12 CPUs and 96 GB of RAM. This is the only site, and because of the power behind it I don’t have the issues with using too many resources that others complain about. My server load rarely goes over 1. By my count the program only has to check 488 links an hour to work its way through the 164116 links, or 8 a minute, which is why I set it to 14 days. That should be doable under your present throttling, which makes me think something else has been affected that you may not have realized. The domain name limit is probably the biggest thing. Is there someplace in the code I can change that to see if it helps speed up the processing?

    8. Jānis Elsts says:

      Your calculations look right, and the server specs are definitely high enough to support that kind of workload.
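
For reference, the arithmetic behind that estimate can be reproduced with a few lines of PHP. This is only a sketch: the function name is hypothetical (it is not part of the plugin), and it assumes every unique URL is checked exactly once per recheck period.

```php
<?php
// Hypothetical helper, not part of the plugin: how many URLs must be
// checked per hour so that each one is visited once per recheck period.
function required_checks_per_hour(int $unique_urls, int $period_days): float
{
    return $unique_urls / ($period_days * 24);
}

// Figures quoted in the comment above: 164116 unique URLs, 14-day period.
$per_hour   = required_checks_per_hour(164116, 14);
$per_minute = $per_hour / 60;

printf("%.1f per hour, %.1f per minute\n", $per_hour, $per_minute);
// prints "488.4 per hour, 8.1 per minute"
```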

      Try the development version of the plugin. It provides a way to decrease or disable some of the throttling: go to Settings -> Link Checker -> Advanced and set “Target resource usage” to 100%.

      As for the domain name limit, you can tweak that in /broken-link-checker/modules/checkers/http.php, lines 32 to 34. Try increasing “http_throttle_rate” and decreasing “http_throttle_min_interval”.

      Alternatively, you can disable it entirely by commenting out line 79:
      $this->token_bucket_list->takeToken($domain);
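
To illustrate what this kind of per-domain throttling does, here is a minimal token-bucket sketch. It is not the plugin's actual code: the class name, its constructor arguments and the way it combines a refill rate with a minimum interval are all assumptions made for illustration. It takes the current time as an argument and returns a wait time instead of sleeping, so the behaviour is easy to follow.

```php
<?php
// Illustrative per-domain token bucket. NOT the plugin's implementation;
// every name and parameter here is an assumption.
class DomainThrottle
{
    private $capacity;     // maximum number of tokens a domain can accumulate
    private $fillRate;     // tokens added per second
    private $minInterval;  // minimum seconds between two requests to one domain
    private $buckets = []; // domain => ['tokens', 'updated', 'last']

    public function __construct(float $capacity, float $fillRate, float $minInterval)
    {
        $this->capacity    = $capacity;
        $this->fillRate    = $fillRate;
        $this->minInterval = $minInterval;
    }

    // Returns how many seconds the caller should wait before requesting
    // $domain, given that it asks at time $now. The request is recorded
    // as happening at $now + wait.
    public function takeToken(string $domain, float $now): float
    {
        // Each domain gets its own bucket, so different sites never delay each other.
        $b = $this->buckets[$domain]
            ?? ['tokens' => $this->capacity, 'updated' => $now, 'last' => null];

        // Refill for the time elapsed since the bucket was last updated.
        $tokens = min($this->capacity, $b['tokens'] + ($now - $b['updated']) * $this->fillRate);

        // Wait until a whole token is available...
        $wait = $tokens >= 1.0 ? 0.0 : (1.0 - $tokens) / $this->fillRate;
        // ...and until the minimum interval since the previous request has passed.
        if ($b['last'] !== null) {
            $wait = max($wait, $b['last'] + $this->minInterval - $now);
        }

        // Advance the bucket to the moment the request actually starts, then spend one token.
        $start  = $now + $wait;
        $tokens = min($this->capacity, $tokens + $wait * $this->fillRate) - 1.0;
        $this->buckets[$domain] = ['tokens' => $tokens, 'updated' => $start, 'last' => $start];

        return $wait;
    }
}

// One token every 5 seconds per domain, matching the delay described earlier in the thread.
$throttle = new DomainThrottle(1.0, 0.2, 5.0);
printf("%.1f\n", $throttle->takeToken('example.com', 0.0)); // 0.0: first request goes out immediately
printf("%.1f\n", $throttle->takeToken('example.com', 1.0)); // 4.0: same domain, asked 1 second later
printf("%.1f\n", $throttle->takeToken('example.org', 1.0)); // 0.0: a different domain is not delayed
```

In a sketch like this, a higher refill rate or a shorter minimum interval lets more requests per domain through, and skipping the `takeToken()` call removes the limit altogether.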

    9. Brandon says:

      I have been using your plugin for some time on a number of installs (thanks!), but recently ran into a problem when using it on a private install (IP blocking in effect). It shows every single link as a 403 error. This is accurate, because the entire site is using IP blocking, but I don’t want to have to click “not broken” on every single link I create. Wasn’t sure if you would be interested in adding to the UI some type of settings where the user could choose status codes to ignore, etc? In the meantime, I updated the /modules/checkers/http.php, line 117 to be:

      $good_code = ( ($http_code >= 200) && ($http_code < 400) ) || ( $http_code == 401 ) || ( $http_code == 403 );
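
The kind of setting suggested above could look roughly like the sketch below. It is hypothetical: neither the function nor the `$ignored_codes` parameter exists in the plugin. It only shows how a user-supplied list of status codes to ignore might be combined with the 2xx/3xx rule from the modified line.

```php
<?php
// Hypothetical helper, not part of the plugin: treats 2xx/3xx responses
// as good, plus any status codes the user has chosen to ignore.
function is_good_http_code(int $http_code, array $ignored_codes = []): bool
{
    $in_success_range = ($http_code >= 200) && ($http_code < 400);

    return $in_success_range || in_array($http_code, $ignored_codes, true);
}

var_dump(is_good_http_code(403, [401, 403])); // bool(true): ignored by the user
var_dump(is_good_http_code(404, [401, 403])); // bool(false): still reported as broken
```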

    10. Jānis Elsts says:

      You could also add an exception for your domain name to prevent the plugin from checking internal links. To do that, add the domain to the “Exclusion list” field in Settings -> Link Checker -> Advanced.

    11. Brandon says:

      Sure, but I WANT to check internal links, to make sure they aren’t 404 (because an author typed one incorrectly, a page no longer exists or a path has changed). I just want to ignore the 403 status code.

    12. Jānis Elsts says:

      Correct me if I’m wrong, but wouldn’t IP blocking also prevent the plugin from accessing links that are 404? Wouldn’t the server just return 403 Forbidden regardless of whether a page exists or not?

    13. Brandon says:

      Hmph. You are absolutely right. I just tested this theory. It comes in handy when the internal page is outside my WordPress install, but you’re right, it isn’t detecting broken links at all now, because they are all 403, regardless of being broken. So the real question is, how do I allow access for the plugin to check links, while maintaining my IP blocking rules? Can I add a rule in the .htaccess file to allow access from the plugin or something?

    14. Jānis Elsts says:

      Maybe you could add the server’s own IP address to the list of allowed IPs. Depending on how your network is set up, this could be either its external IP address or the loopback address (i.e. 127.0.0.1).

    15. Brandon says:

      Grrr, I thought I had done that, but apparently, it’s dependent on the server’s configuration. I have 4 IPs allocated to a server, but only the IP assigned to the services can be added to htaccess, otherwise a 500 error is encountered. Finally got it working, thanks!!!
