Broken Link Checker for WordPress

Sometimes links break. A page is deleted, a subdirectory forgotten, a site moved to a different domain. Most likely many of your blog posts contain links, and it is almost inevitable that over time some of them will lead to a “404 Not Found” error page. You don’t want your readers to be annoyed by clicking a link that leads nowhere. You can check the links yourself, but that is quite a task if you have a lot of posts. You could use your web server’s stats, but that only works for local links.

So I’ve made a plugin for WordPress that will check your posts (and pages), looking for broken links, and let you know if any are found.

Download it now! (10 KB)

Features

  • Checks your posts (and pages) in the background (whenever the WP admin panel is open).
  • Detects links that don’t work and missing images. Checks both internal and outbound links.
  • Notifies you on the Dashboard if any problems are found.
  • Link checking intervals can be configured.
  • New/modified posts are checked ASAP.

The broken links show up in the Manage -> Broken Links tab. If any invalid URLs are found a notification will also show up in the sidebar on the Dashboard.

The Broken Links tab displays a list of invalid URLs found along with the relevant posts and the anchor text of the links. “View” and “Edit Post” do exactly what they say and “Discard” will remove the message about a broken link, but not the link itself (so it will show up again later unless you fix it; this plugin doesn’t modify your links).
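To illustrate how a URL and its anchor text might be pulled out of a post, here is a minimal sketch. It is not the plugin's actual code: the function name `example_extract_links` and the exact pattern are assumptions made for this example. The one idea it demonstrates is using a backreference so the closing quote of the href attribute must match the opening one, which keeps URLs containing an apostrophe intact.

```php
<?php
// Hypothetical helper, not the plugin's real code: find <a href="..."> links
// in post HTML and return each URL together with its anchor text.
function example_extract_links(string $html): array
{
    // (["']) captures the opening quote; \1 requires the same character to
    // close the attribute, so an apostrophe inside a double-quoted URL is kept.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = [
                'url'    => html_entity_decode($m[2]), // turn &amp; back into &
                'anchor' => trim(strip_tags($m[3])),   // drop markup inside the link
            ];
        }
    }
    return $links;
}

$post = '<p>See <a href="http://en.wikipedia.org/wiki/Pandora\'s_box">the <em>box</em></a>.</p>';
print_r(example_extract_links($post));
```

For the sample post this returns one entry whose URL still ends in `Pandora's_box` and whose anchor text is `the box`. A regular expression is only an approximation of HTML parsing, so unusual markup can still slip past a pattern like this.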

By default all old posts/links are re-checked every 72 hours, or you can set a different time period.
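The re-check rule can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's code: it loosely mirrors the selection query quoted in the comments (a link is picked up when its last check is older than a threshold, or when it is marked broken, has been checked fewer than 5 times, and is older than a second threshold). The function name, the array keys, the use of Unix timestamps, and the 30-minute retry interval are all made up for the example; only the 72-hour default comes from the text above.

```php
<?php
// Sketch only. The 72-hour interval is the documented default; the retry
// interval, names and array layout are assumptions for illustration.
const EXAMPLE_CHECK_INTERVAL = 72 * 3600; // re-check every link after 72 hours
const EXAMPLE_RETRY_INTERVAL = 30 * 60;   // assumed: retry broken links sooner
const EXAMPLE_MAX_CHECKS     = 5;         // the quoted query uses check_count<5

function example_link_is_due(array $link, int $now): bool
{
    $age = $now - $link['last_check']; // seconds since the last check

    // Any link whose last check is older than the main interval is due.
    if ($age > EXAMPLE_CHECK_INTERVAL) {
        return true;
    }

    // A link already marked broken is retried sooner, a limited number of times.
    return $link['broken']
        && $link['check_count'] < EXAMPLE_MAX_CHECKS
        && $age > EXAMPLE_RETRY_INTERVAL;
}

$now  = time();
$link = ['last_check' => $now - 80 * 3600, 'broken' => false, 'check_count' => 1];
echo example_link_is_due($link, $now) ? "due\n" : "not due\n";
```

Changing the configured interval would correspond to changing `EXAMPLE_CHECK_INTERVAL` in this sketch.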

Notes (Semi-Technical)
I realize there are a lot of features that could be added to improve this plugin considerably. The first release was intended to “test the waters” and see if there’s demand for a plugin like this, so I only implemented the most basic functions. Since then the plugin has been upgraded to be slightly beyond “basic” ;)

I thought about using WP’s pseudo-cron to run the link checker on a schedule and decided against it. AFAIK the cron jobs execute when a page is requested; since this plugin does some lengthy processing, it could increase page load times unacceptably if run that way. That’s why I set it to run the checks asynchronously (AJAX) and invisibly in the admin panel.
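One way to keep each background request short is to cap how much work a single request does and leave the rest for the next one. The sketch below shows that idea only; it is not how the plugin is actually written. The function name `example_run_batch`, the `$checkLink` callback that stands in for the real link check, and the 5-second budget are all assumptions for the example.

```php
<?php
// Sketch under assumptions: $checkLink stands in for the real link check and
// the 5-second budget is an arbitrary example value, not a plugin setting.
// $queue is expected to be a plain list of URLs.
function example_run_batch(array $queue, callable $checkLink, float $budgetSeconds = 5.0): array
{
    $started = microtime(true);
    $results = [];
    $done    = 0;

    foreach ($queue as $url) {
        // Stop once the budget is spent; the next request picks up the rest.
        if (microtime(true) - $started >= $budgetSeconds) {
            break;
        }
        $results[$url] = $checkLink($url);
        $done++;
    }

    // Whatever was not processed stays queued for the next background request.
    return ['checked' => $results, 'remaining' => array_slice($queue, $done)];
}

// Stand-in check for the demo: treat any URL containing "missing" as broken.
$fakeCheck = function (string $url): bool {
    return strpos($url, 'missing') === false;
};

$out = example_run_batch(['http://example.com/', 'http://example.com/missing'], $fakeCheck);
print_r($out);
```

The budget is only tested between links, so a single check that hangs can still overrun it. The real check therefore needs its own timeout as well.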

Installation
Just like any other WordPress plugin:

  1. Download (see below).
  2. Unzip.
  3. Upload the broken-link-checker folder to your wp-content/plugins directory.
  4. Activate the plugin in the Plugins tab.

Upgrading

  1. Deactivate the plugin (important!).
  2. Do steps 1-2 from “Installation” (download and unzip).
  3. Upload the broken-link-checker folder to your wp-content/plugins directory, replacing the old files.
  4. Re-activate the plugin in the Plugins tab.

Download
License: CC-GNU GPL
Version 0.3.5: broken-link-checker.zip (10 KB)
(Requires at least WordPress 2.0.x, possibly 2.1.x. I’ve tested it on 2.1.3 through 2.5.)

249 Responses to “Broken Link Checker for WordPress”

  1. 249
    White Shadow Says:

    @Nokao - Do you really want to know all the details?…

    Okay, for my plugin it’s like that : the link checks run in the background using AJAX in the admin menu. This is necessary because I can’t expect every user to set up a cronjob for the broken link script.

    In WordPress plugins an AJAX-y component can be built in several ways. This plugin uses a separate PHP file for the AJAX tasks. The file, wsblc_ajax.php, needs access to the WordPress database so it can read posts, check the links in them, etc. That’s why it needs to load the WP core engine and blog configuration, which it does by including wp-config.php. The plugin also needs the database layer, so it includes wp-db.php.

    That’s it :P

  2. 248
    Nokao Says:

    @White Shadow -

    Uh :(

    Even nextgen-gallery has SOME problems: in the admin menu the link is wrong, a mix of server path and symlink.

    Honestly, I don’t understand why this is happening… or why you need to look for wp-config.php

  3. 247
    White Shadow Says:

    @Nokao - Oops, I forgot about that. You’re right, my suggestion wouldn’t work.

    Other plugins work because either they don’t use AJAX at all or use it in a different way (possible, but as I mentioned it would take a while to convert this plugin to use the other method).

  4. 246
    Nokao Says:

    @White Shadow -

    :)

    Ok… I’ll surrender to the fact that I can’t use your plugin in that server :(
    (i’ll continue to use that for my personal blog)

    In fact, I can’t replace the ../../../ path with something else because, you know, I did that “hack” to share the SAME plugins folder between more than 25 WordPress websites.
    So I can’t know which website is calling the function, and I can’t point to a wp-config.php with an absolute path.

    I really hope that your plugin is the only one that has this problem… I’m curious about that.
    So far, everything else seems to be working fine.

  5. 245
    White Shadow Says:

    @Nokao - It doesn’t “recognize” it, it just tries to include the file as if the plugin was in the usual location, and, obviously, fails. I don’t think there’s anything I can do to the plugin to fix this except overhauling the entire AJAX part of it and getting rid of wsblc_ajax.php altogether. That would take a lot of time & work, so it probably won’t happen in the foreseeable future.

    For the time being, you should just edit the plugin and replace “../../../wp-config.php” with a full path to the wp-config.php file. Hopefully I’ll be able to remove this need for hacks in the next version - but, as I said, it probably won’t be soon.

    Ah, and thanks for the compliments :)

  6. 244
    Nokao Says:

    Originally Posted By White Shadow: “@Nokao - I see why the plugin wouldn’t work in that configuration, but I’m not quite sure what you mean by “in a relative way”. It’s already using a relative path to include wp-config.php, do you mean I should use an absolute path, or what?”

    Hi White Shadow (and compliments for your storms pictures, we have the same photographic taste for natural strength).

    I said “relative” because somehow your plugin recognizes that it is in a “symbolic link jailed folder”.
    The point is that it’s the only plugin that isn’t working, and I don’t think it’s the only one that uses wp-config in some way.

    I’m not a WordPress hacker, so maybe you could look at how plugins like nextgen gallery read the wp-config file…

    I also have another request that I posted months ago:
    could you skip checking links that use protocols other than http, for example skype://?
    Your plugin hates my msn and skype “addme” buttons :)

    Thanks again for your work!

  7. 243
    White Shadow Says:

    @Nokao - I see why the plugin wouldn’t work in that configuration, but I’m not quite sure what you mean by “in a relative way”. It’s already using a relative path to include wp-config.php, do you mean I should use an absolute path, or what?

  8. 242
    Nokao Says:

    Hi man.

    I just tested a hack:
    http://wordpress.org/support/topic/190128?replies=4#post-840910

    And I have these problems. Can you kindly search for the wp-config file in a relative way?
    Warning: require_once(../../../wp-config.php) [function.require-once]: failed to open stream: No such file or directory in /home/wordpress/plugins/broken-link-checker/wsblc_ajax.php on line 5

    Fatal error: require_once() [function.require]: Failed opening required '../../../wp-config.php' (include_path='.:/usr/share/pear:/usr/share/php') in /home/wordpress/plugins/broken-link-checker/wsblc_ajax.php on line 5

  9. 241
    White Shadow Says:

    @Lucy - Hmm, here are some ideas :

    * When you click the “re-check all” button note the URL of the “search page” (before it redirects). That might give some clues.

    * Check your .htaccess. Maybe there are some security-related rules that are blocking parts of the plugin.

    * Try disabling/enabling any cache-related plugins. My intuition tells me those could have something to do with it (somehow).

    * Check file permissions/file owner of broken-link-checker.php and wsblc_ajax.php (especially the second one). The “right” permissions depend on your server configuration, but in general these files should both have the same permissions as the .php files of other, working WP plugins on your server.

  10. 240
    Lucy Says:

    This is a great plugin - found loads of dead links on one of my blogs. But on another, I can’t get it to run.

    Both blogs are 2.6.1, and both are hosted on the same server, so I think there must be some interference with another plugin, one that I’m not using with the first, successful installation. Specifically, what happens is that it doesn’t run automatically, and when I click ‘re-check all pages’ in an attempt to trigger it, I get my ‘no results of a search’ page, suggesting other links, up for a few seconds. Then the default Broken Link Checker start page comes up again. Very odd!

    I’ve tried deactivating all the obvious plugins - do you have any suggestions?

  11. 239
    White Shadow Says:

    @Dai - Nah, it’s fine, though feel free to investigate if that helps your peace of mind.

  12. 238
    Dai Says:

    Thank you for the update.

    > using “isset($wpdb)” instead of “class_exists('wpdb')” will be a better choice

    I guess you’re right; it actually works fine.
    ‘wp-db.php’ is not one of the “Popular Libraries” like PclZip, but a part of the WordPress system.
    The article may not be suitable for this situation.

    Anyway, the problem is gone.
    I won’t have to edit the file when this plugin is updated.

    If you need, I’ll try to find out which plugin causes the conflict I wrote about.

  13. 237
    Wordpress Plugins That Work. | 7Wins.eu Says:

    [...] plugin to show your valuable bookmarks on sidebar | Let’s explore the web technologies togetherø Broken Link Checker for WordPress | W-Shadow.com ø Tags casting call for cast call talent agency call casting talent agencies This product is [...]

  14. 236
    SEO Tips for Beginners Says:

    [...] solution: Broken Link Checker for WordPress will check and detect both internal and outbound links that don’t work and notifies you on the [...]

  15. 235
    White Shadow Says:

    @Dai - While I haven’t encountered this problem before, your suggested modification certainly can’t hurt. I’ll add it to the plugin :)

    However, I think using “isset($wpdb)” instead of “class_exists('wpdb')” will be a better choice, because the comments in wp-db.php indicate that it is possible to replace this class with something else by setting the global variable $wpdb.

    Including wp-config.php should be okay, that just loads the WordPress core.

  16. 234
    Dai Says:

    Hello, nice to meet you, Janis.

    This is nice plugin.
    I fixed a lot of broken links on my blog.

    But the first time, I got the error below in the “Status : ” area on the admin page:

    Fatal error: Cannot redeclare class wpdb in /var/www/wp-includes/wp-db.php on line 53

    So I removed the line

    require_once("../../../wp-includes/wp-db.php");

    from wsblc_ajax.php, though I thought it was a poor solution.

    Today, I found an article, “Tackle Plugin Compatibility Issues While Using Popular Libraries” : http://weblogtoolscollection.com/archives/2008/08/27/tackle-plugin-compatibility-issues-while-using-popular-libraries/

    Yes, That might be the answer!!

    I changed it as below:

    /*
    The AJAX-y part of the link checker.
    */
    require_once("../../../wp-config.php");
    if (!class_exists('wpdb')) {
        require_once("../../../wp-includes/wp-db.php");
    }

    I GOT IT!!!

    It potentially has the same problem in require_once("../../../wp-config.php"), maybe.

    I hope my work helps you, thank you.

  17. 233
    Çim Çit Says:

    really works thanks

  18. 232
    White Shadow Says:

    @Michael Hampton - I’ll use a different workaround. The new version should be up soon.

    I think one of my antispam plugins is causing the comment problem. I’ll investigate, but it might take a while.

  19. 231
    Michael Hampton Says:

    Sure, it’s a performance problem if you’re randomizing a few million records. Randomizing 100 is trivial.

    P.S. Please fix your comment form. I’m going nuts having to re-type my information in all over again every time.

  20. 230
    White Shadow Says:

    @Michael Hampton - I seem to recall that “ORDER BY RAND()” is considered a bad thing performance-wise.

  21. 229
    Michael Hampton Says:

    Well, curl should be timing out, but it isn’t. Looks like the CURLOPT_TIMEOUT isn’t being honored. I don’t know if that’s a bug in curl or in PHP, but it’s certainly a bug.

    Anyway, I worked around it by just having the link checker pull links from the work queue randomly, i.e.:

    /* check the queue and process any links unchecked */
    $sql="SELECT * FROM $linkdata_name WHERE ".
    " ((last_check<'$check_treshold') OR ".
    " (broken=1 AND check_count<5 AND last_check<'$recheck_treshold')) ".
    " ORDER BY RAND() LIMIT 100";

    Now it’s at least checking most of them. :)

  22. 228
    White Shadow Says:

    @Michael Hampton - Hmm, I guess I’ll have to think of something to detect cases like that.

  23. 227
    Michael Hampton Says:

    OK, more on my broken link checker stopping in the queue. It appears to get hung up when checking a link to http://www.phl.org/ . The site never loads up, the curl call never seems to actually time out as it should, and so the AJAX call comes back with a 504 (Gateway Timeout) error. When I manually removed this link from the blc_linkdata table, it went happily on checking the rest of the queue.

  24. 226
    neues aus der roiberhöhle - [wordpress:] Tracking link updates Says:

    [...] I only recently installed the Broken Link Checker plugin. It runs in the background and scans old posts for [...]

  25. 225
    Search Engine Optimization wordpress - Says:

    [...] ) 15-Auto Social Poster 16-Broken Link Checker 17-Global Translator [...]

  26. 224
    White Shadow Says:

    Nah, it makes perfect sense ;)

    Actually I need a regexp for HTML links. I think I’ll need to update the existing one to use a backreference to ensure it only accepts matching types of quotes for the opening and closing quote of the href parameter (I think that sentence got a bit tangled).

    Anyway, I’m putting that off, because currently I have a more pressing issue - finding out how somebody managed to hack this site today. I’ve got it back working, but now I need to read the logs and whatnot…

  27. 223
    Michael Hampton Says:

    The official regex for URIs is ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

    Confused yet? :)

  28. 222
    White Shadow Says:

    @hillman - There is no log.

    The plugin can probably get confused by links that contain a quote. I’ll need a more advanced link-finding regexp to solve that, but it’s doable.

  29. 221
    Branko Collin Says:

    Xenu Link Sleuth produces some sort of error message on Wikipedia links (“forbidden”). I am guessing the same thing that stops XLS also blocks the BLC4WP.

  30. 220
    hillman Says:

    @White Shadow - I see. Perhaps I’ll just discard those errors. Thanks for the reply!

    Does Broken Link Checker keep a log of the checking result? Like, whether it reports the link as broken coz it gets a 404 or 503 when checking?

    Also, one of the links in my blog points to:

    http://en.wikipedia.org/wiki/Pandora's_box

    but BLC is reporting the link as:

    http://en.wikipedia.org/wiki/Pandora

    Does it stumble on the single quote in the URL?
