FindBroken Beta

Here’s something I’ve been working on for the last month or so: It’s an automated link checker that periodically scans your site for broken links and alerts you by email if any are found. Basically, it’s like a hosted version of the Broken Link Checker plugin, only simpler and not WordPress-specific.

The site is still in early beta, so some features may be a little rough around the edges or just plain missing. Nevertheless, I encourage you to go check it out and let me know what you think. All feedback is welcome.

What already works

  • Add/remove sites.
  • Monitor multiple sites with one account.
  • Automatic daily scans.
  • Email notifications.

What doesn’t quite work

  • The crawler that I use is slow to get going. Sometimes it can take ~30 minutes before any progress is visible.
  • Redirects are currently not followed.
  • There are a fair number of false positives. Luckily, most of them are temporary and don’t show up when using the “persistent” filter.
  • The site could use some UX love.
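
The post doesn't say exactly how the “persistent” filter decides which failures to show. One plausible reading, assumed here, is that a link only counts once it has failed in every one of the last few scans, so one-off timeouts drop out. A minimal Python sketch of that rule; the `PersistentFilter` class and the threshold of 3 are hypothetical, not FindBroken's actual code:

```python
from collections import defaultdict, deque


class PersistentFilter:
    """Hypothetical filter: a link is 'persistently' broken only if it
    failed in each of the last `required` scans (an assumed definition)."""

    def __init__(self, required: int = 3) -> None:
        self.required = required
        # Per-URL window of recent results; True means the check failed.
        self.history = defaultdict(lambda: deque(maxlen=self.required))

    def record(self, url: str, broken: bool) -> None:
        self.history[url].append(broken)

    def is_persistent(self, url: str) -> bool:
        results = self.history.get(url)  # .get() does not create an entry
        # Require a full window of failures, not just the latest one.
        return results is not None and len(results) == self.required and all(results)


f = PersistentFilter(required=3)
for broken in (True, False, True):  # flaky: recovered in the middle scan
    f.record("https://example.com/flaky", broken)
for _ in range(3):                  # failed every time
    f.record("https://example.com/gone", True)

print(f.is_persistent("https://example.com/flaky"))  # False
print(f.is_persistent("https://example.com/gone"))   # True
```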



41 Responses to “FindBroken Beta”

  1. MK Safi says:

    Cool. I subscribed and added my site. Thing is, 90% of the links on my site are multiple 301 redirects. They are affiliate links. I don’t think it will be very helpful in this case, will it?

  2. White Shadow says:

    Yes, it doesn’t handle redirects yet. But that’s something I’m definitely going to work on.

  3. Simon Brown says:

    Would it be possible to allow login via OpenID rather than people having to deal with another password?

  4. White Shadow says:

    Hmm, I think that probably falls into the “nice-to-have” category of site improvements. I’ll consider it, but no promises.

  5. MK Safi says:

    @Simon Brown:

    @White Shadow: It already helped me discover one broken link — thanks! 🙂

  6. Mike Essex says:

    The main problem I find with broken link checkers is that they get stuck in loops and often never finish. This tends to happen if page links are dynamic (e.g. tags). Does this work better at getting around that issue?

  7. White Shadow says:

    It has a page limit so it won’t loop forever, but it can still get carried away scanning dynamic links and never get to checking the actually useful ones. It doesn’t do any explicit loop-detection.
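
The crawler itself is a third-party service, so its internals aren't shown here. As a rough Python illustration of the trade-off described above, here is a page-limited breadth-first crawl over a made-up site. `crawl`, `get_links`, and `fake_site` are hypothetical; `get_links` stands in for fetching a page and extracting its links. The `seen` set stops exact repeats, but nothing recognizes a family of near-identical dynamic URLs, so they can use up the budget before the real posts are reached.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start: str, get_links: Callable[[str], Iterable[str]], page_limit: int) -> list[str]:
    """Breadth-first crawl that stops after `page_limit` pages.

    `seen` prevents fetching the exact same URL twice, but there is no
    loop detection for families of distinct dynamic URLs."""
    seen = {start}
    queue = deque([start])
    visited: list[str] = []
    while queue and len(visited) < page_limit:
        url = queue.popleft()
        visited.append(url)  # in a real checker: fetch and check it here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def fake_site(url: str) -> list[str]:
    """Made-up site: the tag index fans out into 20 sorted variants of itself."""
    if url == "/":
        return ["/tags", "/blog"]
    if url == "/tags":
        return [f"/tags?sort={k}" for k in range(20)]
    if url == "/blog":
        return ["/post-1", "/post-2"]
    return []


pages = crawl("/", fake_site, page_limit=10)
print("/post-1" in pages)  # False: the budget ran out on /tags?sort=... pages
```

Keeping those dynamic URLs away from the crawler removes them from the queue entirely, which is what the robots.txt workaround below achieves.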

    I think the current best workaround is to disallow the dynamic links in robots.txt. If you’re worried that this will prevent your whole site from being indexed by search engines, you can make a separate robots.txt section that only applies to link checker user agents.

    This particular checker identifies itself as “80legs” when scanning links, and it obeys robots.txt.
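
As an example of that workaround, the robots.txt below gives the link checker its own section that keeps it out of tag archives while leaving other crawlers unrestricted. The `/tag/` path is an assumption (substitute your site's dynamic paths), and the `80legs` token is taken from the comment above. Python's standard `urllib.robotparser` can confirm the rules behave as intended before you deploy them; note that it only does plain prefix matching on paths, so keep test rules to simple prefixes.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: a section just for the link checker, plus an
# unrestricted default section for every other crawler.
ROBOTS_TXT = """\
User-agent: 80legs
Disallow: /tag/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("80legs", "https://example.com/tag/news/"))      # False
print(parser.can_fetch("80legs", "https://example.com/2010/my-post/"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/tag/news/"))   # True
```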

  8. MK Safi says:

    You know, sometimes my affiliate links get broken, but not by leading to 404s, 503s or whatever. They break by leading visitors to an undesired destination, like a page that belongs to the affiliate management company telling visitors that the vendor is no longer with them.

    I have plans to implement this sort of link checking in my affiliate links manager plugin. You would simply tell it the intended final destination of the link, and if that changes, the plugin will notify you. Maybe FindBroken can do something similar.

  9. White Shadow says:

    That’s highly specific to affiliate links. However, it might be possible to add something that detects destination changes for any link – when I figure out how to get redirects working, that is.
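
A sketch of the idea described above, assuming a hypothetical `fetch(url)` that makes one request without following redirects and returns the status code plus the `Location` header. It follows the chain with a hop limit, then compares the final URL with the one the user said the link should reach. The fake redirect table and all URLs are made up and stand in for real HTTP responses.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical single-request fetcher: (status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_CODES = {301, 302, 303, 307, 308}


def final_destination(url: str, fetch: Fetch, max_hops: int = 10) -> Tuple[str, int]:
    """Follow redirects until a non-redirect response; return (final URL, status)."""
    start = url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or location is None:
            return url, status
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError(f"more than {max_hops} redirects starting at {start}")


def destination_changed(link: str, expected: str, fetch: Fetch) -> bool:
    """True if the link no longer ends up at the intended page, or ends in an error."""
    final, status = final_destination(link, fetch)
    return final != expected or status >= 400


FAKE_WEB: Dict[str, Tuple[int, Optional[str]]] = {
    "https://example.com/go/widget": (301, "https://aff.example.net/track?id=42"),
    "https://aff.example.net/track?id=42": (302, "https://vendor.example.org/widget"),
    "https://vendor.example.org/widget": (200, None),
}


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return FAKE_WEB.get(url, (404, None))


print(destination_changed("https://example.com/go/widget",
                          "https://vendor.example.org/widget", fake_fetch))  # False
```

If the affiliate network starts sending visitors to its own “vendor is no longer with us” page, the final URL no longer matches, so the check fires even when that page returns 200.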

  10. Avanano says:

    Your effort is appreciated.

  11. Simon Brown says:

    It’s currently giving me false positives. I’m guessing it’s because my site uses the base tag.

  12. White Shadow says:

    It currently ignores base tags, so that could be it. Can you give me a few examples where it gives you false positives?

  13. Simon Brown says:

    Most of the broken links detected on my site:

  14. MK Safi says:

    I have an intentionally placed broken link on my site (to point out to visitors that a certain software documentation is always unavailable). Does it automatically ignore a broken link after a few days — or can I manually ignore it? Thanks!

  15. White Shadow says:

    There’s currently no way to ignore specific links, but that’s on my to-do list. Hopefully, I’ll get to it sometime next week.

    Edit: And no, it doesn’t automatically ignore old broken links.

  16. White Shadow says:

    @Simon Brown: <base> tag support has been added. Those false positives should go away within 24 hours.
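
FindBroken's parser isn't shown, but the reason ignoring `<base>` causes false positives is easy to demonstrate. In this Python sketch (the `LinkExtractor` class and URLs are hypothetical; it uses the standard `html.parser` and `urllib.parse.urljoin`), relative links are resolved against the first `<base href>` instead of the page URL. Without that step, `guide.html` below would be checked as `https://example.com/blog/post/guide.html`, which may not exist.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect <a href> targets and resolve them against the page's
    <base href> (if any) instead of the page URL."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.base_url = page_url
        self._base_seen = False
        self._hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base" and not self._base_seen:
            # Only the first <base href> counts; it may itself be relative.
            self.base_url = urljoin(self.page_url, href)
            self._base_seen = True
        elif tag == "a":
            self._hrefs.append(href)

    def links(self) -> list[str]:
        # Resolve at the end: the base URL applies to the whole document.
        return [urljoin(self.base_url, h) for h in self._hrefs]


page = ('<head><base href="https://cdn.example.com/docs/"></head>'
        '<body><a href="guide.html">Guide</a> <a href="/about">About</a></body>')
extractor = LinkExtractor("https://example.com/blog/post/")
extractor.feed(page)
extractor.close()
print(extractor.links())
# ['https://cdn.example.com/docs/guide.html', 'https://cdn.example.com/about']
```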

  17. MK Safi says:

    I’m actually loving this service. It lets me know whenever a link breaks. But I really need to check my affiliate links now, with their redirects. Does it do that now, or not yet?

  18. White Shadow says:

    It now follows redirects (and has for a few weeks, actually), but it doesn’t detect changes in redirect destination.

    I still can’t figure out how to do that efficiently, and there are plenty of non-obvious questions:

    – What if the redirect changes more than once?
    – What about “good” changes, like moving a product to a different domain name?
    – What about redirects that you don’t care about?
    – What about situations where the redirect URL doesn’t change, but the page content gets replaced with something “undesirable” – like the domain expiring and being picked up by a squatter that keeps the same URLs but only serves ads?

    That said, something like this might work:

    click here

    Or perhaps:

    click here

  19. MK Safi says:

    In my last comment I was asking if it could merely follow redirects and notify if final destination is broken. I’m really happy it does that now. Thank you!

    About following affiliate redirects and detecting changes in destination, I originally envisaged that the user would manually specify the intended destination (like the markup examples you showed), but now I think it would be a lot better if the service notified the user of destination changes without the user having to specify a destination for every link.

    The user could then ignore a changed link, the same way you can ignore some broken links right now.

    – What do you mean by “the redirect changes more than once”? (Do you mean that you wanna make change detection automatic, but what if the link changes multiple times? Maybe store the link destination on the first crawl. If on the 2nd crawl the destination is different, notify the user. If on the 3rd crawl the destination goes back to the original, hmmm, remove the notification? But you know those plugins and even services that let affiliates split-test destinations of the same link to see which converts best. These could cause a problem. So, maybe, you’ll have to have an “ignore permanently” option.)

    – “Good” changes should generate notifications, I think. They don’t happen often and the way you currently handle ignoring links is very convenient — just press the little x.

    – What are the redirects that you don’t care about?

    – About situations where the destination is the same but content changes, well, it’s probably very difficult to detect that. Maybe when the service is very popular, this feature can be part of the $600/month platinum package.

    I don’t have nearly as much experience in this as you do, so I don’t know how difficult this feature is to implement. But from a user standpoint I think this would be a really nice feature to have if it worked intuitively and reliably.

    Thanks a lot!
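
The scheme proposed in this comment is concrete enough to sketch. The Python class below is hypothetical and deliberately naive: it keeps only the first-seen destination per link (no full history), treats a return to that destination as “all clear”, and offers an “ignore permanently” switch for split-tested links. It assumes each link's final destination has already been resolved by following redirects.

```python
class DestinationTracker:
    """Hypothetical tracker: alert when a link's final destination differs
    from the one recorded on its first crawl."""

    def __init__(self) -> None:
        self.baseline: dict[str, str] = {}  # link -> destination on first crawl
        self.alerts: dict[str, str] = {}    # link -> changed destination to review
        self.ignored: set[str] = set()      # links the user ignored permanently

    def observe(self, link: str, destination: str) -> None:
        """Record one crawl's resolved destination for `link`."""
        if link not in self.baseline:
            self.baseline[link] = destination   # first crawl: just remember it
        elif link in self.ignored:
            return                              # user opted out of alerts
        elif destination != self.baseline[link]:
            self.alerts[link] = destination     # changed: notify the user
        else:
            self.alerts.pop(link, None)         # back to original: withdraw alert

    def ignore_permanently(self, link: str) -> None:
        self.ignored.add(link)
        self.alerts.pop(link, None)


t = DestinationTracker()
link = "https://example.com/go/widget"
t.observe(link, "https://vendor.example.org/widget")    # 1st crawl
t.observe(link, "https://aff.example.net/vendor-gone")  # 2nd crawl
print(t.alerts)  # {'https://example.com/go/widget': 'https://aff.example.net/vendor-gone'}
t.observe(link, "https://vendor.example.org/widget")    # 3rd crawl
print(t.alerts)  # {}
```

Even this small version has to pick answers to the questions raised in the reply below, for example what a “baseline” means when the same link appears on several sites.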

  20. White Shadow says:

    Yes, by “changes more than once” I mean situations where the redirect changes multiple times, possibly going back to the original URL for a while, then changing to an entirely different one, etc.

    It looks like the app would have to keep a full list of changes that have ever occurred for every URL, which can get unwieldy quickly. It’s also unclear how “ignore” should work in these situations – would it ignore that particular change, all changes up until then, all changes on that URL, …? Similarly, what if the user later removes their site from FindBroken, then re-adds it – do the “ignores” and link history get preserved? What if the same redirected link exists on multiple sites owned by multiple users? What about…

    Sorry, I tend to do that 😉 It probably looks like just so much nit-picking, but all of those tiny questions and edge cases have to be handled somehow to arrive at something one can actually implement in code. And the answers better match the users’ intuitive, fuzzy model of how things “should” work or your new feature will end up annoying them instead of helping them.

    Yes, a programmer can sometimes just “fill in the gaps” on his/her own, but so far, I haven’t been able to figure out how to do that for this feature.

    Okay, rant over.

    As to detecting content changes, it would be quite easy code-wise – just hash the page contents on every check and see if the hash changes. But it would also give you an incredible number of useless notifications because it would detect even single-character changes. Alas, algorithmically and reliably figuring out what changes are important is extremely hard (as in, PhD-level hard).
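
The easy half of that, sketched in Python with the standard `hashlib` module: store a SHA-256 digest per URL and flag any difference. The `ContentWatcher` class and URL are hypothetical. As the reply says, this is a change detector, not a relevance detector: the one-character price edit below trips it exactly as a squatter takeover would.

```python
import hashlib


class ContentWatcher:
    """Naive change detector: remember a SHA-256 digest of each page body."""

    def __init__(self) -> None:
        self.last_digest: dict[str, str] = {}

    def changed(self, url: str, body: bytes) -> bool:
        digest = hashlib.sha256(body).hexdigest()
        previous = self.last_digest.get(url)
        self.last_digest[url] = digest
        # The first sighting is only a baseline, not a change.
        return previous is not None and previous != digest


w = ContentWatcher()
url = "https://vendor.example.org/widget"
print(w.changed(url, b"<p>Widget Pro: $49</p>"))  # False (baseline)
print(w.changed(url, b"<p>Widget Pro: $49</p>"))  # False (identical)
print(w.changed(url, b"<p>Widget Pro: $48</p>"))  # True  (one character differs)
```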
