FindBroken Beta

Here’s something I’ve been working on for the last month or so: FindBroken.com. It’s an automated link checker that periodically scans your site for broken links and alerts you by email if any are found. Basically, it’s like a hosted version of the Broken Link Checker plugin, only simpler and not WordPress-specific.

The site is still in early beta, so some features may be a little rough around the edges or just plain missing. Nevertheless, I encourage you to go check it out and let me know what you think. All feedback is welcome.

What already works

Add/remove sites.
Monitor multiple sites with one account.
Automatic daily scans.
Email notifications.

What doesn’t quite work

The crawler that I use is slow to get going. Sometimes it can take ~30 minutes before any progress is visible.
Redirects are currently not followed.
There are a fair number of false positives. Luckily, most of them are temporary and don’t show up when using the “persistent” filter.
The site could use some UX love.

This entry was posted on Monday, March 7th, 2011 at 12:48 and is filed under Announcements, Web Apps.

41 Responses to “FindBroken Beta”

MK Safi says:

March 7, 2011 at 14:16

Cool. I subscribed and added my site. Thing is, 90% of the links on my site are multiple 301 redirects. They are affiliate links. I don’t think FindBroken.com will be very helpful in this case?
White Shadow says:

March 7, 2011 at 14:25

Yes, it doesn’t handle redirects yet. But that’s something I’m definitely going to work on.
Simon Brown says:

March 7, 2011 at 19:33

Would it be possible to allow login via OpenID rather than people having to deal with another password?
White Shadow says:

March 7, 2011 at 20:00

Hmm, I think that probably falls into the “nice-to-have” category of site improvements. I’ll consider it, but no promises.
MK Safi says:

March 7, 2011 at 21:08

@Simon Brown: LastPass.com

@White Shadow: It already helped me discover one broken link — thanks! 🙂
Mike Essex says:

March 10, 2011 at 15:09

The main problem I find with broken link checkers is that they get stuck in loops and often never finish. This tends to happen if page links are dynamic (e.g. tags). Does this work better at getting around that issue?
White Shadow says:

March 10, 2011 at 22:47

It has a page limit so it won’t loop forever, but it can still get carried away scanning dynamic links and never get to checking the actually useful ones. It doesn’t do any explicit loop-detection.

I think the current best workaround is to disallow the dynamic links in robots.txt. If you’re worried that this will prevent your whole site from being indexed by search engines, you can make a separate robots.txt section that only applies to link checker user agents.

This particular checker identifies itself as “80legs” when scanning links, and it obeys robots.txt.
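For illustration, here is roughly what such a separate section could look like, checked with Python's standard `urllib.robotparser`. This is only a sketch: the `/tag/` path and the example.com URLs are made up, and the parser just shows how a robots.txt-obeying crawler would read the rules, not how 80legs itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The /tag/ path is an assumption for illustration;
# only the "80legs" user agent name comes from the comment above.
ROBOTS_TXT = """\
User-agent: 80legs
Disallow: /tag/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The link checker is kept out of the dynamic tag pages...
    print(allowed("80legs", "http://example.com/tag/news"))
    # ...while other crawlers fall through to the catch-all section.
    print(allowed("Googlebot", "http://example.com/tag/news"))
```

The empty `Disallow:` in the catch-all section means "nothing is disallowed", so search engines keep indexing the whole site.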
MK Safi says:

March 10, 2011 at 23:13

You know, sometimes my affiliate links get broken but not by leading to 404s, 503s or whatever. They break by leading visitors to an undesired destination, such as a page that belongs to the affiliate management company telling visitors that the vendor is no longer with them.

I have plans to implement this sort of link checking in my affiliate links manager plugin. You would simply tell it the intended final destination of the link, and if that changes, the plugin will notify you. Maybe FindBroken can do something similar.
White Shadow says:

March 10, 2011 at 23:28

That’s highly specific to affiliate links. However, it might be possible to add something that detects changes in any link’s destination – when I figure out how to get redirects working, that is.
Avanano says:

March 17, 2011 at 09:09

Your effort is appreciated..
Simon Brown says:

March 22, 2011 at 21:47

It’s currently giving me false positives. I’m guessing it’s because my site uses the base tag.
White Shadow says:

March 22, 2011 at 21:55

It currently ignores base tags, so that could be it. Can you give me a few examples where it gives you false positives?
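To show why ignoring the base tag can produce false positives, here is a small sketch using only Python's standard library. The page URL, the markup and the `resolve_links` helper are all invented for the example; it is not FindBroken's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags and the first <base href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and self.base_href is None:
            self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def resolve_links(page_url, html, honor_base=True):
    """Return absolute URLs for all links, optionally honoring <base>."""
    parser = LinkExtractor()
    parser.feed(html)
    base = page_url
    if honor_base and parser.base_href:
        base = urljoin(page_url, parser.base_href)
    return [urljoin(base, href) for href in parser.hrefs]


# Made-up page: relative links are meant to resolve against the site root.
PAGE_URL = "http://example.com/blog/2011/post.html"
HTML = (
    '<html><head><base href="http://example.com/"></head>'
    '<body><a href="about.html">About</a></body></html>'
)

if __name__ == "__main__":
    print(resolve_links(PAGE_URL, HTML, honor_base=False))
    print(resolve_links(PAGE_URL, HTML, honor_base=True))
```

Without the base tag, the relative link is resolved against the post's own directory, which points at a page that probably doesn't exist.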
Simon Brown says:

March 23, 2011 at 16:17

Most of the broken links detected on my site:

http://findbroken.com/alerts/12
MK Safi says:

March 23, 2011 at 21:02

I have an intentionally placed broken link on my site (to point out to visitors that a certain software documentation is always unavailable). Does it automatically ignore a broken link after a few days — or can I manually ignore it? Thanks!
White Shadow says:

March 23, 2011 at 21:13

There’s currently no way to ignore specific links, but that’s on my to-do list. Hopefully, I’ll get to it sometime next week.

Edit: And no, it doesn’t automatically ignore old broken links.
White Shadow says:

March 25, 2011 at 00:02

@Simon Brown: <base> tag support has been added. Those false positives should go away within 24 hours.
MK Safi says:

May 18, 2011 at 18:30

I’m actually loving this service. It lets me know whenever a link breaks. But I really need to check my affiliate links now with their redirects. Does it do that now or not yet?
White Shadow says:

May 18, 2011 at 18:58

It now follows redirects (and has for a few weeks, actually), but doesn’t detect changes in redirect destination.
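As a rough sketch of what "following redirects" means for a chain of 301s: the code below walks a simulated chain and reports the final status. Everything in it is an assumption made for illustration (the lookup table stands in for real HTTP requests, and the URLs, the hop limit and the helper names are made up).

```python
# Simulated responses: url -> (status code, Location header or None).
# All URLs and statuses here are invented for the example.
FAKE_RESPONSES = {
    "http://example.com/go/widget": (301, "http://tracker.example.net/r?id=7"),
    "http://tracker.example.net/r?id=7": (301, "http://shop.example.org/widget"),
    "http://shop.example.org/widget": (404, None),
}

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve(url, responses, max_hops=10):
    """Follow redirects and return (final_url, final_status).

    A status of None means the chain looped or exceeded max_hops.
    URLs missing from the table are treated as 404.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            return url, None  # redirect loop
        seen.add(url)
        status, location = responses.get(url, (404, None))
        if status not in REDIRECT_CODES or location is None:
            return url, status
        url = location
    return url, None  # too many hops


def is_broken(url, responses):
    """A link counts as broken if the chain fails or ends in a 4xx/5xx."""
    _, status = resolve(url, responses)
    return status is None or status >= 400


if __name__ == "__main__":
    print(resolve("http://example.com/go/widget", FAKE_RESPONSES))
    print(is_broken("http://example.com/go/widget", FAKE_RESPONSES))
```

Note that this only answers "is the final destination broken?". It says nothing about whether the final destination is the one you intended.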

I still can’t figure out how to do that efficiently, and there are plenty of non-obvious questions.

– What if the redirect changes more than once?
– What about “good” changes, like moving a product to a different domain name?
– What about redirects that you don’t care about?
– What about situations where the redirect URL doesn’t change, but the page content gets replaced with something “undesirable” – like the domain expiring and being picked up by a squatter that keeps the same URLs but only serves ads?

That said, something like this might work:
```
click here
```
Or perhaps:
```
click here
```
MK Safi says:

May 18, 2011 at 20:19

In my last comment I was asking if it could merely follow redirects and notify if final destination is broken. I’m really happy it does that now. Thank you!

About following affiliate redirects and detecting changes in destination, I originally envisaged that the user would manually specify the intended destination (like the markup examples you showed), but now I think it’s a lot better if the service notified the user of destination changes without the user having to specify a destination for every link.

User can then ignore a changed link, like how you can ignore some broken links right now.

– What do you mean by if “the redirect changes more than once”? (Do you mean that you wanna make change detection automatic, but what if the link changes multiple times? Maybe store link destination at first crawl. If at 2nd crawl the destination is different, notify user. If at 3rd crawl destination goes back to original, hmmm, remove notification? But you know those plugins and even services that let affiliates split-test destinations of the same link to see which converts best. These could cause a problem. So, maybe, you’ll have to have an “ignore permanently” option.)

– “Good” changes should generate notifications, I think. They don’t happen often and the way you currently handle ignoring links is very convenient — just press the little x.

– What are the redirects that you don’t care about?

– About situations where the destination is the same but content changes, well, it’s probably very difficult to detect that. Maybe when the service is very popular, this feature can be part of the $600/month platinum package.

I don’t have nearly as much experience in this as you do, so I don’t know how difficult this feature is to implement. But from a user standpoint I think this would be a really nice feature to have if it worked intuitively and reliably.

Thanks a lot!
White Shadow says:

May 18, 2011 at 21:31

Yes, by “changes more than once” I mean situations where the redirect changes multiple times, possibly going back to the original URL for a while, then changing to an entirely different one, etc.

It looks like the app would have to keep a full list of changes that have ever occurred for every URL, which can get unwieldy quickly. It’s also unclear how “ignore” should work in these situations – would it ignore that particular change, all changes up until then, all changes on that URL, …? Similarly, what if the user later removes their site from FindBroken, then re-adds it – do the “ignores” and link history get preserved? What if the same redirected link exists on multiple sites owned by multiple users? What about…
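To make the problem concrete, here is one possible (and deliberately naive) way to answer some of those questions in code. It is a sketch, not a design decision: the `RedirectHistory` class is hypothetical, and it arbitrarily assumes that the first crawl defines the "original" destination, that returning to the original is not worth a notification, and that "ignore" applies to one specific destination.

```python
from dataclasses import dataclass, field


@dataclass
class RedirectHistory:
    """Destinations seen for one link, oldest first (hypothetical model)."""

    destinations: list = field(default_factory=list)
    ignored: set = field(default_factory=set)

    def record(self, final_url):
        """Store a crawl result; return True if the user should be notified."""
        if not self.destinations:
            self.destinations.append(final_url)  # first crawl is the baseline
            return False
        changed = final_url != self.destinations[-1]
        if changed:
            self.destinations.append(final_url)
        original = self.destinations[0]
        return changed and final_url != original and final_url not in self.ignored

    def ignore(self, final_url):
        """Assumed semantics: never notify about this destination again."""
        self.ignored.add(final_url)


if __name__ == "__main__":
    history = RedirectHistory()
    for url in ["a", "a", "b", "a", "b", "c"]:
        print(url, history.record(url))
    print(history.destinations)
```

Even this toy version has to store every change for every link, and each of its assumptions is a place where a user's expectations could differ.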

Sorry, I tend to do that 😉 It probably looks like just so much nit-picking, but all of those tiny questions and edge cases have to be handled somehow to arrive at something one can actually implement in code. And the answers better match the users’ intuitive, fuzzy model of how things “should” work or your new feature will end up annoying them instead of helping them.

Yes, a programmer can sometimes just “fill in the gaps” on his/her own, but so far, I haven’t been able to figure out how to do that for this feature.

Okay, rant over.

As to detecting content changes, it would be quite easy code-wise – just hash the page contents on every check and see if the hash changes. But it would also give you an incredible number of useless notifications because it would detect even single-character changes. Alas, algorithmically and reliably figuring out what changes are important is extremely hard (as in, PhD-level hard).
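A minimal sketch of the hashing idea, assuming SHA-256 as the hash (any stable hash would do) and made-up page contents:

```python
import hashlib


def content_fingerprint(page_body: bytes) -> str:
    """Return a hex digest that changes whenever any byte of the page changes."""
    return hashlib.sha256(page_body).hexdigest()


def has_changed(previous_fingerprint: str, page_body: bytes) -> bool:
    """Compare the stored fingerprint against the freshly fetched page."""
    return content_fingerprint(page_body) != previous_fingerprint


if __name__ == "__main__":
    stored = content_fingerprint(b"<p>Price: $10</p>")
    # Same content: no notification.
    print(has_changed(stored, b"<p>Price: $10</p>"))
    # A single-character edit is enough to trigger a "change".
    print(has_changed(stored, b"<p>Price: $11</p>"))
```

This is exactly why the approach is noisy: a rotating ad, a timestamp or a comment counter would flip the hash on every check.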
