Take the top 100 blogs from Technorati, scrape some connectedness statistics from Yahoo! Site Explorer, mix in a bit of improvised PHP scripting and Processing magic, and we’ve got ourselves a map of the equatorial blogosphere 🙂
About The Map
This graph shows the connectedness level of some (see rant below) of the most popular blogs. Each node represents a blog. The diameter of a circle is proportional to the blog’s Technorati authority. Each line represents the volume of links from one blog to another – a thicker line means more links. Also, a line acts like a spring, trying to pull together closely interlinked blogs and keep apart those that share only a few links (technically speaking, the natural length of the spring is inversely proportional to the logarithm of the number of links between two sites). The layout is automatically generated based on the forces exerted by the springs.
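To make the spring rule a bit more concrete, here is a minimal Python sketch of "natural length inversely proportional to the logarithm of the link count". It is not the code behind the map: the function name `rest_length` and the constants `scale` and `max_length` are made up for illustration, and capping the length for pairs with fewer than two links is my own assumption (the logarithm of 1 is zero, so something has to give).

```python
import math

def rest_length(links, scale=200.0, max_length=400.0):
    """Natural spring length for a pair of blogs sharing `links` links.

    scale and max_length are illustrative constants (assumptions),
    not values taken from the original Processing sketch.
    """
    if links < 2:
        # log(1) == 0, so cap the length instead of dividing by zero.
        return max_length
    # More links -> larger logarithm -> shorter spring -> blogs sit closer.
    return min(max_length, scale / math.log(links))
```

With these made-up constants, a pair sharing 100 links gets a spring half as long as a pair sharing 10, since log(100) is twice log(10).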
The Making Of The Map
Ah, I wish I had the copywriting skills to put a pop-sci spin on this, or something of the kind. One can only dream. Anyway, the initial idea was quite simple – the map of the Internet is cool, let’s make something like that. However, it soon turned out I didn’t have the faintest idea about weighted graph visualization and related topics, so I wasted several days on this little experiment and learned (well, played with) a new programming language in the process.
Collecting the data was easy. I scraped the Top 100 list from Technorati and put it in a database. Then I went over all the records and determined how many times any particular blog linked to any other. You can get this info from Yahoo! with a query like this: “site:firstblog.com linkdomain:secondblog.com”. The downside is that this query won’t work if one of the blogs is in a subdirectory of the main domain (subdomains are fine though). So I had to skip 16 of the Top 100 blogs that didn’t have their own (sub)domain.
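For the curious, the query-building step can be sketched roughly like this. The real thing was done in PHP against a MySQL table, so treat this Python version as an illustration only: the function names (`has_own_domain`, `link_query`, `all_queries`) are hypothetical, and the sketch only builds the query strings; it doesn't talk to Yahoo! at all.

```python
from urllib.parse import urlparse

def has_own_domain(blog_url):
    """True if the blog sits at the root of a (sub)domain, not in a subdirectory."""
    path = urlparse(blog_url).path
    return path in ("", "/")

def link_query(source_url, target_url):
    """Build the 'site:... linkdomain:...' query for links from source to target."""
    source = urlparse(source_url).netloc
    target = urlparse(target_url).netloc
    return f"site:{source} linkdomain:{target}"

def all_queries(blog_urls):
    """One query per ordered pair of usable blogs (A -> B and B -> A are separate)."""
    usable = [u for u in blog_urls if has_own_domain(u)]
    return [link_query(a, b) for a in usable for b in usable if a != b]
```

Calling `all_queries` on a list that includes something like `http://example.com/blog/` silently drops that entry, which is the same fate the 16 subdirectory blogs met.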
The hard part was displaying the collected statistics. I wanted a program/algorithm that would automatically organize the sites so that blogs that frequently link to each other were displayed close together, etc. After hours of googling I finally stumbled upon the right key phrase, “force-directed layout”, and an example implementation of the algorithm in the Processing programming language. Now I just needed to learn a Java-based language that I had never heard of before.
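If “force-directed layout” sounds mysterious, the core idea fits in a few lines. Below is a stripped-down Python sketch that uses springs only: every spring nudges its two endpoints toward its natural length, over and over, until things settle. This is an assumption-laden toy, not the algorithm from the Processing example or the physics library I ended up using; the function name `relax` and the `steps` and `stiffness` parameters are invented for illustration.

```python
import math

def relax(positions, springs, steps=500, stiffness=0.1):
    """Spring-only layout relaxation (illustrative, not the original code).

    positions: {name: (x, y)} starting coordinates
    springs:   [(name_a, name_b, rest_length), ...]
    """
    pos = {name: [x, y] for name, (x, y) in positions.items()}
    for _ in range(steps):
        for a, b, rest in springs:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
            # Hooke's law: correction proportional to how far the spring
            # is stretched (positive) or compressed (negative).
            move = stiffness * (dist - rest) / 2
            ux, uy = dx / dist, dy / dist
            # Each endpoint takes half of the correction.
            pos[a][0] += move * ux
            pos[a][1] += move * uy
            pos[b][0] -= move * ux
            pos[b][1] -= move * uy
    return {name: (x, y) for name, (x, y) in pos.items()}
```

Two nodes joined by one spring simply drift until they are exactly one rest length apart; with many competing springs the result is a compromise, which is what makes the picture interesting.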
Luckily, Processing is a relatively simple, graphics-oriented language. Despite the fact that I hadn’t written a single line of Java before, I was able to create the image you see above in just two days. The source code is obviously full of horrible hacks and is mostly undocumented, but if you want to take a look at it, download links are below 🙂
Note: In addition to the sites that were skipped because they didn’t have their own domain, the layout algorithm also threw away (literally) some very weakly connected sites. So don’t fret if you don’t see the entire Top 100 in the map.
- Graph layout code + data in XML format (RAR, 126 KB)
- Link analysis and favicon download PHP scripts (RAR, 4 KB)
- traer.animation library for Processing
The PHP scripts are intended mainly for educational purposes. They won’t work “out of the box” because they were built to interact with my MySQL database (not included).