The Statistics on Link Rot

Maciej Cegłowski, the creator of the bookmarking website Pinboard.in, sampled 4,200 random URLs spanning a 14-year period to gather statistics on the extent of link rot and how it progresses over time. He wanted to find out whether there is some sort of ‘half-life of links’; instead he found that links deteriorate quickly and at a fairly linear rate:

Links appear to die at a steady rate (they don’t have a half life), and you can expect to lose about a quarter of them every seven years.
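
To make the difference between a steady rate and a half-life concrete, here is a rough sketch that compares the two. The only figure taken from the quote is the loss of a quarter of links every seven years; the exponential model, the function names and the time points are my own assumptions, and anything beyond the 14 years covered by the sample is pure extrapolation.

```python
def linear_survival(years, loss_per_period=0.25, period=7.0):
    """Fraction of links still alive if a fixed share of the ORIGINAL set dies each period."""
    return max(0.0, 1.0 - loss_per_period * years / period)


def exponential_survival(years, loss_per_period=0.25, period=7.0):
    """Hypothetical alternative: a fixed share of the REMAINING links dies each period.

    This is the half-life style decay that the quote says links do not follow.
    """
    return (1.0 - loss_per_period) ** (years / period)


if __name__ == "__main__":
    # Years past 14 go beyond the sampled period and are extrapolation only.
    for years in (7, 14, 21, 28):
        print(
            f"{years:>2} years: "
            f"linear {linear_survival(years):.2f}, "
            f"exponential {exponential_survival(years):.2f}"
        )
```

The two models agree at seven years and drift apart after that: under steady loss half the links are gone at 14 years, while under half-life style decay a little over half would still be alive.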

And even that is an optimistic result, says Maciej, as not all dead links could be detected programmatically. Several questions also remain unanswered:

  • How many of these dead URLs are findable on archive.org?
  • What is the attrition rate for shortened links?
  • Is there a simple programmatic way to detect parked domains?
  • Given just a URL, can we make any intelligent guesses about its vulnerability to link rot?
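
The first of these questions could probably be approached with the Internet Archive’s Wayback Machine availability endpoint. The sketch below is not how Maciej gathered his data. It assumes the endpoint lives at https://archive.org/wayback/available and returns JSON with an archived_snapshots.closest object, which is how I understand the public documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url):
    """Return the availability-API request URL for the given (possibly dead) link."""
    return WAYBACK_API + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload):
    """Extract the archived copy's URL from a decoded API response, or None.

    Assumes the response shape {"archived_snapshots": {"closest": {...}}},
    with an empty "archived_snapshots" object when nothing is archived.
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timeout=10):
    """Ask the API over the network whether an archived copy of url exists."""
    with urllib.request.urlopen(build_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling find_archived_copy on each dead URL in a sample and counting the results that are not None would give a first estimate, bearing in mind that an archived snapshot may itself be an error page.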

Interestingly, link rot is what inspired the creation of Pinboard.in (it features page archiving functionality). This is similar to why I started Lone Gunman: I was losing track of interesting links and articles, and wanted a way to easily find them again and to help me build connections between disparate articles and topics.