The Statistics on Link Rot

By sampling 4,200 random URLs spanning a 14-year period, Maciej Cegłowski, the creator of the bookmarking website Pinboard.in, gathered statistics on the extent of link rot and how it progresses over time. Interested in finding out whether there is some sort of ‘half life of links’, he found instead a fairly linear, fast deterioration:

Links appear to die at a steady rate (they don’t have a half life), and you can expect to lose about a quarter of them every seven years.

And even that is an optimistic result, says Maciej, as not all dead links could be detected programmatically. There are also several unanswered questions:

  • How many of these dead URLs are findable on archive.org?
  • What is the attrition rate for shortened links?
  • Is there a simple programmatic way to detect parked domains?
  • Given just a URL, can we make any intelligent guesses about its vulnerability to link rot?
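
The first question, and the caveat about programmatic detection, can be sketched roughly in Python. This is only an illustration, not Maciej's method: the function names are hypothetical, the dead-link check is a plain HEAD request, and the archive lookup assumes the Wayback Machine's public availability endpoint and its `archived_snapshots`/`closest` JSON shape.

```python
import http.client
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def parse_wayback_response(payload):
    """Return the closest snapshot URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(url, timeout=10.0):
    """Ask archive.org whether it holds a snapshot of `url` (needs network access)."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{WAYBACK_API}?{query}", timeout=timeout) as resp:
        return parse_wayback_response(json.load(resp))


def looks_dead(url, timeout=10.0):
    """Crude check: True if the URL is malformed, unreachable, or returns an HTTP error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status >= 400
    except (OSError, ValueError, http.client.HTTPException):
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses;
        # DNS failures and timeouts are OSErrors too; ValueError means a malformed URL.
        return True
```

A check like this only catches links that fail loudly. A parked domain or a "soft 404" page that answers with status 200 still counts as alive, which is one reason the measured rate is optimistic.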

Interestingly, link rot is what inspired the creation of Pinboard.in (it features page archiving functionality). This is similar to why I started Lone Gunman: I was losing track of interesting links and articles, and wanted a way to easily find them again, as well as to help me build connections between disparate articles and topics.
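
To make the difference between a steady rate and a half life concrete, here is a small Python sketch comparing the two models. It takes the quoted figure (about a quarter lost every seven years) at face value. The function names are made up for illustration, and projecting either curve beyond the 14 years covered by the sample is purely an assumption.

```python
def linear_survival(years, loss_per_period=0.25, period=7.0):
    """Fraction of links still alive if a fixed share of the ORIGINAL set dies each period."""
    return max(0.0, 1.0 - loss_per_period * years / period)


def half_life_survival(years, loss_per_period=0.25, period=7.0):
    """Fraction still alive if the same share of the REMAINING links died each period."""
    return (1.0 - loss_per_period) ** (years / period)


if __name__ == "__main__":
    for years in (7, 14, 21, 28):
        print(years, round(linear_survival(years), 3), round(half_life_survival(years), 3))
```

The two models agree at seven years (three quarters of links survive) and then drift apart: under steady loss half the links are gone after 14 years, whereas exponential decay at the same initial rate would still leave a little over 56%.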

The Intricacies and Joys of Arabic

I imagine that most people with a passing interest in linguistics read Maciej Cegłowski’s short essay in praise of the Arabic language when it was ‘rediscovered’ by popular social networks a few months ago.

As one who has studied Arabic (albeit only Modern Standard Arabic, and only for nine months or so), the essay brought back fond memories of struggling to comprehend the strange-yet-wonderful intricacies of the Arabic language. Here are just a few of the ways that Arabic “twists healthy minds”, according to Cegłowski:

  • The Root/Pattern System: Nearly all Arabic words consist of a three-consonant root slotted into a pattern of vowels and helper consonants.
  • Broken Plurals: Most of the time to make a plural you have to change the structure of the word quite dramatically.
  • The Writing System: The Arabic writing system is exotic looking but easy to learn, which is a rare combination.
  • Dual: Arabic has a grammatical dual, a special form for talking about two of something.
  • The Feminine Plural: Formal Arabic distinguishes between groups composed entirely of women and groups that contain one or more men.
  • Crazy Agreement Rules: e.g. [Maciej’s] absolute favorite is that all non-human plurals are grammatically feminine singular.
  • Funky Numbers: ٩ ٨ ٧ ٦ ٥ ٤ ٣ ٢ ١ – The names of the numbers come with truly terrifying agreement rules, like “if the number is greater than three but less than eleven, it must take the opposite gender of the noun that it modifies”.
  • Diglossia: This is where it really helps to love language study.
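
For the programmers in the audience, the root/pattern system from the first bullet can be sketched as a toy in Python. This is a simplification of my own, not something from the essay: it uses Latin transliteration, the well-known root k-t-b (to do with writing), and a hypothetical `apply_pattern` helper in which the digits 1, 2 and 3 mark the slots for the three root consonants.

```python
def apply_pattern(root, pattern):
    """Slot the three root consonants into the digit placeholders 1, 2, 3 of a pattern."""
    c1, c2, c3 = root
    return pattern.replace("1", c1).replace("2", c2).replace("3", c3)


# Transliterated root k-t-b; the glosses are rough English equivalents.
ROOT_KTB = ("k", "t", "b")
PATTERNS = {
    "1a2a3a": "he wrote",
    "1i2ā3": "book",
    "1ā2i3": "writer",
    "ma12a3": "office, desk",
    "ma12ū3": "written",
}

if __name__ == "__main__":
    for pattern, gloss in PATTERNS.items():
        print(apply_pattern(ROOT_KTB, pattern), "=", gloss)
```

Running it yields kataba, kitāb, kātib, maktab and maktūb. Real Arabic morphology is far messier (weak roots, assimilation and so on), so treat this as a picture of the idea rather than a working morphological model.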