The Statistics on Link Rot

By sampling 4,200 random URLs spanning a 14-year period, Maciej Cegłowski, the creator of the bookmarking website Pinboard.in, gathered statistics on the extent of link rot and how it progresses over time. Interested in finding out whether there is some sort of ‘half life of links’, he found instead that deterioration is fairly linear and fast:

Links appear to die at a steady rate (they don’t have a half life), and you can expect to lose about a quarter of them every sev­en years.

And even that is an optimistic result, says Maciej, as not all dead links could be detected programmatically. There are also several unanswered questions:

  • How many of these dead URLs are find­able on archive.org?
  • What is the attri­tion rate for shortened links?
  • Is there a simple pro­gram­mat­ic way to detect parked domains?
  • Given just a URL, can we make any intelligent guesses about its vulnerability to link rot?
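
To make the "steady rate" concrete, here is a minimal sketch contrasting the linear loss quoted above with what a half-life model would predict. The function names, and the choice to calibrate the half-life curve to the same 25%-in-seven-years point, are assumptions made for illustration; they are not taken from Maciej's analysis.

```python
def surviving_linear(years, loss_per_period=0.25, period_years=7.0):
    """Fraction of links alive if a fixed share of the ORIGINAL set dies each period."""
    return max(0.0, 1.0 - loss_per_period * years / period_years)


def surviving_exponential(years, loss_per_period=0.25, period_years=7.0):
    """Fraction alive if a fixed share of the REMAINING links dies each period (half-life model)."""
    return (1.0 - loss_per_period) ** (years / period_years)


# Compare the two models at a few horizons (years since the links were saved).
for years in (7, 14, 28):
    print(years, round(surviving_linear(years), 3), round(surviving_exponential(years), 3))
```

The two models agree at seven years and then diverge: the linear reading loses half the links by year 14, while the half-life curve would still retain a little over 56%. Extrapolating either curve past the 14-year sample is speculation, since the data does not cover that range.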

Interestingly, link rot is what inspired the creation of Pinboard.in (it features page archiving functionality). This is similar to why I started Lone Gunman: I was losing track of interesting links and articles, and wanted a way to easily find them again, as well as to help me build connections between disparate articles and topics.

How to Internet: Epilogue

I’ve only scratched the sur­face of things that you may or may not want to do on the inter­net. I know that, I accept that, and I hope you don’t mind.

Two things I might have liked to address but didn’t: podcasts and Twitter. Both were dropped in favor of what I did address because they’re rather easier and better known than the topics I wrote about. For 90% of podcast listeners, iTunes does “podcatching” so effortlessly that they don’t know that’s a word. Twitter is world-famous and pretty well understood, so my advice would mostly be superfluous.

But what I want to take a second to say is this: don’t wait for perfect understanding of something to give it a try. As Merlin Mann makes clear, the first time (perhaps the first several times) you do something, you’ll be really terrible at it. As Ze Frank said, saving up ideas with nothing but the notion that you’ll one day execute them perfectly and be greeted with immense volumes of praise and money is a sure recipe for stagnation.

The internet’s the native home for amateurs. It’s a place where 90% of the stuff is made by people who could never have convinced someone to pay them for what they built, but who felt a strong enough desire to build it that they put it out here on the web for us. The purpose of learning How to Internet is so that you can better deal with the wealth and diversity of stuff that exists on the internet and use it to entertain, inform, and improve yourself.

The inter­net is a freer place than any oth­er because of the twin engines of anonym­ity and low costs of entry. Surely anonym­ity has prob­lems, which /b/ shows well, but it also cre­ates scary bril­liance. Ima­gine how unlikely someone would have been to pub­lish LOLcats if they were risk­ing their repu­ta­tion on it.

A low bar­ri­er to entry makes it pos­sible in a way it nev­er was to be only con­strained by your effort. This is incred­ibly empower­ing and a little scary. Nev­er before have you been so able to rise through a rather pure mer­ito­cracy, nev­er before have you been so unable to blame some gate­keep­er for your lack of suc­cess.

Great things are afoot on the inter­net. Mind-bend­ingly great things are pro­duced every single second of the day and put on the inter­net. What I hope I man­aged to give you this week was a com­pet­ent sampling of the tools you can use to find, fol­low, and share those great inter­net things you love.

Thanks for your time and atten­tion.

How to Internet: Publishing

As you get bet­ter at the inter­net, you’ll likely start to feel a desire to share some­thing with the world. Thank­fully, the inter­net is awash with tech­no­lo­gies that make that easy and pain­less.

Out­side of Face­book, the can-be-used-for pub­lish­ing plat­form that most civil­ians are likely to have heard about is Twit­ter, which hardly qual­i­fies as a pub­lish­ing plat­form. If you’re ever look­ing for an old tweet, you’ll quickly real­ize that the medi­um is built to be short-lived. That’s not an inher­ently bad thing, but any­one who has the com­pul­sion to record their thoughts in a pub­lic way prob­ably doesn’t want to do so on such an eph­em­er­al plat­form. Add to that the char­ac­ter lim­it and I would con­tend that any­one try­ing to use Twit­ter for much more than fool­ing around is act­ing fool­ishly. So, one won­ders, how do I pub­lish things in a pub­lic way so they can be found later?

My answer, at least for any word pub­lish­ing (I’ve nev­er tried to pub­lish lots of pho­tos, video, or audio, so I can offer no expert­ise) is to use either Tumblr or Word­Press (either fla­vor).

Lloyd has a Tumblr, which I like, and it illus­trates one of the cent­ral strengths of Tumblr. For pulling togeth­er dis­par­ate media types and pub­lish­ing them quickly, I don’t think a bet­ter tool exists. And even though it was really built for that, there are oth­er ways to use Tumblr. More than a few hip design­er-types use it for blogs very much like this one.

But compared to WordPress, Tumblr’s features for a complete personal blog are somewhat lacking. It’s certainly not terrible, it’s just not as awesome and adaptable as a self-hosted installation of WordPress. Lone Gunman is online because of a self-hosted WordPress installation, as are my sites. Self-hosted WordPress offers a wealth of features Tumblr doesn’t have, like automatic post revisions, full category and tag support, and the ability to access your posts in thousands of different ways with just a little PHP know-how.

But if you’re just getting started, self-hosted does have the serious downside of requiring you to have and maintain your own server space. That’s where WordPress.com comes in. It’s more directly comparable to Tumblr, only requiring you to create a login for it to work, but it also offers features like post revisions, a great full-screen writing view, and a bevy of things not mentioned. (If you’re interested, I recently made a longer write-up of the Tumblr vs WordPress.com question.)

Lest we forget, there are also a number of tools other than those two, both free and paid. Notable free ones include Google’s Blogger (which, after what feels like a decade of neglect, finally has an interesting-looking future), Posterous, Joomla, LiveJournal, and Drupal. Some paid ones are Typepad and Movable Type (technically free or paid), Squarespace, and ExpressionEngine. In both categories there are certainly even more I can’t think of. I don’t have enough experience with any of those to offer much guidance about them, but if you don’t like Tumblr or WordPress, they’re all certainly viable options.

Really, though, the importance of the tool you use to publish pales in comparison to the way in which you use it. An active Tumblr may be marginally worse for long-form writing than WordPress, but it’s vastly better than a disused WordPress site. Staying active is hard work, and I don’t have nearly the room to cover it this week. If you’re looking to actually get some help with that, please allow me to recommend Merlin Mann’s oeuvre, and particularly this little riff about making the clackity noise.

What you should write about, when, with what fre­quency, those are all non-trivi­al ques­tions, but I’d again emphas­ize that they pale in com­par­is­on to the import­ance of doing work rather than think­ing about it.

And a final point: writ­ing, espe­cially on the inter­net, is hardly the quick­est path to fame and for­tune. If you’re only inter­ested in pub­lish­ing stuff on the inter­net for that reas­on, get out now. The prob­ab­il­ity you’ll find more than heart­break and frus­tra­tion down that road to fame is lot­tery-win­ning small.

I don’t mean to end on a crushing note. There’s huge value in internet publishing beyond its minute potential for saving you from ever needing “a real job.” But for a while I thought it would have that potential for me and it didn’t. Instead, what I got was an unexpected community of people to learn from, and a chance to work with people like Lloyd. People interested in making good stuff on the internet, even if it never gets us anything. That’s the reason to try your hand at web-publishing: it’s a beach-head onto the wider world of substantive accomplishment and relationships in a way that no Twitter account or Facebook page is. But it hardly guarantees you anything but a modest square of sand.

How to Internet: Reading

One of the first problems you’re likely to run across as someone who’s now finding lots of interesting things on the internet is that you’re amassing more stuff you want to read than you’ve ever had before and it’s getting hard to track. If you’re like I was for about five years, this will likely take the form of having 80 tabs open persistently, causing your browser to be slow and your potential for catastrophic data loss to be high.

There are three big obstacles to get­ting read­ing done on the inter­net. The first, and hard­est to fix tech­nic­ally, is your con­text. That is: if you’re used to just get­ting on the inter­net to offer con­stant par­tial atten­tion to your brows­ing while instant mes­saging, listen­ing to music, and watch­ing video clips, set­tling in to a multi-page essay will feel very dif­fi­cult. So too, if you fre­quently focus only on the inter­net, but click like mad and just skim everything, read­ing will feel broken to you.

There are two solutions to this problem: change your situation and change your mind. Frequently, people who find themselves unable to focus at the computer will find themselves much more able to do so on a tablet, e-reader, or even phone because they have different habits there. This is a subtle and automatic way to change what you’re expecting on the internet without expending the mental effort that the other option requires, which is just to put some effort into calming your mind and allowing yourself to focus. (Like most things I’ve written about this week, whole books could be written about this paragraph.)

The second obstacle is in some sense the most mundane, but if one is to judge by the amount it gets talked about, also the most frustrating. If you spend much time at all trying to read on the internet you’ll soon notice the frequency with which publishers (especially those coming from other media) divide their content to maximize page views. A 1,000-word article split over ten pages is a good way to drive page views but terrible for reader satisfaction. There are a number of ways to un-paginate an article (browser extensions, web services, and local software all exist to do this parsing for you), but the most used is simply the printer-friendly view that most such sites provide.

But that solution gets us to the final notable problem, which is that many pages on the internet that house articles you want to read weren’t really built for reading. Probably the most important way in which they aren’t is that they have (visually) loud ads and other content surrounding them that pulls your eye and attention away from reading. Another problem is poorly set type: type set too small or too large, type set in very wide columns so you constantly lose your place (especially common on printer-friendly pages), and poor contrast between the type and the background. I believe that these problems are today best solved with Readable. What Readable offers is a bookmarklet (a bit of JavaScript disguised as a bookmark) that automatically changes any page on the internet to exactly the formatting you’ve told it you want pages to have for reading. This concept first came from Readability, but that has subsequently become a far more feature-full and complex tool.

Finally, we need to tackle that tab overload issue, because even as browsers get better at not losing such data, they still do. And, as people get more and more powerful mobile phones and tablets, keeping everything on your desktop is ever less feasible. The best solution I know of is to effectively outsource your tabs. Send all of them off to a bookmarking tool, be it delicious, Pinboard, normal bookmarks (with or without syncing), or a tool that’s purpose-built to handle all those articles you want to read.

Instapaper is what I use, but it’s optimized for an Apple-centric technical environment. It’s great if you want to read articles offline on an iPad or iPhone, but it doesn’t have native clients for any other platform. Readability, which was mentioned earlier, is a more platform-agnostic alternative (by virtue of a web app) which offers the nice perk that you automatically pass on a portion of your membership cost to the publishers you most frequently use the service to read. (Though the fact that you’re paying for membership is a non-trivial downside.) Beyond those there are a number of other services built for this purpose, the most prominent of which is Read it Later. I have no experience or expertise at all with any of this last class.

I hope you now under­stand the import­ance of the triple threat of the print­er-friendly view, in-situ reformat­ter, and the read­ing-cent­ric book­mark­ing ser­vice. Far more import­antly, I hope you’ve found a solu­tion to your most frus­trat­ing struggle in actu­ally read­ing all that great web-con­tent you’re now find­ing.

How to Internet: Staying Current

For the uninitiated, phrases like “Subscribe to this Blog”, “RSS feed”, and “Feed Reader” are just so much noise. So here’s a very short explanation: you use a “feed reader” to “subscribe” to a blog using its “RSS feed”. Make sense?

To use a slightly more analog story, you can think of this whole thing as a way to build a newspaper of your choosing. (That’s the feed reader.) You build this newspaper by choosing individual reporters who you like (RSS feeds), and then their content is automatically added to your newspaper every time they produce it. This can be, as you might guess, a much better way to know what’s happening at the sites you care about than manually trying to check them at an interval you care about.
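
The newspaper analogy can be sketched in a few lines: an RSS feed is an XML document listing a site's recent items, and a feed reader fetches it and shows what is new. The sample feed, URLs, and function name below are made up for illustration, and a real reader would download the XML from the blog's feed address rather than use an inline string.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for a blog's feed (hypothetical content).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def read_feed(xml_text, already_seen=()):
    """Return (title, link) pairs for items whose link has not been seen before."""
    channel = ET.fromstring(xml_text).find("channel")
    new_items = []
    for item in channel.findall("item"):
        link = item.findtext("link")
        if link not in already_seen:
            new_items.append((item.findtext("title"), link))
    return new_items


# Pretend the first post was read on an earlier visit; only the second is new.
print(read_feed(SAMPLE_FEED, already_seen={"http://example.com/1"}))
```

Remembering which links have already been seen is what lets a reader add only fresh items to your "newspaper" each time it checks.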

It’s prob­ably true, though I have no data on this, that RSS feeds are known to about 20% of inter­net users. And that among those 20%, about 80% use and enjoy them. That oth­er 20% doesn’t like them for a vari­ety of reas­ons and so uses some­thing else.

In most cases, “some­thing else” means some type of book­marks sys­tem. The most com­mon form of this is a flat set of book­marks that you pick through and vis­it as it strikes your fancy. A slightly improved ver­sion of this is a simple folder set where you reg­u­larly open the con­tents of your folders into tabs. This can be fur­ther enhanced by break­ing down said folders into the approx­im­ate fre­quency you want to vis­it the site, and then open­ing them on roughly this sched­ule.
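
As a rough sketch of that last refinement, folders can be tagged with a visit interval and opened only when they come due. The folder names, intervals, dates, and function name here are all invented for illustration.

```python
from datetime import date

# Hypothetical bookmark folders mapped to how often (in days) they should be opened.
FOLDER_INTERVALS = {"Daily": 1, "Weekly": 7, "Monthly": 30}


def folders_due(last_opened, today, intervals=FOLDER_INTERVALS):
    """Return the folders whose interval has elapsed since they were last opened."""
    due = []
    for folder, interval_days in intervals.items():
        days_since = (today - last_opened[folder]).days
        if days_since >= interval_days:
            due.append(folder)
    return due


# Example dates are arbitrary.
last_opened = {
    "Daily": date(2011, 6, 1),
    "Weekly": date(2011, 5, 28),
    "Monthly": date(2011, 5, 20),
}
print(folders_due(last_opened, today=date(2011, 6, 4)))
```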

The whole book­marks option is not use­less or totally fool­ish, but giv­en the choice I don’t under­stand why any­one would choose it. RSS feeds are a clearly bet­ter solu­tion as they make it pos­sible for you to nev­er miss any­thing, make it easy to save things to revis­it at a bet­ter time, and can be made massively flex­ible and mobile in a way that web­sites rarely are.

There were once other notable RSS readers, but today if you’re doing it you’re almost certainly utilizing Google Reader in some way. If you refuse, there are other solutions that exist: many email clients have RSS readers built in, most browsers let you set up RSS folders, and some standalone clients that don’t rely on Google exist. But because they’re so obscure and rarely used, I’m not going to explain them to you.

Google Read­er is the best option for in-browser RSS brows­ing, and it’s an even bet­ter option if you like out-of-browser RSS brows­ing (because so many cli­ents for smart­phones, tab­lets, and the desktop use it for syn­chron­iz­a­tion). Bey­ond the fact that you’ll want a Google Read­er account, there’s not much advice about tech­no­logy to give. If you find the browser ver­sion inad­equate you can find one of many cli­ents for your desktop, iPad, or Android phone. Any spe­cif­ic recom­mend­a­tions I may have about soft­ware are too plat­form spe­cif­ic for me to feel they’ll be valu­able to share.

But as someone who’s been using RSS feeds for about seven years, I have a recommendation about managing all that stuff that you’ll now find so easy to collect. All feeds can be understood as belonging to one of two categories. Noise is content that you like browsing but rarely care to pay careful attention to; for me this is things like The Awl, Gizmodo, and Boing Boing. Signal is stuff you’ll be quite sad to miss items from; for me, things like the sites I recommended yesterday. This is the basic type of folder system I recommend setting up in Google Reader.

A lot of people choose to only have Signal in their feed reader, and I do think that’s a valid way to deal with the very real danger of gathering an overwhelming volume of stuff that feeds create. But over the last couple years I’ve built a system that I think preserves much of the serendipity that makes the internet such a magical place but removes much of the too-much-stuff feeling that frequently goes along with it. My Signal & Noise system also works great for reading on the go.

Regardless of your feed volume, I think you want to stick to fewer than 100 new items coming in as “Signal” each day. This is the stuff that you most want to read, so keep it to a volume that you can really give careful attention. Signal is also the stuff you’ll cut last when you’re low on time to check these things, and you don’t really want it at so high a volume you have to cut it too.

Noise is your fail safe. When it all gets to feel like too much volume, you can mark all that Noise as read and feel little con­cern because you know you rarely find light­ning in there. But to my mind, you can eas­ily go through more than 1000 “Noise” items a day and you won’t feel much pain. (Though if you do have that much volume, I recom­mend you actu­ally have mul­tiple “Noise” folders, divided by top­ic area.) The time you spend on your Noise should come out about equal to what you spend on Sig­nal.

That’s because you can eas­ily “read” your Noise by rel­at­ively quickly glan­cing past the head­lines and click­ing just the 20 or so that strike your interest. Sort­ing your Sig­nal should inher­ently be harder, as it’s got a rather large pro­por­tion of things that you like, want to read care­fully, and maybe even spend a week think­ing about.

A final note on this system: because of the amount of stuff I churn through daily and the percentage of time that I do it without an internet connection (another advantage RSS has over websites), I personally find it useful to have an intermediate folder: a “Noisy Signal” folder of feeds in which between 1 in 5 and 1 in 20 items are ones I really care to see closely. That lets me keep together the interesting stuff I don’t have time to examine closely while on the go, for future examination beside my Signal folder. Whether or not that’s a valuable idea for you I’ll not speculate.
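
The folder scheme can be summarized as a small sorting rule. The only numbers taken from the text are the 1-in-5 to 1-in-20 band for "Noisy Signal"; treating anything above that band as Signal and anything below it as Noise is an assumption made for this sketch, as are the function name and the sample tallies.

```python
def classify_feed(items_cared_about, items_total):
    """Sort a feed into Signal, Noisy Signal, or Noise by its share of must-read items."""
    if items_total <= 0:
        raise ValueError("items_total must be positive")
    hit_rate = items_cared_about / items_total
    if hit_rate > 1 / 5:       # assumed: better than 1 in 5 is worth reading closely
        return "Signal"
    if hit_rate >= 1 / 20:     # from the text: between 1 in 5 and 1 in 20
        return "Noisy Signal"
    return "Noise"             # assumed: rarer than 1 in 20


# Hypothetical tallies of (items cared about, items seen) over some period.
feeds = {"Essay blog": (9, 10), "Link blog": (10, 100), "Gadget news": (5, 500)}
for name, (cared, total) in feeds.items():
    print(name, "->", classify_feed(cared, total))
```

In practice nobody tallies this precisely; the point is only that a feed's folder follows from how often its items turn out to matter to you.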

To wrap up, RSS feeds are your friend if you have an interest in following more websites than you can check manually at sane intervals. They can overwhelm if you jump in too deep, or without enough preparation. But using the Signal & Noise system, I see more on a daily basis than most people could even fathom, and it takes just a fraction of my time and energy. And any such advantage you can get, I recommend using.