The Statistics on Link Rot

Sampling 4,200 random URLs spanning a 14-year period, Maciej Cegłowski, the creator of the bookmarking website Pinboard.in, set out to gather statistics on the extent of link rot and how it progresses over time. Interested in finding out whether there is some sort of ‘half life of links’, he found instead a fairly linear, fast deterioration:

Links appear to die at a steady rate (they don’t have a half life), and you can expect to lose about a quarter of them every seven years.

And even that is an optimistic result, says Maciej, as not all dead links could be detected programmatically. There are also several unanswered questions:

  • How many of these dead URLs are findable on archive.org?
  • What is the attrition rate for shortened links?
  • Is there a simple programmatic way to detect parked domains?
  • Given just a URL, can we make any intelligent guesses about its vulnerability to link rot?
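
To make the steady rate quoted above concrete, here is a minimal Python sketch. It is not taken from Maciej’s analysis. It assumes that “a quarter every seven years” means a constant loss measured against the original set of links, and it compares that with a hypothetical half-life curve. The function names and the 14-year half-life are illustrative choices only, and anything past the 14 years that were sampled is extrapolation.

```python
# Assumption: "a quarter every seven years" is a constant loss,
# measured against the original set of links.
LOSS_PER_YEAR = 0.25 / 7


def surviving_linear(years: float) -> float:
    """Fraction of links still alive after `years` of steady (linear) loss."""
    return max(0.0, 1.0 - LOSS_PER_YEAR * years)


def surviving_half_life(years: float, half_life: float) -> float:
    """Fraction still alive if links instead decayed with a fixed half-life."""
    return 0.5 ** (years / half_life)


if __name__ == "__main__":
    # The 14-year half-life is hypothetical, picked so that both curves
    # meet at the 14-year mark and the difference in shape is easy to see.
    for years in (7, 14, 21):
        linear = surviving_linear(years)
        decay = surviving_half_life(years, half_life=14)
        print(f"{years:>2} years: linear {linear:.2f}, half-life {decay:.2f}")
```

Under these assumptions the two curves agree at 14 years. The linear one loses the same number of links every year, while the half-life one slows down as fewer links remain.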

Interestingly, link rot is what inspired the creation of Pinboard.in (it features page archiving functionality). This is similar to why I started Lone Gunman: I was losing track of interesting links and articles, and wanted a way to easily find them again as well as help me build connections between disparate articles and topics.

How to Internet: Epilogue

I’ve only scratched the surface of things that you may or may not want to do on the internet. I know that, I accept that, and I hope you don’t mind.

Two things I might have liked to address but didn’t: podcasts and Twitter. These were both kicked in preference to what I did address because they’re rather easier and better known than the topics I did write about. For 90% of podcast listeners iTunes does “podcatching” so effortlessly they didn’t know that was a word. Twitter is world-famous and pretty well understood, so my advice would mostly be superfluous.

But what I want to take a second to say is this: don’t wait for perfect understanding of something to give it a try. As Merlin Mann makes clear, the first time, perhaps times, you do something you’ll really be terrible at it. As Ze Frank said, saving up ideas with nothing but the notion that you’ll one day execute them perfectly and be greeted with immense volumes of praise and money is a sure recipe for stagnation.

The internet’s the native home for amateurs. It’s a place where 90% of the stuff is made by people who could never have convinced someone to pay them for what they built but felt a strong enough desire to build it that they put it out here on the web for us. The purpose of learning How to Internet is so that you can better deal with the wealth of that diversity of stuff that exists on the internet and use it to entertain, inform, and improve yourself.

The internet is a freer place than any other because of the twin engines of anonymity and low costs of entry. Surely anonymity has problems, which /b/ shows well, but it also creates scary brilliance. Imagine how unlikely someone would have been to publish LOLcats if they were risking their reputation on it.

A low barrier to entry makes it possible in a way it never was to be only constrained by your effort. This is incredibly empowering and a little scary. Never before have you been so able to rise through a rather pure meritocracy, never before have you been so unable to blame some gatekeeper for your lack of success.

Great things are afoot on the internet. Mind-bendingly great things are produced every single second of the day and put on the internet. What I hope I managed to give you this week was a competent sampling of the tools you can use to find, follow, and share those great internet things you love.

Thanks for your time and attention.

How to Internet: Publishing

As you get better at the internet, you’ll likely start to feel a desire to share something with the world. Thankfully, the internet is awash with technologies that make that easy and painless.

Outside of Facebook, the can-be-used-for-publishing platform that most civilians are likely to have heard about is Twitter, which hardly qualifies as a publishing platform. If you’re ever looking for an old tweet, you’ll quickly realize that the medium is built to be short-lived. That’s not an inherently bad thing, but anyone who has the compulsion to record their thoughts in a public way probably doesn’t want to do so on such an ephemeral platform. Add to that the character limit and I would contend that anyone trying to use Twitter for much more than fooling around is acting foolishly. So, one wonders, how do I publish things in a public way so they can be found later?

My answer, at least for any word publishing (I’ve never tried to publish lots of photos, video, or audio, so I can offer no expertise) is to use either Tumblr or WordPress (either flavor).

Lloyd has a Tumblr, which I like, and it illustrates one of the central strengths of Tumblr. For pulling together disparate media types and publishing them quickly, I don’t think a better tool exists. And even though it was really built for that, there are other ways to use Tumblr. More than a few hip designer-types use it for blogs very much like this one.

But compared to WordPress, Tumblr’s features for a complete personal blog are somewhat lacking. It’s certainly not terrible, it’s just not as awesome and adaptable as a self-hosted installation of WordPress. Lone Gunman is online because of a self-hosted WordPress installation, as are my sites. Self-hosted WordPress offers a wealth of features Tumblr doesn’t have, like automatic post revisions, full category and tag support, and the ability to access your posts in thousands of different ways with just a little PHP know-how.

But if you’re just getting started, self-hosted does have the serious downside of requiring you to have and maintain your own server space. That’s where WordPress.com comes in: it’s more directly comparable to Tumblr, requiring only that you create a login for it to work, but it also offers features like post revisions, as well as a great full-screen writing view, and a bevy of things not mentioned. (If you’re interested, I recently made a longer write-up of the Tumblr vs WordPress.com question.)

Lest we forget, there are also a number of tools other than those two, both free and paid. Notable free ones include: Google’s Blogger (which, after what feels like a decade of neglect, finally has an interesting-looking future), Posterous, Joomla, LiveJournal, and Drupal. Some paid ones are Typepad and Movable Type (technically free or paid), Squarespace, and ExpressionEngine. In both categories there are certainly even more I can’t think of. I don’t have enough experience with any of those to have much guidance about them, but if you don’t like Tumblr or WordPress, they’re all certainly viable options.

Really, though, the importance of the tool you use to publish pales in comparison to the way in which you use it. An active Tumblr may be marginally worse for long-form writing than WordPress, but it’s vastly better than a disused WordPress site. And that’s hard work that I don’t nearly have the ability to cover this week. If you’re looking to actually get some help with that, please allow me to recommend Merlin Mann’s oeuvre, and particularly this little riff about making the clackity noise.

What you should write about, when, with what frequency, those are all non-trivial questions, but I’d again emphasize that they pale in comparison to the importance of doing work rather than thinking about it.

And a final point: writing, especially on the internet, is hardly the quickest path to fame and fortune. If you’re only interested in publishing stuff on the internet for that reason, get out now. The probability you’ll find more than heartbreak and frustration down that road to fame is lottery-winning small.

I don’t mean to end on a crushing note. There’s huge value in internet publishing beyond its minute potential for saving you from ever needing “a real job.” But for a while I thought it would have that potential for me and it didn’t. Instead, what I got was an unexpected community of people to learn from, and a chance to work with people like Lloyd. People interested in making good stuff on the internet, even if it never gets us anything. That’s the reason to try your hand at web-publishing: it’s a beach-head onto the wider world of substantive accomplishment and relationships in a way that no Twitter account or Facebook page is. But it hardly guarantees you anything but a modest square of sand.

How to Internet: Reading

One of the first problems you’re likely to run across as someone who’s now finding lots of interesting things on the internet is that you’re amassing more stuff you want to read than you’ve ever had before and it’s getting hard to track. If you’re like I was for about five years, this will likely take the form of having 80 tabs open persistently, causing your browser to be slow and your potential for catastrophic data loss to be high.

There are three big obstacles to getting reading done on the internet. The first, and hardest to fix technically, is your context. That is: if you’re used to just getting on the internet to offer constant partial attention to your browsing while instant messaging, listening to music, and watching video clips, settling in to a multi-page essay will feel very difficult. So too, if you frequently focus only on the internet, but click like mad and just skim everything, reading will feel broken to you.

There are two solutions to this problem: change your situation and change your mind. Frequently people who find themselves unable to focus at the computer will find themselves much more able to do so on a tablet, e-reader, or even phone because they have different habits there. This is a subtle and automatic way to change what you’re expecting on the internet without expending the mental effort needed for the other option, which is just to put some effort into calming your mind and allowing yourself to focus. (Like most things I’ve written about this week, whole books could be written about this paragraph.)

The second obstacle is in some sense the most mundane, but if one is to judge by the amount it gets talked about, also the most frustrating. If you spend much time at all trying to read on the internet you’ll soon notice the frequency with which publishers (especially those coming from other media) divide their content to maximize page views. A 1,000-word article split over ten pages is a good way to drive page views but terrible for reader satisfaction. There are a number of ways to un-paginate an article (browser extensions, web services, and local software all exist to do this parsing for you), but the most used is simply the printer-friendly view that most such sites provide.

But that solution gets us to the final notable problem, which is that many pages on the internet that house articles you want to read weren’t really built for reading. Probably the most important way in which they aren’t is that they have (visually) loud ads and other content surrounding them that pulls your eye and attention away from reading. Another problem is poorly set type: type set too small or too large, type set in very wide columns so you constantly lose your place (especially common on printer-friendly pages), and poor contrast between the type and the background. I believe that these problems are today best solved with Readable. What Readable offers is a bookmarklet (a bit of JavaScript disguised as a bookmark) that automatically changes any page on the internet to exactly the formatting you’ve told it you want pages to have for reading. This concept first came from Readability, but that has subsequently become a far more feature-full and complex tool.

Finally, we need to tackle that tab overload issue, because even as browsers get better at not losing such data, they still do lose it. And as people get more and more powerful mobile phones and tablets, keeping everything on your desktop is ever less feasible. The best solution I know of is to effectively outsource your tabs. Send all of them off to a bookmarking tool, be it delicious, Pinboard, normal bookmarks (with or without syncing), or a tool that’s purpose-built to handle all those articles you want to read.

Instapaper is what I use, but it’s optimized for an Apple-centric technical environment. It’s great if you want to read articles offline on an iPad or iPhone, but doesn’t have native clients for any other platform. Readability, which was mentioned earlier, is a more platform-agnostic alternative (by virtue of a web app) which offers the nice perk that you automatically pass on a portion of your membership cost to the publishers you most frequently use the service to read. (Though the fact that you’re paying for membership is a non-trivial downside.) Beyond those there are a number of other services built for this purpose, the most prominent of which is Read it Later. I have no experience or expertise at all with any of this last class.

I hope you now understand the importance of the triple threat of the printer-friendly view, in-situ reformatter, and the reading-centric bookmarking service. Far more importantly, I hope you’ve found a solution to your most frustrating struggle in actually reading all that great web-content you’re now finding.

How to Internet: Staying Current

For the uninitiated, phrases like “Subscribe to this Blog”, “RSS feed”, and “Feed Reader” are just so much noise. So here’s a very short explanation: you use a “feed reader” to “subscribe” to a blog using its “RSS feed”. Make sense?

To use a slightly more analog story, you can think of this whole thing as a way to build a newspaper of your choosing. (That’s the feed reader.) You build this newspaper by choosing individual reporters whom you like (RSS feeds), and then their content is automatically added to your newspaper every time they produce it. This can be, as you might guess, a much better way to know what’s happening at the sites you care about than manually trying to check them at an interval you care about.
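
For the curious, the newspaper analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular feed reader works: the feed below is a made-up RSS 2.0 document, and the function names `read_items` and `unread_items` are hypothetical. A real reader would fetch the feed from the blog’s feed URL at regular intervals.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for what a blog would serve.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml.strip())
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


def unread_items(feed_xml: str, seen_links: set[str]) -> list[tuple[str, str]]:
    """Keep only the items whose link has not been seen before."""
    return [(title, link) for title, link in read_items(feed_xml)
            if link not in seen_links]


if __name__ == "__main__":
    # Pretend the first post was read on an earlier visit.
    already_read = {"http://example.com/first"}
    for title, link in unread_items(SAMPLE_FEED, already_read):
        print(title, link)
```

Remembering which links have already been seen is what lets the “newspaper” show only what is new since the last time you looked.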

It’s probably true, though I have no data on this, that RSS feeds are known to about 20% of internet users. And that among those 20%, about 80% use and enjoy them. That other 20% doesn’t like them for a variety of reasons and so uses something else.

In most cases, “something else” means some type of bookmarks system. The most common form of this is a flat set of bookmarks that you pick through and visit as it strikes your fancy. A slightly improved version of this is a simple folder set where you regularly open the contents of your folders into tabs. This can be further enhanced by breaking down said folders into the approximate frequency you want to visit the site, and then opening them on roughly this schedule.

The whole bookmarks option is not useless or totally foolish, but given the choice I don’t understand why anyone would choose it. RSS feeds are a clearly better solution as they make it possible for you to never miss anything, make it easy to save things to revisit at a better time, and can be made massively flexible and mobile in a way that websites rarely are.

There were once other notable RSS readers, but today if you’re doing it you’re almost certainly utilizing Google Reader in some way. If you refuse, there are other solutions that exist: many email clients have RSS readers built in, most browsers let you set up RSS folders, and some standalone clients that don’t use Google exist. But because they’re so obscure and rarely used, I’m not going to explain them to you.

Google Reader is the best option for in-browser RSS browsing, and it’s an even better option if you like out-of-browser RSS browsing (because so many clients for smartphones, tablets, and the desktop use it for synchronization). Beyond the fact that you’ll want a Google Reader account, there’s not much advice about technology to give. If you find the browser version inadequate you can find one of many clients for your desktop, iPad, or Android phone. Any specific recommendations I may have about software are too platform specific for me to feel they’ll be valuable to share.

But as someone who’s been using RSS feeds for about seven years, I have a recommendation about managing all that stuff that you’ll now find so easy to collect. All feeds can be understood as belonging to one of two categories. Noise is content that you like browsing but rarely care to pay careful attention to; for me this is things like The Awl, Gizmodo, and Boing Boing. Signal is stuff you’ll be quite sad to miss items from; for me, things like those I recommended yesterday. This is the basic type of folder system I recommend setting up in Google Reader.

A lot of people choose to only have Signal in their feed reader, and I do think that’s a valid way to deal with the very real danger of gathering an overwhelming volume of stuff that feeds create. But over the last couple years I’ve built a system that I think preserves much of the serendipity that makes the internet such a magical place but removes much of the too-much-stuff feeling that frequently goes along with it. My Signal & Noise system also works great for reading on the go.

Regardless of your feed volume, I think you want to stick to fewer than 100 new items coming in as “Signal” each day. This is the stuff that you most want to read, so keep it to a volume that you can really give careful attention. Signal is also the stuff you’ll cut last when you’re low on time to check these things, and you don’t really want it at so high a volume you have to cut it too.

Noise is your fail-safe. When it all gets to feel like too much volume, you can mark all that Noise as read and feel little concern because you know you rarely find lightning in there. But to my mind, you can easily go through more than 1,000 “Noise” items a day and you won’t feel much pain. (Though if you do have that much volume, I recommend you actually have multiple “Noise” folders, divided by topic area.) The time you spend on your Noise should come out about equal to what you spend on Signal.

That’s because you can easily “read” your Noise by relatively quickly glancing past the headlines and clicking just the 20 or so that strike your interest. Sorting your Signal should inherently be harder, as it’s got a rather large proportion of things that you like, want to read carefully, and maybe even spend a week thinking about.

A final note on this system: because of the amount of stuff I churn daily and the percentage of time that I do it without an internet connection (another advantage RSS has over websites), I personally find it useful to have an intermediate folder: a “Noisy Signal” folder of feeds that have between 1 in 5 and 1 in 20 items that I really care to see closely. That allows me to more easily keep together the interesting stuff I don’t have time to examine closely while on the go, for future examination alongside my Signal folder. Whether or not that’s a valuable idea for you I’ll not speculate.
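
If it helps to see the folder rules written down, here is a rough Python sketch. It is a hedged illustration, not a feature of Google Reader: the function names are hypothetical, and only the “between 1 in 5 and 1 in 20” band for Noisy Signal and the 100-item guideline come from the text above. Treating anything better than 1 in 5 as Signal and anything worse than 1 in 20 as Noise is my assumption for the sake of the example.

```python
SIGNAL_DAILY_LIMIT = 100  # the text suggests staying under 100 new Signal items a day


def pick_folder(items_cared_about: int, items_seen: int) -> str:
    """Suggest a folder from how often a feed's items turn out to matter."""
    if items_seen <= 0:
        raise ValueError("need at least one item to judge a feed")
    hit_rate = items_cared_about / items_seen
    if hit_rate > 1 / 5:       # assumed cut-off: better than 1 in 5
        return "Signal"
    if hit_rate >= 1 / 20:     # between 1 in 20 and 1 in 5, as in the text
        return "Noisy Signal"
    return "Noise"             # assumed cut-off: worse than 1 in 20


def signal_is_overloaded(daily_items: dict[str, int], folders: dict[str, str]) -> bool:
    """True when the feeds filed under Signal bring in too many items a day."""
    total = sum(count for feed, count in daily_items.items()
                if folders.get(feed) == "Signal")
    return total >= SIGNAL_DAILY_LIMIT


if __name__ == "__main__":
    # Hypothetical feed: 3 items worth reading closely out of the last 40.
    print(pick_folder(items_cared_about=3, items_seen=40))
```

The point of counting at all is just to notice when a feed has drifted from one folder’s territory into another’s.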

To wrap up, RSS feeds are your friend if you have an interest in following more websites than you can check manually at sane intervals. They can overwhelm if you jump in too deep, or without enough preparation. Using the Signal & Noise system, I see more than most people could even fathom on a daily basis, yet it takes just a fraction of my time and energy. And any such advantage you can get, I recommend using.