Hacker Newspaper: A Case Study in Signal and Noise
- February 8th, 2010
- Posted in General Nerdery
- By AMB
The inimitable Giles Bowkett is the mad genius behind an awesome little Hacker News filter called Hacker Newspaper. It’s quickly become the only way I read Hacker News. It cuts out a lot of the bullshit and presents what’s left in a meaningful, easy-to-read fashion. In essence it takes a channel, removes noise, and then improves the signal somewhat by reshaping the message to be easier to understand. Noise down, signal up, SNR goes way up.
Compare this to horrid, ad-laden spam blogs that scrape other sites for original content and then wrap it in ads. These take a signal and, without modifying the signal at all, add noise. This dilutes the signal, causing the SNR to go down. Sometimes they even combine the signal from multiple sources, adding interference and confusing things even further.
In a recent post, Bowkett ponders what the difference is between his tool and site-scraping spam blogs. I’m fairly sure that our conclusions on this point are isomorphic, but just framed in different terms. In my opinion, it all comes down to signal and noise. Hacker Newspaper remains true to the original signal, but filters out a lot of noise. This means that it’s unequivocally a better source for that signal. (Why listen to Hacker News when Hacker Newspaper provides the same signal, only stronger?) Spam blogs, on the other hand, are reverse-filters that add noise and cause signals to bleed into one another.
To put it another way, Hacker Newspaper is a filter and spam blogs are anti-filters. I think this not only explains the difference between them, but also gets to the heart of why people express moral outrage at spam blogs and (usually) don’t at things like Hacker Newspaper. People, especially those in the hacker community, appreciate good filters. They appreciate them so much, in fact, that they’re willing to totally ignore any principles they may have with regard to “stealing content” if what’s done with that content makes the signal come through stronger.
On the flip side, hackers are rightfully annoyed by obfuscation and irritated when someone willfully makes our lives harder just to make a buck. This is essentially what happens when spam blogs wrap a signal in noise that profits the owner of the spam blog. The message gets harder to interpret just so someone who’s too lazy and useless to make their own content can make some easy money on someone else’s message. It should go without saying that this makes spam blogs unmitigated bullshit that are full of naught but evil and fail.
People’s reaction to this lameness often gets expressed as moral outrage. (“How dare you steal content!”) I think it gets framed this way because, given the choice between making an argument from utility and from morals, most people prefer the moral high ground. After all, righteous indignation is much cooler than simple annoyance.
That’s an oversimplification, of course, but most people’s principles when it comes to things like “stealing of content” (scare quotes employed because the theft metaphor is a pretty inaccurate one when it comes to digital content) make a lot of allowances for adding value. For many people, the principles are genuine but misstated. The objection isn’t to reusing content; it’s to diluting someone else’s hard work. Or to plagiarism for the sake of profit. Or simply to the loss of value that occurs with any loss of SNR.
In other words, people don’t really mind when someone reuses content; they mind when someone ruins content.

Some random thoughts:
1) I am a big fan of filters. Clay Shirky’s talk on “It’s not information overload; it’s filter failure” was absolutely true. Filtering is the most important technology related to the web, and the biggest presence on the web (Google) is devoted to it as a primary mission. It’s precisely how we take the good and leave the bad of the “everyone can publish” paradox.
2) When I skim a site like boingboing or slashdot, I know I miss things. A filter I do have in place: When friends share links, it can serve as automatic mental-spam-folder validation if I happened to skim over it. If I don’t see it in multiple places, it was even more likely to not be worth my time.
3) If there was someone who had basically my same taste in what is interesting on the web, I would definitely subscribe to their feed. I’d even be willing to pay something for the privilege.
1.) Amen. We live in an increasingly information-centric culture. And there is no more important tool in such a culture than the filter. It’s exactly because of good filters that naive “Cult of the Amateur” critiques of the Internet are bullshit. It’s one of the most compelling reasons (though far from the only reason) why people like Shirky are right and people like Andrew Keen are epic wrong.
2.) Noise is one of the reasons why I have a hard time really reading BoingBoing etc. It’s also why, mercifully, I’m not addicted to checking them like some folks I know. The odds that hitting F5 is going to turn up useful or interesting information are vanishingly small. The odds that checking it once a day or so might turn up something are significantly higher.
3.) Hmmm, agreed. I’m thinking devilish thoughts of hackery. I’m thinking RSS + Personal Up-Down Voting + Bayesian Filter = ??? Something awesome maybe? We should talk.
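For the curious, here is a rough sketch of what the "Personal Up-Down Voting + Bayesian Filter" part of that idea might look like. It is only an illustration under stated assumptions, not anyone's actual tool: the class name `VoteFilter`, its methods, and the sample headlines are all made up for this example, the classifier is a plain naive Bayes over headline words with add-one smoothing, and the RSS fetching step is left out entirely (headlines are passed in as strings).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a headline and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class VoteFilter:
    """Hypothetical naive Bayes filter trained on personal up/down votes."""

    def __init__(self):
        self.word_counts = {"up": Counter(), "down": Counter()}
        self.item_counts = {"up": 0, "down": 0}

    def vote(self, title, label):
        """Record one vote on a headline; label must be 'up' or 'down'."""
        self.item_counts[label] += 1
        self.word_counts[label].update(tokenize(title))

    def _log_score(self, tokens, label):
        """Log of prior times word likelihoods for one label."""
        vocab = set(self.word_counts["up"]) | set(self.word_counts["down"])
        total_words = sum(self.word_counts[label].values())
        total_items = sum(self.item_counts.values())
        # Smoothed prior, so a label with no votes never produces log(0).
        score = math.log((self.item_counts[label] + 1) / (total_items + 2))
        for tok in tokens:
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def p_up(self, title):
        """Estimated probability that this headline would get an up vote."""
        tokens = tokenize(title)
        diff = self._log_score(tokens, "down") - self._log_score(tokens, "up")
        # Clamp before exponentiating to avoid overflow on extreme scores.
        diff = max(-700.0, min(700.0, diff))
        return 1.0 / (1.0 + math.exp(diff))


# Example usage with invented headlines and votes.
f = VoteFilter()
f.vote("Bayesian filtering for RSS feeds", "up")
f.vote("Writing a feed filter in Ruby", "up")
f.vote("Ten celebrity diets that work", "down")
f.vote("Celebrity gossip roundup", "down")
ranked = sorted(["A new RSS filter", "Celebrity diets"], key=f.p_up, reverse=True)
```

Sorting incoming items by `p_up` pushes the likely-interesting ones to the top. A real version would need far more votes than this before the scores meant much, and would probably look at more than the headline.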