Bridge

Trainwreck, or, It's a whole new way of looking at the world


This is the (sadly) rather unstructured transcript of an IRC conversation on the topic of web-loggers and their itty bitty protocols and file formats. After finishing this rant, I hadn't the energy to format it into prose, but I'm egotistical enough to want to expose it to the web anyway. Sorry.

(Also, apologies for physical formatting in this page. It Just Looks Better That Way.)

chris jesus!
chris libxml2 is 3.7MB of source code!
chris how complicated can recognising bloody angle brackets be?
chris how can it be that understanding a simple "Atom" feed requires installing two bazillion fucking perl modules? and what the fuck is SAX, and why should I want anything to do with it?
J has studiously avoided becoming infected with answers to any of these questions.
J But 3.7MB takes the piss.
chris and what, i mean what the FUCK, is "X-WSSE authentication"?
chris oh for fuck's sake. fucking "Atom" has invented A WHOLE NEW AUTHENTICATION SCHEME! it's not as if there are two in HTTP already! no, those are no good, because those DON'T HAVE A FUCKING X IN THEIR FUCKING NAMES.
chris fucktards
J "Ex-wussy authentication"?
J WTF is "Atom"?
chris it's a replacement for RSS
chris RSS, a simple format for news headlines
chris of which there are now SEVEN MUTUALLY INCOMPATIBLE VERSIONS
chris (or is it nine?)

meteobot NOW: wind 8 knots (force 3, gentle breeze) from W →

meteobot NOW: wind 2 knots (force 1, light air) from SW ↗

chris The simple task of producing a library which parses "Atom" has so far required the installation of 20 archives full of perl modules and one gigantic library from GNOME
chris allegedly this standard is supposed to be better than RSS (it would be hard to be worse...) but it's not looking promising so far
meteobot NOW: wind 8 knots (force 3, gentle breeze) from W →
J Mein gott.

chris oh, i give up
chris it turns out that atom sucks just as much as rss (probably more) :(
J RSS wins solely on the number of MB of extraneous source not required, by the sound of things.
chris well, i can't actually remember how much other crap i had to install to make rss work
J had to install a fair amount of XML and XSLT related libraries, IIRC.
chris actually the problem with both of them is that they're not so much protocols as manifestos for new societal norms, just like XML is.
J wonders what the critical differences between societal norms and protocols are in practice
chris ok, my comment was (intended to be) a bit soundbiteish
J (mine was more a genuine question than a refutation of yours)

chris the protocols that actually work -- smtp and http are basically the only long-surviving examples on the internet, I think, perhaps IRC too -- are characterised by the strict-send/loose-receive principles
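
(An illustration of strict-send/loose-receive, added after the fact: a minimal sketch, not anybody's real implementation. The function names send_header and receive_header are made up for this page. The sender refuses to emit anything but the canonical "Name: value" form with a CRLF; the receiver puts up with stray whitespace, odd capitalisation and bare newlines.)

```python
def send_header(name, value):
    """Strict sender: emit only the canonical form, or refuse."""
    # Hypothetical rule set for this sketch: no spaces, colons or
    # line breaks in the name, and no line breaks in the value.
    if not name or any(c in name for c in " :\r\n"):
        raise ValueError("bad header name: %r" % (name,))
    if "\r" in value or "\n" in value:
        raise ValueError("bad header value: %r" % (value,))
    return "%s: %s\r\n" % (name, value.strip())


def receive_header(line):
    """Loose receiver: accept anything that can be made sense of."""
    name, sep, value = line.partition(":")
    if not sep or not name.strip():
        return None  # no usable "name:" part, so ignore the line quietly
    # Tolerate stray whitespace, any capitalisation, and LF or CRLF endings.
    return name.strip().lower(), value.strip()
```

So receive_header("content-TYPE :  text/plain \n") comes back as ("content-type", "text/plain"), while send_header would never have produced that line in the first place.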

J ah
chris XML and the blogger-weenie protocols are characterised by the statement "if you don't obey the protocol then the rightful wrath of people who actually give a fuck about XML will descend on you *and that will show you!*"
chris this argument was made frequently in the early days of HTML, and look where that got us.
chris "those who do not learn from history are doomed to repeat it"
chris in more depth, the blogging weenie protocols are crippled at birth by the fact that the people who write blogging software are, as a class, completely unaware of their own limitations
chris so rather than getting something really simple (list of dates, titles and URLs, separated by whitespace) we instead get some complicated mound of crap which exists in seven mutually-incompatible versions. (see mark pilgrim for the gory details.)
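
(To show how little the "really simple" version would need, here is a sketch of a parser for it. Everything specific is an assumption made for this example, not part of any real format: one item per line, an ISO date first, the URL last, and whatever sits in between is the title. Headline and parse_headlines are hypothetical names.)

```python
from datetime import date
from typing import NamedTuple


class Headline(NamedTuple):
    day: date
    title: str
    url: str


def parse_headlines(text):
    """Parse lines of the assumed form '<YYYY-MM-DD> <title words...> <url>'."""
    items = []
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if len(fields) < 3:
            continue  # blank or truncated line: skip it, don't die
        try:
            day = date.fromisoformat(fields[0])
        except ValueError:
            continue  # first field is not a date: skip the line
        # The last field is the URL; everything in between is the title.
        items.append(Headline(day, " ".join(fields[1:-1]), fields[-1]))
    return items
```

Malformed lines are dropped rather than treated as fatal, which is the loose-receive half of the bargain; no 3.7MB library is involved.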
J Interestingly, the successful protocols are a layer or two removed from the problematic ones. SMTP, HTTP, IRC are all essentially simple - the information passed over them can be complex, but they all treat it as a blob of no great significance.
chris (yeah, that's another place where the successful protocols go right and the others go wrong. and it applies in more detail too. the hairy bits of http/smtp that Don't Quite Work are the ones where the servers have to look inside the box -- e.g. anything involving character sets, content-types, languages, etc.)
J It could be that the problems with XML, HTTP, RSS and friends are merely reflections of the complexity (and complete lack of "this is the right answer"ness property) of the data they are supposed to deal with.
meteobot NOW: wind 0 knots (calm)
chris (probably. but that doesn't explain the failure to start with limitations in mind and build robust protocols. instead we have this enormous scaffolding of XML, which is supposed to solve all of these problems but doesn't, and then a crowd of users who can't get the simplest thing right.)
chris the rss community even disagree over how html-in-rss should be encoded!
chris (i think in practice one interpretation is now settled, but even so!)
J Very true, twicely.
chris anyway, it's very annoying (and slightly mystifying).
meteobot NOW: wind 6 knots (force 2, light breeze) from W →
chris worse still, it's pretty clear that for lots of the sites i try to read via rss, nobody actually uses the rss feed -- often it will turn out to be weeks or even months out of date, *and nobody has noticed*.
chris lots of people rely on third-party scrapers to scrape the content on their sites, and these just stop working
chris the other problem is that that idiot dave winer "owns" rss in some complicated territorial sense (for him, it's the equivalent of pissing on the web so that it smells of him for ever after) and so the blogger people won't use it
chris of course, it's not true that quality of writing is influenced by quality of software, so there are some sites i'd like to read via my headlines aggregator which are blogger sites.
chris so i need atom support.
chris but now i go and look at some of these sites in more detail, and it turns out that the atom feeds have EXACTLY THE SAME PROBLEMS THE THIRD-PARTY SCRAPED RSS FEEDS HAVE -- they're out of date, or have only a subset of the content, or (in one case) seem to have content completely unrelated to what actually appears on the site
chris (i suppose one should file this under "reasons not to use a shared hosting facility")
chris anyway, it's all a total trainwreck

J got sucked in to dull material linked from Dave Winer's website, and escaped only just in time.
chris * J got sucked in to dull material linked from Dave Winer's website <-- oh god, it may be too late!
J is pretty certain he managed to pull out before becoming infected.
chris dull material linked from Dave Winer's website <-- as opposed to what kind of material linked from Dave Winer's website?
J apologises for the tautology.

And to finish with a question: should I add links to the above?


Copyright (c) 2004 Chris Lightfoot. All rights reserved.