Denise ([staff profile] denise) wrote 2011-03-29 05:34 pm

Technical debt and the making of payments on it

I saw an entry posted the other day where someone said sie was disappointed with (among other things) DW's development pace slowing down: new features being released more slowly, things that we were working on delayed/postponed, etc. And there were totally some valid criticisms in there, don't get me wrong! (In fact, I'm not linking it because I don't want there to be an overwhelming impassioned defense of DW in hir comments.) But that's one criticism that made me realize I've been doing a poor job of explaining precisely what's been going on in DW development and why there's been a paucity of user-facing changes, which can look to an outsider like there's a massive slowdown going on in DW development.

The answer is at once both very simple and very complicated: we've spent the past six months or so concentrating on paying down our technical debt.

Technical debt, as detailed in that Wikipedia article, is the collective IOU you-the-developer write to your future self. Let me just do this fast now, and do it right later, you think: let me do the quick but messy way now and do the correct way later. But of course, the minute you ship, you're moving on to the next big thing, and you never do go back to fill in those FIXMEs and TODOs in the code, until the next time you have to touch that area and what would otherwise be a five-hour fix has turned into a twenty-hour fix because you have to "pay back" the debt you incurred last time.

More than that, you've got to pay the debt plus interest: it's a common truism that code you yourself wrote six months ago is as impenetrable as code written by a complete stranger, and you have to spend a great deal of time puzzling out what the heck you were thinking back then. (Code that is brilliant, flawless, and crystal clear when you write it slowly morphs into idiotic, bug-ridden, and clear as mud over time. This is a well-known process. I suspect pixies in the source code repository, working their anti-magic while nobody's looking.)

Now, like real-world, financial debt, sometimes there is prudent technical debt: sometimes it makes sense to incur those future obligations in order to accomplish something you otherwise couldn't have. We saw a lot of instances of that during the ramp-up to open beta, where we did it one way then to "get it out the door" and opened bugs for later refactoring so we didn't lose track of the debt we were incurring. (There's a good article about the technical debt quadrant that discusses these various types of technical debt, and when they might be useful.)

There are also scenarios where you don't realize you're committing to technical debt until later: you do something one way based on your best understanding at the time, and down the road -- due to new technology becoming available, new people joining your team with new strengths that allow you to work on things you couldn't before, new sources of funding appearing, new progress in hardware capabilities, etc. -- you realize, oh, hey, we could have done it this way instead, and it would have worked much better.

In short, there's no way to completely eliminate technical debt in software design, and more than that, you wouldn't want to even if you could. Technical debt, like the responsible use of revolving lines of credit or obtaining a mortgage to buy a house in the real world, is an important part of the software lifecycle, and responsible use of technical debt is a tool that can enable a software project to succeed where it otherwise might have failed.

In a 'typical' two-year-old software project, the amount of technical debt you've built up -- if responsibly managed, which I like to think we've been pretty good at with the things we've made so far -- is fairly negligible, and you can probably get away with spending 10-20% of your coding time and effort on making payments on your technical debt without problems, thus leaving you 80-90% of your time to advance new user-facing features and improvements.

The problem is, Dreamwidth isn't a two-year-old software project. We've only been open for two years, true -- but when we opened, we forked from LiveJournal, which sprang into life in Brad Fitzpatrick's dorm room in 1999. In forking our code from LiveJournal's, we inherited that decade's worth of features, fixes, and improvements, but we also inherited a decade's worth of deferred technical debt. Dreamwidth isn't a two-year-old software project; it's a twelve-year-old software project. And the face of technology has changed quite a bit in those 12 years.

In the decade LJ was under development before we forked our code from theirs, there were of course payments being made on that technical debt; it's necessary in order to move forward. Really big payments, though -- the technical equivalent of paying down your credit card in one lump sum because you've come into a windfall -- were mostly deferred.

So, one of the things we need to do in order to move forward in a lot of instances, to take advantage of the advances that have been made in technology since various features and bits of site design were first coded, is to do all the work necessary to get to a place where we can take advantage of them.

I'll give you an example: when we forked from LJ, the code "out of the box" would only work on the Apache 1.3 series of web server software -- and early releases of 1.3, at that. (The web server is the program that runs on your machines and handles how to serve web pages to your browser when you ask for them -- if you don't have a web server running, nothing else works.)

The first release of Apache 1.3 was June 6, 1998 -- it was cutting-edge when Brad started LJ in 1999. The latest version that the LJ code would work with was (I believe -- I may be wrong) 1.3.17, released January of 2001. Subsequent versions of Apache wouldn't work with the LJ code -- they would cause horrible errors and prevent the site from running at all. To upgrade to later versions of Apache (which had more features, fixed security holes, and in general were more technically advanced) would have taken a lot of work on the code.

Even at that time, Apache was working on Apache 2.0 -- a much more technologically advanced version. The first 2.0 release was in March of 2000. The problem was, it was mostly incompatible with the optimization tricks used under Apache 1.3, and LJ code was highly optimized to take advantage of the 1.3 series. To port LJ over to the Apache 2 series would take an incredible amount of effort -- aka, technical debt.

By the time we forked the code, in mid-2008, Apache 1.3 was nearing the "end of life" -- the point past which Apache would refuse to support it, refuse to issue any additional updates, and generally say, look, c'mon, it's been nearly a decade, upgrade already. (It was formally EOL'd in February of 2010, but everyone agreed it was well past time.) So if we accepted the requirement for Apache 1.3, we were already tying our hands, and making it incredibly hard for our developers: the Apache 2 series had been the standard since at least the middle of the decade, and if someone wanted to run DW, either as a production website or to do development work, they would first have to spend hours actually downgrading their server in order to make the code work.

Before you guys even saw DW, before we could even get to the point where people could install and start hacking on the code, [staff profile] mark had to spend months making the code work under the Apache 2.0 series -- he had to sit down and pay the technical debt. The same thing happened with Perl, the language DW is written in. LJ had been written in an earlier version of Perl, and later versions had some backwards incompatibilities; DW had to modernize before we could upgrade.

The things we need(ed) to do aren't all that obvious, of course. Those of you who came from LJ know that each page you see on LJ has the suffix ".bml" on the end of it. BML stands for "Better Markup Language" (or "Brad's Markup Language"), a templating system that produces the framework to generate a page. (Things like the site skins -- Tropospherical, Celerity, Gradation, etc. -- are a function of BML; the contents of the page are the same no matter what, but the templating system builds the various 'looks' of the page so you can just swap them in and out and the contents don't change but the display does.)
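If the "same contents, different look" idea is hard to picture, here's a minimal sketch of it. This is Python rather than BML or anything DW actually runs, and the skin names, the markup, and the render_page function are all made up for illustration; it only shows the general shape of what a templating system does.

```python
from string import Template

# Hypothetical skins: each one is just a different wrapper around the same content.
SKINS = {
    "plain": Template("<html><body>$content</body></html>"),
    "boxed": Template('<html><body><div class="box">$content</div></body></html>'),
}


def render_page(content: str, skin: str) -> str:
    """Wrap the same page content in whichever skin was asked for."""
    return SKINS[skin].substitute(content=content)


# The entry text never changes; only the wrapper around it does.
entry = "<p>Hello, Dreamwidth!</p>"
plain_page = render_page(entry, "plain")
boxed_page = render_page(entry, "boxed")
```

Swapping skins here is just picking a different key; the entry itself is untouched, which is the property described above.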

In 1997-1998, when Brad was working on FreeVote (his project before LJ) and the first iterations of what would become LiveJournal, there wasn't anything better out there, so he had to "roll his own". He got it to a place where it would work for LJ, and then -- because there wasn't any real need to advance things further, since it already did everything he needed it to do -- mostly stopped work on it. The face of the web has changed a lot since that point (this may be a wee bit of understatement), and today there are a lot of incredibly powerful templating systems out there that do way, way more than BML does -- and are under active development, so there will be future awesomeness coming out of them.

Additionally, making people learn BML -- which is a fairly impenetrable system -- in order to contribute to DW would be silly: it's a barely-documented custom language that only exists on fewer than half a dozen websites in the world. The other templating systems out there are in wide use, and it's way more likely that someone will already have the skill set necessary to contribute. (Not to mention, it'd be nice to have a templating system that has actual books written about it rather than a few web pages here and there.) Switching to a more standard system makes an incredible amount of sense.

It's also an incredible amount of work. (We've been plugging away at it, bit by bit, for over a year. At this point, the pages you see on DW itself are half generated by BML, half by Template Toolkit (the templating system we chose) -- you never see the difference, if we do our work right, but our end goal is to get rid of BML entirely.)

There are a lot of examples of things like this, from the "big project" issues to smaller things (like the need to take duplicated code -- where the same block of code is repeated in multiple places -- and move it into a function that can be called from anywhere instead, so that people only have to update one area instead of many). There are cases where functions and features that were revolutionary when they were implemented on LJ have aged over time while the technology has advanced in leaps and bounds -- a good example there is the implementation of the inline cut-tag expander; when "lj cuts" were first introduced on LJ waybackwhen, that technology didn't exist, or would've been too much of a pain in the neck to implement. Over time, it became easy; from becoming easy, it became expected, until the point where a site that didn't implement it started to look clunky or backwards.
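For the duplicated-code case mentioned above, here's a minimal sketch of what the cleanup looks like. It's in Python, not DW's actual Perl, and the function names and URL shapes are hypothetical, invented purely to show the pattern.

```python
# Before the cleanup, both URL builders would each repeat the same
# name.strip().lower() logic inline, so any change to how usernames are
# normalized had to be made (and remembered) in every copy.

def format_username(name: str) -> str:
    """The one shared place where a username gets trimmed and lowercased."""
    return name.strip().lower()


def profile_url(name: str) -> str:
    """Build a (hypothetical) profile URL using the shared helper."""
    return f"/users/{format_username(name)}/profile"


def journal_url(name: str) -> str:
    """Build a (hypothetical) journal URL using the same shared helper."""
    return f"/users/{format_username(name)}/journal"
```

If the normalization rules change later, only format_username needs touching, and both callers pick up the fix.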

People have come to expect a lot of things from their social websites, in terms of 'standard' technical abilities -- I'm sure you can think of about a dozen things that other, more recently-designed websites implement that DW doesn't. We'd love to have those features. (I probably curse their lack about twice a week.) The problem, again, is that the backend code for those pages was written so long ago that before we can drop in those features and functions, we have to modernize everything. You'll never see the work. But we have to do it before we can move forward.

Some people have asked us why we committed to the LJ platform when we knew we'd be accepting the IOUs of ten years of the programmers who came before us. The answer is twofold:

1) In many cases, we were those programmers. We knew the code, knew what it could and couldn't do, had a reasonable perspective on what we were in for, and were incredibly familiar with the way it worked, the way it ran, and the way it was put together. (I say 'we', but I mostly mean [staff profile] mark there. I didn't get much into development until we started DW. But I still followed along with the technical discussions, and I had a pretty strong grasp of the technical end of things even though I hadn't been doing the coding myself.)

2) In addition to the ten years of technical debt, we also inherited the benefit of ten years of bugfixes, security fixes, architecture/performance improvements (in another 10 years when historians write the history of the early 2000s on the internet, I fully believe they will point to LJ as the technical pioneer that made a very great deal of Web 2.0 happen; the problems LJ solved back then are universal to any high-load system, and the solutions they/we came up with are still in use today), and feature development.

We believed, and continue to believe, that the LiveJournal system and code contain some of the most incredible social features out there, to the point where even today, ten years later, there is no other site that does everything the LJ code does and does it as well. A lot of that amazingness is buried, now, under a lot of "usability problems" that are actually relics of the fact that nobody went back to modernize things once the first draft was released. (To be fair, there are tons of usability problems that are actual usability problems, and were at the time the feature was released, as well. And I don't want to intimate that LJ-now is ignoring these problems either; they've been doing a lot of work on their own technical debt lately, as evidenced by the number of people who accuse them of not working on any new features either.)

One of our major goals with DW is to take the awesomeness that is inherent in the LJ codebase and bring it into the "modern era" of web design and function. We've made some great strides, but we're still only part of the way, and every time we set out to do something new, another whole chunk of problems that we have to address first pops up. It's the technical equivalent of having to learn to crawl before you can learn how to walk: before we can complete the new update page redesign, for instance, we need to do an incredible amount of work in order to make it possible to intermingle jQuery (the JavaScript library that we need for the modernized widgets on the update page) with the existing JavaScript the site uses. (Among many other things.)

So, when you see a code tour that's full of nothing but backend improvements, don't think of it as "DW isn't doing any feature development". Think of it as "DW is doing the necessary background work to enable awesome feature development in the future". The work we're doing now is going to pay off in the future, and it's going to allow us to do epic things.

[personal profile] sophie 2011-03-31 10:13 pm (UTC)
I have actually apologised to the next developer in comments because of this!