In this first instalment in a series on Legacy Code in the Startup world, Engineering Lead, James Murphy, discusses the psychological aspects of dealing with legacy code and how, as developers, we can start the long and arduous journey to, like a recovering alcoholic, deal with the repercussions of our sustained bad behaviours.

The journey for Rideways began over 18 months ago at the start of 2015 with one thing; an idea. Could we as part of the Rentalcars.com family, build and launch a product in less than 6 months that would disrupt the ground transportation business?

An idea is the one thing that all successful (and unsuccessful) startups have in common, but an idea isn't enough.

When I joined in Summer 2015, Rideways was being built primarily by a team of contractors, brought in to deliver a ‘prototype’ that could realise that ‘disruptive’ vision in its entirety. Fingers were furiously tapping away to deliver the elusive MVP (minimum viable product).

Pragmatism is King

The MVP focus left little time for nice-to-have's and even less time (if any) for things like end-to-end and unit-testing. Spec? What spec?

Coming from the sheltered world of the BBC, where there was plenty of time and human resources, this came as somewhat of a culture shock.

Welcome to the startup world and startup culture, where pragmatism, common sense and pure speed of delivery are winning characteristics to ensure your startup doesn't become yet another startup statistic.

However, the time comes in any early startup where you need to begin acting like a professional, and testing stuff. If you don't, you'll start suffering issues with regressions, and despite all the will in the world to test and deliver, ultimately one or two will always slip through (to err is human!).

Proof of Concepts are Untested

My first task was to begin by developing a widget feature on the Rideways platform that would enable us to onboard various affiliates, cross-selling our offering on external websites. Initially written by a contractor, it was inevitably coded as a proof of concept without tests (writing concepts with tests could be argued to be a colossal waste of time if the prototype is ultimately thrown away).

When you're asked to take someone else's work, extend it and add further features, what do you do? It's technically legacy code, you didn't write it, and you can't be sure that, if you end up changing or refactoring any of the features, you won’t end up break anything. Without an initial specification to go by, what do you do?

“Legacy code. The phrase strikes disgust in the hearts of programmers. It conjures images of slogging through a murky swamp of tangled undergrowth with leaches beneath and stinging flies above. It conjures odours of murk, slime, stagnancy and offal. Although our first joy of programming may have been intense, the misery of dealing with legacy code is often sufficient to extinguish that flame.”
― Michael C. Feathers, Working Effectively with Legacy Code

Legacy Code is Daunting

To change code, refactor, add or extend existing code, without tests, is like navigating a rocky shore without a lighthouse to guide you. It’s treacherous, perilous and fraught with danger. So before you embark, write tests.

For those of you unfamiliar with wrapping legacy code with tests, this can seem a daunting task indeed. With an unforeseen territory ahead where and how do we start?

I find myself, when confronted with new code that I do not fully comprehend, daunted. I don't understand it.

At least, I don't understand it yet!

To overcome this hurdle, I often start with a happy path. The simplest scenario that cuts through the heart of the code and defines its true intention. Not only does this psychologically give you momentum, it also enables you to build the foundation you'll need to start writing more tests, quicker. I often find the first test in legacy code is sometimes the test that breaks the Camel’s back if you get it right, making subsequent tests easier and creating momentum.

You could argue that this test we're writing sounds like a bit of a gargantuan task. However, this is legacy we're talking about, and we can't begin refactoring without covering it first with at least a couple of tests!

In my case I was quite fortunate. The classes I needed to cover were written using the Spring framework, so Dependency Injection (DI) was included as standard. This makes things significantly easier because I can understand, test and reason about classes in isolation by mocking code boundaries (such as service layers or HTTP clients).

However, in many cases when dealing with true legacy code, we have little choice and we're left with the unenviable task of attempting to test something that isn't testable in traditional ways.

When we think of testing we can think of it in the simplest of terms. We're using inputs, processing those inputs in a black box, and then asserting something about what comes out the other end.

If we have legacy that is so badly written that it is next to impossible to internally test, we can still do our best to write ‘black box’ style testing at the closest of testable levels to ensure we cover some functionality.

Testing Legacy Code by Example

Taken from Uncle Bob's Clean Code (I'm sure you've read it, but if you haven't it's a classic!) we have the following code example. In this case we have a single class (PrintPrimes) that was generated from Donald Knuth's WEB tool. Let's work through this example and demonstrate how we work with legacy code.

Understand the domain

Before I dive into legacy code I try and wrap my head around the basic theme of it. We don't necessarily have to understand absolutely what the code is doing from the outset but we need to get a general feel for it before we start.

What is this code doing?

    package literatePrimes;

    public class PrintPrimes {
      public static void main(String[] args) {
        final int M = 1000;
        final int RR = 50;
        final int CC = 4;
        final int WW = 10;
        final int ORDMAX = 30;
        int[] P = new int[M + 1];
        int PAGENUMBER;
        int PAGEOFFSET;
        int ROWOFFSET;
        int C;

        int J;
        int K;
        boolean JPRIME;
        int ORD;
        int SQUARE;
        int N;
        int MULT[] = new int[ORDMAX + 1];

        J = 1;
        K = 1;
        P[1] = 2;
        ORD = 2;
        SQUARE = 9;

        while(K < M) {
          do {
            J = J + 2;
            if (J == SQUARE) {
              ORD = ORD + 1;
              SQUARE = P[ORD] * P[ORD];
              MULT[ORD - 1] = J
            }

            N = 2;
            JPRIME = true;
            while (N < ORD && JPRIME) {
              while (MULT[N] < J)
                MULT[N] = MULT[N] + P[N] + P[N];
              if (MULT[N] == J)
                JPRIME = false;
              N = N + 1;
            }
          } while (!JPRIME);
          K = K + 1;
          P[K] = J;
        }
        {
          PAGENUMBER = 1;
          PAGEOFFSET = 1;
          while (PAGEOFFSET <= M) {
            System.out.println("The First " + M +
                " Prime Numbers --- Page " + PAGENUMBER);
            System.out.println("");
            for (ROWOFFSET = PAGEOFFSET; ROWOFFSET < PAGEOFFSET + RR; ROWOFFSET++) {
              for (C = 0; C < CC; C++)
                if (ROWOFFSET + C * RR = M)
                  System.out.format("%10d", P[ROWOFFSET + C *RR]);
                System.out.println("");
            }
            System.out.println("\f");
            PAGENUMBER = PAGENUMBER + 1;
            PAGEOFFSET = PAGEOFFSET + RR * CC;
          }
        }
      }
    }

This example is definitely extreme (no one will name variables like M, RR or CC - if they do nuke them from orbit!), however it serves to illustrate that we can have poorly written code with bad naming, and sometimes we have to deal with it.

In this case we have an application called PrintPrimes. Looking at the constants at the top of our main they are very badly named; it's hard to know what they are used for without some form of context.

Some of the named variables betray meaning in the code. Variables PAGENUMBER, PAGEOFFSET, ROWOFFSET and JPRIME imply that we're printing primes out in pages.

This knowledge is actually all we need to get started writing tests so we can start the simplifying and refactoring process. We mentioned earlier that testing is essentially using inputs to test a blackbox and asserting outputs. That's exactly what we have here.

If we run the PrintPrimes application we get the following output:

The First 1000 Prime Numbers --- Page 1

         2       233       547       877
         3       239       557       881
         5       241       563       883
         7       251       569       887
        11       257       571       907
        13       263       577       911
        17       269       587       919
        19       271       593       929
        23       277       599       937
        29       281       601       941
        31       283       607       947
        37       293       613       953
        41       307       617       967
        43       311       619       971
        47       313       631       977
        53       317       641       983
        59       331       643       991
        61       337       647       997
        67       347       653      1009
        71       349       659      1013
        73       353       661      1019
        79       359       673      1021
        83       367       677      1031
        89       373       683      1033
        97       379       691      1039
       101       383       701      1049
       103       389       709      1051
       107       397       719      1061
       109       401       727      1063
       113       409       733      1069
       127       419       739      1087
       131       421       743      1091
       137       431       751      1093
       139       433       757      1097
       149       439       761      1103
       151       443       769      1109
       157       449       773      1117
       163       457       787      1123
       167       461       797      1129
       173       463       809      1151
       179       467       811      1153
       181       479       821      1163
       191       487       823      1171
       193       491       827      1181
       197       499       829      1187
       199       503       839      1193
       211       509       853      1201
       223       521       857      1213
       227       523       859      1217
       229       541       863      1223
 
The First 1000 Prime Numbers --- Page 2

      1229      1597      1993      2371
      1231      1601      1997      2377
      1237      1607      1999      2381
      1249      1609      2003      2383
      1259      1613      2011      2389
      1277      1619      2017      2393
      1279      1621      2027      2399
... full file is too large to output but you get the point

What we can do though, is write a test that runs our application and asserts that the output is exactly the same as the output above. This will ensure that we can't break the contract when running the code.

To start, I placed the above output in a file output.txt and wrote the following test:

package literatePrimes;

import org.junit.Test;

import java.nio.file.Files;
import java.nio.file.Paths;

import static org.junit.Assert.assertEquals;

public class PrintPrimesTest {

    @Test
    public void equalsFirstThousandPrimesOutput() throws Exception {
        String outputPrimes = new String(Files.readAllBytes(Paths.get("src/test/java/literatePrimes/output.txt"))).trim();
        assertEquals(outputPrimes.replaceAll("\n", ""), PrintPrimes.print().replaceAll("\n", ""));
    }
}

I also then changed the main of PrintPrimes to be a function that returned a String rather than a System print to console and appends the output instead to a StringBuffer, so we can check the output is working. Sure, we're changing the code slightly to facilitate testing but the changes in this case are transparent.

The important thing is we haven't changed the guts of the business logic so we can begin changing the code with relative impunity and refactoring it out into the respective classes.

We'll leave that though, for another day.

Legacy code can always be a tricky beast. It's often not what we intended, especially when we're beginning a startup, however, due to pressures of business and demands of delivery we can often be left with legacy before it's time.

What we shouldn't do though, is fear it...

Remember, understand it, cover it with tests and refactor with impunity!

In the next instalment. James will take you through how you can refactor the above code example taken from Uncle Bob's Clean Code and take it to a much cleaner, more manageable solution.