In this first instalment in a series on Legacy Code in the Startup world, Engineering Lead, James Murphy, discusses the psychological aspects of dealing with legacy code and how, as developers, we can start the long and arduous journey to, like a recovering alcoholic, deal with the repercussions of our sustained bad behaviours.
The journey for Rideways began over 18 months ago at the start of 2015 with one thing; an idea. Could we as part of the Rentalcars.com family, build and launch a product in less than 6 months that would disrupt the ground transportation business?
An idea is the one thing that all successful (and unsuccessful) startups have in common, but an idea isn't enough.
When I joined in Summer 2015, Rideways was being built primarily by a team of contractors, brought in to deliver a ‘prototype’ that could realise that ‘disruptive’ vision in its entirety. Fingers were furiously tapping away to deliver the elusive MVP (minimum viable product).
Pragmatism is King
The MVP focus left little time for nice-to-have's and even less time (if any) for things like end-to-end and unit-testing. Spec? What spec?
Coming from the sheltered world of the BBC, where there was plenty of time and human resources, this came as somewhat of a culture shock.
Welcome to the startup world and startup culture, where pragmatism, common sense and pure speed of delivery are winning characteristics to ensure your startup doesn't become yet another startup statistic.
However, the time comes in any early startup where you need to begin acting like a professional, and testing stuff. If you don't, you'll start suffering issues with regressions, and despite all the will in the world to test and deliver, ultimately one or two will always slip through (to err is human!).
Proof of Concepts are Untested
My first task was to begin by developing a widget feature on the Rideways platform that would enable us to onboard various affiliates, cross-selling our offering on external websites. Initially written by a contractor, it was inevitably coded as a proof of concept without tests (writing concepts with tests could be argued to be a colossal waste of time if the prototype is ultimately thrown away).
When you're asked to take someone else's work, extend it and add further features, what do you do? It's technically legacy code, you didn't write it, and you can't be sure that, if you end up changing or refactoring any of the features, you won’t end up break anything. Without an initial specification to go by, what do you do?
“Legacy code. The phrase strikes disgust in the hearts of programmers. It conjures images of slogging through a murky swamp of tangled undergrowth with leaches beneath and stinging flies above. It conjures odours of murk, slime, stagnancy and offal. Although our first joy of programming may have been intense, the misery of dealing with legacy code is often sufficient to extinguish that flame.”
― Michael C. Feathers, Working Effectively with Legacy Code
Legacy Code is Daunting
To change code, refactor, add or extend existing code, without tests, is like navigating a rocky shore without a lighthouse to guide you. It’s treacherous, perilous and fraught with danger. So before you embark, write tests.
For those of you unfamiliar with wrapping legacy code with tests, this can seem a daunting task indeed. With an unforeseen territory ahead where and how do we start?
I find myself, when confronted with new code that I do not fully comprehend, daunted. I don't understand it.
At least, I don't understand it yet!
To overcome this hurdle, I often start with a happy path. The simplest scenario that cuts through the heart of the code and defines its true intention. Not only does this psychologically give you momentum, it also enables you to build the foundation you'll need to start writing more tests, quicker. I often find the first test in legacy code is sometimes the test that breaks the Camel’s back if you get it right, making subsequent tests easier and creating momentum.
You could argue that this test we're writing sounds like a bit of a gargantuan task. However, this is legacy we're talking about, and we can't begin refactoring without covering it first with at least a couple of tests!
In my case I was quite fortunate. The classes I needed to cover were written using the Spring framework, so Dependency Injection (DI) was included as standard. This makes things significantly easier because I can understand, test and reason about classes in isolation by mocking code boundaries (such as service layers or HTTP clients).
However, in many cases when dealing with true legacy code, we have little choice and we're left with the unenviable task of attempting to test something that isn't testable in traditional ways.
When we think of testing we can think of it in the simplest of terms. We're using inputs, processing those inputs in a black box, and then asserting something about what comes out the other end.
If we have legacy that is so badly written that it is next to impossible to internally test, we can still do our best to write ‘black box’ style testing at the closest of testable levels to ensure we cover some functionality.
Testing Legacy Code by Example
Taken from Uncle Bob's Clean Code (I'm sure you've read it, but if you haven't it's a classic!) we have the following code example. In this case we have a single class (PrintPrimes) that was generated from Donald Knuth's WEB tool. Let's work through this example and demonstrate how we work with legacy code.
Understand the domain
Before I dive into legacy code I try and wrap my head around the basic theme of it. We don't necessarily have to understand absolutely what the code is doing from the outset but we need to get a general feel for it before we start.
What is this code doing?
package literatePrimes;
public class PrintPrimes {
public static void main(String[] args) {
final int M = 1000;
final int RR = 50;
final int CC = 4;
final int WW = 10;
final int ORDMAX = 30;
int[] P = new int[M + 1];
int PAGENUMBER;
int PAGEOFFSET;
int ROWOFFSET;
int C;
int J;
int K;
boolean JPRIME;
int ORD;
int SQUARE;
int N;
int MULT[] = new int[ORDMAX + 1];
J = 1;
K = 1;
P[1] = 2;
ORD = 2;
SQUARE = 9;
while(K < M) {
do {
J = J + 2;
if (J == SQUARE) {
ORD = ORD + 1;
SQUARE = P[ORD] * P[ORD];
MULT[ORD - 1] = J
}
N = 2;
JPRIME = true;
while (N < ORD && JPRIME) {
while (MULT[N] < J)
MULT[N] = MULT[N] + P[N] + P[N];
if (MULT[N] == J)
JPRIME = false;
N = N + 1;
}
} while (!JPRIME);
K = K + 1;
P[K] = J;
}
{
PAGENUMBER = 1;
PAGEOFFSET = 1;
while (PAGEOFFSET <= M) {
System.out.println("The First " + M +
" Prime Numbers --- Page " + PAGENUMBER);
System.out.println("");
for (ROWOFFSET = PAGEOFFSET; ROWOFFSET < PAGEOFFSET + RR; ROWOFFSET++) {
for (C = 0; C < CC; C++)
if (ROWOFFSET + C * RR = M)
System.out.format("%10d", P[ROWOFFSET + C *RR]);
System.out.println("");
}
System.out.println("\f");
PAGENUMBER = PAGENUMBER + 1;
PAGEOFFSET = PAGEOFFSET + RR * CC;
}
}
}
}
This example is definitely extreme (no one will name variables like M, RR or CC - if they do nuke them from orbit!), however it serves to illustrate that we can have poorly written code with bad naming, and sometimes we have to deal with it.
In this case we have an application called PrintPrimes
. Looking at the constants at the top of our main
they are very badly named; it's hard to know what they are used for without some form of context.
Some of the named variables betray meaning in the code. Variables PAGENUMBER
, PAGEOFFSET
, ROWOFFSET
and JPRIME
imply that we're printing primes out in pages.
This knowledge is actually all we need to get started writing tests so we can start the simplifying and refactoring process. We mentioned earlier that testing is essentially using inputs to test a blackbox and asserting outputs. That's exactly what we have here.
If we run the PrintPrimes application we get the following output:
The First 1000 Prime Numbers --- Page 1
2 233 547 877
3 239 557 881
5 241 563 883
7 251 569 887
11 257 571 907
13 263 577 911
17 269 587 919
19 271 593 929
23 277 599 937
29 281 601 941
31 283 607 947
37 293 613 953
41 307 617 967
43 311 619 971
47 313 631 977
53 317 641 983
59 331 643 991
61 337 647 997
67 347 653 1009
71 349 659 1013
73 353 661 1019
79 359 673 1021
83 367 677 1031
89 373 683 1033
97 379 691 1039
101 383 701 1049
103 389 709 1051
107 397 719 1061
109 401 727 1063
113 409 733 1069
127 419 739 1087
131 421 743 1091
137 431 751 1093
139 433 757 1097
149 439 761 1103
151 443 769 1109
157 449 773 1117
163 457 787 1123
167 461 797 1129
173 463 809 1151
179 467 811 1153
181 479 821 1163
191 487 823 1171
193 491 827 1181
197 499 829 1187
199 503 839 1193
211 509 853 1201
223 521 857 1213
227 523 859 1217
229 541 863 1223
The First 1000 Prime Numbers --- Page 2
1229 1597 1993 2371
1231 1601 1997 2377
1237 1607 1999 2381
1249 1609 2003 2383
1259 1613 2011 2389
1277 1619 2017 2393
1279 1621 2027 2399
... full file is too large to output but you get the point
What we can do though, is write a test that runs our application and asserts that the output is exactly the same as the output above. This will ensure that we can't break the contract when running the code.
To start, I placed the above output in a file output.txt
and wrote the following test:
package literatePrimes;
import org.junit.Test;
import java.nio.file.Files;
import java.nio.file.Paths;
import static org.junit.Assert.assertEquals;
public class PrintPrimesTest {
@Test
public void equalsFirstThousandPrimesOutput() throws Exception {
String outputPrimes = new String(Files.readAllBytes(Paths.get("src/test/java/literatePrimes/output.txt"))).trim();
assertEquals(outputPrimes.replaceAll("\n", ""), PrintPrimes.print().replaceAll("\n", ""));
}
}
I also then changed the main of PrintPrimes
to be a function that returned a String rather than a System print to console and appends the output instead to a StringBuffer, so we can check the output is working. Sure, we're changing the code slightly to facilitate testing but the changes in this case are transparent.
The important thing is we haven't changed the guts of the business logic so we can begin changing the code with relative impunity and refactoring it out into the respective classes.
We'll leave that though, for another day.
Legacy code can always be a tricky beast. It's often not what we intended, especially when we're beginning a startup, however, due to pressures of business and demands of delivery we can often be left with legacy before it's time.
What we shouldn't do though, is fear it...
Remember, understand it, cover it with tests and refactor with impunity!
In the next instalment. James will take you through how you can refactor the above code example taken from Uncle Bob's Clean Code and take it to a much cleaner, more manageable solution.