Gilded Rose Katas

Introduction


Last summer, I had the pleasure of bringing Amitai Schleier into my company for a few days, as part of his coding tour. One thing that really inspired me was his emphasis on a Learning Hour each day, and on using katas.

Katas are short, set-piece practices. The word “kata” comes from the Japanese martial arts tradition (形 - literally, "form") and is used to describe a regular, repetitive, choreographed practice. Wikipedia gives a good description here: https://en.m.wikipedia.org/wiki/Kata. In software, there is a good collection of these set-pieces. Implementing FizzBuzz in a new language is a good example.
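For anyone who hasn't met it: FizzBuzz is simply "print the numbers 1 to 100, but print Fizz for multiples of three, Buzz for multiples of five, and FizzBuzz for multiples of both". A minimal C++ sketch (one of many possible versions) looks something like this:
#include <iostream>
#include <string>
 
int main() {
    for (int i = 1; i <= 100; ++i) {
        std::string out;
        if (i % 3 == 0) out += "Fizz";
        if (i % 5 == 0) out += "Buzz";
        if (out.empty()) out = std::to_string(i);
        std::cout << out << '\n';
    }
}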

So, after Amitai’s visit, I started running katas every day. It took us a while to establish them, because we don’t have set starting or finishing times. But we now do half an hour of katas every day after lunch. We have used the time to look at TDD, TCR, Haskell, and we have used several well-known katas. 

But the one we spent the longest on was the Gilded Rose. This is a great kata for investigating how to deal with legacy code. The code is only a few tens of lines long, but it is convoluted and entangled, and almost impossible to understand. And you have to implement a new feature without breaking any of the existing functionality. The best approach to manage this is to put some tests in place so that you can be confident that you aren’t breaking existing functionality. 

We mainly program in C++, and I initially thought of using the Gilded Rose as an excuse to look at the C++ Approval Test library, after I saw Clare Macrae’s excellent talk at CppOnSea. I thought that we could use the tests, finish the kata, and move on. But we ended up going a lot further, and investigated a lot of different tools and techniques, and I thought our experience would make a good blog post.

Gilded Rose

So, what is the Gilded Rose kata? Emily Bache has done a great job of collecting versions in different languages here: https://github.com/emilybache/GildedRose-Refactoring-Kata, including a C++ version. The main piece of code is an updateQuality function:

void GildedRose::updateQuality() {
    for (int i = 0; i < items.size(); i++) {
        if (items[i].name != "Aged Brie" &&
            items[i].name != "Backstage passes to a TAFKAL80ETC concert") {
            if (items[i].quality > 0) {
                if (items[i].name != "Sulfuras, Hand of Ragnaros") {
                    items[i].quality = items[i].quality - 1;
                }
            }
        } else {
            if (items[i].quality < 50) {
                items[i].quality = items[i].quality + 1;
 
                if (items[i].name == "Backstage passes to a TAFKAL80ETC concert") {
                    if (items[i].sellIn < 11) {
                        if (items[i].quality < 50) {
                            items[i].quality = items[i].quality + 1;
                        }
                    }
 
                    if (items[i].sellIn < 6) {
                        if (items[i].quality < 50) {
                            items[i].quality = items[i].quality + 1;
                        }
                    }
                }
            }
        }
 
        if (items[i].name != "Sulfuras, Hand of Ragnaros") {
            items[i].sellIn = items[i].sellIn - 1;
        }
 
        if (items[i].sellIn < 0) {
            if (items[i].name != "Aged Brie") {
                if (items[i].name != "Backstage passes to a TAFKAL80ETC concert") {
                    if (items[i].quality > 0) {
                        if (items[i].name != "Sulfuras, Hand of Ragnaros") {
                            items[i].quality = items[i].quality - 1;
                        }
                    }
                } else {
                    items[i].quality = items[i].quality - items[i].quality;
                }
            } else {
                if (items[i].quality < 50) {
                    items[i].quality = items[i].quality + 1;
                }
            }
        }
    }
}
This is a mess. What else have we got? There's a helpful ReadMe:
======================================
Gilded Rose Requirements Specification
======================================

Hi and welcome to team Gilded Rose. As you know, we are a small inn with a prime
location in a prominent city ran by a friendly innkeeper named Allison. We also
buy and sell only the finest goods. Unfortunately, our goods are constantly
degrading in quality as they approach sell by date. We have a system in place
that updates our inventory for us. It was developed by a no-nonsense type named
Leeroy, who has moved on to new adventures. Your task is to add the new feature
to our system so that we can begin selling a new category of items. First an
introduction to our system:

- All items have a SellIn value which denotes the number of days we
  have to sell the item
- All items have a Quality value which denotes how valuable the item is
- At the end of each day our system lowers both values for every item

Pretty simple, right? Well this is where it gets interesting:

- Once the sell by date has passed, Quality degrades twice as fast
- The Quality of an item is never negative
- "Aged Brie" actually increases in Quality the older it gets
- The Quality of an item is never more than 50
- "Sulfuras", being a legendary item, never has to be sold or decreases
  in Quality
- "Backstage passes", like aged brie, increases in Quality as its SellIn
  value approaches;

Quality increases by 2 when there are 10 days or less and by 3 when there are
5 days or less but Quality drops to 0 after the concert

We have recently signed a supplier of conjured items. This requires an update
to our system:

- "Conjured" items degrade in Quality twice as fast as normal items

Feel free to make any changes to the UpdateQuality method and add any new code
as long as everything still works correctly. However, do not alter the Item
class or Items property as those belong to the goblin in the corner who will
insta-rage and one-shot you as he doesn't believe in shared code ownership
(you can make the UpdateQuality method and Items property static if you like,
we'll cover for you).

Just for clarification, an item can never have its Quality increase above 50,
however "Sulfuras" is a legendary item and as such its Quality is 80 and it
never alters.
Right, we have a running, live system, and we need to add some new features without breaking the existing ones. Maybe there are tests in place..?

Yes, there are two unit test files:
#include "pch.h"
#include "..GildedRoseGildedRose.h"
#include <iostream>
 
using namespace std;
 
ostream& operator<<(ostream& s, Item& item) {
    s << item.name << ", " << item.sellIn << ", " << item.quality;
    return s;
}
 
int main() {
    vector<Item> items;
    items.push_back(Item("+5 Dexterity Vest", 10, 20));
    items.push_back(Item("Aged Brie", 2, 0));
    items.push_back(Item("Elixir of the Mongoose", 5, 7));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", 0, 80));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", -1, 80));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 15, 20));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 10, 49));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 5, 49));
    // this Conjured item doesn't yet work properly
    items.push_back(Item("Conjured Mana Cake", 3, 6));
    GildedRose app(items);
 
    cout << "OMGHAI!" << endl;
 
    for (int day = 0; day <= 30; day++) {
        cout << "-------- day " << day << " --------" << endl;
        cout << "name, sellIn, quality" << endl;
        for (vector<Item>::iterator i = items.begin(); i != items.end(); i++) {
            cout << *i << endl;
        }
        cout << endl;
 
        app.updateQuality();
    }
}
and
#include "pch.h"
#include <gtest/gtest.h>
#include "..GildedRoseGildedRose.h"
 
TEST(GildedRoseTest, Foo) {
    vector<Item> items;
    items.push_back(Item("Foo", 0, 0));
    GildedRose app(items);
    app.updateQuality();
    EXPECT_EQ("fixme", app.items[0].name);
}
 
void example() {
    vector<Item> items;
    items.push_back(Item("+5 Dexterity Vest", 10, 20));
    items.push_back(Item("Aged Brie", 2, 0));
    items.push_back(Item("Elixir of the Mongoose", 5, 7));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", 0, 80));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 15, 20));
    items.push_back(Item("Conjured Mana Cake", 3, 6));
    GildedRose app(items);
    app.updateQuality();
}
OK...
The first file just has a main function that prints out the result of running updateQuality() over 30 days. There are no tests.
And the second file has a test that will fail, and a function that isn't called.

Clearly this code is NOT under test!

This is a problem, because we need to modify the code, without changing the existing behaviour. But there are no tests to verify the existing behaviour.

And this is where Approval Testing can help!

Approval Testing

Approval Testing is designed to help when you have a system that gives consistent output for consistent inputs. It is a way of performing "Golden Master" testing. You create an "ideal" state, commit it, and then test that subsequent runs of the test produce the same result.
The Golden Master can be an image, a log file, a stream of text, anything, as long as you can compare two versions of it and store it in version control.

The C++ Approval Test library does not execute tests on its own. Instead, you integrate it with a test framework, such as Catch2, GoogleTest, BoostTest, etc. I really like Catch2, especially for small projects, because it is very fast to set up.
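As a rough sketch of what that wiring looks like (based on the Approval Tests documentation at the time; macro and header names may differ between versions of the libraries), one source file asks Approval Tests to supply the Catch2 main():
// main.cpp - do this in exactly one source file.
#define APPROVALS_CATCH   // tell Approval Tests to provide a Catch2 main()
#include "ApprovalTests.hpp"
The other test files then just include the Catch2 and Approval Tests headers and define TEST_CASEs as normal.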

Precompiled Headers

The next step we took was to set up Catch2, and then set up Approval Tests on top of that. Both Catch2 and Approval Tests are header-only libraries. This makes them easy to get going with, but the code gets compiled every time you compile anything that includes them. And they're pretty slow to compile.
Now, one of the goals of this kata was to practice Test Driven Development. And for that to work, you need rapid feedback between writing tests or code, and seeing the test results. So the first thing we did was to move the includes into the precompiled header.
In this small codebase, this only had a small effect, but it's very good practice. By the time we had written all our tests, the debug compile time for a test change was 1.4 seconds with precompiled headers, and 2.2 seconds without. And this difference will increase as you add more tests if you do not precompile the test headers.
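A precompiled header for this kind of setup might look roughly like this. The header names are assumptions based on the single-header releases we were using; adjust to your own layout:
// pch.h - precompiled header for the test project.
// Putting the big header-only libraries here means they are compiled once,
// rather than in every test translation unit.
#pragma once
 
#include "catch.hpp"          // Catch2 single header (assumed name)
#include "ApprovalTests.hpp"  // Approval Tests single header (assumed name)
 
#include <sstream>
#include <string>
#include <vector>
The one file that defines APPROVALS_CATCH to generate main() is best left out of the precompiled header, so that the macro is seen before the Approval Tests header is included.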

What to test..?

In our case, we do have a "test" that prints out the result for a collection of inputs after 30 days have passed. We could run this function a few times to make sure that the results are consistent, and then store the results in a file. We could then make changes to the code, run our 30-day test and make sure that the output hasn't changed. Approval testing manages most of this for us.
We currently have this pointless Main function:
int main() {
    vector<Item> items;
    items.push_back(Item("+5 Dexterity Vest", 10, 20));
    items.push_back(Item("Aged Brie", 2, 0));
    items.push_back(Item("Elixir of the Mongoose", 5, 7));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", 0, 80));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", -1, 80));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 15, 20));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 10, 49));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 5, 49));
    // this Conjured item doesn't yet work properly
    items.push_back(Item("Conjured Mana Cake", 3, 6));
    GildedRose app(items);
 
    cout << "OMGHAI!" << endl;
 
    for (int day = 0; day <= 30; day++) {
        cout << "-------- day " << day << " --------" << endl;
        cout << "name, sellIn, quality" << endl;
        for (vector<Item>::iterator i = items.begin(); i != items.end(); i++) {
            cout << *i << endl;
        }
        cout << endl;
 
        app.updateQuality();
    }
}
We can turn this into a test fairly simply, once we have the test framework in place:
TEST_CASE("ApprovingText") {
 
    std::vector<Item> items;
    items.push_back(Item("+5 Dexterity Vest", 10, 20));
    items.push_back(Item("Aged Brie", 2, 0));
    items.push_back(Item("Elixir of the Mongoose", 5, 7));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", 0, 80));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", -1, 80));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 15, 20));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 10, 49));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 5, 49));
    // this Conjured item doesn't yet work properly
    items.push_back(Item("Conjured Mana Cake", 3, 6));
    GildedRose app(items);
 
    std::stringstream out_stream;
 
    for (int day = 0; day <= 30; day++) {
        out_stream << "-------- day " << day << " --------" << '\n';
        out_stream << "name, sellIn, quality" << '\n';
        for (auto& item : items) {
            out_stream << item << '\n';
        }
        out_stream << '\n';
 
        app.updateQuality();
    }
 
    Approvals::verify(out_stream.str());
}
All that we have really done here is to replace the standard output with a stringstream, and then to ask Approval Tests to verify the string for us.
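One detail worth noting: out_stream << item relies on a stream insertion operator for Item being visible to the test. The operator from the original test file does the job; making it take a const reference is a small improvement, and would let the loop take the items by const reference too:
// Stream insertion for Item, as used by the approval test above -
// the original operator, made const-correct.
std::ostream& operator<<(std::ostream& s, const Item& item) {
    s << item.name << ", " << item.sellIn << ", " << item.quality;
    return s;
}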

The First Run

So what happens when we run the test? The test framework runs the "ApprovingText" test case, generates a string, and then it hits the "Approvals::verify" step. At this point, it tries to compare the result with... something. But there's nothing to compare it to.
Luckily, the framework expects this, and it pops up a diff tool for you. In my case, it just finds winmerge. And on the left is the new output, and on the right is nothing. All that I need to do is copy over the output to the right, save, and exit. The test run then finishes, and the "golden master" output is stored in a text file.

The output is like this:
-------- day 0 --------
name, sellIn, quality
+5 Dexterity Vest, 10, 20
Aged Brie, 2, 0
Elixir of the Mongoose, 5, 7
Sulfuras, Hand of Ragnaros, 0, 80
Sulfuras, Hand of Ragnaros, -1, 80
Backstage passes to a TAFKAL80ETC concert, 15, 20
Backstage passes to a TAFKAL80ETC concert, 10, 49
Backstage passes to a TAFKAL80ETC concert, 5, 49
Conjured Mana Cake, 3, 6
 
-------- day 1 --------
name, sellIn, quality
+5 Dexterity Vest, 9, 19
Aged Brie, 1, 1
Elixir of the Mongoose, 4, 6
Sulfuras, Hand of Ragnaros, 0, 80
Sulfuras, Hand of Ragnaros, -1, 80
Backstage passes to a TAFKAL80ETC concert, 14, 21
Backstage passes to a TAFKAL80ETC concert, 9, 50
Backstage passes to a TAFKAL80ETC concert, 4, 50
Conjured Mana Cake, 2, 5
And so on up to day 30.
And if I run the tests again, they pass, because the output is the same, and winmerge is not displayed. But if I change the behaviour, and re-run the tests, the system detects the difference, and prompts me to look at the change in winmerge again.
All that remains to be done is to commit the new approved output, so that it lives alongside the code.

Coverage

So, that's great. We have some tests running, and these are now committed. But... How do we know if this test is comprehensive? The inputs are just the random strings that were in the unused test Main() function. We can tell that some code is being invoked, because the values in the output change day to day. But are the tests covering all of the production code? We need a code coverage tool!

Bullseye

On my team, we normally use Bullseye to check our code coverage, with integration into Visual Studio. So, we can turn on Bullseye, rebuild, run the tests, and then have a look at the output. 
We can see that every part of the UpdateQuality() function has been called. That's good, because the test is just one that we found lying around, with a bunch of random inputs.

Imagine if we didn't have the Aged Brie in the list:
TEST_CASE("ApprovingText") {
    std::vector<Item> items;
    items.push_back(Item("+5 Dexterity Vest", 10, 20));
    // items.push_back(Item("Aged Brie", 2, 0));
    items.push_back(Item("Elixir of the Mongoose", 5, 7));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", 0, 80));
    items.push_back(Item("Sulfuras, Hand of Ragnaros", -1, 80));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 15, 20));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 10, 49));
    items.push_back(Item("Backstage passes to a TAFKAL80ETC concert", 5, 49));
    // this Conjured item doesn't yet work properly
    items.push_back(Item("Conjured Mana Cake", 3, 6));
    GildedRose app(items);
 
    std::stringstream out_stream;
 
    for (int day = 0; day <= 30; day++) {
        out_stream << "-------- day " << day << " --------" << '\n';
        out_stream << "name, sellIn, quality" << '\n';
        for (auto& item : items) {
            out_stream << item << '\n';
        }
        out_stream << '\n';
 
        app.updateQuality();
    }
 
    Approvals::verify(out_stream.str());
}
If we run the tests now, the coverage report clearly shows that we no longer have full coverage.
So, this tells us that our Approval Tests are giving us 100% coverage, and we haven't had to investigate any production code or behaviour at all yet. Nice :-) Of course, we don't know what we're testing, but we can be confident that if we start making breaking changes to the code without meaning to, then our test harness will (hopefully) catch the mistake.

Refactoring

The code is still a mess. But it is now under test, and if we break the existing behaviour, we have reasonable confidence that the tests will fail. There might be some untested edge cases, but we can still proceed with caution. So we can start tidying up, and make the code vaguely understandable. In fact, refactoring is a great way of learning about code. 

The goal of refactoring is to change the structure of the code, without changing the behaviour

The first step we can take is to remove one of the two test files. We have made one work for our Approval Tests, and the other just declares a broken test, and an unused function. So it can go. 

And of course, we run the tests, and commit after this change. Small steps are vital.

Our next step is to start adding some meaning to the updateQuality() function. For example, we have a few lines like this:
if (items[i].name != "Sulfuras, Hand of Ragnaros")
We can replace that conditional with a named variable, and get this instead:
if (not_sulfuras)
It's a small step. But we can repeat these small steps, running the tests after each step, and reverting if the tests fail. If the tests pass, we can commit and bank this new good state.
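In context, that first extraction looks something like this (a sketch; exactly where the variable goes is a judgment call):
// Before:
//     if (items[i].name != "Sulfuras, Hand of Ragnaros") {
//         items[i].sellIn = items[i].sellIn - 1;
//     }
// After:
const auto not_sulfuras = items[i].name != "Sulfuras, Hand of Ragnaros";
if (not_sulfuras) {
    items[i].sellIn = items[i].sellIn - 1;
}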

And, after a few iterations of this, we can get to something like this:
namespace {
 
    void IncrementQuality(Item& item) {
        if (item.quality < 50) {
            item.quality++;
        }
    }
 
    void DecrementQuality(Item& item) {
        const auto quality_above_0 = item.quality > 0;
        if (quality_above_0) {
            item.quality--;
        }
    }
 
    void IncreaseBackstagePassValue(Item& item) {
        if (item.sellIn < 11) {
            IncrementQuality(item);
            if (item.sellIn < 6) {
                IncrementQuality(item);
            }
        }
    }
 
    void UpdateQualityForExpired(Item& item) {
 
        const auto backstage_pass = item.name == "Backstage passes to a TAFKAL80ETC concert";
        const auto brie = item.name == "Aged Brie";
 
        if (brie) {
            IncrementQuality(item);
        } else if (backstage_pass) {
            item.quality = 0;
        } else {
            DecrementQuality(item);
        }
    }
 
    void UpdateSingleItemQuality(Item& item) {
 
        const auto backstage_pass = item.name == "Backstage passes to a TAFKAL80ETC concert";
        const auto brie = item.name == "Aged Brie";
        const auto quality_decreases = !backstage_pass && !brie;
 
        if (quality_decreases) {
            DecrementQuality(item);
        } else {
            IncrementQuality(item);
            if (backstage_pass) {
                IncreaseBackstagePassValue(item);
            }
        }
    }
}

GildedRose::GildedRose(std::vector<Item>& items) : m_items(items) {}
 
void GildedRose::updateQuality() {
    for (auto& item : m_items) {
        if (item.name == "Sulfuras, Hand of Ragnaros") {
            continue;
        }
 
        UpdateSingleItemQuality(item);
        item.sellIn--;
 
        const auto expired = item.sellIn < 0;
        if (expired) {
            UpdateQualityForExpired(item);
        }
    }
}
Now, this code is not perfect. But I hope that you agree that this is more understandable. The functions are much smaller; there are no crazy nested conditionals, and the code has been modernised. And more to the point, it is now understandable enough for us to try and implement the new requirements. Amitai Schleier has written on the importance of knowing when to stop with this kind of refactoring: https://schmonz.com/2015/02/25/the-when-to-stop-kata. If you want to see someone completely tidying the code (in Ruby), then this talk is really good: https://www.youtube.com/watch?v=8bZh5LMaSmE.

New Requirements

So, now we have the code in an understandable state, and under test. We can now move on to adding some new functionality. We'll do this using Test Driven Development (or TDD for short).
What is the new requirement? Here is the relevant section from the Readme:
We have recently signed a supplier of conjured items. This requires an update to our system:
 
 - "Conjured" items degrade in Quality twice as fast as normal items
 
Feel free to make any changes to the UpdateQuality method and add any new code as long as everything
still works correctly. However, do not alter the Item class or Items property as those belong to the
goblin in the corner who will insta-rage and one-shot you as he doesn't believe in shared code
ownership (you can make the UpdateQuality method and Items property static if you like, we'll cover
for you).
 
Just for clarification, an item can never have its Quality increase above 50, however "Sulfuras" is a
legendary item and as such its Quality is 80 and it never alters.
OK, we need to support "Conjured" items, whatever they are. If we look at our Approval Test, there is a hint there too:
// this Conjured item doesn't yet work properly
items.push_back(Item("Conjured Mana Cake", 3, 6));
So, it looks like someone in the past thought that you should define a conjured item by starting the name with the word "Conjured". We can work with that.

TDD

We started by adding tests to detect whether something is conjured or not. 
TEST_CASE("Non-conjured items are not Conjured") {
    auto named_item = Item{"Aged Brie", 2, 0};
    REQUIRE(!Detail::IsConjured(named_item));
}
This test fails initially, because we don't have an IsConjured() function. So we need to add that:
bool IsConjured(Item& item) {
    return false;
}
This is the minimum change that we need to make the tests pass. And now they do, so we can commit :-)
Now we can add a test to detect that something is conjured:
TEST_CASE("A conjured item is Conjured") {
    auto named_item = Item{"Conjured Mana Cake", 3, 6};
    REQUIRE(Detail::IsConjured(named_item));
}
This now forces us to do something more clever in our implementation:
bool IsConjured(Item& item) {
    return item.name.substr(0,8) == "Conjured";        
}
And now both tests pass.
We can now move on to look at quality. We can start by testing our assumptions about the current behaviour:
TEST_CASE("Non-conjured items degrade by 1 per day") {
    std::vector<Item> item_list = {{"Mana Cake", 3, 6}};
    auto gilded_rose = GildedRose(item_list);
    gilded_rose.updateQuality();
    REQUIRE(item_list.back().quality == 5);
}
We expect this to pass, and it does. We can commit this new test, and then move on to the real requirement...
TEST_CASE("Conjured items degrade twice as fast") {
    std::vector<Item> item_list = {{"Conjured Mana Cake", 3, 6}};
    auto gilded_rose = GildedRose(item_list);
    gilded_rose.updateQuality();
    REQUIRE(item_list.back().quality == 4);
}
As expected, this fails, so we need to make it pass.
We have our existing DecrementQuality() function:
void DecrementQuality(Item& item) {
    const auto quality_above_0 = item.quality > 0;
    if (quality_above_0) {
        item.quality--;
    }
}
And it is not too hard to modify it to make use of our new IsConjured() function, and to decrement the quality again for conjured items. This passes our test :-)
void DecrementQuality(Item& item) {
    const auto conjured = IsConjured(item);
 
    const auto quality_above_0 = item.quality > 0;
    if (quality_above_0) {
        item.quality--;
        if (conjured) {
            item.quality--;
        }
    }
}
Ah, but now our unit tests pass, but our Approval Tests fail...
Of course, the Conjured Mana Cake is now losing quality at twice the speed, so we get different output. So, we can check the new output, make sure it's fine, and copy it over to the Approved file. This can then be committed as the new Gold Standard.

If we look at our new DecrementQuality() function, we can see that we are decrementing the quality twice if the item is conjured. This is fine if the quality is large enough. But what if the quality of a conjured item starts at 1? We'll decrement it once, and then again because it is conjured. We should add a test for this!
TEST_CASE("Quality of conjured items does not go below zero") {
    std::vector<Item> item_list = { {"Conjured Mana Cake", 3, 1} };
    auto gilded_rose = GildedRose(item_list);
    gilded_rose.updateQuality();
    REQUIRE(item_list.back().quality == 0);
}
And we can make this pass by checking the quality before decrementing it again:
void DecrementQuality(Item& item) {
    const auto conjured = IsConjured(item);
 
    auto quality_above_0 = item.quality > 0;
    if (quality_above_0) {
        item.quality--;
        if (conjured) {
            auto quality_above_0 = item.quality > 0;
            if (quality_above_0) {
                item.quality--;
            }
        }
    }
}
This is ugly, but at least it works, and we can start refactoring and tidying.
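For example, one possible tidy-up (a sketch, not necessarily the exact version we ended up with) is to compute the decrement amount first and clamp at zero, which removes the duplicated check:
#include <algorithm> // for std::max
 
void DecrementQuality(Item& item) {
    // Conjured items lose quality twice as fast; quality never goes below zero.
    // Assumes quality is never negative to start with, as per the requirements.
    const auto amount = IsConjured(item) ? 2 : 1;
    item.quality = std::max(0, item.quality - amount);
}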

Property Based Tests

We have pretty much finished the kata here, but it's always good to leave the code in a good state for the next person to come across it. And the main thing that's missing at the moment is comprehensive unit tests. So we should add them, and refactor as we go.

And it was while doing this with my team that we discovered a very nice feature of Catch2: Generators.

Let's look at an example. We have an IncrementQuality() function, and this should never increment the quality to more than 50. So we have some interesting edge cases to think about. Obviously, any quality below 50 should get incremented. So 3 should go to 4, 10 should go to 11. And 49 should go to 50, but 50 should stay at the same value. And so should anything above 50. The function also takes in an int, so we should make sure it copes well with extreme values, like the min and max int values.
In other words, we're very interested in the few values around 50, to make sure that we're not out by one, and we're fairly interested in the numerical limits. And we want it to work for any other value.

Initially, we wrote tests for 0, for 1, for 49, for 50, for 51, and for the numerical limits. But it was very verbose. Then we discovered generators.
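For a flavour of what we were replacing, each of those hand-written tests looked something like this (a sketch reconstructing the style, rather than the exact code we wrote):
TEST_CASE("Quality of 49 increments to 50") {
    auto item = Item{"Aged Brie", 2, 49};
    Detail::IncrementQuality(item);
    REQUIRE(item.quality == 50);
}
 
TEST_CASE("Quality of 50 stays at 50") {
    auto item = Item{"Aged Brie", 2, 50};
    Detail::IncrementQuality(item);
    REQUIRE(item.quality == 50);
}
// ...and so on for 0, 1, 51, and the numerical limits.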

Generators allow you to specify a range of numbers, and the test will be run for every value in that range. You can also specify a huge range, and ask for a few random numbers from it.
Both random and range produce values that are greater than or equal to the lower number, and strictly less than the larger number.
In the first test, we take the minimum numerical limit, 100 random numbers between that limit and zero, and every number between 0 and 50. And we make sure that the quality is incremented.
In the second test, we take every number between 50 and 60, 100 random numbers between 60 and the max numerical limit, and the max numerical limit itself.
TEST_CASE("Quality of items increments when the quality is 49 or less") {
    auto original_quality = GENERATE(
        std::numeric_limits<int>::min(),
        take(100, random(std::numeric_limits<int>::min(), 0)),
        range(0,50)
    );
 
    auto item = Item{"Aged Brie", 2, original_quality};
    Detail::IncrementQuality(item);
 
    REQUIRE(item.quality == original_quality + 1);
}
 
TEST_CASE("Quality of items does not increase when the quality is 50 or more") {
    auto original_quality = GENERATE(
        range(50,61),
        take(100, random(61, std::numeric_limits<int>::max())),
        std::numeric_limits<int>::max()
    );
 
    auto item = Item{"Aged Brie", 2, original_quality};
    Detail::IncrementQuality(item);
 
    REQUIRE(item.quality == original_quality);
}
This really allows us to write expressive tests that capture the meaning of the inputs, without having to choose a set of random numbers and hard-code them.

Test Doubles

Once we have all of the testing infrastructure in place, we can write all of the unit tests that we want, all while continually having the safety net of the approval tests. 

When I ran this with my team, I wanted to look at test doubles: stubs, fakes and mocks. So I added in a new requirement that there should be some logging if something went wrong. Say, if IncrementQuality() was called when the item already had a quality above 50. 

By itself, this is not too difficult, and it even lends itself well to Approval Testing. If we run the application, or if we call the main function from a test, then it should produce a log, and we can store that as an approved Golden Master.

Approving log files

Our first test is just an Approval Test that makes sure that something is logged.
TEST_CASE("log logs to file") {
    const auto log_file = "log.log";
    std::filesystem::remove(log_file);
    {
        auto flogger = FLogger();
        flogger.log("Hello, I am a log.");
    }
    Approvals::verifyExistingFile(log_file);
}
This removes any existing log file, creates a logger, writes a message, lets the logger go out of scope so that it closes, and then verifies the contents against the approved copy.

The logger itself is a simple wrapper round spdlog, and just forwards calls on to it.
We start with an interface:
class ILogger
{
public:
    virtual void log(std::string str) = 0;
};
And then we can define a concrete implementation:
class FLogger final : public ILogger
{
public:
    FLogger() {
        m_logger = spdlog::basic_logger_mt("basic_logger", "log.log");
    }
    ~FLogger() {
        spdlog::drop_all();
    }
    void log(std::string message) override {
        m_logger->info(message);
    }
private:
    std::shared_ptr<spdlog::logger> m_logger;
};
The first time we run our test, we get the following
[2019-11-28 09:07:40.866] [basic_logger] [info] Hello, I am a log.
We can commit this, and we have our Golden Master.
Let's just re-run it to make sure it passes... But it doesn't. It fails, because the new output doesn't match the old output. The second run produces this:
[2019-11-28 09:10:21.768] [basic_logger] [info] Hello, I am a log.
Hmmm. The times don't match. And of course they don't. The logger writes out the time and date. And that will happen every time we run the tests.

Luckily, the Approval Test framework has thought of this! All we need to do is to intercept the part where the file contents are compared, and make sure that the dates and times are always consistent. We can do this trivially by hard-coding a date-time.
class SpdLogComparator : public TextFileComparator {
public:
    [[nodiscard]]
    bool contentsAreEquivalent(
        std::string receivedPath,
        std::string approvedPath
    ) const override {
        spdlog::shutdown();
        const auto new_contents = MungedFileContents(receivedPath);
        WriteFileContents(receivedPath, new_contents);
        return TextFileComparator::contentsAreEquivalent(receivedPath, approvedPath);
    }
private:
    [[nodiscard]]
    std::string MungedFileContents(std::string receivedPath) const {
        std::stringstream new_contents;
        std::regex word_regex("\\[.*\\] \\[(.*)\\] \\[(.*)\\] (.*)");
        std::smatch sm;
        std::string line;
 
        std::ifstream rstream(receivedPath);
        while (std::getline(rstream, line)) {
            std::regex_match(line, sm, word_regex);
            if (sm.size() >= 2) {
                new_contents << "[TIME_AND_DATE MUNGED BY TESTS] [" << sm[1] << "] [" << sm[2].str() << "] " << sm[3].str() << '
';
            } else {
                new_contents << line << '\n';
            }
        }
        return new_contents.str();
    }
 
    void WriteFileContents(std::string receivedPath, std::string new_contents) const {
        std::ofstream fileout(receivedPath, std::ifstream::out);
        fileout << new_contents;
    }
};
Yes, the regex code is a mess. This wasn't a regex kata! This class is given two file paths in contentsAreEquivalent(). It loads the new one, finds the date and time, and replaces it with [TIME_AND_DATE MUNGED BY TESTS]. All we need to do is to use our new comparator.
TEST_CASE("log logs to file") {
    FileApprover::registerComparator(".log", std::make_shared<SpdLogComparator>());
    const auto log_file = "log.log";
    std::filesystem::remove(log_file);
    {
        auto flogger = FLogger();
        flogger.log("Hello, I am a log.");
    }
    Approvals::verifyExistingFile(log_file);
}
Now, each time we run the test, the date and time will be consistent (wrong, but then we're not trying to verify that the date and time work; spdlog is good, and that part works).

Mocks and Stubs

Having a logger is all very well, but we don't want our unit tests to write to a file. File IO is very slow, and we really want our unit tests to be fast.

And yet, we want to be able to test that our functions are writing to the log when they should. Furthermore, we also want our other unit tests to work smoothly without having to worry about the logs, and to still run fast.

This is where mocks and stubs can help us. Let's look at stubs first.

Stubs

There is a lot of disagreement about words like test double, mock, stub, fake etc. I tend to go along with Martin Fowler's definition. Here's what he says for stub:

"Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test."

In our case, we want a logger that does nothing when it's asked to log. The only external API on our logger is
void log(std::string str);
So it's quite easy to create a stub that does nothing:
class StubLogger : public ILogger {
public:
    void log(std::string str) override {}
};
If we use this logger, whenever something asks to log, the logger will just ignore the request. That's a perfect stub. So if we have a function like this, which expects a logger:
void GILDED_ROSE_DLL IncrementQuality(Item& item, ILogger& logger);
Then we can easily use a stub in our tests, with a simple helper like this:
void TestIncrementQuality(Item i) {
    StubLogger l;
    Detail::IncrementQuality(i, l);
}
 
TEST_CASE("Increment quality cannot increment an item if quality is 50 or more") {
    StubLogger logger;
    auto original_quality = GENERATE(
        range(50,61),
        take(100, random(61, std::numeric_limits<int>::max())),
        std::numeric_limits<int>::max()
    );
    auto item = Item{"Aged Brie", 2, original_quality};
    TestIncrementQuality(item);
    REQUIRE(item.quality == original_quality);
}
Now when our test runs, the implementation might try and log, but it will use the stub logger, and it will do nothing. The unit test will still run quickly :-)

Mocks 

But what if we want to know that something was logged? This is where mocks come in. Again, here's Martin Fowler's definition:

"Mocks are pre-programmed with expectations which form a specification of the calls they are expected to receive. They can throw an exception if they receive a call they don't expect and are checked during verification to ensure they got all the calls they were expecting."

In other words, we can set expectations on a mock, such as "log should be called", and then we can assert that this expectation was met.

For this, we really need a mocking framework, which lets us create mocks quickly, without loads of boilerplate. When I ran this with my team, we used Björn Fahller's nice Trompeloeil framework.

With this framework, it is trivial to create a mock logger:
class MockLogger : public ILogger {
public:
    MAKE_MOCK1(log, void(std::string), override);
};
Now we can write a test with expectations:
TEST_CASE("Incrementing the quality logs if quality is already max") {
    MockLogger logger;
    REQUIRE_CALL(logger, log(ANY(std::string)));
 
    auto excessive_quality = 60;
    auto item = Item{"Aged Brie", 2, excessive_quality };
    Detail::IncrementQuality(item, logger);
}
The "REQUIRE_CALL" says that the log() function on logger must be called, and that any string is valid. We could go further, and test the exact string, but maybe that's too strict a requirement. We can also write a similar test that nothing is logged if the quality is not too excessive:
TEST_CASE("Incrementing the quality does not log if quality is reasonable") {
    MockLogger logger;
    FORBID_CALL(logger, log(ANY(std::string)));
 
    auto reasonable_quality = 40;
    auto item = Item{"Aged Brie", 2, reasonable_quality };
    Detail::IncrementQuality(item, logger);
}
Here, "FORBID_CALL" sets an expectation that there will be no calls to the log() function on logger, and the test will fail if there are any.

Conclusion

So, that's the end of our dive into the Gilded Rose. We've covered how to use Approval Testing to bring a legacy system under test coverage, how to verify that this has been successful using a code coverage tool, how to go about refactoring and adding tests, how to use property-based tests, and finally how to use mocks and stubs to allow tests to work with slow sub-systems.
