The problem with using fixtures in Rails

I’ve been trying to largely ignore the recent TDD discussion prompted by DHH’s RailsConf 2014 keynote. I think I can understand where he’s coming from, and my only real concern with not sharing his point of view is that it makes it less likely that Rails will be steered in a direction which makes TDD easier. But that’s OK, and if my concern grows then I have opportunities to propose improvements.

I don’t even really mind what he’s said in his latest post about unit testing the Basecamp codebase. There are a lot of Rails applications – including ones I’ve written – where a four-minute test suite would’ve been a huge triumph.

I could make some pithy argument like:

Sorry, I couldn't resist

… but let’s be honest, four minutes for a substantial and mature codebase is pretty excellent in the Rails world.

So that is actually pretty cool.

Using fixtures

A lot of that speed is no doubt because Basecamp is using fixtures: test data that is loaded once at the start of the test run, and then reset by wrapping each test in a transaction and rolling back before starting the next test.

This can be a benefit because the alternative – assuming that you want to get some data into the database before your test runs – is to insert all the data required for each test, which can potentially involve a large tree of related models. Doing this hundreds or thousands of times will definitely slow your test suite down.

(Note that for the purposes of my point below, I’m deliberately not considering the option of not hitting the database at all. In reality, that’s what I’d do, but let’s just imagine that it wasn’t an option for a second, yeah? OK, great.)

So, fixtures will probably make the whole test suite faster. Sounds good, right?

The problem with fixtures

I feel like this is glossing over the real problem with fixtures: unless you are using independent fixtures for each test, your shared fixtures have coupled your tests together. Since I’m pretty sure that nobody is actually using independent fixtures for every test, I am going to go out on a limb and just state:

Fixtures have coupled your tests together.

This isn’t a new insight. This is pain that I’ve felt acutely in the past, and was my primary motivation for leaving fixtures behind.

Say you use the same ‘user’ fixture between two tests in different parts of your test suite. Modifying that fixture to respond to a change in one test can now potentially cause the other test to fail, if the assumptions either test was making about its test data are no longer true (e.g. the user should not be admin, the user should only have a short bio, or so on).

If you use fixtures and share them between tests, you’re putting the burden of managing this coupling on yourself or the rest of your development team.

Going back to DHH’s post:

Why on earth would you run your entire test harness for every single line change in a particular model? If you have so little confidence in the locality of your changes, the tests are indeed telling you that the system has overly high coupling.

What fixtures do is introduce overly high coupling in the test suite itself. If you make any change to your fixtures, I do not think it’s possible to be confident that you haven’t broken a single test unless you run the whole suite again.

Fixtures separate the reason test data is like it is from the tests themselves, rather than keeping them close together.

I might be wrong

Now perhaps I have only been exposed to problematic fixtures, and there are techniques for reliably avoiding this coupling or at least managing it better. If that’s the case, then I’d really love to hear more about them.

Or, perhaps the pain of managing fixture coupling is objectively less than the pain of changing the way you write software to both avoid fixtures AND avoid slowing down the test suite by inserting data into the database thousands of times?

That’s certainly possible. I am skeptical though.

interblah.net - Notes on 'Notes on "Counting Tree Nodes"'

Notes on 'Notes on "Counting Tree Nodes"'

Having now finished watching Tom’s episode of Peer to Peer, I finally got around to reading his Notes on “Counting Tree Nodes” supplementary blog post. There are a couple of ideas he presents that are so interesting that I wanted to highlight them again here.

If you haven’t seen the video, then I’d still strongly encourage you to read the blog post. While I can now see the inspiration for wanting to discuss these ideas1, the post really does stand on it’s own.

Notes on ‘Enumerators’

Here’s the relevant section of the blog post. Go read it now!

I’m not going to re-explain it here, so yes, really, go read it now.

What I found really interesting here was the idea of building new enumerators by re-combining existing enumerators. I’ll use a different example, one that is perhaps a bit too simplistic (there are more concise ways of doing this in Ruby), but hopefully it will illustrate the point.

Let’s imagine you have an Enumerator which enumerates the numbers from 1 up to 10:

> numbers = 1.upto(10)
=> #<Enumerator: 1:upto(10)>
> numbers.next
=> 1
> numbers.next
=> 2
> numbers.next
=> 3
...
> numbers.next
=> 9
> numbers.next
=> 10
> numbers.next
StopIteration: iteration reached an end

You can now use that to do all sorts of enumerable things like mapping, selecting, injecting and so on. But you can also build new enumerables using it. Say, for example, we now only want to iterate over the odd numbers between 1 and 10.

We can build a new Enumerator that re-uses our existing one:

> odd_numbers = Enumerator.new do |yielder|
    numbers.each do |number|
      yielder.yield number if number.odd?
    end
  end
=> #<Enumerator: #<Enumerator::Generator:0x007fc0b38de6b0>:each>

Let’s see it in action:

> odd_numbers.next
=> 1
> odd_numbers.next
=> 3
> odd_numbers.next
=> 5
> odd_numbers.next
=> 7
> odd_numbers.next
=> 9
> odd_numbers.next
StopIteration: iteration reached an end

So, that’s quite neat (albeit somewhat convoluted compared to 1.upto(10).select(&:odd)). To extend this further, let’s imagine that I hate the lucky number 7, so I also don’t want that be included. In fact, somewhat perversely, I want to stick it right in the face of superstition by replacing 7 with the unluckiest number, 13.

Yes, I know this is weird, but bear with me. If you have read Tom’s post (go read it), you’ll already know that this can also be achieved with a new enumerator:

> odd_numbers_that_arent_lucky = Enumerator.new do |yielder|
    odd_numbers.each do |odd_number|
      if number == 7
        yielder.yield 13
      else
        yielder.yield number
      end
    end
  end
=> #<Enumerator: #<Enumerator::Generator:0x007fc0b38de6b0>:each>
> odd_numbers.next
=> 1
> odd_numbers.next
=> 3
> odd_numbers.next
=> 5
> odd_numbers.next
=> 13
> odd_numbers.next
=> 9
> odd_numbers.next
StopIteration: iteration reached an end

In Tom’s post he shows how this works, and how you can further compose enumerators to to produce new enumerations with specific elements inserted at specific points, or elements removed, or even transformed, and so on.

So.

A hidden history of enumerable transformations

What I find really interesting here is that somewhere in our odd_numbers enumerator, all the numbers still exist. We haven’t actually thrown anything away permanently; the numbers we don’t want just don’t appear while we are enumerating.

The enumerator odd_numbers_that_arent_lucky still contains (in a sense) all of the numbers between 1 and 10, and so in the tree composition example in Tom’s post, all the trees he creates with new nodes, or with nodes removed, still contain (in a sense) all those nodes.

It’s almost as if the history of the tree’s structure is encoded within the nesting of Enumerator instances, or as if those blocks passed to Enumerator.new act as a runnable description of the transformations to get from the original tree to the tree we have now, invoked each time any new tree’s children are enumerated over.

I think that’s pretty interesting.

Notes on ‘Catamorphisms’

In the section on Catamorphisms (go read it now!), Tom goes on to show that recognising similarities in some methods points at a further abstraction that can be made – the fold – which opens up new possibilities when working with different kinds of structures.

What’s interesting to me here isn’t anything about the code, but about the ability to recognise patterns and then exploit them. I am very jealous of Tom, because he’s not only very good at doing this, but also very good at explaining the ideas to others.

Academic vs pragmatic programming

This touches on the tension between the ‘academic’ and ‘pragmatic’ nature of working with software. This is something that comes up time and time again in our little sphere:

Now I’m not going to argue that anyone working in software development should have a degree in Computer Science. I’m pretty sympathetic with the idea that many “Computer Science” degrees don’t actually bear much of direct resemblance to the kinds of work that most software developers do2.

Ways to think

What I think university study provides, more than anything else, is exposure and training in ways to think that aren’t obvious or immediately accessible via our direct experience of the world. Many areas of study provide this, including those outside of what you might consider “science”. Learning a language can be learning a new way to think. Learning to interpret art, or poems, or history is learning a new way to think too.

Learning and internalising those ways to think give perspectives on problems that can yield insights and new approaches, and I propose that that, more than any other thing, is the hallmark of a good software developer.

Going back to the blog post which, as far I know, sparked the tweet storm about “programming and maths”, I’d like to highlight this section:

At most academic CS schools, the explicit intent is that students learn programming as a byproduct of learning CS. Programming itself is seen as rather pedestrian, a sort of exercise left to the reader.

For actual developer jobs, by contrast, the two main skills you need these days are programming and communication. So while CS still does have strong ties to math, the ties between CS and programming are more tenuous. You might be able to say that math skills are required for computer science success, but you can’t necessarily say that they’re required for developer success.

What a good computer science (or maths or any other logic-focussed) education should teach you are ways to think that are specific to computation, algorithms and data manipulation, which then

Plus, it’s fun to achieve that kind of insight into a problem. It’s the “a-ha!” moment that flips confusion and doubt into satisfaction and certainty. And these insights are also interesting in and of themselves, in the very same way that, say, study of art history or Shakespeare can be.

So, to be crystal clear, I’m not saying that you need this perspective to be a great programmer. I’m really not. You can build great software that both delights users and works elegantly underneath without any formal training. That is definitely true.

Back to that quote:

the ties between CS and programming are more tenuous … you can’t necessarily say that they’re required for developer success.

All I’m saying is this: the insights and perspectives gained by studying computer science are both useful and interesting. They can help you recognise existing, well-understood problems, and apply robust, well-understood and powerful solutions.

That’s the relevance of computer science to the work we do every day, and it would be a shame to forget that.

  1. In the last 15 minutes or so of the video, the approach Tom uses to add a “child node” to a tree is interesting but there’s not a huge amount of time to explore some of the subtle benefits of that approach 

  2. Which is, and let’s be honest, a lot of “Get a record out of a database with an ORM, turn it into some strings, save it back into the database”.