Factories

This is part of a series of posts on Python unit testing.

Factories are a quick and easy way to create realistic objects to use in your tests. For example, if you’re testing the code for editing user accounts, your tests may need some User objects to test with. If you’re testing groups, you may need to create a lot of Group objects. Tests for the annotations API may need to create Annotation objects in order to test updating and deleting them.

It’s not always easy to create these kinds of objects. Their classes in the h code may require a lot of different arguments that would take time to write, clutter up your test code, and make tests easy to break. In some cases there are constraints between objects (for example, you can’t have two users with the same username) and other complexities. Factories exist to make all this simple.
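
To make the problem concrete, here is a minimal sketch of the boilerplate that a factory removes. The User class and make_user() helper below are hypothetical stand-ins, not h’s real model or its factories. They only illustrate the idea of filling in defaults and keeping usernames unique.

```python
import itertools
from dataclasses import dataclass


# Hypothetical stand-in for a model class with many required fields.
@dataclass
class User:
    username: str
    email: str
    display_name: str
    password: str
    admin: bool


_counter = itertools.count(1)


def make_user(**overrides):
    """Return a User with default values, overridden by any keyword arguments."""
    n = next(_counter)
    fields = {
        "username": "user%d" % n,  # the counter keeps usernames unique
        "email": "user%d@example.com" % n,
        "display_name": "Test User %d" % n,
        "password": "not-a-real-password",
        "admin": False,
    }
    fields.update(overrides)
    return User(**fields)


# Without a helper, every test has to spell out every field:
by_hand = User(
    username="alice",
    email="alice@example.com",
    display_name="Alice",
    password="not-a-real-password",
    admin=False,
)

# With a helper, a test names only the field it cares about:
admin = make_user(admin=True)
other = make_user()
```

A real factory library does the same job with less hand-written code and more realistic generated values.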

For example, let’s look at the tests for tag_uri_for_annotation(). This is a function that takes a memex.models.Annotation object and returns a tag URI for it. Here’s (a slightly simplified version of) one of the tests for it:

def test_tag_uri_for_annotation(factories):
    annotation = factories.Annotation(
        created=datetime.datetime(year=2015, month=3, day=19),
        target_uri="http://www.example.com/example_page")

    tag_uri = util.tag_uri_for_annotation(annotation)

    assert tag_uri == "tag:example.com,2015-09:" + annotation.id

factories.Annotation() creates an Annotation object for us to pass into the tag_uri_for_annotation() function that we’re testing. Since we care about the created date and target_uri of the annotation (they form parts of the expected tag URI) we pass those in to factories.Annotation() as arguments, but we leave all of the other Annotation fields unspecified and factories automatically generates suitable values for us. This makes the test easier to write (we only have to type out the fields that we’re interested in) and easier to read (the test isn’t cluttered up with the values of fields that are necessary to create an Annotation but that aren’t relevant to this particular test).

Note that we also use annotation.id in our assertion. The id for an Annotation object isn’t generated until that annotation is added to the database, so our test would need to get a db connection and add the annotation to it. factories does that for us, as well.

As I said, we can simply omit any of the fields of the Annotation class and factories will generate suitable values for them. Here’s another test that specifies the created date but doesn’t care about the target_uri:

def test_feed_from_annotations_item_guid(factories):
    """Feed items should use the annotation's tag URI as their GUID."""
    annotation = factories.Annotation(
        created=datetime.datetime(year=2015, month=3, day=11))

    feed = rss.feed_from_annotations([annotation])

    assert feed['entries'][0]['guid'] == (
        'tag:hypothes.is,2015-09:' + annotation.id)

If we just want an annotation and don’t care about the values of any of its fields, then we can call factories.Annotation() with no arguments. If we want more than one annotation, we just call it multiple times:

def test_feed_from_annotations_with_3_annotations(factories):
    """If there are 3 annotations it should return 3 entries."""
    annotations = [factories.Annotation(), factories.Annotation(),
                   factories.Annotation()]

    feed = rss.feed_from_annotations(annotations)

    assert len(feed['entries']) == 3

factory_boy

factories is implemented using the factory_boy library, and h comes with factories for creating users, documents, annotations, groups, and others. See tests/common/factories.py and tests/memex/factories.py for all the available classes.

Let’s take a quick tour of factory_boy features in a Python shell:

$ hypothesis --dev shell
>>> from tests.common import factories
>>> 

Each time you call factories.User() (for example) it returns a new User object with different, realistic but randomly generated values for all of its fields:

>>> first_user = factories.User()
>>> first_user.__class__
h.models.user.User
>>> first_user.username
u'pamela72'
>>> second_user = factories.User()
>>> second_user.username
u'christopher86'

You can specify the values for any fields you want as keyword arguments, and the other fields will still be automatically generated:

>>> user = factories.User(email='[email protected]', nipsa=True)
>>> user.email
'[email protected]'
>>> user.nipsa
True
>>> user.username
u'jonathanmathis'

Sometimes an object from one factory can be passed as an argument to another factory. For example, every annotation has a document. Normally the annotation factory generates a new random document for each annotation:

>>> annotation = factories.Annotation()
>>> annotation.document
<Document 1>
>>> annotation_2 = factories.Annotation()
>>> annotation_2.document
<Document 2>

To create a test annotation of a particular test document you can pass the test document as an argument to the annotation factory:

>>> document = factories.Document()
>>> document
<Document 3>
>>> annotation = factories.Annotation(document=document)
>>> annotation.document
<Document 3>
>>> second_annotation = factories.Annotation(document=document)
>>> second_annotation.document
<Document 3>

Here’s a test that makes use of this technique:

def test_feed_from_annotations_item_titles(factories):
    """Feed items should include the annotation's document's title."""
    document = factories.Document(title='Hello, World')
    annotation = factories.Annotation(document=document)

    feed = rss.feed_from_annotations([annotation])

    assert feed['entries'][0]['title'] == 'Hello, World'

Creating objects without adding them to the database

Normally, factories adds objects that you create to the test database:

>>> annotation = factories.Annotation()  # This adds annotation to the db.

This shouldn’t do any harm (the database is wiped after each test, before running the next test function) but it can make the tests unnecessarily slow if they don’t really need to be writing to the db. Tests for a models.py file probably do really need to use the db. But tests for a views.py file, though they may need model objects such as Users and Annotations to test with, probably don’t really need these objects to be written to the db.

To create a factory object without writing it to the db, use the build() method. This works with any factory class:

>>> annotation = factories.Annotation.build()  # A real Annotation object is
                                               # created but not added to the
                                               # database.

It’s probably best to use build() whenever you can, as the tests will be a tiny bit faster (and this will add up over time as more and more tests are written).

One thing to be aware of is that the values of some fields are only generated when the object is added to the database. For example, annotation ids are generated this way, so an annotation created with build() has no id:

>>> annotation = factories.Annotation.build()
>>> annotation.id is None
True

If you need an id, you can probably get away with specifying a fake one:

>>> annotation = factories.Annotation.build(id='test_id')
>>> annotation.id
'test_id'
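
The difference between building and creating can be sketched with a hand-rolled factory and a fake session. Everything below (Annotation, FakeSession, AnnotationFactory) is a hypothetical stand-in written for illustration. It uses neither factory_boy nor a real database, and only mimics the behaviour described above: ids appear when an object is added to the session.

```python
import itertools


class Annotation:
    """Hypothetical, simplified stand-in for an annotation model."""

    def __init__(self, text="", id=None):
        self.text = text
        self.id = id


class FakeSession:
    """Stands in for a database session that assigns ids when objects are added."""

    def __init__(self):
        self.objects = []
        self._ids = itertools.count(1)

    def add(self, obj):
        if obj.id is None:
            obj.id = "id-%d" % next(self._ids)
        self.objects.append(obj)


class AnnotationFactory:
    """Hand-rolled factory with a build/create split (not factory_boy)."""

    session = FakeSession()

    @classmethod
    def build(cls, **fields):
        # Construct the object in memory only.
        return Annotation(**fields)

    @classmethod
    def create(cls, **fields):
        # Construct the object, then persist it so that it gets an id.
        obj = cls.build(**fields)
        cls.session.add(obj)
        return obj


built = AnnotationFactory.build()              # never touches the session
created = AnnotationFactory.create()           # added to the session, gets an id
faked = AnnotationFactory.build(id="test_id")  # in memory, with a fake id
```

Only the created object ends up in the session. The built objects cost nothing beyond a constructor call, which is why build() keeps tests faster.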

You’ll find factories used all over the Hypothesis tests, and you should try to use it whenever possible to create the test objects that you need.

In the next post we’ll look at parametrize, a tool for covering many test cases at once.

Sean Hammond