This post covers testing multiple test cases at once using
@pytest.mark.parametrize()
.
In Writing Simple Tests we tested the validate_url()
function that validates the links that users add to their profiles.
util.py
also contains a validate_orcid()
function for validating the ORCID ID’s that users add
to their profiles (an ORCID is a unique identifier for a scientific author):
def validate_orcid(orcid):
"""
Validate an ORCID.
Verify that an ORCID conforms to the structure described at
http://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier
Returns the normalized ORCID if successfully parsed or raises a ValueError
otherwise.
"""
...
We want to test that validation succeeds (returning the string unmodified,
without raising an exception) for a variety of valid ORCIDs.
We could test this by writing a separate test method for each valid ORCID that
we want to test, but that would be a lot of duplication - each test method
would be exactly the same except for the ORCID used.
Another way we could do this is with a for
loop in a single test method:
def test_validate_orcid_accepts_valid_ids():
for orcid_id in ['0000-0002-1825-0097', '0000-0001-5109-3700', '0000-0002-1694-233X']:,
assert validate_orcid(orcid_id) == orcid_id
This is better - there’s no duplication in the test code.
You could say that this breaks the arrange, act, assert
pattern because it calls the method under test three times, not once. But there
is only one line of code that calls the method under test, it just happens to
be a line inside a loop, so I think it’s okay.
This approach does have the disadvantage that it clutters up the body of our
test method with a for
loop. If the function we were testing had more than
one parameter and we wanted to test it with different combinations of multiple
parameters then the loop and the clutter would be more complicated.
Pytest gives us a better way to write this kind of test - @pytest.mark.parametrize():
@pytest.mark.parametrize('orcid', [
'0000-0002-1825-0097',
'0000-0001-5109-3700',
'0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
assert validate_orcid(orcid) == orcid
The parametrize()
causes pytest to run this function three times, each time
passing the next one of the ORCIDs as the orcid
argument to the test.
The first argument to parametrize()
, called the argnames
argument, is the
string 'orcid'
:
@pytest.mark.parametrize('orcid', [
'0000-0002-1825-0097',
'0000-0001-5109-3700',
'0000-0002-1694-233X',
This tells pytest the name of the test parameter that we’re going to be parametrizing. Pytest finds the test method parameter with the matching name:
@pytest.mark.parametrize('orcid', [
'0000-0002-1825-0097',
'0000-0001-5109-3700',
'0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
assert validate_orcid(orcid) == orcid
The second argument to parametrize()
is the argvalues
argument, this is
the list of values that pytest will pass to the test function’s orcid
parameter:
@pytest.mark.parametrize('orcid', [
'0000-0002-1825-0097',
'0000-0001-5109-3700',
'0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
assert validate_orcid(orcid) == orcid
Pytest takes the first argument from this list and calls
test_validate_orcid_accepts_valid_ids('0000-0002-1825-0097')
, then it takes
the second argument and calls
test_validate_orcid_accepts_valid_ids('0000-0001-5109-3700')
,
and so on.
So essentially we’ve got three separate tests in one.
This separates test data (in this case, the different ORCIDs being tested)
from test logic (the code in the body of the test method), and avoids
cluttering up the method body with a for
loop.
parametrize() with multiple parameters
In the example above we parametrized just a single parameter - orcid
.
Where parametrize()
really comes into its own is when you have a test
function with multiple parameters. You can reduce what might have been many
separate tests down to just one.
As an example, let’s look at the tests for the update_web_uri() method. (Here’s the real tests for this method.)
When someone annotates a document using Hypothesis we collect many URIs for that document. The URL from the browser’s location bar is one URI (which we call a “self-claim” URI). Some web pages contain canonical links in the HTML, like:
<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">
We collect these and call them “rel-canonical” URIs for the document.
There can also be rel="alternate"
and rel="shortlink"
URIs in an HTML
document. And there are many other kinds of document URIs that we collect as well.
When we want to display the domain name of a document (e.g. www.example.com
)
on a Hypothesis page, or render a link to a document, we need to consider all
of the different URIs for that document that we’ve collected into our database
and select the “best” one to use.
We call this best URI the document’s web_uri
, and the update_web_uri()
method is responsible for updating it whenever we receive new URIs for a document:
def update_web_uri(self):
"""
Update the value of the self.web_uri field.
Set self.web_uri to the "best" http(s) URL from self.document_uris.
Set self.web_uri to None if there's no http(s) DocumentURIs.
"""
We want to test the behaviour of update_web_uri()
with many different
combinations of URIs that a document might have - when we have just one URI
for a document it should just choose that one URI, for different possible
combinations of multiple types of URI it should choose the one that we think is
best in each case. Here’s what the tests for update_web_uri()
could look like:
class TestDocumentWebURI(object):
def test_given_a_single_http_or_https_uri_it_returns_it(self, factories):
document = factories.Document()
uri = 'http://example.com'
factories.DocumentURI(uri=uri, type='self-claim', document=document)
document.update_web_uri()
assert document.web_uri == uri
def test_if_there_are_no_http_or_https_uris_it_returns_None(self, factories):
document = factories.Document()
factories.DocumentURI(uri='ftp://example.com', type='self-claim',
document=document)
factories.DocumentURI(uri='android-app://example.com',
type='rel-canonical', document=document)
factories.DocumentURI(uri='urn:x-pdf:example',
type='rel-alternate', document=document)
factories.DocumentURI(uri='doi:http://example.com',
type='rel-shortlink', document=document)
document.update_web_uri()
assert document.web_uri is None
...
… and so on, there were several more tests for the expected results given different combinations of URIs. All of these tests have the same test logic:
- Create a
Document
with someDocumentURI
s - Call
update_web_uri()
- Check that
web_uri
is equal to the expected URI
They differ only in the list of DocumentURI
s that’s created and in the value
that we expect web_uri
to have at the end. With parametrize()
we can
combine all of these into a single test.
Here’s what the beginning of a parametrize()
call for this test might look like:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
# Given a single http or https URL it just uses it.
([('http://example.com', 'self-claim')], 'http://example.com'),
([('https://example.com', 'self-claim')], 'https://example.com'),
...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
document = factories.Document()
for docuri_tuple in document_uris:
factories.DocumentURI(uri=docuri_tuple[0], type=docuri_tuple[1],
document=document)
document.update_web_uri()
assert document.web_uri == expected_web_uri
Here we’re parametrizing two arguments to the test method, not just one
as before. The argnames
argument to parametrize()
gives two argument names
separated by a comma:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
# Given a single http or https URL it just uses it.
([('http://example.com', 'self-claim')], 'http://example.com'),
([('https://example.com', 'self-claim')], 'https://example.com'),
...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
...
Notice that the test function also has a third argument, factories
.
This is the same factories for creating test objects that we saw earlier.
Having this argument in there doesn’t do any harm - it doesn’t confuse parametrize.
And it doesn’t even matter that the factories
argument appears in-between
document_uris
and expected_web_uri
- the order of the parameters doesn’t
matter, pytest looks at their names to figure out what they are.
Since we’re now parametrizing two parameters instead of one, the argvalues
argument to parametrize()
becomes a list of two-tuples:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
# Given a single http or https URL it just uses it.
([('http://example.com', 'self-claim')], 'http://example.com'),
([('https://example.com', 'self-claim')], 'https://example.com'),
...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
...
Pytest will first call the test function passing the values from the first two-tuple as arguments:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
# Given a single http or https URL it just uses it.
([('http://example.com', 'self-claim')], 'http://example.com'),
([('https://example.com', 'self-claim')], 'https://example.com'),
it’ll call:
test_update_web_uri([('http://example.com', 'self-claim')], factories, 'http://example.com'):
It’ll then take the second two-tuple:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
# Given a single http or https URL it just uses it.
([('http://example.com', 'self-claim')], 'http://example.com'),
([('https://example.com', 'self-claim')], 'https://example.com'),
and run the test with those arguments:
test_update_web_uri([('https://example.com', 'self-claim')], factories, 'https://example.com'):
and so on.
Here’s the full, final version of the test with all parametrized cases:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
# Given a single http or https URL it just uses it.
([('http://example.com', 'self-claim')], 'http://example.com'),
([('https://example.com', 'self-claim')], 'https://example.com'),
([('http://example.com', 'rel-canonical')], 'http://example.com'),
([('https://example.com', 'rel-canonical')], 'https://example.com'),
([('http://example.com', 'rel-shortlink')], 'http://example.com'),
([('https://example.com', 'rel-shortlink')], 'https://example.com'),
# Given no http or https URLs it sets web_uri to None.
([], None),
([
('ftp://example.com', 'self-claim'),
('android-app://example.com', 'rel-canonical'),
('urn:x-pdf:example', 'rel-alternate'),
('doi:http://example.com', 'rel-shortlink'),
], None),
# It prefers self-claim URLs over all other URLs.
([
('https://example.com/shortlink', 'rel-shortlink'),
('https://example.com/canonical', 'rel-canonical'),
('https://example.com/self-claim', 'self-claim'),
], 'https://example.com/self-claim'),
# It prefers canonical URLs over all other non-self-claim URLs.
([
('https://example.com/shortlink', 'rel-shortlink'),
('https://example.com/canonical', 'rel-canonical'),
], 'https://example.com/canonical'),
# If there's no self-claim or canonical URL it will return an https
# URL of a different type.
([
('ftp://example.com', 'self-claim'),
('urn:x-pdf:example', 'rel-alternate'),
# This is the one that should be returned.
('https://example.com/alternate', 'rel-alternate'),
('android-app://example.com', 'rel-canonical'),
('doi:http://example.com', 'rel-shortlink'),
], 'https://example.com/alternate'),
# If there's no self-claim or canonical URL it will return an http
# URL of a different type.
([
('ftp://example.com', 'self-claim'),
('urn:x-pdf:example', 'rel-alternate'),
# This is the one that should be returned.
('http://example.com/alternate', 'rel-alternate'),
('android-app://example.com', 'rel-canonical'),
('doi:http://example.com', 'rel-shortlink'),
], 'http://example.com/alternate'),
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
document = factories.Document()
for docuri_tuple in document_uris:
factories.DocumentURI(uri=docuri_tuple[0], type=docuri_tuple[1],
document=document)
document.update_web_uri()
assert document.web_uri == expected_web_uri
Parametrize is a great feature of pytest that can really help to test many cases while minimising the amount of test code. You should always be on the look out for groups of test methods that could be reduced to a single method by using parametrize, and use if as often as possible.