Relativistic unit test dilation
Sorry to disappoint you if you expected references to Einstein and Lorentz. The story is about testing in Python. Actually, the subtitle is:
A unit test runs longer than it runs
A new feature or some quick experiment was on its way. I finished a change in PyCharm, as small a touch on the code as possible, and ran the tests to make sure nothing was broken. Of course it was. As usual in such an event, I prefer to debug the changed behavior against one specific test. I picked one and ran it again on its own, just to “warm up” the command line. And then it happened:
$ ./test_suite.py MyTest.test_me
F
======================================================================
FAIL: test_me (__main__.MyTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_suite.py", line 7, in test_me
    self.assertTrue(False)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (failures=1)
Er. Stop looking at the output above. It is completely OK, nothing juicy there. I did not say I saw the problem; rather, I experienced it. So I asked Bash to verify my feelings. The output was:
$ time ./test_suite.py MyTest.test_me
F
# truncated
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (failures=1)
real 0m1.145s
user 0m x.xxxs # irrelevant
sys 0m x.xxxs
Now you can look. And see. The test itself ran for a millisecond or so, but the entire command took more than a second of wall-clock time (that is what the real line reports). Why does the test run longer… than it runs? Apparently there is some spot, either in the tests or in the project, that does something, hopefully useful, during that second. This is probably an appropriate moment to confess. The tests in that project are not really fast. I mean the unit tests: one waits about half a minute for the entire suite to complete. This sucks, but apparently not enough, since I had been telling myself for a long time to cut the run time down, and it had never become important and pressing enough. I am telling this now to explain why that extra second was not noticed earlier, immediately when it was introduced.
How do you find out what a process does during that second? The profiler to the rescue. The suite runs in a pretty simple manner with the built-in Python framework. Of course, you may be used to Nose, or advocate for pytest, which I myself have preferred lately. But anyhow, the built-in unittest is good enough. Usually, it goes like the following:
from unittest import TestCase, main

# tests implementation comes here

if __name__ == '__main__':
    main()
We have an explicit entry point to profile:
if __name__ == '__main__':
    from cProfile import run
    run('main()')
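As an aside, run() also accepts a sort argument (assuming a reasonably recent Python 3), which puts the most expensive calls on top instead of ordering by function name; the outputs below keep the default ordering:

from cProfile import run
run('main()', sort='cumulative')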
All nice and fun, except the answer is not there:
----------------------------------------------------------------------
Ran 1 test in 0.014s
FAILED (failures=1)
265 function calls (262 primitive calls) in 0.035 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.035 0.035 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 case.py:178(__init__)
...
Meaning, the profiler was unable to catch the offending code, which apparently is invoked somewhere at startup or at import time. Two approaches can be used to overcome this. First, the profiler itself can run the entire script, like this:
$ python -m cProfile ./test_suite.py
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
861 function calls (849 primitive calls) in 1.028 seconds
Here the extra second is detected properly, meaning the startup or import logic is invoked and measured by the profiler. But for some reason the tests do not run: apparently the initialization logic of the profiler and of the test framework clash in some way. For this reason, and for a couple of others, I personally prefer to run the profiler explicitly from the code.
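For the record, a generic way to drive the profiler from code, with control over what is measured and how the report is sorted, looks roughly like this (a sketch, not the project's actual code):

from cProfile import Profile
from pstats import Stats

profiler = Profile()
profiler.enable()
# ... the code under investigation goes here ...
profiler.disable()

# print the ten most expensive calls, sorted by cumulative time
Stats(profiler).sort_stats('cumulative').print_stats(10)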
How to do it here? Before actually executing the tests one by one and rendering a nice report, the test framework has to obtain a list of those tests. This stage is called “discovery”. In unittest it is usually done implicitly, but it can be invoked directly by the user as follows:
loader = TestLoader()
# start the search at the current working directory
return loader.discover('.')
The snippet above can be wrapped as a function and profiled.
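A minimal sketch of how that wiring might look, with the profiler pointed at discovery instead of main() (the helper name run_discovery is mine; the real project differs in details):

from unittest import TestLoader

def run_discovery():
    loader = TestLoader()
    # start the search at the current working directory
    return loader.discover('.')

if __name__ == '__main__':
    from cProfile import run
    run('run_discovery()')

And, voila, this does help: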
$ ./test_suite.py
1080 function calls (1077 primitive calls) in 1.036 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.036 1.036 <string>:1(<module>)
...
1 0.000 0.000 1.036 1.036 loader.py:145(discover)
...
1 0.000 0.000 1.001 1.001 test_suite.py:12(make_data)
1 0.000 0.000 1.001 1.001 test_suite.py:21(OtherTest)
And the culprit gets detected. It is make_data(), which was called from the test case OtherTest. Let us look at them more closely:
from time import sleep

def make_data():
    # please bear with me; yes, this is contrived, and comes
    # instead of somewhat heavy computation in the real project
    sleep(1)
    return 'data'  # something truthy to assert on

class OtherTest(TestCase):
    data = make_data()

    def test_other(self):
        self.assertTrue(self.data)
Finally, the “aha!” moment. The problem is with the class member data, which is being initialized by some long-running function. This happens at import time or at the discovery stage, and it will run and take its time even if OtherTest is never used at all. Not good. Fortunately, unittest supports TestCase initialization for stuff common to all the tests of a class:
class OtherTest(TestCase):
    @classmethod
    def setUpClass(cls):
        cls.data = make_data()

    def test_other(self):
        self.assertTrue(self.data)
Yay. Timing is good again:
$ time ./test_suite.py MyTest.test_me
Ran 1 test in 0.001s
FAILED (failures=1)
real 0m0.095s
user 0m0.064s
sys 0m0.024s
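If the root cause still feels a bit magical, here is a tiny standalone illustration (mine, not from the project) of the difference: a class attribute is evaluated as soon as the class body executes, which happens when the module is imported, while setUpClass is deferred until the tests of that class are about to run.

from time import sleep
from unittest import TestCase

def make_data():
    print('make_data() called')
    sleep(1)
    return 'data'

class EagerTest(TestCase):
    # evaluated right now, while the module is being imported
    data = make_data()

class LazyTest(TestCase):
    @classmethod
    def setUpClass(cls):
        # evaluated only when the tests of LazyTest are about to run
        cls.data = make_data()

print('module imported')  # by now make_data() has run once, for EagerTest only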
Basically, this is the end. The happy end of the story. But I cannot refrain from grumbling a bit at frameworks. A good unit test harness lets you get the fruits early and at low cost. The same goes for a UI or a web server framework. But when misbehavior sneaks in, it might be difficult to attach the diagnostic tools to the spots inside or underneath the frame.
As for me, I will keep running tests in a framework. And now I have learned, and shared with you, another bit of how Python handles class initialization.