This is a pattern I use when I want to test a lot of edge cases in a similar way:
import unittest import sys ENCODING = 'ascii' def convert_to_string(arg): return arg.encode(ENCODING) class ConvertToStringTest(unittest.TestCase): """ Example of a test that tests the same testing logic for multiple cases. """ def test_handles_unicode(self): # Check different unicode codepoints # to make sure we don't get an error for codepoint in [0, 1, 2, 127, 128, 129, 130, sys.maxunicode - 2, sys.maxunicode - 1, sys.maxunicode]: try: result = convert_to_string(unichr(codepoint)) except: self.fail('Failed for codepoint {0}'.format(codepoint))
import unittest
import sys
ENCODING = 'ascii'
def convert_to_string(arg):
return arg.encode(ENCODING)
class ConvertToStringTest(unittest.TestCase):
"""
Example of a test that tests the same testing logic
for multiple cases.
"""
def test_handles_unicode(self):
# Check different unicode codepoints
# to make sure we don't get an error
for codepoint in [0, 1, 2, 127, 128, 129, 130,
sys.maxunicode - 2, sys.maxunicode - 1, sys.maxunicode]:
try:
result = convert_to_string(unichr(codepoint))
except:
self.fail('Failed for codepoint {0}'.format(codepoint))The test fails unless you set ENCODING = 'utf-8'.
The for loop lets us re-use the test logic for different edge cases. If the test fails, we include the codepoint under test in the message so we can track down the error.
For numerical edge case boundaries (in this case, the codepoints 0, 128, and sys.maxunicode), I usually test the boundary plus or minus two, where mistakes are most likely to occur.
There’s a temptation here to be really thorough and change the for loop to:
Since the new version iterates over 17 * 2**16 values, execution time will likely increase by at least 1 second. Most of those test cases aren’t likely to find additional errors, but sometimes they do.
For example, for certain codepoints (called surrogates), under certain operating systems (Linux), in certain configurations (when you include a single surrogate, not a pair), this code will raise an exception.
import json # This works: json.loads(json.dumps(unichr(55295))) # This doesn't: json.loads(json.dumps(unichr(55296)))
import json # This works: json.loads(json.dumps(unichr(55295))) # This doesn't: json.loads(json.dumps(unichr(55296)))
The exception it raises is: ValueError: Unpaired high surrogate
Unbelievable but true: json can create an encoding that it cannot load. (As of this writing, there’s an unresolved Python bug report about this, and it will hopefully get patched soon.)