Generating many integration tests for 2021 taxes #2389

martinholmer · 2023-06-03T20:48:27Z


PolicyEngine-US is a microsimulation model of US federal and state taxes and benefit programs. Each federal and state income tax module contains many code units, each of which is tested with unit tests. It would be desirable to subject each tax module to a number of integration tests so that the interactions among the code units in the module are also tested. The cost of developing even a few integration tests by hand is high because the developer must fill out the tax form by hand to obtain the expected output of each test. And to be useful, the integration tests must cover a wide variety of tax filing units with different income sources and demographic attributes, which would require many thousands of different tax filing units for each tax module.

This discussion describes a method that has been used to automate the generation of hundreds of thousands of integration tests for the federal income tax module and for each state income tax module. The method is a kind of unguided differential fuzzing, in which the actual taxes generated by PolicyEngine-US are compared with the expected taxes generated by another tax microsimulation model. A difference between actual and expected taxes is a sensitive indicator that at least one of the two models is calculating taxes incorrectly.

This method of testing has been made possible by the generosity of Daniel Feenberg, who has made available a private copy of the Fortran source code for TAXSIM35. This microsimulation model has been under development for decades and simulates both federal and state income taxes, making it ideal for generating expected taxes. The testing described here uses the 12/30/22 version of the source code, patched periodically whenever an actual-versus-expected tax difference shows that TAXSIM35 logic needs a minor adjustment.

The results of using the method described below are summarized in the following issue:

Resolve integration test failures for 2021 taxes #993

The method involves repeating the following steps:

1. Generating random samples of tax filing units

Using a set of sample assumptions about the range of incomes and the frequency of different family types, a randomly generated sample of tax filing units is written to a CSV-formatted file suitable for TAXSIM35 input. Each sample contains 100,000 tax filing units and is identified by a letter that signifies the assumption set used to generate the sample.
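To make step 1 concrete, here is a minimal Python sketch of writing a random sample to a TAXSIM35-style CSV file. The assumption set (`ASSUMPTIONS`), its ranges and probabilities, the function name `generate_sample`, and the short column list are all hypothetical. The column names are modeled on TAXSIM35 input variables but have not been checked against the private source copy, and the real sample assumptions are the ones described in issue #993.

```python
import csv
import io
import random

# Hypothetical assumption set: ranges and probabilities are illustrative only.
ASSUMPTIONS = {
    "max_wages": 200_000,
    "prob_joint": 0.5,
    "max_dependents": 3,
}

# Column names modeled on TAXSIM35 input variables (assumed, not verified).
COLUMNS = ["taxsimid", "year", "state", "mstat", "page", "sage", "depx", "pwages", "swages"]

def generate_sample(n, seed, state, assumptions=ASSUMPTIONS, year=2021):
    """Return CSV text holding n randomly generated filing units for one state."""
    rng = random.Random(seed)  # seeded so the same sample can be regenerated exactly
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for unit_id in range(1, n + 1):
        joint = rng.random() < assumptions["prob_joint"]
        writer.writerow({
            "taxsimid": unit_id,
            "year": year,
            "state": state,                  # state code expected by TAXSIM35
            "mstat": 2 if joint else 1,      # 1 = single, 2 = married filing jointly
            "page": rng.randint(18, 80),
            "sage": rng.randint(18, 80) if joint else 0,
            "depx": rng.randint(0, assumptions["max_dependents"]),
            "pwages": rng.randint(0, assumptions["max_wages"]),
            "swages": rng.randint(0, assumptions["max_wages"]) if joint else 0,
        })
    return out.getvalue()
```

Seeding the generator is a deliberate choice: a filing unit extracted later can always be traced back to, and regenerated from, its sample.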

The testing uses a sequence of samples in which each sample contains more complicated filing units than the one before it. More complicated means that additional income sources or additional expense types are added to those present in the prior sample in the sequence.

Two different sequences of samples are used in the testing: the p-through-x sequence (that contains 900,000 different filing units in total) and the e-through-k sequence (that contains 700,000 different filing units in total). Both the x sample and the k sample contain the same income sources and expense types. The details of the sample assumptions in both sequences are described below the test results table in issue #993.
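The nesting of the samples, and the totals quoted above, can be sketched as follows. The names `build_sequence` and `total_units` are hypothetical, and the variable groups passed in are placeholders; the actual groups added at each letter are documented in issue #993.

```python
SAMPLE_SIZE = 100_000  # filing units per sample

def build_sequence(letters, base_vars, additions):
    """Map each sample letter to its input variables.

    Each sample keeps every variable of the prior sample and adds one more
    group, so later samples contain more complicated filing units.
    """
    if len(additions) != len(letters) - 1:
        raise ValueError("need one group of additions per sample after the first")
    sequence = {letters[0]: list(base_vars)}
    for prev, letter, extra in zip(letters, letters[1:], additions):
        sequence[letter] = sequence[prev] + list(extra)
    return sequence

def total_units(sequence):
    """Total number of distinct filing units across all samples in a sequence."""
    return len(sequence) * SAMPLE_SIZE
```

With nine letters (p through x) the total is 9 × 100,000 = 900,000 units, and with seven letters (e through k) it is 700,000, matching the counts given for the two sequences.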

2. Generating expected taxes for each filing unit

A sample file generated in step 1 is used as TAXSIM35 input, which generates a CSV-formatted TAXSIM35 output file.
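Once TAXSIM35 has run, its output has to be loaded for the comparison in step 4. The sketch below assumes the output CSV has a header row with the columns `taxsimid`, `fiitax` (federal income tax), and `siitax` (state income tax); those names follow the public TAXSIM documentation but are an assumption about this private build, and `read_expected` is a hypothetical helper.

```python
import csv
import io

def read_expected(csv_text):
    """Map taxsimid -> (federal income tax, state income tax) from TAXSIM35-style output."""
    expected = {}
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Strip stray whitespace from header names so lookups are robust.
        row = {key.strip(): value for key, value in raw.items()}
        # Parse the id via float to tolerate values written like "1.0".
        unit_id = int(float(row["taxsimid"]))
        expected[unit_id] = (float(row["fiitax"]), float(row["siitax"]))
    return expected
```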

3. Generating actual taxes for each filing unit

A sample file generated in step 1 is translated into an HDF5-formatted file suitable for PolicyEngine-US input. That input Dataset is used to create a PolicyEngine-Core Microsimulation object in a Python script.
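The core of the translation is expanding each TAXSIM35 row (one row per filing unit) into person-level arrays linked to a tax unit. The sketch below is hypothetical: `rows_to_arrays` is an illustrative name, the variable names (`person_id`, `person_tax_unit_id`, `tax_unit_id`, `age`, `employment_income`) are assumed to resemble PolicyEngine-US dataset variables, and the placeholder dependent age is an assumption. A real PolicyEngine-US dataset also needs household and other entity links, which are omitted here, as is the final HDF5 write.

```python
def rows_to_arrays(rows, default_dependent_age=10):
    """Expand TAXSIM-style rows into flat person-level and tax-unit-level arrays."""
    arrays = {"person_id": [], "person_tax_unit_id": [], "age": [],
              "employment_income": [], "tax_unit_id": []}
    next_person = 0
    for row in rows:
        unit = int(row["taxsimid"])
        arrays["tax_unit_id"].append(unit)
        members = [(float(row["page"]), float(row["pwages"]))]  # head of the unit
        if int(row["mstat"]) == 2:  # joint return: add a spouse
            members.append((float(row["sage"]), float(row["swages"])))
        # Hypothetical: this sketch's input has no dependent ages, so use a placeholder.
        members += [(float(default_dependent_age), 0.0)] * int(row["depx"])
        for age, wages in members:
            arrays["person_id"].append(next_person)
            arrays["person_tax_unit_id"].append(unit)
            arrays["age"].append(age)
            arrays["employment_income"].append(wages)
            next_person += 1
    return arrays
```

Each list could then be written as one HDF5 dataset (for example with `h5py`), keeping the person-level arrays aligned by index.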

That script can call the Microsimulation object's calc method once (the default behavior) or multiple times in a loop. The optional multiple-calc mode handles situations that PolicyEngine-US cannot resolve in a single calc call. These situations arise when features of a state's income tax make the state and federal income tax calculations simultaneous, a circular dependency that the PolicyEngine-Core framework does not handle (it raises a circular logic error). This happens, for example, when a state allows itemized deductions only if the taxpayer itemizes on the federal return, or when a state allows all its taxpayers to deduct federal taxes.

Actual taxes are generated both ways: with a single direct calc call and with the loop method. The loop method involves these steps: (a) generate federal plus state income taxes assuming all taxpayers itemize on their federal return, (b) generate federal plus state income taxes assuming no taxpayers itemize on their federal return, and (c) generate taxes using, for each taxpayer, the federal itemization decision that minimizes the sum of the taxpayer's federal and state income taxes. Steps (a) and (b) each contain an inner loop in which every calc call uses the state income tax amount generated by the prior calc call. This inner loop terminates when the state income tax converges to a fixed point, that is, when the calc-generated state income tax is the same as the one produced in the prior iteration.

These two methods of operation produce the same actual taxes in many simple situations.
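The loop method can be illustrated with a toy two-tax model. This is a minimal sketch, not the actual script: `federal_tax` and `state_tax` are hypothetical stand-ins for Microsimulation calc calls, their flat rates and standard deductions are invented (not 2021 law), and the loop runs per filing unit rather than over a whole sample. The toy state lets itemizers deduct their itemized expenses, and lets everyone deduct federal tax, so the two taxes depend on each other.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    income: float
    itemizable: float  # hypothetical non-tax itemized expenses

FED_RATE, FED_STD = 0.20, 12_000.0      # toy parameters, not real law
STATE_RATE, STATE_STD = 0.05, 2_000.0   # toy parameters, not real law

def federal_tax(u, itemize, state_tax_amount):
    # Federal itemizers deduct state income tax plus other expenses.
    deduction = u.itemizable + state_tax_amount if itemize else FED_STD
    return FED_RATE * max(0.0, u.income - deduction)

def state_tax(u, itemize, fed_tax_amount):
    # State itemizing follows the federal choice; federal tax is deductible.
    deduction = (u.itemizable if itemize else STATE_STD) + fed_tax_amount
    return STATE_RATE * max(0.0, u.income - deduction)

def solve(u, itemize, tol=0.005, max_iter=100):
    """Inner loop: feed back the prior state tax until it reaches a fixed point."""
    st = 0.0
    for _ in range(max_iter):
        ft = federal_tax(u, itemize, st)
        new_st = state_tax(u, itemize, ft)
        if abs(new_st - st) <= tol:
            return ft, new_st
        st = new_st
    raise RuntimeError("state income tax did not converge")

def loop_method(units):
    """Steps (a)-(c): solve with and without itemizing, keep the cheaper choice."""
    results = []
    for u in units:
        candidates = [(itemize, *solve(u, itemize)) for itemize in (True, False)]
        results.append(min(candidates, key=lambda c: c[1] + c[2]))
    return results
```

The inner loop converges here because each pass changes the state tax by at most `FED_RATE * STATE_RATE` times the previous change; a real tax system offers no such guarantee, which is why the sketch stops after `max_iter` passes.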

4. Identifying differences in actual and expected taxes

The actual taxes generated in step 3 are compared with the expected taxes generated in step 2 for each tax filing unit in the sample. A tax difference usually means that actual and expected taxes differ by more than one cent. In a few special cases this one-cent threshold is raised; these cases are listed at the very bottom of issue #993.
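The comparison itself is simple. In this sketch `find_differences` is a hypothetical helper, and the `special` argument is a hypothetical per-unit override of the one-cent threshold; the real special cases in issue #993 are defined by tax situation, not by unit id.

```python
DEFAULT_TOLERANCE = 0.01  # one cent

def find_differences(actual, expected, tolerance=DEFAULT_TOLERANCE, special=None):
    """Return {unit_id: actual - expected} for units whose taxes differ beyond the threshold.

    special: hypothetical mapping of unit id -> larger threshold for that unit.
    """
    special = special or {}
    diffs = {}
    for unit_id, exp in expected.items():
        # Round to cents so floating-point noise does not trip the threshold.
        diff = round(actual[unit_id] - exp, 2)
        if abs(diff) > special.get(unit_id, tolerance):
            diffs[unit_id] = diff
    return diffs
```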

5. Extracting a tax filing unit with a difference

The output produced by step 4 is sorted to see which tax filing units have the largest tax differences. Then one of the units with a large difference is extracted from the sample files. This process produces a single-unit CSV-formatted file suitable for TAXSIM35 input and a single-unit YAML-formatted PolicyEngine-US test file.
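The sorting and single-unit CSV extraction can be sketched as below; generating the YAML test file is not shown. Both function names are hypothetical, and the sketch assumes the sample CSV carries a `taxsimid` column identifying each unit.

```python
import csv
import io

def largest_differences(diffs, k=10):
    """Return the k (unit_id, diff) pairs with the largest absolute tax difference."""
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)[:k]

def extract_unit_csv(sample_csv, unit_id):
    """Return a single-unit CSV (header plus one row) for the given taxsimid."""
    reader = csv.DictReader(io.StringIO(sample_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if int(row["taxsimid"]) == unit_id:
            writer.writerow(row)
            return out.getvalue()
    raise KeyError(f"taxsimid {unit_id} not found")
```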

6. Diagnosing a tax filing unit with a difference

Using the single-unit files produced in step 5, the two tax models are run in a mode that produces more detailed intermediate output, such as adjusted gross income, exemptions, deductions, taxable income, income tax before credits, and various tax credits. Comparing these two sets of intermediate output indicates the source(s) of the tax difference.
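One simple way to read that comparison, offered here as an assumption rather than the procedure used above, is to walk the intermediate items in computation order and report the first one that disagrees, since later items usually inherit an earlier difference. The helper name `first_divergence` and the item names in the example are hypothetical.

```python
def first_divergence(actual, expected, order, tolerance=0.01):
    """Return (name, actual, expected) for the first item in `order` that differs.

    Returns None when every listed intermediate item agrees within the tolerance.
    """
    for name in order:
        if abs(actual[name] - expected[name]) > tolerance:
            return name, actual[name], expected[name]
    return None
```

For example, with `order = ["agi", "deductions", "taxable_income"]`, a difference that first shows up in `deductions` points at the deduction logic rather than at the taxable-income calculation that inherits it.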

Once the source of the tax difference is identified, the final diagnostic step is to determine which model is incorrect. This is done by filling out the part of the tax form that contains the source of the tax difference. If the expected tax amount is incorrect, the TAXSIM35 source code is patched and steps 2 through 5 are repeated. If the actual tax amount is incorrect, a PolicyEngine-US issue is raised that reports the failing integration test in the YAML-formatted file produced in step 5.
