Questions about extremely large C-TAM imputed benefits in the CPS data #281
Comments
On Mon, 3 Sep 2018, Martin Holmer wrote:
But it seems to me that including military retirement pensions in vet_ben is incorrect
because as taxable income they should be added to the e01700 variable.
And the fact that vet_ben seems to include largely military (retirement or disability)
pensions and retiree medical benefits raises another question in my mind. Given that
vet_ben are largely deferred compensation for those who served in the military, why
would this kind of income ever be considered for repeal as part of a UBI reform? If
they are thought to be "welfare" (rather than deferred compensation), why didn't the
C-TAM project include the pension benefits and health insurance benefits accruing to
retired federal (or state and local) government employees? If retired government
employees were not a focus in the C-TAM work because they are getting not "welfare"
but deferred compensation, then why was the deferred compensation of those with
military service included in the scope of the C-TAM project?
I agree that veterans benefits are deferred comp and not something that
could be eliminated by UBI.
Dan
@martinholmer, the duplicate records relate back to how we currently handle top coding. If a record is flagged for top coding, then fifteen copies are made, each with the same information except for the top-coded values. See here. These imputations are all being done in C-TAM and are then merged with TaxData.
@andersonfrailey said in taxdata issue #281:
Thanks for the link to the SAS code that splits CPS records with top-coded values into fifteen near-replicate records. I'll look over that code soon, although I have no SAS programming experience. Am I correct in understanding that we now think splitting these records is unnecessary, and that this splitting will be eliminated sometime after you translate the CPS-creation SAS code into Python code? That's my understanding of taxdata issues #253 and #174. Am I thinking about this correctly?
@andersonfrailey said in taxdata issue #281:
OK, thanks for the information. I'll try looking at the C-TAM code to see if these high-benefit values are taken straight off the raw CPS files or if they are somehow imputed in the C-TAM code.
@martinholmer Sorry for the delayed reply -- I'm still in the middle of a chaotic move. I looked into the case you brought up in issue #135, specifically the household with ID (h_seq) 41675. It is indeed a family with imputed TANF receipt. In the original CPS, this family has four members, with a total wage income of $400,000, while the maximum wage income of CPS TANF recipients, prior to imputation, is $330,000.
So, in other words, there are two separate issues here. The first is that, as you point out, some benefits are imputed to high-income families. The second is that some families get moved up the income ladder during the tax-unit creation process. To put it another way, the wage distribution of the raw CPS is not the same as that of the CPS tax units. The second issue is something we have been aware of for a while, but I don't remember whether we have found a clear answer. Maybe Anderson can recall better.
Back to the case you brought up: $400,000 is in the high-income range but still well below a million. The first issue, from my perspective, is intrinsically rooted in the algorithm we use -- we tend to replicate whatever distribution the CPS has at the beginning. In the example of TANF, the proxy we use (paw_val) has high-income recipients, so our imputed data end up with similar high-income recipients as well. This is not unique to TANF. If I remember correctly, SSI and SNAP both have high-income recipients in the raw CPS, which is what remains after the Census Bureau cleans the data before releasing the datasets. John has offered a scenario where a person could become unemployed for some periods during the year, and thus become eligible for some programs. This is certainly not a full explanation. When I was doing the imputation for SSI, I took out all the high-income recipients, as you can see in the documentation.
@Amy-Xu said in taxdata issue #281:
My understanding (via taxdata issues #174 and #253) is that this second issue is a bug in the CPS data creation logic that will be addressed after the programs that create the CPS tax filing units are converted from SAS to Python.
@Amy-Xu continued:
You're correct that this is a complex topic. Benefit programs vary in their "filing unit" (which people's circumstances are considered in calculating a benefit) and in their "accounting period" (what period of time is considered in determining the filing unit's circumstances). Many benefit programs (certainly SNAP) have monthly accounting periods, so the point John made about people being low-income in some months of the year is relevant. Maybe the family we're looking at in #135 is low income in some months of the year, but with earnings of $400,000 (before the top-coding logic makes it $1,000,000) for the year, it is hard to believe they were low-income for much (if any) of the year.
But the biggest question about the family in #135 is how they can get a TANF benefit of $136,000 for the year. Because if they were low-income for just a few months, then the TANF benefit that they received during those few months was enormous. So, for example, if they received TANF benefits for two months, the annual rate of the benefit was six times $136,000, or an annual rate of benefit receipt of over $800,000. I don't find that believable.
A final question about your statement: "the [TANF] proxy we use (paw_val)". What does
Thanks for all the explanation. This kind of information needs to be prominent in the C-TAM repository.
Did you ever consider using SIPP data to get at this monthly issue? If my memory is correct, SIPP has information about monthly benefit receipt.
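The annualization arithmetic in the comment above can be checked with a short sketch. The function name `annualized_rate` is hypothetical, and the sketch assumes the benefit is paid at a constant monthly rate during the months of receipt; neither is taken from the C-TAM or taxdata code.

```python
def annualized_rate(total_received, months_of_receipt):
    """Annual rate implied by receiving total_received over
    months_of_receipt months at a constant monthly rate (an assumption)."""
    if not 1 <= months_of_receipt <= 12:
        raise ValueError("months_of_receipt must be between 1 and 12")
    return total_received * 12 / months_of_receipt


# Figure from the discussion: about $136,000 of TANF reported for the year.
tanf_for_year = 136_000

# If all of it had been received in only two months, the implied annual
# rate is six times the reported amount.
two_month_rate = annualized_rate(tanf_for_year, 2)
print(f"Implied annual rate: ${two_month_rate:,.0f}")
```

The shorter the assumed period of receipt, the larger the implied rate, which is why a few low-income months cannot plausibly explain a benefit of this size.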
On Wed, 5 Sep 2018, Martin Holmer wrote:
@Amy-Xu said in taxdata issue #281:
You're correct that this is a complex topic. Benefit programs vary in their
"filing unit" (which people's circumstances are considered in calculating a
benefit) and in their "accounting period" (what period of time is considered
in determining the filing unit's circumstances). Many benefit programs
(certainly SNAP) have monthly accounting periods, so the point John made
about people being low-income in some months of the year is relevant.
Maybe the family we're looking at in #135 is low income in some months of
the year, but with earnings of $400,000 (before the top-coding logic makes
it $1,000,000) for the year, it is hard to believe they were low-income for
much (if any) of the year. But the biggest question about the family in #135
is how they can get a TANF benefit of $136,000 for the year. Because if they
were low-income for just a few months, then the TANF benefit that they
received during those few months was enormous. So, for example, if they
received TANF benefits for two months, the annual rate of the benefit was
six times $136,000, or an annual rate of benefit receipt of over $800,000. I
don't find that believable.
I expect this is keypunching dollars and cents as dollars. $8,000 TANF and
$4,000 earnings seems more likely. In the CPS there should be an hours
worked and occupation for comparison also. Is there a small child in the
household?
Dan
@martinholmer In terms of the benefits received, I agree the amount looks way too large. A few months ago, I was 'fixing' the TANF imputation because of this issue without thinking much about the potential outcomes. Looking at the extra-large benefit amount, I think it is at least partially due to non-cash benefits. Before the fix, the imputation included only the so-called 'assistance' portion of TANF, which probably includes both cash and non-cash benefits already. After the fix, most of the added benefits, I'm afraid, are not cash. Originally this UBI project aimed at imputing cash benefits only, because that's how the MTRs (benefit reduction rates) come into play. But later on it seems the cash-only focus quietly faded away as we added programs like housing. In the case of TANF, the imputed cash and non-cash benefits could get really confusing and possibly misleading. One of the difficulties in TANF imputation has been separating cash from non-cash benefits. Now it seems quite debatable whether these non-cash benefits should be assigned to participants, if somehow cash and non-cash can be separated.
@martinholmer also suggested:
Agreed. I was hesitant mostly because a good portion of the explanation is just speculation about what's going on in the raw CPS. And frankly I have found no definitive answer to date. What do you think is the best way to highlight this part of the information?
@andersonfrailey and @Amy-Xu, Now that taxdata pull requests #178 (fix TANF values), #185 (use Medicare and Medicaid actuarial values) and #278 (ignore veterans benefits in the distribution of other benefits to filing units) have been merged over the past month or so, I've spent some time looking at filing units that have what seem to me to be extremely large benefits.
I've found CPS records to look at by using a two-step process. First, the Python script below is used to find `RECID` values for filing units that have large benefits in the `cps.csv.gz` file. Second, the non-zero variables in each of those records are produced using the `csv_show.sh` bash script, which is part of the Tax-Calculator repository.
One filing unit found in this way is shown in my recent comment on taxdata pull request #135. That filing unit has an imputed TANF benefit of about $136,000 even though the taxpayer and spouse have combined earnings of over one million dollars.
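The first step of that process can be sketched roughly as follows. This is not the `bentab.py` script referred to in this issue; the function names, the threshold, and the choice of benefit columns are assumptions made for illustration, and only the `RECID`, `tanf_ben`, and `vet_ben` column names come from the discussion.

```python
import csv
import gzip

# Benefit columns named in this thread; the full set in cps.csv.gz may differ.
BENEFIT_COLUMNS = ("tanf_ben", "vet_ben")


def large_benefit_recids(rows, threshold, columns=BENEFIT_COLUMNS):
    """Yield (RECID, column, amount) for each benefit value above threshold.

    rows is any iterable of dicts keyed by column name, for example a
    csv.DictReader. Missing or empty values are treated as zero.
    """
    for row in rows:
        for col in columns:
            amount = float(row.get(col) or 0.0)
            if amount > threshold:
                yield int(float(row["RECID"])), col, amount


def scan_file(path, threshold):
    """Scan a gzip-compressed CSV file such as cps.csv.gz."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(large_benefit_recids(csv.DictReader(handle), threshold))


# Made-up rows standing in for the real file (values are illustrative only).
sample_rows = [
    {"RECID": "1", "tanf_ben": "0", "vet_ben": "0"},
    {"RECID": "2", "tanf_ben": "136000", "vet_ben": "0"},
    {"RECID": "3", "tanf_ben": "0", "vet_ben": "169920"},
]
hits = list(large_benefit_recids(sample_rows, threshold=100_000))
for recid, col, amount in hits:
    print(recid, col, amount)
```

With the real data, something like `scan_file("cps.csv.gz", 100_000)` would produce the list of `RECID` values to inspect in the second step; the 100,000 threshold is an arbitrary illustrative choice.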
Looking at the filing units with large `tanf_ben` and `vet_ben` values raises a question about how the CPS filing units are constructed. Among those with extremely large `tanf_ben` and `vet_ben` values are two different groups of fifteen records, all of which appear to have exactly the same demographics and earnings (but different unearned incomes) and exactly the same large benefit. What's going on here? Why do the CPS data include these nearly identical records? Why are there fifteen near replicates? Where are the fifteen near replicates created in the code?
But quite apart from the groups of fifteen nearly identical filing units, I don't understand how people with high incomes can be thought to be getting TANF benefits. Is that imputation being done in C-TAM code or in taxdata code?
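One way to look for such groups of near replicates is to group records on the columns that are expected to match and report groups of a given size. The sketch below is an assumption-laden illustration: the function name is hypothetical, the choice of `age_head` and `e00200` as the demographic and earnings keys and `e00300` as an unearned-income column is mine, and the rows are made up.

```python
from collections import defaultdict


def near_replicate_groups(rows, key_columns, min_size=2):
    """Map each shared key to the RECIDs of rows with identical key_columns.

    Only groups with at least min_size members are returned.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row["RECID"])
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}


# Fifteen made-up records that differ only in an unearned-income column,
# plus one unrelated record.
demo_rows = [
    {"RECID": i, "age_head": 57, "e00200": 0, "vet_ben": 169920, "e00300": 10 * i}
    for i in range(1, 16)
]
demo_rows.append(
    {"RECID": 99, "age_head": 40, "e00200": 50000, "vet_ben": 0, "e00300": 0}
)

found = near_replicate_groups(
    demo_rows, key_columns=("age_head", "e00200", "vet_ben"), min_size=15
)
for key, recids in found.items():
    print(key, len(recids))
```

Leaving the unearned-income columns out of the key is what lets records that differ only in those values fall into the same group.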
The one filing unit (represented by fifteen near replicates) with a very large `vet_ben` value could plausibly be a retired three-star general with somewhere around 35 years of service as @feenberg suggested in C-TAM issue 73. The taxpayer is 57 years old and has a `vet_ben` value of $169,920. That amount includes our estimate of the actuarial value of access to the VA hospital system, which is about $9,890. So, the amount of what seems to be a pension for military service is roughly $160,000 per year.
But it seems to me that including military retirement pensions in `vet_ben` is incorrect because as taxable income they should be added to the `e01700` variable.
And the fact that `vet_ben` seems to include largely military (retirement or disability) pensions and retiree medical benefits raises another question in my mind. Given that `vet_ben` are largely deferred compensation for those who served in the military, why would this kind of income ever be considered for repeal as part of a UBI reform? If they are thought to be "welfare" (rather than deferred compensation), why didn't the C-TAM project include the pension benefits and health insurance benefits accruing to retired federal (or state and local) government employees? If retired government employees were not a focus in the C-TAM work because they are getting not "welfare" but deferred compensation, then why was the deferred compensation of those with military service included in the scope of the C-TAM project?
Now the details. First, the Python script called `bentab.py`:
And now the output from that Python script:
@MattHJensen @MaxGhenis