Differencing cells across observations

Lorens Elvang

Join Date: Apr 2023

Posts: 22
#1

28 Jun 2023, 11:26

Hi,

I have two variables, morder_treated and morder_control, which give the order in which the observations were previously added to the dataset for treated and control units, respectively (ETS == 1 for treated). I also have the variables green_pat9903 and green_pat0412.
Each treated unit has a morder_treated value and is paired with the unit that has the same value of morder_control. For example, if a treated unit has morder_treated == 1, it is paired with the observation that has morder_control == 1.

For example:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(ETS green_pat9903 green_pat0412 morder_treated morder_control)
1 0 0 1 203.5
1 0 0 2 204.5
0 0 0 103.5 1
0 0 0 104.5 2
end

Here the first observation is paired with the third, and the second with the fourth.

I want to compute: green_pat0412 (if ETS == 1) - green_pat9903 (if ETS == 1) - (green_pat0412 (if ETS == 0) - green_pat9903 (if ETS == 0)). That is, the post-minus-pre difference for the treated unit minus the post-minus-pre difference for its paired control unit.

I'm struggling with telling Stata which observation each treated unit should be paired with.

Any suggestions on how to go about this?

// Lorens
Clyde Schechter

Join Date: Apr 2014

Posts: 30166
#2

28 Jun 2023, 17:14

On condition that for each case (ETS == 1) there is at most one control (ETS == 0) observation whose morder_control value matches the case's morder_treated value, the following will do what you ask:

Code:

frame put _all if ETS == 0, into(controls)
frame controls {
    rename green* ctrl_green*
    isid morder_control, sort
}
frlink 1:1 morder_treated, frame(controls morder_control)
frget ctrl*, from(controls)
gen wanted = (green_pat0412-green_pat9903) - (ctrl_green_pat0412-green_pat9903)

Note: the code verifies the assumption about at most one matching control and will break with an assertion is false error message if it is not correct.

Requires version 16 or later because it uses frames.

Added: I just realized there is a problem. morder_* are not integers. The non-integer values shown in the example data won't cause a problem, because the decimal part is, in both cases, 0.5. But, more generally, exact matching of floating point numbers is treacherous and missed matches arise commonly. This is because most decimal fractions (other than those whose denominators are powers of 2) have no exact finite binary representation (just as 1/3 has no exact finite decimal representation). Consequently if one of the values of a variable is supposed to be, say, 5.7, the actual value Stata works with will be the closest 4 byte approximation to 5.7. Now, if morder_treated and morder_control were created in exactly the same way, then those closest approximations should be equal. But if they were calculated in different ways, with the possibility of different rounding of intermediate results of the calculation, then they might not exactly be equal. Here's a frightening example:

Code:

. assert 5.7 == 5.6 + 0.1
assertion is false
r(9);

So if there are floating point mismatches, this code won't work, but neither will anything else with this data. You would either have to use different code that allowed inexact matches, or you would have to do something to replace the non-integer values with integers, or integers + fractions whose denominators are powers of 2.
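One way the "replace the non-integer values with integers" option might look is sketched below. It assumes that every morder_* value has at most one decimal place, which holds in the example data but is only an assumption about the full dataset. The variable names key_treated and key_control are made up for illustration. Scaling by 10 and rounding gives integer keys, and integers of this size are stored exactly, so the link no longer depends on how the decimal parts were computed.

```stata
* Sketch only: integer link keys, ASSUMING morder_* have at most one decimal place.
* key_treated and key_control are hypothetical names, not from the original data.
clear
input double(ETS green_pat9903 green_pat0412 morder_treated morder_control)
1 0 0 1 203.5
1 0 0 2 204.5
0 0 0 103.5 1
0 0 0 104.5 2
end

* Scale by 10 and round to the nearest integer
gen long key_treated = round(10 * morder_treated)
gen long key_control = round(10 * morder_control)

* Rounding absorbs the representation error from the example above
assert round(10 * 5.7) == round(10 * (5.6 + 0.1))

* Same linking logic as before, but on the integer keys
capture frame drop controls
frame put _all if ETS == 0, into(controls)
frame controls {
    rename green* ctrl_green*
    isid key_control, sort
}
frlink 1:1 key_treated, frame(controls key_control)
frget ctrl*, from(controls)
```

If the values can have more decimal places, the multiplier would need to change accordingly (for example 100 for two decimal places), and it is worth checking that the rounded keys are still unique within each group.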

Last edited by Clyde Schechter; 28 Jun 2023, 17:24.
Lorens Elvang

Join Date: Apr 2023

Posts: 22
#3

29 Jun 2023, 02:21

Hello Clyde,

This worked wonders!

As the morder_* variables were generated in Stata, they were created in exactly the same way, so I had no problems with matching the observations.

Just in case anyone needs to use similar code in the future, there's a small mistake in: gen wanted = (green_pat0412-green_pat9903) - (ctrl_green_pat0412-green_pat9903).
The green_pat9903 in the second parenthesis on the right-hand side should be ctrl_green_pat9903.
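For anyone reusing this, here is a sketch of the code from #2 with that correction applied. The green_pat* values below are made up so that the result is visible (the example in #1 has all zeros); the morder_* values are the ones from #1. The final check on unmatched treated units is an extra step that was not in the original code.

```stata
* Sketch: code from #2 with the fix above applied.
* The green_pat* values are MADE UP for illustration; only the layout follows #1.
clear
input double(ETS green_pat9903 green_pat0412 morder_treated morder_control)
1 1 4 1 203.5
1 0 5 2 204.5
0 2 3 103.5 1
0 1 1 104.5 2
end

capture frame drop controls
* Copy the control observations to their own frame
frame put _all if ETS == 0, into(controls)
frame controls {
    rename green* ctrl_green*
    * Stops with an error if a morder_control value is duplicated
    isid morder_control, sort
}

* Link each observation to the control whose morder_control equals its morder_treated
frlink 1:1 morder_treated, frame(controls morder_control)
frget ctrl*, from(controls)

* Treated post-minus-pre difference minus paired control post-minus-pre difference
gen wanted = (green_pat0412 - green_pat9903) - (ctrl_green_pat0412 - ctrl_green_pat9903)

* Extra check (not in #2): every treated unit should have found a control
assert !missing(controls) if ETS == 1

list ETS morder_treated wanted, noobs
```

With these made-up values, wanted is (4 - 1) - (3 - 2) = 2 for the first treated unit and (5 - 0) - (1 - 1) = 5 for the second. It is missing for the control observations, which have no link.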

Thank you very much for the help
Clyde Schechter

Join Date: Apr 2014

Posts: 30166
#4

29 Jun 2023, 09:11

And thank you for fixing the error!