Panel Data: Replace missing values with a calculation based on other cells

Morten Mejer

Join Date: Mar 2017
Posts: 6

Panel Data: Replace missing values with a calculation based on other cells

26 Mar 2017, 07:52

Hi,

I have difficulties in replacing certain missing values with a calculation based on two other cells. In order to explain, let me use a sketched example. In my data set I have missing values for the variable FR in 2011 and 2012 for every observed unit, denoted ID. To explain the mechanism of the variable: FR_t takes the value FR_t-1 - DF_t. As we see below, for ID#1, FR₂₀₁₀ is FR₂₀₀₉ - DF₂₀₁₀. I need to generate a new FR variable (or replace missing values) that for every missing value in 2011 and 2012, by ID, take the cell above [_n-1] minus the cell to the left. However, I am uncertain how to refer to the cell to the left properly - I presume I can utilise the row number.

ID	Year	DF	FR
1	2009	0	1
1	2010	0.50	0.50
1	2011	0.25	(missing)
1	2012	0	(missing)
2	2009	0.30	0.70
2	2010	0.40	0.30
2	2011	0	(missing)
2	2012	0.15	(missing)
3	2009	0	1
3	2010	0	1
3	2011	0.90	(missing)
3	2012	0.07	(missing)

I hope this explanation was clear.

I would guess I could start my code with the following:

Code:

bysort ID (Year) : [...]

Thanks in advance for any help.

Tags: None

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17679
#2

26 Mar 2017, 08:25

Morten:
you may want to try:

Code:

bysort ID (Year): replace FR =FR[_n-1]-DF[_n] if FR==.

Kind regards,
Carlo
(Stata 19.0)
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35467

26 Mar 2017, 08:47

Stata certainly has the notion of a previous observation but is indifferent to the order of variables (columns, in spreadsheet terms) in the dataset. You may want something like this:

Code:

clear
input byte ID int Year float DF double FR
1 2009   0  1
1 2010  .5 .5
1 2011 .25  .
1 2012   0  .
2 2009  .3 .7
2 2010  .4 .3
2 2011   0  .
2 2012 .15  .
3 2009   0  1
3 2010   0  1
3 2011  .9  .
3 2012 .07  .
end

bysort ID (Year) : replace FR = FR[_n-1] - DF if missing(FR)

list, sepby(ID)


     +-----------------------------+
     | ID   Year    DF          FR |
     |-----------------------------|
  1. |  1   2009     0           1 |
  2. |  1   2010    .5          .5 |
  3. |  1   2011   .25         .25 |
  4. |  1   2012     0         .25 |
     |-----------------------------|
  5. |  2   2009    .3          .7 |
  6. |  2   2010    .4          .3 |
  7. |  2   2011     0          .3 |
  8. |  2   2012   .15   .14999999 |
     |-----------------------------|
  9. |  3   2009     0           1 |
 10. |  3   2010     0           1 |
 11. |  3   2011    .9   .10000002 |
 12. |  3   2012   .07   .03000002 |
     +-----------------------------+

The numerous decimal places in the new values are just a side-effect of precision: a different display format would be a good idea.

More crucially,

1. For future questions, please post data examples using dataex. Yours is helpful, but dataex examples are easier to use.

2. For this question, see http://www.stata.com/support/faqs/da...issing-values/ for a discussion of what the code above does exactly.

Comment

Priyesh VP

Join Date: Jul 2016
Posts: 38

23 Sep 2017, 13:05

Dear all,
I have my data in the following format.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(proholy fiiy) float(quarter code)
    .     . 164 1
    .     . 165 1
    .     . 166 1
    .     . 167 1
    .     . 168 1
    .     . 169 1
    .     . 170 1
    .     . 171 1
    .     . 172 1
    .     . 173 1
    .     . 174 1
    .     . 175 1
    .     . 176 1
    .     . 177 1
    .     . 178 1
    .     . 179 1
    .     . 180 1
    .     . 181 1
    .     . 182 1
    .     . 183 1
    .     . 184 1
    .     . 185 1
    .     . 186 1
    .     . 187 1
    .     . 188 1
    .     . 189 1
    .     . 190 1
    .     . 191 1
    .     . 192 1
42.54  9.95 193 1
42.54  2.06 194 1
    .     . 195 1
47.54  2.05 196 1
47.46   .46 197 1
50.61   .29 198 1
47.21  2.05 199 1
51.93   .29 200 1
53.12   .29 201 1
54.13   .29 202 1
51.75   .29 203 1
54.99   .29 204 1
57.09   .29 205 1
 58.6     0 206 1
54.24   .29 207 1
59.65     0 208 1
61.91     0 209 1
62.53     0 210 1
59.65     0 211 1
63.76     0 212 1
63.74     0 213 1
59.68     0 214 1
63.76   .01 215 1
 54.2     0 216 1
52.34     0 217 1
 54.2     0 218 1
 55.3     0 219 1
54.02     0 223 1
    .     . 164 2
    .     . 165 2
    .     . 166 2
    .     . 167 2
    .     . 168 2
    .     . 169 2
    .     . 170 2
    .     . 171 2
    .     . 172 2
    .     . 173 2
    .     . 174 2
    .     . 175 2
    .     . 176 2
    .     . 177 2
    .     . 178 2
    .     . 179 2
54.29   6.4 180 2
54.25  4.45 181 2
54.21  3.59 182 2
92.45     0 183 2
53.92  4.08 184 2
48.41  6.24 185 2
48.32  4.88 186 2
54.04  3.46 187 2
41.69  4.69 188 2
39.73  5.66 189 2
 39.6  6.55 190 2
45.85  4.53 191 2
39.52  4.77 192 2
39.49  4.99 193 2
39.49  5.92 194 2
39.55  5.53 195 2
39.46  7.74 196 2
29.97 16.59 197 2
23.61  8.08 198 2
39.49  4.41 199 2
20.35  8.22 200 2
20.34  8.05 201 2
20.34 11.69 202 2
23.13  9.48 203 2
20.33 14.15 204 2
20.33 12.98 205 2
20.33   4.5 206 2
end
format %tq quarter

What if I want to replace missing values of fiiy and proholy for quarter 4 of every year with the previous quarter 3, (if 3 also missing, take the nearest non-missing quarter in the two previous quarters only ). Firm id is Code.
Help is highly appreciated.

Comment

Priyesh VP

Join Date: Jul 2016

Posts: 38
#5

23 Sep 2017, 23:09

Based on Lazzaro's and Cox's reply, I have tried the following codes, but it resulted in error(103) unknown function fiiy()

HTML Code:

bysort code ( quarter ): replace fiiy= fiiy(_n-1) if missing(fiiy) bysort code ( quarter ): replace fiiy= fiiy(_n-1) if fiiy==.

Can someone help me ??
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29968
#6

23 Sep 2017, 23:29

Details matter. Look carefully at what you typed and then look carefully at what Nick and Carlo told you in #3 and #2. You are using parentheses around the _n-1, but what you were shown, and what is needed, are square brackets [ ]. Re-examine #2 and #3 and follow those models exactly. Where they use parentheses, you should use them. Where they use square brackets, you should use those.
2 likes
Comment
Priyesh VP

Join Date: Jul 2016

Posts: 38
#7

24 Sep 2017, 00:28

Originally posted by Clyde Schechter View Post

Details matter. Look carefully at what you typed and then look carefully at what Nick and Carlo told you in #3 and #2. You are using parentheses around the _n-1, but what you were shown, and what is needed, are square brackets [ ]. Re-examine #2 and #3 and follow those models exactly. Where they use parentheses, you should use them. Where they use square brackets, you should use those.

Thanks Clyde.. I missed that minor change. Anyway its working now.
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

24 Sep 2017, 06:16

Priyesh VP Since the essence of the solution to your problem was to use a subscript to reference the value of your fiiy variable in the previous observation, you might want to review the output of help subscripting to better understand how subscripts are used in Stata, and to understand that, in Stata, square brackets [ ] have a different meaning than parentheses ( ) which in turn have a different meaning than curly braces { }. A big change from mathematics, where we often use them interchangeably, but in Stata, they have different meaning.
1 like
Comment
Priyesh VP

Join Date: Jul 2016

Posts: 38
#9

26 Sep 2017, 03:49

In continuation with my question #4, the code suggested by Nick and Carlo works fine. But it checks only the previous quarter (n-1), how can I modify the following code to accommodate my requirement; ie, I want the code to look back (n-1) if "n" is missing, if (n-1) is also missing it should check (n-2). In nutshell, I want the code to check until (n-4) and stop, if n,n-1,n-2 and n-3 are missing..

HTML Code:

bysort code ( quarter ): replace fiiy= fiiy[_n-1] if missing(fiiy)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35467
#10

26 Sep 2017, 05:31

Priyesh VP: no, that is wrong. That code copies in a cascade, so it can do much more than copy the immediately previous value. For discussion. see https://www.stata.com/support/faqs/d...issing-values/

If you want to insist that at most the previous 4 values be inspected, you need something else.

Code:

sort code quarter clonevar mycopy = fiiy forval lag = 1/4 { by code: replace fiiy = mycopy[_n-`lag'] if missing(fiiy) }

Last edited by Nick Cox; 26 Sep 2017, 05:41.
2 likes
Comment

Dr. Iqra Yaseen

Join Date: Jun 2022
Posts: 89

#11

08 Jun 2023, 06:21

my dataset looks like this

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
year iso3_o iso3_d contig
1995 "ARE" "ARG" 0
1995 "ARE" "ATG" 0
1995 "ARE" "BHR" 0
1995 "ARE" "BRA" 0
1995 "ARE" "BRB" 0
1995 "ARE" "BRN" 0
1995 "ARE" "BWA" 0
1995 "ARE" "CHL" 0
1995 "ARE" "CHN" 0
1995 "ARE" "COL" 0
1995 "ARE" "CRI" 0
1995 "ARE" "CUB" 0
1995 "ARE" "DMA" 0
1995 "ARE" "DOM" 0
1995 "ARE" "ECU" 0
1995 "ARE" "GAB" 0
1995 "ARE" "GRD" 0
1995 "ARE" "GUY" 0
1995 "ARE" "HKG" 0
1995 "ARE" "KAZ" 0
1995 "ARE" "KNA" 0
1995 "ARE" "KWT" 0
1995 "ARE" "LCA" 0
1995 "ARE" "MAC" 0
1995 "ARE" "MDV" 0
1995 "ARE" "MEX" 0
1995 "ARE" "MUS" 0
1995 "ARE" "MYS" 0
1995 "ARE" "OMN" 1
1995 "ARE" "PAN" 0
1995 "ARE" "PER" 0
1995 "ARE" "QAT" 1
1995 "ARE" "SAU" 1
1995 "ARE" "SGP" 0
1995 "ARE" "SUR" 0
1995 "ARE" "SYC" 0
1995 "ARE" "THA" 0
1995 "ARE" "TTO" 0
1995 "ARE" "TUR" 0
1995 "ARE" "URY" 0
1995 "ARE" "VCT" 0

Now that i want to add years 2020, 2021 and 2022 to the dataset, i want to update "contig" variable with values same as in year 1995.
how can i do that. TIA

Last edited by Dr. Iqra Yaseen; 08 Jun 2023, 06:24.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29968
#12

08 Jun 2023, 08:41

Code:

isid iso3_d iso3_d year, sort expand 4 if year == 1995 by iso3_d iso3_d (year), sort: replace year = 2018 + _n if _n >= 2 isid iso3_d iso3_d year, sort

Notes: In your example data, the only value of year is 1995. This code will work even if there are other years as well.

Your -dataex- example is not, in fact, real -dataex- output. You either tried to imitate -dataex- output, or you edited some. Please don't do that! The -dataex- you showed does not work: when you try to use it to create a data set, it gives error messages and creates nothing. That defeats the purpose of using -dataex-.
1 like
Comment
Dr. Iqra Yaseen

Join Date: Jun 2022

Posts: 89
#13

08 Jun 2023, 10:52

thanks Clyde Schechter ....but i have some confusion related to the code you provided 1. the first line of the code contains iso3_d twice...is it like so or it should be iso3_o and iso3_d
2. line no 2 says expand 2...why 2...i am asking this just to understand the code... because i don't just wanna copy and paste the code without any understanding
3. line no 3 says replace year = 2018....but i want to replace missing data for year = 2020, 2021 and 2022
4. again line no 4 codes iso3_d twice
these points have been mentioned just to understand the code...thanks
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29968
#14

08 Jun 2023, 11:00

but i have some confusion related to the code you provided 1. the first line of the code contains iso3_d twice...is it like so or it should be iso3_o and iso3_d

Sorry, my mistake. You are right, it should say iso3_o iso3_d, and the same is true in lines 3 and 4. (I think this was a typo in line 1 that I then propgated by using copy/paste.)

line no 2 says expand 2...why 2...i am asking this just to understand the code... because i don't just wanna copy and paste the code without any understanding

No, it doesn't say -expand 2-, it says -expand 4-. And the reason it's 4 is I'm trying to create four observations (1995, 2020, 2021, 2022) out of each original 1995 observation.

line no 3 says replace year = 2018....but i want to replace missing data for year = 2020, 2021 and 2022

Yes, but you have overlooked (replaced with ....) the most important part of that line of code: - + _n if _n >= 2 -. When _n == 2, we get year = 2018 + 2 = 2020. When _n == 3, we get year = 2018 + 3 = 2021, and when _n == 4, we get year = 2018 + 4 = 2022.
Comment
Dr. Iqra Yaseen

Join Date: Jun 2022

Posts: 89
#15

08 Jun 2023, 11:05

thanks again Clyde Schechter ...will give it a try.....
Comment

Announcement

Panel Data: Replace missing values with a calculation based on other cells

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment