Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel Data: Replace missing values with a calculation based on other cells

    Hi,

    I have difficulties in replacing certain missing values with a calculation based on two other cells. In order to explain, let me use a sketched example. In my data set I have missing values for the variable FR in 2011 and 2012 for every observed unit, denoted ID. To explain the mechanism of the variable: FRt takes the value FRt-1 - DFt. As we see below, for ID#1, FR2010 is FR2009 - DF2010. I need to generate a new FR variable (or replace missing values) that for every missing value in 2011 and 2012, by ID, take the cell above [_n-1] minus the cell to the left. However, I am uncertain how to refer to the cell to the left properly - I presume I can utilise the row number.

    ID Year DF FR
    1 2009 0 1
    1 2010 0.50 0.50
    1 2011 0.25 (missing)
    1 2012 0 (missing)
    2 2009 0.30 0.70
    2 2010 0.40 0.30
    2 2011 0 (missing)
    2 2012 0.15 (missing)
    3 2009 0 1
    3 2010 0 1
    3 2011 0.90 (missing)
    3 2012 0.07 (missing)


















    I hope this explanation was clear.

    I would guess I could start my code with the following:
    Code:
    bysort ID (Year) : [...]

    Thanks in advance for any help.



  • #2
    Morten:
    you may want to try:
    Code:
    bysort ID (Year): replace FR =FR[_n-1]-DF[_n] if FR==.
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    Comment


    • #3
      Stata certainly has the notion of a previous observation but is indifferent to the order of variables (columns, in spreadsheet terms) in the dataset. You may want something like this:


      Code:
      clear
      input byte ID int Year float DF double FR
      1 2009   0  1
      1 2010  .5 .5
      1 2011 .25  .
      1 2012   0  .
      2 2009  .3 .7
      2 2010  .4 .3
      2 2011   0  .
      2 2012 .15  .
      3 2009   0  1
      3 2010   0  1
      3 2011  .9  .
      3 2012 .07  .
      end
      
      bysort ID (Year) : replace FR = FR[_n-1] - DF if missing(FR)
      
      list, sepby(ID)
      
      
           +-----------------------------+
           | ID   Year    DF          FR |
           |-----------------------------|
        1. |  1   2009     0           1 |
        2. |  1   2010    .5          .5 |
        3. |  1   2011   .25         .25 |
        4. |  1   2012     0         .25 |
           |-----------------------------|
        5. |  2   2009    .3          .7 |
        6. |  2   2010    .4          .3 |
        7. |  2   2011     0          .3 |
        8. |  2   2012   .15   .14999999 |
           |-----------------------------|
        9. |  3   2009     0           1 |
       10. |  3   2010     0           1 |
       11. |  3   2011    .9   .10000002 |
       12. |  3   2012   .07   .03000002 |
           +-----------------------------+
      The numerous decimal places in the new values are just a side-effect of precision: a different display format would be a good idea.

      More crucially,

      1. For future questions, please post data examples using dataex. Yours is helpful, but dataex examples are easier to use.

      2. For this question, see http://www.stata.com/support/faqs/da...issing-values/ for a discussion of what the code above does exactly.

      Comment


      • #4
        Dear all,
        I have my data in the following format.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double(proholy fiiy) float(quarter code)
            .     . 164 1
            .     . 165 1
            .     . 166 1
            .     . 167 1
            .     . 168 1
            .     . 169 1
            .     . 170 1
            .     . 171 1
            .     . 172 1
            .     . 173 1
            .     . 174 1
            .     . 175 1
            .     . 176 1
            .     . 177 1
            .     . 178 1
            .     . 179 1
            .     . 180 1
            .     . 181 1
            .     . 182 1
            .     . 183 1
            .     . 184 1
            .     . 185 1
            .     . 186 1
            .     . 187 1
            .     . 188 1
            .     . 189 1
            .     . 190 1
            .     . 191 1
            .     . 192 1
        42.54  9.95 193 1
        42.54  2.06 194 1
            .     . 195 1
        47.54  2.05 196 1
        47.46   .46 197 1
        50.61   .29 198 1
        47.21  2.05 199 1
        51.93   .29 200 1
        53.12   .29 201 1
        54.13   .29 202 1
        51.75   .29 203 1
        54.99   .29 204 1
        57.09   .29 205 1
         58.6     0 206 1
        54.24   .29 207 1
        59.65     0 208 1
        61.91     0 209 1
        62.53     0 210 1
        59.65     0 211 1
        63.76     0 212 1
        63.74     0 213 1
        59.68     0 214 1
        63.76   .01 215 1
         54.2     0 216 1
        52.34     0 217 1
         54.2     0 218 1
         55.3     0 219 1
        54.02     0 223 1
            .     . 164 2
            .     . 165 2
            .     . 166 2
            .     . 167 2
            .     . 168 2
            .     . 169 2
            .     . 170 2
            .     . 171 2
            .     . 172 2
            .     . 173 2
            .     . 174 2
            .     . 175 2
            .     . 176 2
            .     . 177 2
            .     . 178 2
            .     . 179 2
        54.29   6.4 180 2
        54.25  4.45 181 2
        54.21  3.59 182 2
        92.45     0 183 2
        53.92  4.08 184 2
        48.41  6.24 185 2
        48.32  4.88 186 2
        54.04  3.46 187 2
        41.69  4.69 188 2
        39.73  5.66 189 2
         39.6  6.55 190 2
        45.85  4.53 191 2
        39.52  4.77 192 2
        39.49  4.99 193 2
        39.49  5.92 194 2
        39.55  5.53 195 2
        39.46  7.74 196 2
        29.97 16.59 197 2
        23.61  8.08 198 2
        39.49  4.41 199 2
        20.35  8.22 200 2
        20.34  8.05 201 2
        20.34 11.69 202 2
        23.13  9.48 203 2
        20.33 14.15 204 2
        20.33 12.98 205 2
        20.33   4.5 206 2
        end
        format %tq quarter

        What if I want to replace missing values of fiiy and proholy for quarter 4 of every year with the previous quarter 3, (if 3 also missing, take the nearest non-missing quarter in the two previous quarters only ). Firm id is Code.
        Help is highly appreciated.

        Comment


        • #5
          Based on Lazzaro's and Cox's reply, I have tried the following codes, but it resulted in error(103) unknown function fiiy()
          HTML Code:
          bysort code ( quarter ): replace fiiy= fiiy(_n-1) if missing(fiiy)
          bysort code ( quarter ): replace fiiy= fiiy(_n-1) if fiiy==.
          Can someone help me ??

          Comment


          • #6
            Details matter. Look carefully at what you typed and then look carefully at what Nick and Carlo told you in #3 and #2. You are using parentheses around the _n-1, but what you were shown, and what is needed, are square brackets [ ]. Re-examine #2 and #3 and follow those models exactly. Where they use parentheses, you should use them. Where they use square brackets, you should use those.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              Details matter. Look carefully at what you typed and then look carefully at what Nick and Carlo told you in #3 and #2. You are using parentheses around the _n-1, but what you were shown, and what is needed, are square brackets [ ]. Re-examine #2 and #3 and follow those models exactly. Where they use parentheses, you should use them. Where they use square brackets, you should use those.
              Thanks Clyde.. I missed that minor change. Anyway its working now.

              Comment


              • #8
                Priyesh VP Since the essence of the solution to your problem was to use a subscript to reference the value of your fiiy variable in the previous observation, you might want to review the output of help subscripting to better understand how subscripts are used in Stata, and to understand that, in Stata, square brackets [ ] have a different meaning than parentheses ( ) which in turn have a different meaning than curly braces { }. A big change from mathematics, where we often use them interchangeably, but in Stata, they have different meaning.

                Comment


                • #9
                  In continuation with my question #4, the code suggested by Nick and Carlo works fine. But it checks only the previous quarter (n-1), how can I modify the following code to accommodate my requirement; ie, I want the code to look back (n-1) if "n" is missing, if (n-1) is also missing it should check (n-2). In nutshell, I want the code to check until (n-4) and stop, if n,n-1,n-2 and n-3 are missing..

                  HTML Code:
                    bysort code ( quarter ): replace fiiy= fiiy[_n-1] if missing(fiiy)

                  Comment


                  • #10
                    Priyesh VP: no, that is wrong. That code copies in a cascade, so it can do much more than copy the immediately previous value. For discussion. see https://www.stata.com/support/faqs/d...issing-values/

                    If you want to insist that at most the previous 4 values be inspected, you need something else.

                    Code:
                    sort code quarter
                    clonevar mycopy = fiiy 
                    forval lag = 1/4 {  
                         by code: replace fiiy = mycopy[_n-`lag'] if missing(fiiy)  
                    }
                    Last edited by Nick Cox; 26 Sep 2017, 05:41.

                    Comment


                    • #11
                      my dataset looks like this
                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      year iso3_o iso3_d contig
                      1995 "ARE" "ARG" 0
                      1995 "ARE" "ATG" 0
                      1995 "ARE" "BHR" 0
                      1995 "ARE" "BRA" 0
                      1995 "ARE" "BRB" 0
                      1995 "ARE" "BRN" 0
                      1995 "ARE" "BWA" 0
                      1995 "ARE" "CHL" 0
                      1995 "ARE" "CHN" 0
                      1995 "ARE" "COL" 0
                      1995 "ARE" "CRI" 0
                      1995 "ARE" "CUB" 0
                      1995 "ARE" "DMA" 0
                      1995 "ARE" "DOM" 0
                      1995 "ARE" "ECU" 0
                      1995 "ARE" "GAB" 0
                      1995 "ARE" "GRD" 0
                      1995 "ARE" "GUY" 0
                      1995 "ARE" "HKG" 0
                      1995 "ARE" "KAZ" 0
                      1995 "ARE" "KNA" 0
                      1995 "ARE" "KWT" 0
                      1995 "ARE" "LCA" 0
                      1995 "ARE" "MAC" 0
                      1995 "ARE" "MDV" 0
                      1995 "ARE" "MEX" 0
                      1995 "ARE" "MUS" 0
                      1995 "ARE" "MYS" 0
                      1995 "ARE" "OMN" 1
                      1995 "ARE" "PAN" 0
                      1995 "ARE" "PER" 0
                      1995 "ARE" "QAT" 1
                      1995 "ARE" "SAU" 1
                      1995 "ARE" "SGP" 0
                      1995 "ARE" "SUR" 0
                      1995 "ARE" "SYC" 0
                      1995 "ARE" "THA" 0
                      1995 "ARE" "TTO" 0
                      1995 "ARE" "TUR" 0
                      1995 "ARE" "URY" 0
                      1995 "ARE" "VCT" 0
                      Now that i want to add years 2020, 2021 and 2022 to the dataset, i want to update "contig" variable with values same as in year 1995.
                      how can i do that. TIA
                      Last edited by Dr. Iqra Yaseen; 08 Jun 2023, 06:24.

                      Comment


                      • #12
                        Code:
                        isid iso3_d iso3_d year, sort
                        expand 4 if year == 1995
                        by iso3_d iso3_d (year), sort: replace year = 2018 + _n if _n >= 2
                        isid iso3_d iso3_d year, sort
                        Notes: In your example data, the only value of year is 1995. This code will work even if there are other years as well.

                        Your -dataex- example is not, in fact, real -dataex- output. You either tried to imitate -dataex- output, or you edited some. Please don't do that! The -dataex- you showed does not work: when you try to use it to create a data set, it gives error messages and creates nothing. That defeats the purpose of using -dataex-.

                        Comment


                        • #13
                          thanks Clyde Schechter ....but i have some confusion related to the code you provided 1. the first line of the code contains iso3_d twice...is it like so or it should be iso3_o and iso3_d
                          2. line no 2 says expand 2...why 2...i am asking this just to understand the code... because i don't just wanna copy and paste the code without any understanding
                          3. line no 3 says replace year = 2018....but i want to replace missing data for year = 2020, 2021 and 2022
                          4. again line no 4 codes iso3_d twice
                          these points have been mentioned just to understand the code...thanks

                          Comment


                          • #14
                            but i have some confusion related to the code you provided 1. the first line of the code contains iso3_d twice...is it like so or it should be iso3_o and iso3_d
                            Sorry, my mistake. You are right, it should say iso3_o iso3_d, and the same is true in lines 3 and 4. (I think this was a typo in line 1 that I then propgated by using copy/paste.)

                            line no 2 says expand 2...why 2...i am asking this just to understand the code... because i don't just wanna copy and paste the code without any understanding
                            No, it doesn't say -expand 2-, it says -expand 4-. And the reason it's 4 is I'm trying to create four observations (1995, 2020, 2021, 2022) out of each original 1995 observation.

                            line no 3 says replace year = 2018....but i want to replace missing data for year = 2020, 2021 and 2022
                            Yes, but you have overlooked (replaced with ....) the most important part of that line of code: - + _n if _n >= 2 -. When _n == 2, we get year = 2018 + 2 = 2020. When _n == 3, we get year = 2018 + 3 = 2021, and when _n == 4, we get year = 2018 + 4 = 2022.


                            Comment


                            • #15
                              thanks again Clyde Schechter ...will give it a try.....

                              Comment

                              Working...
                              X