Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to replace a missing value with a first non-missing forward value

    I have the following data as specified below where ¨permno” is used as a firm identifier. I need to replace all missing observations of “fyear” with the first forward non-missing observation. That is, for permno 54594 “fyear” should be replaced with “1998” for all observation before fyear==1998 and so on. Please, help me with this issue.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double permno long date double fyear str8 gvkey
    54594 14273    . "001004"  
    54594 14301    . "001004"  
    54594 14334    . "001004"  
    54594 14364    . "001004"  
    54594 14392 1998 "001004"  
    54594 14425    . "001004"  
    54594 14455    . "001004"  
    54594 14487    . "001004"  
    54594 14517    . "001004"  
    54594 14546    . "001004"  
    54594 14578    . "001004"  
    54594 14609    . "001004"  
    54594 14640    . "001004"  
    54594 14669    . "001004"  
    54594 14700    . "001004"  
    54594 14728    . "001004"  
    54594 14761 1999 "001004"  
    54594 14791    . "001004"  
    54594 14822    . "001004"  
    54594 14853    . "001004"  
    54594 14882    . "001004"  
    54594 14914    . "001004"  
    54594 14944    . "001004"  
    54594 14973    . "001004"  
    54594 15006    . "001004"  
    54594 15034    . "001004"  
    54594 15064    . "001004"  
    54594 15095    . "001004"  
    54594 15126 2000 "001004"  
    54594 15155    . "001004"  
    54594 15187    . "001004"  
    54594 15218    . "001004"  
    54594 15246    . "001004"  
    54594 15279    . "001004"  
    54594 15309    . "001004"  
    54594 15340    . "001004"  
    54594 15371    . "001004"  
    54594 15399    . "001004"  
    54594 15427    . "001004"  
    54594 15460    . "001004"  
    54594 15491 2001 "001004"  
    14322 14425    . "81238710"
    14322 14455    . "81238710"
    14322 14487    . "81238710"
    14322 14517    . "81238710"
    14322 14546    . "81238710"
    14322 14578    . "81238710"
    14322 14609 1999 "81238710"
    14322 14640    . "81238710"
    14322 14669    . "81238710"
    14322 14700    . "81238710"
    14322 14728    . "81238710"
    14322 14761    . "81238710"
    14322 14791    . "81238710"
    14322 14822    . "81238710"
    14322 14853    . "81238710"
    14322 14882    . "81238710"
    14322 14914    . "81238710"
    14322 14944    . "81238710"
    14322 14973 2000 "81238710"
    14322 15006    . "81238710"
    14322 15034    . "81238710"
    14322 15064    . "81238710"
    14322 15095    . "81238710"
    14322 15126    . "81238710"
    14322 15155    . "81238710"
    14322 15187    . "81238710"
    14322 15218    . "81238710"
    14322 15246    . "81238710"
    14322 15279    . "81238710"
    14322 15309    . "81238710"
    14322 15340 2001 "81238710"
    14322 15371    . "81238710"
    14322 15399    . "81238710"
    14322 15427    . "81238710"
    14322 15460    . "81238710"
    14322 15491    . "81238710"
    14322 15519    . "81238710"
    14322 15552    . "81238710"
    end
    format %d date

  • #2
    Hello Alberto. Does the following code achieve what you want?

    Code:
    generate yyyy = year(date)
    generate   mm = month(date)
    replace fyear = yyyy - inrange(mm,1,5) if missing(fyear)
    drop yyyy mm
    --
    Bruce Weaver
    Email: bweaver@lakeheadu.ca
    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
    Stata version: 15.1 IC (Windows)

    Comment


    • #3
      Hi Alberto,

      I wonder if you don't you want to obtain the year from your date? If so, the code below would work:

      Code:
      gen year = year(date)
      If you indeed want “fyear” to be replaced by “1998” for all observation before fyear==1998, try the code below. Please note that the last observations on your dataset won't have fyear (because there is no fyear value after them). If the last observation for fyear was not missing, then you would not have missing values. If you still have gaps with missing observations in the middle of the dataset, increase i (instead of 1/5, try 1/8 or 1/10).

      Code:
      forvalues i = 1/5{
          replace fyear = fyear[_n+`i'] if fyear==.
      }

      Comment


      • #4
        Thank you very much for the help!
        The solution you provide does not fully performs my needs. For example, for “permno” 14322 fyear is expected to be 2000 from 31jan2000 till 29dec2000, but it equals to 2000 just until 31may2000. Is it possible to change “fyear” treating it as a regular variable (without linking it to a “date”)? Basically, all the missing values are expected to be replaced by the first non-missing future value. Thank you

        Comment


        • #5
          Originally posted by Alberto Alvarez View Post
          Thank you very much for the help!
          The solution you provide does not fully performs my needs. For example, for “permno” 14322 fyear is expected to be 2000 from 31jan2000 till 29dec2000, but it equals to 2000 just until 31may2000. Is it possible to change “fyear” treating it as a regular variable (without linking it to a “date”)? Basically, all the missing values are expected to be replaced by the first non-missing future value. Thank you
          Here is a listing of the data for permno 14322.

          Code:
          . list if permno==14322, clean
          
                 permno        date   fyear      gvkey  
           42.    14322   30jun1999       .   81238710  
           43.    14322   30jul1999       .   81238710  
           44.    14322   31aug1999       .   81238710  
           45.    14322   30sep1999       .   81238710  
           46.    14322   29oct1999       .   81238710  
           47.    14322   30nov1999       .   81238710  
           48.    14322   31dec1999    1999   81238710  
           49.    14322   31jan2000       .   81238710  
           50.    14322   29feb2000       .   81238710  
           51.    14322   31mar2000       .   81238710  
           52.    14322   28apr2000       .   81238710  
           53.    14322   31may2000       .   81238710  
           54.    14322   30jun2000       .   81238710  
           55.    14322   31jul2000       .   81238710  
           56.    14322   31aug2000       .   81238710  
           57.    14322   29sep2000       .   81238710  
           58.    14322   31oct2000       .   81238710  
           59.    14322   30nov2000       .   81238710  
           60.    14322   29dec2000    2000   81238710  
           61.    14322   31jan2001       .   81238710  
           62.    14322   28feb2001       .   81238710  
           63.    14322   30mar2001       .   81238710  
           64.    14322   30apr2001       .   81238710  
           65.    14322   31may2001       .   81238710  
           66.    14322   29jun2001       .   81238710  
           67.    14322   31jul2001       .   81238710  
           68.    14322   31aug2001       .   81238710  
           69.    14322   28sep2001       .   81238710  
           70.    14322   31oct2001       .   81238710  
           71.    14322   30nov2001       .   81238710  
           72.    14322   31dec2001    2001   81238710  
           73.    14322   31jan2002       .   81238710  
           74.    14322   28feb2002       .   81238710  
           75.    14322   28mar2002       .   81238710  
           76.    14322   30apr2002       .   81238710  
           77.    14322   31may2002       .   81238710  
           78.    14322   28jun2002       .   81238710  
           79.    14322   31jul2002       .   81238710
          If missing values of fyear are to be replaced with the next non-missing value, what value are you going to plug in for observations 73 to 79?
          --
          Bruce Weaver
          Email: bweaver@lakeheadu.ca
          Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
          Stata version: 15.1 IC (Windows)

          Comment


          • #6
            If there are no non-missing forward value (within unique permno) then it should not be any change. Thank you!

            Comment


            • #7
              1. Bruce's code at #2 generates properly the fiscal year from date. For this target, consider also the 1-line code below:
              Code:
              gen fiscal_year = yofd(dofm(mofd(date)-6))
              2. However, in Alberto's issue, it seems that fyear is something different or modified from normal fiscal year, which explains why he just want it to be treated as a regular variable with no link to date. The below code is expected to do that. And as noticed by Bruce, there will be no info for observation 73-79 of the given example.
              Code:
              gen _backward=-_n
              bys permno (_backward): replace fyear= fyear[_n-1] if missing(fyear)
              gsort -_backward
              drop _backward
              Last edited by Romalpa Akzo; 14 Jun 2018, 09:53.

              Comment


              • #8
                See also mipolate from SSC. https://www.statalist.org/forums/for...-interpolation

                Comment

                Working...
                X