Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling missing months-years with data on the basis of the last month year entry ?

    Dear all
    I have a dataset that looks like this



    ticker cusip oftic cname dilfac pdi sdates yr
    A1Z 04334810 ARZMF ARVIND MILLS 1 P 19dec2002 2002
    A1Z 04334810 ARZMF ARVIND MILLS 1 D 18aug2005 2005
    A1Z 04334810 ARZMF ARVIND LTD 1 D 15may2008 2008
    A1Z 04334810 ARZMY ARVIND LTD 1 D 18jun2009 2009
    AA 02224910 AA ALCOA 1 P 17mar1988 1988
    AA 02224910 AA ALCOA 1 D 19feb1998 1998
    AA 01381710 AA ALCOA INC. 1.006 D 14jan1999 1999
    AA 01381710 AA ALCOA INC. 1.018 D 20jan2000 2000
    AA 01381710 AA ALCOA INC. 1.006 D 18jan2001 2001
    AA 01381710 AA ALCOA INC. 1 D 17jan2002 2002
    AA 01381710 AA ALCOA INC 1 D 14mar2013 2013


    Ticker is the primary firm identifier. The sdates (start date) is the date when a firm has been followed in the database. The pdi variable shows if the firm is followed using a primary basis (P) or diluted basis (D).

    To illustrate (example):
    The start date here for A1Z firm shows that in year 2002 the firm has been followed using a P basis, in 2005 using a D basis, in 2008 a D basis and then in 2009 a D basis again.
    The years in between are assumed to follow the most recent entry, so in my example firm A1Z had a P basis in 2002, then a D basis in 2005, therefore all years in between (2003 and 2004) are on a P basis, but they are not entered !

    The problem is that these years (in between) are not in the dataset, but we know them based on this assumption. The problem is also not limited to years, some firms have a different basis during the one year (over the months) . However, same assumption: all months that are not in the data follows that last month-year in the data, until a new month-year is reported, then they follow the new month-year, and so on.

    My question,
    How to add the missing months-years based on this assumption ?
    I think the solution will be by creating months-years that fill the missing months-years, and then apply a certain criteria the satisfy my assumption here.
    Note that in my data, there are months and years, but not in my example.

    I attached a large set of my data to this post.

    Hope someone can help, please !

    Thanks a lot
    Attached Files

  • #2
    Here's one way to do this.

    Code:
    * this example was created using -dataex- (from SSC) after loading
    * forstata.dta and picking two representative ticker symbols using
    * keep if inlist(ticker, "AAC","AAHS")
    clear
    input str6 ticker str8 cusip str6 oftic str16 cname double dilfac str2 pdi long sdates float yr
    "AAC" "03237510" "AAC" "ANACOMP INC" 1.03 "D" 10303 1988 
    "AAC" "03237510" "AAC" "ANACOMP INC" 1.03 "D" 10520 1988 
    "AAC" "03237510" "AAC" "ANACOMP INC" 1.014 "D" 10758 1989 
    "AAC" "03237510" "AAC" "ANACOMP INC" 1.01 "P" 11122 1990 
    "AAC" "03237510" "AAC" "ANACOMP INC" 1.014 "P" 11458 1991 
    "AAC" "03237510" "AAC" "ANACOMP INC" 1 "P" 12403 1993 
    "AAHS" "01375010" "AAHS" "ALCO HEALTH SVCS" 1 "P" 10303 1988 
    "AAHS" "01375010" "AAHS" "ALCO HEALTH SVCS" 1.09 "D" 10331 1988 
    "AAHS" "01375010" "AAHS" "ALCO HEALTH SVCS" 1.09 "D" 10366 1988 
    "AAHS" "16875510" "AAHS" "CHILDRENS BRDCST" 1 "P" 12249 1993 
    end
    format %td sdates
    
    * convert start date to monthly date
    gen mdate = ym(year(sdates), month(sdates))
    format %tm mdate
    
    * you must decide what to do if the following is not true
    isid ticker mdate, sort
    
    * add observations to fill-in missing months
    by ticker: gen nadd = mdate[_n+1] - mdate
    expand nadd
    bysort ticker mdate: gen mdate2 = mdate + _n - 1
    format %tm mdate2
    
    list, sepby(ticker mdate) noobs

    Comment


    • #3
      I'm thinking you will want to create a monthly date from your daily date (help datetime), then declare your data as panel data with the ticker as the panel variable and monthly date as the time variable (help xtset), then use fillin to create the missing months for each panel (help fillin), and finally fill in the missing values using the time series lag operator (help tsvarlist).
      Last edited by William Lisowski; 20 Jul 2015, 09:40.

      Comment


      • #4
        Dear Robert,
        Thanks a lot. I am running the code and checking if it produces exactly what I am looking for in the entire sample.
        In the mean time, I tried to add the following code:

        gen yr=year(mdate2)
        gen mth=month(mdate2)

        , in order to generate a year and month variables from mdates2 that can be used for merging with another dataset?
        My addition produces wrong year and month. How can I handle this ?

        Comment


        • #5
          Also, my understanding is that the new date variable (mdate2) is now the date identifier that one can use when merging with another data set after filling the missing months-years? Is this correct ?
          Thanks

          Comment


          • #6
            When dealing with dates, it pays to study the help files (help dates). There's no direct function to extract the month and year from a monthly date. The solution is to convert the monthly date to a daily date and then extract the components:

            Code:
            gen yr2 = year(dofm(mdate2))
            gen mo2 = month(dofm(mdate2))
            You are correct that mdate2 uniquely identifies observation within a ticker group. You can check that using

            Code:
            isid ticker mdate2, sort
            As to what to use to merge with another dataset, that depends on how the dates are defined in it.

            Comment


            • #7
              Robert is correct, as usual, but monthly dates are not really difficult, and yield also to the first principle that the origin is always the start of 1960. Let's use Mata as a calculator and create a sandbox with 12 dates from August 2014 to July 2015.

              Code:
               
              . mata
              ------------------------------------------------- mata (type end to exit) -----
              : mdates = J(1, 12, ym(2014,8))
              
              : mdates
                       1     2     3     4     5     6     7     8     9    10    11    12
                  +-------------------------------------------------------------------------+
                1 |  655   655   655   655   655   655   655   655   655   655   655   655  |
                  +-------------------------------------------------------------------------+
              
              : mdates = mdates + (0..11)
              
              : mdates
                       1     2     3     4     5     6     7     8     9    10    11    12
                  +-------------------------------------------------------------------------+
                1 |  655   656   657   658   659   660   661   662   663   664   665   666  |
                  +-------------------------------------------------------------------------+
              
              : 1960 :+ floor(mdates/12)
                        1      2      3      4      5      6      7      8      9     10
                  +-----------------------------------------------------------------------
                1 |  2014   2014   2014   2014   2014   2015   2015   2015   2015   2015
                  +-----------------------------------------------------------------------
                       11     12
                   ---------------+
                1    2015   2015  |
                   ---------------+
              
              : 1 :+ mod(mdates, 12)
                      1    2    3    4    5    6    7    8    9   10   11   12
                  +-------------------------------------------------------------+
                1 |   8    9   10   11   12    1    2    3    4    5    6    7  |
                  +-------------------------------------------------------------+
              So, with the natural constants supplied too, year can be extracted with floor() and month with mod().
              :

              Comment


              • #8
                Dear Robert

                I investigated the data. I observed something, which I should have mentioned clearly in the beginning. Sorry !
                Your code will produce data entries for all month-years between the reported data entries for the same company, however there will be no entries after the last start date for a company. For example if a company has a start date sdate march 1990 which has already a data entry , and another following start date (when the basis changes) on Feb 1995, then the code will fill all months-years in between ONLY. But all subsequent months-years after the last sdate (i.e. after Feb 1995) will have no entries ! Indeed, when the last sdate is Feb 1995, then it means that ALL FOLLOWING MONTHS-YEARS until the end of the sample month-years will have the same data entry as of that in the last observed sdate.

                Can we amend your code to be able to keep producing month-years entries after the last observed month-year for the same firm and entries have the same reported basis as the last month-year?


                Thanks

                Comment


                • #9
                  For completeness, the sample period is 1 Jan 1988 to 31 Dec 2014, in case we need this in the code!

                  Comment


                  • #10
                    You don't say up to when. Here's the same example with a single additional command that adds observations up to Jun. 2015.

                    Code:
                    * this example was created using -dataex- (from SSC) after loading
                    * forstata.dta and picking two representative ticker symbols using
                    * keep if inlist(ticker, "AAC","AAHS")
                    clear
                    input str6 ticker str8 cusip str6 oftic str16 cname double dilfac str2 pdi long sdates float yr
                    "AAC" "03237510" "AAC" "ANACOMP INC" 1.03 "D" 10303 1988 
                    "AAC" "03237510" "AAC" "ANACOMP INC" 1.03 "D" 10520 1988 
                    "AAC" "03237510" "AAC" "ANACOMP INC" 1.014 "D" 10758 1989 
                    "AAC" "03237510" "AAC" "ANACOMP INC" 1.01 "P" 11122 1990 
                    "AAC" "03237510" "AAC" "ANACOMP INC" 1.014 "P" 11458 1991 
                    "AAC" "03237510" "AAC" "ANACOMP INC" 1 "P" 12403 1993 
                    "AAHS" "01375010" "AAHS" "ALCO HEALTH SVCS" 1 "P" 10303 1988 
                    "AAHS" "01375010" "AAHS" "ALCO HEALTH SVCS" 1.09 "D" 10331 1988 
                    "AAHS" "01375010" "AAHS" "ALCO HEALTH SVCS" 1.09 "D" 10366 1988 
                    "AAHS" "16875510" "AAHS" "CHILDRENS BRDCST" 1 "P" 12249 1993 
                    end
                    format %td sdates
                    
                    * convert start date to monthly date
                    gen mdate = ym(year(sdates), month(sdates))
                    format %tm mdate
                    
                    * you must decide what to do if the following is not true
                    isid ticker mdate, sort
                    
                    * add observation to fill-in missing months
                    by ticker: gen nadd = mdate[_n+1] - mdate
                    by ticker: replace nadd = ym(2015,07) - mdate if _n == _N
                    expand nadd
                    bysort ticker mdate: gen mdate2 = mdate + _n - 1
                    format %tm mdate2
                    
                    list, sepby(ticker mdate) noobs
                    
                    isid ticker mdate2, sort
                    
                    gen yr2 = year(dofm(mdate2))
                    gen mo2 = month(dofm(mdate2))

                    Comment

                    Working...
                    X