Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Expand data based on a time variable

    Dear Statalisters,

    Below is my data (show the first two records). Ticker, cname, and permno identify the company and its stock issue code. I want to expand each row into month and year, that is for GFGC, I want to have add rows for dec 1985, jan 1986, feb 1986, till dec 2014. I guess I need a year and month variable. I wonder if I need to reshape the data. It is a little complicated for me to clearly outline the programming step. Could someone help?

    Startdate and enddate are long, format %d

    Data have
    input str6 ticker str32 cname permno startdate enddate
    GFGC GAS NATURAL INC 10001 31-Dec-85 31-Dec-14
    BTFG BANCTRUST FINANCIAL GROUP INC 10002 31-Dec-85 28-Feb-13
    end

    Data want

    input str6 ticker str32 cname permno year month
    GFGC GAS NATURAL INC 10001 1985 12
    GFGC GAS NATURAL INC 10001 1986 1
    GFGC GAS NATURAL INC 10001 1986 2
    ....
    GFGC GAS NATURAL INC 10001 2014 12


    Regards,
    ​Rochelle

  • #2
    EDIT: I see from the subsequent post that I misunderstood what was wanted. Nevertheless, this is easily modified to do the job.

    Code:
    . list
    
         +--------------------+
         | ticker          dd |
         |--------------------|
      1. |   GFGC   31dec1985 |
         +--------------------+
    
    . generate mm = mofd(dd)
    
    . format mm %tm
    
    . drop dd
    
    . expand ym(2014,12) - mm + 1
    (348 observations created)
    
    . bysort ticker (mm): replace mm = mm[_n-1]+1 if _n>1
    (348 real changes made)
    
    . list in f/5
    
         +------------------+
         | ticker        mm |
         |------------------|
      1. |   GFGC   1985m12 |
      2. |   GFGC    1986m1 |
      3. |   GFGC    1986m2 |
      4. |   GFGC    1986m3 |
      5. |   GFGC    1986m4 |
         +------------------+
    
    . list in -5/l
    
         +------------------+
         | ticker        mm |
         |------------------|
    345. |   GFGC    2014m8 |
    346. |   GFGC    2014m9 |
    347. |   GFGC   2014m10 |
    348. |   GFGC   2014m11 |
    349. |   GFGC   2014m12 |
         +------------------+
    
    .
    Last edited by William Lisowski; 28 Aug 2015, 14:50.

    Comment


    • #3
      Rochelle,

      Here is one possibility, using time series methods:

      Code:
      // convert both dates to month variables
      gen date1=mofd(startdate)    
      gen date2=mofd(enddate)
      drop startdate enddate
      
      reshape long date, i(permno) j(j)
      drop j
      
      tsset permno date, monthly
      tsfill 
      
      // Fill in missing data created by tsfill
      bys permno: replace ticker=ticker[_n-1] if mi(ticker)
      bys permno: replace cname=cname[_n-1] if mi(cname)
      
      // Extract year and month from date variable
      gen year=int(date/12)+1960
      gen month=mod(date,12)+1
      drop date
      Regards,
      Joe

      Comment


      • #4
        Updated ...
        Code:
        . expand mofd(enddate) - mofd(startdate) + 1
        (674 observations created)
        
        . generate month = mofd(startdate)
        
        . format month %tm
        
        . bysort permno (month): replace month = month[_n-1]+1   if _n>1
        (674 real changes made)
        
        . list in 1/5, clean
        
               ticker             cname   permno   startdate     enddate     month  
          1.     GFGC   GAS NATURAL INC    10001   31dec1985   31dec2014   1985m12  
          2.     GFGC   GAS NATURAL INC    10001   31dec1985   31dec2014    1986m1  
          3.     GFGC   GAS NATURAL INC    10001   31dec1985   31dec2014    1986m2  
          4.     GFGC   GAS NATURAL INC    10001   31dec1985   31dec2014    1986m3  
          5.     GFGC   GAS NATURAL INC    10001   31dec1985   31dec2014    1986m4  
        
        . list in 345/354, clean
        
               ticker                           cname   permno   startdate     enddate     month  
        345.     GFGC                 GAS NATURAL INC    10001   31dec1985   31dec2014    2014m8  
        346.     GFGC                 GAS NATURAL INC    10001   31dec1985   31dec2014    2014m9  
        347.     GFGC                 GAS NATURAL INC    10001   31dec1985   31dec2014   2014m10  
        348.     GFGC                 GAS NATURAL INC    10001   31dec1985   31dec2014   2014m11  
        349.     GFGC                 GAS NATURAL INC    10001   31dec1985   31dec2014   2014m12  
        350.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013   1985m12  
        351.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013    1986m1  
        352.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013    1986m2  
        353.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013    1986m3  
        354.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013    1986m4  
        
        . list in -5/l, clean
        
               ticker                           cname   permno   startdate     enddate     month  
        672.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013   2012m10  
        673.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013   2012m11  
        674.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013   2012m12  
        675.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013    2013m1  
        676.     BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013    2013m2  
        
        .

        Comment


        • #5
          Thank you both very much !!! I have a couple of questions.

          @Joe, in my #1 post, I copy and past the first two rows from my real data, which has 18,000 rows. Ticker is firm id, permno is stock id, some firms have multiple stock issuance. I apologize for not mentioning this . I guess this led to the error

          reshape long date, i(permno) j(j)
          (note: j = 1 2)
          variable id does not uniquely identify the observations
          Your data are currently wide. You are performing a reshape long. You specified i(permno) and
          j(j). In the current wide form, variable permno should uniquely identify the observations.
          Remember this picture:

          long wide
          +---------------+ +------------------+
          | i j a b | | i a1 a2 b1 b2 |
          |---------------| <--- reshape ---> |------------------|
          | 1 1 1 2 | | 1 1 3 2 4 |
          | 1 2 3 4 | | 2 5 7 6 8 |
          | 2 1 5 6 | +------------------+
          | 2 2 7 8 |
          +---------------+
          Type reshape error for a list of the problem observations.
          r(9);



          @William: this might be a naive question. in my #1 post, I copy and past the first two rows from my real data. My question to you is, when you coded, how did you input my data, I often get confused by date/time variable. How did you input
          the startdate , did you copy my number into excel then import into stata?

          the follwoing will not import the date correctly, could you point out my error.

          input str6 ticker str32 cname permno startdate enddate
          GFGC GAS NATURAL INC 10001 31-Dec-85 31-Dec-14
          BTFG BANCTRUST FINANCIAL GROUP INC 10002 31-Dec-85 28-Feb-13
          end


          Regards,
          Rochelle

          Comment


          • #6
            Rochelle - Two important points.
            • Using input, to read strings that contain spaces, the strings needs to be surrounded in quotation marks.
            • You need to read the dates as strings (Stata calls this HRF for Human-Readable Format) and then convert them to SIF (Stata Internal Format) dates, and then assign the SIF dates an appropriate format (I chose the simplest).
            Code:
            . input str6 ticker str32 cname permno str20 (s_sd s_ed)
            
                    ticker    cname     permno   s_sd    s_ed
              1. GFGC "GAS NATURAL INC" 10001 31-Dec-85 31-Dec-14
              2. BTFG "BANCTRUST FINANCIAL GROUP INC" 10002 31-Dec-85 28-Feb-13
              3. end
            
            . generate startdate = date(s_sd,"DMY",2020)
            
            . generate enddate   = date(s_ed,"DMY",2020)
            
            . format startdate enddate %td
            
            . drop s_sd s_ed
            
            . list, clean noobs
            
                ticker                           cname   permno   startdate     enddate  
                  GFGC                 GAS NATURAL INC    10001   31dec1985   31dec2014  
                  BTFG   BANCTRUST FINANCIAL GROUP INC    10002   31dec1985   28feb2013  
            
            . describe startdate enddate
            
                          storage   display    value
            variable name   type    format     label      variable label
            ------------------------------------------------------------------------------------------------
            startdate       float   %td                   
            enddate         float   %td                   
            
            .
            If you have not already done so, it would benefit you to review the suggested readings in Chapter 18 of the [GS] Getting Started with Stata PDF, specifically [U] 24 Working with dates and times. (Actually, reading [U] Stata User's Guide PDF from beginning to end is recommended.)

            Comment


            • #7
              Thank you William very much !!!

              Your example made things crystal clear .

              Regards,
              Rochelle

              Comment


              • #8
                I attached a dataset similar to the structure of my post #1 data ,except this time, the firm identifier is gvkey and stock identifier is lpermno. I want to expand the records into monthly observations as described in my post #1 as well. e.g. for gvkey 013007, its effective date range is 07jan1986 to 30jun1987, i want to create monthly records for jan , feb, march, ... 1986 and end June 1987.

                my real data has over 22,000 observation, these two date variables are stored long format %d.

                I was able to use William 's code successfully for my first dataset (see post #4). however, when I apply the same coding to the current dataset, I got

                expand mofd(linkdt) - mofd(linkenddt) + 1
                (21728 negative counts ignored; observations not deleted)
                (78 zero counts ignored; observations not deleted)
                (0 observations created)



                it looks to me the date variables linkdt linkenddt are stored the same as my post #1 data. the only difference is some rows may have a missing value for linkenddt, which means the link is still effective till today.

                My questions:
                1. why my expand did not work as shown above.

                2. do I need to replace linkenddt when it is missing with a value for expand to work?
                Attached Files

                Comment


                • #9
                  1. In this case you have subtracted the (larger) end date from the (smaller) starting date, giving a negative number of observations.

                  2. Expand will not fail even observations have a date missing, but also it will not create any additional observations for observations with a date missing. If that is not what you want you will need to replace the missing linkenddt with a date that will cause expand to create what you expect.

                  Comment


                  • #10
                    Thanks again William!


                    I see my error now . I did not know how to replace missing with a current date after reading the manual. I made a new request on the forum.

                    Best,
                    ​Rochelle

                    Comment


                    • #11
                      Rochelle,

                      Sorry, I did not notice your additional questions until now. In case it is still relevant, the reason I chose to use permno was because tsset requires that the panel variable be numeric and using permno was the easiest solution. However, if you need to use ticker you can use encode to convert ticker to a numeric variable.

                      Regards,
                      Joe

                      Comment

                      Working...
                      X