  • Newey in Statsby returns 'Date is not regularly spaced'

    Hello. I am working with a panel dataset consisting of mutual fund numbers (IDs), a date variable and various market variables. I have xtset the data with the fund ID ('crsp_fundno') as the panel variable and a monthly date variable ('date') as the time variable. The date was originally a ddmmyyyy variable; it is now formatted as %tm and stored as type long.

    I want to run a regression with Newey-West standard errors and a lag of 4 for each fund ID. I have tried doing this with the statsby command:

    statsby _b e(t), by(crsp_fundno) saving(myresults) : newey excess_ret mktrf_pct smb_pct hml_pct mom_pct, lag(4)

    However, Stata returns:
    date is not regularly spaced
    an error occurred when statsby executed newey


    To check for missing values in the date variable, I ran the command: mdesc date. It reported 0 missing, so there are no missing values in the variable.
    I then tried to check for gaps between date values by running the following:


    gen gap=1 if date-l.date!=1 & crsp_fundno==l.crsp_fundno

    My hope is that this sets gap to 1 whenever the current date is not exactly one month after the previous date within the same fund, and that it ignores the change of fund ID in the panel, where the date of course jumps. An example of my data is found below. The date variable spans from 2000m12 to 2020m6. The number of observations varies by ID.

    How can I figure out whether there are any gaps in my date variable? And if there are gaps, what alternatives do I have to doing the regression for each ID using newey?


    date crsp_fundno
    2020m1 3211
    2020m2 3211
    2020m3 3211
    2020m4 3211
    2020m5 3211
    2020m6 3211
    2000m12 3222
    2001m1 3222
    2001m2 3222
    2001m3 3222
    2001m4 3222
    2001m5 3222
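
    A note on why the gap check above cannot flag anything: under xtset, l.date is the value of date one month earlier within the same fund, so it equals date-1 when that month is present and is missing otherwise. l.crsp_fundno is missing in exactly the same rows, so the second condition fails wherever the first one holds. A minimal sketch of an alternative, which compares each row with the physically previous row of the same fund instead of using the lag operator (the variable name gap is reused from the attempt above; treat this as an illustration, not the only way to do it):

    ```stata
    * Sort within fund by date and compare each row with the previous row
    * of the same fund. The first row of a fund has no predecessor, so it
    * is left missing rather than flagged.
    bysort crsp_fundno (date): generate byte gap = (date - date[_n-1] != 1) if _n > 1

    * Rows that immediately follow a gap, grouped by fund
    list crsp_fundno date if gap == 1, sepby(crsp_fundno)
    ```

    Explicit subscripts such as date[_n-1] refer to row order, whereas l.date refers to calendar order, which is why only the former can see how far back the previous observation actually is.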

    Update:
    I tried running the following command (distinct is by Nick Cox, I believe, and is available from SSC):
    distinct crsp_fundno if date-l.date!=1

    Stata returned:

    -------------+----------------------
                 |    total   distinct
    -------------+----------------------
     crsp_fundno |      585        571
    -------------+----------------------


    How do I interpret this?
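
    One possible reading, assuming distinct reports the number of selected observations under "total" and the number of different fund IDs among them under "distinct": the condition date-l.date!=1 is true wherever l.date is missing, which happens at the first observation of every fund and at every observation that follows a gap. On that reading, 571 would simply be the number of funds, and 585 - 571 = 14 would be the number of observations that follow a gap. A sketch for checking this with official commands (first_obs is a hypothetical variable name; see help tsreport for the exact meaning of its options):

    ```stata
    * Number of funds in the data, to compare with the "distinct" column
    egen byte first_obs = tag(crsp_fundno)
    count if first_obs

    * Official summary of gaps in the time variable; with the panel option,
    * a change of panel is not itself counted as a gap
    tsreport, panel
    ```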
    Last edited by Aleksander Lind; 28 Apr 2021, 07:45.

  • #2
    The xtset command will have told you if there were gaps in your data. Consider the following example based on your data, which I present using the output of the dataex command, having had to beat it into shape to reproduce what you describe.
    Code:
    . * Example generated by -dataex-. For more info, type help dataex
    . clear
    
    . input float date int crsp_fundno
    
              date  crsp_f~o
      1. 720 3211
      2. 721 3211
      3. 722 3211
      4. 723 3211
      5. 724 3211
      6. 725 3211
      7. 491 3222
      8. 492 3222
      9. 493 3222
     10. 494 3222
     11. 495 3222
     12. 496 3222
     13. end
    
    . format %tm date
    
    . 
    . xtset crsp_fundno date
           panel variable:  crsp_fundno (weakly balanced)
            time variable:  date, 2000m12 to 2020m6
                    delta:  1 month
    
    . drop if inlist(_n,2,9)
    (2 observations deleted)
    
    . xtset crsp_fundno date
           panel variable:  crsp_fundno (weakly balanced)
            time variable:  date, 2000m12 to 2020m6, but with gaps
                    delta:  1 month
    
    .


    • #3
      Hi William Lisowski. Thank you for the response, and for taking the time to create an example. Stata did tell me that there are gaps in my date variable when I xtset; I forgot to mention this. How do you recommend dealing with this?
      Last edited by Aleksander Lind; 28 Apr 2021, 08:41.


      • #4
        My expertise in econometrics ends at xtset. It seems to me you have to omit the panels with gaps from your analysis. Here's some technique, starting with the same example data.

        Code:
        . drop in 3
        (1 observation deleted)
        
        . xtset crsp_fundno date
               panel variable:  crsp_fundno (unbalanced)
                time variable:  date, 2000m12 to 2020m6, but with a gap
                        delta:  1 month
        
        . preserve
        
        . collapse (count) N=date (firstnm) date1=date (lastnm) date2=date, by(crsp_fundno)
        
        . format N %9.0f
        
        . generate N_missing = (date2-date1+1) - N
        
        . list, abbreviate(20)
        
             +------------------------------------------------+
             | crsp_fundno   N     date1    date2   N_missing |
             |------------------------------------------------|
          1. |        3211   5    2020m1   2020m6           1 |
          2. |        3222   6   2000m12   2001m5           0 |
             +------------------------------------------------+
        
        . restore
        
        . bysort crsp_fundno (date): drop if date[_N]-date[1]+1!=_N
        (5 observations deleted)
        
        . tab crsp_fundno
        
        crsp_fundno |      Freq.     Percent        Cum.
        ------------+-----------------------------------
               3222 |          6      100.00      100.00
        ------------+-----------------------------------
              Total |          6      100.00
        
        .
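
        If dropping the funds outright feels too drastic, the same test can be used to mark them instead. A minimal sketch, where has_gap and fund_tag are hypothetical variable names; the comparison is the one from the drop command above (span of months covered versus number of observations):

        ```stata
        * 1 for every row of a fund whose observation count falls short of
        * the number of months between its first and last date
        bysort crsp_fundno (date): generate byte has_gap = (date[_N] - date[1] + 1 != _N)

        * Count funds (not rows) with and without gaps
        egen byte fund_tag = tag(crsp_fundno)
        tabulate has_gap if fund_tag
        ```

        Later commands can then be restricted with if !has_gap, which keeps the full data in memory for inspection.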


        • #5
          This worked brilliantly. I identified two funds with a lot of missing observations and with gaps in the date variable. When I xtset now, Stata returns:
          Code:
          xtset crsp_fundno date
                 panel variable:  crsp_fundno (unbalanced)
                  time variable:  date, 2000m12 to 2020m6
                          delta:  1 month
          I do have an unbalanced panel dataset, since the number of observations varies by ID. However, when I run:

          Code:
          statsby _b e(t), by(crsp_fundno) saving(myresults) : newey excess_ret mktrf_pct smb_pct hml_pct mom_pct, lag(4)
          Stata still returns the following:
          date is not regularly spaced
          an error occurred when statsby executed newey.


          Can there possibly be any gaps, having run the code you posted?


          • #6
            Do you have missing values of your model's variables as well? If any of your model's variables are missing, newey will exclude the observation and then you again have a gap.
            Code:
            egen V_missing = rowmiss(excess_ret mktrf_pct smb_pct hml_pct mom_pct)
            tab crsp_fundno V_missing if V_missing>0
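
            To see whether those missing values actually break the spacing, the gap check can be repeated on complete cases only. A sketch, assuming V_missing has been created as above; complete and est_gap are hypothetical variable names, and myresults_force is a hypothetical file name. The last command uses newey's force option, which, as far as I understand the documentation, fits the model even when the time variable is not equally spaced; whether that is appropriate for your analysis is a separate question (see help newey).

            ```stata
            * Mark observations that newey can actually use
            generate byte complete = (V_missing == 0)

            * Within each fund, look for gaps among the usable observations only
            bysort crsp_fundno complete (date): generate byte est_gap = (date - date[_n-1] != 1) if _n > 1 & complete
            list crsp_fundno date if est_gap == 1, sepby(crsp_fundno)

            * Alternative: let newey proceed despite irregular spacing
            statsby _b, by(crsp_fundno) saving(myresults_force, replace): newey excess_ret mktrf_pct smb_pct hml_pct mom_pct, lag(4) force
            ```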


            • #7
              After running your code, I can see that I did indeed have missing values in the model's variables. Luckily, these were all the last observation of the panel in each fund, so I dropped them:
              Code:
              bysort crsp_fundno (date): drop if V_missing>0
              The date variable remained equally spaced. Thank you very much for your help.
