Newey in Statsby returns 'Date is not regularly spaced'

Aleksander Lind

Join Date: Apr 2021

Posts: 7
#1

28 Apr 2021, 07:09

Hello. I am currently working with a panel dataset consisting of mutual fund IDs, a date variable, and various market variables. I have xtset the data by fund ID ('crsp_fundno') and a monthly date variable ('date'), which was originally a ddmmyyyy variable. It is formatted as %tm and stored as type long.

I want to run a separate regression for each fund ID with Newey-West standard errors and a lag of 4. I have tried doing this with the statsby command:

statsby _b e(t), by(crsp_fundno) saving(myresults) : newey excess_ret mktrf_pct smb_pct hml_pct mom_pct, lag(4)

However, Stata returns:
date is not regularly spaced
an error occurred when statsby executed newey

To try and check for missing values in the date variable, I ran the command: mdesc date. This returned a 'missing' of 0, hence there are no missing values in the variable.
I then tried to check for gaps between date values by running the following:

gen gap=1 if date-l.date!=1 & crsp_fundno==l.crsp_fundno

My hope is that it generates a 1 in my gap variable if the current date is not 1 month greater than the former date, and that it accounts for changes in fund ID in the panel, where the date of course changes. An example of my data is found below. The date variable spans from 2000m12 to 2020m6. Number of observations varies by ID.

How can I figure out whether there are any gaps in my date variable? And if there are gaps, what alternatives do I have to doing the regression for each ID using newey?

date crsp_fundno
2020m1 3211
2020m2 3211
2020m3 3211
2020m4 3211
2020m5 3211
2020m6 3211
2000m12 3222
2001m1 3222
2001m2 3222
2001m3 3222
2001m4 3222
2001m5 3222

Update:
I tried running the following command (distinct is by Nick Cox I believe, available in SSC):
distinct crsp_fundno if date-l.date!=1

Stata returned:
-------------+--------------------
             |    total   distinct
-------------+--------------------
 crsp_fundno |      585        571
-------------+--------------------

How do I interpret this?

Last edited by Aleksander Lind; 28 Apr 2021, 07:45.
Tags: data, panel, panel data, regression

William Lisowski

Join Date: Dec 2014
Posts: 10150

28 Apr 2021, 08:01

The xtset command will have told you if there were gaps in your data. Consider the following example based on your data, which I present using the output of the dataex command, having had to beat it into shape to reproduce what you describe.

Code:

. * Example generated by -dataex-. For more info, type help dataex
. clear

. input float date int crsp_fundno

          date  crsp_f~o
  1. 720 3211
  2. 721 3211
  3. 722 3211
  4. 723 3211
  5. 724 3211
  6. 725 3211
  7. 491 3222
  8. 492 3222
  9. 493 3222
 10. 494 3222
 11. 495 3222
 12. 496 3222
 13. end

. format %tm date

. 
. xtset crsp_fundno date
       panel variable:  crsp_fundno (weakly balanced)
        time variable:  date, 2000m12 to 2020m6
                delta:  1 month

. drop if inlist(_n,2,9)
(2 observations deleted)

. xtset crsp_fundno date
       panel variable:  crsp_fundno (weakly balanced)
        time variable:  date, 2000m12 to 2020m6, but with gaps
                delta:  1 month

.
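A note on the `distinct` output in the first post, offered as a likely reading rather than a certainty: `l.date` is missing for the first observation of every fund as well as for any observation that follows a gap, and Stata evaluates `missing != 1` as true. The condition `date-l.date!=1` therefore flags every fund's first observation, which would explain why nearly every fund is counted. A minimal sketch that flags only observations directly after a gap is shown below; the toy data and the variable name `after_gap` are illustrative assumptions.

```stata
* Toy data (illustrative only): fund 3211 is missing 2020m2 (date 721)
clear
input float date int crsp_fundno
720 3211
722 3211
723 3211
491 3222
492 3222
493 3222
end
format %tm date

* Within each fund, compare each date with the previous observed date.
* The first observation of a fund has no predecessor, so it is left missing
* instead of being counted as a gap.
bysort crsp_fundno (date): generate byte after_gap = (date - date[_n-1] > 1) if _n > 1

* Show only the observations that directly follow a gap
list crsp_fundno date if after_gap == 1, noobs
```

Running `tabulate crsp_fundno if after_gap == 1` afterwards would give the number of gaps per fund.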


Aleksander Lind

Join Date: Apr 2021

Posts: 7
#3

28 Apr 2021, 08:23

Hi William Lisowski. Thank you for the response, and for taking the time to create an example. Stata did tell me that there are gaps in my date variable when I ran xtset; I forgot to mention this. How do you recommend dealing with the gaps?

Last edited by Aleksander Lind; 28 Apr 2021, 08:41.

William Lisowski

Join Date: Dec 2014
Posts: 10150

28 Apr 2021, 08:58

My expertise in econometrics ends at xtset. It seems to me you have to omit the panels with gaps from your analysis. Here's some technique, starting with the same example data.

Code:

. drop in 3
(1 observation deleted)

. xtset crsp_fundno date
       panel variable:  crsp_fundno (unbalanced)
        time variable:  date, 2000m12 to 2020m6, but with a gap
                delta:  1 month

. preserve

. collapse (count) N=date (firstnm) date1=date (lastnm) date2=date, by(crsp_fundno)

. format N %9.0f

. generate N_missing = (date2-date1+1) - N

. list, abbreviate(20)

     +------------------------------------------------+
     | crsp_fundno   N     date1    date2   N_missing |
     |------------------------------------------------|
  1. |        3211   5    2020m1   2020m6           1 |
  2. |        3222   6   2000m12   2001m5           0 |
     +------------------------------------------------+

. restore

. bysort crsp_fundno (date): drop if date[_N]-date[1]+1!=_N
(5 observations deleted)

. tab crsp_fundno

crsp_fundno |      Freq.     Percent        Cum.
------------+-----------------------------------
       3222 |          6      100.00      100.00
------------+-----------------------------------
      Total |          6      100.00

.
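If dropping an entire fund seems too costly, one possible alternative is to keep only the longest gap-free run of months within each fund. This is an illustration added here, not something the posters proposed, and whether it suits the analysis is a judgment call. The sketch below uses toy data, and the variable names `spell` and `spell_len` are hypothetical.

```stata
* Toy data (illustrative only): fund 3211 is missing 2020m2 (date 721)
clear
input float date int crsp_fundno
720 3211
722 3211
723 3211
724 3211
491 3222
492 3222
493 3222
end
format %tm date

* Number the consecutive runs (spells) of months within each fund:
* a new spell starts at the fund's first observation and after every gap.
bysort crsp_fundno (date): generate int spell = sum(_n == 1 | date - date[_n-1] > 1)

* Length of each spell
bysort crsp_fundno spell: generate int spell_len = _N

* Keep the longest spell per fund (the latest one if there is a tie)
bysort crsp_fundno (spell_len spell): keep if spell == spell[_N]

* Every remaining fund should now be free of gaps
xtset crsp_fundno date
```

In this toy example fund 3211 keeps 2020m3 to 2020m5 and loses the isolated 2020m1 observation, while fund 3222 is unchanged.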


Aleksander Lind

Join Date: Apr 2021

Posts: 7
#5

28 Apr 2021, 10:00

This worked brilliantly. I identified two funds with a lot of missing observations and with gaps in the date variable. When I xtset now, Stata returns:

Code:

xtset crsp_fundno date
       panel variable:  crsp_fundno (unbalanced)
        time variable:  date, 2000m12 to 2020m6
                delta:  1 month

I do have an unbalanced panel dataset, since the number of observations varies by ID. However, when I run:

Code:

statsby _b e(t), by(crsp_fundno) saving(myresults) : newey excess_ret mktrf_pct smb_pct hml_pct mom_pct, lag(4)

Stata still returns the following:
date is not regularly spaced
an error occurred when statsby executed newey.

Can there possibly be any gaps, having run the code you posted?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

28 Apr 2021, 12:05

Do you have missing values of your model's variables as well? If any of your model's variables are missing, newey will exclude the observation and then you again have a gap.

Code:

egen V_missing = rowmiss(excess_ret mktrf_pct smb_pct hml_pct mom_pct)
tab crsp_fundno V_missing if V_missing>0
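Because newey works only with the observations it can actually use, it may help to check for gaps among the complete cases alone. The following is a minimal sketch of that check under stated assumptions: the toy data use only two of the model's variables, and the variable names `complete` and `has_gap` are hypothetical.

```stata
* Toy data (illustrative only): 3211 has a missing value mid-panel,
* 3222 has a missing value at the end of its panel
clear
input float date int crsp_fundno float(excess_ret mktrf_pct)
720 3211 1.2 0.5
721 3211 .   0.7
722 3211 0.4 0.1
491 3222 0.3 0.2
492 3222 0.9 0.6
493 3222 .   0.4
end
format %tm date

* Count missing model variables per observation and mark complete cases
egen V_missing = rowmiss(excess_ret mktrf_pct)
generate byte complete = (V_missing == 0)

* Among complete cases only, a fund has a gap if the span of its dates
* is longer than its number of usable observations
bysort crsp_fundno complete (date): generate byte has_gap = (date[_N] - date[1] + 1 != _N) if complete

tabulate crsp_fundno has_gap
```

Here the missing value in the middle of fund 3211 creates a gap in the usable sample, while the missing value at the end of fund 3222 does not.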
Aleksander Lind

Join Date: Apr 2021

Posts: 7
#7

28 Apr 2021, 13:12

After running your code, I can see that I did indeed have missing values in the model's variables. Luckily, these were all the last observation of each fund's panel, so I tried dropping them:

Code:

bysort crsp_fundno (date): drop if V_missing>0

The date variable remained equally spaced. Thank you very much for your help.
