Hello. I am currently working with a panel dataset consisting of mutual fund IDs, a date variable and various market variables. I have xtset the data with the fund ID ('crsp_fundno') as the panel variable and a monthly date variable ('date') as the time variable. The date was originally a ddmmyyyy variable; it is now formatted as %tm and has storage type long.
I want to run a regression with Newey-West standard errors and a lag of 4 for each fund ID. I have tried doing this with the statsby command:
statsby _b e(t), by(crsp_fundno) saving(myresults) : newey excess_ret mktrf_pct smb_pct hml_pct mom_pct, lag(4)
However, Stata returns:
date is not regularly spaced
an error occurred when statsby executed newey
To check for missing values in the date variable, I ran the command: mdesc date. It reported 0 missing, so there are no missing values in the variable.
I then tried to check for gaps between date values by running the following:
gen gap=1 if date-l.date!=1 & crsp_fundno==l.crsp_fundno
My hope is that this sets gap to 1 whenever the current date is not exactly one month after the previous date, without flagging the points where the fund ID changes in the panel (where the date of course jumps). An example of my data is found below. The date variable spans from 2000m12 to 2020m6. The number of observations varies by ID.
How can I figure out whether there are any gaps in my date variable? And if there are gaps, what alternatives do I have to doing the regression for each ID using newey?
date crsp_fundno
2020m1 3211
2020m2 3211
2020m3 3211
2020m4 3211
2020m5 3211
2020m6 3211
2000m12 3222
2001m1 3222
2001m2 3222
2001m3 3222
2001m4 3222
2001m5 3222
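For reference, a minimal sketch of an observation-based gap check, assuming the variable names above. Note that with xtset data, l.date is the value of date one month earlier within the same fund, so date-l.date is either 1 or missing; date[_n-1] instead refers to the previous observation. The names gap and anygap are placeholders chosen for illustration.

```stata
* Compare each date with the previous observation of the same fund.
* gap is missing for the first observation of each fund, 1 after a gap, 0 otherwise.
bysort crsp_fundno (date): gen byte gap = (date - date[_n-1] != 1) if _n > 1
tabulate gap

* Mark every observation of a fund that has at least one gap
egen byte anygap = max(gap), by(crsp_fundno)

* Built-in summaries of the panel structure (these rely on the xtset settings)
xtdescribe
tsreport, panel
```

If gap is never 1, the dates are consecutive within every fund; otherwise, listing the observations with gap==1 shows where the months are skipped.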
Update:
I tried running the following command (distinct is by Nick Cox, I believe, and is available from SSC):
distinct crsp_fundno if date-l.date!=1
Stata returned:
-----------------------------------
             |    total   distinct
-------------+---------------------
 crsp_fundno |      585        571
-----------------------------------
How do I interpret this?