xtunitroot dfuller panel error

Barry Smith

Join Date: Aug 2023

Posts: 5
#1

25 Aug 2023, 10:27

Good Afternoon Stata Community,

What is the optimum unit root procedure for a panel dataset built from 33 surveys on 27 unique dates across 15 months? Each survey includes between 491 and 1,502 respondents, for a total of 26,965 observations. When the data are declared as time series, Stata describes the panel variable as "weakly balanced." I am using Stata 17.0, and the dataset is attached to this post.

Commands - Generate Time Variable & Declare Time Series:
gen edate = mdy(month,day,year)
format %tdMonth_DD,_CCYY edate
xtset order edate, daily
STATA Response:
Panel variable: order (weakly balanced)
Time variable: edate, December 09, 1992 to March 27, 1994
Delta: 1 day
Commands - dfuller Unit Root Test on "pres" Independent Variable:
xtunitroot fisher pres, dfuller lags(1)
xtunitroot fisher pres, dfuller lags(8)
STATA Response:
performing unit-root test on first panel using the syntax
dfuller pres, lags(1)
returned error code 2000
r(2000)
I scoured the Stata manuals and internet forums for two months and tried dozens of alternative unit root approaches, all to no avail. My leading assumption at this point is that either (1) I am making an error in structuring the variables for the declaration command or (2) there is an elegant unit root procedure that can handle this dataset of which I am simply not aware.

Very Respectfully,
Barry Smith
Attached Files

Unit Root DS.dta (507.7 KB, 1 view)
Andrew Musau

Join Date: Oct 2014

Posts: 10275
#2

25 Aug 2023, 13:42

Code:

search r(2000)

[P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 2000
no observations
You have requested some statistical calculation and there are
no observations on which to perform it. Perhaps you specified
if or in and inadvertently filtered all the data.

The error indicates that there are no observations, as shown above. You describe the data as spanning months, yet your time variable is in units of days, so a lag of 1 is the previous calendar day. The xtset output indicates that your panel is weakly balanced, so you may not have consecutive days present for any panel. In that case every lagged value is missing and dfuller has no observations to work with. Look at your time settings again.
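One way to check whether any panel actually contains consecutive days is sketched below. It assumes the data are already declared with xtset order edate, daily, as in post #1; the variable name gap is hypothetical and not part of the original dataset. If the final count is zero, every one-day lag of pres is missing, which would be consistent with dfuller returning r(2000).

```stata
* Sketch only: inspect the spacing of dates within each panel
xtset order edate, daily
xtdescribe

* gap (hypothetical name) = days since the previous observation in the same panel
bysort order (edate): generate gap = edate - edate[_n-1]
tabulate gap, missing

* number of observations that have a non-missing one-day lag of pres
count if !missing(L.pres)
```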
Barry Smith

Join Date: Aug 2023

Posts: 5
#3

28 Aug 2023, 06:51

Good Morning Andrew - Thank you for the quick response! I absolutely believe that the way I structure the data and declare it as time series is the culprit here, but I am not following your guidance on the influence of the month series. If my time series declaration command reads as a "claim to have monthly data," that is inadvertent. My intent in generating edate is to use the series "month" only to produce a unique identifier that combines the year, month, and day on which respondents participated in a survey. The unit of analysis is intended to be the respondent-day (i.e. "order" + "edate").

I want the lag to look at the previous day(s). Analyzing 33 surveys on 27 unique dates across 15 months at the respondent-day level is going to leave gaps. Please note that the data are not longitudinal: each survey draws a different, nationally representative sample of respondents and does not track specific respondents over time. My intent is to instruct Stata to treat the gaps between surveys as periods of unchanged values. In other words, if survey respondents record their approval of the president on 11 February 1993, there are no respondents on 12-13 February, and a new set of respondents records presidential approval on 14 February, I am asking Stata to assume that approval of the president is unchanged from 11 February until 14 February.

My understanding is that Augmented Dickey-Fuller should be able to handle gaps in that way. The observations are there, just not consecutively. I experimented with the tsfill command to see whether Stata would then read the dataset as 15 months of consecutive daily data with 27 changes in presidential approval over that time, but structuring the data that way returned the same error message when I ran the unit root test.

Commands:
gen edate = mdy(month,day,year)
format %tdMonth_DD,_CCYY edate
xtset order edate, daily
tsfill
xtunitroot fisher pres, dfuller lags(1)
STATA Response:
performing unit-root test on first panel using the syntax
dfuller pres, lags(1)
returned error code 2000
r(2000);
With that further explanation of intent, do you still see an error in the way I structure the dataset and declare it as time series? What is the optimum procedure for a unit root test on this sort of data?

Very Respectfully,
Barry
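
The intent described in this post (treat days without a survey as unchanged approval) might be sketched as follows. This is an illustration under strong assumptions, not a recommended procedure: it assumes that a single mean of pres per survey date is an acceptable summary, and it reduces the data to one time series rather than a panel. Note that tsfill only adds observations with missing values; it does not copy values forward, so the carry-forward step has to be written explicitly.

```stata
* Sketch only: collapse to one value per survey date (an assumption, not the poster's method)
preserve
collapse (mean) pres, by(edate)
tsset edate, daily
tsfill                                      // adds the missing days; pres is missing on them
sort edate
replace pres = pres[_n-1] if missing(pres)  // carry the last observed value forward
dfuller pres, lags(1)
restore
```

A series that consists mostly of carried-forward values may not satisfy the assumptions behind the Dickey-Fuller test, so any result from this sketch would need careful interpretation.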
Andrew Musau

Join Date: Oct 2014

Posts: 10275
#4

28 Aug 2023, 08:23

The command needs consecutive time values for a lag of 1 to make sense. If the interview dates do not coincide across panels but you nevertheless want the order of interviews to define time, then just create such a time variable. Note that in

Code:

xtset x y

\(x\) refers to the panel identifier and \(y\) to the time variable. So it makes no sense to define the order as the panel identifier.

Code:

bysort id (edate): gen time=_n
xtset id time
xtunitroot fisher pres, dfuller lags(1)
Barry Smith

Join Date: Aug 2023

Posts: 5
#5

30 Aug 2023, 13:18

Hello Andrew - Now I see how you are approaching the designation of time: as a sequential timestamp rather than a calendar date, with Stata reading respondents in that order. However, the resulting data look a lot like a longitudinal structure, and the number of surveys/dates is reduced.

Point 1: Longitudinal Structure
The way the "time" variable is generated results in a time value for every id within each survey. In other words, id = "1" in the first survey is designated as timestamp "1", id = "1" in the second survey is designated as timestamp "2", and so on. It appears to instruct Stata to read id "1" as the same individual recording an observable response to each survey across time. When the new variable "time" is used in declaring the data as time series, doesn't this instruct Stata to read the data as longitudinal?

Point 2: Reduced Number of Surveys
There are 27 dates on which surveys are taken in this dataset. After the "time" variable is generated, the number of distinct "time" values drops to 22. If the dataset is sorted on the new "time" variable, several dates ("edate") are represented within each "time" value. If the new variable "time" is a sequential timestamp rather than a calendar date, there should be only one date ("edate") value for each. How is Stata to read a time series declaration with multiple calendar dates in each block of time as a sequential series of surveys conducted over time?

Very Respectfully,
Barry
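
If the goal in Point 2 is a sequential index with exactly one calendar date per value, one possible sketch is below. The variable name survey_t is hypothetical. This only relabels the distinct values of edate as consecutive integers; it does not by itself turn repeated cross-sections into a panel.

```stata
* Sketch only: one sequential index per distinct survey date
egen survey_t = group(edate)
tabulate survey_t

* check that each index value corresponds to a single calendar date
bysort survey_t (edate): assert edate[1] == edate[_N]
```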
Andrew Musau

Join Date: Oct 2014

Posts: 10275
#6

01 Sep 2023, 14:33

I do not understand your data or problem, but the use of the xtset command suggests that you are working with panel data. At this point, I recommend that you speak to your supervisor or a colleague who has knowledge of your data so that you can better understand its structure. If most observations in the dataset are singletons and only a few have repeated observations, it may not make sense to analyze the data as a panel, and you may need to revert to a pooled analysis. However, these issues seem to be beyond what can reasonably be addressed in a public forum like Statalist.

Last edited by Andrew Musau; 01 Sep 2023, 14:37.
Barry Smith

Join Date: Aug 2023

Posts: 5
#7

02 Sep 2023, 06:44

Thank you, Andrew. If I come up with a solution, I will post it here for anyone who encounters a similar problem.

v/r,
Barry