I have a question about a regression on monthly data for 500 companies, which I have declared as panel data identified by gvkey and date. I would like to regress y on a variable x, where x is the sum of three lags of the variable count. I have a value of count for every month, but y is only available four times per year (quarterly).
The x variable (qcount) is created with the following command:
gen qcount = l.count + l2.count + l3.count
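For context, here is a minimal sketch of the setup that the lag operators rely on. It assumes that date is a daily Stata date and that gvkey is numeric; mdate is a hypothetical variable name introduced only for this illustration.

```stata
* Assumption: date is a daily Stata date (%td) and gvkey is numeric.
* mdate is a hypothetical monthly date variable, so that l. means "one month earlier".
gen mdate = mofd(date)
format mdate %tm

* Declare the panel: firm identifier and monthly time variable.
xtset gvkey mdate

* Sum of the previous three months of count.
* The result is missing whenever any of the three lags is missing.
gen qcount = l.count + l2.count + l3.count
```

If the time variable were left as a daily date, l.count would refer to the previous day rather than the previous month, so qcount would be missing almost everywhere.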
From some tests I gather that Stata excludes all observations with a missing value of x. That is what I want, but Stata also excludes every observation in which y is missing, together with its x value. As a result, I think qcount will not include all the lags I asked for.
reg y qcount
Example of my data:
gvkey time y x
123 1 . 12
123 2 . 16
123 3 3 10
123 4 . 14
It seems that Stata drops the observations at times 1, 2, and 4 and therefore excludes their x values as well.
Does anyone have a solution to exclude the observations with missing y without excluding the x values in this regression?
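One way to check which observations the regression actually uses, and what qcount contains in each month, might be the following sketch. The variable name used is hypothetical, and the range 1/12 is only an example.

```stata
reg y qcount

* e(sample) equals 1 for the observations used in the estimation, 0 otherwise.
* "used" is a hypothetical variable name.
gen byte used = e(sample)

* Inspect the first rows: qcount was filled in by generate before the
* regression ran, so its values can be compared with count month by month.
list gvkey time y count qcount used in 1/12, sepby(gvkey)
```

Comparing qcount with the three preceding values of count in this listing shows whether the lags were summed over all months or only over the months in which y is observed.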
Thanks in advance
Best, Maarten