trend and time

Yunjeong Kwon

Join Date: Dec 2019

Posts: 18
#1

26 Dec 2019, 20:54

I want to estimate a first-difference model on panel data.
Using two years of data, for example, I want to estimate the effect of an independent variable together with a trend.
The problem is that when I use gen time=_n, the time order gets mixed up every time, so the results differ each time I run the analysis.
The other problem is that if I create the time variable as, for example, 1, 2 for every id, its coefficient comes out as zero because of collinearity.
How can I handle that?

Attached Files

175% 미만 - 복사본.xlsx (38.6 KB, 1 view)
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#2

26 Dec 2019, 21:16

The problem is with your command -by occ, sort: gen time = _n-. The variable occ does not determine a unique sort order of the data, because there can be multiple observations with the same value of occ. In Stata, when you -sort- the data on a variable or list of variables that does not uniquely identify the observations, the observations that are duplicates on the sort key variables are sorted into random order, irreproducibly. That's why you're getting different results every time. So you either need another variable (or several other variables) that, together with occ, uniquely identifies the observations, or you need to tell Stata not to randomize the sorting but to retain the existing order. It seems that you do not have any other variable, such as a date, that, with occ, uniquely identifies the observations. It seems that, instead, you want to take the order in which the data originally appear as the correct time order. If that is correct, you can do that with:

Code:

sort occ, stable
by occ: gen time = _n

Note that you must specify the -stable- option in the -sort- command and you must not specify a -sort- option in the -by occ:...- command.
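
An alternative that should reach the same result is to record the original row order explicitly and use it as a tie-breaker in the sort. This is only a sketch: obs_no is a hypothetical helper variable, not something in the original data, and it assumes the data are still in their original order when the first line runs.

```stata
* Record the original row order before any sorting (obs_no is a hypothetical name)
gen long obs_no = _n

* Sort by occ, breaking ties by original order, then number observations within occ
bysort occ (obs_no): gen time = _n

* Check that occ and time together uniquely identify the observations;
* -isid- exits with an error if they do not
isid occ time
```

Because occ and obs_no together identify every observation, the sort order no longer depends on how Stata breaks ties, so time comes out the same on every run.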

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
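
As a rough illustration of what such a call might look like, here is a sketch that uses the variable names appearing later in this thread (occ, time, lemp, trend, lnMWRIIWMW). The range of 20 observations is an arbitrary assumption; adjust both to your own data.

```stata
* Only needed if -dataex- is not already installed (see the paragraph above)
* ssc install dataex

* Produce a copy-and-paste example of the first 20 observations of selected variables
dataex occ time lemp trend lnMWRIIWMW in 1/20
```

The output can then be pasted directly into the forum editor.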

In the future when showing Stata output, copy the output directly from your Results window or log file to the clipboard and then paste them into the forum editor in between code delimiters. If you are unfamiliar with code delimiters, please read forum FAQ #12 for instructions.
Yunjeong Kwon

Join Date: Dec 2019

Posts: 18
#3

27 Dec 2019, 00:35

Thank you very much, sir!

I'm not sure that the way I have pasted the results here is what you described, but I have included them below.
I have run into another issue, this time with collinearity. As you can see, my model is a simple first-difference panel model with a trend, but now the trend is omitted in every time period. I estimated the difference model for four time periods relative to the base year 2014, so the time gaps to 2015, 2016, 2017, and 2018 are 1, 2, 3, and 4 respectively, and I used the time gap as the trend variable.
Is the problem in how I specified the model, or in my code?
Thank you in advance for taking the time to answer.

. drop in 39/40
(2 observations deleted)

. encode occupation, gen(occ)

. sort occ, stable

. by occ: gen time=_n

. tsset occ time
       panel variable:  occ (strongly balanced)
        time variable:  time, 1 to 2
                delta:  1 unit

. gen lemp=log(근로자수)

. reg D.lemp trend D.lnMWRIIWMW
note: trend omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =        19
-------------+----------------------------------   F(1, 17)        =      0.41
       Model |  .016417834         1  .016417834   Prob > F        =    0.5315
    Residual |  .684092334        17  .040240726   R-squared       =    0.0234
-------------+----------------------------------   Adj R-squared   =   -0.0340
       Total |  .700510168        18  .038917232   Root MSE        =     .2006

------------------------------------------------------------------------------
      D.lemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trend |          0  (omitted)
             |
  lnMWRIIWMW |
         D1. |  -.0059717   .0093492    -0.64   0.532    -.0256969    .0137534
             |
       _cons |    .083711   .0534071     1.57   0.135    -.0289683    .1963902
------------------------------------------------------------------------------

Last edited by Yunjeong Kwon; 27 Dec 2019, 00:46.
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#4

27 Dec 2019, 11:13

As you have only two time periods per occ, the differences are defined for only one observation per occ. When time = 1, D.lemp and D.lnMWRIIWMW will both be missing, so the observation will be omitted from the analysis. Only time = 2 observations are included in your analysis. Because your example data is shown as a screenshot, which is not helpful, I cannot explore your data in Stata, but a quick visual inspection seems to suggest that whenever time = 2, trend = 1, which would make trend a constant, hence its omission due to collinearity (with the _cons term of the model). You can check this yourself easily by running your regression and then following it with:

Code:

tab trend if e(sample)
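
Put together with the regression from #3, the check might look like the sketch below. The -summarize- step is an addition here, not part of the original advice: a standard deviation of zero is another way to see that trend is constant in the estimation sample.

```stata
* Re-run the regression from #3 so that e(sample) marks the observations used
regress D.lemp trend D.lnMWRIIWMW

* Which values of trend and time actually enter the estimation sample?
tab trend if e(sample)
tab time if e(sample)

* If trend is constant within the sample, its standard deviation is 0
summarize trend if e(sample)
display r(sd)
```

If -tab- shows a single value of trend, the variable cannot be distinguished from the constant term, which is what the omission note is reporting.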
Yunjeong Kwon

Join Date: Dec 2019

Posts: 18
#5

27 Dec 2019, 21:10

Thank you very much, sir.

Your advice is always helpful.

I'm not good at using the Statalist tools, but next time I will try to show my data from Stata.

Have a good day!