xtset and panel data

Eugene Lacoste

Join Date: Jul 2020

Posts: 24
#1


28 Jul 2020, 13:28

Dear all,

I have the following dataset. The full dataset is much longer (40,000+ observations); this is just an excerpt.
id is the firm identifier.
subsidiaries holds the name of each of the firm's subsidiaries.
year is the calendar year, from 1995 to 2000.
Country is the country in which the subsidiary is located. The two sales columns are the sales per subsidiary ("sale") and the total sales per firm ("sale2", which in the excerpt equals the sum of sale by id and year).
CAN = 1 if the subsidiary is located in Canada and 0 otherwise.
Above = 1 if year >= 1998 and 0 if year < 1998 (in the excerpt, 1998 is coded 1).

Code:

* Example data. String values are quoted, Country is str3 so that "CAN" is not
* truncated, and a stray ninth value (53620) on the 1996 Firm Banana row is dropped.
clear
input double id str103 subsidiaries float year str3 Country float(sale sale2 CAN Above)
1 "Firm Kiwi"       1995 "CAN" 2000 2000 1 0
2 "Firm Strawberry" 1996 "CAN" 3000 4500 1 0
2 "Firm Strawberry" 1996 "CAN" 1500 4500 1 0
2 "Firm Strawberry" 1997 "CAN"  800 1300 1 0
2 "Firm Strawberry" 1997 "CAN"  500 1300 1 0
2 "Firm Strawberry" 1998 "CAN"  200  300 1 1
2 "Firm Strawberry" 1998 "CAN"  100  300 1 1
3 "Firm Apple"      1996 "JAP"  450  820 0 0
3 "Firm Banana"     1996 "NET"  370  820 0 0
3 "Firm Apple"      1997 "JAP" 4000 7000 0 0
3 "Firm Banana"     1997 "NET" 3000 7000 0 0
3 "Firm Banana"     1998 "NET"  200  250 0 1
3 "Firm Apple"      1998 "JAP"   50  250 0 1
3 "Firm Banana"     1999 "NET"   60  130 0 1
3 "Firm Apple"      1999 "JAP"   70  130 0 1
3 "Firm Banana"     2000 "NET" 2000 2000 0 1
end

I am trying to use the xtset command, which does not work because the same year appears more than once within a firm. I would like to test the effect of two variables on sale: CAN (1 = subsidiary in Canada, 0 = not in Canada) and Above (1 = 1998 or later, 0 = before 1998).

As you can see, a firm can have several subsidiaries (variable subsidiaries), so I created a group identifier as follows:

Code:

egen company_id = group(id)

However, the error message is still there: repeated time values within panel
r(451);

What should I do? I also tried the following command, which deleted more than 12,000 observations.

Code:

drop if year==year[_n-1]

Thank you in advance for any help,
Eugene
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17700
#2

29 Jul 2020, 00:16

Eugene:
if you do not plan to use time-series commands or operators, such as lags and leads, you can simply -xtset- your data with the panel identifier only.
Hence the whole issue boils down to finding a feasible panel indicator.
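
A minimal sketch of that idea, assuming a toy subset of the example data from #1 (the values and the name pair_id are illustrative only). It first counts repeated id-year combinations, then declares the panel variable alone, and finally builds one possible finer indicator from firm and subsidiary:

```stata
* Toy subset of the thread's example data (assumed values, two firms only)
clear
input double id str20 subsidiaries float(year sale)
2 "Firm Strawberry" 1996 3000
2 "Firm Strawberry" 1996 1500
3 "Firm Apple"      1996  450
3 "Firm Banana"     1996  370
3 "Firm Apple"      1997 4000
3 "Firm Banana"     1997 3000
end

* Count how often each id-year combination occurs; surplus copies trigger r(451)
duplicates report id year

* Declaring only the panel variable skips the check for repeated time values
xtset id

* Hypothetical finer panel indicator: one group per firm-subsidiary pair
egen long pair_id = group(id subsidiaries)
duplicates report pair_id year
```

In this toy subset the firm-subsidiary pair is still not unique within year (Firm Strawberry appears twice in 1996), so -xtset pair_id year- would fail as well; whether such rows are genuine duplicates or distinct units is something only the full dataset can tell.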

Kind regards,
Carlo
(Stata 19.0)
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#3

29 Jul 2020, 00:57

To expand on Carlo's suggestion, you might not need to -xtset- your data at all; in some cases you can handle the fixed effects directly by including dummy variables.

Also, it is not clear from your explanation what exactly you tried to -xtset- and which error message Stata returned. Can you paste exactly what you typed and exactly what Stata returned?
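
As a rough illustration of the dummy-variable route, here is a sketch that assumes the numeric columns of the example data from #1 (with the stray ninth value on one row left out). It enters firm and year effects through factor-variable notation in -regress-, then absorbs the firm effects with -areg-; the scalar names are made up for the comparison:

```stata
* Numeric columns of the thread's example data (assumed to be representative)
clear
input double id float(year sale CAN Above)
1 1995 2000 1 0
2 1996 3000 1 0
2 1996 1500 1 0
2 1997  800 1 0
2 1997  500 1 0
2 1998  200 1 1
2 1998  100 1 1
3 1996  450 0 0
3 1996  370 0 0
3 1997 4000 0 0
3 1997 3000 0 0
3 1998  200 0 1
3 1998   50 0 1
3 1999   60 0 1
3 1999   70 0 1
3 2000 2000 0 1
end

* Firm and year fixed effects as explicit dummies; no xtset required
regress sale i.CAN##i.Above i.year i.id
scalar b_dummies = _b[1.CAN#1.Above]

* Same firm effects, absorbed instead of reported
areg sale i.CAN##i.Above i.year, absorb(id)
scalar b_absorbed = _b[1.CAN#1.Above]

display "interaction (dummies):  " b_dummies
display "interaction (absorbed): " b_absorbed
```

Both commands should report the same interaction coefficient. Stata will omit some terms as collinear (CAN does not vary within firm here, and Above is a function of year), which is expected with this specification.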
Eugene Lacoste

Join Date: Jul 2020

Posts: 24
#4

29 Jul 2020, 06:45

Dear Carlo and Joro,

Thank you very much for your messages. I will explain in detail what I am trying to do:
I would like to run a difference-in-differences (DID) regression with different sets of fixed effects (my dataset is a panel).
1) The first model would be a regression of sale on CAN and Above only.
2) The second model would be a regression of sale on CAN and Above with time fixed effects.
3) The third model would be a regression of sale on CAN and Above with both time and firm fixed effects.

What does the specification look like when both sets of fixed effects are combined with DID?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17700

29 Jul 2020, 07:23

Eugene:
as far as your points 2) and 3) are concerned, why not use -xtreg, fe-?

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtset idcode year
       panel variable:  idcode (unbalanced)
        time variable:  year, 68 to 88, but with gaps
                delta:  1 unit

. xtreg ln_wage c.age##c.age, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,23798)        =    1451.88
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
             |
 c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
             |
       _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23798) = 8.74                 Prob > F = 0.0000

. xtreg ln_wage c.age##c.age i.year, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1162                                         min =          1
     between = 0.1078                                         avg =        6.1
     overall = 0.0932                                         max =         15

                                                F(16,23784)       =     195.45
corr(u_i, Xb)  = 0.0613                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
             |
 c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
             |
        year |
         69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
         70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
         71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
         72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
         73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
         75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
         77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
         78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
         80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
         82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
         83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
         85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
         87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
         88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
             |
       _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
-------------+----------------------------------------------------------------
     sigma_u |  .40275174
     sigma_e |  .30127563
         rho |  .64120306   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000
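
Applied to the variables in this thread, the same pattern might look like the sketch below. It assumes the numeric columns of the example data from #1 (stray ninth value left out), takes sale as the outcome, and uses id as the only panel variable; it is one possible reading of models 1) to 3), not a recommendation for the full dataset:

```stata
* Numeric columns of the thread's example data (assumed to be representative)
clear
input double id float(year sale CAN Above)
1 1995 2000 1 0
2 1996 3000 1 0
2 1996 1500 1 0
2 1997  800 1 0
2 1997  500 1 0
2 1998  200 1 1
2 1998  100 1 1
3 1996  450 0 0
3 1996  370 0 0
3 1997 4000 0 0
3 1997 3000 0 0
3 1998  200 0 1
3 1998   50 0 1
3 1999   60 0 1
3 1999   70 0 1
3 2000 2000 0 1
end

* 1) Pooled OLS difference-in-differences
regress sale i.CAN##i.Above

* 2) Add year fixed effects (Above is collinear with the year dummies)
regress sale i.CAN##i.Above i.year

* 3) Year and firm fixed effects; xtset with the panel variable only
xtset id
xtreg sale i.CAN##i.Above i.year, fe
```

In model 3) the main effect of CAN drops out because it does not vary within firm in these data, so the coefficient of interest is the CAN#Above interaction.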

Kind regards,
Carlo
(Stata 19.0)


Eugene Lacoste

Join Date: Jul 2020
Posts: 24

29 Jul 2020, 08:55

Dear Carlo,

Thank you very much for your answer.
Unfortunately, I could not use xtset with the current structure of my dataset.
As you can see, each firm (id) can have multiple subsidiaries (variable "subsidiaries"), and some subsidiaries can even be owned by two firms (so Firm Kiwi could be owned by firm 1 and firm 33, for example).
Every time I ran xtset, I got the error message "repeated time values within panel r(451)", since each subsidiary has its own observations for the years 1995 to 2000.

Code:

* Example data. String values are quoted, Country is str3 so that "CAN" is not
* truncated, and a stray ninth value (53620) on the 1996 Firm Banana row is dropped.
clear
input double id str103 subsidiaries float year str3 Country float(sale sale2 CAN Above)
1 "Firm Kiwi"       1995 "CAN" 2000 2000 1 0
2 "Firm Strawberry" 1996 "CAN" 3000 4500 1 0
2 "Firm Strawberry" 1996 "CAN" 1500 4500 1 0
2 "Firm Strawberry" 1997 "CAN"  800 1300 1 0
2 "Firm Strawberry" 1997 "CAN"  500 1300 1 0
2 "Firm Strawberry" 1998 "CAN"  200  300 1 1
2 "Firm Strawberry" 1998 "CAN"  100  300 1 1
3 "Firm Apple"      1996 "JAP"  450  820 0 0
3 "Firm Banana"     1996 "NET"  370  820 0 0
3 "Firm Apple"      1997 "JAP" 4000 7000 0 0
3 "Firm Banana"     1997 "NET" 3000 7000 0 0
3 "Firm Banana"     1998 "NET"  200  250 0 1
3 "Firm Apple"      1998 "JAP"   50  250 0 1
3 "Firm Banana"     1999 "NET"   60  130 0 1
3 "Firm Apple"      1999 "JAP"   70  130 0 1
3 "Firm Banana"     2000 "NET" 2000 2000 0 1
end

I also have to include a subsidiary fixed effect in my regression, and the subsidiary variable is currently a string (str103).
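
One possible way to handle the string variable, sketched here on the example data (stray ninth value left out, Country omitted, and sub_id a made-up name), would be to map each subsidiary name to a numeric group code and absorb it:

```stata
* Example data from this thread without Country (assumed to be representative)
clear
input double id str20 subsidiaries float(year sale CAN Above)
1 "Firm Kiwi"       1995 2000 1 0
2 "Firm Strawberry" 1996 3000 1 0
2 "Firm Strawberry" 1996 1500 1 0
2 "Firm Strawberry" 1997  800 1 0
2 "Firm Strawberry" 1997  500 1 0
2 "Firm Strawberry" 1998  200 1 1
2 "Firm Strawberry" 1998  100 1 1
3 "Firm Apple"      1996  450 0 0
3 "Firm Banana"     1996  370 0 0
3 "Firm Apple"      1997 4000 0 0
3 "Firm Banana"     1997 3000 0 0
3 "Firm Banana"     1998  200 0 1
3 "Firm Apple"      1998   50 0 1
3 "Firm Banana"     1999   60 0 1
3 "Firm Apple"      1999   70 0 1
3 "Firm Banana"     2000 2000 0 1
end

* Numeric identifier for the string variable (hypothetical name sub_id)
egen long sub_id = group(subsidiaries)

* Subsidiary effects absorbed; firm and year effects entered as dummies
areg sale i.CAN##i.Above i.year i.id, absorb(sub_id)
```

In this excerpt every subsidiary belongs to a single firm, so the firm dummies are omitted as collinear with the subsidiary effects; with subsidiaries shared between firms, as in the full data, that would not necessarily be the case.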

Best regards,
Eugene

Last edited by Eugene Lacoste; 29 Jul 2020, 08:57.
