
  • xtset and panel data

    Dear all,

    I have the following dataset. The full dataset is much longer (40,000+ observations); this is just an excerpt.
    id is the firm identifier.
    subsidiaries is the name of each of the firm's subsidiaries.
    year is the year, from 1995 to 2000.
    Country is the country in which the subsidiary is located. The next two columns are the sales per subsidiary ("sale") and the total sales per firm ("sale2", which in this excerpt equals the sum of sale within id and year).
    CAN = 1 if the subsidiary is in Canada and 0 otherwise.
    Above = 1 if year >= 1998 and 0 if year < 1998.

    Code:
    clear
    input double id str103 subsidiaries float year str3 Country float(sale sale2 CAN Above)
    1 "Firm Kiwi"       1995 "CAN" 2000 2000 1 0
    2 "Firm Strawberry" 1996 "CAN" 3000 4500 1 0
    2 "Firm Strawberry" 1996 "CAN" 1500 4500 1 0
    2 "Firm Strawberry" 1997 "CAN"  800 1300 1 0
    2 "Firm Strawberry" 1997 "CAN"  500 1300 1 0
    2 "Firm Strawberry" 1998 "CAN"  200  300 1 1
    2 "Firm Strawberry" 1998 "CAN"  100  300 1 1
    3 "Firm Apple"      1996 "JAP"  450  820 0 0
    3 "Firm Banana"     1996 "NET"  370  820 0 0
    3 "Firm Apple"      1997 "JAP" 4000 7000 0 0
    3 "Firm Banana"     1997 "NET" 3000 7000 0 0
    3 "Firm Banana"     1998 "NET"  200  250 0 1
    3 "Firm Apple"      1998 "JAP"   50  250 0 1
    3 "Firm Banana"     1999 "NET"   60  130 0 1
    3 "Firm Apple"      1999 "JAP"   70  130 0 1
    3 "Firm Banana"     2000 "NET" 2000 2000 0 1
    end
    I am trying to use the xtset command, which fails because the dataset contains repeated time values within each panel. I would like to estimate the effect of two variables on sale: CAN (1 = subsidiary in Canada, 0 = not in Canada) and Above (1 = 1998 or later, 0 = before 1998).

    As you can see, a firm can have several subsidiaries (variable subsidiaries), so I grouped the observations as follows:

    Code:
     egen company_id = group(id)
    However, the error message is still there: repeated time values within panel
    r(451);

    What should I do? I also tried the following command, which deleted more than 12,000 observations.

    Code:
    drop if year==year[_n-1]
    Thank you in advance if you can help me,
    Eugene

  • #2
    Eugene:
    if you do not plan to use time-series operators, such as lags and leads, you can simply -xtset- your data with -panelid- only.
    Hence the whole issue boils down to finding a feasible panel identifier.
    Kind regards,
    Carlo
    (Stata 19.0)
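
    A minimal sketch of the suggestion above, assuming the firm identifier id from post #1 serves as the panel variable (that choice is an assumption; any numeric identifier could take its place). With no time variable declared, Stata does not check for repeated time values, so r(451) cannot arise, but time-series operators such as L. and F. are unavailable.

    Code:
    * Declare the panel variable only; no time variable is declared
    xtset id
    * Within/between summary of sale for the declared panels
    xtsum sale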



    • #3
      To expand on Carlo's suggestion: you might not need to -xtset- your data at all. In some cases you can handle the fixed effects directly by including dummy variables for them.

      Also, it is not clear from your explanation exactly what you tried to -xtset- and what error message Stata returned. Could you paste the exact command you typed and the exact output you got back?
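
      The dummy-variable approach can be sketched as follows, assuming the variable names from post #1 and firm-level fixed effects on id. Clustering the standard errors on id is an illustrative choice, not something stated in the thread.

      Code:
      * Firm and year fixed effects entered as dummy variables; no -xtset- required
      regress sale i.CAN##i.Above i.year i.id, vce(cluster id)
      * Same model, absorbing the firm dummies instead of estimating them
      areg sale i.CAN##i.Above i.year, absorb(id) vce(cluster id)

      Because Above is a function of year, Stata will omit one collinear term automatically; the coefficient of interest is the interaction 1.CAN#1.Above.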



      • #4
        Dear Carlo and Joro,

        Thank you very much for your messages. Let me explain in detail what I am trying to do:
        I would like to run a difference-in-differences (DiD) regression with different sets of fixed effects (my dataset is a panel).
        1) The first model would be a regression of sale on CAN and Above only.
        2) The second model would be a regression of sale on CAN and Above with time fixed effects.
        3) The third model would be a regression of sale on CAN and Above with both time and firm fixed effects.

        What does the specification look like when both fixed effects are combined with DiD?



        • #5
          Eugene:
          as far as your points 2) and 3) are concerned, why not use -xtreg, fe-?
          Code:
          . use "https://www.stata-press.com/data/r16/nlswork.dta"
          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
          
          . xtset idcode year
                 panel variable:  idcode (unbalanced)
                  time variable:  year, 68 to 88, but with gaps
                          delta:  1 unit
          
          . xtreg ln_wage c.age##c.age, fe
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1087                                         min =          1
               between = 0.1006                                         avg =        6.1
               overall = 0.0865                                         max =         15
          
                                                          F(2,23798)        =    1451.88
          corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
                       |
           c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
                       |
                 _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
          -------------+----------------------------------------------------------------
               sigma_u |   .4039153
               sigma_e |  .30245467
                   rho |  .64073314   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(4709, 23798) = 8.74                 Prob > F = 0.0000
          
          . xtreg ln_wage c.age##c.age i.year, fe
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1162                                         min =          1
               between = 0.1078                                         avg =        6.1
               overall = 0.0932                                         max =         15
          
                                                          F(16,23784)       =     195.45
          corr(u_i, Xb)  = 0.0613                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
                       |
           c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
                       |
                  year |
                   69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
                   70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
                   71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
                   72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
                   73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
                   75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
                   77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
                   78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
                   80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
                   82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
                   83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
                   85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
                   87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
                   88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
                       |
                 _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
          -------------+----------------------------------------------------------------
               sigma_u |  .40275174
               sigma_e |  .30127563
                   rho |  .64120306   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000
          Kind regards,
          Carlo
          (Stata 19.0)
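
          Applied to the variables from post #1, the three models from post #4 might look like the sketch below. It assumes firm fixed effects on id and a panel declared without a time variable; the clustered standard errors are an illustrative choice rather than part of the original suggestion.

          Code:
          * Model 1: pooled DiD without fixed effects
          regress sale i.CAN##i.Above, vce(cluster id)
          * Model 2: add year fixed effects (a term collinear with Above is omitted automatically)
          regress sale i.CAN##i.Above i.year, vce(cluster id)
          * Model 3: firm and year fixed effects via the within estimator
          xtset id
          xtreg sale i.CAN##i.Above i.year, fe vce(cluster id)

          In each model the DiD estimate is the coefficient on 1.CAN#1.Above.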



          • #6
            Dear Carlo,

            Thank you very much for your answer.
            Unfortunately, I cannot use xtset with the current structure of my dataset.
            As you can see, each firm (id) can have multiple subsidiaries (variable "subsidiaries"), and some subsidiaries are even owned by two firms (for example, Firm Kiwi could be owned by both firm 1 and firm 33).
            Every time I run xtset, I get the error message "repeated time values within panel r(451)", because each firm has several subsidiaries and each of them is observed in the years 1995 to 2000.

            Code:
            clear
            input double id str103 subsidiaries float year str3 Country float(sale sale2 CAN Above)
            1 "Firm Kiwi"       1995 "CAN" 2000 2000 1 0
            2 "Firm Strawberry" 1996 "CAN" 3000 4500 1 0
            2 "Firm Strawberry" 1996 "CAN" 1500 4500 1 0
            2 "Firm Strawberry" 1997 "CAN"  800 1300 1 0
            2 "Firm Strawberry" 1997 "CAN"  500 1300 1 0
            2 "Firm Strawberry" 1998 "CAN"  200  300 1 1
            2 "Firm Strawberry" 1998 "CAN"  100  300 1 1
            3 "Firm Apple"      1996 "JAP"  450  820 0 0
            3 "Firm Banana"     1996 "NET"  370  820 0 0
            3 "Firm Apple"      1997 "JAP" 4000 7000 0 0
            3 "Firm Banana"     1997 "NET" 3000 7000 0 0
            3 "Firm Banana"     1998 "NET"  200  250 0 1
            3 "Firm Apple"      1998 "JAP"   50  250 0 1
            3 "Firm Banana"     1999 "NET"   60  130 0 1
            3 "Firm Apple"      1999 "JAP"   70  130 0 1
            3 "Firm Banana"     2000 "NET" 2000 2000 0 1
            end
            I also need to include a subsidiary fixed effect in my regression; the subsidiary variable is currently a string (str103).

            Best regards,
            Eugene
            Last edited by Eugene Lacoste; 29 Jul 2020, 08:57.
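
            One possible way to obtain a subsidiary-level fixed effect is sketched below. It assumes a panel should be one firm-subsidiary pair; sub_id is a hypothetical variable name introduced here for illustration, and clustering on id is likewise an assumption. A string variable cannot serve as the panel variable, so a numeric identifier is built first.

            Code:
            * Numeric identifier for each firm-subsidiary pair
            egen long sub_id = group(id subsidiaries)
            * Check whether any pair-year combination occurs more than once
            duplicates report sub_id year
            * Declare the panel variable only, so repeated years cannot trigger r(451)
            xtset sub_id
            * DiD with subsidiary and year fixed effects, clustering at the firm level
            xtreg sale i.CAN##i.Above i.year, fe vce(cluster id)

            In the excerpt, firm 2 / Firm Strawberry appears twice in each of 1996, 1997 and 1998, so declaring year as the time variable would still fail even with this identifier; whether those rows are genuine or data-entry duplicates is worth checking before estimation.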

