  • Repeated time values in sample.

    Dear all

    I am new to STATA and I would like to ask for suggestions about a problem I face.
    The objective is to predict a score, taking into account some past scores (2013-2015). My database has the following form:
    ID      Date        Score  Region  INK  Gender   BS  PMT  UN    GDP  Growth  Income  POV        INS  CF    PC   BTF
    393368  05.06.2013    517       3    0       1  584   12   4  31123      -1      21   16  .10733219  82  1517  1374
    393290  05.06.2013    454       5    1       0  352   12   6  34796       0      21   16  .19055245  59  6774  3873
    393254  05.06.2013    471       5    0       1  233   12   6  34796       0      21   16  .19055245  59  6774  3873
    394099  05.06.2013    459       5    1       1  533    9   6  34796       0      21   16  .19055245  59  6774  3873
    393390  05.06.2013    550       9    0       1  536    9   4  40446       1      30   15  .09240682  19  4752  1203
    393379  05.06.2013    436      16    0       0  183   12   5  24663       1      24   12  .01918129   3   821   184
    393235  05.06.2013    501       7    0       0  323   14   5  31226       0      20   17  .03945251  28   429   449
    The same ID may appear again in 2014 or 2015.

    However, when I try to declare Date as the time variable, I receive the following:

    Code:
    . tsset D
    repeated time values in sample
    r(451);
    Thank you in advance for your response.

    Best Regards

    George
    Last edited by Georgios Kyrkos; 21 Oct 2016, 05:17.

  • #2
    Georgios:
    welcome to the list.
    The issue there is that you should consider the panel structure of your dataset:
    Code:
    . use http://www.stata-press.com/data/r14/nlswork.dta
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . tsset year
    repeated time values in sample
    r(451);
    
    . xtset idcode year
           panel variable:  idcode (unbalanced)
            time variable:  year, 68 to 88, but with gaps
                    delta:  1 unit
    For the future, please post your examples (or dataset excerpts) via -dataex- (-search dataex-). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Mr. Lazzaro

      Thank you for your quick response, and I apologize for the wrong post.
      So in order to be structured as a panel, does each ID need to appear in all years (2013, 2014, 2015)?
      Do you recommend any STATA document on panel data preparation?

      Kind Regards

      Georgios



      • #4
        Start with

        Code:
        help xt
        and follow its links.



        • #5
          Originally posted by Georgios Kyrkos:
          So in order to be structured as a panel, does each ID need to appear in all years (2013, 2014, 2015)?
          Yes, this is the meaning of panel data: the same ids are observed several times (here in 2013, 2014 and 2015).
          To declare your panel you'll have to use xtset (or tsset), so I advise you to take a look at the xtset help file (type help xtset). It will give you details about panel data preparation.

          Best,
          Charlie

          Edit: Nick answered first. I thought he would have taken a little more time, to make a point about the Stata spelling in #3.
          Last edited by Charlie Joyez; 21 Oct 2016, 06:11.



          • #6
            To make myself predictable: http://www.statalist.org/forums/help#spelling



            • #7
              Thank you all for the response.

              Yes, this is the meaning of panel data: the same ids are observed several times (here in 2013, 2014 and 2015).
              However, in my dataset this does not happen systematically. The majority of IDs appear in only one year of the period, so it looks more like pooled cross-sectional data.

              Best,

              George



              • #8
                Working backwards, if you want to apply tsset or xtset, then these are the rules:

                1. With a time variable alone, no time can appear more than once.

                2. With a panel variable as well, no (panel, time) pair can appear more than once.

                It is not a problem in general that panels may be unbalanced, but if you want to use any commands that require tsset or xtset, you must apply one of those commands first.
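
                To see which observations break rule 2 before calling xtset, one option is the duplicates command. The following is only a sketch on made-up data: id and day are hypothetical stand-ins for your own panel and time variables.

                ```stata
                * Toy data (hypothetical): panel 1 appears twice on day 1
                clear
                input id day
                1 1
                1 1
                1 2
                2 1
                end

                * Summarize how many (id, day) pairs occur more than once
                duplicates report id day

                * Flag every observation that shares its (id, day) pair with another one
                duplicates tag id day, generate(dup)
                list id day if dup > 0

                * xtset is expected to fail here because of the repeated pair
                capture noisily xtset id day
                ```

                Whether the flagged rows should be dropped, combined or kept is a decision about the data, not something the command can settle.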



                • #9
                  As Nick said, having an unbalanced panel (where not all identifiers are observed in every period) might not be a problem in general.

                  However, in your specific case you state that the majority of individuals are observed in only one year.
                  In this case, I would warn you that using fixed effects after having set the panel structure would effectively remove all of these singly observed individuals from the estimation.

                  I don't know whether you plan to use fixed effects, but beware of the potential bias this could cause (especially if the singly observed individuals are not randomly distributed).
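
                  To get a feel for how many identifiers are observed only once, something along these lines might help. It is a sketch on invented data, and id, day, n_obs and first are hypothetical names.

                  ```stata
                  * Toy data (hypothetical): id 1 is seen twice, ids 2 and 3 once each
                  clear
                  input id day
                  1 1
                  1 2
                  2 1
                  3 1
                  end

                  * Number of observations available for each id
                  bysort id: generate n_obs = _N

                  * Mark one row per id so that each id is counted once
                  egen first = tag(id)

                  * Distribution of panel lengths across ids
                  tabulate n_obs if first

                  * Ids with a single observation add nothing to the within estimator
                  count if n_obs == 1 & first
                  ```

                  After xtset, the built-in xtdescribe command reports similar information about the participation pattern.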



                  • #10
                    Georgios:
                    in the same fashion, beware of altering your dataset in order to obtain (if feasible) a balanced panel.
                    That approach would seriously bias your regression results, because you would be dealing with a sample whose relationship with the original one is tenuous at best (especially if missingness is, as often occurs, informative).
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Dear all

                      Thank you very much for the feedback. It took me some time to understand those terms, since I am not familiar with panel datasets. However, since my dataset is dominated by IDs observed only once per year, I removed the duplicated observations, and the panel is now weakly unbalanced (13 530 out of 170 797 observations are out).
                      Now I have a unique ID for each date, with N > T. Is this going to bias my regression?

                      Charlie, you mentioned fixed effects. I am planning to use random effects instead.

                      Thank you and kind regards

                      Georgios



                      • #12
                        Georgios:
                        the bias may come from the fact that you removed all the second entries for the ids with two observations per year, unless you had methodologically sound reasons to do so (that is, two entries per year for the same id were entered by mistake).
                        Assuming that you had them, you are now probably dealing with a large N, small T panel dataset.
                        In that instance -xtreg- is the way to go, and the outcome of the -hausman- specification test should put you on the right track as to whether the -fe- or the -re- specification fits your data appropriately.
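
                        For reference, the usual -hausman- sequence can be sketched as follows. This assumes the nlswork data used in #2, and the regressors (age, tenure) are only an illustrative choice, not a recommended model.

                        ```stata
                        * Example panel data from #2 (needs an internet connection)
                        use http://www.stata-press.com/data/r14/nlswork.dta, clear
                        xtset idcode year

                        * Fixed-effects fit, stored under the name fe
                        xtreg ln_wage age tenure, fe
                        estimates store fe

                        * Random-effects fit with the same regressors, stored as re
                        xtreg ln_wage age tenure, re
                        estimates store re

                        * Hausman test: the consistent estimator (fe) is listed first
                        hausman fe re
                        ```

                        Both models are fitted with the default standard errors here, because -hausman- does not accept estimates obtained with vce(robust) or vce(cluster).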
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Mr. Lazzaro:
                          Actually, the replicated transactions were just the same customer buying twice on the same day: same Score, Region and payment method, but a different amount. So I suppose it is a bug in the system of the company I study. To be more precise, I can either merge or remove the duplicated observations.
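
                          As a rough sketch of the merge option, same-day rows could be combined with collapse. The data below are invented, and the variable names (id, day, score, amount) are hypothetical; the real dataset would need every variable listed in the collapse.

                          ```stata
                          * Toy data (hypothetical): customer 1 buys twice on day 1
                          clear
                          input id day score amount
                          1 1 517 100
                          1 1 517 50
                          2 1 454 75
                          end

                          * One row per (id, day): sum the amounts, keep the common score
                          collapse (sum) amount (first) score, by(id day)

                          * Confirm that (id, day) now identifies each row uniquely
                          isid id day

                          * The panel declaration should no longer complain
                          xtset id day
                          list
                          ```

                          Using (first) for score relies on the score being identical within each same-day pair, as described above.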

                          Regarding the panel data set:
                          Code:
                           xtset id D
                                 panel variable:  id (unbalanced)
                                  time variable:  D, 5/6/2013 to 12/30/2015
                                          delta:  1 day
                          Then, for fixed effects, all variables are omitted because of collinearity. I also tried without the dummy variables. Command used:
                          Code:
                           xtreg SC REG INK GEN BS PMT UN GDP GR INC POV INS CF PC BTF, fe vce(robust)
                          For the Random Effects
                          Code:
                          xtreg SC REG INK GEN BS PMT UN GDP GR INC POV INS CF PC BTF, re
                          insufficient observations
                          r(2001);
                          However the Between regression works.

                          I know that for the Hausman specification test I need the results from both RE and FE.


                          Kind Regards

                          Georgios
                          Last edited by Georgios Kyrkos; 23 Oct 2016, 13:02.



                          • #14
                            Georgios:
                            thanks for providing further details.
                            As per your reply, it seems that you have only one observation per panel_id.
                            If that were the case, it is not surprising that -xtreg- gives back results with the -be- specification only.
                            On top of that, you may find that the results you got with -xtreg, be- are the same as those you would obtain with -regress- (as -regress- is a particular case of panel data with one wave of data only).
                            A toy example supports what is stated above:
                            Code:
                            . sysuse auto.dta
                            (1978 Automobile Data)
                            
                            . g year=1
                            
                            . g panel_id=_n
                            
                            . xtset panel_id year
                                   panel variable:  panel_id (strongly balanced)
                                    time variable:  year, 1 to 1
                                            delta:  1 unit
                            
                            . xtreg price mpg i.rep78, fe
                            note: mpg omitted because of collinearity
                            note: 2.rep78 omitted because of collinearity
                            note: 3.rep78 omitted because of collinearity
                            note: 4.rep78 omitted because of collinearity
                            note: 5.rep78 omitted because of collinearity
                            
                            Fixed-effects (within) regression               Number of obs     =         69
                            Group variable: panel_id                        Number of groups  =         69
                            
                            R-sq:                                           Obs per group:
                                 within  =      .                                         min =          1
                                 between =      .                                         avg =        1.0
                                 overall =      .                                         max =          1
                            
                                                                            F(0,0)            =       0.00
                            corr(u_i, Xb)  =      .                         Prob > F          =          .
                            
                            ------------------------------------------------------------------------------
                                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                     mpg |          0  (omitted)
                                         |
                                   rep78 |
                                      2  |          0  (omitted)
                                      3  |          0  (omitted)
                                      4  |          0  (omitted)
                                      5  |          0  (omitted)
                                         |
                                   _cons |   6146.043          .        .       .            .           .
                            -------------+----------------------------------------------------------------
                                 sigma_u |  2912.4403
                                 sigma_e |          .
                                     rho |          .   (fraction of variance due to u_i)
                            ------------------------------------------------------------------------------
                            F test that all u_i=0: F(68, 0) = .                          Prob > F =      .
                            
                            . xtreg price mpg i.rep78, re
                            insufficient observations
                            r(2001);
                            
                            . xtreg price mpg i.rep78, be
                            
                            Between regression (regression on group means)  Number of obs     =         69
                            Group variable: panel_id                        Number of groups  =         69
                            
                            R-sq:                                           Obs per group:
                                 within  =      .                                         min =          1
                                 between = 0.2584                                         avg =        1.0
                                 overall = 0.2584                                         max =          1
                            
                                                                            F(5,63)           =       4.39
                            sd(u_i + avg(e_i.))=  2605.782                  Prob > F          =     0.0017
                            
                            ------------------------------------------------------------------------------
                                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                     mpg |  -280.2615   61.57666    -4.55   0.000    -403.3126   -157.2103
                                         |
                                   rep78 |
                                      2  |   877.6347   2063.285     0.43   0.672     -3245.51     5000.78
                                      3  |   1425.657   1905.438     0.75   0.457    -2382.057    5233.371
                                      4  |   1693.841   1942.669     0.87   0.387    -2188.274    5575.956
                                      5  |   3131.982   2041.049     1.53   0.130    -946.7282    7210.693
                                         |
                                   _cons |   10449.99   2251.041     4.64   0.000     5951.646    14948.34
                            ------------------------------------------------------------------------------
                            
                            . reg price mpg i.rep78
                            
                                  Source |       SS           df       MS      Number of obs   =        69
                            -------------+----------------------------------   F(5, 63)        =      4.39
                                   Model |   149020603         5  29804120.7   Prob > F        =    0.0017
                                Residual |   427776355        63  6790100.88   R-squared       =    0.2584
                            -------------+----------------------------------   Adj R-squared   =    0.1995
                                   Total |   576796959        68  8482308.22   Root MSE        =    2605.8
                            
                            ------------------------------------------------------------------------------
                                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                     mpg |  -280.2615   61.57666    -4.55   0.000    -403.3126   -157.2103
                                         |
                                   rep78 |
                                      2  |   877.6347   2063.285     0.43   0.672     -3245.51     5000.78
                                      3  |   1425.657   1905.438     0.75   0.457    -2382.057    5233.371
                                      4  |   1693.841   1942.669     0.87   0.387    -2188.274    5575.956
                                      5  |   3131.982   2041.049     1.53   0.130    -946.7282    7210.693
                                         |
                                   _cons |   10449.99   2251.041     4.64   0.000     5951.646    14948.34
                            ------------------------------------------------------------------------------
                            
                            .
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              Dear Mr. Lazzaro

                              Exactly. My results are the same (between regression vs -regress-). I suppose this has to do with my data structure (time variable and panel_id).

                              Code:
                                xtreg SC REG INK GEN BS PMT UN GDP GR INC POV INS CF PC BTF, be
                              
                              Between regression (regression on group means)  Number of obs     =    328,280
                              Group variable: id                              Number of groups  =    328,280
                              
                              R-sq:                                           Obs per group:
                                   within  =      .                                         min =          1
                                   between = 0.0869                                         avg =        1.0
                                   overall = 0.0869                                         max =          1
                              
                                                                              F(14,328265)      =    2230.94
                              sd(u_i + avg(e_i.))=  52.52907                  Prob > F          =     0.0000
                              
                              ------------------------------------------------------------------------------
                                        SC |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                       REG |   .0902898   .0517752     1.74   0.081    -.0111881    .1917677
                                       INK |  -44.00994   .3700228  -118.94   0.000    -44.73517    -43.2847
                                       GEN |   1.334461   .1836945     7.26   0.000      .974425    1.694497
                                        BS |  -.0082948   .0005337   -15.54   0.000    -.0093408   -.0072488
                                       PMT |  -2.769994   .0318891   -86.86   0.000    -2.832496   -2.707492
                                        UN |  -3.348294   .0672922   -49.76   0.000    -3.480184   -3.216403
                                       GDP |  -.0010511   .0000289   -36.35   0.000    -.0011077   -.0009944
                                        GR |   3.438825   .1324835    25.96   0.000     3.179161    3.698489
                                       INC |   .0807456   .0404129     2.00   0.046     .0015375    .1599537
                                       POV |   2.186218   .1657191    13.19   0.000     1.861413    2.511023
                                       INS |    -71.812     5.2909   -13.57   0.000    -82.18201   -61.44199
                                        CF |   .0780609   .0064883    12.03   0.000      .065344    .0907778
                                        PC |   .0013398   .0000871    15.39   0.000     .0011692    .0015105
                                       BTF |  -.0011731   .0000736   -15.94   0.000    -.0013174   -.0010288
                                     _cons |   591.0429   2.570625   229.92   0.000     586.0046    596.0813
                              ------------------------------------------------------------------------------
                              
                              .


                              Kind Regards

                              Georgios Kyrkos




