  • When can I use "xtset id" in response to "repeated time values within panel"?

    Hi there,

    I currently have data for a large pool of individuals across four years (2015/16/17/18).

    I appended four annual datasets from the 'Diary of Consumer Payment Choice', which contain transaction-level data for approx. 2000-3000 individuals who report their transaction behaviour across 4 diary days (diary_day == 0, diary_day == 1, diary_day == 2, diary_day == 3) per year.

    Individuals report their transactions on each day of the diary, and can report as many transactions as they wish each day.

    I created a "date" variable prior to appending the datasets in order to distinguish between the years of the diaries.

    When I input "xtset id date, yearly", I get the "repeated time values within panel" error.

    I think this is because my id and date variables do not uniquely identify observations in the sample, as many respondents report multiple transactions per diary_day and every diary_day within a year shares the same date value.
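
    A quick way to confirm this diagnosis, as a minimal sketch using only the variable names from the extract posted below (dup is a made-up name for a helper variable):

    ```stata
    * Report how many observations share the same id-date combination.
    duplicates report id date

    * Tag the repeated id-date combinations; dup counts the surplus copies.
    duplicates tag id date, generate(dup)

    * Inspect a few of the rows that xtset id date would reject.
    list id date diary_day tran amnt if dup > 0 in 1/20, sepby(id date)
    ```

    If duplicates report shows any surplus copies, xtset with both id and date will stop with the "repeated time values within panel" error.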

    I read through other posts and I now know that "xtset id" is a possibility if I am not looking to use time-series operators such as lags and leads. Whilst these would be a nice option, I am not sure how critical they will be for my purposes. I am looking to estimate the effect of using certain payment instruments (such as contactless debit cards) on spending amounts, controlling for unobserved heterogeneity. My intention is to use fixed effects (and potentially pooled OLS and random effects). I am still in the early stages of my research.

    (I am using Stata/MP 15.1)

    Below I have posted the data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float(date diary_day tran) double amnt
    100001 2016 0 2               1000
    100001 2016 0 .                  .
    100001 2016 0 1                  5
    100001 2016 1 .                  .
    100001 2016 2 3                127
    100001 2016 2 2                820
    100001 2016 2 1               30.5
    100001 2016 2 .                  .
    100001 2016 2 4 30.400000000000002
    100001 2016 3 1                820
    100001 2016 3 .                  .
    100001 2017 0 .                  .
    100001 2017 1 1                127
    100001 2017 2 1                 40
    100001 2017 3 1                 35
    100001 2018 0 .                  .
    100001 2018 1 .                  .
    100001 2018 2 1                 10
    100001 2018 2 2              89.23
    100001 2018 3 .                  .
    100002 2017 0 .                  .
    100002 2017 1 1              25.35
    100002 2017 2 1                120
    100002 2017 3 1  6.140000000000001
    100003 2016 0 1               1623
    100003 2016 0 .                  .
    100003 2016 1 .                  .
    100003 2016 1 3                500
    100003 2016 1 1                 20
    100003 2016 1 6                 20
    100003 2016 1 5                150
    100003 2016 1 2                  2
    100003 2016 1 4              34.15
    100003 2016 1 7                 25
    100003 2016 1 8                 20
    100003 2016 2 .                  .
    100003 2016 2 2               17.5
    100003 2016 2 1                 61
    100003 2016 3 .                  .
    100003 2016 3 1                  2
    100003 2017 0 .                  .
    100003 2017 1 2               7.45
    100003 2017 1 1              12.99
    100003 2017 1 3                 15
    100003 2017 2 .                  .
    100003 2017 3 2              19.72
    100003 2017 3 1              93.97
    100003 2017 3 4            1376.33
    100003 2017 3 3              23.89
    100003 2018 0 .                  .
    100003 2018 1 5                  3
    100003 2018 1 2 19.150000000000002
    100003 2018 1 6                 40
    100003 2018 1 8                 20
    100003 2018 1 3             107.92
    100003 2018 1 1              13.71
    100003 2018 1 7                  6
    100003 2018 1 4                 28
    100003 2018 2 1               94.2
    100003 2018 2 3              41.51
    100003 2018 2 2              22.87
    100003 2018 3 2 1696.1000000000001
    100004 2017 0 .                  .
    100004 2017 1 1               3.48
    100004 2017 2 3                579
    100004 2017 2 4                505
    100004 2017 2 1                597
    100004 2017 3 2              74.84
    100004 2017 3 4             389.74
    100004 2017 3 3              92.01
    100004 2017 3 1              92.01
    100004 2017 3 5             389.73
    100004 2018 0 .                  .
    100004 2018 1 1                123
    100004 2018 1 2                123
    100004 2018 2 1                 12
    100004 2018 2 2                  7
    100004 2018 3 1                  5
    100004 2018 3 2                 40
    100005 2015 0 .                  .
    100005 2015 1 1             100.41
    100005 2015 1 .                  .
    100005 2015 2 1              35.81
    100005 2015 2 .                  .
    100005 2015 3 1               6.53
    100005 2015 3 3                 14
    100005 2015 3 2                  3
    100005 2015 3 4                 37
    100005 2015 3 .                  .
    100005 2016 0 .                  .
    100005 2016 1 1              516.5
    100005 2016 1 .                  .
    100005 2016 2 .                  .
    100005 2016 3 .                  .
    100007 2015 0 .                  .
    100007 2015 1 1                  5
    100007 2015 1 .                  .
    100007 2015 2 .                  .
    100007 2015 2 1                 30
    100007 2015 3 .                  .
    end
    format %ty date

    If "xtset id" is not appropriate here, could you please advise how to proceed? I am also not sure whether I could use "xtset id" and then simply include year dummies in my regression.

    Thank you for your help in advance.

    Jack

  • #2
    Apologies; below was my actual command, to be more specific.

    Code:
    order id date
    sort id date
    egen key = group(id)
    xtset key date, yearly
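
    If lags and leads did turn out to matter, one possible route (a sketch only, not something proposed in this thread) would be to aggregate to one row per person-year so that id and date identify observations. The names total_amnt, n_tran and lag_total are made up for illustration, and collapse replaces the data in memory and discards the transaction-level detail, so it should be run on a copy.

    ```stata
    * One row per id-year: total spending and number of reported transactions.
    * Note: (sum) returns 0, not missing, when every amnt in a group is missing.
    collapse (sum) total_amnt = amnt (count) n_tran = amnt, by(id date)

    * id and date now uniquely identify observations, so both can be declared.
    xtset id date, yearly

    * Time-series operators are available again, e.g. last year's total.
    generate lag_total = L.total_amnt
    ```

    The trade-off is that any transaction-level regressor, such as the payment instrument used, would have to be summarised per year before collapsing.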

    • #3
      Originally posted by Jack Hipwood:
      "My intention is to use fixed effects (and potentially pooled OLS and random effects)."

      If you're using xtreg, fe or xtreg, re, then the time variable (the second argument in xtset panelvar timevar) is ignored anyway.

      • #4
        Thanks Joseph!

        So just to clarify, I should be able to control for individual-specific fixed effects across all four years without needing to specify the timevar in the xtset?

        And do you know if I should include year dummy variables in the xtreg, fe?

        Last edited by Jack Hipwood; 15 May 2020, 04:44.

        • #5
          Jack:
          you can legally -xtset- your data with -panelid- only.
          However, doing so will not allow you to use time-series operators, such as lags and leads.
          Yes, including year dummies is highly recommended (and how you -xtset- your data does not bear on the resulting coefficients, as you can see from the following toy example):
          Code:
          . use "https://www.stata-press.com/data/r16/nlswork.dta"
          (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
          
          . xtset idcode
                 panel variable:  idcode (unbalanced)
          
          . xtreg ln_wage age i.year, fe
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1060                                         min =          1
               between = 0.0914                                         avg =        6.1
               overall = 0.0805                                         max =         15
          
                                                          F(15,23785)       =     188.00
          corr(u_i, Xb)  = 0.0467                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0125992   .0102163     1.23   0.217    -.0074253    .0326238
                       |
                  year |
                   69  |   .0748621   .0159011     4.71   0.000      .043695    .1060292
                   70  |   .0478697   .0235673     2.03   0.042     .0016763     .094063
                   71  |   .0865577   .0327939     2.64   0.008     .0222795     .150836
                   72  |   .0856757   .0424903     2.02   0.044     .0023919    .1689594
                   73  |   .0880069    .052344     1.68   0.093    -.0145906    .1906044
                   75  |   .0778607   .0720304     1.08   0.280    -.0633235    .2190449
                   77  |    .108365   .0922272     1.17   0.240    -.0724063    .2891363
                   78  |   .1309518   .1028143     1.27   0.203    -.0705707    .3324743
                   80  |   .1142649    .122792     0.93   0.352    -.1264152     .354945
                   82  |   .1090451   .1431112     0.76   0.446    -.1714619    .3895522
                   83  |   .1211272   .1532018     0.79   0.429    -.1791581    .4214125
                   85  |   .1465637   .1736146     0.84   0.399    -.1937321    .4868594
                   87  |   .1382642   .1941163     0.71   0.476     -.242216    .5187445
                   88  |   .1799741   .2079871     0.87   0.387    -.2276938     .587642
                       |
                 _cons |   1.203731   .1952306     6.17   0.000     .8210667    1.586396
          -------------+----------------------------------------------------------------
               sigma_u |   .4058746
               sigma_e |  .30300411
                   rho |  .64212421   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(4709, 23785) = 8.80                 Prob > F = 0.0000
          
          . xtset idcode year
                 panel variable:  idcode (unbalanced)
                  time variable:  year, 68 to 88, but with gaps
                          delta:  1 unit
          
          . xtreg ln_wage age i.year, fe
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-sq:                                           Obs per group:
               within  = 0.1060                                         min =          1
               between = 0.0914                                         avg =        6.1
               overall = 0.0805                                         max =         15
          
                                                          F(15,23785)       =     188.00
          corr(u_i, Xb)  = 0.0467                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
               ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   .0125992   .0102163     1.23   0.217    -.0074253    .0326238
                       |
                  year |
                   69  |   .0748621   .0159011     4.71   0.000      .043695    .1060292
                   70  |   .0478697   .0235673     2.03   0.042     .0016763     .094063
                   71  |   .0865577   .0327939     2.64   0.008     .0222795     .150836
                   72  |   .0856757   .0424903     2.02   0.044     .0023919    .1689594
                   73  |   .0880069    .052344     1.68   0.093    -.0145906    .1906044
                   75  |   .0778607   .0720304     1.08   0.280    -.0633235    .2190449
                   77  |    .108365   .0922272     1.17   0.240    -.0724063    .2891363
                   78  |   .1309518   .1028143     1.27   0.203    -.0705707    .3324743
                   80  |   .1142649    .122792     0.93   0.352    -.1264152     .354945
                   82  |   .1090451   .1431112     0.76   0.446    -.1714619    .3895522
                   83  |   .1211272   .1532018     0.79   0.429    -.1791581    .4214125
                   85  |   .1465637   .1736146     0.84   0.399    -.1937321    .4868594
                   87  |   .1382642   .1941163     0.71   0.476     -.242216    .5187445
                   88  |   .1799741   .2079871     0.87   0.387    -.2276938     .587642
                       |
                 _cons |   1.203731   .1952306     6.17   0.000     .8210667    1.586396
          -------------+----------------------------------------------------------------
               sigma_u |   .4058746
               sigma_e |  .30300411
                   rho |  .64212421   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(4709, 23785) = 8.80                 Prob > F = 0.0000
          
          .
          Kind regards,
          Carlo
          (Stata 19.0)
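
          Applied to the variables in the original post, the same approach might look like the sketch below. It assumes a 0/1 indicator called contactless, which is a hypothetical name and does not appear in the posted extract; id, date and amnt are as posted.

          ```stata
          * Declare the panel identifier only; no time variable is needed for -xtreg, fe-.
          xtset id

          * Individual fixed effects plus year dummies built from the existing date variable.
          * contactless is an assumed 0/1 variable, not part of the posted data.
          xtreg amnt i.contactless i.date, fe
          ```

          Rows with missing amnt are dropped from the estimation sample automatically.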

          • #6
            Thank you, Carlo! This has been very helpful.
