  • Is my panel regression correct?

    Hi all,

    I have data from three countries, each surveyed at least twice (country C) and at most three times (countries A and B). I pool the data and then examine the determinants of y. In the data, observations are uniquely identified by three variables: id, round, and country.

    First, I create a panel identifier and declare the panel structure to Stata as follows:
    Code:
    egen panelid = group(id country)
    xtset panelid round
    Second, I use the xtpoisson command, since my dependent variable is a count:
    Code:
    qui xtpoisson y x1 x2 x3, re vce(cluster panelid)
    margins, dydx(*)
    Since I do not have much experience with panel data analysis, I am not sure if my code above is correct. I would appreciate if anyone can take a look at my code and give advice if any. Thanks!

    Data example
    Code:
    clear
    input float(id round country y x1 x2 x3)
     1 1 1  15 1 1 1
     1 2 1  15 1 0 1
     1 3 1  15 1 0 1
     1 1 2   1 0 1 1
     1 2 2   1 0 0 1
     1 3 2   1 0 1 1
     1 1 3   2 0 0 0
     1 2 3   2 0 0 0
     2 1 1  10 1 1 1
     2 2 1  10 1 0 1
     2 3 1  10 1 0 1
     2 1 2   1 0 1 1
     2 2 2   1 0 0 1
     2 3 2   1 0 1 1
     2 1 3   5 0 0 1
     2 2 3   5 0 0 1
     3 1 1  10 0 1 1
     3 2 1  10 0 0 1
     3 1 2   8 0 1 1
     3 2 2   8 0 0 1
     3 3 2   8 0 1 1
     3 1 3   1 0 0 0
     4 1 1   2 1 0 1
     4 2 1   3 1 1 1
     4 1 2   2 0 0 1
     4 2 2   2 0 0 1
     4 1 3   5 0 1 0
     4 2 3   5 0 0 0
     5 3 1   3 1 1 1
     5 1 2   1 0 0 1
     5 2 2   1 0 1 1
     5 3 2   1 0 0 1
     5 1 3   7 1 0 1
     6 1 1   7 0 1 0
     6 2 1   7 0 0 0
     6 3 1   7 0 0 0
     6 1 2 100 0 1 1
     6 2 2 100 0 0 1
     6 3 2 100 0 0 1
     6 1 3  10 0 1 1
     6 2 3  10 0 0 1
     7 1 1  15 1 0 1
     7 2 1   8 1 1 1
     7 1 2   2 0 0 1
     7 2 2   2 0 1 1
     7 3 2   2 0 0 1
     7 1 3   5 0 0 1
     7 2 3   5 0 1 1
     8 1 1   4 1 0 1
     8 2 1  10 1 0 1
     8 1 2  10 0 1 1
     8 2 2  10 0 0 1
     8 3 2  10 0 0 1
     8 1 3   3 0 1 1
     8 2 3   3 0 0 1
     9 1 1   1 1 0 0
     9 2 1   1 1 1 0
     9 3 1   1 1 0 0
     9 1 2   4 0 1 1
     9 2 2   4 0 1 1
     9 3 2   4 0 0 1
     9 1 3   1 0 1 1
     9 2 3   1 0 0 1
    10 1 1  30 1 1 1
    10 2 1   5 1 0 1
    10 1 2   0 0 0 1
    10 2 2   0 0 1 1
    10 2 3   5 0 0 1
    11 1 1  40 1 1 1
    11 2 1  40 1 0 1
    11 3 1  40 1 0 1
    11 1 2  20 0 1 1
    11 2 2  20 0 0 1
    11 3 2  20 0 0 1
    11 1 3  12 0 1 1
    11 2 3  12 0 0 1
    12 1 1   8 1 0 0
    12 2 1   8 1 1 0
    12 3 1   8 1 0 0
    12 1 2   1 0 1 1
    12 2 2   1 0 0 1
    12 3 2   1 0 1 1
    12 1 3   3 0 0 1
    12 2 3   3 0 0 1
    13 1 1 100 0 1 1
    13 2 1 100 0 0 1
    13 3 1 100 0 0 1
    13 1 2   7 0 1 1
    13 2 2   7 0 0 1
    13 3 2   7 0 0 1
    13 1 3   7 0 1 1
    13 2 3   7 0 0 1
    14 1 1  10 1 0 1
    14 2 1  10 1 1 1
    14 3 1  10 1 0 1
    14 1 3  10 0 0 1
    14 2 3  10 0 0 1
    15 1 1   0 0 1 0
    15 2 1   0 0 0 0
    15 3 1   0 0 0 0
    15 1 2  20 0 1 1
    15 2 2  20 0 0 1
    15 3 2  20 0 0 1
    15 1 3   4 1 0 1
    end
    label values country country
    label def country 1 "A", modify
    label def country 2 "B", modify
    label def country 3 "C", modify
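A minimal sketch of how the setup described above could be checked before estimation, assuming the example data have just been loaded and `panelid` has not been created yet. It only confirms that the three variables form a unique key and shows how unbalanced the resulting panels are; it says nothing about whether the model itself is appropriate.

```stata
* Sketch: confirm the key, build the panel identifier, inspect the panels.
* Assumes the example data above are in memory and panelid does not exist yet.
isid id country round              // stops with an error if the key is not unique
egen panelid = group(id country)   // one panel per id-country pair
xtset panelid round
xtdescribe                         // participation patterns across rounds
```

If `isid` passes, every (panelid, round) pair is unique as well, which is what `xtset panelid round` requires.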

  • #2
    Matthew:
    what is not clear to me in your -xtset- code is what -panelid- identifies: -id- or -country-?
    Last edited by Carlo Lazzaro; 09 Apr 2022, 01:56.
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Yeah, I agree with Carlo: why are the country and the id together your panel variable?

      Your code may be fine, but maybe not. What's your outcome, the number of people kicked to death by horses in each country?

      • #4
        Dear Carlo Lazzaro and Jared Greathouse,

        Thank you for your prompt replies. The reason I built panelid from id and country for my xtset is that three variables (id, round, and country) uniquely identify observations, so I was unable to xtset with id and round because of the following error:
        repeated time values within panel
        That xtset command is also what makes me unsure about my regression: I am not sure whether the way I generate panelid and the subsequent xtset and xtpoisson commands are correct. Any advice is appreciated.

        It works fine if I run separate regressions for each country, for example:
        Code:
        preserve
        keep if country==1
        xtset id round
        qui xtpoisson y x1 x2 x3, re vce(cluster id)
        margins, dydx(*)
        restore
        @Jared Greathouse: the outcome is the number of clients of a drug store per week in each country. Since the data I am using are confidential, I have to code all variables so that they cannot be identified.
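The "repeated time values within panel" error can be seen directly in the example data. The following is a small sketch, assuming the pooled example data are in memory; it only reports duplicates and changes nothing.

```stata
* Sketch: in the pooled data the same (id, round) pair occurs once per
* country, which is what -xtset id round- objects to.
duplicates report id round

* Within a single country the pair is unique, which is why the
* country-by-country regressions run without complaint.
duplicates report id round if country == 1
```

The first report should show surplus copies and the second none, consistent with id, round, and country jointly identifying observations.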
        Last edited by Matthew Williams; 10 Apr 2022, 02:34.

        • #5
          Matthew:
          if you do not plan to use time-series operators (such as lags and leads), you can safely -xtset- your dataset with -id- as -panelid- and even add -i.round- as a predictor on the right-hand side of your -xtpoisson- regression.
          In addition, picking up on Jared Greathouse's assist about the number of Prussian soldiers kicked to death by their horses (which was one of the first applications of the Poisson distribution in the real world; see https://en.wikipedia.org/wiki/Ladislaus_Bortkiewicz and the Stata Press colophon), you may find the following article interesting: https://www.jstor.org/stable/2348169.
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            Dear Carlo Lazzaro,

            if you do not plan to use time-series operators (such as lags and leads), you can safely -xtset- your dataset with -id- as -panelid- and even add -i.round- as a predictor on the right-hand side of your -xtpoisson- regression.
            To make sure that I understand your advice correctly, do you mean the following code?
            Code:
            xtset id
            qui xtpoisson y x1 x2 x3 i.round i.country, re vce(cluster id)
            margins, dydx(*)
            Many thanks for the relevant references.

            • #7
              Matthew:
              yes, I'd go that way.
              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Originally posted by Carlo Lazzaro View Post
                Matthew:
                yes, I'd go that way.
                Thank you so much.

                • #9
                  Originally posted by Matthew Williams View Post
                  . . . the outcome is the number of clients of a drug store per week in each country.
                  Are you saying that id is the identifier for a drug store? If so, then I think that your model isn't specified correctly, inasmuch as a given drug store cannot simultaneously exist in each of three countries.

                  On the other hand, if you have a fixed sample of fifteen drug store chains with franchise stores in each of the three countries, and you want to treat the international drug-store-chain firms as the exchangeable variable, then you'd probably be OK.

                  • #10
                    Originally posted by Joseph Coveney View Post
                    Are you saying that id is the identifier for a drug store? If so, then I think that your model isn't specified correctly, inasmuch as a given drug store cannot simultaneously exist in each of three countries.

                    On the other hand, if you have a fixed sample of fifteen drug store chains with franchise stores in each of the three countries, and you want to treat the international drug-store-chain firms as the exchangeable variable, then you'd probably be OK.
                    Dear Prof. Joseph,

                    Thank you for your comments. Yes, id is the identifier for a drug store.
                    If so, then I think that your model isn't specified correctly, in as much as a given drug store cannot simultaneously exist in each of three countries.
                    So, do you have any advice for me regarding the model specification?

                    Thank you.

                    • #11
                      Given that you've made an ID for each brand and country, I think your intuition is fine. Where I live, near Atlanta, Georgia, Kroger and Publix are popular grocery stores. If I were studying these stores specifically, I would just xtset them by their ID and the place (presumably county) that they're in. And that's okay, because we then have one identifier per individual store over time.

                      My only real question is: is Poisson appropriate? I don't know how many people visit my local drug stores, but it has to be, I don't know, hundreds per day. So could you get away with (or at least experiment with) OLS in this scenario? After xtset-ting the data, a linear fixed-effects version would just be
                      Code:
                      xtreg y x1 x2 x3 i.round, fe
                      But, do see this article on FE estimators. People tend to throw FE at everything, sometimes inappropriately.

                      • #12
                        Originally posted by Jared Greathouse View Post
                        My only real question is, is Poisson appropriate?
                        Yeah, he's ostensibly got up to a hundred patrons per week, but if these are counts, and given that there are a fair number of zeros, onesies, threesies and other low counts, then it would seem that a count model would be more defensible than a linear model. Specifying a negative binomial distribution family doesn't seem to do much here, and so maybe Poisson is a reasonable first-pass specification for the distribution.

                        But, do see this article on FE estimators. People tend to throw FE at everything, sometimes inappropriately.
                        I think that he's not using a fixed-effects specification.

                        Originally posted by Matthew Williams View Post
                        Yes, id is the identifier for a drug store.

                        So, do you have any advice for me regarding the model specification?
                        As Jared implies, it seems that you were on the right track originally, with the egen . . . group() data-management step. So, perhaps something like the following (variable names shortened for brevity).

                        . version 17.0

                        . clear *

                        . quietly input float(id round country y x1 x2 x3)

                        . quietly compress

                        . rename country cid

                        . rename round tim

                        . *
                        . * Begin here
                        . *
                        . generate int pid = 100 * cid + id

                        . isid pid tim, sort
                        (data now sorted by pid tim)

                        . meglm y i.(x? tim cid), family(poisson) || pid: , nolrtest nolog

                        Mixed-effects GLM                               Number of obs     =        104
                        Family: Poisson
                        Link:   Log
                        Group variable: pid                             Number of groups  =         44

                                                                        Obs per group:
                                                                                      min =          1
                                                                                      avg =        2.4
                                                                                      max =          3

                        Integration method: mvaghermite                 Integration pts.  =          7

                                                                        Wald chi2(7)      =      10.67
                        Log likelihood = -278.16066                     Prob > chi2       =     0.1536
                        ------------------------------------------------------------------------------
                                   y | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                1.x1 |  -.3233068   .5579446    -0.58   0.562    -1.416858    .7702446
                                1.x2 |  -.0112909   .1009929    -0.11   0.911    -.2092334    .1866516
                                1.x3 |   1.430379   .5366718     2.67   0.008     .3785212    2.482236
                                     |
                                 tim |
                                  2  |  -.0607674    .099528    -0.61   0.541    -.2558386    .1343039
                                  3  |  -.0332801   .1112711    -0.30   0.765    -.2513674    .1848072
                                     |
                                 cid |
                                  2  |   -1.23794   .6318782    -1.96   0.050    -2.476398    .0005186
                                  3  |   -.912386   .5530151    -1.65   0.099    -1.996276    .1715036
                                     |
                               _cons |   1.257814   .5969795     2.11   0.035     .0877558    2.427872
                        -------------+----------------------------------------------------------------
                        pid          |
                           var(_cons)|   1.220383   .3069654                      .7454042    1.998024
                        ------------------------------------------------------------------------------

                        . // Alternative (you don't have enough data to structure the working correlation matrix)
                        . xtgee y i.(x? tim cid), i(pid) /* t(tim) */ family(poisson) corr(independent) vce(robust) nolog

                        GEE population-averaged model                      Number of obs    =      104
                        Group variable: pid                                Number of groups =       44
                        Family: Poisson                                    Obs per group:
                        Link:   Log                                                     min =        1
                        Correlation: independent                                        avg =      2.4
                                                                                        max =        3
                                                                           Wald chi2(7)     =    47.97
                        Scale parameter = 1                                Prob > chi2      =   0.0000

                        Pearson chi2(104)    =  2475.87                    Deviance         =  1617.12
                        Dispersion (Pearson) = 23.80649                    Dispersion       = 15.54923

                                                            (Std. err. adjusted for clustering on pid)
                        ------------------------------------------------------------------------------
                                     |               Robust
                                   y | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                1.x1 |  -1.328568   .5391672    -2.46   0.014    -2.385316   -.2718197
                                1.x2 |  -.2364593   .1487161    -1.59   0.112    -.5279374    .0550188
                                1.x3 |   2.059836   .6187065     3.33   0.001     .8471932    3.272478
                                     |
                                 tim |
                                  2  |  -.1846983    .095755    -1.93   0.054    -.3723746     .002978
                                  3  |   .0749424   .1611165     0.47   0.642    -.2408402    .3907249
                                     |
                                 cid |
                                  2  |   -1.48177     .72726    -2.04   0.042    -2.907174   -.0563668
                                  3  |  -2.082517   .5170389    -4.03   0.000    -3.095894   -1.069139
                                     |
                               _cons |   2.141616   .5156226     4.15   0.000     1.131014    3.152217
                        ------------------------------------------------------------------------------

                        . exit

                        end of do-file


                        The magnitude of the difference in regression coefficients between the population-averaged (robust standard errors) and individual-specific (random effects) models seems larger than I would have anticipated, and maybe you'd want to look into that.
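One possible way to look at that difference is to store both fits and tabulate them side by side. This is only a sketch: it assumes pid, tim, and cid exist as created in the log above, and the stored-estimate names `re` and `pa` are arbitrary labels chosen here for illustration.

```stata
* Sketch: refit both models quietly and line up their coefficients.
quietly meglm y i.(x? tim cid), family(poisson) || pid:
estimates store re      // individual-specific (random-intercept) fit

quietly xtgee y i.(x? tim cid), i(pid) family(poisson) corr(independent) vce(robust)
estimates store pa      // population-averaged fit

* equations(1) asks -estimates table- to match the first equation of each model
estimates table re pa, equations(1) b(%9.3f) se(%9.3f)
```

The table reproduces the numbers already shown in the two outputs; it just makes the gaps (for example in 1.x1 and 3.cid) easier to scan.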

                        • #13
                          Dear Prof. Joseph,

                          Thank you for your insightful advice. Do I need to compute marginal effects after running meglm or xtgee?
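If marginal effects are wanted, the same -margins- call used earlier in the thread can in principle follow -meglm-. The sketch below assumes the variables from the log above and is not a statement about which summary suits the analysis; incidence-rate ratios are shown as one alternative way to read a Poisson fit.

```stata
* Sketch: average marginal effects on the count scale after the
* random-intercept Poisson model (x1-x3 enter as factor variables,
* so these are discrete changes from 0 to 1).
quietly meglm y i.(x? tim cid), family(poisson) || pid:
margins, dydx(x1 x2 x3)

* Alternative reading: replay the fixed-effects coefficients as
* incidence-rate ratios.
meglm, irr
```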
