  • xtologit & complex weighting for an ordered logistic regression in a pre- & post- study design

    Dear Statalist friends,

    I have been trying to use xtologit with complex weighting (or simpler weighting) in Stata 13 for a research project. Y is the dependent variable, coded 1 (unlikely), 2 (somewhat likely), or 3 (very likely), and denotes a motivation to do something. I'd like to use a difference-in-differences approach to examine whether an event changes people's attitude toward/perception of doing something. The data structure (long form) is as follows:

    ID  Year  Treatment  Year*Treatment  Y  age  gender  x3  x4 .....
    1   0     1          0               2  5    1
    1   1     1          1               3  8    1
    2   0     0          0               3  5    0
    2   1     0          0               1  8    0
    3   0     1          0               2  5    1
    3   1     1          1               2  8    1
    4   0     0          0               3  5    1
    4   1     0          0               2  8    1
    5   0     0          0               3  5    1
    5   1     0          0               1  8    1

    After sorting by ID and Year, I ran the following commands (i.e., declaring the panel structure with xtset, declaring the survey design with svyset, and running the panel ordered logistic regression):
    xtset ID Year
    panel variable: ID (strongly balanced)
    time variable: Year, 0 to 1
    delta: 1 unit
    svyset myPSU [pweight=myPweightVariable], strata(myStratumVariable)


    svy: xtologit Y Year Treatment Y*Tx age gender x3 x4 ......



    I got an error message as follows:
    xtologit is not supported by svy with vce(linearized); see help svy estimation for a list of Stata
    estimation commands that are supported by svy
    r(322);


    Can someone kindly resolve the above issue? How can I do a panel ordered logistic regression with simple or complex weighted settings? This issue has bothered me for a long time. I really want to find help to do a panel ordered logistic regression with simple or complex weights for this pre/post (2 time points) repeated measure design in Stata.

    I've searched online; however, I could not find answers that fit my needs.

    Thank you very much.

    Best,
    Jane



  • #2
    Svy and xt commands do not work together. A few xt commands support some weights, but xtologit does not. You could consider dichotomizing and then see if the weight options available with xtlogit meet your needs. I am not aware of any way to do exactly what you want.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam
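
    A minimal sketch of the dichotomizing route suggested above, assuming the cut point is "very likely" versus everything else. Yhigh and YearXTreat are names made up for this illustration. In Stata 13, xtlogit accepts pweights only for the population-averaged (pa) model, and the weight must be constant within ID:

    ```stata
    * Run from a do-file (/// is a line continuation).
    * Collapse the 3-category outcome: 1 = very likely, 0 = otherwise
    generate byte Yhigh = (Y == 3) if !missing(Y)

    * Difference-in-differences term, built by hand
    generate byte YearXTreat = Year * Treatment

    xtset ID Year

    * pa is needed for pweights; or reports odds ratios
    xtlogit Yhigh Year Treatment YearXTreat age gender ///
        [pweight = myPweightVariable], pa vce(robust) or
    ```

    This uses the weights only; it does not account for the strata or the PSU clustering declared in svyset. Rerunning with the other cut point (Y >= 2) is a rough check on how much the choice of cut point matters.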



    • #3
      Hi Richard,

      Thanks so much for your comments and suggestions. I am hoping that dichotomizing makes sense here and that xtlogit supports the pweight option. Thanks again.

      Best wishes,
      Brian



      • #4
        I know this is a huge debate in the literature, but do you really want/need to use weights in a multivariate analysis? I would rather put all the weight-specific variables in my model as controls and not weight.



        • #5
          Hi Julian,

          Could you be more specific or give some real examples of your approach?



          • #6
            Dear Colleagues,

            I just learned that xtgee might be able to take care of my needs of a panel logistic regression (or panel ordered logistic regression) with the pweight capability. Can someone kindly show me some examples or steps to select the correct family, link, correlation matrix, and pweight (for the above dataset and desired analysis)?

            Thank you very much.

            Best,
            Jane
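
            A minimal sketch of an xtgee call for this design, under some assumptions. xtgee has no ordered logit link, so the three-category Y is dichotomized first (Yhigh and YearXTreat are names made up here). family(binomial) with link(logit) gives a population-averaged logit; with only two waves there is a single within-person correlation, so corr(exchangeable) is the natural choice. pweights are allowed but must be constant within ID:

            ```stata
            * Run from a do-file (/// is a line continuation).
            generate byte Yhigh = (Y == 3) if !missing(Y)
            generate byte YearXTreat = Year * Treatment

            xtset ID Year

            * eform reports exponentiated coefficients (odds ratios here)
            xtgee Yhigh Year Treatment YearXTreat age gender ///
                [pweight = myPweightVariable],               ///
                family(binomial) link(logit) corr(exchangeable) vce(robust) eform
            ```

            The coefficient on YearXTreat is the difference-in-differences term. As with any xt command, strata and PSU clustering from svyset are not used here.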



            • #7
              The purpose of including design weights in a model is to adjust for the sampling probability of the respondent. Those weights (in the social science world) are mostly created by comparing the sample to a census. So, for example, if you oversample females and undersample less educated people, you want to adjust for that in your model. But if you include sex and education in your regression model as controls anyway, there is no need to include the design weight.

              For multilevel analysis, the second edition of Snijders & Bosker (2012, pp. 216-244), "Multilevel Analysis", is a good reference.

              I hope that helps.



              • #8
                First off, I would disagree with Julian. Complex survey designs usually have multiple dimensions, which include unequal weighting, clustering, and stratification. Jane mentioned unequal weights, but I would bet that the unequal weights came from a design more complex than simply unequal probabilities. Including design variables as controls is rarely fully feasible, and can rarely be done with a sufficient level of complexity to produce estimates that would work well (in the sense of being consistent under repeated sampling with a complex design). For instance, if you want to control for stratification, then design-consistent estimation of the means of an arbitrary y-variable can be achieved, more or less, with regress y i.stratum, robust for simple random sampling designs; with regress y i.stratum, cluster(psu) for epsem cluster designs (rare beasts, I have to say); or with regress y i.stratum followed by the full list of calibration variables when raking/calibration is being used for non-response adjustments in otherwise simple random sampling designs. The analyst/researcher can quickly get lost in what exactly they need to do for a problem as simple as estimation of the means. If you have something as complicated as xtologit that you want to run, you have to stick these strata variables everywhere -- as interactions with regressors, and as explanatory variables for thresholds and random effects. I am sure one will run into identifiability issues with that, and would have to start excluding stratification variables -- which means you are making implicit assumptions that some aspects of the model are equivalent between strata. I am not at all sure I know how to include the weighting variables to make estimation design-consistent, either. And, finally, the ultimate user may not have access to all of the sampling design variables, as they are often irrelevant for the substantive analyses, and not included in the publicly available survey microdata.

                So, bottom line, I do not believe much in including the sample design and weighting variables in the model.

                Going back to Jane's original question, I would run this model using gllamm, which supports multilevel weights. Jane, however, would need to figure out how to scale her weights, which is touched upon in the documentation of the Stata 13 mixed command (see help mixed##scale_method), and how to translate her sampling design into the appropriate scale of weights. She would also have to lose stratification. So in the end, she will have something like

                Code:
                egen pw2 = mean( myPweightVariable / number_of_level1_units_in_a_given_level2_unit ), by(ID)
                gen pw1 = myPweightVariable / pw2
                gllamm  Y Year Treatment Y*Tx age gender x3 x4 i.myStratumVariable, i(ID) family(binomial) link(ologit) cluster(myPSUvariable) pweight(pw)
                The first two lines assume a simple random sample of the level 1 units within a level 2 unit. If Jane's level 1 units are repeated observations over the same individual, then something like

                Code:
                gen pw1 = 1
                gen pw2 = myPweightVariable
                will be more appropriate. She will have to carefully figure out what to do here.

                HTH.
                -- Stas Kolenikov || http://stas.kolenikov.name
                -- Principal Survey Scientist, Abt SRBI
                -- Opinions stated in this post are mine only



                • #9
                  For a recent discussion by economists/econometricians of the use of weights, see "What are we weighting for?", by Gary Solon, Steven J. Haider, Jeffrey Wooldridge, NBER Working Paper No. 18859, http://www.nber.org/papers/w18859 (and the references therein), November 2013. Quantitative sociologists are fond of referring to Winship and Radbill's article "Sampling weights and regression analysis", Sociological Methods and Research, 23(2), November 1994, 230-257. As I recall, neither set of authors explicitly considers the complex survey design context to which Stas refers.



                  • #10
                    Is there some inherent reason svy and xt don't play well together? Is it just incredibly complicated or is it totally impossible?

                    For xt commands that do support weights, the help usually says "Weights must be constant within panel." I would think that would often be a problem, as the sampling scheme might vary from one wave to the next.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://www3.nd.edu/~rwilliam



                    • #11
                      Hi Stas,

                      Thank you so much for sharing your thoughts and code. I'd like to say more about my dataset and design. My data come from a nationally representative survey that uses a multi-stage area probability sample design (of households). My predictors x3, x4, x5... are mainly demographic variables such as years of education, household income (dummies vs. the 1st quartile), etc. The outcome is a repeated measure taken at two time points (2000 [coded: 0] and 2003 [coded: 1]).

                      I am still confused about some parts of the commands I should use. Let me ask for more details in reverse order.

                      My svyset commands are (I've made the weighting variables consistent in 2000 and 2003):
                      svyset myPSU [pweight=myPweightVariable], strata(myStratumVariable)

                      Code (from the above):
                      gllamm Y Year Treatment Y*Tx age gender x3 x4 i.myStratumVariable, i(ID) family(binomial) link(ologit) cluster(myPSUvariable) pweight(pw)
                      My understanding is that after I run my svyset command, I can use i.myStratumVariable and myPSUvariable above. However, I did not see where pw is defined or calculated. Unlike with the xtgee command, it seems to me that I don't have to select a correlation matrix for gllamm. Please let me know whether my understanding is correct and how I can correctly define or use pw for pweight(pw) in the above code. You've generated pw1 and pw2, but when or where do I need to use them in the gllamm command?

                      One more thing, can gllamm produce the odds ratio just like the xtologit (i.e., xtologit y x1 x2, or)?

                      Thank you very very much. I really appreciate your kindness and help from all people who have been trying to help out.

                      Warmest regards,
                      Jane
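
                      A minimal sketch of how the pieces fit together, assuming the repeated-observations case described in #8. In gllamm, pweight(pw) takes a stub rather than a variable name: gllamm looks for pw1 (level-1 weights, here the occasions) and pw2 (level-2 weights, here the persons). The eform option reports exponentiated coefficients, which are odds ratios under link(ologit). No working correlation matrix is chosen, because the random intercept for ID is what induces the within-person dependence. YearXTreat is a name made up here, and gllamm is a user-written command that must be installed first (ssc install gllamm):

                      ```stata
                      * Run from a do-file (/// is a line continuation).
                      * Weights by level; skip these two lines if pw1/pw2 already exist
                      generate pw1 = 1
                      generate pw2 = myPweightVariable

                      * Interaction built by hand (a bare Y*Tx is read as a wildcard)
                      generate byte YearXTreat = Year * Treatment

                      * adapt requests adaptive quadrature; cluster() gives
                      * robust standard errors clustered on the PSU
                      gllamm Y Year Treatment YearXTreat age gender, i(ID) ///
                          family(binomial) link(ologit)                    ///
                          pweight(pw) cluster(myPSU) adapt eform
                      ```

                      The svyset declaration is not used by gllamm, so the PSU and weight variables have to be passed explicitly as above. Stratum indicators are left out of this sketch.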



                      • #12
                        Rich Williams said:
                        For xt commands that do support weights, the help usually says "Weights must be constant within panel." I would think that would often be a problem, as the sampling scheme might vary from one wave to the next.
                        If you're doing analysis with a 'longitudinal sample', e.g. a panel T waves long, then there is often a set of "longitudinal" weights available for wave T. This is the single set of weights that you'd apply in your panel data analysis using the xt suite - and the weights are constant by construction. (For example, the British Household Panel Survey released a new set of such longitudinal weights every time it released a new wave of data. These weights are different from the BHPS 'cross-sectional' weights -- there's a set of these for each and every cross-section.) Whether this sort of longitudinal weight is what one really wants in all panel applications is doubtful, though, because such weights often refer to a balanced sample of individuals present since the initial wave. And yet we often want to use unbalanced panels, or to pool pairs of transitions from throughout the length of the panel, and so on. (The deeper issue is: what is the population one is trying to make inferences about?)



                        • #13
                          Hi Stas,

                          After I generated pw1=1 and pw2=myPweightVariable (in the pre-post long form setting), I ran the following code:

                          gllamm Y Year Treatment Y*Tx age gender x3 x4 i.myStratumVariable, i(ID) family(binomial) link(ologit) cluster(myPSUvariable) pweight(pw)
                          Stata returned an error message:
                          factor variables and time-series operators not allowed

                          Please let me know if you see what I did wrong. Thanks a lot.


                          Best,
                          Jane



                          • #14
                            Is it the "Y*Tx" that is causing problems? What predictors are you trying to refer to with the wildcard "*"? If -- I'm guessing -- you were trying to refer to an interaction between Y and Tx, this won't work. I think you may have to calculate it manually before fitting the model, unless gllamm now accepts factor variable notation. If it does not, your use of the "i." prefix will also cause problems. [PS it helps to also report the precise error code that Stata reports.]
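
                            A minimal sketch of the manual route, assuming gllamm does not accept factor-variable notation (which the error message suggests). The stratum indicators are created with tabulate, generate() and the first one is dropped as the base category. strat_ and YearXTreat are names made up here, and pw1 and pw2 are assumed to exist already:

                            ```stata
                            * Run from a do-file (/// is a line continuation).
                            * One 0/1 indicator per stratum: strat_1, strat_2, ...
                            tabulate myStratumVariable, generate(strat_)

                            * Collect the indicator names and drop the first as the base
                            unab stratdums : strat_*
                            local base : word 1 of `stratdums'
                            local stratdums : list stratdums - base

                            * Interaction computed by hand instead of Y*Tx
                            generate byte YearXTreat = Year * Treatment

                            gllamm Y Year Treatment YearXTreat age gender `stratdums', ///
                                i(ID) family(binomial) link(ologit)                    ///
                                cluster(myPSUvariable) pweight(pw)
                            ```

                            The xi: prefix, which expands i.myStratumVariable into indicator variables before the command runs, may be a shorter alternative. With many strata, the model may become slow or hard to identify, as noted in #8.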



                            • #15
                              Dear Stephen, Stas, Richard, and all other kind colleagues,

                              Thanks so much for all your insights. Y*Tx here is just shorthand. I did generate an interaction term myself (using gen _interaction = Y * Tx). Thanks for bringing that up.
                              I copied the error message directly from Stata, and I thought that the "pw" inside pweight() could be causing the problem. The error message is "factor variables and time-series operators not allowed". Thanks a lot.



                              Best regards,
                              Jane

