  • Country and Survey Wave fixed effects logistic regression model

    I am trying to run both a country and survey wave fixed-effects logistic regression. I have 25 different countries and 3 survey waves, 1 dependent variable, and a few control variables. When I try to set the data as panel data in order to run the FE logit, I am told that there are "repeated time values within panel r(451);"

    Then, when I try to run "xtlogit outcomevariable independentvariable control variable(s) i.wave i.country, fe" it doesn't work. I can run just a normal logit and get results "logit outcomevariable independentvariable control variable(s) i.wave i.country" but I am not sure if those results would be as valid/mean the same thing.

    These are not the same individuals in each wave, so I'm not sure whether that is the problem with running this type of model. Do I need to run another model, like a pooled OLS regression with country fixed effects?

  • #2
    I am told that there are "repeated time values within panel r(451);"
    Well, the message is self-explanatory. You are presumably issuing a command like -xtset country wave- and Stata is finding that for a given country and wave you have more than one observation. So the question is whether this means there is something wrong with your data, or whether you misunderstand your data. The first thing I would do is find out what these offending observations are:

    Code:
    duplicates tag country wave, gen(flag)
    browse if flag
    That will show them to you. Then you can decide whether they are supposed to be there or not. If they are not supposed to be there, then you need to figure out how to get it down to one observation per country per wave. That might mean picking out the correct one for each country-wave combination. It might mean combining them in some way, such as averaging the values of the other variables, or something like that.

    The other major possibility is that those data are supposed to be there and you shouldn't be trying to -xtset country wave-. Do you really need to include wave in the -xtset- command? You only need that if you are going to estimate autoregressive structure, or use lag, lead, and difference operations. But none of those things exist when you have more than one observation for any country-wave combination, so that means you need to rethink your analysis plans. If all you want to do is carry out a fixed-effects regression, with country as the fixed effect, then you only need to -xtset country-. After that you can run -xtreg, fe- and you won't have a problem.

    Another question arises: assuming that your data are indeed supposed to contain multiple observations per country-wave combination, what do those observations represent? Are they households? or individuals? or firms? or industries? Or what? Perhaps what you really have is not panel data but a three-level data set, in which case using a two-level model may be inappropriate.
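
    A quick way to see which situation you are in is to count the observations in each country-wave cell. This is only a sketch: it assumes the variables are named country and wave, as in the commands quoted above, and n_obs is a made-up name.

    Code:
    * number of observations in each country-wave cell
    bysort country wave: generate n_obs = _N
    tabulate country wave
    summarize n_obs

    Cell counts in the hundreds or thousands would point to individual-level records within each country-wave rather than a handful of stray duplicates.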

    So, a lot of questions raised. Look into these, and if you need further advice, post back. If you do so, it would be most helpful if you include an example of your data (using the -dataex- command: read the Forum FAQ #12 for more information about -dataex-). And if you are trying code that "does not work" you need to show the actual code and the actual response you got from Stata. After all, there are many, many ways in which a command might not produce what you are looking for, and if you don't show what actually happened, it's anybody's guess what might have gone wrong.

    Be all of that as it may, you should not use the results of a plain -logit- with i.country and i.wave in it: that is not equivalent to a fixed effects logistic regression and its results are going to be biased. (That equivalence applies only to linear regression, not logistic or other non-linear models.)
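
    To see the difference on made-up data, here is a small simulation sketch. Everything in it is an assumption for illustration only (300 panels, 2 observations each, a true coefficient of 1 on x), and the variable names are hypothetical.

    Code:
    clear
    set seed 12345
    set obs 300
    generate id = _n
    * panel-level effect, correlated with x below
    generate a = rnormal()
    expand 2
    generate x = rnormal() + 0.5*a
    generate y = runiform() < invlogit(a + x)
    xtset id
    * conditional (fixed-effects) logit
    xtlogit y x, fe
    * plain logit with one indicator per panel
    logit y x i.id

    With only two observations per panel, theory says the indicator-variable estimate tends toward about twice the true coefficient, while the conditional estimate does not. The gap is most dramatic with tiny panels like these; compare the two coefficients on x in the output.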



    • #3
      Thank you for your detailed response!

      Let me explain a little bit more about the data. For wave one, for example, there are multiple (i.e. over a thousand for some) observations per country. This is the case for each wave. So wave 1 has 1,460 observations for Austria, wave 2 has 1,522 observations for Austria, and wave 3 has 1,510. Each of these observations represents an individual survey response from the European Values Survey. The respondents are different in each wave but are nationally representative. So based on what you said, it sounds like I need to change my analysis plans and this might be a two-level model? I'm not sure what that is, however.

      I'll also explain why I want to do both country and wave fixed effects. The country fixed effects is to control for any country-specific omitted variables (eg, "culture" of a country) that are constant over time, and the panel fixed effects are there to deal with country-invariant trends over time, eg global trends towards more equality for women, general economic trends.


      I am using Stata/MP 15.0 for Mac (64-bit Intel).
      Here is an example of my data:
      xtlogit scarceoutcome quota age female labor parliament townsize gni i.country i.wave,
      > fe
      note: multiple positive outcomes within groups encountered.
      note: 56.country omitted because of no within-group variance.
      note: 100.country omitted because of no within-group variance.
      note: 203.country omitted because of no within-group variance.
      note: 208.country omitted because of no within-group variance.
      note: 246.country omitted because of no within-group variance.
      note: 250.country omitted because of no within-group variance.
      note: 276.country omitted because of no within-group variance.
      note: 348.country omitted because of no within-group variance.
      note: 352.country omitted because of no within-group variance.
      note: 372.country omitted because of no within-group variance.
      note: 380.country omitted because of no within-group variance.
      note: 428.country omitted because of no within-group variance.
      note: 440.country omitted because of no within-group variance.
      note: 470.country omitted because of no within-group variance.
      note: 528.country omitted because of no within-group variance.
      note: 616.country omitted because of no within-group variance.
      note: 620.country omitted because of no within-group variance.
      note: 642.country omitted because of no within-group variance.
      note: 703.country omitted because of no within-group variance.
      note: 705.country omitted because of no within-group variance.
      note: 724.country omitted because of no within-group variance.
      note: 752.country omitted because of no within-group variance.
      note: 826.country omitted because of no within-group variance.
      4,573 (group size) take 2,705 (# positives) combinations results in numeric overflow;
      computations cannot proceed
      r(1400);



      • #4
        Update - not sure if this is at all correct but I tried to run a multi-level mixed-effects logistic regression using "melogit scarceoutcome quota age labor parliament i.wave i.country" and the model successfully produced an output. Does this sound like what you had in mind with a two-level model?



        • #5
          Code:
          melogit scarceoutcome quota age labor parliament i.wave i.country
          is not exactly what I had in mind, but it is a move in that direction.

          Since your waves consist of different respondents, it is not the nested design I thought it might be. In any case, although the code you suggested here uses -melogit-, it only specifies a one level model. So I think something more like

          Code:
          melogit scarceoutcome quota age labor parliament i.wave || country:
          (If labor or parliament is a categorical variable, you should prefix it with i., just like wave.)



          • #6
            All of that said, I'm not sure why
            Code:
            xtset country
            xtlogit scarceoutcome quota age labor parliament i.wave, fe
            wouldn't work. That would be a fixed-effects logistic model, rather than a random effects model. So you will lose some fraction of your sample due to issues of collinearity or lack of outcome variation. And its estimates would be purely within-country effect estimates. -melogit- is a random effects model and it will be able to use more of the data and is more efficient. Its results are a blend of within-country and between-country effects. Which approach is better depends on your specific research goals.



            • #7

              So the overall goal of my research question is to see how responses to the DV change over time in the quota vs. non-quota countries. The idea behind this type of model (generalization of difference in difference) was that it compares average attitudes postquota minus attitudes prequota in the treated countries to the change in attitudes in the control countries over the same period. I was hoping that the results would be interpreted as within-unit (country) changes, ie the link between quotas and attitude change within quota countries. When I ran the "melogit scarceoutcome quota age labor parliament i.wave || country:" I wasn't able to see the individual country changes.

              Also, when I run

              xtset country
              xtlogit scarceoutcome quota age labor parliament i.wave, fe

              I get the following error message:
              . xtset country
              panel variable: country (unbalanced)

              . xtlogit scarceoutcome quota age labor parliament i.wave, fe
              note: multiple positive outcomes within groups encountered.
              4,322 (group size) take 2,319 (# positives) combinations results in numeric overflow;
              computations cannot proceed
              r(1400);



              • #8
                Code:
                I get the following error message:
                . xtset country
                panel variable: country (unbalanced)
                This isn't an error message. It's just notifying you that the panel is unbalanced. But, unless you have some relatively unusual analysis that only works with balanced data in mind, it isn't a problem.
                Code:
                xtlogit scarceoutcome quota age labor parliament i.wave, fe
                note: multiple positive outcomes within groups encountered.
                4,322 (group size) take 2,319 (# positives) combinations results in numeric overflow;
                computations cannot proceed
                r(1400);
                This one is a problem. It's saying that the data are unmanageable from the perspective of computational resources for this analysis.
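
                To get a feel for the size of the number named in that message, you can work on the log scale. This is only a sketch of the arithmetic, using the two counts printed in the error; it is not a description of how -xtlogit- computes things internally. comb(), lnfactorial() and maxdouble() are built-in Stata functions.

                Code:
                * direct evaluation overflows, so this should display missing
                display comb(4322, 2319)
                * natural log of the number of combinations
                display lnfactorial(4322) - lnfactorial(2319) - lnfactorial(4322 - 2319)
                * natural log of the largest value a double can hold
                display ln(maxdouble())

                The log of the number of combinations is on the order of 3,000, while the log of the largest storable double is only about 709, so the count itself cannot be held in double precision.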

                Now, from your description of your problem, and your use of the very phrase "difference in differences" I think you need to rethink your model in any case. But there are some key aspects of your data that are not clear yet. It is unclear whether the quota countries have a quota in all three waves of the survey, or if one or two of the waves are pre-quota adoption. It is unclear whether the variable quota indicates that a quota is in effect for that country in that year, or whether it indicates that this is one of the countries that gets a quota, though not necessarily this year. The choice of analysis would depend on these things.
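
                Both questions can be checked directly in the data. A sketch, assuming quota is coded 0/1 and the variables are named as in your commands; ever_quota is a made-up name:

                Code:
                * mean of quota in each country-wave cell
                tabulate country wave, summarize(quota) nostandard nofreq
                * flag countries where quota equals 1 in at least one observation
                bysort country: egen ever_quota = max(quota)
                * does quota ever differ from the country-level flag?
                tabulate quota ever_quota

                If the cell means switch from 0 to 1 across waves within some countries, quota is an "in effect now" indicator. If quota and ever_quota always agree, it is a country-level marker that does not vary over waves.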



                • #9
                  Clyde Schechter Dear Clyde,

                  I am running a model to analyze the effect of various individual characteristics, including a categorical generation variable, on charitable giving. Similarly to Hannah, my dataset consists of 2 waves from the World Values Survey. I want to have country and wave FE/RE.

                  Following your recommendation I ran the model using melogit:

                  Code:
                   * S002 is the wave variable, S003 is the country variable
                   melogit charity_participation i.generation sex i.income_class i.educ_level happiness char_conf i.marital_status i.religion i.S002 || S003:
                  and I got pretty good results.

                  My Question:

                  Can I use command margins after melogit to predict coefficient for generational groups?

                  When I am trying to run the fixed effect model:

                  Code:
                  xtset S002
                  xtlogit charity_participation i.generation sex i.income_class i.educ_level happiness char_conf marital_status i.religion i.S003, fe
                  I am receiving the same error message as Hannah:
                  note: multiple positive outcomes within groups encountered.
                  4,874 (group size) take 1,541 (# positives) combinations results in numeric overflow; computations cannot proceed
                  r(1400);
                  Can you advise: if I use the model below, without ,fe, does it mean that I have only a random effect? Since this model gives me some output, why did you suggest the version with ,fe rather than this one?
                  Am I missing something if I run the model without ,fe?
                  Code:
                  xtset S002
                  xtlogit charity_participation i.generation sex i.income_class i.educ_level happiness char_conf marital_status i.religion i.S003

                  Thanks,
                  Anna



                  • #10
                    Can I use command margins after melogit to predict coefficient for generational groups?
                    I'm not sure what you have in mind here. -margins- does not predict coefficients. It calculates predicted values and it calculates marginal effects. Marginal effects are, in a sense, similar to coefficients (and in the case of linear models are equivalent to them) but are not exactly that. In any case if you want to see the predicted level of charity participation in each generation, yes you can get that with:

                    Code:
                    margins generation
                    If you want to see the average marginal effect in each generation of, say, happiness, on charity participation:
                    Code:
                    margins generation, dydx(happiness)
                    If you want to see the marginal effect in each generation of happiness conditional on specific values of other model variables, just add the corresponding -at()- option to the immediately preceding code. See -help margins-.
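
                    For instance, a minimal sketch that holds sex at a hypothetical value of 1 (substitute whatever coding your data actually use):

                    Code:
                    * average marginal effect of happiness by generation, with sex fixed at 1
                    margins generation, dydx(happiness) at(sex=1)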

                    Code:
                    xtset S002
                    xtlogit charity_participation i.generation sex i.income_class i.educ_level happiness char_conf marital_status i.religion i.S003
                    gives you a random effects model with random intercepts at the S002 level. Since S002 is the wave variable and it only has 2 levels in your data, this model doesn't really make much sense. Two levels are simply not an adequate sample of wave-space to support modeling random effects at that level.

                    As for the model that is giving you numerical overflow problems, I don't see any good way around that for you. I also don't quite get what you are trying to do with that model. Why you would use country as an unconditional fixed effect and then condition the analysis on wave fixed effects (where, again, there are only two waves) eludes me. I'd be much more inclined to condition on the country variable and include an unconditional wave effect
                    Code:
                    xtset S003
                    xtlogit charity_participation ... i.wave, fe
                    This is a somewhat different approach to modeling, but I think it's more sensible in its own right, and I would prefer it even if there were no computational difficulties with the other.



                    • #11
                      Clyde Schechter Dear Clyde,

                      Thank you for your detailed clarification regarding -margins- command.

                      Regarding the random effects model, sorry, it was a typo.

                      I definitely meant:
                      Code:
                      * S003 is the country variable, S002 is the wave variable
                      xtset S003
                      xtlogit charity_participation i.generation sex i.income_class i.educ_level happiness char_conf marital_status i.religion i.S002, fe
                      and as a result I got the error message:
                      note: multiple positive outcomes within groups encountered.
                      4,874 (group size) take 1,541 (# positives) combinations results in numeric overflow; computations cannot proceed
                      r(1400);
                      For this reason I was wondering if I run the model without fe (which gives me some output):
                      Code:
                      * S003 is the country variable, S002 is the wave variable
                      xtset S003
                      xtlogit charity_participation i.generation sex i.income_class i.educ_level happiness char_conf marital_status i.religion i.S002
                      Does this model then make any sense? As I assume that my -melogit- model and the model I tried to run with -xtset country- and -xtlogit ... i.wave, fe- can be interpreted differently, I wanted to include both models, but as you can see I cannot get any usable results from the random model. That is why I would very much appreciate it if you could advise whether I can focus only on the melogit results or whether I can do something with the random model to get some output.

                      Thanks!

                      BR,
                      Anna
                      Last edited by Anna Petrova; 08 Feb 2019, 11:58.



                      • #12
                        Code:
                        * S003 is the country variable, S002 is the wave variable
                        xtset S003
                        xtlogit charity_participation i.generation sex i.income_class i.educ_level happiness char_conf marital_status i.religion i.S002
                        is, to be clear, a model with random country effects and fixed wave effects. It is, in fact, the exact same model as your -melogit- model. The results may differ slightly due to different numerical algorithms used to estimate the coefficients, but they are otherwise identical.
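
                        If you want to check that for yourself, a sketch along these lines puts the two sets of estimates side by side. It assumes the same right-hand side is used in both commands (your earlier -melogit- call had i.marital_status where the -xtlogit- call has marital_status), and the names rhs, xt_re and me_re are made up. Run it as one block from a do-file so the local macro is available throughout.

                        Code:
                        * identical right-hand side for both commands
                        local rhs i.generation sex i.income_class i.educ_level happiness char_conf marital_status i.religion i.S002

                        xtset S003
                        xtlogit charity_participation `rhs', re
                        estimates store xt_re

                        melogit charity_participation `rhs' || S003:
                        estimates store me_re

                        * coefficients and standard errors from both fits
                        estimates table xt_re me_re, b(%9.4f) se(%9.4f)

                        The two commands report the random-intercept variance in different forms, so compare the covariate rows first.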

                        While I understand your desire to also do a model with fixed country effects, evidently that is not possible with your data. As pointed out in https://www.statalist.org/forums/for...sage-in-clogit, you probably won't have any better luck with any other statistical package because the estimation requires the calculation of a number that cannot be represented even in quadruple precision.



                        • #13
                          Thank you for your answer!

                          Just to be clear, I think I have some blind spots regarding the difference between country random and country fixed effects... Are their interpretations completely different?
                          If I keep the model with random country effects, how can I interpret it then?
                          Last edited by Anna Petrova; 08 Feb 2019, 12:22.



                          • #14
                            Yes, the fixed and random effects models are quite different. I will highlight some of the differences here, but really you should refer to a good statistics or econometrics textbook for a deeper understanding than can be given in a short post.

                            The fixed-effects estimators estimate within-panel effects. So a fixed effects model in your situation would answer questions like "among the people in a given country, how strongly is education level associated with charity participation?" Fixed effects models cannot estimate the effects of any variable that is constant within panels. So, for example, if you were to try to include some time-invariant country-level variable in a fixed-effects model, such as the official language of the country, that variable would be automatically omitted by Stata (or any other statistical package) because it cannot be estimated. And, because the estimation of the model requires conditioning on the panel fixed effects themselves, these models cannot estimate the actual probability of charity participation in each country.

                            The random-effects model is rather different. It does not estimate a within-panel effect. Its results reflect an assumption that the within- and between- panel effects of a variable are the same and gives the estimate of that common effect.* Because this assumption isn't always true, random-effects models can be misleading. Random-effects models also do not always provide consistent estimates for their parameters; they do so only if the error terms are independent of all the covariates in the model. On the positive side, random effects models have no difficulty with variables that are constant within panel, so you would be able to explore the effects of variables like the country's official language, and there is no problem estimating the predicted probability of charity participation in each country. Random effects models are also more flexible in that it is possible to create multi-level random effects models, whereas fixed-effects models allow only 2 levels.

                            *For example, there are many things that are different about married people and unmarried people, but getting married or divorced doesn't necessarily change them within person. To see a clear example of data where the within- and between- panel effects are very different, indeed, go in opposite directions, run this:

                            Code:
                            clear
                            set obs 5
                            gen panel_id = _n
                            expand 2
                            
                            set seed 1234
                            by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
                            by panel_id: gen x = panel_id + _n
                            
                            xtset panel_id 
                            
                            xtreg y x, fe
                            xtreg y x, re
                            
                            //    GRAPH THE DATA TO SHOW WHAT'S HAPPENING
                            separate y, by(panel_id)
                            
                            graph twoway connect y? x || lfit y x
                            and study the -xtreg, fe- and -xtreg, re- outputs as well as the graph.
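
                            If you want the two effects estimated separately in a single model, one option (a different technique from plain -xtreg, re-, sometimes called a within-between or hybrid specification) is to split x into its panel mean and the deviation from that mean. A self-contained sketch on the same toy data, with x_mean and x_dev as made-up names:

                            Code:
                            clear
                            set obs 5
                            gen panel_id = _n
                            expand 2
                            set seed 1234
                            by panel_id, sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
                            by panel_id: gen x = panel_id + _n
                            * panel mean of x and the within-panel deviation from it
                            by panel_id: egen x_mean = mean(x)
                            gen x_dev = x - x_mean
                            xtset panel_id
                            * x_dev carries the within effect, x_mean the between effect
                            xtreg y x_dev x_mean, re

                            By construction of this toy data the within slope is about -1 and the between slope about 4, apart from the random noise, so the two coefficients should have opposite signs.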



                            • #15
                              Dear Clyde, your answer is very helpful.

                              I can see now that FE of course makes more sense for me... Interestingly, when I run the model
                              Code:
                              xtset country
                              xtlogit y x1 x2..., fe
                              using only the data from one wave, I get sensible output.

                              But including more waves leads to the error message about multiple positive outcomes within groups.


                              BR,
                              Anna
