
  • logit with fixed effects - almost all observations dropped

    Hey everyone,

    I have a problem with a logistic regression analysis. This is the first time I have worked with Stata, so maybe this is quite an easy question.

    My research question is which variables influence students' preferences for small or large companies after graduation. I have a panel dataset; the survey was conducted five times.

    I identified several characteristics that describe small (UG_W = 0) and large (UG_W = 1) companies. These attributes are career possibilities, social relationships and compensation.
    My regression looks as follows:

    . xtlogit UG_W career_pos_W social_rel_W high_compensation_W, fe

    When I run the regression, almost all of my observations are dropped and I am left with 31 observations, which is obviously not enough.
    Stata says:

    note: multiple positive outcomes within groups encountered.
    note: 130 groups (301 obs) dropped because of all positive or all negative outcomes.

    What exactly does this mean? How can I solve this problem? Which options do I have?

    Thank you!

  • #2
    Technically, Stata is saying that 130 of your panel units (students, I guess from here), accounting for 301 observations, prefer either a small company at every occasion/repeated measurement (interview, I would guess) or a large company at every occasion. In a fixed-effects logistic regression model, you cannot use panel units that have no variation on y (i.e., your left-hand-side variable/outcome/response/dependent variable ...).
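    To make the dropping rule concrete: in a conditional (fixed-effects) logit, a unit whose outcome is 0 at every wave or 1 at every wave contributes nothing to the likelihood, so it is excluded. The Python sketch below is only an illustration of that rule, not Stata's code; the function name `split_groups` and the toy data are hypothetical.

```python
from collections import defaultdict


def split_groups(records):
    """Sort panel units into those a fixed-effects logit can use and those it drops.

    records: iterable of (unit_id, y) pairs with y in {0, 1}.
    Returns (kept_ids, dropped_ids, number_of_dropped_observations).
    """
    outcomes = defaultdict(list)
    for unit_id, y in records:
        outcomes[unit_id].append(y)

    kept, dropped = [], []
    dropped_obs = 0
    for unit_id, ys in outcomes.items():
        # All 0s or all 1s: no within-unit variation, so the unit adds
        # nothing to the conditional (fixed-effects) likelihood.
        if min(ys) == max(ys):
            dropped.append(unit_id)
            dropped_obs += len(ys)
        else:
            kept.append(unit_id)
    return kept, dropped, dropped_obs


# Hypothetical toy panel: (student id, preference 0 = small / 1 = large) per wave
panel = [
    (1, 1), (1, 1), (1, 1),  # always large -> dropped
    (2, 0), (2, 0),          # always small -> dropped
    (3, 0), (3, 1), (3, 1),  # switches -> kept
]
kept, dropped, n_obs = split_groups(panel)
print(kept, dropped, n_obs)  # [3] [1, 2] 5
```

    Only student 3 would enter the estimation here, which is the same pattern behind "130 groups (301 obs) dropped".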

    If the problem is not due to coding errors, your options seem to be a random-effects model or a linear probability model. But you may want to tell us a little bit more about your data, e.g., what the panel units are, what the occasions are, etc.

    Best
    Daniel



    • #3
      You may be stuck, unable to run the models you want. Fixed-effects models depend on there being variation within each higher-level unit of analysis. If there is no variation within a company's observations (assuming that's your level two), it can't be used in the model. Realistically, when you think about it, not a whole lot of companies would exhibit variation on UG_W, since small companies tend to stay small, and large companies tend to stay large.

      For that matter, I think you have your causality backwards: are you really trying to predict company size? Or are you trying to predict things like career possibilities and social relationships with company size?

      *=========== added after comments by others ===========
      Ah, now I understand your unit of analysis. Well, most students seem to have stuck with a preference for large or small companies, thus no variation. So it's the same issue I and others describe, but my comment about reversing the causality is now moot. Still can't run the model without within-student variation.
      Last edited by ben earnhart; 20 Nov 2014, 06:19.



      • #4
        The question the students had to answer in every survey was: "Which size of company would you prefer to work for after your graduation?" Therefore, it is about the students' preferred company size.
        Based on the literature, I linked the attributes to company size, and in the next step I want to analyse whether the students link these attributes with the "right" company size.



        • #5
          Based on the literature, I linked the attributes to company size, and in the next step I want to analyse whether the students link these attributes with the "right" company size.
          Sorry, I do not fully follow this. Were the students only asked to state which company size they would prefer? How can you "link" this to attributes the students did not rate?

          Best
          Daniel



          • #6
            They did rate the attributes, but separately from company size. So I know which attributes are important to them and which company size they prefer. I want to know whether my assumptions about the link between the attributes and company size are supported.



            • #7
              OK, so, taking your example, is career_pos_W the student's rating of how important career possibilities are? In that case, your model seems adequate. Unfortunately, my explanation still holds: if there is no coding error, you cannot use the fixed-effects logit model.

              Best
              Daniel



              • #8
                I am sure that there is no coding error,
                so I will use a random-effects model:

                logit UG_W career_pos_W social_rel_W high_compensation_W

                And as there is no variation in y, it is unnecessary to add variables like age or sex.



                • #9
                  And as there is no variation in y, it is unnecessary to add variables like age or sex.
                  Exactly the opposite. As (almost) all the variation comes from between-student comparisons, you want to make sure to control for all the factors that vary between students (like age and sex).

                  BTW, also note that logit corresponds more to a pooled model. The RE model would be xtlogit, re. If you use the former, you want to make sure to correct the standard errors for clustering within students.

                  Best
                  Daniel
                  Last edited by daniel klein; 20 Nov 2014, 06:47.



                  • #10
                    Paul Allison has an excellent and inexpensive book on fixed effects regression models:

                    http://www.amazon.com/Effects-Regres...dp/0761924973/

                    He has good discussions of the merits of fixed effects vs random effects.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    Stata Version: 17.0 MP (2 processor)

                    WWW: https://www3.nd.edu/~rwilliam



                    • #11
                      So, if I wanted to use a pooled model, and since there is no variation in y, would it be a reasonable alternative to convert the panel data to a cross-sectional dataset and run a regression on that?

                      How would I test for autocorrelation after a logit model? With "estat dwatson"?



                      • #12
                        How can I convert panel data to cross-sectional data?
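                        One common way to get one row per student is to average each student's values over the waves; in Stata, -collapse (mean) varlist, by(idvar)- does this kind of aggregation. The Python sketch below only illustrates the idea on made-up data; the function name `collapse_mean` and the `id` key are hypothetical, and whether the mean is the right summary for these ratings is an assumption you would need to justify. Note that a student whose preference switched ends up with a fraction rather than a 0/1 outcome.

```python
from collections import defaultdict
from statistics import mean


def collapse_mean(rows, id_key, value_keys):
    """Collapse long-format panel rows to one row per unit by averaging value_keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_key]].append(row)

    out = []
    for unit_id, obs in groups.items():
        summary = {id_key: unit_id}
        for key in value_keys:
            # Mean over all waves observed for this unit
            summary[key] = mean(r[key] for r in obs)
        out.append(summary)
    return out


# Hypothetical long-format panel: one row per student and wave
panel = [
    {"id": 1, "wave": 1, "UG_W": 1, "career_pos_W": 4},
    {"id": 1, "wave": 2, "UG_W": 1, "career_pos_W": 5},
    {"id": 2, "wave": 1, "UG_W": 0, "career_pos_W": 2},
    {"id": 2, "wave": 2, "UG_W": 1, "career_pos_W": 3},
]
for row in collapse_mean(panel, "id", ["UG_W", "career_pos_W"]):
    # student 1: UG_W 1, career_pos_W 4.5; student 2: UG_W 0.5 (switcher), career_pos_W 2.5
    print(row)
```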



                        • #13
                          Katharina:
                          why do you prefer a pooled -logit- model to -xtlogit, re-, as Daniel suggested?
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)



                          • #14
                            As I gave my question/hypothesis some thought, I realized that I simply want to know whether the students link the attributes to the "right" company size. At this point (as this is only one of many questions), I am not interested in how it changes over time, but in whether the theoretical links are supported by the data.



                            • #15
                              Katharina:
                              admittedly, I do not follow you in full.
                              You seem to have repeated measurements (5 times) on the same sample of students with a binary dependent variable; hence a panel model would be the right choice.
                              As within-student variation is null (i.e., students' answers do not vary across the 5 measurements), you will run out of luck with an -xtlogit, fe- specification; hence an -xtlogit, re- or a pooled -logit, vce(cluster idstudent)- would be worth trying.
                              However, the results of the latter may differ from those of the -xtlogit, re- specification.
                              You can assess whether this holds for your model by looking at the likelihood-ratio test (of rho = 0) that appears as a footnote to the -xtlogit, re- output table.
                              You may want to take a look at -help xtlogit- and the related entry in the Stata 13.1 .pdf manual.
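                              The likelihood-ratio test in the -xtlogit, re- footnote checks whether the panel-level variance (rho) is zero; because rho = 0 sits on the boundary of the parameter space, Stata reports the statistic as chibar2(01), a 50:50 mixture of a point mass at zero and a chi-squared with 1 df. Stata computes the p-value for you; the Python sketch below (hypothetical helper name `chibar2_01_pvalue`) only shows how that p-value follows from the statistic. A small p-value suggests the panel-level variance matters, so the pooled and RE estimators would differ.

```python
import math


def chibar2_01_pvalue(lr_stat):
    """P-value of an LR test of a variance parameter on its boundary (rho = 0).

    Under the null, the statistic follows a 50:50 mixture of a point mass at 0
    and a chi-squared with 1 df, so p = 0.5 * P(chi2_1 >= lr_stat).
    """
    if lr_stat <= 0:
        # A statistic of 0 gives no evidence against rho = 0
        return 1.0
    # Survival function of chi-squared with 1 df: P(X >= x) = erfc(sqrt(x / 2))
    return 0.5 * math.erfc(math.sqrt(lr_stat / 2.0))


print(chibar2_01_pvalue(2.706))  # about 0.05
```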
                              Kind regards,
                              Carlo
                              (Stata 18.0 SE)

