  • Problem with variables for logit model

    Hi there everyone!
    I am a beginner with Stata and I am supposed to prepare a logit model that shows how satisfaction with life and work experience influences the willingness to migrate to another country. The problem is that my explanatory variables are coded from 1 to 8 (1 = utterly satisfied, 2 = slightly satisfied, 3 = neutral attitude, 4 = slightly dissatisfied, 5 = dissatisfied, 6 = completely disappointed, 7 = no opinion, 8 = missing observation). I want to replace values 1 to 3 with 1 and values 4, 5 and 6 with 0, and get rid of observations with values 7 and 8. Once I have done that, I will have 0/1 variables that are perfect for a logit model. How am I supposed to do that? I do not even know which command to use, so I would be really grateful if someone could give me a clue.

  • #2
    It sounds like you want to use satisfaction as a variable that explains whether or not someone migrates, i.e. migration is the explained/dependent/response/left-hand-side/y-variable and satisfaction is the explanatory/independent/right-hand-side/x-variable. In that case you don't need to make that variable binary (0,1) to run a logistic regression. Actually, you probably don't want to do that, as that way you are throwing away a lot of information.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      I agree with Maarten. If 7 and 8 are missing data codes, you should change them via something like

      replace var1 = . if var1 >= 7

      or else maybe

      replace var1 = .a if var1 == 7
      replace var1 = .b if var1 == 8

      If you really really really really want to stick with your original plan, do -help recode- .
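
      For reference, a minimal sketch of what the recode route could look like. The name var1 is the placeholder used above, var1_bin is a hypothetical name for the new variable, and the 1-3 / 4-6 grouping is the one proposed in the first post.

      ```stata
      * Collapse the 1-6 scale to 0/1 in a new variable and leave var1 untouched.
      * Anything outside 1-6 (7, 8, or an existing missing code) becomes missing.
      recode var1 (1/3 = 1) (4/6 = 0) (else = .), generate(var1_bin)

      * Check the mapping against the original coding
      tabulate var1 var1_bin, missing
      ```

      Writing to a new variable keeps the original scale available in case the binary version turns out to discard too much information.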
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam


      • #4
        Of course I do not want to lose any information, so it is great that there is a way to run a logistic model without transforming the variables into binary ones. However, I still have some problems. Maybe it is best if I describe my model in full:

        - dependent variable (willingness to leave one's home country in order to get a job): binary (1 stands for "yes", 0 stands for "no");
        - independent variables (satisfaction with your life, satisfaction with the situation in your country, satisfaction with your income): scale from 1 to 8 as presented in the first post;
        - additional variables of different types, or at least with different ranges of answers: for example income (a number such as $5000), bonds with your family (range from 1 to 4), type of household (range from 1 to 5, where 1 stands for a single man/woman and 5 stands for a family with 3 kids).

        So the problem is: how do I "tell" Stata that there are a lot of different types of variables and ranges, which should be treated differently (because I assume that they should be, somehow)? Moreover, I want to get rid of observations where the dependent variable has no response (this survey question was not answered). I have to prepare this model and a report based on it for my studies, but this is my first contact with Stata, so I am completely lost. I barely managed to import data from Excel using the command line (which I think reflects my lack of experience). So I kindly ask: is it really hard to do this in Stata, and if it is possible, which command should I use to fit a logistic model and analyse the data?


        • #5
          By default, Stata will do listwise deletion of missing data. So, make sure that missing data have codes like ., or .a, .b, etc. I showed you how to do that above.

          By default, Stata is going to treat your independent variables as continuous. If that is inappropriate, you probably want to break the variables up into dummies. So, for example, with household type, you might want to do something like

          logit y income i.household

          Sometimes categorical variables are ordinal and it is debatable whether they should be treated as continuous or categorical. You can do formal tests like

          logit y income i.household c.household
          testparm i.household

          If the test stat is insignificant that is a sign you can go ahead and treat the variable as continuous.
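
          Another way to make the same comparison is a likelihood-ratio test between the two specifications. This is only a sketch using the same placeholder names (y, income, household); the stored-estimate names linear and categorical are arbitrary.

          ```stata
          * Model treating household as continuous (a single slope)
          logit y income c.household
          estimates store linear

          * Model treating household as categorical (one dummy per category)
          logit y income i.household
          estimates store categorical

          * Likelihood-ratio test: the linear model is nested in the categorical one
          lrtest linear categorical
          ```

          A nonsignificant result here points in the same direction as a nonsignificant testparm: the categorical coding does not fit noticeably better than the single slope.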

          I suspect your problem is not so much Stata as it is understanding exactly how logistic regression works, i.e. questions about what the independent variables can be like go well beyond Stata or logistic regression. I would suggest checking out some introductory stats texts, or even the Stata help for the various commands.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam


          • #6
            OK, that is pretty understandable; thank you very much for your kindness and patience. Indeed, I have many problems with understanding how logistic regression works; I have only just begun my studies in econometrics, but I am eager to work. Have I understood correctly that Stata will drop the empty observations by default? P.S. Just one more question: when I try to run the code that you proposed (replace var1 = . if var1 >= 7), I get the "type mismatch" error. In my case I typed (replace sat_fin= . if sat_fin=7) and (replace sat_fin= .if sat_fin=-8). Sorry for my previous mistake: -8, not 8, stands for a missing observation. The brackets contain exactly what I typed into Stata.


            • #7
              Note that you need 2 equals signs, i.e. the code should be

              Code:
              replace sat_fin= . if sat_fin==7
              replace sat_fin= . if sat_fin==-8
              The type mismatch error might also indicate that you have string variables that you are trying to treat as numeric. See -help encode- for how to fix that.
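
              As a rough sketch of that fix, assuming sat_fin is the affected variable (sat_fin_num is a hypothetical name). The two conversions are alternatives, so run only the one that matches how the strings look.

              ```stata
              * Check the storage type; str# means the variable is a string
              describe sat_fin

              * Alternative A: the strings hold digits such as "1", "7", "-8"
              destring sat_fin, replace

              * Alternative B: the strings hold text categories
              * encode creates a new labeled numeric variable
              encode sat_fin, generate(sat_fin_num)
              ```

              Note that encode numbers the categories 1, 2, 3, ... in sorted order, so for strings that already contain digits destring is the safer choice because it keeps the original values.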
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam


              • #8
                OK, that helped a lot. It turned out that the main problem was string variables, but I managed to convert them to numeric. Your answers have cleared up a lot for me and everything is now more comprehensible. However, now that I think I understand a little of how things work, I have another (maybe stupid) question. If I run a logistic model and do not recode the variables to binary ones, wouldn't it give false estimates of the coefficients? To be more specific: 1, which stands for completely satisfied, would be treated as six times worse than 6, which stands for completely dissatisfied. Am I thinking about this correctly? Or can I simply run code like: logit var1 (the dependent variable) var2 and so on, and everything will be OK?


                • #9
                  logit wyjazd1 sat_kraj1 sat_fin1 sat_perspektywy1 zarobki1 liczba_rodzina1 wiezi1


                  I tried to run the command above and got the following error message:

                  outcome does not vary; remember:
                  0 = negative outcome,
                  all other nonmissing values = positive outcome
                  r(2000);

                  What does it mean? Is something wrong again with variables?


                  • #10
                    wyjazd1 should be coded as 0 or 1. If, say, it is coded as 1 or 2, you will get errors like what you are seeing.

                    At least, 0/1 is the common coding. As the error message implies, it could be 0 and any other nonmissing value, but 0/1 is most common.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    WWW: https://www3.nd.edu/~rwilliam


                    • #11
                      The error message tells you exactly what's wrong with the variable. Your outcome is wyjazd1 and it will be treated as a binary variable where zero is a negative outcome and all other values are positive outcomes (0=0; everything else=1). Currently it appears that for your regression sample wyjazd1 is constant. This may be because it's constant in your entire dataset or it may be because one of your explanatory variables is always missing when wyjazd1 takes on one value or the other. It's also possible that it's coded wrong. For example perhaps it's coded as 1=positive outcome and 2=negative outcome.

                      I would start by tabulating wyjazd1. If it's coded correctly and there's variability, I would crosstab wyjazd1 with each of the other variables in your model.
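
                      Those checks could be sketched as follows, using the variable names from the logit command in #9. The recode at the end rests on an assumption about what the codes mean, so verify it against the questionnaire first.

                      ```stata
                      * Distribution of the outcome, including missing values
                      tabulate wyjazd1, missing

                      * Cross-tabulate against one regressor to spot empty cells
                      * or a regressor that is missing for one outcome value
                      tabulate wyjazd1 sat_kraj1, missing

                      * If the outcome turns out to be coded 1/2, move it to 0/1.
                      * Assumption: 1 = yes and 2 = no.
                      recode wyjazd1 (2 = 0)
                      ```

                      Repeating the cross-tabulation for each remaining regressor shows whether listwise deletion leaves only one outcome value in the estimation sample.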


                      • #12
                        Yes, it was coded 1 and 2, so once again thank you very much; I appreciate your patience. I fixed it. Coming back to my previous question: if I run my logistic model without recoding the variables with values from 1 to 6 (1 = completely satisfied, 6 = completely dissatisfied), wouldn't I get false estimates of the coefficients? It would mean that 1 is six times worse than 6, but the values are not comparable in that way, because we do not even know what the distance between 1 and 2 is.


                        • #13
                          First off, "1 is 6 times worse than 6" would only hold if the variable had ratio-level measurement, i.e. a non-arbitrary 0 point. For example, it is legit to say that 6 feet is twice as much as 3 feet, but it is not legit to say that 80 degrees Fahrenheit is twice as much as 40 degrees Fahrenheit. It is sufficient for variables to have interval-level measurement, which would mean that the distance between 1 and 2 is, say, the same as the distance between 5 and 6.

                          It may be debatable whether your variable has interval-level measurement. With Likert scales, the assumption may be more or less reasonable, but with other types of ordinal coding it may be highly dubious, e.g. 0 = no income, 1 = $1 to $500, 2 = $501 - $1500, 3 = $1501 to $5,000, etc.

                          I already addressed this issue earlier though when I said

                          Sometimes categorical variables are ordinal and it is debatable whether they should be treated as continuous or categorical. You can do formal tests like

                          logit y income i.household c.household
                          testparm i.household

                          If the test stat is insignificant that is a sign you can go ahead and treat the variable as continuous.
                          If you have a lot of ordinal independent variables your model can get really cluttered if you treat them all as categorical, plus you lose the advantage of the ordering of the data. The parsimony of having a simpler model is often worth a trivial violation of assumptions.
                          -------------------------------------------
                          Richard Williams, Notre Dame Dept of Sociology
                          StataNow Version: 19.5 MP (2 processor)

                          WWW: https://www3.nd.edu/~rwilliam
