
  • Logistic regression with Dummy variables

    Dear all,

    I'm looking at World Values Survey data on Ukraine. My hypothesis is that several variables in the survey have a causal relationship with survey question Q151: "Would you be willing to fight for your country?". "Willingness" will therefore be my dependent variable. It is coded with five values: Yes, No, Don't know, No answer and Missing.
    The independent variables I would like to use are mostly categorical, but some are numeric (region, age, place of birth, and whether security is more important than freedom). I believe a logistic regression is the appropriate model here. To run one, I think I should recode the dependent variable into a binary variable and convert the categorical independent variables into dummy variables.
    My question is whether a logistic regression is possible in this case and whether dummy variables will be useful. I'm fairly new to this field, so any help would be appreciated. Let me know if you need more information.
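A minimal sketch of both steps, assuming simulated data in place of the WVS file. The raw variable `q151` is a hypothetical name, and its coding (1 = yes, 2 = no, negative codes for don't know / no answer) only loosely follows the post. It is collapsed into a 0/1 outcome, and Stata's factor-variable notation (`i.`) builds the region and sex dummies on the fly, so generating them by hand is optional. Names, codes and sample size are illustrative only.

```stata
* Minimal sketch on simulated data; variable names and codes are hypothetical
clear
set seed 12345
set obs 1000

* Hypothetical predictors: 8 regions, sex (0/1), age in years
generate region = runiformint(1, 8)
generate sex    = runiformint(0, 1)
generate age    = runiformint(18, 80)

* Hypothetical raw answer: 1 = yes, 2 = no, -1 = don't know, -2 = no answer
generate q151 = cond(runiform() < 0.7, 1, 2)
replace  q151 = -1 if runiform() < 0.15
replace  q151 = -2 if runiform() < 0.05

* Binary outcome: 1 = yes, 0 = no; every other code is set to missing
generate willing = (q151 == 1) if inlist(q151, 1, 2)

* i. creates the dummies internally and uses the lowest level as the base
logit willing i.region i.sex c.age
```

Observations with a missing outcome are dropped by -logit- automatically, which is the listwise deletion implied by treating "Don't know" and "No answer" as missing.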

    Thank you very much; I look forward to your answers.



    PS: the data was extracted from this page:

    https://www.worldvaluessurvey.org/WV...ntationWV7.jsp

    And the codes of the variables can be found here:

    https://www.worldvaluessurvey.org/WV...ntationWV7.jsp

    under WVS 7 Codebook Variable Report.
    Last edited by Eduardo Jurado; 28 Nov 2022, 04:25.

  • #2
    Eduardo:
    I'd take a look at -mlogit- and -svy estimation-.
    In addition, as per the FAQ, please do not post screenshots. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)
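As a hedged sketch of these two suggestions, the code below fits -mlogit- to a three-category outcome (yes / no / don't know), so that "don't know" answers are modelled rather than dropped, and then refits the same model under the -svy- prefix. The data are simulated, and `wt` is a hypothetical sampling weight standing in for whatever weight variable the WVS 7 release provides; check the codebook before relying on it.

```stata
* Minimal sketch on simulated data; names, codes and weights are hypothetical
clear
set seed 2022
set obs 1000

generate region = runiformint(1, 8)
generate sex    = runiformint(0, 1)

* Three-category outcome: 1 = yes, 2 = no, 3 = don't know
generate u     = runiform()
generate fight = cond(u < 0.60, 1, cond(u < 0.85, 2, 3))

* Multinomial logit with "yes" as the base outcome: one set of
* coefficients for "no" vs "yes" and one for "don't know" vs "yes"
mlogit fight i.region i.sex, baseoutcome(1)

* Hypothetical probability weight; substitute the survey's own weight variable
generate wt = runiform(0.5, 1.5)
svyset [pweight = wt]

* Same model, now with survey-weighted (linearized) standard errors
svy: mlogit fight i.region i.sex, baseoutcome(1)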



    • #3
      Thank you very much for your insight. I'll look into it right away and let you know how it goes.


      Kind regards,

      Eduardo



      PS: Sorry for the screenshots. They have been removed.



      • #4
        Dear Carlo,

        I've recoded my dependent variable, "willingness to fight for your country", so that it contains only 0 and 1, where 0 is "no" and 1 is "yes", so that I can run a binary logit regression.
        The independent variables I've used are:
        • regions: converted into dummy variables with the gen() option of -tabulate-.
        • free_sec: "preference between freedom and security". I recoded the variable to contain only 0, 1 and 2 (don't know, freedom and security, respectively).
        • sex: 0 = male and 1 = female.
        • language: "language spoken at home", coded as 0 = Other, 1 = Russian and 2 = Ukrainian.
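In the -logit- call further down, free_sec and language enter as plain numbers, so each gets a single slope. That treats 0/1/2 as equally spaced points on one scale, which is hard to justify when 0 means "don't know" (free_sec) or "other language" (language). A hedged sketch, on simulated data that only mimic the post's codings, of entering them as categorical with `i.` and comparing the two codings with a likelihood-ratio test:

```stata
* Minimal sketch on simulated data; codings mimic the post, values are random
clear
set seed 4410
set obs 1000

generate free_sec = runiformint(0, 2)   // 0 = don't know, 1 = freedom, 2 = security
generate language = runiformint(0, 2)   // 0 = other, 1 = Russian, 2 = Ukrainian
generate sex      = runiformint(0, 1)
generate willing  = runiform() < 0.7

* Linear coding: one slope each, equal spacing between 0, 1 and 2 assumed
logit willing i.sex free_sec language
estimates store linear

* Categorical coding: a separate coefficient for each non-base level
logit willing i.sex i.free_sec i.language
estimates store categorical

* Joint Wald test that all language categories share the same odds
testparm i.language

* LR test: the linear coding is a constrained version of the categorical one
lrtest linear categorical
```

The categorical model spends two extra parameters (one per variable), so the LR test has 2 degrees of freedom.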
        I've run -logit- and got results, but I have a few questions.
        • Regarding the "regions" dummy variables: when I run -logit- with "regions*" (note the *), I get the message "note: regions8 omitted because of collinearity". I understand this is the dummy-variable trap, but would the final result differ if I excluded a different dummy, such as regions1, instead?
        • Regarding the reference category: I chose sex = 0 (male) as the base category. Does that make sense?
        • In general, I'm quite new to Stata. Do you see any flaws in the methodology?
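On the regions question: dropping regions1 instead of regions8 only changes which region the remaining coefficients are compared against. The log likelihood and every predicted probability stay the same. A minimal check on simulated data (the coefficients in `invlogit()` are arbitrary), using factor-variable notation, where `ib1.` and `ib8.` set the base region:

```stata
* Minimal sketch on simulated data; coefficients are arbitrary
clear
set seed 8
set obs 1289

generate regions = runiformint(1, 8)
generate sex     = runiformint(0, 1)
generate willing = runiform() < invlogit(1 - 0.2*regions - 0.7*sex)

* Base = region 1, equivalent to entering regions2-regions8
logit willing i.sex ib1.regions
scalar ll_base1 = e(ll)
predict p_base1, pr

* Base = region 8, equivalent to entering regions1-regions7
logit willing i.sex ib8.regions
scalar ll_base8 = e(ll)
predict p_base8, pr

* The region coefficients differ between the two fits, but these do not
display scalar(ll_base1) - scalar(ll_base8)
summarize p_base1 p_base8
```

The same reasoning applies to b0.sex: choosing the base category changes the interpretation of the coefficients, not the fit.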


        Code:
        . keep N_REGION_WVS Q150 Q151 Q260 Q272 Q266 Q289CS9 Q289
        
        . rename (N_REGION_WVS Q150 Q151 Q260 Q266 Q272 Q289 Q289CS9) (regions free_sec willing sex c_birth language rel rel_d)
        
        . replace willing=. if willing==-1
        (212 real changes made, 212 to missing)
         
        . replace willing=. if willing==-2
        (12 real changes made, 12 to missing)
        
        . replace willing=0 if willing==2
        (323 real changes made)
        
        . tab willing
        
               Willingness to fight for country |      Freq.     Percent        Cum.
        ----------------------------------------+-----------------------------------
                                              0 |        323       30.33       30.33
                                              1 |        742       69.67      100.00
        ----------------------------------------+-----------------------------------
                                          Total |      1,065      100.00
        
        
        . replace sex=sex-1
        (1,289 real changes made)
        
        . tab sex
        
                                            Sex |      Freq.     Percent        Cum.
        ----------------------------------------+-----------------------------------
                                              0 |        524       40.65       40.65
                                              1 |        765       59.35      100.00
        ----------------------------------------+-----------------------------------
                                          Total |      1,289      100.00
        
        . replace language=. if language==-1
        (2 real changes made, 2 to missing)
         
        . replace language= 0 if language==9000
        (15 real changes made)
        
        . replace language= 1 if language== 3630
        (505 real changes made)
        
        . replace language= 2 if language== 4410
        (767 real changes made)
        
        
        . tab language
        
        Language at |
               home |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |         15        1.17        1.17
                  1 |        505       39.24       40.40
                  2 |        767       59.60      100.00
        ------------+-----------------------------------
              Total |      1,287      100.00
        
        . tab regions, gen(regions)
        
             Region |
            country |
           specific |      Freq.     Percent        Cum.
        ------------+-----------------------------------
         UA: West 1 |        192       14.90       14.90
         UA: West 2 |        142       11.02       25.91
           UA: Kiyv |        123        9.54       35.45
          UA: South |        125        9.70       45.15
          UA: North |        140       10.86       56.01
         UA: East 1 |        238       18.46       74.48
         UA: East 2 |        147       11.40       85.88
         UA: Centre |        182       14.12      100.00
        ------------+-----------------------------------
              Total |      1,289      100.00
        
        
        . replace free_sec=. if free_sec==-2
        (4 real changes made, 4 to missing)
        
        . replace free_sec=0 if free_sec==-1
        (55 real changes made)
        
        . tab free_sec
        
        Freedom and security - |
          Which more important |      Freq.     Percent        Cum.
        -----------------------+-----------------------------------
                             0 |         55        4.28        4.28
                             1 |        385       29.96       34.24
                             2 |        845       65.76      100.00
        -----------------------+-----------------------------------
                         Total |      1,285      100.00
        
        . logit willing b0.sex free_sec language regions2 regions3 regions4 regions5 regions6 regions7 regions8
        
        Iteration 0:   log likelihood = -650.75851  
        Iteration 1:   log likelihood = -607.44338  
        Iteration 2:   log likelihood = -606.65144  
        Iteration 3:   log likelihood = -606.65034  
        Iteration 4:   log likelihood = -606.65034  
        
        Logistic regression                             Number of obs     =      1,062
                                                        LR chi2(10)       =      88.22
                                                        Prob > chi2       =     0.0000
        Log likelihood = -606.65034                     Pseudo R2         =     0.0678
        
        ------------------------------------------------------------------------------
             willing |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 sex |
                  1  |  -.7240608   .1450033    -4.99   0.000    -1.008262   -.4398596
            free_sec |  -.0037337   .1277108    -0.03   0.977    -.2540422    .2465749
            language |  -.3786902   .1760421    -2.15   0.031    -.7237264    -.033654
            regions2 |  -.1281975   .3127152    -0.41   0.682    -.7411081     .484713
            regions3 |    -.98108   .3226218    -3.04   0.002    -1.613407   -.3487529
            regions4 |  -.9396666   .3514683    -2.67   0.008    -1.628532   -.2508013
            regions5 |  -.6172889   .3030486    -2.04   0.042    -1.211253   -.0233245
            regions6 |  -1.835553    .280983    -6.53   0.000     -2.38627   -1.284837
            regions7 |  -1.260257   .3404873    -3.70   0.000      -1.9276   -.5929142
            regions8 |  -1.002767   .2728934    -3.67   0.000    -1.537628   -.4679054
               _cons |   2.810965    .464304     6.05   0.000     1.900946    3.720984
        ------------------------------------------------------------------------------
        
        
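The recoding in the log can also be done with -recode-, which handles several codes in one command and can attach value labels, so that tables show "Russian" rather than 1. A sketch on five made-up rows that reuse the raw codes appearing in the log (1/2 and negative codes for willingness; 3630, 4410 and 9000 for language); the new variable name lang3 is hypothetical.

```stata
* Minimal sketch on five made-up rows that use the raw codes from the log
clear
input willing language
 1 4410
 2 3630
-1 9000
-2   -1
 1 4410
end

* Outcome: 1 = yes stays 1, 2 = no becomes 0, negative codes become missing
recode willing (2 = 0) (min/-1 = .)
label define yesno 0 "No" 1 "Yes"
label values willing yesno

* Language at home recoded into a new, labelled variable (hypothetical name lang3)
recode language (9000 = 0 "Other") (3630 = 1 "Russian") (4410 = 2 "Ukrainian") (min/-1 = .), generate(lang3) label(lang3)

tabulate willing, missing
tabulate lang3, missing
```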
        I hope this makes sense. Let me know if you need more information or explanation.


        Kind regards,

        Eduardo
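The -logit- output in post #4 reports coefficients on the log-odds scale. Exponentiating them gives odds ratios, which are often easier to read. For sex, exp(-0.724) is about 0.48: holding the other covariates fixed, the estimated odds of answering "yes" for women (sex = 1) are roughly half those for men. A minimal sketch of that arithmetic, using only the numbers printed in the table; replaying the fitted model with `logit, or` should report the same odds ratios directly.

```stata
* Coefficient and 95% limits for sex = 1, copied from the output in post #4
scalar b_female  = -.7240608
scalar lo_female = -1.008262
scalar hi_female = -.4398596

* Exponentiate to move from log odds to odds ratios
scalar or_female    = exp(scalar(b_female))
scalar or_lo_female = exp(scalar(lo_female))
scalar or_hi_female = exp(scalar(hi_female))

display "OR (female vs male): " %5.3f scalar(or_female)
display "95% CI: " %5.3f scalar(or_lo_female) " to " %5.3f scalar(or_hi_female)
```

After the fit itself, `margins sex` gives average predicted probabilities by sex, which some readers find easier still to interpret than odds ratios.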



        • #5
          Eduardo:
          I think your approach and your take on the so-called "dummy trap" are correct.
          Kind regards,
          Carlo
          (Stata 19.0)
