  • dealing with nominal variables in OLS regression

    Dear all, I am using data from a national perception survey to find out how corruption is linked with sympathy for anti-government groups.
    I am using OLS regression and also want to control for some demographics, e.g. region and province.
    The region and province variables are nominal, with 8 and 34 unique values, respectively.

    Can I use the following command:
    Code:
    reg sympathy corruption region province
    Or do I have to create dummy variables for each level of the control variables (region and province)?

    Any ideas?

  • #2
    Fahim:
    as -fvvarlist- (factor-variable notation) does all the work on your behalf, try:
    Code:
    reg sympathy corruption i.region i.province
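    To see that factor-variable notation does the same job as hand-made dummies, here is a minimal sketch on Stata's bundled auto dataset (not the survey data). The 5-level categorical variable rep78 stands in for region, and the dummy stub rep_ is an arbitrary illustrative name.

    ```stata
    * Illustration only: rep78 (5 levels) plays the role of region
    sysuse auto, clear

    * Factor-variable notation: Stata builds the indicators on the fly
    * and leaves the lowest level (rep78 == 1) out as the base
    regress price mpg i.rep78
    scalar b_fv = _b[3.rep78]

    * Manual alternative: one 0/1 dummy per level (rep_1 ... rep_5)
    tabulate rep78, generate(rep_)

    * Only k - 1 = 4 dummies enter the model; rep_1 is the omitted base.
    * The sample is restricted to nonmissing rep78, as i.rep78 does implicitly.
    regress price mpg rep_2-rep_5 if !missing(rep78)
    scalar b_manual = _b[rep_3]

    * The two coefficients for level 3 coincide
    display "i.rep78: " b_fv "   manual dummy: " b_manual
    ```

    The manual route works, but you must leave one dummy out yourself; the i. prefix handles that automatically.
    
    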
    Closing-out remark: as your data reveal a nesting structure (e.g., provinces are probably nested within regions), you should also consider a -mixed- model instead of OLS.
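    A -mixed- specification can be sketched as follows. The runnable part uses Stata's bundled nlsw88 dataset with industry as the grouping variable, purely for illustration; the commented lines using sympathy, region, and province only show the shape the model might take for the survey data and are untested assumptions. For an ordered outcome, -meologit- accepts the same random-effects syntax.

    ```stata
    * Illustration only: wages of women in nlsw88, grouped by industry
    sysuse nlsw88, clear

    * Random intercept for each industry, fixed slope for years of schooling
    mixed wage grade || industry:

    * Hypothetical shape for provinces nested within regions (not run here):
    *   mixed sympathy corruption || region: || province:
    * and, for an ordered outcome:
    *   meologit sympathy corruption || region: || province:
    ```
    
    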
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thanks a lot, Carlo Lazzaro.
      Region and province are only examples to illustrate my question; of course I won't use both of them as control variables at the same time :)

      When I use
      Code:
      reg sympathy corruption i.region
      it shows coefficients for only 7 regions and omits one.
      Can you please tell me why this happens and what the idea behind it is?

      Thanks,
      Fahim



      • #4
        Fahim:
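        the omission of one of the 8 regions is correct: it avoids the so-called dummy variable trap (https://en.wikipedia.org/wiki/Dummy_...le_(statistics)). With a constant in the model, the 8 region indicators always sum to 1 and are therefore perfectly collinear with the constant; dropping one removes the collinearity, and the omitted region becomes the reference category against which the other region coefficients are measured.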
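        Which level serves as the reference is up to you. A minimal sketch, again on the bundled auto data rather than the survey (rep78 standing in for region): the -baselevels- display option shows the omitted category explicitly, and the ib. operator changes it.

        ```stata
        sysuse auto, clear

        * Default base: the lowest level, rep78 == 1, listed as "(base)"
        regress price mpg i.rep78, baselevels

        * Use level 3 as the reference category instead
        regress price mpg ib3.rep78

        * Or let Stata pick the most frequent level as the base
        regress price mpg ib(freq).rep78
        ```

        Each region coefficient is then read as the difference from whichever region is the base.
        
        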
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          This is really useful, Carlo!
          I apologize for asking so many questions.

          The value labels of my dependent variable are as follows:

          1 no sympathy at all
          2 a little sympathy
          3 a lot of sympathy

          And the value labels of the "region" variable are:

          1 Central Kabul
          2 North
          3 South
          4 East
          5 West
          6 North West
          7 South West
          8 Central Hazarajat

          When I run the command I mentioned above, the coefficients for all regions are positive. So in this case, the region with the lowest coefficient has the lowest sympathy for anti-government groups and vice versa, right?
          And how can I find out the level of sympathy in the region that is omitted by this command?

          Really appreciate your help.



          • #6
            Fahim:
            if your depvar is categorical and ordered, you should consider -ologit- instead of -regress-;
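            q1) yes, relative to the omitted (base) region and when adjusted for the other predictors (but see my comment above); since all the region coefficients are positive, the base region is the one with the lowest adjusted sympathy;
            q2) for the raw (unadjusted) distribution of sympathy in the omitted region:
            Code:
            tab sympathy if region==<number_identifying_the_excluded_region>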
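            The -tab- above describes the omitted region's raw data. For model-based predictions for every region, base included, -margins- can follow the regression. A minimal sketch on Stata's bundled auto data, with rep78 standing in for region (an illustrative assumption, not the survey data):

            ```stata
            sysuse auto, clear
            regress price mpg i.rep78

            * Adjusted predicted mean of price at each level of rep78,
            * base level included, averaging over the observed values of mpg
            margins rep78
            matrix M = r(b)
            ```

            After -ologit-, the same idea applies with predict(outcome(#)) to obtain the predicted probability of each sympathy category by region.
            
            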
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              My mistake; yes, you are right, Carlo!
              I should use ordered logistic regression. Here are the results of my model.

              Code:
              . ologit sympathy gender corruption Army Police urban_rural i.region [aw=w]

              (sum of wgt is   9.3960e+03)
              Iteration 0:   log likelihood = -5727.5333
              Iteration 1:   log likelihood = -5433.1136
              Iteration 2:   log likelihood = -5413.1129
              Iteration 3:   log likelihood = -5412.9249
              Iteration 4:   log likelihood = -5412.9248

              Ordered logistic regression                     Number of obs     =      9,473
                                                              LR chi2(12)       =     536.11
                                                              Prob > chi2       =     0.0000
              Log likelihood = -5412.9248                     Pseudo R2         =     0.0472

              --------------------------------------------------------------------------------------
                         sympathy |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
              --------------------+-----------------------------------------------------------------
                           gender |   .1598654    .0556266     2.87   0.004     .0508392    .2688916
                       corruption |   .4048046    .0385252    10.51   0.000     .3292966    .4803127
                             Army |  -.2192776     .046173    -4.75   0.000     -.309775   -.1287803
                           Police |  -.2895502    .0448472    -6.46   0.000    -.3774491   -.2016513
                      urban_rural |  -.5638942    .0776629    -7.26   0.000    -.7161107   -.4116777
                                  |
                           region |
                             East |   .5382542    .1075157     5.01   0.000     .3275272    .7489811
                       South East |   .2869898    .1059743     2.71   0.007      .079284    .4946956
                       South West |   .4224143    .1030393     4.10   0.000     .2204609    .6243677
                             West |   .1515146    .1003236     1.51   0.131     -.045116    .3481452
                       North East |  -.0689522    .1022639    -0.67   0.500    -.2693858    .1314815
              Central / Hazarajat |  -1.326957    .2825237    -4.70   0.000    -1.880694   -.7732211
                       North West |   .0585194    .1013457     0.58   0.564    -.1401146    .2571534
              --------------------+-----------------------------------------------------------------
                            /cut1 |   .1259794    .1991583                     -.2643638    .5163226
                            /cut2 |   1.411804    .2008946                      1.018058     1.80555
              --------------------------------------------------------------------------------------

              What I am actually looking for is how corruption is linked with sympathy for anti-government groups, controlling for demographics (gender, place of residence, and region) and for respondents' level of confidence in the Police and the Army.

              The value labels of these variables are as follows.

              var1) sympathy: level of sympathy for anti-government groups

              1 no sympathy at all
              2 a little sympathy
              3 a lot of sympathy

              var2) corruption: the number of times a respondent experienced corruption in government institutions (a scale constructed from 10 variables); the labels are:

              1 in no cases
              2 in some cases
              3 in most cases
              4 in all cases

              var3) gender: 1 female, 2 male

              var4) urban_rural: respondent's place of residence; 1 rural, 2 urban

              var5) Army: respondent's level of confidence in the National Army, a scale constructed from 3 variables; the labels are:

              1 a lot of confidence
              2 somewhat confidence
              3 a little confidence
              4 no confidence at all

              var6) Police: respondent's level of confidence in the National Police, also a scale constructed from 3 variables, with the same labels as the "Army" variable.

              I found many papers on how to interpret the coefficients of an ologit model; basically, what I found is that the coefficients should be interpreted in terms of odds ratios. Since I am new to this model, I don't understand exactly what an odds ratio means.

              BTW, looking back at the model, the coefficient for gender is .1598 and for corruption is .4048. Can I say that males and those who have experienced corruption are more likely to have sympathy for anti-government groups?
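              For reference, here is how odds ratios arise in the ordered logit parameterization that -ologit- uses, with cut points kappa_1 = /cut1 and kappa_2 = /cut2 and linear predictor x*beta, worked through with two coefficients from the output above. It assumes the proportional-odds structure that -ologit- imposes, i.e. one beta per predictor shared by both cut points.

              ```latex
              \begin{aligned}
              \Pr(\text{sympathy} \le j \mid x) &= \Lambda(\kappa_j - x\beta),
                \qquad \Lambda(z) = \frac{1}{1 + e^{-z}}, \qquad j = 1, 2 \\
              \frac{\Pr(\text{sympathy} > j \mid x)}{\Pr(\text{sympathy} \le j \mid x)}
                &= \frac{1 - \Lambda(\kappa_j - x\beta)}{\Lambda(\kappa_j - x\beta)} = e^{\,x\beta - \kappa_j} \\
              \text{odds ratio for a one-unit increase in } x_k
                &= \frac{e^{\,x\beta + \beta_k - \kappa_j}}{e^{\,x\beta - \kappa_j}} = e^{\beta_k} \\
              \text{corruption: } e^{0.4048046} &\approx 1.499, \qquad
              \text{gender: } e^{0.1598654} \approx 1.173
              \end{aligned}
              ```

              Read this way, and holding the other predictors fixed, each one-point increase on the corruption scale multiplies the odds of reporting a higher level of sympathy (rather than the same or a lower one) by about 1.50, at either cut point. Because gender is coded 1 female and 2 male, its odds ratio of about 1.17 compares males with females.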

              Many thanks.


              • #8
                Fahim:
                take a look at the -ologit- and -ologit postestimation- entries in the Stata manual for comprehensive answers to all your questions.
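                Two of the tools documented there, sketched on Stata's bundled auto data because the survey data are not available here: the -or- option reports exponentiated coefficients (odds ratios) directly, and -margins- turns the model into predicted probabilities, which are often easier to communicate than odds ratios. rep78 plays the role of the ordered outcome and foreign that of a binary predictor such as gender; both are illustrative stand-ins.

                ```stata
                sysuse auto, clear

                * Ordered logit; -or- displays odds ratios, exp(b), instead of coefficients
                ologit rep78 mpg i.foreign, or

                * Predicted probability of the highest category (rep78 == 5)
                * for domestic and foreign cars, averaging over observed mpg
                margins foreign, predict(outcome(5))
                matrix P = r(b)

                * Same idea for the model in #7 (not run here):
                *   ologit sympathy gender corruption Army Police urban_rural i.region, or
                *   margins region, predict(outcome(3))
                ```
                
                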
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  You are a great help to Statalist users; many thanks, Carlo!

