  • Non-normality of errors

    Hello everyone,

    I will begin with a bit of background on my work. I'm studying how counties voted on a solar amendment. I have a small sample of 67 observations. I ran a regression with the following variables:
    pyamd4: share of total votes cast in favor of Amendment 4
    larea: log of area
    lpcincome: log of per capita income
    solarresource: solar resource
    ofrepublicans: share of Republicans
    ofdemocracts: share of Democrats
    sjobs: number of solar jobs

    These data are at the county level. When I test the regression assumptions, my model fails the normality and homoscedasticity checks.
    I ran the following commands:
    Code:
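    * OLS regression of the vote share on county characteristics
    regress pyamd4 larea lpcincome solarres ofrepublicans ofdemocracts sjobs
    predict resi2, resid
    * Normality checks on the residuals
    kdensity resi2, normal
    pnorm resi2
    * Heteroskedasticity and multicollinearity checks
    rvfplot, yline(0)
    estat hettest
    estat vif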

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 county double pyamd4 float(larea lpcincome) double(solarresource ofrepublicans ofdemocracts) int sjobs
    "Alachua"      .7226168507798478 6.876389   10.1565 4.65439856159   .280860877128731  .4767270188610145  40
    "Baker"         .637917485265226  6.37824  9.962793 4.58400273448  .5325954919692266  .3577405857740586   2
    "Bay"           .695111748535915 6.940484 10.154052 4.66124543391  .5188370359825488 .25756370426816505  34
    "Bradford"     .6416365428853106 6.397046   9.89606 4.60320281519  .4706863532464323 .37239760301476493   2
    "Brevard"      .7232516245255098 8.043622 10.246687 4.87910985947 .42041183632417556 .31021265537701936 227
    "Broward"      .7634885255699859 7.878212 10.274603 4.74840814016 .21428565234997213  .5006594298468093 830
    "Calhoun"      .6345191773207337 7.046369  9.714746 4.56599260982  .2917004282903114  .5917351545317745   2
    "Charlotte"    .7082288348132583 7.449056 10.234947 5.05225627719  .4495916494953901 .27426414186231896  52
    "Citrus"       .6500029916831209 7.343614 10.093612 4.85882703118  .4743051500912874 .27384361590717415  36
    "Clay"         .6963859614237423 7.160388 10.209464 4.57542817673   .536944088549578  .2207135215099922  23
    "Collier"      .7108576736505315 8.435953 10.586988 4.92136245798  .5084053116401484 .23635363081708338 125
    "Columbia"     .6537055500166168 7.379058    9.9931 4.62466003525  .4529435423055653 .36121864159670203  10
    "Desoto"       .4509561998766194 7.153826  9.723763 4.95785739711   .333633741888969  .4367940398942562   6
    "Dixie"        .5995740149094782 7.454326  9.824986 4.82430395388 .39644351464435146 .43964435146443515   1
    "Duval"         .710956156270895 7.515574 10.212258 4.60426204932 .36803260704460683 .40638132689743633 203
    "Escambia"     .6779929107748599 7.468022 10.107734 4.51376248655   .448691172042651  .3389188129979497  58
    "Flagler"      .6980101041344469 7.040107 10.139112 4.66401260913  .4087013034788047 .31006364115706736  13
    "Franklin"     .7004860267314702 7.637692  9.976505 4.76870872546  .3287958115183246  .5280104712041885   2
    "Gadsden"       .640840098835157 6.963171  9.808462 4.72360658646 .15837943597355283  .7410268519767913   4
    "Gilchrist"    .6225533158048495 6.566588  9.981374 4.65617175536  .5506797453106178 .28945104112889347   2
    "Glades"       .6509330406147091  7.58724  9.786841 5.03319253753   .394491337183474  .4164075225825559   1
    "Gulf"         .6382636655948553 7.305981  9.905985 4.72645654176 .47902825979176994 .39325731284085275   3
    "Hamilton"     .6235790841182202 6.945648  9.678467 4.56765750397 .31562459208980553   .555671583344211   1
    "Hardee"       .6595531843947983 7.152002  9.751443 4.89647433975  .4355676477862973 .37274301261439524   4
    "Hendry"       .5872241579558652 7.774683  9.799626 4.91611670705  .3587321459056507  .4469356399021226   8
    "Hernando"     .7366509134513649 7.071709 10.018377  4.8535054563 .41166375601974703 .31743439704248033  36
    "Highlands"    .6923626831883712  7.70191  9.994972 4.88375874447 .46089146224475813 .30565643415102095  27
    "Hillsborough" .7502068011249982 7.836935 10.265593 4.90972701385  .3164335122057136  .3895842827726571 373
    "Holmes"       .6340051885851128 6.884906  9.789983 4.61156498685  .5752961082910322  .3109607068997932   2
    "Indian River" .6933540229126085 7.117903  10.37997 4.77615442332 .46186095260277166 .27206600317116214  58
    "Jackson"       .617502017394423 7.554424  9.753942 4.61020853014  .3681511178531441  .5220500595947557   6
    "Jefferson"    .5825697413968641 7.149359  9.966979 4.71163779583  .3255119892971082  .5567562004733971   2
    "Lafayette"    .5732484076433121 6.999277  9.869983 4.66603018766 .37749419953596286  .5501160092807425   1
    "Lake"         .7150353178607467 7.746219 10.116984 4.76585934848  .4386257418557841  .3059685375398464  84
    "Lee"          .7198267466773017   7.7931  10.27329 5.08122556844 .42961215585152507 .27070839730286894 271
    "Leon"         .7125818327197615 7.246767 10.210605 4.87836678018 .27558221082924234  .5238422287248192  63
    "Levy"         .6469058762350494 7.946133   9.93086 4.79619548754  .4675927605092307 .33433680316274345  10
    "Liberty"      .5844265763859501 7.430304  9.753711 4.60028333567   .195273061037173  .7145479577787977   1
    "Madison"      .6113495469718646 7.266541  9.710267 4.62748019288  .2969398999745698  .5873527167924049   2
    "Manatee"      .7277555253558401 7.487432 10.279867 4.99956594744   .431882177274788 .30541656788490573  75
    "Marion"       .6876961043539573 8.109538  10.01637  4.6787968369 .44258642241109014  .3295527230009657  84
    "Martin"       .7248721303675508  7.31694  10.48827 4.86743452803 .49614654601633196  .2527124255131318  82
    "Miami-Dade"   .7421913842336139 8.489312  10.10704 4.76655273126 .26409123724973016  .4193518180973417 888
    "Monroe"       .8086342621516931 8.919226 10.512465 5.02861857762 .39071069253753926  .3234804644653731  46
    "Nassau"       .6888182460389545 7.280504  10.34628 4.62281322707  .5742732160758225 .21821744281418026  10
    "Okaloosa"     .6746022759440639   7.6797  10.29563 4.53543954523  .5775714983652148 .18802660753880265  32
    "Okeechobee"   .6424082708291101 7.486142  9.751094 4.91590516457  .4348706300612285  .3615939166502074   9
    "Orange"       .7437151125879142 7.605069 10.173896 4.75654910609 .26807206698922464  .4222231136635088 375
    "Osceola"      .7383541033866069 8.010595  9.877246 4.79563595013 .22776479677401118  .4276743484897021  48
    "Palm Beach"   .7671382881409122 8.470651 10.458694 4.82882928978 .28197111643855743  .4216997794176059 899
    "Pasco"        .7730504981106149 7.459252  10.13559 4.87800932823 .38885741374525945 .31275147997410047  86
    "Pinellas"     .7932731210433623 7.102787  10.34287 5.14775717258  .3537068650630509 .35528867682592274 365
    "Polk"         .7169741434575112 8.299049  9.983638  4.8238532677  .3565602680539635  .3540889006456417 122
    "Putnam"        .653510165184244 7.411145  9.828818 4.63264384388   .398695040300226 .40210669964604034  10
    "Santa Rosa"   .6485668276972625 7.760953  10.24775 4.53922072531  .5853649184426261  .1929155898434804  20
    "Sarasota"     .7595158312247828 7.279567 10.469086 5.08188199762  .4252344638799749 .30604427646820753 196
    "Seminole"     .7452675510508272 6.536344  10.31218 4.72133385234 .36974259843320784 .33977685081561365 254
    "St. Johns"    .7194724633148278 7.404194 10.554823   4.678036611   .526244747784817 .23706267824875363  30
    "St. Lucie"    .6752733042098629 7.227052 10.090133 4.77358116458  .3222108965903186 .39691509773354117   0
    "Sumter"        .702576850360619 7.056701 10.360627 4.78716291733  .5398208691630049 .24996320669427916  15
    "Suwannee"     .6476301930953774 7.232589  9.868999 4.61778977673 .45817059834400875  .3811904389939072   4
    "Taylor"       .5864406779661017 7.809549  9.685393 4.78342347039 .36663611365719523  .5282893092242313   3
    "Union"        .6504285069914298 6.213448   9.46831 4.62037857743  .4332100287710645  .4640361693382655   1
    "Volusia"      .7370536548887643 7.960282  10.11997 4.70158898802  .3545647840885201 .34271207278746285 188
    "Wakulla"      .6890025019036223 7.294024 10.004644 4.87937102228 .41265280280524036 .41187356937612624   3
    "Walton"       .6973589723130934 7.814428 10.256782 4.59274840879  .6010498155729399 .19409428073446558  10
    "Washington"   .6455675498855087 7.116053  9.811317 4.57566850274  .5142911706664969 .33986886498185753   2
    end
    To correct for heteroskedasticity I use
    Code:
    vce(robust)
    but I'm stuck with non-normality.

    What should I do from this point on, now that I know the error terms of my model are not normally distributed? I read earlier threads on this forum and gathered that it's OK if errors are not normally distributed. What can I do next, given that I have a small sample?

    Can anyone please help me with this?

    Thanks,
    Ritika

  • #2
    Ritika:
    you're overemphasizing a frequent nuisance that is by no means a big deal.
    You impose -vce(robust)- and that is all you should do.
    Go on with -regress- with robustified standard errors and do not worry about heteroskedasticity (be more concerned about the possible significance of -estat ovtest-, instead).
    Kind regards,
    Carlo
    (Stata 19.0)
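
    A minimal sketch of that workflow, assuming the variable names from the -dataex- example in #1. Running -estat ovtest- after the plain fit is just a cautious ordering; the point estimates are the same with and without -vce(robust)-, only the standard errors change:

    Code:
    * Plain OLS fit, then the Ramsey RESET test for omitted powers of the fitted values
    regress pyamd4 larea lpcincome solarresource ofrepublicans ofdemocracts sjobs
    estat ovtest

    * Same coefficients, now with heteroskedasticity-robust standard errors
    regress pyamd4 larea lpcincome solarresource ofrepublicans ofdemocracts sjobs, vce(robust)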

    • #3
      Your response is bounded, so beta regression or a generalized linear model with logit link is likely to be much preferable in any case.

      • #4
        Thank you Carlo for your suggestion.

        I used a generalized linear model too. For that I found a user-written command and ran the regression as follows:

        Code:
        fracglm pyamd4 larea lpcincome solarres ofrepublicans ofdemocracts sjobs
        margins, dydx(*) post
        est store margins
        outreg2 [margins] using myfile.doc
        To see how different the results would be, I also tried:

        Code:
        glm pyamd4 larea lpcincome solarres ofrepublicans ofdemocracts sjobs, family(gaussian) link(identity)
        Now, I have some observations and questions.
        #1. The results from these two regressions are very similar. Given that the response is bounded, I don't understand the difference between the two commands. Are they equivalent for my dataset?
        #2. I just want to know if I'm thinking correctly: in the glm command, I used family(gaussian) instead of family(binomial). Can you please correct me if I'm wrong here? My thinking was that the response variable is continuous and not 0/1.
        #3. Lastly, I don't understand which part of the glm command specifies that my data are fractional. That leads me to think I'm missing something in my command, and that this is why my results from 'fracglm' and 'glm' are very similar. Or is it implicit in the glm command?

        I have these questions because I'm trying to understand this command and the use of generalized linear models.

        Thank you.

        • #5
          glm with link(logit) f(binomial) vce(robust) would be one standard recipe here.

          See also

          Code:
          help fracreg
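
          As a sketch of that recipe, assuming the variable names from the -dataex- example in #1, the fractional logit could be fitted either way below. Both should give the same coefficients, and -margins- puts the effects on the proportion scale so they can be compared with the OLS coefficients:

          Code:
          * Fractional logit via glm: binomial family, logit link, robust standard errors
          glm pyamd4 larea lpcincome solarresource ofrepublicans ofdemocracts sjobs, family(binomial) link(logit) vce(robust)
          margins, dydx(*)

          * Same model via the official fracreg command (Stata 14 or later; robust standard errors by default)
          fracreg logit pyamd4 larea lpcincome solarresource ofrepublicans ofdemocracts sjobs
          margins, dydx(*)

          Nothing in glm detects that the response is a fraction: the family() and link() options define the model. With family(gaussian) link(identity), glm simply reproduces the -regress- coefficients.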

          • #6
            Thank you Nick.
