  • Fractional logit model for proportions over time

    Dear all, I have calculated a “Diversity Index” (DI) for a given population. Per the Census Bureau website: “the DI tells us the chance that two people chosen at random will be from different racial and ethnic groups… The DI is bounded between 0 and 1, with a zero-value indicating that everyone in the population has the same racial and ethnic characteristics, while a value close to 1 indicates that everyone in the population has different characteristics.” (I put the full equation below.)

    I’m running a fractional logit model with the DI as the dependent variable and year as the independent variable. I’d like to plot the trend line and 95% CIs around the trend. Here is my code. First, I use dataex to show the data; then I show the model with continuous year; then the model with categorical year.

    Questions:

    1) Are there any obvious problems with this approach? In particular, I wasn’t sure whether I need to make any adjustments to the fractional logit code for the fact that this is the same group of individuals over time, or whether I should use a different approach instead of fractional logit.

    2) Is it better to include year as c.year or i.year? The plots look quite different.

    I am using Stata 14.

    Thank you!!!

    Code:
    ******************************DATA
     dataex di_rev year_r
    
    ----------------------- copy starting from the next line -----------------------
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(di_rev year_r)
    .34123 1
    .35147 2
    .36345 3
    .37255 4
    .39094 5
    .39714 6
    .39895 7
    end
    
    ------------------ copy up to and including the previous line ------------------
    
    Listed 7 out of 7 observations
    Code:
    ************************************OPTION 1: WITH CONTINUOUS YEAR
    
    . fracreg logit di_rev c.year_r
    
    Iteration 0:   log pseudolikelihood = -5.3012582  
    Iteration 1:   log pseudolikelihood = -4.6198733  
    Iteration 2:   log pseudolikelihood = -4.6196722  
    Iteration 3:   log pseudolikelihood = -4.6196722  
    
    Fractional logistic regression                  Number of obs     =          7
                                                    Wald chi2(1)      =     163.74
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -4.6196722               Pseudo R2         =     0.0014
    
    ------------------------------------------------------------------------------
                 |               Robust
          di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          year_r |   .0446101   .0034862    12.80   0.000     .0377772     .051443
           _cons |  -.6959255   .0097576   -71.32   0.000    -.7150501   -.6768008
    ------------------------------------------------------------------------------
    
    . quietly margins, at(year_r=(1(1)7))
    
    . marginsplot
    
      Variables that uniquely identify margins: year_r
    [Attachment: test2.png, marginsplot with continuous year]


    ***********************************OPTION 2: WITH CATEGORICAL YEAR:

    Code:
    .
    . fracreg logit di_rev i.year_r
    note: 7.year_r omitted because of collinearity
    
    Iteration 0:   log pseudolikelihood = -5.3011755  
    Iteration 1:   log pseudolikelihood = -4.6196655  
    Iteration 2:   log pseudolikelihood = -4.6194615  
    Iteration 3:   log pseudolikelihood = -4.6194615  
    
    Fractional logistic regression                  Number of obs     =          7
                                                    Wald chi2(0)      =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -4.6194615               Pseudo R2         =     0.0015
    
    ------------------------------------------------------------------------------
                 |               Robust
          di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          year_r |
              2  |   .0452339   1.05e-11  4.3e+09   0.000     .0452339    .0452339
              3  |   .0973966   6.01e-11  1.6e+09   0.000     .0973966    .0973966
              4  |    .136525   1.20e-10  1.1e+09   0.000      .136525     .136525
              5  |   .2144551   2.28e-10  9.4e+08   0.000     .2144551    .2144551
              6  |   .2404216   2.45e-10  9.8e+08   0.000     .2404216    .2404216
              7  |   .2479757   2.47e-10  1.0e+09   0.000     .2479757    .2479757
                 |
           _cons |  -.6578177   1.96e-13 -3.4e+12   0.000    -.6578177   -.6578177
    ------------------------------------------------------------------------------
    
    . quietly margins i.year_r
    
    . marginsplot
    
      Variables that uniquely identify margins: year_r
    [Attachment: test1.png, marginsplot with categorical year]


    ----------------------------------------------------------------------------------------------------------

    FYI, DIVERSITY INDEX EQUATION BELOW:





    Diversity Index Equation



    DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²)



    H is the proportion of the population who are Hispanic or Latino.

    W is the proportion of the population who are White alone, not Hispanic or Latino.

    B is the proportion of the population who are Black or African American alone, not Hispanic or Latino.

    AIAN is the proportion of the population who are American Indian and Alaska Native alone, not Hispanic or Latino.

    Asian is the proportion of the population who are Asian alone, not Hispanic or Latino.

    NHPI is the proportion of the population who are Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino.

    SOR is the proportion of the population who are Some Other Race alone, not Hispanic or Latino.

    Multi is the proportion of the population who are Two or More Races, not Hispanic or Latino.



    Source: https://www.census.gov/library/visua...20-census.html
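
    To make the formula concrete, here is a minimal sketch of computing the DI in Stata. The eight proportions are invented for illustration only (they sum to 1); they are not census figures.

    Code:
    clear
    set obs 1
    * Invented proportions for illustration only (must sum to 1)
    generate double H     = 0.19
    generate double W     = 0.58
    generate double B     = 0.12
    generate double AIAN  = 0.01
    generate double Asian = 0.06
    generate double NHPI  = 0.002
    generate double SOR   = 0.005
    generate double Multi = 0.033
    * DI = 1 minus the sum of the squared proportions
    generate double DI = 1 - (H^2 + W^2 + B^2 + AIAN^2 + Asian^2 + ///
        NHPI^2 + SOR^2 + Multi^2)
    list DI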



  • #2
    Perhaps a naïve approach, but if you're interested in confidence intervals, then wouldn't it be better to use the actual counts of each category? There is an assumption about sampling distribution, I suppose, but how about something like -mlogit- followed by -predictnl , ci()- where you predict the diversity index and its confidence bounds from the equation?

    I show an example of the mechanics below using a fictitious dataset of a rather small population of three categories. (Begin at the "Begin here" comment; the top part of the output shows creation of the toy dataset for illustration.)

    . version 17.0

    . clear *

    . set seed `=strreverse("1638845")'

    . // Seven years
    . quietly set obs 7

    . // Total population (increasing over time)
    . generate int tot = runiformint(9000, 10000)

    . sort tot

    . generate byte yea = _n

    . // Breakdown (three categories)
    . generate int count1 = rbinomial(tot, 0.5)

    . generate int count2 = rbinomial(tot, 0.3)

    . generate int count3 = tot - count1 - count2

    . quietly reshape long count, i(yea) j(rac)

    . *
    . * Begin here
    . *
    . quietly mlogit rac i.yea [fweight=count], baseoutcome(1)

    . predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2) / ///
    >         (1 + exp(xb(#2)) + exp(xb(#3)))^2, ci(lb ub)
    note: confidence intervals calculated using Z critical values.

    . // Sanity check
    . predict double pr*, pr

    . generate double din_chk = 1 - pr1^2 - pr2^2 - pr3^2

    . assert float(din) == float(din_chk)

    . // Finally, plot with CIs
    . graph twoway ///
    >         rcap ub lb yea if rac == 1, lcolor(black) || ///
    >         connected din yea if rac == 1, lcolor(black) lpattern(solid) ///
    >         msize(small) mcolor(black) mfcolor(white) ///
    >         xtitle(Year) xlabel(1(1)7) ///
    >         ytitle(Diversity Index) ///
    >         ylabel(0.60(0.01)0.63, format(%4.2f) angle(horizontal) nogrid) ///
    >         legend(off)

    . quietly graph export din.png

    . exit

    end of do-file


    And if your dataset comes from a designed survey, then I believe that -mlogit- allows the -svy- prefix, and so can accommodate that situation as well.
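
    For instance, the survey case might look like the following sketch, where the design variables psu, strat, and wt are placeholders for whatever your survey provides (with -svy-, weights are supplied through -svyset- rather than as fweights):

    Code:
    * Placeholders: psu, strat, and wt stand in for your design variables
    svyset psu [pweight=wt], strata(strat)
    svy: mlogit rac i.yea, baseoutcome(1)
    * Same delta-method prediction of the diversity index as before
    predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2) / ///
        (1 + exp(xb(#2)) + exp(xb(#3)))^2, ci(lb ub)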

    [Attachment: din.png, diversity index with CIs by year]



    • #3
      Wow, this is genius – thank you so much for taking the time to write all this code!! This made my day. Since I have 8 categories rather than 3, I changed the code as follows (bold parts were changed); it seems to work correctly.

      Code:
      predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2 + exp(xb(#4))^2 + ///
          exp(xb(#5))^2 + exp(xb(#6))^2 + exp(xb(#7))^2 + exp(xb(#8))^2) / ///
          (1 + exp(xb(#2)) + exp(xb(#3)) + exp(xb(#4)) + exp(xb(#5)) + ///
          exp(xb(#6)) + exp(xb(#7)) + exp(xb(#8)))^2, ci(lb ub)
      
      generate double din_chk = 1 - pr1^2 - pr2^2 - pr3^2 - pr4^2 - pr5^2 - pr6^2 - pr7^2 - pr8^2
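
      As an aside, with many categories the -predictnl- expression can also be assembled in a loop rather than typed term by term. A sketch, assuming the same eight-outcome -mlogit- fit is the active estimation result:

      Code:
      * Build numerator and denominator strings for K outcome categories
      local K 8
      local num "1"
      local den "1"
      forvalues k = 2/`K' {
          local num "`num' + exp(xb(#`k'))^2"
          local den "`den' + exp(xb(#`k'))"
      }
      predictnl double din = 1 - (`num') / (`den')^2, ci(lb ub)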
      Is there a straightforward way to get the “trend” from this model? I guess I was thinking the marginal effect of the year variable (with CIs). (I may be thinking about this the wrong way).

      Thank you again!!!!! I'm very grateful.



      • #4
        Originally posted by Jennifer Carson View Post
        Is there a straightforward way to get the “trend” from this model? I guess I was thinking the marginal effect of the year variable (with CIs). (I may be thinking about this the wrong way).
        If by trend you mean whether the index is increasing or decreasing over time, then my recommendation is to rely upon inspection of the plot.

        If you're concerned that your audience will demand a null hypothesis statistical test of whether the linear trend of the temporal profile is exactly zero, then you might be able to do something using the linear component of the set of orthogonal polynomial contrasts for the seven years, but I doubt that it would be straightforward. And I'm not sure that it would be worth the effort, inasmuch as the functional relationship of the index is by construction inherently nonlinear.



        • #5
          Jennifer: When you include i.year, the model is saturated: the year dummies are exhaustive (with an intercept) and mutually exclusive. Therefore, the trend will be exactly the same no matter which model you use. You might as well just use a linear model. If you use c.year then that's different but less flexible. If you start adding covariates then the fracreg makes sense.
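
          One way to see the saturation point is that the fitted values from -fracreg- with i.year_r and from an ordinary -regress- both reproduce the observed yearly values exactly. A sketch, assuming the seven observations from #1 are in memory:

          Code:
          fracreg logit di_rev i.year_r
          predict double fit_frac      // conditional mean, the default after -fracreg-
          regress di_rev i.year_r
          predict double fit_ols       // linear fit
          * With a saturated model, both columns equal di_rev itself
          list year_r di_rev fit_frac fit_ols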



          • #6
            I wrote "you might be able to do something using the linear component of the set of orthogonal polynomial contrasts for the seven years, but I doubt that it would be straightforward".

            Actually, it turns out to be straightforward. But it does take some typing.

            Below I show the test of linear trend in the diversity index over time for the example that I used above. Again, begin at the "Begin here" comment (new location) for the new code's output, which shows how to test for a linear trend in the computed diversity index's time course.

            . version 17.0

            . clear *

            . set seed `=strreverse("1638845")'

            . // Seven years
            . quietly set obs 7

            . // Total population (increasing over time)
            . generate int tot = runiformint(9000, 10000)

            . sort tot

            . generate byte yea = _n

            . // Breakdown (three categories)
            . generate int count1 = rbinomial(tot, 0.5)

            . generate int count2 = rbinomial(tot, 0.3)

            . generate int count3 = tot - count1 - count2

            . quietly reshape long count, i(yea) j(rac)

            . quietly mlogit rac i.yea [fweight=count], baseoutcome(1) nolog

            . predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2) / ///
            >         (1 + exp(xb(#2)) + exp(xb(#3)))^2, ci(lb ub)
            note: confidence intervals calculated using Z critical values.

            . list yea din lb ub if rac == 1, noobs separator(0)

              +-----------------------------------------+
              | yea         din          lb          ub |
              |-----------------------------------------|
              |   1   .62425017   .61931046   .62918989 |
              |   2   .61787938   .61264441   .62311434 |
              |   3   .61892674   .61379026   .62406322 |
              |   4   .61863805   .61345668   .62381942 |
              |   5   .62230284   .61743457    .6271711 |
              |   6   .61453096   .60933869   .61972324 |
              |   7   .61932345   .61436164   .62428525 |
              +-----------------------------------------+

            . *
            . * Begin here
            . *
            . quietly nlcom ///
            >         (y1: 1 - (1 + exp(_b[2:_cons] + _b[2:1b.yea])^2 + exp(_b[3:_cons] + _b[3:1b.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:1b.yea]) + exp(_b[3:_cons] + _b[3:1b.yea]))^2) ///
            >         (y2: 1 - (1 + exp(_b[2:_cons] + _b[2:2.yea])^2 + exp(_b[3:_cons] + _b[3:2.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:2.yea]) + exp(_b[3:_cons] + _b[3:2.yea]))^2) ///
            >         (y3: 1 - (1 + exp(_b[2:_cons] + _b[2:3.yea])^2 + exp(_b[3:_cons] + _b[3:3.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:3.yea]) + exp(_b[3:_cons] + _b[3:3.yea]))^2) ///
            >         (y4: 1 - (1 + exp(_b[2:_cons] + _b[2:4.yea])^2 + exp(_b[3:_cons] + _b[3:4.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:4.yea]) + exp(_b[3:_cons] + _b[3:4.yea]))^2) ///
            >         (y5: 1 - (1 + exp(_b[2:_cons] + _b[2:5.yea])^2 + exp(_b[3:_cons] + _b[3:5.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:5.yea]) + exp(_b[3:_cons] + _b[3:5.yea]))^2) ///
            >         (y6: 1 - (1 + exp(_b[2:_cons] + _b[2:6.yea])^2 + exp(_b[3:_cons] + _b[3:6.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:6.yea]) + exp(_b[3:_cons] + _b[3:6.yea]))^2) ///
            >         (y7: 1 - (1 + exp(_b[2:_cons] + _b[2:7.yea])^2 + exp(_b[3:_cons] + _b[3:7.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:7.yea]) + exp(_b[3:_cons] + _b[3:7.yea]))^2), post

            . // Confirming correctness of -nlcom- (compare point estimates and confidence bounds with the listing above)
            . nlcom

            ------------------------------------------------------------------------------
                     rac | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      y1 |   .6242502   .0025203   247.69   0.000     .6193105    .6291899
                      y2 |   .6178794    .002671   231.33   0.000     .6126444    .6231143
                      y3 |   .6189267   .0026207   236.17   0.000     .6137903    .6240632
                      y4 |    .618638   .0026436   234.01   0.000     .6134567    .6238194
                      y5 |   .6223028   .0024839   250.54   0.000     .6174346    .6271711
                      y6 |    .614531   .0026492   231.97   0.000     .6093387    .6197232
                      y7 |   .6193234   .0025316   244.64   0.000     .6143616    .6242853
            ------------------------------------------------------------------------------

            . /* Orthogonal polynomial contrast:
            >    (linear component's coefficients: -3 -2 -1 0 1 2 3) */
            . test -3 * y1 - 2 * y2 - y3 + 0 * y4 + y5 + 2 * y6 + 3 * y7 = 0

             ( 1)  - 3*y1 - 2*y2 - y3 + y5 + 2*y6 + 3*y7 = 0

                       chi2(  1) =    1.78
                     Prob > chi2 =    0.1827

            . exit

            end of do-file
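
            Incidentally, the seven -nlcom- expressions can be generated in a loop rather than typed out. A sketch for the same three-category example; it assumes the -mlogit- fit above is the active estimation result (note that the base year is referenced as 1b.yea):

            Code:
            * Assemble the -nlcom- specification programmatically
            local spec ""
            forvalues t = 1/7 {
                if `t' == 1 local L "1b"
                else        local L "`t'"
                local b2 "_b[2:_cons] + _b[2:`L'.yea]"
                local b3 "_b[3:_cons] + _b[3:`L'.yea]"
                local spec "`spec' (y`t': 1 - (1 + exp(`b2')^2 + exp(`b3')^2) / (1 + exp(`b2') + exp(`b3'))^2)"
            }
            nlcom `spec', post
            * Linear component of the orthogonal polynomial contrast
            test -3*y1 - 2*y2 - y3 + 0*y4 + y5 + 2*y6 + 3*y7 = 0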


            I believe that Jeff's suggestion of fitting a linear regression model, or use of -fracreg-, would be more suitable if you had a whole bunch of [0, 1] values for each year, that is, if you had a sample containing numerous diversity index values per year. But my understanding from your original post is that you have a single computed diversity index per year—a single derived datapoint per year—for each of seven years.



            • #7
              I see now there are only seven observations; I should've paid more attention to that. Then the second approach is just plotting the 7 different points, and there cannot be any kind of confidence interval. Using c.year doesn't give a perfect fit, but with n = 7 and 5 degrees of freedom you cannot use a normal confidence interval anyway -- especially with a linear model. Plus, there is almost always serial correlation in these time series. I hope Jennifer has more than 7 years of data for the remaining analysis.



              • #8
                Thank you so much, both! Joseph, this code is amazing, very many thanks!!! Yes, to clarify: the raw dataset I am using has thousands of people, but by definition the diversity index is a single derived data point for the whole population (in this case I have one per year). Formula for the diversity index below:


                DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²) ...where H is the proportion of Hispanics in population, W is the proportion of whites, etc.

                Jeff, it seemed Joseph’s approach to calculating the CIs involved using the actual counts of each category. For example, before he runs:

                Code:
                quietly mlogit rac i.yea [fweight=count], baseoutcome(1)
                the data was transformed to look like this:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(yea rac) int(tot count)
                1 1 9075 4475
                1 2 9075 2712
                1 3 9075 1888
                2 1 9175 4627
                2 2 9175 2730
                2 3 9175 1818
                3 1 9204 4613
                3 2 9204 2777
                3 3 9204 1814
                4 1 9231 4643
                4 2 9231 2749
                4 3 9231 1839
                5 1 9619 4767
                5 2 9619 2901
                5 3 9619 1951
                6 1 9730 4947
                6 2 9730 2931
                6 3 9730 1852
                7 1 9994 5020
                7 2 9994 2962
                7 3 9994 2012
                end

                I’m very grateful for the time both of you have taken to respond to this post!!!



                • #9
                  Originally posted by Jeff Wooldridge View Post
                  Then the second approach is just plotting the 7 different points, and there cannot be any kind of confidence interval.
                  Jeff, I'm curious about your comment that there cannot be any kind of confidence interval here - would you mind explaining why?
