  • Fractional logit model for proportions over time

    Dear all, I have calculated a “Diversity Index” (DI) for a given population. Per the Census Bureau website: “the DI tells us the chance that two people chosen at random will be from different racial and ethnic groups… The DI is bounded between 0 and 1, with a zero-value indicating that everyone in the population has the same racial and ethnic characteristics, while a value close to 1 indicates that everyone in the population has different characteristics.” (I put the full equation below.)

    I’m running a fractional logit model with the DI as the dependent variable and year as the independent variable. I’d like to plot the trend line and 95% CIs around the trend. Here is my code. First, I use dataex to show the data; then I show the model with continuous year; then the model with categorical year.

    Questions:

    1) Are there any obvious problems with this approach? In particular, I wasn’t sure whether I need to make any adjustments to the fractional logit code for the fact that this is the same group of individuals over time, or whether I should use a different approach instead of fractional logit.

    2) Is it better to include year as c.year or i.year? The plots look quite different.

    I am using Stata 14.

    Thank you!!!

    Code:
    ******************************DATA
     dataex di_rev year_r
    
    ----------------------- copy starting from the next line -----------------------
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(di_rev year_r)
    .34123 1
    .35147 2
    .36345 3
    .37255 4
    .39094 5
    .39714 6
    .39895 7
    end
    
    ------------------ copy up to and including the previous line ------------------
    
    Listed 7 out of 7 observations
    Code:
    ************************************OPTION 1: WITH CONTINUOUS YEAR
    
    . fracreg logit di_rev c.year_r
    
    Iteration 0:   log pseudolikelihood = -5.3012582  
    Iteration 1:   log pseudolikelihood = -4.6198733  
    Iteration 2:   log pseudolikelihood = -4.6196722  
    Iteration 3:   log pseudolikelihood = -4.6196722  
    
    Fractional logistic regression                  Number of obs     =          7
                                                    Wald chi2(1)      =     163.74
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -4.6196722               Pseudo R2         =     0.0014
    
    ------------------------------------------------------------------------------
                 |               Robust
          di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          year_r |   .0446101   .0034862    12.80   0.000     .0377772     .051443
           _cons |  -.6959255   .0097576   -71.32   0.000    -.7150501   -.6768008
    ------------------------------------------------------------------------------
    
    . quietly margins, at(year_r=(1(1)7))
    
    . marginsplot
    
      Variables that uniquely identify margins: year_r
    [Attachment: test2.png, marginsplot with continuous year]


    ***********************************OPTION 2: WITH CATEGORICAL YEAR:

    Code:
    .
    . fracreg logit di_rev i.year_r
    note: 7.year_r omitted because of collinearity
    
    Iteration 0:   log pseudolikelihood = -5.3011755  
    Iteration 1:   log pseudolikelihood = -4.6196655  
    Iteration 2:   log pseudolikelihood = -4.6194615  
    Iteration 3:   log pseudolikelihood = -4.6194615  
    
    Fractional logistic regression                  Number of obs     =          7
                                                    Wald chi2(0)      =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -4.6194615               Pseudo R2         =     0.0015
    
    ------------------------------------------------------------------------------
                 |               Robust
          di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          year_r |
              2  |   .0452339   1.05e-11  4.3e+09   0.000     .0452339    .0452339
              3  |   .0973966   6.01e-11  1.6e+09   0.000     .0973966    .0973966
              4  |    .136525   1.20e-10  1.1e+09   0.000      .136525     .136525
              5  |   .2144551   2.28e-10  9.4e+08   0.000     .2144551    .2144551
              6  |   .2404216   2.45e-10  9.8e+08   0.000     .2404216    .2404216
              7  |   .2479757   2.47e-10  1.0e+09   0.000     .2479757    .2479757
                 |
           _cons |  -.6578177   1.96e-13 -3.4e+12   0.000    -.6578177   -.6578177
    ------------------------------------------------------------------------------
    
    . quietly margins i.year_r
    
    . marginsplot
    
      Variables that uniquely identify margins: year_r
    [Attachment: test1.png, marginsplot with categorical year]


    ----------------------------------------------------------------------------------------------------------

    FYI, DIVERSITY INDEX EQUATION BELOW:





    Diversity Index Equation



    DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²)



    H is the proportion of the population who are Hispanic or Latino.

    W is the proportion of the population who are White alone, not Hispanic or Latino.

    B is the proportion of the population who are Black or African American alone, not Hispanic or Latino.

    AIAN is the proportion of the population who are American Indian and Alaska Native alone, not Hispanic or Latino.

    Asian is the proportion of the population who are Asian alone, not Hispanic or Latino.

    NHPI is the proportion of the population who are Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino.

    SOR is the proportion of the population who are Some Other Race alone, not Hispanic or Latino.

    Multi is the proportion of the population who are Two or More Races, not Hispanic or Latino.



    Source: https://www.census.gov/library/visua...20-census.html
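
    To make the formula concrete, here is a minimal sketch of computing the DI in Stata. The eight proportions are invented for illustration only (they sum to 1); they are not census figures.

    Code:
    clear
    set obs 1
    * Invented proportions for illustration only (must sum to 1)
    generate double H     = 0.19
    generate double W     = 0.58
    generate double B     = 0.12
    generate double AIAN  = 0.01
    generate double Asian = 0.06
    generate double NHPI  = 0.002
    generate double SOR   = 0.005
    generate double Multi = 0.033
    * DI = 1 minus the sum of the squared proportions
    generate double DI = 1 - (H^2 + W^2 + B^2 + AIAN^2 + Asian^2 + ///
        NHPI^2 + SOR^2 + Multi^2)
    list DI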



  • #2
    Perhaps a naïve approach, but if you're interested in confidence intervals, then wouldn't it be better to use the actual counts of each category? There is an assumption about sampling distribution, I suppose, but how about something like -mlogit- followed by -predictnl , ci()- where you predict the diversity index and its confidence bounds from the equation?

    I show an example of the mechanics below using a fictitious dataset of a rather small population of three categories. (Begin at the "Begin here" comment; the top part of the output shows creation of the toy dataset for illustration.)

    . version 17.0

    . clear *

    . set seed `=strreverse("1638845")'

    . // Seven years
    . quietly set obs 7

    . // Total population (increasing over time)
    . generate int tot = runiformint(9000, 10000)

    . sort tot

    . generate byte yea = _n

    . // Breakdown (three categories)
    . generate int count1 = rbinomial(tot, 0.5)

    . generate int count2 = rbinomial(tot, 0.3)

    . generate int count3 = tot - count1 - count2

    . quietly reshape long count, i(yea) j(rac)

    . *
    . * Begin here
    . *
    . quietly mlogit rac i.yea [fweight=count], baseoutcome(1)

    . predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2) / ///
    >         (1 + exp(xb(#2)) + exp(xb(#3)))^2, ci(lb ub)
    note: confidence intervals calculated using Z critical values.

    . // Sanity check
    . predict double pr*, pr

    . generate double din_chk = 1 - pr1^2 - pr2^2 - pr3^2

    . assert float(din) == float(din_chk)

    . // Finally, plot with CIs
    . graph twoway ///
    >         rcap ub lb yea if rac == 1, lcolor(black) || ///
    >         connected din yea if rac == 1, lcolor(black) lpattern(solid) ///
    >         msize(small) mcolor(black) mfcolor(white) ///
    >         xtitle(Year) xlabel(1(1)7) ///
    >         ytitle(Diversity Index) ///
    >         ylabel(0.60(0.01)0.63, format(%4.2f) angle(horizontal) nogrid) ///
    >         legend(off)

    . quietly graph export din.png

    . exit

    end of do-file


    And if your dataset comes from a designed survey, then I believe that -mlogit- allows the -svy- prefix, and so can accommodate that situation as well.
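
    For instance, the survey case might look like the following sketch, where the design variables psu, strat, and wt are placeholders for whatever your survey provides (with -svy-, weights are supplied through -svyset- rather than as fweights):

    Code:
    * Placeholders: psu, strat, and wt stand in for your design variables
    svyset psu [pweight=wt], strata(strat)
    svy: mlogit rac i.yea, baseoutcome(1)
    * Same delta-method prediction of the diversity index as before
    predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2) / ///
        (1 + exp(xb(#2)) + exp(xb(#3)))^2, ci(lb ub)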

    [Attachment: din.png, diversity index with CIs by year]



    • #3
      Wow, this is genius – thank you so much for taking the time to write all this code!! This made my day. Since I have 8 categories rather than 3, I changed the code as follows (bold parts were changed); it seems to work correctly.

      Code:
      predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2 + exp(xb(#4))^2 + ///
          exp(xb(#5))^2 + exp(xb(#6))^2 + exp(xb(#7))^2 + exp(xb(#8))^2) / ///
          (1 + exp(xb(#2)) + exp(xb(#3)) + exp(xb(#4)) + exp(xb(#5)) + ///
          exp(xb(#6)) + exp(xb(#7)) + exp(xb(#8)))^2, ci(lb ub)
      
      generate double din_chk = 1 - pr1^2 - pr2^2 - pr3^2 - pr4^2 - pr5^2 - pr6^2 - pr7^2 - pr8^2
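
      As an aside, with many categories the -predictnl- expression can also be assembled in a loop rather than typed term by term. A sketch, assuming the same eight-outcome -mlogit- fit is the active estimation result:

      Code:
      * Build numerator and denominator strings for K outcome categories
      local K 8
      local num "1"
      local den "1"
      forvalues k = 2/`K' {
          local num "`num' + exp(xb(#`k'))^2"
          local den "`den' + exp(xb(#`k'))"
      }
      predictnl double din = 1 - (`num') / (`den')^2, ci(lb ub)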
      Is there a straightforward way to get the “trend” from this model? I guess I was thinking the marginal effect of the year variable (with CIs). (I may be thinking about this the wrong way).

      Thank you again!!!!! I'm very grateful.



      • #4
        Originally posted by Jennifer Carson View Post
        Is there a straightforward way to get the “trend” from this model? I guess I was thinking the marginal effect of the year variable (with CIs). (I may be thinking about this the wrong way).
        If by trend you mean whether the index is increasing or decreasing over time, then my recommendation is to rely upon inspection of the plot.

        If you're concerned that your audience will demand a null hypothesis statistical test of whether the linear trend of the temporal profile is exactly zero, then you might be able to do something using the linear component of the set of orthogonal polynomial contrasts for the seven years, but I doubt that it would be straightforward. And I'm not sure that it would be worth the effort, inasmuch as the functional relationship of the index is by construction inherently nonlinear.



        • #5
          Jennifer: When you include i.year, the model is saturated: the year dummies are exhaustive (with an intercept) and mutually exclusive. Therefore, the trend will be exactly the same no matter which model you use. You might as well just use a linear model. If you use c.year then that's different but less flexible. If you start adding covariates then the fracreg makes sense.
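
          One way to see the saturation point is that the fitted values from -fracreg- with i.year_r and from an ordinary -regress- both reproduce the observed yearly values exactly. A sketch, assuming the seven observations from #1 are in memory:

          Code:
          fracreg logit di_rev i.year_r
          predict double fit_frac      // conditional mean, the default after -fracreg-
          regress di_rev i.year_r
          predict double fit_ols       // linear fit
          * With a saturated model, both columns equal di_rev itself
          list year_r di_rev fit_frac fit_ols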



          • #6
            I wrote "you might be able to do something using the linear component of the set of orthogonal polynomial contrasts for the seven years, but I doubt that it would be straightforward".

            Actually, it turns out to be straightforward. But it does take some typing.

            Below I show the test of linear trend in the diversity index over time for the example that I used above. Again, begin at the "Begin here" comment (new location) for the new code's output, which shows how to test for a linear trend in the computed diversity index's time course.

            . version 17.0

            . clear *

            . set seed `=strreverse("1638845")'

            . // Seven years
            . quietly set obs 7

            . // Total population (increasing over time)
            . generate int tot = runiformint(9000, 10000)

            . sort tot

            . generate byte yea = _n

            . // Breakdown (three categories)
            . generate int count1 = rbinomial(tot, 0.5)

            . generate int count2 = rbinomial(tot, 0.3)

            . generate int count3 = tot - count1 - count2

            . quietly reshape long count, i(yea) j(rac)

            . quietly mlogit rac i.yea [fweight=count], baseoutcome(1) nolog

            . predictnl double din = 1 - (1 + exp(xb(#2))^2 + exp(xb(#3))^2) / ///
            >         (1 + exp(xb(#2)) + exp(xb(#3)))^2, ci(lb ub)
            note: confidence intervals calculated using Z critical values.

            . list yea din lb ub if rac == 1, noobs separator(0)

              +-----------------------------------------+
              | yea         din          lb          ub |
              |-----------------------------------------|
              |   1   .62425017   .61931046   .62918989 |
              |   2   .61787938   .61264441   .62311434 |
              |   3   .61892674   .61379026   .62406322 |
              |   4   .61863805   .61345668   .62381942 |
              |   5   .62230284   .61743457    .6271711 |
              |   6   .61453096   .60933869   .61972324 |
              |   7   .61932345   .61436164   .62428525 |
              +-----------------------------------------+

            . *
            . * Begin here
            . *
            . quietly nlcom ///
            >         (y1: 1 - (1 + exp(_b[2:_cons] + _b[2:1b.yea])^2 + exp(_b[3:_cons] + _b[3:1b.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:1b.yea]) + exp(_b[3:_cons] + _b[3:1b.yea]))^2) ///
            >         (y2: 1 - (1 + exp(_b[2:_cons] + _b[2:2.yea])^2 + exp(_b[3:_cons] + _b[3:2.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:2.yea]) + exp(_b[3:_cons] + _b[3:2.yea]))^2) ///
            >         (y3: 1 - (1 + exp(_b[2:_cons] + _b[2:3.yea])^2 + exp(_b[3:_cons] + _b[3:3.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:3.yea]) + exp(_b[3:_cons] + _b[3:3.yea]))^2) ///
            >         (y4: 1 - (1 + exp(_b[2:_cons] + _b[2:4.yea])^2 + exp(_b[3:_cons] + _b[3:4.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:4.yea]) + exp(_b[3:_cons] + _b[3:4.yea]))^2) ///
            >         (y5: 1 - (1 + exp(_b[2:_cons] + _b[2:5.yea])^2 + exp(_b[3:_cons] + _b[3:5.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:5.yea]) + exp(_b[3:_cons] + _b[3:5.yea]))^2) ///
            >         (y6: 1 - (1 + exp(_b[2:_cons] + _b[2:6.yea])^2 + exp(_b[3:_cons] + _b[3:6.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:6.yea]) + exp(_b[3:_cons] + _b[3:6.yea]))^2) ///
            >         (y7: 1 - (1 + exp(_b[2:_cons] + _b[2:7.yea])^2 + exp(_b[3:_cons] + _b[3:7.yea])^2) / ///
            >                 (1 + exp(_b[2:_cons] + _b[2:7.yea]) + exp(_b[3:_cons] + _b[3:7.yea]))^2), post

            . // Confirming correctness of -nlcom- (compare point estimates and confidence bounds with the listing above)
            . nlcom

            ------------------------------------------------------------------------------
                     rac | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                      y1 |   .6242502   .0025203   247.69   0.000     .6193105    .6291899
                      y2 |   .6178794    .002671   231.33   0.000     .6126444    .6231143
                      y3 |   .6189267   .0026207   236.17   0.000     .6137903    .6240632
                      y4 |    .618638   .0026436   234.01   0.000     .6134567    .6238194
                      y5 |   .6223028   .0024839   250.54   0.000     .6174346    .6271711
                      y6 |    .614531   .0026492   231.97   0.000     .6093387    .6197232
                      y7 |   .6193234   .0025316   244.64   0.000     .6143616    .6242853
            ------------------------------------------------------------------------------

            . /* Orthogonal polynomial contrast:
            >    (linear component's coefficients: -3 -2 -1 0 1 2 3) */
            . test -3 * y1 - 2 * y2 - y3 + 0 * y4 + y5 + 2 * y6 + 3 * y7 = 0

             ( 1)  - 3*y1 - 2*y2 - y3 + y5 + 2*y6 + 3*y7 = 0

                       chi2(  1) =    1.78
                     Prob > chi2 =    0.1827

            . exit

            end of do-file
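
            Incidentally, the seven -nlcom- expressions can be generated in a loop rather than typed out. A sketch for the same three-category example; it assumes the -mlogit- fit above is the active estimation result (note that the base year is referenced as 1b.yea):

            Code:
            * Assemble the -nlcom- specification programmatically
            local spec ""
            forvalues t = 1/7 {
                if `t' == 1 local L "1b"
                else        local L "`t'"
                local b2 "_b[2:_cons] + _b[2:`L'.yea]"
                local b3 "_b[3:_cons] + _b[3:`L'.yea]"
                local spec "`spec' (y`t': 1 - (1 + exp(`b2')^2 + exp(`b3')^2) / (1 + exp(`b2') + exp(`b3'))^2)"
            }
            nlcom `spec', post
            * Linear component of the orthogonal polynomial contrast
            test -3*y1 - 2*y2 - y3 + 0*y4 + y5 + 2*y6 + 3*y7 = 0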


            I believe that Jeff's suggestion of fitting a linear regression model, or use of -fracreg-, would be more suitable if you had a whole bunch of [0, 1] values for each year, that is, if you had a sample containing numerous diversity index values per year. But my understanding from your original post is that you have a single computed diversity index per year—a single derived datapoint per year—for each of seven years.



            • #7
              I see now there are only seven observations; I should've paid more attention to that. Then the second approach is just plotting the 7 different points, and there cannot be any kind of confidence interval. Using c.year doesn't give a perfect fit, but with n = 7 and 5 degrees of freedom you cannot use a normal confidence interval anyway -- especially with a linear model. Plus, there is almost always serial correlation in these time series. I hope Jennifer has more than 7 years of data for the remaining analysis.



              • #8
                Thank you so much, both! Joseph, this code is amazing, very many thanks!!! Yes, to clarify: the raw dataset I am using has thousands of people, but by definition the diversity index is a single derived data point for the whole population (in this case I have one per year). Formula for the diversity index below:


                DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²) ...where H is the proportion of Hispanics in population, W is the proportion of whites, etc.

                Jeff, it seemed Joseph’s approach to calculating the CIs involved using the actual counts of each category. For example, before he runs:

                Code:
                quietly mlogit rac i.yea [fweight=count], baseoutcome(1)
                the data was transformed to look like this:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(yea rac) int(tot count)
                1 1 9075 4475
                1 2 9075 2712
                1 3 9075 1888
                2 1 9175 4627
                2 2 9175 2730
                2 3 9175 1818
                3 1 9204 4613
                3 2 9204 2777
                3 3 9204 1814
                4 1 9231 4643
                4 2 9231 2749
                4 3 9231 1839
                5 1 9619 4767
                5 2 9619 2901
                5 3 9619 1951
                6 1 9730 4947
                6 2 9730 2931
                6 3 9730 1852
                7 1 9994 5020
                7 2 9994 2962
                7 3 9994 2012
                end

                I’m very grateful for the time both of you have taken to respond to this post!!!



                • #9
                  Originally posted by Jeff Wooldridge View Post
                  Then the second approach is just plotting the 7 different points, and there cannot be any kind of confidence interval.
                  Jeff, I'm curious about your comment that there cannot be any kind of confidence interval here - would you mind explaining why?
