  • Fractional logit model for proportions over time

    Dear all, I have calculated a “Diversity Index” (DI) for a given population. Per the Census website: “the DI tells us the chance that two people chosen at random will be from different racial and ethnic groups…. The DI is bounded between 0 and 1, with a zero-value indicating that everyone in the population has the same racial and ethnic characteristics, while a value close to 1 indicates that everyone in the population has different characteristics.” (The full equation is given below.)

    I’m running a fractional logit model with the DI as the dependent variable and year as the independent variable. I’d like to plot the trend line and 95% CIs around the trend. Here is my code. First, I use dataex to show the data; then I show the model with continuous year; then the model with categorical year.

    Questions:

    1) Are there any obvious problems with this approach? In particular, I wasn’t sure whether I need to adjust the fractional logit code for the fact that this is the same group of individuals over time, or maybe take a different approach to the fractional logit.

    2) Is it better to include year as c.year_r or i.year_r? The plots look quite different.

    I am using Stata 14.

    Thank you!!!

    Code:
    ******************************DATA
     dataex di_rev year_r
    
    ----------------------- copy starting from the next line -----------------------
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(di_rev year_r)
    .34123 1
    .35147 2
    .36345 3
    .37255 4
    .39094 5
    .39714 6
    .39895 7
    end
    
    ------------------ copy up to and including the previous line ------------------
    
    Listed 7 out of 7 observations
    Code:
    ************************************OPTION 1: WITH CONTINUOUS YEAR
    
    . fracreg logit di_rev c.year_r
    
    Iteration 0:   log pseudolikelihood = -5.3012582  
    Iteration 1:   log pseudolikelihood = -4.6198733  
    Iteration 2:   log pseudolikelihood = -4.6196722  
    Iteration 3:   log pseudolikelihood = -4.6196722  
    
    Fractional logistic regression                  Number of obs     =          7
                                                    Wald chi2(1)      =     163.74
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -4.6196722               Pseudo R2         =     0.0014
    
    ------------------------------------------------------------------------------
                 |               Robust
          di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          year_r |   .0446101   .0034862    12.80   0.000     .0377772     .051443
           _cons |  -.6959255   .0097576   -71.32   0.000    -.7150501   -.6768008
    ------------------------------------------------------------------------------
    
    . quietly margins, at(year_r=(1(1)7))
    
    . marginsplot
    
      Variables that uniquely identify margins: year_r

    [Figure: test2.png, marginsplot output for the continuous-year model]
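
For reference, the trend implied by the continuous-year coefficients above can be written out by hand. This is only a sketch from the reported estimates, assuming the usual inverse-logit mean for a fractional logit; the symbols t (for year_r) and p-hat (for the predicted DI) are notation introduced here, and the values are rounded.

```latex
\hat{p}(t) = \frac{1}{1 + \exp\!\bigl(-(-0.6959255 + 0.0446101\,t)\bigr)},
\qquad
\hat{p}(1) \approx 0.3427,
\quad
\hat{p}(7) \approx 0.4052
```

For comparison, the observed values in the data are .34123 in year 1 and .39895 in year 7.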



    ***********************************OPTION 2: WITH CATEGORICAL YEAR:

    Code:
    .
    . fracreg logit di_rev i.year_r
    note: 7.year_r omitted because of collinearity
    
    Iteration 0:   log pseudolikelihood = -5.3011755  
    Iteration 1:   log pseudolikelihood = -4.6196655  
    Iteration 2:   log pseudolikelihood = -4.6194615  
    Iteration 3:   log pseudolikelihood = -4.6194615  
    
    Fractional logistic regression                  Number of obs     =          7
                                                    Wald chi2(0)      =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -4.6194615               Pseudo R2         =     0.0015
    
    ------------------------------------------------------------------------------
                 |               Robust
          di_rev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          year_r |
              2  |   .0452339   1.05e-11  4.3e+09   0.000     .0452339    .0452339
              3  |   .0973966   6.01e-11  1.6e+09   0.000     .0973966    .0973966
              4  |    .136525   1.20e-10  1.1e+09   0.000      .136525     .136525
              5  |   .2144551   2.28e-10  9.4e+08   0.000     .2144551    .2144551
              6  |   .2404216   2.45e-10  9.8e+08   0.000     .2404216    .2404216
              7  |   .2479757   2.47e-10  1.0e+09   0.000     .2479757    .2479757
                 |
           _cons |  -.6578177   1.96e-13 -3.4e+12   0.000    -.6578177   -.6578177
    ------------------------------------------------------------------------------
    
    . quietly margins i.year_r
    
    . marginsplot
    
      Variables that uniquely identify margins: year_r

    [Figure: test1.png, marginsplot output for the categorical-year model]
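
A quick check on the categorical-year output: the table reports a constant plus six year coefficients, so seven parameters for seven observations. Plugging those coefficients into the inverse logit (assuming the usual fractional logit mean function; the p-hat notation is introduced here) appears to reproduce the observed DI values, which would be consistent with the near-zero standard errors:

```latex
\hat{p}_1 = \frac{1}{1 + e^{0.6578177}} \approx 0.34123,
\qquad
\hat{p}_7 = \frac{1}{1 + e^{-(-0.6578177 + 0.2479757)}}
          = \frac{1}{1 + e^{0.4098420}} \approx 0.39895
```

Both match the year 1 and year 7 values in the dataex listing to five decimals.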








    ----------------------------------------------------------------------------------------------------------

    FYI, DIVERSITY INDEX EQUATION BELOW:





    Diversity Index Equation



    DI = 1 – (H² + W² + B² + AIAN² + Asian² + NHPI² + SOR² + Multi²)



    H is the proportion of the population who are Hispanic or Latino.

    W is the proportion of the population who are White alone, not Hispanic or Latino.

    B is the proportion of the population who are Black or African American alone, not Hispanic or Latino.

    AIAN is the proportion of the population who are American Indian and Alaska Native alone, not Hispanic or Latino.

    Asian is the proportion of the population who are Asian alone, not Hispanic or Latino.

    NHPI is the proportion of the population who are Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino.

    SOR is the proportion of the population who are Some Other Race alone, not Hispanic or Latino.

    Multi is the proportion of the population who are Two or More Races, not Hispanic or Latino.
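
As an illustration of the equation above, here is a minimal Stata sketch that computes the DI for a single population-year. The variable names (h, w, b, aian, asian, nhpi, sor, multi) and the share values are hypothetical, made up for this example and not taken from the Census or from my data; they are chosen only so that they sum to 1.

```stata
* Hypothetical group shares for one population-year (made-up values summing to 1)
clear
input double(h w b aian asian nhpi sor multi)
.19 .58 .12 .01 .06 .002 .005 .033
end

* Guard: the eight shares should sum to 1 (within rounding) before computing the index
egen double total = rowtotal(h w b aian asian nhpi sor multi)
assert abs(total - 1) < 1e-6

* DI = 1 minus the sum of squared group shares
generate double di = 1 - (h^2 + w^2 + b^2 + aian^2 + asian^2 + nhpi^2 + sor^2 + multi^2)

* The index must lie in [0, 1]
assert inrange(di, 0, 1)
list di
```

With these made-up shares the index comes out at about 0.608. With one row per year, the same generate line would give the yearly series used as the dependent variable.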



    Source: https://www.census.gov/library/visua...20-census.html





  • #2
    You have accidentally posted your topic in Statalist's Mata Forum, which is for discussions of Stata's Mata language, not Stata's command language. Your question will reach a more appropriate and much larger audience if you post it in Statalist's General Forum.



    • #3
      Thank you so much. I have just posted it to the General Forum.
