  • statistical significance ratio difference

    Hello everybody, I have a question about testing the statistical significance of the difference between two ratios.
    My data look as follows:
    PHP Code:
    ano    sexo    edad    expr    varstrat    varunit    activ
    2013    mujer    36    32    11121    11121100    ocupados
    2013    mujer    30    32    11121    11121100    ocupados
    2013    hombre    41    32    11121    11121100    ocupados
    2013    mujer    41    32    11121    11121100    ocupados
    2013    mujer    10    32    11121    11121100    
    2013    hombre    9    32    11121    11121100    
    2013    hombre    73    32    11121    11121100    inactivo
    2013    mujer    73    32    11121    11121100    inactivo
    2013    mujer    60    32    11121    11121100    ocupados
    2013    mujer    49    32    11121    11121100    ocupados
    2013    mujer    21    32    11121    11121100    ocupados
    2013    mujer    17    32    11121    11121100    inactivo
    2013    hombre    1    32    11121    11121100    
    2013    hombre    37    32    11121    11121100    ocupados
    2013    mujer    33    32    11121    11121100    ocupados
    2013    hombre    3    32    11121    11121100    
    .
    .

    In summary, I have over 400,000 observations from survey data for 2013 and 2017. I need to test the statistical significance of the difference between two ratios: the percentage of employed people in 2013 vs. 2017. To do that I use svy with the ratio command, and to test the difference I use the lincom command, which gives the following:

    PHP Code:
    Survey: Ratio estimation

    Number of strata =     607       Number of obs   =     434,930
    Number of PSUs   =   3,464       Population size =  35,080,531
                                     Subpop. no. obs =     106,224
                                     Subpop. size    =   8,671,466
                                     Design df       =       2,857

         _ratio_1: tredad_1/ocupados

             2013: ano = 2013
             2017: ano = 2017

    --------------------------------------------------------------
                 |             Linearized
            Over |      Ratio   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    _ratio_1     |
            2013 |   .2332271   .0033352      .2266874    .2397668
            2017 |   .2243416   .0038795      .2167346    .2319486
    --------------------------------------------------------------

    lincom  [_ratio_1]2013 - [_ratio_1]2017

     ( 1)  [_ratio_1]2013 - [_ratio_1]2017 = 0

    ------------------------------------------------------------------------------
           Ratio |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   .0088855   .0051192     1.74   0.083    -.0011521    .0189231
    ------------------------------------------------------------------------------
    Therefore, my question is whether it is correct to use the lincom command to test the statistical significance of the difference between these two ratios.
    Thank you very much for all your comments.
    Kind Regards.

  • #2
    If it helps, the code I use to get the above results is as follows:
    PHP Code:
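    use Base_1, clear

    * Age groups (codes 1-4 match the value labels defined below)
    gen tedad = .
    replace tedad = 1 if edad >= 15 & edad <= 29
    replace tedad = 2 if edad >= 30 & edad <= 44
    replace tedad = 3 if edad >= 45 & edad <= 59
    * Stata treats missing as larger than any number, so exclude it explicitly
    replace tedad = 4 if edad >= 60 & !missing(edad)
    tab tedad, g(tredad_)
    #delimit ;
    label define tedad
        1 "15-29 años"
        2 "30-44 años"
        3 "45-59 años"
        4 "60 años o más";
    #delimit cr
    label values tedad tedad
    label variable tedad "T Edad"

    gen ocupados = (activ == 1)

    svyset varunit [w=expr], strata(varstrat) vce(linearized) singleunit(certainty)
    * The activ value in subpop() was lost in the post; 1 is assumed here,
    * matching the definition of ocupados above
    svy, subpop(if activ == 1 & sexo == 1): ratio tredad_1/ocupados, over(ano)
    lincom [_ratio_1]2013 - [_ratio_1]2017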



    • #3
      Yes, it's perfectly all right to use lincom in this way. However, basing your conclusion on "statistical significance" alone might be wrong. If there had been a census (complete enumeration) in both years, would the employment rates be identical? Of course not. So before doing the calculation, you know that the null hypothesis is false. The real question is "how different" the rates were, and that question is answered by the confidence interval for the difference.
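
      As a rough check on where that interval comes from, the lincom result in post #1 can be rebuilt by hand from the reported difference, its standard error, and the design degrees of freedom. This is only a sketch: the scalar names are made up for illustration, and the numbers are typed in from the posted output rather than read from stored results.

      ```stata
      * Values copied from the lincom output in post #1 (2013 minus 2017)
      scalar d_hat = .0088855
      scalar se_d  = .0051192
      * Design df = number of PSUs - number of strata = 3,464 - 607
      scalar df_d  = 2857

      * Two-sided 95% critical value from the t distribution
      scalar tcrit = invttail(scalar(df_d), 0.025)

      display "t critical value: " %6.4f scalar(tcrit)
      display "95% CI lower:     " %10.7f (scalar(d_hat) - scalar(tcrit)*scalar(se_d))
      display "95% CI upper:     " %10.7f (scalar(d_hat) + scalar(tcrit)*scalar(se_d))
      ```

      The two limits should agree with the interval lincom printed (-.0011521 to .0189231), up to rounding of the copied inputs.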

      It might also be informative to report the ratio of the rates or the percent change in rates:
      Code:
      nlcom 100*([_ratio_1]2017-[_ratio_1]2013)/ [_ratio_1]2013
      Last edited by Steve Samuels; 30 Oct 2018, 08:46.
      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        Thank you very much for your comments, Steve. I understand your point perfectly. So, in this case, how should I interpret the confidence interval to answer the real question of "how different" the rates are? Or do I have to do something else (another command) to get the confidence interval for the difference?

        Also, with the command that you gave me I got the following:
        PHP Code:
        nlcom 100*([_ratio_1]2017-[_ratio_1]2013)/ [_ratio_1]2013

               _nl_1:  100*([_ratio_1]2017-[_ratio_1]2013)/ [_ratio_1]2013

        ------------------------------------------------------------------------------
               Ratio |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               _nl_1 |  -3.809808   2.159771    -1.76   0.078    -8.042881    .4232647
        ------------------------------------------------------------------------------
        How can I interpret that?
        Last edited by Nicolas Rodriguez; 30 Oct 2018, 08:59.



        • #5


          1. To interpret the original CI of the difference, I would 1) put the rates in terms of the units in which they are usually reported (per ten thousand?) and 2) express the difference as a change from 2013 (2017 minus 2013):
          Code:
          lincom 10000*[_ratio_1]2017 - 10000*[_ratio_1]2013
          Then the estimated change per 10,000 would be -88.9, with confidence interval [-189.2, +11.5].

          2. On looking at the relative decrease in rates or the ratio of rates, I don't find either one easy to understand. The ratio is only a little better:

          The employment rate in 2017 was 96.2% of the rate in 2013. Confidence interval 92.0% - 100.4%

          My advice is to stick to the change.
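
          The figures in points 1 and 2 can be reproduced from the output already posted in this thread. The following is a sketch with the numbers typed in by hand (the scalar names are illustrative only). It relies on the fact that multiplying an estimate by a positive constant, or adding a constant, rescales or shifts the confidence limits in the same way, and that reversing the sign swaps the lower and upper limits.

          ```stata
          * Point 1: change from 2013 to 2017 per 10,000.
          * Inputs are from the lincom output, which was 2013 minus 2017,
          * so the sign is reversed and the limits swap places.
          scalar d    = .0088855
          scalar d_lo = -.0011521
          scalar d_hi = .0189231
          display "Change per 10,000: " %7.1f (-10000*scalar(d))
          display "  lower limit:     " %7.1f (-10000*scalar(d_hi))
          display "  upper limit:     " %7.1f (-10000*scalar(d_lo))

          * Point 2: 2017 rate as a percentage of the 2013 rate.
          * Inputs are from the nlcom percent-change output;
          * ratio in percent = 100 + percent change.
          scalar pc    = -3.809808
          scalar pc_lo = -8.042881
          scalar pc_hi = .4232647
          display "2017 as % of 2013: " %6.1f (100 + scalar(pc))
          display "  lower limit:     " %6.1f (100 + scalar(pc_lo))
          display "  upper limit:     " %6.1f (100 + scalar(pc_hi))
          ```

          With the svy: ratio results still in memory, the ratio in point 2 could presumably also be requested directly with nlcom 100*[_ratio_1]2017/[_ratio_1]2013, using the same coefficient syntax as the commands earlier in the thread.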
          Steve Samuels
          Statistical Consulting

          Stata 14.2
