
  • How much power has a question in my questionnaire?

    Hi,
    I am analysing a questionnaire I have conducted. For one particular question, I have two predictor variables that I know correlate somewhat with each other. Each predictor is statistically significant on its own (tested with a chi-squared test). When I enter both in a logistic regression, neither of them is statistically significant. I want to perform some kind of power analysis to check whether this might be due to low power (in which case it could be a type II error).

    Response variable: Dichotomous
    Predictor variable 1: Ordinal
    Predictor variable 2: Nominal
    R^2 between predictor variables: 0.35

    Let's say the question has the alternatives yes/no.

    Code:
    Predictor variable 1:
                          |   3 quantiles of Predictor 1
      Answer              |         1          2          3 |     Total
    ----------------------+---------------------------------+----------
    Yes                   |        16         42         28 |        86 
    No                    |        94        121         45 |       260 
    ----------------------+---------------------------------+----------
                    Total |       110        163         73 |       346 
    
    
    Predictor variable 2:
                          | Region
      Answer              | Northern   Western E  Southern   Eastern E      Other |     Total
    ----------------------+-------------------------------------------------------+----------
    Yes                   |        27         35         12          9          4 |        87 
    No                    |        33         90         82         44         15 |       264 
    ----------------------+-------------------------------------------------------+----------
                    Total |        60        125         94         53         19 |       351 
    I hope it is clear what I mean; please do not hesitate to point out any flaws in my reasoning. Can anyone give me advice on how to proceed? Thank you in advance.
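
    For reference, the two chi-squared tests described in this post can be reproduced directly from the cell counts in the tables, using Stata's immediate command tabi. This is only a sketch of the two separate bivariate tests; it says nothing about the model with both predictors entered together.

```stata
* Pearson chi-squared test: Answer (Yes/No) by tertile of predictor 1
* Rows are separated by a backslash; counts are copied from the first table
tabi 16 42 28 \ 94 121 45, chi2

* Pearson chi-squared test: Answer (Yes/No) by region (predictor 2)
* Counts are copied from the second table
tabi 27 35 12 9 4 \ 33 90 82 44 15, chi2
```

    After each call, the test statistic and p-value are available in r(chi2) and r(p).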

  • #2
    You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    There are power calculators in Stata 16; I don't know whether they cover logit. However, if you really have 350 observations and only 2 rhs variables, then the issue is not power. I don't know what the columns mean in your tables. It may be that you have many more parameters because these variables take on multiple values (and, I assume, are entered as sets of dummies). But even with 8 parameters, 340 observations seem sufficient.

    Have you looked at the collinearity diagnostics?
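
    A minimal sketch of two common collinearity checks for categorical predictors follows. The data are simulated stand-ins, and the variable names region, tertile and answer are hypothetical, not taken from the survey. Variance inflation factors depend only on the right-hand-side variables, so the sketch gets them from regress even though the model of interest is a logit.

```stata
* Simulated stand-in data (hypothetical; NOT the survey data)
clear
set seed 314
set obs 415
generate byte region  = floor(5*runiform())    // nominal, coded 0-4
* tertile (1-3) is built to be lower, on average, in higher region codes
generate byte tertile = 1 + (runiform() < 0.9 - 0.2*region) ///
                          + (runiform() < 0.7 - 0.15*region)
generate byte answer  = runiform() < 0.25      // placeholder 0/1 outcome

* Variance inflation factors for the indicator-coded predictors
regress answer i.tertile i.region
estat vif

* Association between the two categorical predictors:
* Pearson chi-squared test and Cramer's V
tabulate tertile region, chi2 V
```

    Cramér's V is used here rather than a Pearson correlation because region is nominal, so the numeric codes carry no ordering.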



    • #3
      Thank you for your response. It is definitely helpful, although I am struggling.

      Sample data using dataex, first 5 observations:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float Questioncase2_dicho byte(GDP_10000USD_Tertile Country_categories)
      . 3 1
      1 3 1
      1 3 1
      1 3 1
      1 3 1
      end
      label values Questioncase2_dicho question2_dicho
      label def question2_dicho 1 "Prophylactic intervention (endovascular or surgical)", modify
      label values Country_categories countrycategories
      label def countrycategories 1 "Western Europe", modify
      label var GDP_10000USD_Tertile "3 quantiles of GDP_10000USD" 
      label var Country_categories "Region (0=Northern, 1=Western, 2=Southern, 3=Eastern, 4=Other)"
      The logistic regression:
      Code:
      . logit Questioncase2_dicho GDP_10000USD_Tertile Country_categories 
      
      Iteration 0:   log likelihood = -194.01672  
      Iteration 1:   log likelihood = -185.61497  
      Iteration 2:   log likelihood = -185.42824  
      Iteration 3:   log likelihood = -185.42796  
      Iteration 4:   log likelihood = -185.42796  
      
      Logistic regression                             Number of obs     =        346
                                                      LR chi2(2)        =      17.18
                                                      Prob > chi2       =     0.0002
      Log likelihood = -185.42796                     Pseudo R2         =     0.0443
      
      --------------------------------------------------------------------------------------
       Questioncase2_dicho |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------------+----------------------------------------------------------------
      GDP_10000USD_Tertile |  -.3008431   .2525566    -1.19   0.234    -.7958449    .1941587
        Country_categories |   .3453159   .1804626     1.91   0.056    -.0083843    .6990161
                     _cons |   1.223779   .7075544     1.73   0.084    -.1630026     2.61056
      --------------------------------------------------------------------------------------
      
      .
      Here, neither predictor variable is statistically significant at the 5% level. Tested on their own (e.g. logit Questioncase2_dicho GDP_10000USD_Tertile), both are statistically significant, with p reported as 0.000 (i.e. p < 0.001).
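
      Note that the output above reports LR chi2(2), so each predictor entered the model as a single linear term, and the region codes 0-4 were treated as a numeric scale. A sketch of the alternative, with factor-variable notation and a joint test per predictor, is given below. It uses simulated stand-in data with hypothetical variable names (region, tertile, answer), not the survey data.

```stata
* Simulated stand-in data (hypothetical; NOT the survey data)
clear
set seed 2718
set obs 346
generate byte region  = floor(5*runiform())    // nominal, coded 0-4
generate byte tertile = 1 + (runiform() < 0.7 - 0.1*region) ///
                          + (runiform() < 0.5 - 0.1*region)
generate byte answer  = runiform() < invlogit(-1.5 + 0.5*(tertile - 1))

* The i. prefix expands each predictor into indicator (dummy) variables,
* with the lowest level as the base category
logit answer i.tertile i.region

* Joint Wald tests: one test per predictor, across all of its levels
testparm i.tertile
testparm i.region
```

      With this coding the model has 2 + 4 slope parameters, and the question "is region significant?" is answered by the joint test rather than by a single coefficient.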

      Below is how I have checked the correlation between the two predictor variables:
      Code:
      . corr GDP_10000USD_Tertile Country_categories 
      (obs=415)
      
                   | GDP_10~e Countr~s
      -------------+------------------
      GDP_10000U~e |   1.0000
      Country_ca~s |  -0.7528   1.0000
      
      
      . regress GDP_10000USD_Tertile Country_categories 
      
            Source |       SS           df       MS      Number of obs   =       415
      -------------+----------------------------------   F(1, 413)       =    540.08
             Model |  122.017723         1  122.017723   Prob > F        =    0.0000
          Residual |  93.3075778       413  .225926338   R-squared       =    0.5667
      -------------+----------------------------------   Adj R-squared   =    0.5656
             Total |  215.325301       414  .520109423   Root MSE        =    .47532
      
      ------------------------------------------------------------------------------------
      GDP_10000USD_Ter~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------------+----------------------------------------------------------------
      Country_categories |  -.4900547   .0210871   -23.24   0.000    -.5315061   -.4486033
                   _cons |   2.647775   .0413143    64.09   0.000     2.566563    2.728988
      ------------------------------------------------------------------------------------
      Here's how I think:
      1. The two predictor variables correlate (R-squared of 0.57).
      2. Both predictor variables are statistically significant when tested on their own. Neither is when both are entered in the same logit command.
      3. Is this because of low power with the observed effect size (a possible type II error)? I thought I could use some kind of power analysis to partly answer that question.
      4. Is this because the two predictor variables are too closely correlated? How do I check that in a good way?
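
      As a rough sketch for point 3, Stata's power command can compute the power of a two-sample comparison of proportions. The example below compares the share of "Yes" answers in tertile 1 and tertile 3, using the counts from the first table in post #1. This is a simplified stand-in under stated assumptions: it is not a power calculation for the two-predictor logit model, and power computed from an observed effect size should be read with caution.

```stata
* Observed proportions of "Yes" from the first table in post #1
local p1 = 16/110    // tertile 1: 16 Yes out of 110
local p3 = 28/73     // tertile 3: 28 Yes out of 73

* Power of a two-sided test of p1 = p3 at alpha = 0.05 (the default),
* given the two group sizes
power twoproportions `p1' `p3', n1(110) n2(73)

* The computed power is stored in r(power)
display "Estimated power: " %5.3f r(power)
```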



      • #4
        Maybe my example is too specific, or the questions are framed wrongly. However, I'd guess others have wrestled with similar issues. Has anyone got tips on how I can proceed? I am very thankful for any reply.
