  • Counting observations from a multivariate logistic regression

    Hello,

    How can I count the observations by outcome (case or control) and by an independent variable (intervention), restricted to those actually used in an adjusted logistic regression model for a case-control study?

    In this example the model has dropped 7 observations because two levels of state2 predict the outcome perfectly (4 for "success", 3 for "failure"). So I need a table of cases/controls by intervention that also excludes these seven observations.

    I have installed the "distinct" module, but I don't know how to apply it to a regression analysis as opposed to an entire dataset. Perhaps this is the wrong approach.

    Any help is appreciated.

    Thank you
    Jocelynne

    logistic outcome independent_1 i.age_mth_test_grp i.family_smoking_miss i.state2 i.month if year==2016 & tested_7d==0 & age_mth_test_grp<5
    note: 2.state2 != 0 predicts success perfectly
    2.state2 dropped and 4 obs not used

    note: 5.state2 != 0 predicts failure perfectly
    5.state2 dropped and 3 obs not used


    Logistic regression                             Number of obs     =         63
                                                    LR chi2(15)       =      12.08
                                                    Prob > chi2       =     0.6728
    Log likelihood = -33.330255                     Pseudo R2         =     0.1534

    -------------------------------------------------------------------------------------
                outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
          independent_1 |   1.022343   .7859253     0.03   0.977     .2265874     4.61272
                        |
       age_mth_test_grp |
                     1  |   3.119581   2.807651     1.26   0.206     .5345734    18.20477
                     2  |   .9349802   1.190389    -0.05   0.958     .0771041    11.33776
                     3  |   2.846745   3.486589     0.85   0.393     .2581238    31.39562
                     4  |   2.495358   2.854672     0.80   0.424     .2650723      23.491
                        |
    family_smoking_miss |
                     1  |   7.982311   9.941517     1.67   0.095     .6950327    91.67524
                     9  |   4.259453   9.634958     0.64   0.522     .0505733    358.7456
                        |
                 state2 |
                     2  |          1  (empty)
                     4  |   1.123285   .9685987     0.13   0.893     .2072555    6.087986
                     5  |          1  (empty)
                        |
                  month |
                     6  |   .5933001   1.113463    -0.28   0.781     .0149899    23.48274
                     7  |   .1460062   .2688005    -1.05   0.296     .0039563    5.388294
                     8  |   .3355456    .587782    -0.62   0.533     .0108314    10.39485
                     9  |   .0646431    .129849    -1.36   0.173      .001261    3.313871
                    10  |   .2031119   .4800371    -0.67   0.500      .001977    20.86772
                        |
                  _cons |   .7504512    1.38808    -0.16   0.877     .0199935    28.16803
    -------------------------------------------------------------------------------------

  • #2
    Code:
    // open example data
    sysuse nlsw88, clear
    
    // prepare the data
    
    gen byte highoc = occupation < 3 if !missing(occupation)
    label variable highoc "high occupation"
    label define highoc 1 "higher" ///
                        0 "lower"
    label value highoc highoc
    
    
    // estimate logistic regression
    logit highoc i.collgrad ttl_exp i.race i.south i.union, or
    
    // find the number of observations by highoc and collgrad
    // for those observations used in the previous model
    tab highoc collgrad if e(sample)
    
    // find the number of observations by highoc and collgrad
    // for those observations in the data
    tab highoc collgrad
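
    A possible extension, not part of the original reply: e(sample) is replaced as soon as another estimation command is run, so it can help to copy it into a permanent variable first. The sketch below reuses the example above; the variable name used is hypothetical and chosen only for illustration.

    ```stata
    // sketch: keep a permanent marker of the estimation sample
    sysuse nlsw88, clear
    gen byte highoc = occupation < 3 if !missing(occupation)
    logit highoc i.collgrad ttl_exp i.race i.south i.union, or

    // "used" is a hypothetical name: 1 if the observation was used, 0 otherwise
    gen byte used = e(sample)

    // the number of marked observations should equal the N reported by logit
    count if used
    assert r(N) == e(N)

    // cross-tabulation restricted to the estimation sample
    tab highoc collgrad if used
    ```

    With the marker stored, later models can be fitted without losing track of which observations entered this one.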
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      That is at most multiple logistic regression, where multiple means having several predictors. That term is fast fading away as a term of art: having several predictors is the norm, not exceptional.

      "Multivariate" as applied to models should mean several variables as responses or outcomes: see e.g. mvreg.

      To see what has been omitted (a term I prefer to "dropped" here, but that really is a matter of taste), you can run the following after the model fit:

      Code:
      list outcome independent_1 age_mth_test_grp family_smoking_miss state2 month if year==2016 & tested_7d==0 & age_mth_test_grp<5 & !e(sample)

      where !e(sample) is code for "not in the estimation sample". See e.g. https://www.stata-journal.com/articl...article=dm0030 for more.
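
      As a minimal sketch of the same idea on data anyone can load, the following uses the auto dataset shipped with Stata. Here, to my knowledge, some levels of rep78 predict failure perfectly and logit omits the corresponding observations. The choice of dataset and model is an assumption made purely for illustration; the last comment shows how the same pattern would look with the variable names from post #1.

      ```stata
      // example data shipped with Stata
      sysuse auto, clear

      // some levels of rep78 contain no foreign cars, so logit omits them
      logit foreign i.rep78

      // counts by outcome and predictor among observations used in the model
      tab foreign rep78 if e(sample)

      // the same table for observations NOT used in the model
      // (tabulate leaves out missing values of rep78 by default)
      tab foreign rep78 if !e(sample)

      // with the variables from post #1, run directly after the logistic command:
      // tab outcome independent_1 if e(sample)
      ```

      The total of the first table should match the "Number of obs" reported in the model header.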

      I see 15 parameters being estimated from 63 observations, which seems optimistic to me, but you may well be on a road towards a simpler model.
      Last edited by Nick Cox; 05 May 2020, 01:22.


      • #4
        Thank you both very kindly, I have been able to get the number breakdown needed.

        Very helpful,
        Cheers
        Jocelynne
