  • Defining a subpopulation using subpop()

    Dear all,

    I am working with survey data, and I want to run a logit regression on a subsample of my data. My dataset comprises 130,000 firms (identified by the variable idstd), each with its corresponding weight. I am interested in those firms that have a loan that was approved in the last 4 years. The variable loan takes the value 1 if the firm has a loan and 0 otherwise. The variable loan_duration indicates the number of years that have passed since the loan was granted. My dependent variable, fin11, is binary and records whether the loan obtained required collateral. This is my code:

    Code:
    svyset, clear
    svyset idstd [pweight=wt], strata(strata) singleunit(scaled)
    svy, subpop(loan if loan_duration<=4): logit fin11 n_outcome i.k9 lnemployees
    margins, dydx(*)

    One of the explanatory variables, k9, records the type of financial institution that granted the loan. Given all this, my question is whether I have defined my subpopulation properly, taking into account that loan_duration, k9, and fin11 are all missing (.) when loan is 0 or missing.

    Here I post the output of my code:

    Code:
    . svy, subpop(loan if loan_duration<=4): logit fin11 n_outcome i.k9 lnemployees
    (running logit on estimation sample)
    
    Survey: Logistic regression
    
    Number of strata   =     1,049                  Number of obs     =    123,517
    Number of PSUs     =   123,517                  Population size   =  8,216,935
                                                    Subpop. no. obs   =     31,993
                                                    Subpop. size      =  1,828,643
                                                    Design df         =    122,468
                                                    F(   5, 122464)   =      15.01
                                                    Prob > F          =     0.0000
    
    ---------------------------------------------------------------------------------------------------------
                                            |             Linearized
                                      fin11 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------------------------------+----------------------------------------------------------------
                                  n_outcome |  -.4922472   .1218816    -4.04   0.000    -.7311332   -.2533613
                                            |
                                         k9 |
    State-owned banks or government agency  |   .4004102   .1653038     2.42   0.015     .0764176    .7244028
           Non-bank financial institutions  |  -.0593137   .2005896    -0.30   0.767    -.4524659    .3338386
                                     Other  |  -.6965148   .3219102    -2.16   0.030    -1.327453   -.0655761
                                            |
                                lnemployees |   .2936962   .0494761     5.94   0.000      .196724    .3906685
                                      _cons |  -.1388981   .1671363    -0.83   0.406    -.4664825    .1886863
    ---------------------------------------------------------------------------------------------------------
    Note: 196 strata omitted because they contain no subpopulation members.
    Note: Variance scaled to handle strata with a single sampling unit.
    
    . margins, dydx(*)
    
    Average marginal effects                        Number of obs     =     42,890
    Model VCE    : Linearized
    
    Expression   : Pr(fin11), predict()
    dy/dx w.r.t. : n_outcome 2.k9 3.k9 4.k9 lnemployees
    
    ---------------------------------------------------------------------------------------------------------
                                            |            Delta-method
                                            |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------------------------------+----------------------------------------------------------------
                                  n_outcome |  -.1081822    .027127    -3.99   0.000    -.1613506   -.0550138
                                            |
                                         k9 |
    State-owned banks or government agency  |   .0866893   .0347437     2.50   0.013     .0185923    .1547864
           Non-bank financial institutions  |  -.0135546   .0460373    -0.29   0.768     -.103787    .0766778
                                     Other  |  -.1634197   .0752488    -2.17   0.030    -.3109061   -.0159332
                                            |
                                lnemployees |   .0645462   .0107651     6.00   0.000     .0434468    .0856456
    ---------------------------------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.
    
    .
    However, I am struggling to understand which observations Stata is using, especially when I compute the margins. I do not understand where those 42,890 observations come from.

    I hope you can help me.
    Thank you in advance!

    Last edited by Ibai Ostolozaga Falcon; 22 Jun 2023, 02:30.

  • #2
    PS: My main doubt concerns the number of observations reported after running "margins, dydx(*)" and the number of observations in the logit regression. Where do they come from?
    Last edited by Ibai Ostolozaga Falcon; 22 Jun 2023, 06:35.



    • #3
      I would really appreciate any insight or explanation.



      • #4
        Consider specifying option subpop() on margins to match that of your original estimation.
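
        A minimal sketch of that suggestion, assuming the svyset and svy: logit commands from post #1 have just been run and that the variable names are those shown there. The count commands are an optional, illustrative check (not part of the suggestion itself) that may help show which set of observations each reported count refers to.

        ```stata
        * Sketch only: reuses the variable names from post #1 and assumes the
        * svy: logit model shown there is the active estimation result.

        * Repeat the estimation subpopulation on margins so that both
        * commands describe the same firms.
        margins, subpop(loan if loan_duration<=4) dydx(*)

        * Optional diagnostics: compare these counts with the "Number of obs"
        * and "Subpop. no. obs" lines in the output.
        count if e(sample)
        count if e(sample) & !missing(fin11, n_outcome, k9, lnemployees)
        count if e(sample) & loan == 1 & loan_duration <= 4
        ```

        Keeping the subpop() specification identical in both commands avoids having margins average over firms that were retained only for variance estimation.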
