
  • Problem with observations quantity

    Dear Forum Users,

    I am estimating a binary logistic regression on around 16 million observations (panel data). My model analyzes investors' decisions to invest. I use the logit command with fixed effects. My main fixed effects are the IDs of the investors and of the firms they invest in. There are more than 7k investor IDs and 4k firm IDs. Running the model on the whole dataset is very time-consuming, so I draw a random subsample of 100k observations to test the model. Each run takes around 12-24 hours, depending on the number of independent variables and additional fixed effects. When I run a simple model without fixed effects, the number of observations is around 94k. However, when I run the same model with firm ID fixed effects, the number of observations drops to 10k. There are no missing values in the ID dummies. Do you have any idea why so few observations remain?

    Kind regards,
    Firangiz

  • #2
    Possibly, the outcome does not vary for a large number of investors or firms in the dataset. A group whose outcome is all 0s or all 1s is perfectly predicted by its own fixed effect, so its observations are dropped from the estimation. How large is your \(T\) dimension? If you have just a few observations per investor, you need to estimate a conditional FE logit model because of the incidental parameters problem. But that may itself be a problem given the size of your dataset. The example below illustrates the issue of time-invariant outcomes for some units.

    Code:
    webuse union, clear
    xtset idcode year
    xtlogit union age grade i.not_smsa south##c.year, fe
    Results:

    Code:
    . xtlogit union age grade i.not_smsa south##c.year, fe
    note: multiple positive outcomes within groups encountered.
    note: 2,744 groups (14,165 obs) dropped because of all positive or
          all negative outcomes.
    
    Iteration 0:   log likelihood = -4516.5881  
    Iteration 1:   log likelihood = -4510.8906  
    Iteration 2:   log likelihood =  -4510.888  
    Iteration 3:   log likelihood =  -4510.888  
    
    Conditional fixed-effects logistic regression   Number of obs     =     12,035
    Group variable: idcode                          Number of groups  =      1,690
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        7.1
                                                                  max =         12
    
                                                    LR chi2(6)        =      78.60
    Log likelihood  =  -4510.888                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
           union | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |   .0710973   .0960536     0.74   0.459    -.1171643    .2593589
           grade |   .0816111   .0419074     1.95   0.051    -.0005259     .163748
      1.not_smsa |   .0224809   .1131786     0.20   0.843     -.199345    .2443069
         1.south |  -2.856488   .6765694    -4.22   0.000    -4.182539   -1.530436
            year |  -.0636853   .0967747    -0.66   0.510    -.2533602    .1259896
                 |
    south#c.year |
              1  |   .0264136   .0083216     3.17   0.002     .0101036    .0427235
    ------------------------------------------------------------------------------
    
    .
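
    As a rough check before estimating, the groups with no outcome variation can also be counted directly. The sketch below uses the same union data. The variable names min_union, max_union, no_variation and one_per_group are made up for illustration, and it assumes that union has no missing values and that no observations are lost to missing covariates.

    ```stata
    webuse union, clear

    * Smallest and largest outcome value within each person
    bysort idcode: egen byte min_union = min(union)
    bysort idcode: egen byte max_union = max(union)

    * A group has no variation if its outcome is all 0s or all 1s
    generate byte no_variation = (min_union == max_union)

    * Mark exactly one observation per group so that groups can be counted
    egen byte one_per_group = tag(idcode)

    * Number of groups that a fixed-effects logit would drop
    count if one_per_group & no_variation

    * Number of observations in those groups
    count if no_variation
    ```

    Under those assumptions, the two counts should be comparable to the "groups ... dropped" note printed by xtlogit. Running the same check on a random subsample, with the firm ID in place of idcode, would show how many firms are left with only 0s or only 1s after sampling.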



    • #3
      You are right, Andrew. Thank you for your answer. I will need to increase the number of observations in my sample to get proper results.
