  • Logit with fixed effects taking forever

    Hi all,

    I am trying to estimate the simple regression below:
    Code:
    logit female age i.office_id#i.year#i.d25, cluster(employee_id)
    The outcome is a binary variable indicating whether the company's client is female. The independent variable is the age of the employee of the company who is assigned to work with the client. For fixed effects, I include an interaction of the office the employee works at, the year, and d25, a binary variable (=1 if the client's age is above 25).
    I have also clustered at the employee level.

    Now I have two questions:
    1. Is it the correct way to include my fixed effects in a logit regression?
    2. Why is it taking so long to run?

    I have about 1,400,000 clients, 3,362 unique employees, 190 offices (unique office ids), and 12 years.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(female employee_id office_id year age age_client d25)
    1 111 40 2002 24 45 1
    0 111 40 2002 24 29 1
    1 112 41 2002 36 32 1
    1 112 41 2003 37 23 0
    1 112 41 2004 38 22 0
    0 112 41 2004 38 23 0
    0 113 41 2002 40 40 1
    1 114 42 2006 20 37 1
    0 114 42 2007 21 36 1
    1 114 42 2007 21 19 0
    0 115 43 2006 42 26 1
    0 115 43 2006 42 29 1
    1 116 41 2006 23 34 1
    1 116 41 2007 24 42 1
    end

  • #2
    Hi Neg
    In general, maximum likelihood/nonlinear models take longer to converge because there is no closed-form solution to the problem; instead, Stata has to run an iterative process.
    And the more parameters you have, the more iterations Stata needs to find the solution. In your case, you have 190*12*2 = 4,560 fixed-effect parameters, which is a lot. On top of that, you have over a million observations. It is no surprise that it takes a long time to give you results.
    Something else you may want to check, which may also explain why your estimation takes longer than expected: make sure you have variation within each possible subgroup.
    In other words, if you create a new variable that combines office x year x age dummy, answer the following:
    - Do you still see both female and non-female clients across all subgroups?
    - Is the proportion roughly balanced? In other words, you should not have groups with, say, 1 woman and 99 men; those will also be problematic.
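
    A quick way to run that check in Stata (a sketch; the variable names cell and pfemale are mine, the rest are from your dataset):
    Code:
    * one group per office x year x d25 cell
    egen cell = group(office_id year d25)
    * share of female clients within each cell
    bysort cell: egen pfemale = mean(female)
    * cells where the outcome never varies (all 0 or all 1)
    count if inlist(pfemale, 0, 1)
    tab d25 if inlist(pfemale, 0, 1)
    Cells where pfemale is exactly 0 or 1 have no outcome variation, and their fixed effects are not identified in the logit.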

    A simple way of dealing with this: estimate the model using OLS, and pay attention to groups with very high t-statistics (low standard errors).

    HTH
    F

    Comment


    • #3
      Originally posted by FernandoRios View Post
      Thank you Fernando!
      You are right; for some of the subgroups, I do not have enough variation. Do you recommend dropping those and running OLS?
      By the way, what constitutes "enough variation"? Is there a threshold or rule of thumb?

      Kind regards,
      Negar

      Comment


      • #4
        As in most cases, it's an empirical question.
        For example, what is the purpose of the regression?
        For most cases OLS may work just fine.

        Alternatively, you could drop groups for which the fixed effects are too significant in the OLS model, say a t-statistic above 10.
        If you look at the output you will notice them.

        Then try again using logit.

        HTH

        Comment


        • #5
          Originally posted by FernandoRios View Post
          Thanks! I am using the reghdfe command to run the FE model. I know how to save the coefficients of the absorbed fixed effects, but I do not know how to save their p-values or t-statistics so I can drop those with high values.
          Do you know how?
          Code:
          eststo: reghdfe female age, absorb(i.office_id#i.year#i.d25,savefe) cluster(employee_id)

          Comment


          • #6
            Right, you can't.
            Just like with logit, you need to explicitly estimate the dummies (the dummy-inclusion approach).
            It will take a long time (a huge matrix inversion), but it will still be less time than doing the same with a logit model.
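
            A sketch of the dummy-inclusion approach with the thread's own variables (regress in place of reghdfe, so the interaction dummies are estimated explicitly):
            Code:
            * estimate the fixed effects as explicit dummies (slow, but feasible)
            regress female age i.office_id#i.year#i.d25, vce(cluster employee_id)
            * coefficients and the variance matrix are then available
            matrix b = e(b)
            matrix V = e(V)
            * standard errors are the square roots of the diagonal of e(V)
            With the dummies in e(b) and e(V), you can identify and drop the problematic cells before rerunning the logit.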
            F

            Comment


            • #7
              Originally posted by FernandoRios View Post
              Thanks Fernando. I have been struggling with this a bit. Do you know how I can save the standard errors of the dummies after I estimate them? Maybe there is a way to save them in a matrix or in a variable so I can drop the large values afterwards?

              Comment


              • #8
                Look into r(table); all results from the regression estimates are stored there.
                Otherwise, you can just use estout or outreg to export them to Excel/Word and work from there.
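
                For example, after the explicit-dummy regression (a sketch; row "t" is where Stata's r(table) stores the t-statistics):
                Code:
                regress female age i.office_id#i.year#i.d25, vce(cluster employee_id)
                matrix T = r(table)
                * pull out the row of t-statistics ("se" would give standard errors)
                matrix t = T["t", 1...]
                matrix list t
                From there you can scan the t row for the extreme values Fernando mentioned and drop the corresponding cells.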

                Comment
