  • Survival Analysis, competing risk regression or multinomial logit/probit?

    Hi all,
    I am Francesco Chirico. I am not a Stata expert, and I greatly appreciate the time you devote to helping other people.
    I have a problem with some analyses I am running, because I am not sure which analysis I need.
    Here are some details about what I am doing. My dataset is of firms whose major owner is male or female. I consider all the firms in existence in 2004 and follow them until 2008. I have a dummy variable for gender (which does not vary over time) and a variable named “outcome” equal to: 1) continuation; 2) merger; 3) sale; 4) dissolution of the company. Each observation is linked to the "id" of the firm and the "year" in which the exit happened or the firm continued.
    Here are the hypotheses I am trying to test:
    Hypothesis 1: Firms with a female owner are less likely to exit the business (by dissolution, sale or merger) than firms with a male owner. In other words, firms with a female owner are more likely to continue.
    Hypothesis 2: Firms with a female owner are more likely than firms with a male owner to exit by merger than to exit by dissolution or sale.
    Hypothesis 3: Firms with a female owner are less likely than firms with a male owner to exit by sale than to exit by dissolution.

    Thus, based on H2 and H3:
    Hypothesis 4a. When the exit decision is made, firms with a female owner are more likely than firms with a male owner to follow a descending ranked order of exit preferences: merger, dissolution and sale.
    Hypothesis 4b. When the exit decision is made, firms with a male owner are more likely than firms with a female owner to follow a descending ranked order of exit preferences: sale, dissolution and merger.

    My view is that I can easily test H1 with competing-risks regression (stcrreg in Stata) in survival analysis. But I am not sure it is correct to test H2 and H3 with survival analysis (and, if so, how, given that continuation is not part of H2 and H3). H4a and H4b are a simple consequence of H2 and H3, but I am not sure whether there is a way to test them empirically.
    The exit types (dissolution, sale and merger) are definitely competing events.

    Initially I used multinomial logit, but I had problems because of the IIA issue, so I need to use something else. I am not sure whether survival analysis (specifically competing-risks regression or a stratified Cox model) is an option, or whether I need to use multinomial probit, although I admit I don't know how to do it. From what I have read, nested logit does not seem the best choice for me (I don't have a nested structure), but I may be wrong.

    Many many thanks and again many thanks for your help.

    Best
    Francesco Chirico

  • #2
    My first reaction to your post was that there are bigger issues to consider. Your outcome variable is an event (dissolution, sold, merged; relative to continuing 'as is') and you want to model the chances of this event (or a particular event type) as a function of covariates and time at risk of experiencing the event. But time at risk should presumably be measured from the year in which the firm was founded, whereas your sample first observes firms in 2004. So, either (1) you know the year in which the firm was founded, so you can model time to event taking account of the 'left truncation' in your spell data (you have a sample from the stock of firms in 2004). Or (2) you don't know when the firm was founded, in which case your data are left censored. This raises many problems for analysis -- most analysts either drop the left-censored cases, or assume that hazard rates do not depend on elapsed time at risk. So, you need to resolve which case you have.

    Next, note that you have annual data, so I suspect that application of continuous-time survival analysis methods -- and I include stcrreg and stratified Cox models under this heading -- is not the preferred way to go. You should take account of the interval-censored (discrete time) nature of your data. Related to this, I think that your remarks about using multinomial probit rather than multinomial logit are better interpreted as a question about whether you should allow for correlated unobservables in the event-specific hazard rates. (stcrreg or stratified Cox won't help you there.)

    For an introductory (and free) discussion of these issues, you might like to browse the materials at "Survival Analysis Using Stata": http://www.iser.essex.ac.uk/survival-analysis . For a discussion of how to fit a multinomial logit with correlated unobservables in Stata, see the paper in the Stata Journal 2006, 6(2), by Haan and Uhlendorff (free download). In any case, I think that thinking about basic data issues should be your first priority.



    • #3
      Stephen, first of all many thanks for your great help. I delayed answering because I was reading some of the material you suggested. Yes, I don't know the year of foundation, so my data are left censored and I cannot do much about this. I am still working on your suggestions (discrete time, the paper by Haan and Uhlendorff, etc.) and hope I will be able to run my analyses properly. Still, in case I use stcrreg, do you think these commands will work?
      Please note: in my dataset the variable "outcome" is equal to 1) continuation; 2) merger; 3) sale; 4) dissolution

      For H1:
      *Merger vs continuation - competing events sale and dissolution*
      stset time, id(id) failure(outcome==2)
      stcrreg control variables gender_dummy, compete (outcome== 3 4)

      *sale vs continuation - competing events merger and dissolution*
      stset time, id(id) failure(outcome==3)
      stcrreg control variables gender_dummy, compete (outcome== 2 4)

      *dissolution vs continuation - competing events merger and sale*
      stset time, id(id) failure(outcome==4)
      stcrreg control variables gender_dummy, compete (outcome== 2 3)

      Sorry, I then did not understand how to test H2 and H3. Because of my limited knowledge of statistical procedures, I also did not understand what you mean when you say "The preference structure for the reference type is in the constants, and the preference structure of the “other” type is in the sums of the constant and the dummy effect."

      For H2:
      *merger vs dissolution - competing events continuation and sale*
      stset time, id(id) failure(outcome==2)
      stcrreg control variables gender_dummy, compete (outcome== 1 3)
      *merger vs sale - competing events continuation and dissolution*
      stset time, id(id) failure(outcome==2)
      stcrreg control variables gender_dummy, compete (outcome== 1 4)
      For H3:
      *sale vs dissolution - competing events continuation and merger*
      stset time, id(id) failure(outcome==3)
      stcrreg control variables gender_dummy, compete (outcome== 1 2)

      OR maybe I should get rid of the continuation option and focus only on the three exit strategies:
      For H2
      *merger vs dissolution - competing events sale*
      stset time, id(id) failure(outcome==2)
      stcrreg control variables gender_dummy if outcome!=1, compete (outcome== 3)
      *merger vs sale - competing events dissolution*
      stset time, id(id) failure(outcome==2)
      stcrreg control variables gender_dummy if outcome!=1, compete (outcome== 4)

      For H3
      *sale vs dissolution - competing events merger*
      stset time, id(id) failure(outcome==3)
      stcrreg control variables gender_dummy if outcome!=1, compete (outcome== 2)


      Sorry to bother you so much.

      Best

      Francesco



      • #4
        Hi Stephen,
        Do you think this approach may work? I am reading a lot:

        Given that I have three exit options (merger, sale, dissolution), I am considering them as competing events theoretically and empirically, and I use the stcrreg command for the analyses:
        PS: "outcome" is equal to: 1) continuation; 2) merger; 3) sale; 4) dissolution of the company
        *Merger vs continuation - competing events sale and dissolution*
        stset time, id(id) failure(outcome==2)
        stcrreg control variables male_dummy, compete (outcome== 3 4)
        stcrreg, noshr

        *sale vs continuation - competing events merger and dissolution*
        stset time, id(id) failure(outcome==3)
        stcrreg control variables male_dummy, compete (outcome== 2 4)
        stcrreg, noshr

        *dissolution vs continuation - competing events merger and sale*
        stset time, id(id) failure(outcome==4)
        stcrreg control variables male_dummy, compete (outcome== 2 3)
        stcrreg, noshr

        Basically, looking at the results, I found that male owners do more mergers (SHR: 1.14; coef: .13), more dissolutions (SHR: 1.46; coef: .38) and more sales (SHR: 1.73; coef: .55) compared to female owners.
        MY PROBLEM: Judging by the coefficients, male owners are more likely to engage in exit strategies; and apparently (I know this is still speculation) if they exit, their first option will be sale (the highest coefficient), then dissolution (the middle coefficient), and finally merger (the lowest coefficient).
        Now, how can I compare these coefficients or SHRs to test this? I suppose I first need to save the results with “estimates store” for each regression I run, but I do not know how to carry out the comparison of the coefficients. Many thanks in advance.
        Best
        Francesco



        • #5
          Sorry, but I don't have time to look through your (very detailed) posts -- marking of exam papers and other commitments take priority. (By the way, your posts would be more legible if you enclosed your code fragments within CODE delimiters. Switch the editor on using the "A" button in the bar above where you compose your posts; and also have a look at the Forum FAQ about this.) Be aware that -stcrreg- estimates different concepts than do the other approaches I cited (read up on competing risks -- the Stata Manual is a good place to start). And if you have left-censored data, I don't know how to interpret your results in any case. I'd suggest walking before running: fit a simple multinomial regression without duration dependence, as described in my website Lessons.
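
A minimal sketch of what such a "simple multinomial regression without duration dependence" might look like. It assumes a firm-year dataset is already in memory with the variables named in the thread (id, outcome coded 1-4, male_dummy); the value-label name outlbl is hypothetical, and the poster's control variables would be added to the covariate list. It does nothing about the IIA concern or the left censoring raised earlier.

```stata
* Assumption: firm-year data in memory with id, outcome (1-4) and male_dummy.
* outlbl is a hypothetical label name, used so that equations get readable names.
label define outlbl 1 "continue" 2 "merger" 3 "sale" 4 "dissolution", replace
label values outcome outlbl

* Pooled multinomial logit with continuation as the base outcome.
* Standard errors are clustered on firm because each firm contributes several rows.
mlogit outcome i.male_dummy, baseoutcome(1) vce(cluster id) rrr

* Wald tests of whether the male effect differs between exit routes
test [merger]1.male_dummy = [dissolution]1.male_dummy
test [sale]1.male_dummy   = [dissolution]1.male_dummy
```

The two test commands are one way to compare the gender effect across exit types within a single fitted model, rather than across separately fitted regressions.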



          • #6
            You wrote privately about this. I re-emphasize here that your data require discrete/grouped hazards methods. As you are not interested in cumulative incidence, you need only the grouped hazard models covered in Lesson 6 of Stephen's web page. With these you model each outcome one at a time, treating the others as censored.
            Last edited by Steve Samuels; 30 Jun 2014, 08:10.
            Steve Samuels
            Statistical Consulting

            Stata 14.2
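
A hedged sketch of the "one outcome at a time" idea described in this post. It assumes one row per firm-year at risk, with the thread's variables id, year, outcome and male_dummy in memory; the indicator names and the stored-estimate names are made up for illustration. A firm that leaves by a different route simply contributes zeros until it drops out of the data, which is how it ends up treated as censored.

```stata
* Assumption: one row per firm-year at risk; outcome coded 1-4 as in the thread.
generate byte merger      = (outcome == 2) if !missing(outcome)
generate byte sale        = (outcome == 3) if !missing(outcome)
generate byte dissolution = (outcome == 4) if !missing(outcome)

* One grouped-time (complementary log-log) hazard model per exit type.
* With left-censored spells the year dummies capture calendar time only,
* not elapsed time since founding.
foreach ev in merger sale dissolution {
    cloglog `ev' i.male_dummy i.year, eform
    estimates store m_`ev'
}
```

Control variables would be added after i.male_dummy in each model.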



            • #7
              An older but substantially cited article on competing risks with discrete time data is:

              Hill, Daniel H., William Axinn, and Arland Thornton. 1993. "Competing Hazards with Shared Unmeasured Risk Factors." Sociological Methodology, 23: 245-77.

              They describe a method for analyzing the hazard of exit of persons from the "unmarried/single" base state, into either "marriage" or "cohabitation." Their approach involves a two-stage model, with (if I recall correctly) a discrete time hazard model for *any* exit from the base state, coupled with a binary logit model for which kind of exit occurred, conditional on the occurrence of an exit. When I looked at this a few years ago, my impression was that someone more knowledgeable than me could implement this in Stata, without too much heavy lifting. The idea is similar but not identical to a nested logit model for discrete choice.
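
A rough sketch of the two-stage structure as described in this post, not the authors' estimator: it leaves out the shared unmeasured risk factors that are the point of the article, and it uses a multinomial rather than binary logit in the second stage because there are three exit types here (my assumption). It again assumes firm-year data in memory with outcome (1-4), year and male_dummy; anyexit is a hypothetical variable name.

```stata
* Stage 1: discrete-time hazard of *any* exit from the base state (continuation)
generate byte anyexit = (outcome != 1) if !missing(outcome)
logit anyexit i.male_dummy i.year, or

* Stage 2: which kind of exit, conditional on an exit having occurred.
* With dissolution (4) as base, this gives merger vs dissolution and sale vs dissolution.
mlogit outcome i.male_dummy if anyexit == 1, baseoutcome(4) rrr

* Refit with sale (3) as base to read off merger vs sale directly
mlogit outcome i.male_dummy if anyexit == 1, baseoutcome(3) rrr
```

Stage 1 speaks to a hypothesis about exiting at all, and stage 2 to hypotheses about the route taken once a firm exits.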





              • #8
                Thanks a lot to all of you. Mike, could you please share this article? Unfortunately I cannot download it from my university library (we don't have this journal). Many thanks. My email: [email protected]



                • #9
                  Francesco--In this public electronic space, you've asked me to violate copyright law and the contract that journal has with my library. I appreciate your problem, and understand that you meant no wrong, but that's not something I can help you with. You might contact one of the authors directly and see if they have some legal way to help you.



                  • #10
                    You are right Mike, sorry for my inexperience



                    • #11
                      Thanks for the reference, Mike. I'm not sure (literally) that the Hill et al. approach has taken off. My guess is that the nested logit approach is not seen by most as the way to go. Instead, one can incorporate unobserved heterogeneity in each of the destination-specific hazards directly, and allow it to be correlated across exit types, and get around the IIA problem cited by Hill et al. that way. I refer again to the Haan and Uhlendorff paper (with Stata code) that I mentioned before (post #2), downloadable for free from http://www.stata-journal.com/article...article=st0104, as showing how one might proceed.



                      • #12
                        Stephen--Thanks, your response is exactly what I hoped might happen. I had wanted to draw attention to the Hill et al. article in hopes that someone might have a better idea. I attempted to use that approach several years ago to analyze the hazard of degree completion among college students, where none of the competing states (still in school, graduate, drop out) form a meaningful nest. If the Haan and Uhlendorff approach works for that situation, it would solve a lingering problem for me.



                        • #13
                          Hi again,
                          Following your advice, I am using the code from Hole's paper (enclosed; estimating mixed logit models using maximum simulated likelihood) to apply the method of Haan and Uhlendorff (2006) for mlogit with unobserved heterogeneity. Unfortunately, and sorry for my ignorance, I am not able to understand how Hole arrives at the following matrix (pages 11 and 12), which is built from the coefficients obtained after estimating a model with random but uncorrelated intercepts:
                          matrix b = e(b)
                          matrix b = b[1,1..5], 0, b[1,6]
                          How does the author derive these values for the matrix? This is very unclear to me. I am using a different dataset, and I guess that if I understand how these values are derived in the example in the article, I will be able to adapt them to my own case. With my data I get, for instance, the following, but I don't understand what it means:
                          initial vector: matrix must be dimension 12
                          r(503);
                          Many many many thanks
                          Francesco


                          • #14
                            I solved the problem: one just needs to omit the from() option, and the starting values are then generated automatically.
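
For anyone who does want to supply starting values, here is a small sketch of what the two matrix lines quoted in #13 do mechanically. The numbers in b0 are made up and stand in for e(b) from the uncorrelated model. My reading, which the thread does not confirm, is that the inserted 0 is a starting value for the one extra parameter that the correlated model adds. The error message itself says that the vector passed as initial values must have exactly as many elements as the new model has parameters (12 in the poster's case).

```stata
* b0 is a made-up 1 x 6 row vector standing in for e(b) from the uncorrelated model
matrix b0 = (0.5, -0.2, 0.1, 0.3, 1.1, 0.8)

* Keep elements 1-5, insert a 0 as the sixth element, then append the old sixth element
matrix b = b0[1,1..5], 0, b0[1,6]

* The new vector has one more element than the old one
display colsof(b0)
display colsof(b)
matrix list b
```

Checking colsof() of the starting vector against the dimension named in the error message is a quick way to see how many elements still need to be added.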



                            • #15
                              Hi all again, I cannot thank you enough for all your input. I submitted the paper, following your advice on the method. Although the editor and reviewers appreciate the method I used, they now ask me to use coarsened exact matching: http://www.google.se/url?sa=t&rct=j&...bn6HtT98R3u8nw

                              Best
                              Francesco
