  • dropped data in xtlogit,fe

    Hi all,
    In the xtlogit regression below, 17,667 groups have been dropped.
    (All variables in this regression are dummy variables.)
    1. Why have these data been dropped?
    2. What features do these data have?
    3. Is it possible that selection bias has occurred because most of the data were dropped?


    Code:
    . xtlogit underline L.q L.underline,fe
    note: multiple positive outcomes within groups encountered.
    note: 17,667 groups (35,334 obs) dropped because of all positive or
          all negative outcomes.
    
    Iteration 0:   log likelihood = -1074.9136  
    Iteration 1:   log likelihood = -939.89854  
    Iteration 2:   log likelihood = -937.31563  
    Iteration 3:   log likelihood = -937.09315  
    Iteration 4:   log likelihood = -937.09076  
    Iteration 5:   log likelihood = -937.09028  
    Iteration 6:   log likelihood = -937.09019  
    Iteration 7:   log likelihood = -937.09019  
    
    Conditional fixed-effects logistic regression   Number of obs     =      5,014
    Group variable: Address                         Number of groups  =      2,507
    
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.0
                                                                  max =          2
    
                                                    LR chi2(2)        =    1601.26
    Log likelihood  = -937.09019                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
       underline |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               q |
             L1. |   .0356927   .1197559     0.30   0.766    -.1990246      .27041
                 |
       underline |
             L1. |  -20.42288   798.6675    -0.03   0.980    -1585.782    1544.937
    ------------------------------------------------------------------------------

  • #2
    Marzieh:
    Stata reports the reason why 17,667 groups (35,334 obs) were dropped: the outcome does not vary within those groups (all positive or all negative). The conditional fixed-effects estimator relies on within-group variation: a group whose outcome never changes contributes nothing to the conditional likelihood, just as any time-invariant regressor is wiped out.
    Sample selection bias is always possible. Unfortunately, only you can say something about that, as we know neither your data nor the way they were collected.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
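
    A brief sketch of why such groups carry no information, using the standard conditional likelihood for fixed-effects logit. The notation (y_it, x_it, beta, k_i, T_i, S_i) is introduced here for illustration and does not appear in the thread. Conditioning on the number of positive outcomes k_i in group i removes the fixed effect:

    ```latex
    \Pr\Bigl(y_{i1},\dots,y_{iT_i} \Bigm| \sum_{t} y_{it} = k_i\Bigr)
      = \frac{\exp\bigl(\sum_{t} y_{it}\, x_{it}'\beta\bigr)}
             {\sum_{d \in S_i} \exp\bigl(\sum_{t} d_t\, x_{it}'\beta\bigr)},
    \qquad
    S_i = \Bigl\{ d \in \{0,1\}^{T_i} : \sum_{t} d_t = k_i \Bigr\}
    ```

    When k_i = 0 or k_i = T_i, the set S_i contains only the observed sequence, so the ratio equals 1 for every value of beta and the group adds nothing to the log likelihood. Note that T_i counts only the observations that actually enter the estimation sample.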


    • #3
      Thanks for your advice, Mr. Lazzaro.

      How do I check for sample selection bias in this regression?



      • #4
        Marzieh:
        do the -panelid- that Stata did not omit differ from the omitted ones in some relevant respect (whatever that may mean for your data), other than the time-invariant outcome?
        Kind regards,
        Carlo
        (Stata 18.0 SE)
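
        One way to make this comparison is sketched below. It assumes the data are already xtset and reuses the model from #1 so that e(sample) is available; the variable names used and kept are hypothetical, and q merely stands in for whatever characteristic is relevant.

        ```stata
        * Run the model first so that e(sample) marks the observations actually used
        xtlogit underline L.q L.underline, fe

        * used = 1 for observations in the estimation sample, 0 otherwise
        generate byte used = e(sample)

        * kept = 1 for every observation of a panel that contributed to the fit
        bysort Address: egen byte kept = max(used)

        * Compare retained and omitted panels, here on q as an example
        tabulate kept
        tabstat q, by(kept) statistics(mean sd n)
        ```

        Panels with kept equal to 0 are the omitted ones; any other variable in the dataset can replace q in the last line.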


        • #5
          Do you mean that I have to compare, in various respects, the data dropped from this regression with the data that were not dropped?


          • #6
            Another question: I manually deleted the -panelid- for which -underline- was constant across all time periods; 16,206 -panelid- were removed.
            (-underline- is the dependent variable.)
            A further 17,667 - 16,206 = 1,461 -panelid- are still dropped when xtlogit is run,
            but I do not know how to remove them manually. How do I identify them?


            • #7
              Marzieh:
              #5) yes (however, I would replace "dropped" with "omitted", as Stata does not actually drop any observation from the dataset);
              #6) can you please provide an excerpt of your 1461 observations via CODE delimiters? Thanks.
              Kind regards,
              Carlo
              (Stata 18.0 SE)


              • #8
                Please take a look at these results.
                I do not know how to remove the remaining 1,461 -panelid-.
                Do you know how to identify them?

                (For more explanation:
                the -panelid- is the same as the Address variable.
                Each -panelid- is observed for exactly three periods.)

                Code:
                . bys Address: egen uu=sum(underline)
                
                . drop if uu==3 | uu==0
                (48,618 observations deleted)
                
                . xtlogit underline L.q L.underline ,fe
                note: multiple positive outcomes within groups encountered.
                note: 1,461 groups (2,922 obs) dropped because of all positive or
                      all negative outcomes.
                
                Iteration 0:   log likelihood = -1074.9136  
                Iteration 1:   log likelihood = -939.89854  
                Iteration 2:   log likelihood = -937.31563  
                Iteration 3:   log likelihood = -937.09315  
                Iteration 4:   log likelihood = -937.09076  
                Iteration 5:   log likelihood = -937.09028  
                Iteration 6:   log likelihood = -937.09019  
                Iteration 7:   log likelihood = -937.09019  
                
                Conditional fixed-effects logistic regression   Number of obs     =      5,014
                Group variable: Address                         Number of groups  =      2,507
                
                                                                Obs per group:
                                                                              min =          2
                                                                              avg =        2.0
                                                                              max =          2
                
                                                                LR chi2(2)        =    1601.26
                Log likelihood  = -937.09019                    Prob > chi2       =     0.0000
                
                ------------------------------------------------------------------------------
                   underline |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                           q |
                         L1. |   .0356927   .1197559     0.30   0.766    -.1990246      .27041
                             |
                   underline |
                         L1. |  -20.42288   798.6675    -0.03   0.980    -1585.782    1544.937
                ------------------------------------------------------------------------------
                Last edited by Marzieh Goodarzi; 10 Jan 2019, 11:17.


                • #9
                  Marzieh:
                  what if you type:
                  Code:
                  bys Address: egen uu=total(underline)
                  tab uu
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)


                  • #10
                    Code:
                    . bys Address: egen uu=total(underline)
                    
                    . 
                    . tab uu
                    
                             uu |      Freq.     Percent        Cum.
                    ------------+-----------------------------------
                              0 |     45,075       74.48       74.48
                              1 |      7,653       12.64       87.12
                              2 |      4,251        7.02       94.15
                              3 |      3,543        5.85      100.00
                    ------------+-----------------------------------
                          Total |     60,522      100.00
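
                    The frequencies above count observations; converting them to panels (three observations each, as stated in #8) helps reconcile them with the xtlogit notes. A small worked calculation, using only numbers reported in this thread:

                    ```stata
                    * Each panel has three observations, so divide the frequencies by 3
                    display 45075/3 + 3543/3     // panels with uu==0 or uu==3 (no variation at all)
                    display 7653/3 + 4251/3      // panels with uu==1 or uu==2 (some variation)
                    display 60522/3              // all panels

                    * Panels with some variation minus the 2,507 groups retained by xtlogit in #1
                    display (7653 + 4251)/3 - 2507
                    ```

                    This gives 16,206 panels with a constant outcome across all three periods, 3,968 panels with some variation, and 20,174 panels in total. Since the model in #1 retained 2,507 groups, 3,968 - 2,507 = 1,461 panels show variation over the three periods and are still omitted.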


                    • #11
                      Marzieh:
                      you may want to try:
                      Code:
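                      * flag whole panels whose outcome never changes (compare pairwise: a==b==c is parsed as (a==b)==c)
                      bysort panel_id: gen flag=1 if underline[1]==underline[2] & _N==2
                      bysort panel_id: replace flag=1 if underline[1]==underline[2] & underline[2]==underline[3] & _N==3
                      drop if flag==1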
                      I recommend running the code above on a copy of your dataset, so that the original data are still on hand if anything goes wrong along the way.
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)


                      • #12
                        Code:
                        . bysort Address: gen flag=1 if underline[1]==underline[2] & _n==2
                        (42,964 missing values generated)
                        
                        . drop if flag==1
                        (17,558 observations deleted)
                        
                        . drop flag
                        
                        . bysort Address: gen flag=1 if underline[1]==underline[2]==underline[3] & _n==3
                        (41,277 missing values generated)
                        
                        . drop if flag==1
                        (1,687 observations deleted)
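
                        Two details of Stata syntax may explain these counts; this is a hedged reading of the log, not a tested diagnosis. Within a by-group, _n is the running observation number while _N is the group size, so a condition on _n==2 flags (and then drops) only the second observation of a panel rather than the whole panel. In addition, a==b==c is evaluated left to right as (a==b)==c. A minimal sketch with made-up values (the variables id, t, y, one_obs and whole_grp are hypothetical; note that clear discards the data in memory, so run this in a separate session):

                        ```stata
                        * Relational operators are evaluated left to right: (0==0)==0 becomes 1==0
                        display 0==0==0
                        * Comparing pairwise and combining with & gives the intended test
                        display (0==0) & (0==0)

                        * Toy data: one hypothetical panel with three periods and a constant outcome
                        clear
                        input id t y
                        1 1 0
                        1 2 0
                        1 3 0
                        end

                        * _n==2 is true for a single observation; _N==3 is true for all three
                        bysort id (t): generate byte one_obs   = (_n == 2)
                        bysort id (t): generate byte whole_grp = (_N == 3) & y[1]==y[2] & y[2]==y[3]
                        list, sepby(id)
                        ```

                        Dropping on a flag built like one_obs removes single periods and leaves gaps in the panels, whereas a flag built like whole_grp removes complete panels.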


                        • #13
                          Marzieh:
                          did the suggested code solve your problem?
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)


                          • #14
                            No, it is not resolved.
                            This code does not seem to correctly remove the -panelid- that have no variation.
                            After deleting these data with the suggested code, xtlogit does not converge at all.

                            So far we know why 45,075 + 3,543 = 48,618 observations, i.e. 48,618 / 3 = 16,206 -panelid-, have been omitted. We cannot yet account for the remaining 1,461 -panelid-.


                            It is possible that the omission of these 1,461 -panelid- is also related to the explanatory variables.
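
                            A possible explanation, offered as a hypothesis to check rather than a confirmed answer: the model uses L.q and L.underline, so the first period of each panel has missing lags and is excluded, which agrees with the 2 observations per group reported in the output. A panel whose outcome changes only between the first and second period would then show no variation in the periods actually used. A minimal sketch to identify such panels, assuming the full data are in memory and already xtset (as they must be for L. to work); the names insample, nuse, suse, novar and first are hypothetical:

                            ```stata
                            * Mark observations that can enter the model (outcome and both lags nonmissing)
                            generate byte insample = !missing(underline, L.q, L.underline)

                            * Per panel: number of usable observations and positive outcomes among them
                            bysort Address: egen nuse = total(insample)
                            bysort Address: egen suse = total(underline * insample)

                            * No variation within the usable observations: all zeros or all ones
                            generate byte novar = (suse == 0 | suse == nuse)

                            * Count panels (one tagged observation per panel) without usable variation
                            egen byte first = tag(Address)
                            count if first & novar

                            * Assumes uu from #10 still exists: shows which of these panels vary overall
                            tabulate uu if first & novar
                            ```

                            If the count agrees with the 17,667 groups in the xtlogit note, the 1,461 panels are accounted for by the lag structure rather than by the values of the explanatory variables.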


                            • #15
                              Marzieh:
                              can you please provide an excerpt of your 1461 observations via CODE delimiters? Thanks.
                              Kind regards,
                              Carlo
                              (Stata 18.0 SE)
