
  • different odds every time the codes are re-run.

    Hi,

    Every time I re-run the code from the beginning, I get slight changes in the regression results, and the p-values of the variables fluctuate as well. I don't understand why.

    Any suggestions would be of much help.

    Thank you.

    Best,
    Bibek




  • #2
    Bibek:
    With such scant details, I find it difficult to reply helpfully.
    Perhaps, if your code includes -simulate- or -bootstrap-, setting the random-number seed with -set seed- may fix the problem.
    Otherwise, as is often pointed out on this list, posting what you typed and what Stata gave you back can increase your chances of getting helpful replies.
    Kind regards,
    Carlo
    (Stata 19.0)
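
To illustrate the point about the seed, here is a minimal sketch using the auto dataset shipped with Stata. The seed value 12345, the number of replications, and the regression itself are arbitrary choices for illustration and have nothing to do with the poster's data.

```stata
* Load a dataset that ships with Stata
sysuse auto, clear

* Fix the random-number seed so the resampling is the same on every run
set seed 12345

* Bootstrap standard errors for a simple regression
bootstrap, reps(200): regress price mpg weight
```

Running these three commands again from the top should give the same bootstrap standard errors; dropping the -set seed- line should make them vary slightly from run to run.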

    Comment


    • #3
      I agree with Carlo.

      This is only epsilon away from "I get puzzling results and don't know why". But epsilon is not zero. Perhaps something in your procedure depends on sorting the data: if so, slightly different results are indeed possible.
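
A minimal sketch of how sorting can matter, using a made-up toy dataset (the variables id and x are hypothetical): when the sort key does not uniquely identify observations, the order within ties is not guaranteed, so anything that picks the first or last observation of a group can differ between runs.

```stata
clear
input id x
1 10
1 20
1 30
2 40
2 50
end

* id alone does not identify observations, so the order of x within id
* after -sort id- is not guaranteed to be the same from run to run
sort id
by id: gen first_x = x[1]

* Adding x as a tiebreaker fixes the order within id,
* so first_x_fixed is always the smallest x in each group
bysort id (x): gen first_x_fixed = x[1]

list, sepby(id)
```

Alternatively, -sort id, stable- keeps tied observations in the order they already had, which is reproducible as long as that starting order is itself reproducible.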



      • #4
        Hi,
        Thank you for your reply.

        The error seems to be located somewhere in the code below, but I am having a difficult time figuring out exactly where. Every individual is uniquely identified by pid, and with this code I am trying to determine the employment status of an individual at a certain age. The values of the variable "status" change slightly every time I re-run this particular code.
        In the data set, hhid identifies a household and pid identifies an individual.



        Code:
        **list  if latest<earliest & !mi(earliest)
        
        *find age at interview (2015m6 - year&month of birth)
        gen ymob=ym(dobylunar, dobmlunar)
        format ymob %tm
        
        gen interviewage = (tm(2015m6) - ymob) / 12
        format interviewage %8.2f
        
         *if <=25, status at interview is current employment status
        gen status=.
        
         bysort pid (earliest): replace status=empstatus[_N]
         *find year&month at age 25
        *number of months in 25 years= 25*12=300
        gen dateat25=ymob+300
        format dateat25 %tm
        
        
        *if >25, status at interview is current employment status if earliest<=dateat25>=latest
        gen flag1=.
        replace flag1=1 if inrange(dateat25, earliest, latest)
        
        
        bysort pid: egen totflag1=total(flag1)
        list pid if totflag1>1
        count if totflag1>1
        list if totflag1>1, sepby(pid)
        
        
        *totflag=0 are those cases where earliest/latest is missing
        replace flag1=1 if totflag1==0 & mi(earliest) & !mi(latest) & dateat25<latest
        replace flag1=1 if totflag1==0 & !mi(earliest) & mi(latest) & dateat25>earliest
        drop totflag1
        bysort pid: egen totflag1=total(flag1)
        
        bysort pid flag1: replace status=empstatus[1] if totflag==1
        
        tab status, m

        Best,
        Bibek



        • #5
          As already hinted, anything that hinges on sorting is suspect in this circumstance. So start with

          Code:
          bysort pid (earliest): replace status=empstatus[_N]
          What this means is that status could be taken from any observation in which earliest equals its highest value within pid. If ties occur on that highest value, the results may be unpredictable.

          So check the results like this afterwards

          Code:
          tabdisp pid, c(status)
          to see whether results change. We can't test this ourselves, because you don't explain earliest and you don't give us a reproducible data example, but this is the kind of check to make.
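
A more mechanical way to make the same kind of check is to save the result of one run and compare it with a second run using -cf-. This is only a sketch: mydata.dta and mycode.do are placeholder names for the poster's dataset and do-file, and obsno is a hypothetical variable introduced here to record the original row order.

```stata
* First run: record the original row order, run the code, save the result
use mydata, clear
gen long obsno = _n
do mycode
sort obsno                    // obsno is unique, so this order is reproducible
keep obsno pid status
save run1, replace

* Second run: repeat exactly the same steps
use mydata, clear
gen long obsno = _n
do mycode
sort obsno

* Compare observation by observation with the saved first run
cf pid status using run1, verbose
```

If -cf- reports mismatches in status, the listed observations show which pid values to inspect.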



          • #6
            Below is the data example for the code. earliest and latest give the dates from which and until which an individual was employed or unemployed.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long(hhid15 pid) int dobylunar byte dobmlunar float(empstatus earliest latest)
                 1      101 1941 12 0  560  558
                 1      101 1941 12 1  559  665
                 1      101 1941 12 0    .  344
                 1      101 1941 12 1  537  559
                 1      101 1941 12 0  517  536
                 1      101 1941 12 1  345  497
                 1      101 1941 12 1  497  516
                 1      101 1941 12 0  498  496
              7258      102 1968  3 0  588  600
              7258      102 1968  3 1  553  587
              7258      102 1968  3 1  601  665
              7258      102 1968  3 0    .  349
              7258      102 1968  3 1  350  533
              7258      102 1968  3 0  534  552
              6034      201 1951  8 0    .  181
              6034      201 1951  8 1  182  587
                 2      202 1954  3 1  506  665
                 2      202 1954  3 0    .  505
                 .      203    .  . 1  520  527
                 .      203    .  . 0    .  499
                 .      203    .  . 0  509  519
                 .      203    .  . 1  500  508
                 .      301    .  . 1  326  425
                 .      301    .  . 0  326  325
                 .      301    .  . 1  302  325
                 .      301    .  . 1  425  665
                 .      301    .  . 0  426  424
                 .      301    .  . 0    .  301
                 4      401 1970  1 1  435  665
                 4      401 1970  1 0    .  434
                 4      402 1969 11 1  389  449
                 4      402 1969 11 0    .  388
                 4      402 1969 11 1  501  665
                 4      402 1969 11 0  450  500
                 4      403 1999  1 .    .    .
                 .      501    .  . 1  445  461
                 .      501    .  . 1  462  665
                 .      501    .  . 0    .  422
                 .      501    .  . 0  462  461
                 .      501    .  . 0  445  444
                 .      501    .  . 1  423  444
                 .      601    .  . 1  -15  455
                 .      601    .  . 0    .  -16
                 6      602 1935  2 0  -57  -58
                 6      602 1935  2 0  602  601
                 6      602 1935  2 0  484  577
                 6      602 1935  2 1  -82  -58
                 6      602 1935  2 0  456  475
                 6      602 1935  2 1  -15  455
                 6      602 1935  2 1  578  601
                 6      602 1935  2 0  -15  -16
                 6      602 1935  2 1  -57  -16
                 6      602 1935  2 0    .  -83
                 6      602 1935  2 1  476  483
                 6      602 1935  2 1  602  665



            • #7
              Thanks for the data example, which trivially needs a closing


              Code:
              end
              What you don't tell us is whether the problem you report can be reproduced with these data.

              In this example, the largest value of earliest occurs just once for each pid. If that's true generally, the problem isn't here.

              Code:
              . bysort pid (earliest) : gen ties = earliest == earliest[_N]
              
              . tab pid ties
              
                         |         ties
                     pid |         0          1 |     Total
              -----------+----------------------+----------
                     101 |         7          1 |         8 
                     102 |         5          1 |         6 
                     201 |         1          1 |         2 
                     202 |         1          1 |         2 
                     203 |         3          1 |         4 
                     301 |         5          1 |         6 
                     401 |         1          1 |         2 
                     402 |         3          1 |         4 
                     403 |         0          1 |         1 
                     501 |         5          1 |         6 
                     601 |         1          1 |         2 
                     602 |        11          1 |        12 
              -----------+----------------------+----------
                   Total |        43         12 |        55
              Your second last line

              Code:
                
               bysort pid flag1: replace status=empstatus[1] if totflag==1
              may also be problematic.

              Loosely, what we can do here is suggest techniques you can use to find your own bugs.



              • #8
                Thank you for your reply.

                I used your example to check whether the largest value of earliest occurs more than once within any pid in the data. It doesn't. empstatus, from which status is created, doesn't change when I re-run the code either. However, this is not the case for the status variable: its counts keep changing by a small amount.



                • #9
                  I rechecked the code. The bug seems to be in the line below. However, I still don't understand what the mistake could be.

                  Code:
                    
                    bysort pid flag1: replace status=empstatus[1] if totflag==1



                  • #10
                    If there is more than one observation per combination of pid and flag1, then the sort order within that combination is random. What else could the sort order be? You are referring to the first observation within each combination of pid and flag1, but since the sort order is random, the first observation is random too. If you have a variable that represents the order of the observations within an individual, e.g. time, then you can fix the sort order by typing

                    Code:
                     
                      bysort pid flag1 (time): replace status=empstatus[1] if totflag==1
                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------
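
Applied to the variables in this thread, the suggestion might look like the sketch below. Using earliest and latest as the tiebreaker is an assumption, not something the poster confirmed; -isid- checks whether those variables really do identify observations within pid and flag1.

```stata
* Check that pid, flag1, earliest and latest together identify observations;
* -isid- exits with an error if they do not (missok allows missing values)
isid pid flag1 earliest latest, missok

* If the check passes, the order within each pid-flag1 group is fully
* determined, so empstatus[1] is the same observation on every run
bysort pid flag1 (earliest latest): replace status = empstatus[1] if totflag1 == 1
```

Note that flag1 is either 1 or missing, so the group with flag1 missing will typically hold several observations per pid; that is where an undetermined order can change which value empstatus[1] picks up.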
