  • Beta regression coefficients change at each run of the whole script

    I'm estimating inequality in welfare among a set of IDPs, and the outcome variable is the Atkinson index, which lies strictly between 0 and 1. Specifically, the values range from 0.4920539 to 0.7574901. The problem is that each time I run all the scripts (from data cleaning and all transformations through to estimation), all the resulting coefficients change. I set seed 12345 before each sort or bysort and confirmed there's no multicollinearity. I have 20 independent variables and 1,400 observations, of which about 350 have one or two missing values.

  • #2
    Well, resetting the seed before every -sort- or -bysort- command in your scripts is really not advisable at all. If you do need to rely on any commands that require randomization, you are un-randomizing the state of the data by doing this. A seed should, in general, be set once and then left as is for the entire run so that it can continue to update itself in accordance with the operation of the pseudorandom number generator.

    That said, your first task at this point is to figure out where the irreproducibility is creeping in. You don't say how many different scripts are involved in the process. If it's a relatively small number, I would run them, but save the data set in a new file at the end of each one. Then rerun the scripts again, now saving the results in yet another bunch of new files. Then compare (you can use the -cf- command for this) the first round of new files with the second to see at what point they begin to diverge. Then you can focus your efforts on that file. (Of course, there can be more than one problem in the entire set of scripts, so you'll still have to check whether fixing that one file solves the problem all the way to the end.)
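    A minimal sketch of that comparison, assuming each script k saves its end-of-script data as run1_step`k'.dta on the first full run and run2_step`k'.dta on the second (these file names and the count of three scripts are hypothetical):

    ```stata
    * Compare the snapshots from two full runs, script by script.
    * File names and the number of steps are hypothetical; adapt to your scripts.
    forvalues k = 1/3 {
        use "run1_step`k'.dta", clear
        * -cf- compares the data in memory with the file, observation by
        * observation, and returns a nonzero code if anything differs
        capture noisily cf _all using "run2_step`k'.dta", verbose
        if _rc {
            display as error "step `k': the two runs differ; inspect this script first"
            continue, break
        }
        display as text "step `k': the two runs are identical"
    }
    ```

    Because -cf- matches observations by position, a pure difference in row order also shows up as a mismatch, which is exactly the symptom an indeterminate sort would produce.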

    The usual cause for irreproducibility is an indeterminate sort, as you are evidently very aware. If you have truly set the rng seed before every (explicit) -sort- and -bysort-, it may be that there is some indeterminate sort taking place in some command whose inner workings sort the data without your being aware of it. So you may need to check the code in any ado files your scripts call to see if they might have such a problem. If you don't see anything on inspection, you might try running the code for just the first half of the do file in question twice, saving the data at the end each time and comparing to localize the onset of indeterminacy to one half of the file or the other. Then split that part into two, etc., until you narrow down the location of the problem enough that you can find it directly.
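    To illustrate what an indeterminate sort looks like and a few common ways to pin the order down, here is a small sketch using the auto dataset that ships with Stata (the dataset and variable choice are only for illustration, not taken from the attached script):

    ```stata
    sysuse auto, clear

    * rep78 has many tied values, so this sort alone does not determine
    * the order of observations within each value of rep78
    sort rep78

    * Option 1: add a tie-breaker so the sort key uniquely identifies rows
    isid make                 // errors out if make is not unique
    sort rep78 make

    * Option 2: keep tied observations in their existing relative order
    sort rep78, stable

    * Option 3: with -bysort-, put the tie-breaker in parentheses
    bysort rep78 (make): generate byte first = (_n == 1)
    ```

    Anything that depends on position within a group (_n, _N, "keep the first observation") is only reproducible if the order within the group is fully determined, as in options 1 and 3.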

    Finally, in those situations where it is necessary to truly reset the random number generator completely to create a reproducible run, -set sortseed- or even -set rngstate- may be the tool for that job, not -set seed-. The RNG has parts other than those governed by the seed, and while these typically affect only very low order digits of the results, occasionally that matters. So if the results you are getting are minimally, not wildly, different from run to run, think about using those. I do recommend first reading the help files on those to get a clearer picture of how they work, if it comes to this.
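    As a rough sketch of the "set it once" approach, the top of a master do-file might look like the following. The value 12345 is simply the one mentioned in #1, and the comment on what -set sortseed- governs reflects my reading of the description above; please confirm it against -help set sortseed- for your Stata version:

    ```stata
    * Top of the master do-file: set these once, then leave them alone
    clear all
    set seed 12345        // seed for the random-number functions
    set sortseed 12345    // seed used when -sort- orders tied observations

    * ... the rest of the scripts run from here, with no further
    * -set seed- calls before individual -sort- or -bysort- commands
    ```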


    • #3
      Very hard to tell without seeing actual code, but perhaps it is too voluminous to present. My guess is that there is someplace you need to set the seed but haven’t.

      But, why do you need to set seeds anyway? Are you random sampling cases, or generating random variables? If not and if you are using all the cases, it is unusual to have problems.

      Edit: My post crossed with Clyde’s. I think we are in agreement but he provides far more detail on what to do.
      Last edited by Richard Williams; 17 Apr 2025, 15:26.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam


      • #4
        Thank you Clyde and Richard. My script is attached. Kindly go through for better understanding of the issue I have. I initially set seed once at the beginning of the code but the results were still irreproducible.
        Last edited by Noah Olasehinde; 17 Apr 2025, 17:09.


        • #5
          I've skimmed the file briefly to see if there is something obvious. I see a few places that might be problematic.

          First, you use the -ineqdeco- and -conindex- commands, which are user-written. I don't have either of those installed, so I don't know if they might contain indeterminate sorts. But perhaps they do. Then there is the -isid- command, used without the -sort- option. It contains some internal -sort- commands that might leave the data in some indeterminate order--I don't know if that's the case or not.

          Another place things may go wrong is that at one point you do a -merge 1:m-, and I don't know if the results of such a -merge- have a guaranteed sort order among the duplicate values of the -merge- key. In that case, the data saved subsequently may change from run to run, which might have downstream consequences as well.
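          One way to sidestep the question is to impose a fully determined order right after the merge, before saving. A minimal sketch, assuming a household file with one row per hhid and a member file in which hhid and pid together identify a row (all file and variable names here are hypothetical):

          ```stata
          use "households.dta", clear           // hypothetical: unique on hhid
          merge 1:m hhid using "members.dta"    // hypothetical: unique on hhid pid

          * Verify that hhid and pid identify the rows and sort by them in one step.
          * -missok- allows the missing pid of any unmatched household rows.
          isid hhid pid, sort missok

          save "merged.dta", replace
          ```

          The same -isid varlist, sort- call also covers the earlier point about -isid- being used without the -sort- option.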

          Another thing to check is whether you -use- a file that you also -save-d earlier in the program. If that's the case, it will get changed iteratively each time you re-run the code, and that will then lead to changed results downstream. I didn't notice any instances of that in the code, but it's possible I missed something.
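          For concreteness, the pattern to look for is something like the following sketch (file and variable names are hypothetical and not taken from the attached script):

          ```stata
          * Risky: the input file is overwritten, so every rerun starts from
          * data that the previous run has already transformed
          use "survey.dta", clear
          replace income = income * 1.1
          save "survey.dta", replace

          * Safer: leave the input untouched and write to a separate file
          use "survey.dta", clear
          replace income = income * 1.1
          save "survey_adjusted.dta", replace
          ```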

          Again, it's not obvious, at least not to me. I think you're going to have to just follow the process I outlined in #2. One nice thing about the script is that it does save many files along the way, so it will be easy for you to run the whole script twice and then go through each of the files that gets saved and see which is the first one to show a difference during the replications. Then you can scrutinize the code above that point in the code in greater detail, if need be, saving other intermediate states of the data as you go to check where the indeterminacy first comes in. It's going to be time-consuming and tedious, but it is what it is.
          Last edited by Clyde Schechter; 17 Apr 2025, 17:05.


          • #6
            Thank you Clyde. I have run -cf- and found several mismatches. I will patiently work through the process in #2 above, as it should show where the issue originates. I will be back when I'm done. Thank you once again.


            • #7
              Hi Clyde and Richard. I was able to fix the irreproducibility problem. Thank you for the heads-up.


              • #8
                Happy to have given you a start on it. But what turned out to be the source of the irreproducibility? I, and others who come searching this Forum in the future with similar problems, would like to learn from your successful experience.
