Why does my code generate different results each time I run it?

Jay Jeong

Join Date: Nov 2019

Posts: 35
#1


30 Mar 2023, 10:44

Thanks for reading this. I have been struggling with fixing randomized results.
Even if I do not change the code at all, the regression results are different every time I run it. I have never experienced this in about five years of Stata coding.

Is there a randomizing component in my code? Could anyone check this for me? I appreciate it so much.

I checked thoroughly, but I could not find any randomizing component.

The link below shows my code. The main code is "master_code.do". You do not need to download anything to view it.

https://github.com/jayjeo/public/tree/main/Question

Since this is an important issue for me, I have also posted the question on Stack Overflow at the link below.

https://stackoverflow.com/questions/...un-again-stata

Sincerely,
Jay Jeong.

Last edited by Jay Jeong; 30 Mar 2023, 10:58.
Felix Bittmann

Join Date: Aug 2018

Posts: 674
#2

30 Mar 2023, 10:51

People are rather reluctant to open unknown zip files. Could you just post the code in the forum directly? My guess: did you use "sort"? Sorting without the stable option can cause random effects.

Best wishes

(Stata 18.0 MP)
Jay Jeong

Join Date: Nov 2019

Posts: 35
#3

30 Mar 2023, 11:01

Thanks, Felix. I just changed the link to GitHub. You do not need to download anything to view it.
FernandoRios

Join Date: Apr 2014

Posts: 2455
#4

30 Mar 2023, 11:06

Hi Jay
Suggestion: your code, as it stands right now, is quite complex (many moving parts).
May I suggest that you run some summary statistics or similar at each step, to find out which one is creating the differences?
Better yet, you could create a log of everything (a couple of times) and use GitHub to help you see where the differences arise.
(Still, you may want to create something that checks the data between do-files.)
F
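
A minimal sketch of the step-by-step check suggested above, under stated assumptions: the do-file names (step1.do, step2.do), the log name, and the snapshot file names are hypothetical placeholders, not files from the linked repository. The idea is to log a summary and save a snapshot after each stage, run the whole thing twice with a different run number, and then compare the snapshots.

```stata
* Sketch only: step1.do, step2.do and all file names are hypothetical.
* Run this do-file twice: first with run set to 1, then with run set to 2.
local run 1

log using "check_run`run'.log", text replace

do step1.do
summarize                                  // summary statistics after stage 1
save "after_step1_run`run'.dta", replace   // snapshot of the data after stage 1

do step2.do
summarize                                  // summary statistics after stage 2
save "after_step2_run`run'.dta", replace   // snapshot of the data after stage 2

log close

* On the second run, compare the snapshots stage by stage.
* -cf- compares observation by observation and stops with an error
* at the first stage whose data differ between the two runs.
if `run' == 2 {
    use "after_step1_run1.dta", clear
    cf _all using "after_step1_run2.dta", verbose

    use "after_step2_run1.dta", clear
    cf _all using "after_step2_run2.dta", verbose
}
```

Because -cf- compares row by row, it will also flag two snapshots that hold the same values in a different order, which is itself a useful hint that a sort or merge in that stage is not fully determined.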
Jay Jeong

Join Date: Nov 2019

Posts: 35
#5

30 Mar 2023, 11:14

Originally posted by FernandoRios View Post

Hi Jay
Suggestion: your code, as it stands right now, is quite complex (many moving parts).
May I suggest that you run some summary statistics or similar at each step, to find out which one is creating the differences?
Better yet, you could create a log of everything (a couple of times) and use GitHub to help you see where the differences arise.
(Still, you may want to create something that checks the data between do-files.)
F

Hi Fernando, thanks for your reply. I will do as you suggested. Summarizing at each stage of the code, with a log, is a great idea.

One issue here is that much of the data is confidential, and I cannot share it with others; otherwise I could be sued for violating the agreement.

Best regards,
Jay Jeong.

Last edited by Jay Jeong; 30 Mar 2023, 11:30.
Clyde Schechter

Join Date: Apr 2014

Posts: 30046
#6

30 Mar 2023, 14:48

My guess: did you use "sort"? Sorting without the stable option can cause random effects.

I very strongly disagree with the implicit advice to use the -stable- option in -sort-ing as a way to solve this problem.

Let's assume that some command (-sort-, or something that calls -sort-) is the source of this problem. It almost always is, except if somebody has simply forgotten to set the sort seed before running.

Then the implication is that the calculations being performed depend on the specific order of the data, an order that the code does not completely specify. If you just apply the -stable- option to the offending -sort- command, you will get reproducible results thereafter, but they are probably the wrong results. Using -stable- just freezes the existing order on the indeterminate part of the -sort-. But what reason is there to believe that the results produced with that particular sort order are the correct results, and the other sort orders are wrong? No, we don't know that the sort order that happened to be in effect at the time of the offending -sort- command is the one that produces the right results. What we need to do at that point is figure out what completely specified sort order will produce the right results (or identify what it is in the subsequent commands that improperly depends on the sort order of the data). It might turn out to be the sort order that was in effect prior to the offending -sort-, but that is purely coincidence if so, and it is rather unlikely to be so. -sort, stable- just sweeps the problem under the rug. It hides the indeterminacy of the algorithm by picking one arbitrary result out of the many possible results and sticking with it. You continue to produce wrong results; it's just that you no longer immediately recognize that they are wrong.

The solution to this problem must lie in identifying the incompletely specified -sort-s in the code and then replacing them with completely-specified sorts that will provide proper input to the subsequent commands, or by replacing the commands that depend on the sort order with others that don't.
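
To illustrate what an incompletely specified sort looks like, here is a small sketch on made-up data (the variable names id, year and x are hypothetical). The key id year has ties, so -sort id year- leaves the order within the ties undetermined, and anything computed from position within the group, such as x[1], may change between runs. -isid- and -duplicates report- are one way to find such keys.

```stata
* Toy data: id and year do not uniquely identify the observations.
clear
input id year x
1 2000 5
1 2000 7
2 2001 3
end

* -isid- errors out when the variables do not uniquely identify observations;
* -capture noisily- shows the message without stopping the do-file.
capture noisily isid id year
duplicates report id year

* Order-dependent: which x ends up in first_x for id 1 depends on how the ties are ordered.
sort id year
by id year: generate first_x = x[1]
drop first_x

* Completely specified: in this toy data, adding x to the key removes the ties.
isid id year x
bysort id year (x): generate first_x = x[1]
list
```

Whether the smallest x is the value the analysis actually needs is a substantive question that the sort cannot answer; the sketch only shows how to make the choice explicit rather than leaving it to the order of ties.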
Jay Jeong

Join Date: Nov 2019

Posts: 35
#7

30 Mar 2023, 14:57

Originally posted by Clyde Schechter View Post

I very strongly disagree with the implicit advice to use the -stable- option in -sort-ing as a way to solve this problem.

Let's assume that some command (-sort-, or something that calls -sort-) is the source of this problem. It almost always is, except if somebody has simply forgotten to set the sort seed before running.

Then the implication is that the calculations being performed depend on the specific order of the data, an order that the code does not completely specify. If you just apply the -stable- option to the offending -sort- command, you will get reproducible results thereafter, but they are probably the wrong results. Using -stable- just freezes the existing order on the indeterminate part of the -sort-. But what reason is there to believe that the results produced with that particular sort order are the correct results, and the other sort orders are wrong? No, we don't know that the sort order that happened to be in effect at the time of the offending -sort- command is the one that produces the right results. What we need to do at that point is figure out what completely specified sort order will produce the right results (or identify what it is in the subsequent commands that improperly depends on the sort order of the data). It might turn out to be the sort order that was in effect prior to the offending -sort-, but that is purely coincidence if so, and it is rather unlikely to be so. -sort, stable- just sweeps the problem under the rug. It hides the indeterminacy of the algorithm by picking one arbitrary result out of the many possible results and sticking with it. You continue to produce wrong results; it's just that you no longer immediately recognize that they are wrong.

The solution to this problem must lie in identifying the incompletely specified -sort-s in the code and then replacing them with completely-specified sorts that will provide proper input to the subsequent commands, or by replacing the commands that depend on the sort order with others that don't.

I strongly agree with you. Comparing Version A with sorting and Version B without sorting, there is no reason to believe one is right and the other is wrong.
Also, I put proper sorting commands before and after every place where I use or merge the data. Nothing solved the randomizing issue. Even after sorting everything in the code, weirdly, the result changes every time I run the code again. This is really strange. I know that using a random function such as runiform() or rnormal(), or a command like bootstrap, will cause randomized results. But I never used randomized components in my code.

I am starting to put "summarize" after each command to see what causes the randomization. That was great advice.

Thanks a lot!
Jay Jeong

Join Date: Nov 2019

Posts: 35
#8

30 Mar 2023, 15:17

Let me share an answer from ChatGPT.

In Stata, does the command ipolate with epolate result in randomized output?

------------------------------------------------
In Stata, the ipolate command is used for linear interpolation of missing values in a variable. It does not produce randomized output. The command itself does not include an "epolate" option or subcommand.

If you meant to ask if interpolating (using ipolate) and extrapolating (using other methods, as Stata doesn't have a direct "epolate" command) can result in randomized output, the answer is still no. Both interpolation and extrapolation are deterministic processes that rely on existing data points to estimate missing values or extend the series beyond the given data. These processes use mathematical formulas and do not involve any randomization.

If you are looking for a way to generate random data in Stata, you may want to use the generate command with random functions such as uniform() or rnormal().
Jay Jeong

Join Date: Nov 2019

Posts: 35
#9

30 Mar 2023, 15:46

I solved the issue. The cause was my incorrect use of the merge command.

I should have used the following:

Code:

merge m:1 unitcode location year using PPP_1

However, I mistakenly wrote the following. It was a silly mistake.

Code:

merge m:m unitcode year using PPP_1
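
For readers who want to see why the key matters, here is a minimal sketch on made-up data. The variable names unitcode, location and year come from the commands above; the values, the ppp and wage variables, and the temporary file are hypothetical stand-ins for the confidential files. It assumes, as the corrected command implies, that unitcode location year uniquely identifies the lookup file while unitcode year alone does not.

```stata
* Toy lookup data standing in for PPP_1 (values are made up).
clear
input unitcode location year ppp
1 1 2000 1.1
1 2 2000 1.3
end

* The full key is unique in the lookup data, so -isid- passes silently ...
isid unitcode location year
* ... but the shorter key is not: -isid- reports an error here.
capture noisily isid unitcode year

tempfile ppp
save `ppp'

* Toy master data with several observations per key (values are made up).
clear
input unitcode location year wage
1 1 2000 10
1 1 2000 12
1 2 2000 11
end

* m:1 on the full key: every master observation gets exactly one lookup row.
merge m:1 unitcode location year using `ppp'
tabulate _merge
list
```

Running -isid- on the intended key before merging is a cheap guard: if the key is not unique where it should be, the problem shows up as an error rather than as results that shift from run to run.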
Clyde Schechter

Join Date: Apr 2014

Posts: 30046
#10

30 Mar 2023, 16:41

I cannot thank you enough for posting the solution you found to your problem.

I am well known on Statalist for my frequent posts reminding people that -merge m:m- should never be used, never. This is one particular example of what I mean when I say that -merge m:m- creates data salad. I really do wish that StataCorp would eliminate -merge m:m-.
Jay Jeong

Join Date: Nov 2019

Posts: 35
#11

30 Mar 2023, 18:56

Originally posted by Clyde Schechter View Post

I cannot thank you enough for posting the solution you found to your problem.

I am well known on Statalist for my frequent posts reminding people that -merge m:m- should never be used, never. This is one particular example of what I mean when I say that -merge m:m- creates data salad. I really do wish that StataCorp would eliminate -merge m:m-.

Thanks a lot for your help. I see your posts and replies on Statalist a lot.
For many years, I have never used merge m:m. This implies that merge m:m is not necessary at all.

Best regards,
Jay Jeong.