
  • A set of commands yields different output every time I run it

    Hello everyone,

    I am trying to replace values of a variable "applied_tariff" based on conditions. However, every time I run the code I get (slightly) different results for applied_tariff. I cannot work out the source of this problem or how to fix it. I hope some of you will be willing to help.

    I attach here the dataset that I am using (on GoogleDrive) and also the code for those that want to try to run the code.


    Code:
    // When missing, I replace bilateral tariffs with that of the EU when an EU member is the exporter and the importer is a non-EU member
    gen exp2= cond(eu_exp, "EUN", exp)
    bys exp2 imp ISIC3 year (applied_tariff): replace applied_tariff= applied_tariff[_n-1] if missing(applied_tariff) & exp!="EUN" & exp2=="EUN"
    
    // When missing, I replace bilateral tariffs when an EU member is the importer and the exporter is a non-EU member
    gen imp2= cond(eu_imp, "EUN", imp)
    bys imp2 exp year (applied_tariff): replace applied_tariff= applied_tariff[_n-1] if missing(applied_tariff) & imp!="EUN" & imp2=="EUN"
    
    drop if exp == "EUN" | imp == "EUN"
    
    drop _fillin eu_exp eu_imp exp2 imp2
    
    
    decode ISIC3, gen(sect_string)
    mdesc
    replace sect_string = "32" if ISIC3 ==32
    replace sect_string = "33" if ISIC3 ==33
    replace sect_string = "34" if ISIC3 ==34
    replace sect_string = "35" if ISIC3 ==35
    replace sect_string = "36" if ISIC3 ==36
    
    // I compare summary statistics and missing values with those from the last time I ran the code
    sum applied_tariff
    mdesc applied_tariff

    This problem has been puzzling me a lot, so I sincerely thank you in advance for your time.


  • #2
    The problem is probably the sorting done by bysort. If there are ties in the sort keys, sort is not stable: the relative order of tied observations is arbitrary and can differ from run to run. See

    https://www.statalist.org/forums/for...-i-run-do-file

    I suggest using sort, stable and checking whether there are further underlying errors in the dataset.
    Best wishes

    (Stata 16.1 MP)
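
    To make the point about ties concrete, here is a minimal sketch on toy data. The variables grp, x and id are made up for illustration and are not taken from the posted dataset. It first checks whether the sort keys identify observations uniquely, then sorts with the stable option so that tied observations keep their current relative order.

    ```stata
    // Toy data (hypothetical): the first two rows tie on the sort keys grp and x
    clear
    input str3 grp x id
    "a" 1 1
    "a" 1 2
    "a" 2 3
    "b" 1 4
    end

    // isid errors out if grp and x do not uniquely identify observations;
    // capture stores the return code in _rc (nonzero means ties exist)
    capture isid grp x
    display "isid return code: " _rc

    // With the stable option, tied observations stay in their existing order
    sort grp x, stable
    list, sepby(grp)
    ```

    A nonzero return code from isid is the signal that any code relying on the order within the sort keys deserves a closer look.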



    • #3
      Dear Felix,
      Thank you very much for your enlightening comment. I did not know that bysort could behave this way. Following your advice, I replaced

      Code:
      gen exp2= cond(eu_exp, "EUN", exp)
      bys exp2 imp ISIC3 year (applied_tariff): replace applied_tariff= applied_tariff[_n-1] if missing(applied_tariff) & exp!="EUN" & exp2=="EUN"
      with

      Code:
      gen exp2= cond(eu_exp, "EUN", exp)
      sort exp2 imp ISIC3 year applied_tariff, stable
      by exp2 imp ISIC3 year (applied_tariff): replace applied_tariff= applied_tariff[_n-1] if missing(applied_tariff) & exp!="EUN" & exp2=="EUN"
      (and I did the same for the corresponding "imp" part). I do indeed get stable results. One last question, just to be sure of what I am doing: do the two versions above yield the same output, apart from the random ordering of ties?

      thanks again



      • #4
        Yes, I think this is fine. But you should think about what this means. Stable only means that you get the same ordering every time; the ties are still there, and you simply "lock in" one version of them. So the ties apparently do influence your results. You should either find out whether these ties are supposed to be there, or whether an additional sort key could break them. Otherwise, run the command many times with unstable sorting and see how much the results vary.
        Best wishes

        (Stata 16.1 MP)
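
        The last suggestion could be sketched roughly as follows, again on toy data. The variables g, v and donor are hypothetical: donor == 1 stands in for a row that must not be filled, in the way the exp=="EUN" rows are excluded in the original code. The loop repeats the unstable sort and fill, and reports how many values remain missing each time.

        ```stata
        // Toy data (hypothetical): one group, one known value, two missing values.
        // The two missing rows tie on the sort keys g and v.
        clear
        input byte g double v byte donor
        1 10 0
        1 .  1
        1 .  0
        end

        // Repeat the unstable sort and fill; count what is still missing each time
        forvalues r = 1/20 {
            preserve
            quietly bysort g (v): replace v = v[_n-1] if missing(v) & !donor
            quietly count if missing(v)
            display "run `r': " r(N) " value(s) still missing"
            restore
        }
        ```

        In this toy setup the count is 1 when the fillable row happens to sort ahead of the donor row and 2 otherwise, so any variation across runs comes only from how the tie is ordered.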



        • #5
          Dear Felix,

          Thanks again for your reply. Could you explain what you mean by "ties" in the data?
          Best wishes



          • #6
            By "ties in the data" Felix means that your data contain some combinations of exp2, imp, ISIC3, year, and applied_tariff for which there is more than one observation.

            Said another way, your data contains some duplicated combinations of exp2, imp, ISIC3, year, and applied_tariff.
            Code:
            duplicates describe exp2 imp ISIC3 year applied_tariff
            Since your results apparently depend on the order in which these duplicated observations are sorted, you have a problem. Using sort, stable does not resolve the problem; it hides it.

            Your job is to study your data, understand what is happening, and change your code so that there is no ambiguity about what value of applied_tariff should be used to replace any missing value.
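
            One possible way to remove the ambiguity, offered only as a sketch: take the fill value explicitly from the EUN row, so that no sort order within the group is involved. This assumes the EUN row is the intended source, as the comment in the original code suggests. The toy data leave out ISIC3 for brevity, and eun_val, eun_max and eun_min are hypothetical helper variables.

            ```stata
            // Toy data (hypothetical values), shaped like the variables in post #1
            clear
            input str3 exp str3 imp year applied_tariff byte eu_exp
            "EUN" "USA" 2000 5 0
            "DEU" "USA" 2000 . 1
            "FRA" "USA" 2000 . 1
            "CHN" "USA" 2000 3 0
            end

            gen exp2 = cond(eu_exp, "EUN", exp)

            // Keep the tariff only where the row is the EUN row itself
            gen double eun_val = applied_tariff if exp == "EUN"

            // Spread that value over the group; egen does not depend on order within the group
            bysort exp2 imp year: egen double eun_max = max(eun_val)
            bysort exp2 imp year: egen double eun_min = min(eun_val)

            // Stop if a group carries more than one distinct EUN tariff
            assert eun_min == eun_max

            // Fill missing tariffs of EU members from the EUN value
            replace applied_tariff = eun_max if missing(applied_tariff) & exp != "EUN" & exp2 == "EUN"
            drop eun_val eun_max eun_min
            ```

            If the assert fails on the real data, the groups it flags are the ones where the choice of replacement value still has to be decided.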
