
  • Retrieve intermediate steps applied to one variable from a lost do-file

    Hi everyone. I really need your help.

    Suppose that I have two variables, X and X_2. X_2 was derived from X through various operations (sums, divisions, logarithms, etc.) involving X itself and some random variables. I saved the dataset, but my computer crashed and I lost the do-file describing how to get X_2 from X. Thus, the only information I have is a dataset containing X and X_2.

    I would like to ask whether there is some way in Stata to retrieve the intermediate steps or, at least, an approximation of how to get X_2 from X. For example, X_2 = X + log(random_variable), where random_variable has certain characteristics (distribution, mean, sd) and can be simulated using a given seed.

    Thank you.

  • #2
    Without seeing the contents of these variables, I would say that the difficulty increases with the number of operations applied. If you applied only one operation, then it may be possible to determine what it was. With many operations, there may be an unlimited number of combinations that produce the same variable. Still, you could present a data example of the variables if you are confident that we are dealing with the former case and not the latter.

    Code:
    dataex
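
    A minimal sketch of what that request could look like, assuming the two variables are named X and X_2 as in #1. The count() option caps the number of observations included in the example; the value 50 is an arbitrary choice.

    Code:
    dataex X X_2, count(50)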



    • #3
      Thank you, Andrew. Several operations were applied, including operations with random variables. Together, these operations on X and the random variables gave me X_2. X and X_2 are the only information I have.



      • #4
        You may not be able to recover the exact seed, but if you know the random-number function, then recovering its arguments may still be possible, especially if the dataset is large. I would still encourage you to show us these variables and include any information on the functions. Below, I can guess the mean and standard deviation of a normal random variable by summarizing it.

        Code:
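        clear
        set obs 10000
        set seed 07242023
        * Draw from a normal distribution with mean 5 and standard deviation 7
        gen X = rnormal(5, 7)
        * Detailed summary: the sample mean and Std. dev. approximate the arguments
        sum X, d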
        Results:

        Code:
        . gen X= rnormal(5, 7)
        
        
        . sum X, d
        
                                      X
        -------------------------------------------------------------
              Percentiles      Smallest
         1%    -11.21509      -19.77879
         5%     -6.47545      -18.93595
        10%    -3.979005      -17.84546       Obs              10,000
        25%      .206667        -16.493       Sum of wgt.      10,000
        
        50%      5.03195                      Mean           5.021514
                                Largest       Std. dev.      7.017917
        75%     9.814817       29.83835
        90%     13.95592       30.09274       Variance       49.25116
        95%     16.62557       30.15853       Skewness       .0133329
        99%     21.20141       34.37077       Kurtosis       2.913767
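
        The same idea extends to the form suggested in #1: if X_2 = X + log(Z), then exp(X_2 - X) gives back Z, and summarizing it shows its moments. A minimal sketch, in which the uniform(1, 10) draw for Z, the seed, and the name Z_hat are purely illustrative assumptions and not the steps of the lost do-file:

        Code:
        clear
        set obs 10000
        set seed 12345
        gen double X = rnormal(5, 7)
        * Hypothetical lost step: add the log of a random variable Z
        gen double X_2 = X + log(runiform(1, 10))
        * Invert the assumed form: exp(X_2 - X) recovers Z
        gen double Z_hat = exp(X_2 - X)
        sum Z_hat, d

        In this simulated case the summary of Z_hat should show values between 1 and 10 with a mean close to 5.5. This only works when the assumed functional form is correct; with several unknown operations, many different forms could fit the same data.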
        Last edited by Andrew Musau; 24 Jul 2023, 13:05.
