  • random numbers with rnormal(0,1); mean not equal 0 and SD not equal 1

    I tried to create a random variable for a hypothetical dataset with the following code
    Code:
    set seed 12345
    gen contvar = rnormal(0,1)
    If I summarize the created variable, the mean is not equal to 0 and the SD is not equal to 1:

    Code:
    . set seed 12345
    
    . gen contvar=rnormal(0,1)
    
    . sum contvar
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
         contvar |      1600      ,01877    1,024139  -3,645242   3,620961
    Can this be correct?

  • #2
    Yes. What's more, we can reproduce that. That's randomness for you.

    Code:
     
    . clear
    
    . set obs 1600
    obs was 0, now 1600
    
    . set seed 12345
    
    . gen y = rnormal()
    
    . su
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
               y |      1600      .01877    1.024139  -3.645242   3.620961
    
    . gen y2 = rnormal()
    
    . su
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
               y |      1600      .01877    1.024139  -3.645242   3.620961
              y2 |      1600    -.009015    1.000674  -3.35892
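
    As a rough check on how far from 0 and 1 these sample values should be expected to land: the mean of n independent standard normal draws has standard deviation 1/sqrt(n), and in large samples the sample SD has a standard error of roughly 1/sqrt(2n). A minimal sketch of that arithmetic for n = 1600, using the figures from the output above (the second formula is a large-sample approximation, not an exact result):

    Code:
    display 1/sqrt(1600)                 // standard error of the mean: .025
    display .01877 / (1/sqrt(1600))      // observed mean in standard-error units, about .75
    display 1/sqrt(2*1600)               // approximate standard error of the SD, about .018
    display (1.024139 - 1) / (1/sqrt(2*1600))   // observed SD in standard-error units, about 1.4

    Both values sit within about one and a half standard errors of their targets, which is unremarkable for a single random draw.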


    • #3
      Hmm OK!

      How can I create a variable with the following parameters: Mean of 0, SD of 1 within the range of -2 and +2.

      Is that possible?


      • #4
        Should be. You need to work out a distribution function with those properties.
        However, my intuition is that that specification does not tie down the distribution uniquely.

        I don't know of a canned function even for one such distribution.

        For reference, I get the SD of a uniform on [-2, 2] to be

        Code:
        . di sqrt(16/12)
        1.1547005
        but do check the calculation.
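
        The same formula, SD = (b - a)/sqrt(12) for a uniform on [a, b], also gives one concrete distribution that meets the request in #3: a uniform on [-sqrt(3), sqrt(3)] has mean 0 and SD 1 and stays inside [-2, 2], since sqrt(3) is about 1.73. A minimal sketch, in which the variable name u, the seed, and the number of observations are arbitrary choices, and the sample mean and SD will be close to, not exactly, 0 and 1:

        Code:
        clear
        set obs 1600
        set seed 12345
        // uniform between -sqrt(3) and sqrt(3): population mean 0, population SD 1
        generate double u = sqrt(3) * (2*runiform() - 1)
        summarize u

        A two-point distribution putting probability 1/2 on each of -1 and +1 has the same mean and SD within the same range, which illustrates that the specification does not pin down a unique distribution.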
        Last edited by Nick Cox; 04 Mar 2015, 12:37.


        • #5
          Hello Julian,

          Since you are generating (pseudo)random numbers, the sample mean and SD are supposed to be approximately, not exactly, the values you specified.

          That is not simply a matter of sample size (as the example below shows, a larger sample need not land closer to 0 on any single draw) but of random-number "games", so to speak.

          Code:
          . set obs 100
          obs was 0, now 100
          
          . gen var1 = rnormal(0 ,1)
          
          . sum var1
          
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
                  var1 |       100    .0006276    1.061928  -2.885089   1.837664
          
          . set obs 1000
          obs was 100, now 1000
          
          . gen var2 = rnormal(0 ,1)
          
          . sum var2
          
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
                  var2 |      1000   -.0005386    1.029321  -2.741925   3.081883
          
          . set obs 10000
          obs was 1000, now 10000
          
          . gen var3 = rnormal(0 ,1)
          
          . sum var3
          
              Variable |       Obs        Mean    Std. Dev.       Min        Max
          -------------+--------------------------------------------------------
                  var3 |     10000   -.0201514    .9958357   -4.41705   3.894426
          .
          Also, please take a look at the range given in the manual (after clicking on "[D] functions"): the mean argument can range from c(mindouble) to c(maxdouble). Keep in mind that c(mindouble), for instance,

          returns the smallest value that can be stored in storage type double
          In the outputs above, it is also worth underlining that the sample means, however close to zero they are, can be not only (slightly) positive but (slightly) negative as well.
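
          If the sample mean and SD need to be exactly 0 and 1 (up to rounding), one option is to standardize the draw afterwards. A minimal sketch using egen's std() function; the variable names z and zstd and the seed are arbitrary, and the result is a rescaled sample rather than a fresh draw from N(0,1):

          Code:
          clear
          set obs 1600
          set seed 12345
          generate double z = rnormal(0, 1)
          // subtract the sample mean and divide by the sample SD
          egen double zstd = std(z)
          summarize z zstd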

          Best,

          Marcos


          • #6
            Maybe it would help if you explained what you are actually trying to do here.

            Brute force could be

            Code:
            clear
            se obs 1600
            g double x = floor((-5)*runiform() + 3)
            
            forv j = 1/3 { // three runs will usually be enough to force sd be displayed as 1 (exactly)
                qui su x
                replace x = (x - r(mean))/r(sd)
                replace x = (-2) in 1
                replace x = 2 in l
            }
            
            su x
            but I doubt this is a useful approach.

            Best
            Daniel
            Last edited by daniel klein; 04 Mar 2015, 12:50.


            • #7
              How can I create a variable with the following parameters: Mean of 0, SD of 1 within the range of -2 and +2.
              Below are two methods (quasi-random) that are close, though variance will be less than one:

              Code:
              //Censored end points
              clear
              set obs 1000000
              gen x = _n/_N
              gen censored_x = x
              replace  censored_x = normal(-2) if censored_x < normal(-2) 
              replace censored_x = normal(2) if censored_x > normal(2)
              gen c_z = invnormal(censored_x)
              
              //Truncated end points
              gen truncated_x = x
              replace  truncated_x = . if truncated_x <= normal(-2) | truncated_x >= normal(2)
              gen t_z = invnormal(truncated_x)
              sum c_z t_z
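
              The truncated version above uses an evenly spaced grid of probabilities. A pseudo-random counterpart can be sketched with the same inverse-CDF idea, drawing the probability uniformly between normal(-2) and normal(2). The variable names p and t_rand, the seed, and the number of observations are arbitrary choices. The last line evaluates the textbook variance formula for a standard normal truncated symmetrically at -2 and 2, which gives an SD of about .88, so the result stays short of SD 1:

              Code:
              clear
              set obs 100000
              set seed 12345
              // uniform probability between the CDF values at -2 and 2
              generate double p = normal(-2) + (normal(2) - normal(-2)) * runiform()
              // map back through the inverse normal CDF: values lie between -2 and 2
              generate double t_rand = invnormal(p)
              summarize t_rand
              // theoretical SD of a standard normal truncated to [-2, 2]
              display sqrt(1 - 4*normalden(2)/(normal(2) - normal(-2)))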


              • #8
                Mean 0, sd 1 is easy. Just use corr2data

                Code:
                . set obs 1600
                obs was 0, now 1600
                
                . corr2data x
                
                . sum x
                
                    Variable |       Obs        Mean    Std. Dev.       Min        Max
                -------------+--------------------------------------------------------
                           x |      1600   -3.18e-09           1  -3.583039   3.331898
                The 2 and -2 limits are another matter. For one thing that wouldn't be a normal distribution, nor, as Nick shows, would it be uniform. You may have to do a brute force approach where you keep experimenting until you get what you want.

                corr2data is handy if, say, you have a published set of means, correlations and standard deviations and want to replicate a published analysis and the command you are using (e.g. regress) doesn't let you enter summary statistics.
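
                A minimal sketch of that use, with made-up summary statistics for two variables (the numbers and the matrix names M, S and C are hypothetical, chosen only for illustration):

                Code:
                clear
                // hypothetical published summary statistics for x and y
                matrix M = (0, 5)              // means
                matrix S = (1, 2)              // standard deviations
                matrix C = (1, .5 \ .5, 1)     // correlation matrix
                corr2data x y, n(1600) means(M) sds(S) corr(C)
                summarize x y
                correlate x y
                regress y x

                Because corr2data matches the specified means, SDs and correlations in the sample, the regression slope should come out as r*sd_y/sd_x = .5*2/1 = 1, up to rounding.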
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam


                • #9
                  WOW, thanks guys. I learned a lot today.
                  Just to let you know why I am interested in creating such a specific random variable: I was wondering how I could replicate an estimation without having a dataset of my own. Let's say I just knew that one of the variables used had these parameters.

                  Thanks again.
                  A big fan of the forum!


                  • #10
                    Hm, this is interesting. I have never thought about corr2data in this way.

                    However, I wonder whether a table of summary statistics for a couple of variables will be of any use beyond replicating exactly this table, if information about the covariance structure is missing. In other words, I cannot see how univariate descriptive results could ever be used to replicate a multivariate analysis. So the key point in Richard's example is the presence of information on correlations.

                    Best
                    Daniel


                    • #11
                      I am not sure why you would want to use corr2data to create only one variable, but it can be done.

                      I mostly use corr2data to reproduce published regression results (although you can now do the same thing with the sem and ssd commands). I give an example on pages 8-10 of

                      http://www3.nd.edu/~rwilliam/xsoc63993/OLS-Stata9.pdf

                      I also use it when I want to construct a hypothetical example where everything conveniently works out exactly the way I want it to; I think of it as creating a population with the specified sets of values. For example, in this handout I show how random measurement error affects the means, standard deviations, and correlations of variables, and also bivariate slope coefficients:

                      http://www3.nd.edu/~rwilliam/xsoc63993/l21.pdf

                      corr2data can only work so many miracles. It can create a data set with a specified set of means, correlations and standard deviations, but you can't put constraints on how it does it, i.e., you can't tell it that one of the variables has to be a 0/1 dichotomy or that the variable can only range between -2 and 2. Students who don't read my handouts carefully are always asking questions like "How come a variable that is supposed to be a dichotomy has values that range between -3 and 3???" That is because you can only reproduce the means, correlations and standard deviations of a published analysis, not the actual data set that produced those values.
