Generate random data with mean, std and range

Wenhan Yan

Join Date: Oct 2020

Posts: 56
#1


26 Apr 2021, 19:45

My professor has asked us to generate data based on an experimental paper in order to do a power analysis. We want to generate a variable with a mean of 60 and an std of 26.141, but with a range from 0 to 100. Every time we use rnormal(60,26.141), the result contains outliers (values outside that range). One method I think could work is to redraw the outliers until they are in range, but I don't know how to write the loop so that the process is repeated whenever the condition is not satisfied (that is, while there are still outliers). This is what I have so far:

Code:

clear all
set obs 97
g coop_mixed = .
foreach i in coop_mixed {
    replace coop_mixed = rnormal(60,26.141)
    replace coop_mixed = rnormal(60,26.141) if coop_mixed < 0
    replace coop_mixed = rnormal(60,26.141) if coop_mixed > 100
}
Tags: data, foreach, loop
Dirk Enzmann

Join Date: Apr 2014

Posts: 529
#2

26 Apr 2021, 21:20

Putting aside why you set the number of observations to 97: You are drawing data from a normal distribution -- why? If the mean is 60 and the minimum value(s) should be 0 and the maximum value(s) 100 (of the randomly drawn data?), would you expect the data to be normally distributed? If this is a problem of an exercise you have to solve, it is perhaps not best to ask on the Stata Forum (or any other). However, once you solved the issue, I would be curious to see the solution.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#3

27 Apr 2021, 00:25

It is not possible to have a normal distribution with a bounded range, because the normal distribution extends from minus infinity to plus infinity.

The procedure that you have in mind is not going to work, because every time you replace a value which is out of range, you also change the mean and the variance of your sample.
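To illustrate the point, here is a minimal sketch of the redraw loop asked about in #1, written with `while` so that it repeats until no value lies outside [0, 100]. The seed, the variable name `x`, the local `bad` and the large sample size are assumptions chosen only to make the shift in the moments easy to see; they are not taken from the original posts.

```stata
clear all
set seed 2021                    // arbitrary seed, for reproducibility only
set obs 100000                   // large n so the sample moments are stable

generate double x = rnormal(60, 26.141)

* count the out-of-range draws and redraw only those until none remain
quietly count if !inrange(x, 0, 100)
local bad = r(N)
while `bad' > 0 {
    quietly replace x = rnormal(60, 26.141) if !inrange(x, 0, 100)
    quietly count if !inrange(x, 0, 100)
    local bad = r(N)
}

* the result is a truncated normal, not a normal with mean 60 and SD 26.141
summarize x
```

The loop does what was asked, but the result is a truncated normal: because more probability is cut off above 100 than below 0, the mean should come out below 60, and the SD should come out below 26.141.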
Nick Cox

Join Date: Mar 2014

Posts: 35672
#4

27 Apr 2021, 02:07

I agree with previous posts: seeking to sample from an unbounded distribution when the goal is a bounded distribution is misguided if the bounds bite.

A beta distribution with parameters 0.6 and 0.4 has mean 0.6 and SD 0.346 (3 dp, calculations hasty). A beta distribution with parameters 1.2 and 0.8 has the same mean 0.6 and SD 0.283 (ditto).

Multiplied by 100, that may work well enough, but you can get closer. See e.g. the Wikipedia article for the parameterisation of the beta in terms of the mean and SD.
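As a rough sketch of that parameterisation, assuming the standard method-of-moments formulas for the beta (nu = m(1-m)/s^2 - 1, a = m*nu, b = (1-m)*nu, with mean m and SD s rescaled to [0, 1]), one could solve for the shape parameters and draw with `rbeta()`. The seed and the local macro names are arbitrary choices for illustration.

```stata
clear all
set seed 12345                   // arbitrary seed, for reproducibility only
set obs 97

* target mean and SD, rescaled from [0, 100] to [0, 1]
local m = 60/100
local s = 26.141/100

* method of moments; requires s^2 < m*(1 - m)
local nu = `m'*(1 - `m')/`s'^2 - 1
local a  = `m'*`nu'
local b  = (1 - `m')*`nu'
display "a = " %6.4f `a' "   b = " %6.4f `b'

* draw from the beta and rescale back to [0, 100]
generate double coop_mixed = 100*rbeta(`a', `b')
summarize coop_mixed
```

With these targets the shape parameters come out at roughly 1.507 and 1.005. With only 97 observations the sample mean and SD will scatter around 60 and 26.141 rather than match them exactly.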
Wenhan Yan

Join Date: Oct 2020

Posts: 56
#5

27 Apr 2021, 16:05

The experiment in the paper is designed to test whether a mixed strategy leads people to cooperate more than a pure strategy does in a prisoner's dilemma setting. We want to simulate the data-generating process, so we treat the results from the experiment as the population and simulate data based on that "population". That's why we have a mean of 60 and an std of 26.141, with 97 observations and a range from 0 to 100. We also assume that we are drawing from a normal distribution, based on the central limit theorem.
Nick Cox

Join Date: Mar 2014

Posts: 35672
#6

27 Apr 2021, 16:31

What we tell you three times is true. A normal distribution with mean 60 and SD 26.141 has a 6.3% probability of exceeding 100 and a 1.1% probability of falling below 0. That's not negotiable.

Code:

. di %5.3f normal((100-60)/26.141)
0.937

. di %5.3f normal((0-60)/26.141)
0.011

The central limit theorem is neither here nor there. It implies that sums, or means, tend to normality under certain fairly wide conditions, but you're making an assumption about data, not means.

Naturally you can always fudge the deviant data points e.g. by folding them back, but then you will no longer have a normal distribution.
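One reading of "folding back" is reflection at the bounds, sketched below under that assumption: a value below 0 is mirrored at 0 and a value above 100 is mirrored at 100. The seed, the variable name `x` and the sample size are arbitrary, and a single reflection is only enough as long as no draw falls below -100 or above 200, which is extremely unlikely with these parameters.

```stata
clear all
set seed 2021                    // arbitrary seed, for reproducibility only
set obs 10000

generate double x = rnormal(60, 26.141)

* reflect out-of-range draws at the nearest bound
* (assumes no draw is below -100 or above 200)
replace x = -x      if x < 0
replace x = 200 - x if x > 100

* every value is now in [0, 100], but the distribution is no longer normal
summarize x, detail
```

The folded variable respects the range, but its shape, mean and SD all differ from those of the normal distribution it was drawn from.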
Dirk Enzmann

Join Date: Apr 2014

Posts: 529
#7

27 Apr 2021, 20:37

The hint in #4 mentioning the beta distribution could help you to understand the problem (to specify the shape parameters that you will need will require some deeper understanding). Did you have a look at the respective Wikipedia entry? That should give you an idea (e.g. a continuous probability distribution defined on the interval [0, 1]). However, if the goal is a power analysis to replicate a study, I can't see why it is necessary to restrict the interval to the same values as in the original study. But a power analysis where you restrict the number of cases to 97 with a fixed effect size (?) does not make much sense, anyway.

Last edited by Dirk Enzmann; 27 Apr 2021, 20:49.