Generate Random numbers to replicate a distribution.

Alfonso Russo

Join Date: Jun 2017

Posts: 11
#1

03 Aug 2017, 05:21

Hi everybody.
How can I generate random numbers in Stata in order to replicate a smooth density? I have several lognormal distributions, estimated only from knowledge of their distributional parameters, and I need to generate a histogram that reproduces each particular distribution. Note that my distributions are estimated from grouped data: I do not know any individual values of the variable itself, only aggregate statistics such as the Gini coefficient and mean income, from which I recovered the parameters. Since my distributions are lognormal, I should generate random normal values in Stata and then exponentiate them. Is that correct? Any additional comments or suggestions? Thanks to all.
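A minimal sketch of the approach described above, assuming hypothetical values for the Gini coefficient (G) and mean income (M). It uses two standard properties of the lognormal: the Gini coefficient equals 2Φ(σ/√2) - 1, and the mean equals exp(μ + σ²/2), where μ and σ are the mean and standard deviation of log income. Inverting these gives σ and μ; exponentiating normal draws with those parameters then yields lognormal draws whose histogram approximates the density.

```stata
* Sketch: recover lognormal parameters from a Gini coefficient and a mean,
* then draw a lognormal sample by exponentiating normal draws.
* The values of G and M below are hypothetical.
clear
set seed 12345
set obs 10000
scalar G = 0.35                              // hypothetical Gini coefficient
scalar M = 20000                             // hypothetical mean income
* Lognormal Gini: G = 2*normal(sigma/sqrt(2)) - 1, solved for sigma
scalar sigma = sqrt(2)*invnormal((scalar(G) + 1)/2)
* Lognormal mean: M = exp(mu + sigma^2/2), solved for mu
scalar mu = ln(scalar(M)) - scalar(sigma)^2/2
* exponentiated normal draws are lognormal draws
generate double y = exp(rnormal(scalar(mu), scalar(sigma)))
summarize y
histogram y, bin(50)
```

The sample mean of y should be close to M, and a larger number of observations gives a smoother histogram.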
Maarten Buis

Join Date: Mar 2014

Posts: 3460
#2

03 Aug 2017, 05:32

You could plot the distribution without creating random data, using twoway function.
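For example, a lognormal density can be drawn directly from its formula, using the fact that the lognormal density at x is the normal density of ln(x) divided by x. The parameter values below are hypothetical, and the `///` line continuation assumes the lines are run from a do-file.

```stata
* Sketch: plot a lognormal density with twoway function; no data needed.
* mu and sigma are hypothetical parameters of log income.
local mu = 9.7
local sigma = 0.64
* Lognormal density: normal density of ln(x), divided by x
twoway function y = normalden(ln(x), `mu', `sigma')/x, range(1 100000) ///
    xtitle("Income") ytitle("Density")
```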

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Alfonso Russo

Join Date: Jun 2017

Posts: 11
#3

03 Aug 2017, 05:41

Yes, that's right, but it is not the point. What I need to do is replicate the distributions in a "discrete" way in order to obtain frequencies, and then add them together.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#4

03 Aug 2017, 06:27

Alfonso:
-help simulate-?

Kind regards,
Carlo
(Stata 19.0)

Stephen Jenkins

Join Date: Apr 2014
Posts: 1436
#5

03 Aug 2017, 07:19

Alfonso, clearly there is some lack of clarity about precisely what you want to do. Another suggestion that might be relevant: have a look, for inspiration, at my ancient code for mkbilogn on SSC. Once it is installed, you can look at the code with viewsource mkbilogn.ado

Code:

TITLE
      'MKBILOGN': module to create bivariate lognormal variables

DESCRIPTION/AUTHOR(S)
     
      mkbilogn creates random variables, var1 and var2, drawn from a
      bivariate lognormal distribution defined as follows. As n --> oo,
      X1 (var1) and X2 (var2) are such that x1=log(X1) and x2=log(X2)
      are bivariate Normal distributed with mean(x1) = m1, mean(x2) =
      m2,  s.d.(x1) = s1, s.d.(x2) = s2, corr(x1,x2) = rho. The
      parameters of  the distribution can be optionally chosen by the
      user, or default  to the values specified above.
     
      Author: Stephen P. Jenkins, London School of Economics
      Support: email [email protected]
     
      Distribution-Date: 19990112
     

INSTALLATION FILES                             (type net install mkbilogn)
      mkbilogn.ado
      mkbilogn.hlp
---------------------------------------------------------------------------------------------------------------------------------
(type ssc install mkbilogn to install)

. ssc install mkbilogn, replace
checking mkbilogn consistency and verifying not already installed...
installing into d:\home\stephenj\ado\stbplus\...
installation complete.

. viewsource mkbilogn.ado

. set obs 10000
number of observations (_N) was 0, now 10,000

. mkbilogn z1 z2 // default settings (see help file for others)
Creating 2 r.v.s X1 X2  s.t. x1=log(X1), x2=log(X2) are bivariate
 Normal with mean(x1) = 0 ; mean(x2) = 0 ; s.d.(x1) = 1 ;
 s.d.(x2) = 1 ; corr(x1,x2) = .5

. su

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          z1 |     10,000    1.628046    2.048731   .0203935   38.68771
          z2 |     10,000    1.627714    2.344479   .0221462   118.4913

. corr z1 z2
(obs=10,000)

             |       z1       z2
-------------+------------------
          z1 |   1.0000
          z2 |   0.3640   1.0000


Joseph Coveney

Join Date: Apr 2014

Posts: 4426
#6

03 Aug 2017, 07:29

You could try fitting a Johnson distribution and generating a random variable from the fitted parameters. The user-written package jnsn, available from SSC, has both a fitting command (actually two of them, jnsn and jnsw, each using a different algorithm) and a command (ajv) that takes the parameters and generates the random variable from them.

Code:

search jnsn

for the link to the package on SSC, from where you can install it.
Alfonso Russo

Join Date: Jun 2017

Posts: 11
#7

03 Aug 2017, 08:07

I can provide you with more details. What I need to do is aggregate several lognormal distributions, each of which represents the income distribution of a certain country. My target is to obtain an estimate of the world distribution by integrating these country-level distributions. Of course, I only have aggregate data on income for each nation, but I can recover the distributional parameters and characterise the distribution. Once all the country-level distributions are estimated, I need to "sum" them to obtain the world distribution. My supervisor suggested creating country-level histograms that replicate each distribution, so that I can create bins of people (independently of their nationality) and get a measure of how income is distributed worldwide. Does that make more sense now? I need to find a way to construct a distribution that results from an "adding-up" process of several individual lognormals. Thanks to all.
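A rough sketch of the histogram approach, assuming three countries with hypothetical lognormal parameters and population shares stored in a matrix P: each country contributes a number of simulated people proportional to its population share, and the pooled draws approximate the world distribution.

```stata
* Sketch of the "histogram" approach: simulate people from each country's
* lognormal in proportion to its population share, then pool them.
* Three countries with hypothetical (mu, sigma, share) values are assumed.
clear
set seed 2017
local N = 100000                                  // total simulated people
* one row per country; columns: mu, sigma, population share (hypothetical)
matrix P = (9.5, 0.60, 0.4 \ 8.0, 0.80, 0.4 \ 10.5, 0.50, 0.2)
set obs `N'
generate byte country = .
generate double income = .
local start = 1
forvalues k = 1/3 {
    local n_k = round(el(P, `k', 3)*`N')          // people from country k
    local stop = `start' + `n_k' - 1
    replace country = `k' in `start'/`stop'
    replace income = exp(rnormal(el(P, `k', 1), el(P, `k', 2))) in `start'/`stop'
    local start = `stop' + 1
}
* world histogram: bins of people regardless of nationality
histogram income, bin(100)
```

The counts are rounded, so with other shares the country counts may not add up to N exactly, and the last country's count would then need adjusting. A larger N reduces the simulation noise in the histogram.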
Stephen Jenkins

Join Date: Apr 2014

Posts: 1436
#8

03 Aug 2017, 10:01

There is a substantial literature that takes this sort of approach (as you probably know, and it would have helped to have cited it initially; see the Forum FAQ). See, e.g., work by Chotikapanich and colleagues, and the references therein. (Example: "Global Income Distributions and Inequality, 1993 and 2000: Incorporating Country-Level Inequality Modeled with Beta Distributions" by Duangkamon Chotikapanich, William E. Griffiths, D. S. Prasada Rao, and Vicar Valencia, The Review of Economics and Statistics, February 2012, 94(1): 52–73.)

You also need to be much more specific about the nature of the data to hand: "aggregate data on income" is far too vague. It matters a lot for how these models are fitted. The choice of "global" data sets also matters a lot. (See, inter alia, the discussion in the special issue of the Journal of Economic Inequality, December 2015.)

The "adding-up" idea that you refer to appears to be a standard property: an aggregate density function is the population-weighted sum of the subgroup densities. (For aggregate, read "world"; for subgroup, read "country".) In your case, you wish to assume that the relevant density is the lognormal one.
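Written out under the lognormal assumption discussed here, with v_k denoting the population share of country k and μ_k, σ_k the mean and standard deviation of log income in country k (notation chosen for this sketch), the world density is:

```latex
f(y) = \sum_{k=1}^{K} v_k \, f_k(y), \qquad \sum_{k=1}^{K} v_k = 1,
\qquad
f_k(y) = \frac{1}{y \, \sigma_k \sqrt{2\pi}}
         \exp\!\left( -\frac{(\ln y - \mu_k)^2}{2\sigma_k^2} \right), \quad y > 0.
```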
Alfonso Russo

Join Date: Jun 2017

Posts: 11
#9

03 Aug 2017, 13:30

Thanks, Stephen and the others.
I know most of the papers you cited, but they lack detail on the "integration" procedure. I'll take a closer look. Does anybody know anything about the Fenton-Wilkinson approximation? It also seems to be a good strategy. I'll test its performance in the next few days.
Alfonso Russo

Join Date: Jun 2017

Posts: 11
#10

04 Aug 2017, 08:13

The approximation I was talking about does not work. When you say "the population-weighted sum of the subgroup densities", what do you mean? Or rather, in practice, how can I calculate the population-weighted densities and then add them up?
Stephen Jenkins

Join Date: Apr 2014

Posts: 1436
#11

04 Aug 2017, 08:44

From Jenkins/Van Kerm, "Accounting for income distribution trends: A density function decomposition approach", Journal of Economic Inequality (2005) 3: 43–61

With lognormal parameter estimates for each country (subgroup), you characterise the subgroup (country) densities (the density has a particular functional form), and you presumably have external information about the population shares of each country.
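In practice, one way to do this is to evaluate each country's share-weighted density on a grid of income values and sum the results; no random numbers are needed. A sketch, assuming three countries with hypothetical parameters stored in a matrix P (columns μ_k, σ_k, v_k). The integ call is only a sanity check that the summed density integrates to about one over the chosen grid.

```stata
* Sketch: world density as the population-weighted sum of country densities,
* evaluated on a grid of income values. Three countries with hypothetical
* parameters are assumed; columns of P are mu_k, sigma_k and share v_k.
clear
range income 1 200000 20000                       // grid of 20,000 income values
matrix P = (9.5, 0.60, 0.4 \ 8.0, 0.80, 0.4 \ 10.5, 0.50, 0.2)
generate double f_world = 0
forvalues k = 1/3 {
    * share-weighted lognormal density of country k
    generate double f`k' = el(P, `k', 3) * ///
        normalden(ln(income), el(P, `k', 1), el(P, `k', 2)) / income
    replace f_world = f_world + f`k'
}
integ f_world income                              // area under curve, close to 1
line f_world f1 f2 f3 income, sort
```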
Alfonso Russo

Join Date: Jun 2017

Posts: 11
#12

04 Aug 2017, 10:50

I'll give an example so that someone can tell me if I am on the right track. Imagine I have 3 densities estimated for 3 countries with different populations, and I want to calculate the aggregate density function. Each of the 3 distributions has its parameters mu1, mu2, mu3 and s1, s2, s3. By external information on population shares, you mean, for instance, that countries 1 and 2 each account for 0.4 of the total population in the sample, and country 3 for 0.2. The mean parameter of the aggregate density (muX) can then be calculated as muX = mu1(0.4) + mu2(0.4) + mu3(0.2), and likewise for the variance. Is that correct? I am afraid it is not. The point is that I have estimated these individual densities only from knowledge of the two distributional parameters (PPP-adjusted GDP per capita values are used to anchor the distribution, and the Gini coefficient is inverted to recover the variance). For each country's distribution, I do not have any additional information apart from these 2 parameters. That's why I am a bit confused about how to produce an aggregate density.
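A quick numerical check of the proposal above, using hypothetical values for three countries: it computes the world mean implied by the population-weighted mixture of lognormals and compares it with the mean of a single lognormal whose parameters are share-weighted averages of the country μ's and σ²'s. With these illustrative values the two means differ substantially.

```stata
* Sketch: mixture mean versus the mean of a lognormal with averaged parameters.
* The three countries' values are hypothetical; columns are mu, sigma, share.
clear
matrix P = (9.5, 0.60, 0.4 \ 8.0, 0.80, 0.4 \ 10.5, 0.50, 0.2)
scalar mean_mix = 0                 // E[Y] of the mixture
scalar ey2_mix = 0                  // E[Y^2] of the mixture
scalar mu_naive = 0                 // share-weighted average of mu_k
scalar s2_naive = 0                 // share-weighted average of sigma_k^2
forvalues k = 1/3 {
    local m = el(P, `k', 1)
    local s = el(P, `k', 2)
    local v = el(P, `k', 3)
    scalar mean_mix = scalar(mean_mix) + `v'*exp(`m' + `s'^2/2)
    scalar ey2_mix = scalar(ey2_mix) + `v'*exp(2*`m' + 2*`s'^2)
    scalar mu_naive = scalar(mu_naive) + `v'*`m'
    scalar s2_naive = scalar(s2_naive) + `v'*`s'^2
}
scalar var_mix = scalar(ey2_mix) - scalar(mean_mix)^2
scalar mean_naive = exp(scalar(mu_naive) + scalar(s2_naive)/2)
display "mixture mean = " %9.0f scalar(mean_mix) "   naive mean = " %9.0f scalar(mean_naive)
```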
Stephen Jenkins

Join Date: Apr 2014

Posts: 1436
#13

04 Aug 2017, 12:29

Forget data and estimation issues for a moment. If you have estimates of the 2 lognormal distribution parameters for each country, then by construction, you can calculate what each country's density is, and also other statistics for each country such as moments like the mean and variance. If you also have estimates of the population shares (each country's population size as a fraction of the world population), you can calculate an aggregate (world) density using the expression in post #11. In post #12, you assume that the aggregate statistic (mean; variance) can be calculated in the same way as for the density. This is not necessarily correct in general.
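To make this concrete for the lognormal case (notation assumed for this sketch: μ_k and σ_k for log income in country k, v_k for its population share), the world moments are share-weighted averages of the country moments, not functions of averaged parameters:

```latex
E[Y] = \sum_{k=1}^{K} v_k \, e^{\mu_k + \sigma_k^2/2},
\qquad
E[Y^2] = \sum_{k=1}^{K} v_k \, e^{2\mu_k + 2\sigma_k^2},
\qquad
\operatorname{Var}[Y] = E[Y^2] - \left(E[Y]\right)^2 .
```

In general the mixture is not itself lognormal, so no single pair of parameters describes it exactly.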
I will now also reiterate my earlier post's point that data and estimation issues are really important too.

Please may I politely suggest that at this stage renewed reading of some canonical studies in the field is likely to pay greater dividends for your research than further posts.