  • Generate Random numbers to replicate a distribution.

    Hi everybody.
    How can I generate random numbers in Stata to replicate a smooth density? I have several lognormal distributions, each estimated only from its distributional parameters, and I need to generate a histogram that reproduces each distribution. Note that the distributions are estimated from grouped data: I do not observe any value of the variable itself, only aggregate statistics such as the Gini coefficient and mean income, from which I recovered the parameters. Since my distributions are lognormal, I should generate random normal values in Stata and then exponentiate them. Is that correct? Any additional comments or suggestions? Thanks to all.
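
A minimal sketch of the approach described in this post: draw normal values and exponentiate them. The parameter values, seed, and variable names are hypothetical and chosen only for illustration.

```stata
* Sketch: draw from a lognormal by exponentiating normal draws.
* mu and sigma are hypothetical log-scale parameters.
clear
set seed 20170804
set obs 100000
scalar mu    = 9       // mean of log income (assumed)
scalar sigma = 0.7     // s.d. of log income (assumed)
generate double income = exp(rnormal(mu, sigma))

* Check: the sample mean should be close to exp(mu + sigma^2/2)
summarize income
display "theoretical mean = " exp(mu + sigma^2/2)

* Histogram on the log scale, where the shape should look normal
generate double lninc = ln(income)
histogram lninc, normal
```

Plotting on the log scale avoids the long right tail dominating the histogram; plotting `income` itself over a restricted range is an alternative.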

  • #2
    You could plot the distribution without creating random data using twoway function.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
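
The suggestion above might be sketched as follows, writing the lognormal density as the normal density of ln(x) divided by x. The parameter values and plotting range are hypothetical.

```stata
* Sketch: plot a lognormal density directly, with no simulated data.
* The parameter values are hypothetical.
local mu    = 0
local sigma = 1
twoway function y = normalden(ln(x), `mu', `sigma') / x, range(0.01 10) ///
    ytitle("Density") xtitle("x")
```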


    • #3
      Yes, that's right, but it is not the point. What I need to do is replicate the distributions in a "discrete" way in order to obtain frequencies and then add them together.
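
One possible way to obtain such "discrete" frequencies without random numbers is to take differences of the lognormal distribution function at the bin edges. This is only a sketch under assumed values: the parameters, population size, and bin width are all hypothetical.

```stata
* Sketch: expected frequencies of a lognormal in fixed income bins,
* computed from the distribution function. All values are hypothetical.
clear
set obs 50
scalar mu    = 9
scalar sigma = 0.7
scalar pop   = 1000000            // country population (assumed)
scalar width = 2000               // bin width (assumed)

generate double lower = (_n - 1) * width
generate double upper = _n * width

* P(lower < X <= upper) = Phi((ln upper - mu)/sigma) - Phi((ln lower - mu)/sigma)
* The first bin starts at 0, where the distribution function is 0.
generate double p = normal((ln(upper) - mu)/sigma) ///
    - cond(lower > 0, normal((ln(lower) - mu)/sigma), 0)
generate double freq = pop * p

list lower upper freq in 1/5
```

If every country uses the same bin edges, the `freq` columns can be added across countries bin by bin.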


      • #4
        Alfonso:
        -help simulate-?
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Alfonso, there is clearly some lack of clarity about precisely what you want to do. Another suggestion that might be relevant: have a look, for inspiration, at my ancient code for mkbilogn on SSC. Once it is installed, you can look at the code with viewsource mkbilogn.ado

          Code:
          TITLE
                'MKBILOGN': module to create bivariate lognormal variables
          
          DESCRIPTION/AUTHOR(S)
               
                mkbilogn creates random variables, var1 and var2, drawn from a
                bivariate lognormal distribution defined as follows. As n --> oo,
                X1 (var1) and X2 (var2) are such that x1=log(X1) and x2=log(X2)
                are bivariate Normal distributed with mean(x1) = m1, mean(x2) =
                m2,  s.d.(x1) = s1, s.d.(x2) = s2, corr(x1,x2) = rho. The
                parameters of  the distribution can be optionally chosen by the
                user, or default  to the values specified above.
               
                Author: Stephen P. Jenkins, London School of Economics
                Support: email [email protected]
               
                Distribution-Date: 19990112
               
          
          INSTALLATION FILES                             (type net install mkbilogn)
                mkbilogn.ado
                mkbilogn.hlp
          ---------------------------------------------------------------------------------------------------------------------------------
          (type ssc install mkbilogn to install)
          
          . ssc install mkbilogn, replace
          checking mkbilogn consistency and verifying not already installed...
          installing into d:\home\stephenj\ado\stbplus\...
          installation complete.
          
          . viewsource mkbilogn.ado
          
          . set obs 10000
          number of observations (_N) was 0, now 10,000
          
          . mkbilogn z1 z2 // default settings (see help file for others)
          Creating 2 r.v.s X1 X2  s.t. x1=log(X1), x2=log(X2) are bivariate
           Normal with mean(x1) = 0 ; mean(x2) = 0 ; s.d.(x1) = 1 ;
           s.d.(x2) = 1 ; corr(x1,x2) = .5
          
          . su
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                    z1 |     10,000    1.628046    2.048731   .0203935   38.68771
                    z2 |     10,000    1.627714    2.344479   .0221462   118.4913
          
          . corr z1 z2
          (obs=10,000)
          
                       |       z1       z2
          -------------+------------------
                    z1 |   1.0000
                    z2 |   0.3640   1.0000


          • #6
            You could try fitting a Johnson distribution and generating a random variable from the fitted parameters. The user-written package jnsn, available from SSC, has both a fitting command (actually two of them, jnsn and jnsw, each using a different algorithm) and a command (ajv) that takes the fitted parameters and generates the random variable.
            Code:
            search jnsn
            for the link to the package on SSC from where to install it.


            • #7
              I can provide more details. I need to aggregate several lognormal distributions, each of which represents the income distribution of a particular country. My aim is to obtain an estimate of the world distribution by integrating these country-level distributions. Of course, I only have aggregate income data for each country, but I can recover the distributional parameters and characterise the distribution. Once all the country distributions are estimated, I need to "sum" them to obtain the world distribution. My supervisor suggested creating country-level histograms that replicate each distribution, so that I can create bins of people (regardless of nationality) and get a measure of how income is distributed worldwide. Does this make more sense now? I need to find a way to construct a distribution that results from "adding up" several individual lognormals. Thanks to all.


              • #8
                There is a substantial literature that takes this sort of approach (as you probably know; it would have helped to have cited this initially, see the Forum FAQ). See e.g. work by Chotikapanich and colleagues, and the references therein. Example: "GLOBAL INCOME DISTRIBUTIONS AND INEQUALITY, 1993 AND 2000: INCORPORATING COUNTRY-LEVEL INEQUALITY MODELED WITH BETA DISTRIBUTIONS" by Duangkamon Chotikapanich, William E. Griffiths, D. S. Prasada Rao, and Vicar Valencia, The Review of Economics and Statistics, February 2012, 94(1): 52–73.

                You also need to be much more specific about the nature of the data to hand: "aggregate data on income" is far too vague. It matters a lot for how these models are fitted. The choice of "global" data sets also matters a lot. (See, inter alia, the discussion in the special issue of the Journal of Economic Inequality, December 2015.)

                The "adding-up" idea that you refer to appears to be a standard property: an aggregate density function is the population-weighted sum of the subgroup densities. (For aggregate, read "world"; for subgroup, read "country") In your case, you wish to assume that the relevant density is the lognormal one.
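
Written out, the property stated above might look as follows. The notation is an assumption made for illustration: $w_k$ is the population share of country $k$, and $\mu_k$, $\sigma_k$ are its lognormal parameters on the log scale.

```latex
f(x) = \sum_{k=1}^{K} w_k \, f_k(x), \qquad \sum_{k=1}^{K} w_k = 1,
\qquad
f_k(x) = \frac{1}{x \, \sigma_k \sqrt{2\pi}}
         \exp\!\left( -\frac{(\ln x - \mu_k)^2}{2\sigma_k^2} \right), \quad x > 0 .
```

The resulting $f(x)$ is a finite mixture of lognormals, which is not in general itself lognormal.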


                • #9
                  Thanks Stephen and the others.
                  I know most of the papers you cited, but they lack detail on the "integration" procedure. I'll take a closer look. Does anybody know anything about the Fenton-Wilkinson approximation? It also seems to be a good strategy. I'll test its performance in the next few days.


                  • #10
                    The approximation that I was talking about does not work. When you say "the population-weighted sum of the subgroup densities", what do you mean? Or better, in practice, how can I calculate the population-weighted densities and then add them together?


                    • #11
                      From Jenkins/Van Kerm, "Accounting for income distribution trends: A density function decomposition approach", Journal of Economic Inequality (2005) 3: 43–61


                      [Image: 2017-08-04_1540.png (expression from the cited paper)]


                      With lognormal parameter estimates for each country (subgroup), you characterise the subgroup (country) densities (the density has a particular functional form), and you presumably have external information about the population shares of each country.
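
As a rough sketch of that calculation, the country densities can be evaluated on a common income grid and combined with the population shares. The three sets of parameters, the shares, and the grid are hypothetical.

```stata
* Sketch: world density as the population-weighted sum of three lognormal
* country densities, evaluated on a grid. All parameter values and shares
* are hypothetical.
clear
set obs 1000
generate double x = _n * 100          // income grid: 100, 200, ..., 100000

* country densities f_k(x) = normalden(ln(x), mu_k, sigma_k) / x
generate double f1 = normalden(ln(x),  8.0, 0.6) / x
generate double f2 = normalden(ln(x),  9.0, 0.7) / x
generate double f3 = normalden(ln(x), 10.0, 0.5) / x

* population shares (assumed) must sum to one
generate double fworld = 0.4*f1 + 0.4*f2 + 0.2*f3

line f1 f2 f3 fworld x if x <= 60000, ///
    legend(order(1 "Country 1" 2 "Country 2" 3 "Country 3" 4 "World"))
```

A quick sanity check is that the world density integrates to roughly one over the grid, for example by summing `fworld` times the grid step.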



                      • #12
                        I'll give an example so that someone can say whether I am on the right track. Imagine I have 3 densities estimated for 3 countries with different populations and I want to calculate the aggregate density function. Each of the 3 distributions has its own parameters: mu1, mu2, mu3 and s1, s2, s3. By external information on population shares, I take you to mean something like this: countries 1 and 2 each account for 0.4 of the total population in the sample, and country 3 for 0.2. The aggregate density mean parameter (muX) could then be calculated as muX = mu1(0.4) + mu2(0.4) + mu3(0.2), and likewise for the variance. Is this correct? I am afraid that it is not. The point is that I have estimated these individual densities only from knowledge of the two distributional parameters (PPP-adjusted GDP per capita values are used to anchor the distribution and the Gini coefficient is inverted to recover the variance). For each country's distribution I have no information apart from these 2 parameters. That's why I am a bit confused about how to produce an aggregate density.
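
For reference, the parameter recovery mentioned in this post can be sketched with the standard lognormal relationships Gini = 2*Phi(sigma/sqrt(2)) - 1 and mean = exp(mu + sigma^2/2). Whether this matches the poster's exact procedure is an assumption, and the input values are hypothetical.

```stata
* Sketch: recover lognormal parameters from a Gini coefficient and a mean.
* The input values are hypothetical.
scalar gini  = 0.40
scalar meany = 12000

* For a lognormal, Gini = 2*Phi(sigma/sqrt(2)) - 1
scalar sigma = sqrt(2) * invnormal((gini + 1)/2)
* For a lognormal, mean = exp(mu + sigma^2/2)
scalar mu    = ln(meany) - sigma^2/2

display "sigma = " sigma "   mu = " mu
```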


                        • #13
                          Forget data and estimation issues for a moment. If you have estimates of the 2 lognormal distribution parameters for each country, then by construction, you can calculate what each country's density is, and also other statistics for each country such as moments like the mean and variance. If you also have estimates of the population shares (each country's population size as a fraction of the world population), you can calculate an aggregate (world) density using the expression in post #11. In post #12, you assume that the aggregate statistic (mean; variance) can be calculated in the same way as for the density. This is not necessarily correct in general.
                          I will now also reiterate my earlier post's point that data and estimation issues are really important too.
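
To illustrate the distinction, here is a sketch of how the moments of the aggregate relate to the country moments when the aggregate density is a population-weighted sum. The notation is assumed for illustration: $w_k$ are population shares, and $m_k$, $v_k$ are the mean and variance of income in country $k$.

```latex
m_k = \exp\!\left(\mu_k + \tfrac{1}{2}\sigma_k^2\right), \qquad
v_k = \left(e^{\sigma_k^2} - 1\right)\exp\!\left(2\mu_k + \sigma_k^2\right),

m = \sum_{k} w_k \, m_k, \qquad
v = \sum_{k} w_k \, v_k \;+\; \sum_{k} w_k \,(m_k - m)^2 .
```

The aggregate mean is the share-weighted average of the country means, but the aggregate variance also includes a between-country term. The log-scale parameters $\mu_k$ and $\sigma_k$ do not average in this way, because the aggregate is in general not lognormal.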

                          Please may I politely suggest that at this stage renewed reading of some canonical studies in the field is likely to pay greater dividends for your research than further posts.
