Calculating first four central moments of a discrete prior distribution (efficient way)

Nicholas Lacourse

Join Date: Dec 2019
Posts: 26

Calculating first four central moments of a discrete prior distribution (efficient way)

02 May 2021, 09:30

Hello,

I have a set of variables in which combined produces a prior belief distribution over possible outcomes. See below for a snippet:

Code:

input str24 id float(p_score_0 p_score_1 p_score_2 p_score_3 p_score_4 p_score_5 p_score_6 p_score_7 p_score_8 p_score_9 p_score_10 p_score_11 p_score_12 p_score_13 p_score_14 p_score_15 p_score_16 p_score_17 p_score_18 p_score_19 p_score_20)
"5f3ecc42701f7b169ba22ff9"   0   0   0   0   0   0   0   0 .01 .02 .06 .11 .14 .19  .2 .18 .08 .01   0   0   0
"57bdb5eb467f26000125db79" .01 .02 .02 .02 .03 .04 .04 .06 .08  .1  .1  .1  .1 .08 .06 .04 .03 .02 .02 .02 .01
"5d2a068fd7d4940019e63136"   0   0   0   0   0   0   0  .4  .3   0  .3   0   0   0   0   0   0   0   0   0   0
"5acba4a15cd10500016280ca"   0   0   0   0   0   0   0   0   0   0   0   0   0   0 .14 .33 .23 .22 .06 .02   0
"5ff76d52a4cb3f434e49eabe"   0   0   0   0   0   0   0   0  .1  .5  .1  .1 .07 .08 .05   0   0   0   0   0   0
"5cb5464a6f451e00012da7cf"   0   0   0   0   0   0   0   0   0 .06  .5 .25 .13 .06   0   0   0   0   0   0   0
"5c1a9424a329230001ecaabd"   0   0   0   0   0   0   0   0   0   0   0   0 .02 .03 .05 .05  .1 .15  .4 .15 .05
"5e7e89cc4fd114104a39fb88"   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 .15  .8 .05   0   0   0
"6057b1a36c955bab2ad1d5df"   0   0   0   0   0   0   0   0   0   0   0   0   0 .05 .75  .1  .1   0   0   0   0
"56fdff4ee0f9ff000f19c466"   0   0   0   0   0   0   0  .1  .1  .1  .5  .1  .1   0   0   0   0   0   0   0   0
"5fde840829db956469ee680f"   0   0   0   0   0   0   0   0   0   0 .05 .06 .08  .3 .25  .2 .04 .02   0   0   0
"59aee2c57f6c84000151bd9b"   0   0   0   0   0   0   0   0   0   0   0   0   0 .05 .05 .05 .75 .05 .05   0   0
"5eaf02446c144d5e3a4d4562"   0   0   0   0   0   0   0   0   0   0 .05 .15  .6  .1  .1   0   0   0   0   0   0
"60159647df028869091a0d3d"  .6  .1 .06 .03 .01  .2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
"5f4ff67ab2c26c14b484ab03"   0   0   0   0   0   0   0   0   0   0  .1  .1  .5  .1  .1  .1   0   0   0   0   0
"5e5c03ab6a255b3ece760715"   0   0   0   0   0   0   0   0   0   0   0   0  .1  .2  .6  .1   0   0   0   0   0
"5c6d8aa9701e050001338a3e"   0   0   0   0 .02 .02 .04 .05 .12 .15  .3  .2  .1   0   0   0   0   0   0   0   0
"5e8f53644b34ff22d9b3a1c5"   0   0   0   0   0   0   0   0   0   0   0  .2  .2 .05 .05  .5   0   0   0   0   0
"5c2fd11610677f0001dd7efc"   0   0   0   0   0   0   0   0   0 .01 .02 .02 .85 .02 .02 .02 .02 .02   0   0   0
"5ec712b9f6899e15e2ebdc6d"   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 .03 .85  .1 .02

The values of each variable represents the probability they think that variable's outcome will occur where the outcome is the numerical value prescribed at the end of the variable names. I am wanting to produce the first four moments of each individuals prior distribution.

I was able to hard code and produce each of the moments, but it was extremely tedious. Is there a more elegant solution using loops? Right now the data is wide format, I guess it could help by reshaping it long and make the j correspond to the outcome value? Still am lost on the most effective way to then compute the moments.

Any help would be greatly appreciated.

Nicholas

Tags: data, loop

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

02 May 2021, 10:10

Thank you for providing a sample of your data. However from there on you start to speak in your own language (I do not know what it is to "hard code" and the implied opposite to "soft code") so it is not clear what is your problem, as you say that you know how to do it.

You might want to post the code that you have used so far, and to say how you want to have it improved with reference to what you have now.
Comment

Nicholas Lacourse

Join Date: Dec 2019
Posts: 26

02 May 2021, 11:25

Joro,

You are right the language is unclear. What I meant was that I wrote out every expression in creating the moments I did not loop anything at all as you will see below:

Code:

gen prior_mean = 0*p_score_0 + 1*p_score_1 + 2*p_score_2 + 3*p_score_3 + 4*p_score_4 + 5*p_score_5 + 6*p_score_6 + 7*p_score_7 + 8*p_score_8 + 9*p_score_9 + 10*p_score_10 + 11*p_score_11 + 12*p_score_12 + 13*p_score_13 + 14*p_score_14 + 15*p_score_15 + 16*p_score_16 + 17*p_score_17 + 18*p_score_18 + 19*p_score_19 + 20*p_score_20

gen prior_var =  ((0^2)*p_score_0 + (1^2)*p_score_1 + (2^2)*p_score_2 + (3^2)*p_score_3 + (4^2)*p_score_4 + (5^2)*p_score_5 + (6^2)*p_score_6 + (7^2)*p_score_7 + (8^2)*p_score_8 + (9^2)*p_score_9 + (10^2)*p_score_10 + (11^2)*p_score_11 + (12^2)*p_score_12 + (13^2)*p_score_13 + (14^2)*p_score_14 + (15^2)*p_score_15 + (16^2)*p_score_16 + (17^2)*p_score_17 + (18^2)*p_score_18 + (19^2)*p_score_19 + (20^2)*p_score_20) - (prior_mean^2)

gen prior_sd = sqrt(prior_var)

Skewness and Kurtosis I left out for now as I am still unsure how I want to compute them due to having a low count outcome variable. Please advise if you would like still.

Is there anyway to not have to write out every expression in producing at least the first two moments?

Comment

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

02 May 2021, 11:58

Lets just go one more time as to what you want to do before I start typing my soul out.

You want to produce row moments, that is, moments of the 21 variables that you have in your data.

E.g., if you want to calculate the row mean, for each row of your data (each individual if I got you right) you want Sum_i=0²⁰ (p_score_i)/21 ?
Comment
Nicholas Lacourse

Join Date: Dec 2019

Posts: 26
#5

02 May 2021, 12:27

Yes, that is correct Joro.

But just to clarify the values in each variable are in fact the probabilities of each possible state of the world. So, it is not as simple as something like:

Code:

egen prior_mean = rowmean(p_score0-p_score_20)

The above may or may not be psuedo code.

Cheers,

Nicholas
Comment

Joro Kolev

Join Date: Aug 2018
Posts: 3050

02 May 2021, 12:52

OK, so you insist that what you have done in #3 is correct, and you want this calculation.

I think we can do it like this.

1) We reshape to long, it will be a pain to deal in wide with this calculation.

Code:

. reshape long p_score_, i(id) j(state)
(note: j = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       20   ->     420
Number of variables                  22   ->       3
j variable (21 values)                    ->   state
xij variables:
     p_score_0 p_score_1 ... p_score_20   ->   p_score_
-----------------------------------------------------------------------------

2) We calculate the mean

Code:

. egen mean = total(state*p), by(id)

3) We calculate the other 3 moments

Code:

. forvalues i=2/4 {
  2. egen moment`i' = total((state-mean)^`i'*p), by(id)
  3. }

You might want to calculate some other stuff here, I saw you calculated the standard deviation. It would be

Code:

. gen sd = sqrt(moment2)

4) We reshape back to wide

Code:

. reshape wide
(note: j = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      420   ->      20
Number of variables                   8   ->      27
j variable (21 values)            state   ->   (dropped)
xij variables:
                               p_score_   ->   p_score_0 p_score_1 ... p_score_20
-----------------------------------------------------------------------------

. list id mean moment2 moment3 moment4 sd

     +------------------------------------------------------------------------------+
     |                       id    mean   moment2     moment3    moment4         sd |
     |------------------------------------------------------------------------------|
  1. | 56fdff4ee0f9ff000f19c466     9.7      1.81   -1.043999     9.2617   1.345362 |
  2. | 57bdb5eb467f26000125db79   10.18   17.6676   -7.566352   892.1121   4.203284 |
  3. | 59aee2c57f6c84000151bd9b   15.85     .9275   -.9292511   5.066733    .963068 |
  4. | 5acba4a15cd10500016280ca   15.79    1.4859    .7353781   5.592604   1.218975 |
  5. | 5c1a9424a329230001ecaabd   17.28    3.1016   -5.947302   37.17585   1.761136 |
     |------------------------------------------------------------------------------|
  6. | 5c2fd11610677f0001dd7efc   12.21    1.2459    3.255822    17.6635   1.116199 |
  7. | 5c6d8aa9701e050001338a3e    9.48    3.2696   -5.598821   39.55676   1.808203 |
  8. | 5cb5464a6f451e00012da7cf   10.63     .9731    .7607936   2.857929   .9864583 |
  9. | 5d2a068fd7d4940019e63136     8.2      1.56    1.056001   3.979201      1.249 |
 10. | 5e5c03ab6a255b3ece760715    13.7       .61   -.3240014   1.173701    .781025 |
     |------------------------------------------------------------------------------|
 11. | 5e7e89cc4fd114104a39fb88    15.9       .19   -.0420003   .1717001   .4358899 |
 12. | 5e8f53644b34ff22d9b3a1c5   13.45    2.8475   -1.685248   10.98273   1.687454 |
 13. | 5eaf02446c144d5e3a4d4562   12.05     .8475    .2227495   2.592731   .9205977 |
 14. | 5ec712b9f6899e15e2ebdc6d   18.11     .1979    .1633616   .3636064   .4448595 |
 15. | 5f3ecc42701f7b169ba22ff9   13.17    3.4611   -2.500075   31.34906   1.860403 |
     |------------------------------------------------------------------------------|
 16. | 5f4ff67ab2c26c14b484ab03    12.3      1.81    1.043999     9.2617   1.345362 |
 17. | 5fde840829db956469ee680f    13.5      2.25        -1.2    15.8625        1.5 |
 18. | 5ff76d52a4cb3f434e49eabe    9.98    2.7996     4.88798   22.98431     1.6732 |
 19. | 60159647df028869091a0d3d    1.35    3.9475     8.58225   38.21843   1.986832 |
 20. | 6057b1a36c955bab2ad1d5df   14.25     .4875      .46875   1.094531    .698212 |
     +------------------------------------------------------------------------------+

.

Originally posted by Nicholas Lacourse View Post

Yes, that is correct Joro.

But just to clarify the values in each variable are in fact the probabilities of each possible state of the world. So, it is not as simple as something like:

Code:

egen prior_mean = rowmean(p_score0-p_score_20)

The above may or may not be psuedo code.

Cheers,

Nicholas

Comment

Nicholas Lacourse

Join Date: Dec 2019

Posts: 26
#7

02 May 2021, 13:25

Ah I see. Thank you for this! My intuition behind how to do it was correct, I just poorly executed it. So now that I have the moment calculations I can produce the skewness and kurtosis?

So skewness I am guessing would look something like this:

Code:

gen prior_skew = moment3*moment2^(-3/2)

and kurtosis:

Code:

gen prior_kurt = moment4*moment2^(-2) gen prior_kurt_excess = prior_kurt - 3
Comment
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

02 May 2021, 13:35

Yes, these are the correct formulas for skewness and kurtosis.

Originally posted by Nicholas Lacourse View Post

Ah I see. Thank you for this! My intuition behind how to do it was correct, I just poorly executed it. So now that I have the moment calculations I can produce the skewness and kurtosis?

So skewness I am guessing would look something like this:

Code:

gen prior_skew = moment3*moment2^(-3/2)

and kurtosis:

Code:

gen prior_kurt = moment4*moment2^(-2) gen prior_kurt_excess = prior_kurt - 3
Comment

Announcement

Calculating first four central moments of a discrete prior distribution (efficient way)

Comment

Comment

Comment

Comment

Comment

Comment

Comment