Hello,
I have data on investments spanning the years 2014-2017. Each investment is uniquely identified by the variable INVESTMENT_ID. To find the total amount invested by each investor in each year (excluding those whose investments were 0), I used the following code, in which AMOUNT is the amount of a single investment:
Code:
by investor year, sort: egen total = total(AMOUNT)
The output looks like this:
investor  INVESTMENT_ID  year  AMOUNT  total
2046164   55432          2014  600797  1153083
2046164   99171          2014  552286  1153083
To find the average total amount invested (excluding those whose investments in any given year were 0), I ran the following code:
Code:
bysort investor total: keep if _n==1
drop if total==0
egen avg_inv = mean(total)
tabulate avg_inv
This yielded an average amount invested per year of $778,141. I am interested in breaking this down further at the 25th and 75th percentiles. However, when I ran the following code:
Code:
sort total
collapse total (p25) p25=total (p75) p75=total (p90) p90=total
the output indicated that the 75th percentile was $250,000, well below the $778,141 mean.
I have a hunch that this is related to how I have sorted the data, but I have not been able to confirm this. Thank you for reading.
Edit: To clarify, the collapse command produces the same results every time, which makes me further doubt that it is a sorting error.
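One possibility worth ruling out is that the sort has nothing to do with it: in a right-skewed distribution, a few very large totals can pull the mean above the 75th percentile. The sketch below uses made-up numbers (hypothetical toy data, not the figures from this post) to show the pattern, and uses summarize with the detail option as a cross-check that does not depend on sort order.

```stata
* Hypothetical toy data: nine modest yearly totals and one very large one
clear
input total
50000
100000
100000
150000
200000
200000
250000
250000
300000
6000000
end

* Mean of these ten values is 760,000, driven almost entirely by the last one
summarize total, detail

* Percentiles only, as a second check on the collapse results
centile total, centile(25 75 90)
```

If the same summarize and centile commands on the real data (after keeping one row per investor-year) agree with the collapse output, the gap between the mean and the 75th percentile would likely reflect skew rather than a sorting problem.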