
  • Organizing transformed variables into deciles

    Hello Stata users,

    I have run into a bit of a conundrum that I was hoping to get assistance with. I have a dataset with a number of variables that required transformations for normality. The issue is that I want to organize the output of these variables by decile.

    Specifically, my dataset has to do with passing a test. I have the data organized by district so that all of the schools in a given district share the same values for my IVs. So, for example, all of the schools in Stata district 1 have 33% African American students. I'm trying to capture the distributions of all of the schools, for each IV, within a decile. So, for example, Stata district 1 (33%) and Stata district 2, with 38% African American students, would be placed into the grouping category for districts with 30-40% African American students.

    I was able to use egen AfAmer = cut(AfAmer), at(0 10 20 30 40 50 60 70 80 90 100) label, but because of the transformation the output was incorrect (everything was 0-10%). I have also tried egen AfAmer = cut(AfAmer), group(10) and xtile q_AfAmer = AfAmer, nquantiles(10).

    However, the tabulation came out rather equalized, which was not consistent with the tabulation of the raw data. For example, within my (raw) dataset the majority of schools either have few African American students (<20%) or a large concentration (>60%), whereas in the tabulation of the transformed and cut data most deciles have a roughly equal number of school districts represented. How can I produce a distribution, by decile, that is consistent with the distribution found in the raw data? I want all of the school districts within the 0-10% range of transformed values for African American students to be placed into the category for 0-10%.

    Please excuse me if this is unclear; I will be happy to clarify. Thanks for your help! I am using Stata 15.1.
    -Antonio




    Last edited by Antonio Montalvo; 14 Nov 2018, 21:40.

  • #2
    You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    First, most estimators do not assume normality - they may assume normality of the errors for hypothesis tests, but that is different from normality of the variables. Of course, things will look very different if you transform variables, although I would have thought percentiles would still include precisely the same observations (assuming you're not artificially creating missing data by, for example, taking logs of variables where 0 is a legitimate value). Why not work with the raw data to form the groups and then do your transformations?
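
    A minimal sketch of that order of operations, assuming the raw percentage variable is named AfAmer as in #1 and using ln() only as a stand-in for whatever transformation you actually applied:

    Code:
    * Sketch: categorize on the raw percentages first, then transform.
    * at(0(10)100 101) adds a final cutpoint so that a value of exactly
    * 100 is not set to missing; label attaches labels like 30-
    egen AfAmer_cat = cut(AfAmer), at(0(10)100 101) label

    * Transform afterwards; AfAmer_cat still reflects the raw distribution.
    * ln(AfAmer + 1) is only a placeholder for your actual transformation.
    gen AfAmer_tr = ln(AfAmer + 1)
    tabulate AfAmer_cat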

    By definition, you should have almost exactly the same number of observations in each decile - that is what a decile means. I suspect I am not understanding your problem correctly. If you want to take the range of a variable and divide that range into 10 pieces, that is not what we normally call a decile. If you want to divide the range of a variable into ten pieces, there is probably an efficient way to do it, but you can do it by generating the maximum and minimum (summarize will do this if you don't want to vary these across observations in the sample; run return list after summarize and you'll see r(min) and r(max) hold these values). Alternatively, egen does this and will let you compute the max and min by group. Then just do arithmetic to determine your category ranges. By the way, it is seldom a good idea to take a continuous variable and make it into categories - you're throwing away information.
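
    To make that arithmetic concrete, here is one possible sketch, again assuming a raw percentage variable AfAmer; it splits the observed min-max range into ten equal-width pieces:

    Code:
    * Sketch: divide the observed range of AfAmer into 10 equal-width bins.
    summarize AfAmer
    local lo = r(min)
    local hi = r(max)
    local width = (`hi' - `lo') / 10

    * Bin 1 = [lo, lo + width), ..., bin 10 = [lo + 9*width, hi]
    gen range_cat = 1 + floor((AfAmer - `lo') / `width')
    replace range_cat = 10 if range_cat == 11   // the maximum lands in the top bin
    tabulate range_cat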



    • #3
      It sounds like, as Phil mentioned, you don't want deciles, but rather a dummy variable for each 10-percentage-point range (i.e., 0-9%, 10-19%, 20-29%, 30-39%, etc.).

      See if this does what you want. I am going to call the variable holding the percentage of African American students percent_afamer. I am also assuming that percent_afamer is coded as a number (i.e., 37 for 37%, not 0.37 for 37%).

      Code:
      * int() truncates toward zero, so for nonnegative values it rounds
      * down: int(37 / 10) = 3
      gen afamer_grouping = int(percent_afamer / 10)
      gen d_AfA0 = (percent_afamer < 10)
      forvalues i = 1/9 {
          gen d_AfA`i' = (afamer_grouping == `i')
      }
      
      * Note: d_AfA7 == 0 for all obs, because I didn't create any schools
      * with 70-79% in my toy dataset
      . list percent_afamer afamer_grouping d_AfA0-d_AfA9, sepby(afamer_grouping)
      
           +---------------------------------------------------------------------------------------------------------------+
           | percen~r   afamer~g   d_AfA0   d_AfA1   d_AfA2   d_AfA3   d_AfA4   d_AfA5   d_AfA6   d_AfA7   d_AfA8   d_AfA9 |
           |---------------------------------------------------------------------------------------------------------------|
        1. |        9          0        1        0        0        0        0        0        0        0        0        0 |
           |---------------------------------------------------------------------------------------------------------------|
        2. |       13          1        0        1        0        0        0        0        0        0        0        0 |
        3. |       15          1        0        1        0        0        0        0        0        0        0        0 |
        4. |       17          1        0        1        0        0        0        0        0        0        0        0 |
        5. |       19          1        0        1        0        0        0        0        0        0        0        0 |
           |---------------------------------------------------------------------------------------------------------------|
        6. |       26          2        0        0        1        0        0        0        0        0        0        0 |
           |---------------------------------------------------------------------------------------------------------------|
        7. |       31          3        0        0        0        1        0        0        0        0        0        0 |
        8. |       31          3        0        0        0        1        0        0        0        0        0        0 |
        9. |       33          3        0        0        0        1        0        0        0        0        0        0 |
       10. |       34          3        0        0        0        1        0        0        0        0        0        0 |
       11. |       34          3        0        0        0        1        0        0        0        0        0        0 |
           |---------------------------------------------------------------------------------------------------------------|
       12. |       47          4        0        0        0        0        1        0        0        0        0        0 |
           |---------------------------------------------------------------------------------------------------------------|
       13. |       50          5        0        0        0        0        0        1        0        0        0        0 |
       14. |       50          5        0        0        0        0        0        1        0        0        0        0 |
       15. |       51          5        0        0        0        0        0        1        0        0        0        0 |
       16. |       59          5        0        0        0        0        0        1        0        0        0        0 |
           |---------------------------------------------------------------------------------------------------------------|
       17. |       60          6        0        0        0        0        0        0        1        0        0        0 |
       18. |       68          6        0        0        0        0        0        0        1        0        0        0 |
           |---------------------------------------------------------------------------------------------------------------|
       19. |       88          8        0        0        0        0        0        0        0        0        1        0 |
           |---------------------------------------------------------------------------------------------------------------|
       20. |       91          9        0        0        0        0        0        0        0        0        0        1 |
           +---------------------------------------------------------------------------------------------------------------+
      
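      One caveat on the sketch above: a district with percent_afamer exactly equal to 100 would get afamer_grouping == 10, which none of the d_AfA0-d_AfA9 dummies pick up, so it is worth tabulating the grouping variable as a check. In estimation you can also let factor notation build the indicators for you (pass_rate below is a hypothetical outcome name):

      Code:
      * Check the grouping, including a possible group 10 for exactly 100%
      tabulate afamer_grouping, missing

      * Factor notation builds the indicator set automatically in estimation;
      * pass_rate is a hypothetical outcome variable.
      regress pass_rate i.afamer_grouping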

