Loop to create new variables based on cross-tabulation outputs

millogo Ourohire

Join Date: Sep 2016

Posts: 4
#1


09 Sep 2016, 10:59

Dear Statalist,
I am trying to compute some statistics, but I am running into some issues.
I have two variables: a wealth index (WI) with five categories,
and region (REG), which has 13 levels.

I would like to get the proportion of each wealth index level in each region. First, with the script below I get the total population per region, to use as the denominator:

forvalues i =1/13 {
egen w`i'= count(WI) if REG==`i', by(REG)
}
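The 13 separate variables `w1`-`w13` are not strictly needed: `by(REG)` already computes the count separately within each region, so a single `egen` call gives every person the size of their own region. A minimal sketch on a small made-up toy dataset (the data values and the name `w_total` are hypothetical, for illustration only):

```stata
* Toy data, made up for illustration only: one observation per person
clear
input byte(REG WI)
1 2
1 3
1 2
2 5
2 1
end

* One variable instead of w1-w13: number of persons with non-missing WI in their region
egen w_total = count(WI), by(REG)
list, sepby(REG)
```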

Now I would like to get the population of each wealth index level in each region, to use as the numerator for the proportion of each wealth index level per region, but my script below doesn't give me what I am expecting:

forvalues i =1/5 {
forvalues j =1/13 {
egen wi`i'_`j'= count(WI) if REG==`j' & WI==`j' , by(REG)
}
}
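In the nested loop above, the condition `WI==`j'` compares the wealth index with the region counter `j` instead of the wealth-index counter `i`, so `wi1_3` through `wi5_3` all count the same thing: persons in region 3 whose WI equals 3. A corrected sketch of the same loop, assuming the same made-up toy data as a stand-in (on the real data the inner loop would run over 1/13). It still creates one variable per quintile-region pair (65 on the real data), each non-missing only for the persons it counts:

```stata
* Toy data, made up for illustration only
clear
input byte(REG WI)
1 2
1 3
1 2
2 5
2 1
end

* Outer counter i = wealth-index level, inner counter j = region
forvalues i = 1/5 {
    forvalues j = 1/2 {
        * Compare WI with the wealth-index counter i (not the region counter j);
        * by(REG) is unnecessary once REG is fixed by the if condition
        egen wi`i'_`j' = count(WI) if REG == `j' & WI == `i'
    }
}
```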

I would appreciate any help with this issue!
Best!
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

09 Sep 2016, 12:40

Without seeing a sample of your data it will be difficult to offer concrete advice. For example, it isn't clear from your post whether each observation in your data set corresponds to a person, or to a region, or even to something else. In addition, "don't give me what I am expecting" is not very informative--it tells us neither what you got nor what you were expecting.

Please post a representative small excerpt of your data using the -dataex- command (-ssc install dataex-; instructions are at -help dataex-), and then show what the results for that example should look like.

millogo Ourohire

Join Date: Sep 2016
Posts: 4

10 Sep 2016, 03:36

Dear Clyde,
Thanks for the response. To be more precise, each observation corresponds to a person. Sorry; by "don't give me what I am expecting" I meant that I have the wealth index quintiles for the whole country, and I would like to get the number of persons in each wealth index quintile in each of the 13 regions of the country, which my script did not do.

I followed your instructions, and here is an example. I hope it is helpful.
Best regards!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(REG WI)
1 2
1 2
1 3
1 2
1 3
1 4
1 2
1 4
1 3
1 4
1 4
1 4
1 2
1 3
1 3
1 2
1 3
1 2
1 1
1 2
1 4
1 4
1 5
1 4
1 3
1 2
1 5
1 5
1 3
1 3
1 2
1 5
1 4
1 4
1 5
1 5
1 3
1 5
1 4
1 4
1 1
1 3
1 3
1 5
1 3
1 5
1 3
1 4
1 3
1 4
1 4
1 2
1 2
1 1
1 4
1 3
1 3
1 2
1 2
1 2
1 4
1 2
1 2
1 2
1 4
1 2
1 4
1 3
1 4
1 2
1 1
1 1
1 4
1 4
1 5
1 3
1 5
1 3
1 3
1 4
1 4
1 1
1 4
1 3
1 2
1 2
1 2
1 3
1 2
1 2
1 4
1 3
1 2
1 2
1 4
1 4
1 3
1 4
1 2
1 4
end
label values REG WI
label def REG 1 "BMH", modify
label values WI WI
label def WI 1 "Poorest", modify
label def WI 2 "Poorer", modify
label def WI 3 "Middle", modify
label def WI 4 "Richer", modify
label def WI 5 "Richest", modify


Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

10 Sep 2016, 09:32

Thanks for posting the example with -dataex-. Now that I understand your data, I think that you can see, at the same time, the population and the proportion of each wealth group in each region by running:

Code:

tab REG WI, row
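If the table itself is wanted as a dataset (one row per region-quintile pair, with its count and within-region share) rather than as output in the Results window, `contract` can build it. A sketch on made-up toy data; the names `n`, `n_reg`, and `prop` are hypothetical. Because `contract` replaces the data in memory, on real data it would be run between `preserve` and `restore`:

```stata
* Toy data, made up for illustration only
clear
input byte(REG WI)
1 2
1 3
1 2
2 5
2 1
end

* Replace the data with one row per REG-WI combination; n holds its frequency
* (on real data, wrap this in -preserve- ... -restore- to keep the person-level file)
contract REG WI, freq(n)
* Persons per region, and each quintile's share of that total
egen n_reg = total(n), by(REG)
gen prop = n / n_reg
list, sepby(REG)
```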

If you need to have these same statistics as Stata variables in your data set, not just see them in the Results window, you can do this:

Code:

by REG WI, sort: gen count_wi_in_reg = _N
by REG: gen count_all_in_reg = _N
gen prop_wi_in_reg = count_wi_in_reg/count_all_in_reg
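One way to sanity-check the new variables is to confirm that, within each region, the shares of the distinct quintiles add up to 1. A sketch on made-up toy data; `tag` and `share_sum` are hypothetical names. `egen tag()` marks one observation per REG-WI pair so that each quintile's share is counted once. Note that `_N` counts every person in the group, including any with missing WI, so these counts can differ from `egen count(WI)` totals if WI has missing values:

```stata
* Toy data, made up for illustration only
clear
input byte(REG WI)
1 2
1 3
1 2
2 5
2 1
end

* The three commands from the answer
by REG WI, sort: gen count_wi_in_reg = _N
by REG: gen count_all_in_reg = _N
gen prop_wi_in_reg = count_wi_in_reg/count_all_in_reg

* Flag one observation per REG-WI combination
egen tag = tag(REG WI)
* Within each region, add up the shares of the distinct quintiles
egen share_sum = total(prop_wi_in_reg * tag), by(REG)
assert abs(share_sum - 1) < 1e-6
```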

By the way, I think there is an error in your data set. You have attached the value label WI to the variable REG (`label values REG WI`). I think you want the value label REG there; that would make more sense.
millogo Ourohire

Join Date: Sep 2016

Posts: 4
#5

12 Sep 2016, 02:36

Dear Clyde,
I ran the script; everything ran well and corresponds to what I was trying to get. Many thanks!
All my best regards!
millogo Ourohire

Join Date: Sep 2016

Posts: 4
#6

14 Apr 2017, 05:37

Dear Clyde,
I would like some help on how to do a cluster analysis on categorical data after a multiple correspondence analysis, mainly clustering of the variables. What I have seen is based mostly on clustering of observations.
Best, and thanks for your support.
