Loop through two lists to create new variables

Chris Boulis

Join Date: Feb 2019

Posts: 363
#1

Loop through two lists to create new variables

07 Apr 2021, 04:47

Hi Statalist.

I would like to create ratios of the value of different asset types for one group as a share of other groups. (Note these groups are categories within a nominal variable with 9 categories - 1-9). Each asset type is a continuous variable). The following is my attempt at coding the rules for four groups (reg1 - reg4):

Code:

local varlist totasset totfin totbank totcashi totequity totins totnonfin totprop totbus totveh foreach i in local varlist { gen reg1 = `i' if group == 3 / (`i' * inlist(group, 3, 4, 5)) // ratio of group3's assets as a share of groups 3, 4 & 5. gen reg2 = `i' if group == 3 / (`i' * inlist(group, 3, 4)) // ratio of group3's assets as a share of groups 3 & 4. gen reg3 = `i' if group == 3 / (`i' * inlist(group, 3, 5)) // ratio of group3's assets as a share of groups 3 & 5. gen reg4 = `i' if group== 4 / (`i' * inlist(group, 4, 5)) // ratio of group4's assets as a share of groups 4 & 5. }

I then want to feed this new list (reg1 - reg4) into a loop to create the same ratios for each of the asset types in my list.

Code:

local varlist totasset totfin totbank totcashi totequity totins totnonfin totprop totbus totveh foreach i in local varlist {
foreach j in reg1 reg2 reg3 reg4 {
gen `i'_`j' = `i' if `j' }
}

I am hoping this will give me the asset values, e.g. of the first asset type:
totasset_reg1
totasset_reg2
totasset_reg3
totasset_reg4

I then plan to graph these ratios of asset values (separately) over age to see the change over the lifecycle.

Help is kindly appreciated.

Stata v.15.1. I am using panel data.

Last edited by Chris Boulis; 07 Apr 2021, 04:51.
Tags: None
Ken Chui

Join Date: Aug 2014

Posts: 1057
#2

07 Apr 2021, 07:19

Good day,

It'd be really great if you can use -dataex- to provide some sample, as I am unsure how "group" is being incorporated into your data. As far as I can understand, you would like to compute some ratios of raw data of some rows to another set of rows. I saw two problems in your first chunk of code:

1) The four gen commands will cause conflict once you loop to the second variable. Stata would refuse to generate a variable again if a variable of the same name already exists.
2) The division suffers from different sample sizes in the numerator and denominator. You'll usually have fewer cases with group = 3 compared to cases from either 3, 4, or 5. So, the division will not work. Something such as deriving the mean from each group then compute the ratio may be a work around.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35454
#3

07 Apr 2021, 08:56

I don't understand this either but I can go one step beyond @Ken Chui's point 1) in #2 There are two syntaxes for foreach namely foreach .... in ... and foreach ... of ... and mixing them whether on purpose or by accident is usually a step into disaster. Here you need to initialise your variables before the main loop

Code:

forval j = 1/4 { gen reg`j' = . }

Then you evidently want to loop over a set of variables and (I guess) replace those four variables according to what you find.

You can do that this way

Code:

* (1) foreach v in totasset totfin totbank totcashi totequity totins totnonfin totprop totbus totveh { replace reg1 = replace reg2 = replace reg3 = replace reg4 = }

or this way

Code:

* (2) local vlist totasset totfin totbank totcashi totequity totins totnonfin totprop totbus totveh foreach v of local vlist { replace reg1 = replace reg2 = replace reg3 = replace reg4 = }

Stylistically, the main merit of (2) might be that you want to use the local macro again. There is nothing essential about it. But here's the catch: in local etc. is legal except that the word local is taken literally and not interpreted as flagging that a local macro follows. Only very rarely will that be what you want.
Sorry, but I can't go any further. The code like

Code:

if group == 3 / (`i' * inlist(group, 3, 4, 5))

will at most do something dependent on whether the variable group has or has not particular values. How that relates to assets we can't be expected to know or guess, I have to suggest.
1 like
Comment

Chris Boulis

Join Date: Feb 2019
Posts: 363

07 Apr 2021, 18:34

Thank you for your replies Ken Chui and Nick Cox. Thank you for the help and code. I can follow (1) in #3. Am I correct in understanding that (2) is the same as (1) but is all in one line - for convenience - as you say in case I wanted to use that local macro again?

Group is coded 1-9 with each category representing a different group according to certain characteristics (collated by value labels) - for example, group = 1 if inlist(1000, 1100, 1200, 1300, 1400) where 1000=porsche, 1100=ferrari, 1200=mustang, etc. group = 2 if inlist(2000, 2100, 2200, 2300, 2400) where 2000=camry, 2100=civic, 2200=golf, 2300=octavia, etc. Sorry to leave you guessing, here's a sample of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte wave int hgage1 byte group float(totasset totbank totcashi totequity totins totnonfin totprop totbus totveh)
 2 50 3  284289  26500 0  20000      0  195000  180000      0 15000
10 30 1   81080   4694 0      0      0   30014       0      0 30014
14 34 1  313100   8100 0      0      0  210000       0 200000 10000
18 38 1  320038    960 0      0 120000   55500       0      0 55500
 2 77 4  180500   8500 0 170000      0    2000       0      0  2000
 6 81 4  177500 116000 0  60000      0    1500       0      0  1500
 2 48 4  618133   5632 0    201      0  426300  410000      0 16300
 6 52 4 1332300  15000 0   4000   3000 1074300 1052000      0 22300
10 56 4 1014100 310000 0   3500      0  622600  605000      0 17600
14 60 4 1212400 228400 0   7200      0  917300  889000      0 28300
 2 20 4    7745    245 0      0      0    7500       0      0  7500
 6 24 4  252088  11088 0      0      0  228000  220000      0  8000
10 28 1  427500  17500 0      0      0  360000  330000      0 30000
18 36 1 2735000 124000 0      0      0 2362000 2350000      0 12000
 2 36 5  200506    506 0      0      0  195000  145000  45000  5000
 6 40 5  371142   6994 0      0      0  357000  350000      0  7000
10 44 2  542600   2075 0      0      0  532525  500525  20000 12000
14 48 5  622655    355 0      0      0  614000  600000      0 14000
18 52 5  550592   4200 0      0      0  532000  450000  40000 42000
10 25 5  613541   5400 0  30000      0  563000  545000      0 18000
14 29 5 1730500   9000 0  20000      0 1100000  565000 500000 35000
18 33 5 1942753  11100 0  33653      0  798000  340000 439000 19000
14 23 1  374095     90 0      0      0  360000  340000      0 20000
18 27 1  867100   9100 0      0      0  773000  750000      0 23000
18 29 3  170083  10500 0  12000      0   30000       0      0 30000
10 21 1   24600   5100 0      0      0    1500       0      0  1500
14 25 1  306000  32000 0      0      0  232000  230000      0  2000
18 42 4  710800  28800 0   2000      0    2000       0      0  2000
 2 42 1  298000   3500 0      0      0  248000  195000      0 53000
end
label values hgage1 SHGAGE
label values group group1
label def group1 1 "[1] region1", modify
label def group1 2 "[2] region2", modify
label def group1 3 "[3] region3", modify
label def group1 4 "[4] region4", modify
label def group1 5 "[5] region5", modify

I hope this clarifies your questions about the data.

In terms of the code for the second loop in #1, here's my update based on help provided:

Code:

foreach v in totasset totfin totbank totcashi totequity totins totnonfin totprop totbus totveh  {forval j = 1/4 {
gen reg`j' = .
replace reg1 = `v' if group == 3 / (`v' * inlist(group, 3, 4, 5))
replace reg2 = `v' if group == 3 / (`v' * inlist(group, 3, 4))  
replace reg3 = `v' if group == 3 / (`v' * inlist(group, 3, 5))
replace reg4 = `v' if group== 4 / (`v' * inlist(group, 4, 5))  }
 
(then to code the loop through the list of asset types)
gen `v'_`j' = `v' * `j' // (??)

}

I appreciate any guidance here.

Last edited by Chris Boulis; 07 Apr 2021, 19:29.

Comment

Ken Chui

Join Date: Aug 2014

Posts: 1057
#5

07 Apr 2021, 19:01

Thanks for the follow up. I tried your -dataex- code and it says variable "group" is missing. I looked at the data input line and also was not able to find the variable named "group."
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 363
#6

07 Apr 2021, 19:06

HI Ken Chui. Thanks for letting me know. I've made a small change. Can you tell me if that works now?
Comment

Ken Chui

Join Date: Aug 2014
Posts: 1057

07 Apr 2021, 19:22

Thanks, just to keep this simple, I am going to show you how I understood what you want. And to keep this manageable, this code will only focus on the first totasset:

Code:

drop totbank-totveh
* First, isolate the data by their group identity into 5 different variables:
gen totasset3   = totasset if inlist(group, 3)
gen totasset345 = totasset if inlist(group, 3, 4, 5)
gen totasset34  = totasset if inlist(group, 3, 4)
gen totasset35  = totasset if inlist(group, 3, 5)
gen totasset45  = totasset if inlist(group, 4, 5)

* Then, collapse all rows into one row by summing all the numbers up:
collapse (sum) totasset*

* Now that all numbers are summed, we can then apply the division:
gen reg1asset = totasset3 / totasset345
gen reg2asset = totasset3 / totasset34
gen reg3asset = totasset3 / totasset35
gen reg4asset = totasset3 / totasset45

Now, browse the data, can you tell me if reg1asset - reg4asset is what you want? I hope this also demonstrates why I was unsure about the division in your code. We can't divide two things that are scatter around rows with different sample sizes.

Comment

Chris Boulis

Join Date: Feb 2019

Posts: 363
#8

07 Apr 2021, 23:13

Thank you Ken Chui. Yes I understand what you mean now. And yes that looks like what I'd expect. However, as I have panel data, I'd like to see the results by wave. How would I address this? add -bysort wave- at the front of each line? Also, as I want to graph these results over a persons life (using age on the x-axis), do I need to sort by age also?
Comment

Ken Chui

Join Date: Aug 2014
Posts: 1057

08 Apr 2021, 05:53

No need to use -bysort-, just do that in the -collapse-:

Code:

drop totbank-totveh
* First, isolate the data by their group identity into 5 different variables:
gen totasset3   = totasset if inlist(group, 3)
gen totasset345 = totasset if inlist(group, 3, 4, 5)
gen totasset34  = totasset if inlist(group, 3, 4)
gen totasset35  = totasset if inlist(group, 3, 5)
gen totasset45  = totasset if inlist(group, 4, 5)

* Then, collapse all rows into one row by summing all the numbers up:
collapse (sum) totasset*, by(wave age)

* Now that all numbers are summed, we can then apply the division:
gen reg1asset = totasset3 / totasset345
gen reg2asset = totasset3 / totasset34
gen reg3asset = totasset3 / totasset35
gen reg4asset = totasset3 / totasset45

And if that's what you need, then we can turn this into a 3-stage process with some foreach loops:

Code:

foreach x in totasset totbank totcashi totequity totins totnonfin totprop totbus totveh{
* First, isolate the data by their group identity into 5 different variables:
gen `x'3   = `x' if inlist(group, 3)
gen `x'345 = `x' if inlist(group, 3, 4, 5)
gen `x'34  = `x' if inlist(group, 3, 4)
gen `x'35  = `x' if inlist(group, 3, 5)
gen `x'45  = `x' if inlist(group, 4, 5)
}

* Then, collapse all rows into one row by summing all the numbers up by wave by age:
collapse (sum) totasset* totbank* totcashi* totequity* totins* totnonfin* totprop* totbus* totveh*, by(wave age)

* Now that all numbers are summed, we can then apply the division:
foreach x in totasset totbank totcashi totequity totins totnonfin totprop totbus totveh{
gen reg1`x' = `x'3 / `x'345
gen reg2`x' = `x'3 / `x'34
gen reg3`x' = `x'3 / `x'35
gen reg4`x' = `x'3 / `x'45
}

Comment

Chris Boulis

Join Date: Feb 2019

Posts: 363
#10

08 Apr 2021, 21:11

Thank you Ken Chui. I ran your second set of code in #9 and then browsed reg1* reg2* reg3* reg4*. Is it correct to say that each line item is one observation (i.e. one couple or one household) and reflects the shares of each listed asset for each of the four ratios? (Note: The values of each asset are the total assets for each couple/household). Although this is sorted by wave, all the results were single values - does that reflect an average of all waves? (Note: I have panel data and the asset data are collected in the survey every four years).

I'm a bit unclear how I will see the effect over time when I graph these new variables? Is it the case that separate wave values will show the accumlating of assets as age will be on the x axis?

After using collapse I lose all of the other variables in my dataset - is there a way to reverse -collapse- so I have all the old and newly created variables? In other code to date, I have been sorting on the age of partner1 (not partner2) - do you think it would be better to take the average age of the couple or to use both ages? This is the code I tried to create average age:

Code:

gen byte couple_age = cond(missing(hgage1, hgage2), max(hgage1, hgage2), (hgage1 + hgage2) / 2)

Which do you believe is the best option?
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 363
#11

09 Apr 2021, 20:00

Is there a way of achieving this without using -collapse-?
Comment
Ken Chui

Join Date: Aug 2014

Posts: 1057
#12

10 Apr 2021, 07:43

Thank you Ken Chui. I ran your second set of code in #9 and then browsed reg1* reg2* reg3* reg4*. Is it correct to say that each line item is one observation (i.e. one couple or one household) and reflects the shares of each listed asset for each of the four ratios? (Note: The values of each asset are the total assets for each couple/household). Although this is sorted by wave, all the results were single values - does that reflect an average of all waves? (Note: I have panel data and the asset data are collected in the survey every four years).

Each row is a summary for all people age x within a wave k. The idea that there are person/couple level ID never came up thus far (in fact, the sample data does not have an explicit ID).

I'm a bit unclear how I will see the effect over time when I graph these new variables? Is it the case that separate wave values will show the accumlating of assets as age will be on the x axis?

That depends on what is your "time." If "wave" is the time variable, then plotting those reg1-reg4 against wave would show the change in ratio of wealth categories. Notice that, so far from what I can glean in the analysis, it's never accumulation; ratios like reg1-reg4 only shows proportion within time point, they don't show absolute growth.

After using collapse I lose all of the other variables in my dataset - is there a way to reverse -collapse- so I have all the old and newly created variables?

First that is a normal behavior. Usually, after collapsing the data, it would be necessary to save this data, put aside. Then go back to the main data and merge the collapsed information back to the cases. To recover the old data set, simply run your do file up to collapse, and save a copy there.

In other code to date, I have been sorting on the age of partner1 (not partner2) - do you think it would be better to take the average age of the couple or to use both ages? This is the code I tried to create average age:

If it's just for sorting either sort of ID and wave or sort by ID and couple_age should do. If you meant it as a research question, then I don't have an answer given what I know, only comments. I would question what does age represent in the analysis, how do the two ages correlate, are young-old couple your research interest, would a 70 years old marrying a 20 years old suddenly made the 70-yo's wealthy drop to as if they were 45? Etc.

Lastly, I hope some other users may be able to provide suggestion that does not use -collapse-. This is #12 already and it seems your questions have evolved quite a bit. If you have a clearer idea what you need now, I think it may not be a bad idea to conclude this thread and start a new one. In the new one, please make sure to:
Continue to use -dataex- to show data. Make sure to add the ID variable.

Clearly state what your ultimate goal of the analysis is (the end goal), instead of many "next steps."

Last edited by Ken Chui; 10 Apr 2021, 07:48.
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 363
#13

13 Apr 2021, 05:08

Thanks Ken Chui. I mistakenly forgot to mention about IDs. I understand we are talking about proportions at a given point in time - the multiple data points that I have will show the relative change in proportions over the lifecycle right?

Each row is a summary for all people age x within a wave k.

Thanks for the explanation. I see from your example that using average couple age is not a good idea - good point! it may not have been clear enough, but I did state my intent in #1 - that I wanted to graph these ratios. As suggested I will begin a new thread. Thanks for your help anyway.
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 363
#14

13 Apr 2021, 18:49

A link to the new thread is https://www.statalist.org/forums/for...ves-panel-data
Comment

Chris Boulis

Join Date: Feb 2019
Posts: 363

#15

23 Apr 2021, 00:44

Hi Nick Cox. I would like clarification on your advice in #3:

you need to initialise your variables before the main loop

with respect to using

Code:

forval j = 1/4 { 
      gen reg`j' = . 
}

I want to apply this idea to creating six ratios using a loop.

Code:

local means x_totasset x_totfin x_totbank x_totequity x_totsuper x_totnonfin x_totprop x_totbus x_totveh
local suffix cecp cexp ecpexp noc noecp noexp    
    
foreach i of local means {

    foreach j of local suffix  { 
      gen `i'_`j' = . 
}
}

foreach i of local means { 
replace `i'_`j' = x_`i'3 / (x_`i'2 + x_`i'3) 
replace `i'_`j' = x_`i'4 / (x_`i'2 + x_`i'4)
replace `i'_`j' = x_`i'3 / (x_`i'3 + x_`i'4) 
replace `i'_`j' = x_`i'1 / (x_`i'1 + x_`i'2)
replace `i'_`j' = x_`i'1 / (x_`i'1 +x_`i'3)
replace `i'_`j' = x_`i'1 / x_`i'1 + x_`i'4)
}

However, Stata gives this message:

. foreach i of local means {
2.
. foreach j of local suffix {
3. gen `i'_`j' = .
4. }
5. }
is not a valid command name

Stata v.15.1. Using panel data.

Announcement