Fail to create dummy by using forvalues with multiple construct variables in a panel dataset.

Sophia Gao

Join Date: Aug 2023

Posts: 16
#1

04 Aug 2023, 04:19

Hi community,

I want to create a categorical variable from a varlist (two categories of construct variables, each measured in three non-consecutive waves) using forvalues, but Stata reports "invalid syntax". The new variable should have three categories, coded with three values. First, my forvalues loop failed. Second, when I wrote the replace commands by hand, the result showed only one of the three values and never the other two.

Could the experts help me check it and suggest a solution? Thanks,

//----------- My code is as below ------------------

Code:

use data.dta
tab r1shlt, m
tab s1shlt, m
forvalues i = 1/2,4 {
    foreach var of varlist r`i'shlt s`i'shlt {
        tab `var', m
        replace `var' = 1 if `var' == 5 & !missing(`var')
        replace `var' = 2 if `var' >= 3 & `var' <= 4 & !missing(`var')
        replace `var' = 3 if `var' >= 1 & `var' <= 2 & !missing(`var')
        replace `var' = . if missing(`var')
        rename `var' r`i'SelfH
        label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
        label val r`i'SelfH selfh_labl`i'
    }
}

//------------ The result is as below: -----------------
. use data.dta

.
end of do-file

. tab r1shlt, m

r1shlt | Freq. Percent Cum.
------------+-----------------------------------
| 7,796 30.57 30.57
. | 5,095 19.98 50.55
1.Excellent | 106 0.42 50.96
2.Very good | 1,146 4.49 55.45
3.Good | 2,274 8.92 64.37
4.Fair | 5,918 23.20 87.57
5.Poor | 3,169 12.43 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab s1shlt, m

s1shlt | Freq. Percent Cum.
------------+-----------------------------------
| 7,796 30.57 30.57
. | 6,983 27.38 57.95
1.Excellent | 91 0.36 58.30
2.Very good | 980 3.84 62.15
3.Good | 1,962 7.69 69.84
4.Fair | 5,072 19.89 89.73
5.Poor | 2,620 10.27 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. forvalues i = 1/2,4 {
2. foreach var of varlist r`i'shlt s`i'shlt {
3. tab `var', m
4. replace `var' = 1 if `var' == 5 & !missing(`var')
5. replace `var' = 2 if `var' >= 3 & `var' <= 4 & !missing(`var')
6. replace `var' = 3 if `var' >= 1 & `var' <= 2 & !missing(`var')
7. replace `var' = . if missing(`var')
8. rename `var' r`i'SelfH
9. label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
10. label val r`i'SelfH selfh_labl`i'
11. }
12. }
invalid syntax
r(198);

end of do-file

r(198);

//---------- I changed my code (below) to find where it goes wrong -------------------
[CODE]
tab r1shlt, m
tab s1shlt, m

gen r_r1SelfH = .
encode r1shlt, generate(r1shlt_num)
encode s1shlt, generate(s1shlt_num)
tab r1shlt_num, m
tab s1shlt_num, m

replace r_r1SelfH = 1 if (r1shlt_num == 5 | s1shlt_num ==5) & !missing(r1shlt_num) | !missing(s1shlt_num)
replace r_r1SelfH = 2 if (r1shlt_num == 3/4 | s1shlt_num == 3/4) & !missing(r1shlt_num) | !missing(s1shlt_num)
replace r_r1SelfH = 3 if (r1shlt_num == 1/2 | s1shlt_num == 1/2) & !missing(r1shlt_num) | !missing(s1shlt_num)
replace r_r1SelfH = . if (r1shlt_num == .| s1shlt_num == .)

label define r_r1selfh_labl 1 "poor" 2 "fair" 3 "good"
label val r_r1SelfH r_r1selfh_labl1
tab r_r1SelfH, m

tab r2shlt, m
gen r_r2SelfH = .
encode r2shlt, generate(r2shlt_num)
encode s2shlt, generate(s2shlt_num)
replace r_r2SelfH = 1 if (r2shlt_num == 5 | s2shlt_num ==5) & !missing(r2shlt_num) & !missing(s2shlt_num)
replace r_r2SelfH = 2 if (r2shlt_num == 3/4 | s2shlt_num == 3/4) & !missing(r2shlt_num) & !missing(s2shlt_num)
replace r_r2SelfH = 3 if (r2shlt_num == 1/2 | s2shlt_num == 1/2) & !missing(r2shlt_num) & !missing(s2shlt_num)

label define r_r2selfh_labl 1 "poor" 2 "fair" 3 "good"
label val r_r2SelfH r_r2selfh_labl1

tab r_r2SelfH, m

[/CODE]

//--------- It shows an unexpected outcome, as below ---------------------
. tab r1shlt, m

r1shlt | Freq. Percent Cum.
------------+-----------------------------------
| 7,796 30.57 30.57
. | 5,095 19.98 50.55
1.Excellent | 106 0.42 50.96
2.Very good | 1,146 4.49 55.45
3.Good | 2,274 8.92 64.37
4.Fair | 5,918 23.20 87.57
5.Poor | 3,169 12.43 100.00
------------+-----------------------------------
Total | 25,504 100.00

. tab s1shlt, m

s1shlt | Freq. Percent Cum.
------------+-----------------------------------
| 7,796 30.57 30.57
. | 6,983 27.38 57.95
1.Excellent | 91 0.36 58.30
2.Very good | 980 3.84 62.15
3.Good | 1,962 7.69 69.84
4.Fair | 5,072 19.89 89.73
5.Poor | 2,620 10.27 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
. gen r_r1SelfH = .
(25,504 missing values generated)

. encode r1shlt, generate(r1shlt_num)

. encode s1shlt, generate(s1shlt_num)

. tab r1shlt_num, m

r1shlt_num | Freq. Percent Cum.
------------+-----------------------------------
. | 5,095 19.98 19.98
1.Excellent | 106 0.42 20.39
2.Very good | 1,146 4.49 24.89
3.Good | 2,274 8.92 33.80
4.Fair | 5,918 23.20 57.01
5.Poor | 3,169 12.43 69.43
. | 7,796 30.57 100.00
------------+-----------------------------------
Total | 25,504 100.00

. tab s1shlt_num, m

s1shlt_num | Freq. Percent Cum.
------------+-----------------------------------
. | 6,983 27.38 27.38
1.Excellent | 91 0.36 27.74
2.Very good | 980 3.84 31.58
3.Good | 1,962 7.69 39.27
4.Fair | 5,072 19.89 59.16
5.Poor | 2,620 10.27 69.43
. | 7,796 30.57 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. replace r_r1SelfH = 1 if (r1shlt_num == 5 | s1shlt_num ==5) & !missing(r1shlt_num) | !missing
> (s1shlt_num)
(17,708 real changes made)

.
end of do-file

. replace r_r1SelfH = 2 if (r1shlt_num == 3/4 | s1shlt_num == 3/4) & !missing(r1shlt_num) | !mi
> ssing(s1shlt_num)
(17,708 real changes made)

.
end of do-file

. do "C:\Users\ACER\AppData\Local\Temp\STD400c_000000.tmp"

. replace r_r1SelfH = 3 if (r1shlt_num == 1/2 | s1shlt_num == 1/2) & !missing(r1shlt_num) | !mi
> ssing(s1shlt_num)
(17,708 real changes made)

.
end of do-file

. replace r_r1SelfH = . if (r1shlt_num == .| s1shlt_num == .)
(0 real changes made)

.
end of do-file

. label define r_r1selfh_labl 1 "poor" 2 "fair" 3 "good"

. label val r_r1SelfH r_r1selfh_labl1

. tab r_r1SelfH, m

r_r1SelfH | Freq. Percent Cum.
------------+-----------------------------------
3 | 17,708 69.43 69.43
. | 7,796 30.57 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab r2shlt, m

r2shlt | Freq. Percent Cum.
------------+-----------------------------------
| 6,892 27.02 27.02
. | 1,021 4.00 31.03
1.Excellent | 241 0.94 31.97
2.Very good | 1,774 6.96 38.93
3.Good | 2,512 9.85 48.78
4.Fair | 9,232 36.20 84.97
5.Poor | 3,832 15.03 100.00
------------+-----------------------------------
Total | 25,504 100.00

. gen r_r2SelfH = .
(25,504 missing values generated)

. encode r2shlt, generate(r2shlt_num)

. encode s2shlt, generate(s2shlt_num)

. replace r_r2SelfH = 1 if (r2shlt_num == 5 | s2shlt_num ==5) & !missing(r2shlt_num) & !missin
> g(s2shlt_num)
(12,714 real changes made)

. replace r_r2SelfH = 2 if (r2shlt_num == 3/4 | s2shlt_num == 3/4) & !missing(r2shlt_num) & !mi
> ssing(s2shlt_num)
(0 real changes made)

. replace r_r2SelfH = 3 if (r2shlt_num == 1/2 | s2shlt_num == 1/2) & !missing(r2shlt_num) & !mi
> ssing(s2shlt_num)
(0 real changes made)

.
. label define r_r2selfh_labl 1 "poor" 2 "fair" 3 "good"

. label val r_r2SelfH r_r2selfh_labl1

.
. tab r_r2SelfH, m

r_r2SelfH | Freq. Percent Cum.
------------+-----------------------------------
1 | 12,714 49.85 49.85
. | 12,790 50.15 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

.
.
.
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

04 Aug 2023, 15:03

Your coding assumes generalizations of Stata syntax that are simply not allowed in Stata.

Code:

forvalues i = 1/2, 4 {

is illegal: no commas are allowed in the list of numbers. This is what causes the "invalid syntax" error message. The entire loop is therefore skipped and nothing that you do after the loop will make any sense as a result. General principle: never ignore error messages. If you are running code and get an error message, execution stops (except under -capture noisily-; let's leave that aside). To then go ahead and run the rest of the code is just inviting garbage in to turn into garbage out. Whenever you get an error message, do not proceed until you identify and fix the problem that caused it. Don't ignore it. And don't build in a "work around" that suppresses the error message but does not fix the problem that caused it. Either way you are simply crunching garbage from that point on.
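To illustrate the point about number lists, here is a minimal sketch. The display commands are placeholders and not the original poster's code. It assumes the intended waves are 1, 2 and 4: forvalues takes a single range such as 1/2 or 1(1)3, whereas foreach with of numlist takes a general number list.

```stata
* forvalues accepts one range only
forvalues i = 1/2 {
    display "wave `i'"
}

* foreach ... of numlist accepts a general numlist, here 1, 2 and 4
foreach i of numlist 1/2 4 {
    display "wave `i'"
}
```

With the second form, the inner commands of the original loop could be kept as they are; only the loop header changes.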

Code:

replace r_r2SelfH = 2 if (r2shlt_num == 3/4 | s2shlt_num == 3/4) & !missing(r2shlt_num) & !missing(s2shlt_num)

Here we have an instance of legal syntax, but it does not do what I believe you think it does. The variables r2shlt_num and s2shlt_num, by the way they were constructed, take on only integer values. The comparison of either of these variables to 3/4 will always be false because 3/4 means three-fourths (0.75). It does not mean == 3 or == 4. Yes, in some contexts Stata allows 3/4 to refer to the numlist consisting of 3 and 4. But logical expressions are not among those contexts. If you were thinking that this command would compare those two variables to both 3 or 4, then you have misunderstood the code; it is not doing that. If you really did intend a comparison to the fraction three-fourths, then it seems you do not understand how you created those variables, because they are necessarily integers and cannot equal 0.75.
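A small sketch of the difference, using a made-up variable x rather than the variables in this thread: in an expression, 3/4 is division, while inlist() and inrange() are the usual ways to test for membership in a set or range of values.

```stata
clear
set obs 5
generate x = _n              // x takes the integer values 1 to 5

display 3/4                  // shows .75: in an expression, / is division
count if x == 3/4            // no observation equals 0.75
count if inlist(x, 3, 4)     // observations where x is 3 or 4
count if inrange(x, 3, 4)    // same result here because x is an integer
```

If the intent was "either variable is 3 or 4", a condition along the lines of inlist(r2shlt_num, 3, 4) | inlist(s2shlt_num, 3, 4) would express that, although how the two variables should be combined is a separate question.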
Sophia Gao

Join Date: Aug 2023

Posts: 16
#3

05 Aug 2023, 03:22

Thanks for your advice, Clyde. I removed the comma from my loop, but the loop still doesn't work. Could you help me check how to write a loop over two-dimensional construct variables? Thanks,

Code:

forvalues i = 1(1)3 {
    foreach var of varlist r`i'shlt s`i'shlt {
        encode `var', generate(`var'_num)
        recode `var'_num (5=1)(3/4=2)(1/2=3) ///
            (else=.), gen(r`i'SelfH)
        label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
        label val r`i'SelfH selfh_labl`i'
        drop `var'_num
    }
}

//----------- The loop stops with an error, as below -----------------
. forvalues i = 1(1)3 {
2. foreach var of varlist r`i'shlt s`i'shlt {
3. encode `var', generate(`var'_num)
4. recode `var'_num (5=1)(3/4=2)(1/2=3) ///
> (else=.), gen(r`i'SelfH)
5. label define selfh_labl`i' 1 "poor" 2 "fair" 3 "good"
6. label val r`i'SelfH selfh_labl`i'
7. drop `var'_num
8. }
9. }
(17708 differences between r1shlt_num and r1SelfH)
variable r1SelfH already defined
r(110);

end of do-file

r(110);

. drop r1SelfH

. drop r2SelfH
variable r2SelfH not found
r(111);

end of do-file

r(111);

. drop r3SelfH
variable r3SelfH not found
r(111);

end of do-file

r(111);

. drop s1shlt_num

.
end of do-file

. drop s2shlt_num
variable s2shlt_num not found
r(111);

end of do-file

r(111);

. drop s3shlt_num
variable s3shlt_num not found
r(111);

end of do-file

r(111);

. drop var_num
variable var_num not found
r(111);

.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#4

05 Aug 2023, 05:44

First time around the outer loop with i set to 1, you are looking at variables

Code:

r1shlt s1shlt

and in each case you are trying to produce a new variable

Code:

r1SelfH

that works with r1shlt but not with s1shlt as r1SelfH already exists, which is the explicit error message.

In the second case you perhaps would prefer a new variable

Code:

s1SelfH

but your code doesn't produce it.

It might be easier not to loop over two variables and just write code for those two cases.

Last edited by Nick Cox; 05 Aug 2023, 05:57.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

05 Aug 2023, 06:18

Your two loops can be rewritten as follows, I think, although I can't test anything.

Code:

label def selfh_labl 1 poor 2 fair 3 good
foreach v in r1 r2 r3 s1 s2 s3 {
    encode `v'shlt, gen(`v'SelfH)
    recode `v'SelfH (5=1) (3/4=2) (1/2=3) (else=.)
    label var `v'SelfH selfh_labl
}

There are perhaps three points of programming or Stata principle worth commenting on.

1. I create one set of value labels and apply it in turn to the results of the loop, the 6 new variables. I see no point in 6 identical sets of value labels. You could do the application outside the loop.

2. A loop over 3 cases and a loop over 2 cases can both be trivial. In this case if we combine them, you get simpler code.

3. You don't have to generate a new variable with recode. Its purpose is to recode an existing variable.
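Points 1 and 3 can be tried on toy data with a sketch like the following. The five rows are invented for illustration, only the wave 1 variables are shown, and label values is used to attach the single set of value labels outside the loop. It assumes encode assigns codes 1 to 5 in the order Excellent to Poor, which holds here only because all five categories are present and no other string values occur.

```stata
clear
input str12 r1shlt str12 s1shlt
"1.Excellent" "5.Poor"
"2.Very good" "4.Fair"
"3.Good"      "3.Good"
"4.Fair"      "2.Very good"
"5.Poor"      "1.Excellent"
end

* one set of value labels, shared by every recoded variable
label define selfh_labl 1 "poor" 2 "fair" 3 "good"

foreach v in r1 s1 {
    encode `v'shlt, generate(`v'SelfH)
    * recode changes the existing variable in place
    recode `v'SelfH (5=1) (3/4=2) (1/2=3) (else=.)
}

* attach the shared labels to both variables in one command
label values r1SelfH s1SelfH selfh_labl
tabulate r1SelfH s1SelfH, missing
```

On real data it is worth tabulating the encoded variable before recoding, because encode numbers the distinct strings in alphabetical order and any extra string value would shift the codes.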

Last edited by Nick Cox; 05 Aug 2023, 06:30.
Sophia Gao

Join Date: Aug 2023

Posts: 16
#6

05 Aug 2023, 09:33

Thanks Nick, your code is elegant. But my target outcome variables are r1SelfH (built on r1shlt & slshlt), r2SelfH (built on r2shlt & s2shlt) and r3SelfH (built on r3shlt & s3shlt). If I need to introduce s1SelfH, s2SelfH and s3SelfH in between, how can I map them, or perhaps rename them, into rISelfH? That would be ambiguous. Besides, when I run your code, the value 3 disappears from the generated variables, which seems strange. Do you have any idea why?

//--------- Running results in Stata ------------------
. label def selfh_labl 1 poor 2 fair 3 good

.
end of do-file

. foreach v in r1 r2 r3 s1 s2 s3 {
2.
. encode `v'shlt, gen(`v'SelfH)
3. recode `v'SelfH (5=1) (3/4=2) (1/2=3) (else=.)
4. label var `v'SelfH selfh_labl
5.
. }
(r1SelfH: 17708 changes made)
(r2SelfH: 18612 changes made)
(r3SelfH: 21097 changes made)
(s1SelfH: 17708 changes made)
(s2SelfH: 18604 changes made)
(s3SelfH: 21093 changes made)

.
end of do-file

. tab r1SelfH, m

selfh_labl | Freq. Percent Cum.
------------+-----------------------------------
. | 4,035 15.82 15.82
1.Very good | 11,581 45.41 61.23
2.Good | 1,246 4.89 66.12
. | 8,642 33.88 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab s1SelH, m
variable s1SelH not found
r(111);

end of do-file

r(111);

. tab r2SelfH, m

selfh_labl | Freq. Percent Cum.
------------+-----------------------------------
. | 9,232 36.20 36.20
1.Excellent | 4,286 16.81 53.00
2.Very good | 1,262 4.95 57.95
. | 10,724 42.05 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab s2SelH, m
variable s2SelH not found
r(111);

end of do-file

r(111);

. tab r3SelfH, m

selfh_labl | Freq. Percent Cum.
------------+-----------------------------------
. | 10,537 41.32 41.32
1.Excellent | 4,828 18.93 60.25
2.Very good | 1,675 6.57 66.81
. | 8,464 33.19 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

.
. tab s3SelfH, m

selfh_labl | Freq. Percent Cum.
------------+-----------------------------------
. | 9,064 35.54 35.54
1.Excellent | 4,194 16.44 51.98
2.Very good | 4,587 17.99 69.97
. | 7,659 30.03 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#7

05 Aug 2023, 10:27

You asked for

Code:

tab s2SelH, m

which doesn't exist because the code created s2SelfH instead. So that was a typo.

Otherwise your aim is now unclear. How are you going to combine e.g. r1SelfH and s1SelfH if that is what you want? This has not been explained as an aim, and how you want to do that is not clear to me.

Now I see your results I notice a typo in my own code

Code:

label var

should be

Code:

label val

Sorry about that.
Sophia Gao

Join Date: Aug 2023

Posts: 16
#8

05 Aug 2023, 21:16

Thanks, Nick. My outcome variable should be one vector rather than two vectors. My aim is the outcome variables r1SelfH (constructed based on r1shlt & s1shlt1) for wave 1, r2SelfH (constructed based on r2shlt & s2shlt1) for wave 2, and r3SelfH (constructed based on r3shlt & s3shlt1) for wave 3. But running your modified code instead produced r1SelfH (constructed based on r1shlt) and s1SelfH (constructed based on s1shlt) for wave 1; r2SelfH (constructed based on r2shlt) and s2SelfH (constructed based on s2shlt) for wave 2; and r3SelfH (constructed based on r3shlt) and s3SelfH (constructed based on s3shlt) for wave 3.

//------------ Running results in Stata -----------------
. foreach v in r1 r2 r3 s1 s2 s3 {
2.
. encode `v'shlt, gen(`v'SelfH)
3. recode `v'SelfH (5=1) (3/4=2) (1/2=3) (else=.)
4. label val `v'SelfH selfh_labl
5.
. }
(r1SelfH: 17708 changes made)
(r2SelfH: 18612 changes made)
(r3SelfH: 21097 changes made)
(s1SelfH: 17708 changes made)
(s2SelfH: 18604 changes made)
(s3SelfH: 21093 changes made)

.
end of do-file

. tab r1SelfH, m

r1SelfH | Freq. Percent Cum.
------------+-----------------------------------
poor | 4,035 15.82 15.82
fair | 11,581 45.41 61.23
good | 1,246 4.89 66.12
. | 8,642 33.88 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab s1SelfH, m

s1SelfH | Freq. Percent Cum.
------------+-----------------------------------
poor | 5,072 19.89 19.89
fair | 2,942 11.54 31.42
good | 7,074 27.74 59.16
. | 10,416 40.84 100.00
------------+-----------------------------------
Total | 25,504 100.00

. tab r2SelfH, m

r2SelfH | Freq. Percent Cum.
------------+-----------------------------------
poor | 9,232 36.20 36.20
fair | 4,286 16.81 53.00
good | 1,262 4.95 57.95
. | 10,724 42.05 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab s2SelfH, m

s2SelfH | Freq. Percent Cum.
------------+-----------------------------------
poor | 7,858 30.81 30.81
fair | 3,648 14.30 45.11
good | 3,952 15.50 60.61
. | 10,046 39.39 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab r3SelfH, m

r3SelfH | Freq. Percent Cum.
------------+-----------------------------------
poor | 10,537 41.32 41.32
fair | 4,828 18.93 60.25
good | 1,675 6.57 66.81
. | 8,464 33.19 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file

. tab s3SelfH, m

s3SelfH | Freq. Percent Cum.
------------+-----------------------------------
poor | 9,064 35.54 35.54
fair | 4,194 16.44 51.98
good | 4,587 17.99 69.97
. | 7,659 30.03 100.00
------------+-----------------------------------
Total | 25,504 100.00

.
end of do-file
Nick Cox

Join Date: Mar 2014

Posts: 35724
#9

05 Aug 2023, 23:28

I can't tell from this whether you have any remaining question. "based on r1shlt & s1shlt1" in #8 (where I think you mean s1shlt) is not clearer to me than "built on r1shlt & slshlt" in #6 (where again I think you mean s1shlt).

If one variable is to be formed based on, built on or by combining two variables, you need a rule, perhaps using max(,) or min(,), for how to do that.
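As a sketch of what such a rule might look like, on invented values and with hypothetical names min1SelfH and max1SelfH: which rule is appropriate, if any, depends on what the combined variable is supposed to mean. Stata's min() and max() ignore missing arguments and return missing only when every argument is missing.

```stata
clear
input r1SelfH s1SelfH
1 3
2 2
3 .
. .
end

* worse of the two ratings (1 = poor is the lowest code)
generate min1SelfH = min(r1SelfH, s1SelfH)

* better of the two ratings (3 = good is the highest code)
generate max1SelfH = max(r1SelfH, s1SelfH)

list
```

In the third row both new variables take the value 3, because the missing s1SelfH is ignored; in the fourth row both are missing.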
Sophia Gao

Join Date: Aug 2023

Posts: 16
#10

06 Aug 2023, 04:16

You're right, Nick. It was a typo: it should be s1shlt rather than slshlt1. I'll use them as two separate vectors. Thanks, Nick.
