Generating variables and compute average with if conditions

Daniel Bakker

Join Date: Jul 2021

Posts: 4
#1


11 Jul 2021, 09:23

Dear all,
I'm new to Stata, and for my research I need the following.
I have two treatments:
1) Focus (0/1/2), where 0 stands for control, 1 for process, and 2 for outcome.
2) Certainty (0/1), where 0 stands for certain and 1 for uncertain.
I want to generate variables for all the possible combinations: CC = (0/0), CO = (0/1), CP = (0/2), UC = (1/0), UO = (1/1), UP = (1/2).
If I gen CC and replace CC = 1 if Certainty==0 & Focus==0, I get 0 real changes. What other code can I use to fix this?
Moreover, I have the variables BP and BP2.
I want to generate avgBP1 for each of the variables above, e.g. egen avgBP1 = mean(CC), so that I can compare the average of BP1 across all combinations.

Because the first command makes 0 changes, this gives me means of 0.

Should I use more if conditions instead of generating variables?
E.g. egen avgBP1 = rmean(BP1) if Certainty==0 & Focus==0
This gives me 97 (n=97) missing values, but I don't know how to continue.

Thanks in advance!
Daniël

Sandra Bloem

Join Date: Jun 2020
Posts: 106

11 Jul 2021, 12:59

It would be best to give some example data so people have a better idea of what the data looks like, and so people don't have to generate this themselves to help you.

This is the way I generate some data. It is important that the categories of the focus and certainty variable have value labels with the correct letters (i.e., C, P, O, and U).

Code:

webuse cattaneo2.dta, clear

keep bweight mbsmoke prenatal
replace prenatal = 2 if prenatal==3
rename mbsmoke certainty
rename prenatal focus
rename bweight bp


label define certainty 0 "C" 1 "U"
label values certainty certainty

label define focus 0 "C" 1 "P" 2 "O"
label values focus focus

This is the code to solve your problem. I wasn't sure whether you want a single avgbp variable or a separate avgbp variable for each group, so the code shows how to generate both.

Code:

decode certainty, generate(strcertainty)
decode focus, generate(strfocus)

foreach q in C U {
    foreach i in C P O {
        generate `q'`i' = (strcertainty=="`q'" & strfocus=="`i'")
        egen avgbp`q'`i' = mean(bp) if strcertainty=="`q'" & strfocus=="`i'" 
    }
}

egen groups = group(certainty focus)
bys groups: egen avgbp = mean(bp)


Nick Cox

Join Date: Mar 2014

Posts: 35486
#3

11 Jul 2021, 13:00

Without a data example I am guessing, but

Code:

egen wanted = mean(BP), by(Certainty Focus)

may help.
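
A minimal sketch of what that one-liner does, on made-up data. The six rows below are invented for illustration, and Certainty and Focus are assumed to be numeric as described in #1.

```stata
* Toy data, invented for illustration only
clear
input Certainty Focus BP
0 0 5
0 0 7
0 1 6
0 1 8
1 0 9
1 0 9
end

* One variable holding the mean of BP within each Certainty x Focus cell
egen wanted = mean(BP), by(Certainty Focus)

* Every observation carries the mean of its own group
list, sepby(Certainty Focus)
```

Because the result is a single variable with no missing values (as long as BP is observed somewhere in each cell), no separate indicator per combination is needed.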
Daniel Bakker

Join Date: Jul 2021

Posts: 4
#4

12 Jul 2021, 06:28

Dear Sandra and Nick,

First of all, thanks for the replies. Next time I will specify my question better and provide data.

@Sandra, I do indeed want a different avgbp variable for each group, and the code you provided worked. The one remaining thing is that I would like to compare the means and standard deviations of the different groups. The code generates the mean, as I asked. Can Stata also compute the SD over the same observations that are used for each mean, or do I have to do this manually?
For CC,

tab bp if certainty==0 & focus==0

         Bp |      Freq.     Percent        Cum.
------------+-----------------------------------
          5 |          1        4.76        4.76
          6 |          5       23.81       28.57
          7 |          9       42.86       71.43
          8 |          3       14.29       85.71
          9 |          3       14.29      100.00
------------+-----------------------------------
      Total |         21      100.00

CC has 21 observations, and the average is 7.1.

Nick, thanks for your help as well. My apologies for the vague explanation.

Ken Chui

Join Date: Aug 2014
Posts: 1057

12 Jul 2021, 06:54

Why not just aggregate?

Code:

* Making up fake data for demonstration
clear
input focus certainty
0 0
1 1
2 0
0 1
1 0
2 1
end

* Assume ten cases for each combo
expand 10

* Generate BP data
set seed 367
gen bp = rnormal(120, 5)

* Just aggregate
collapse (mean) avgbp = bp (sd) sdbp = bp (count) case = bp, by(focus certainty)
list, sep(0)

Results:

Code:

     +-----------------------------------------------+
     | focus   certai~y      avgbp       sdbp   case |
     |-----------------------------------------------|
  1. |     0          0   120.4681   4.410926     10 |
  2. |     0          1   118.5604   6.675795     10 |
  3. |     1          0   119.9203   4.711068     10 |
  4. |     1          1    120.896   3.422762     10 |
  5. |     2          0   118.9165    6.97125     10 |
  6. |     2          1   118.4124   3.642353     10 |
     +-----------------------------------------------+

Notice that -collapse- will replace whatever data set you're using, so make sure to either save the data or the analysis syntax before running it.
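
One way to keep the original data in memory is to wrap the aggregation in -preserve- and -restore-. A minimal sketch, assuming a small made-up data set (the values below are invented for illustration):

```stata
* Toy data, invented for illustration only
clear
input focus certainty bp
0 0 5
0 0 7
0 1 6
0 1 8
1 0 9
1 0 11
end

* Take a snapshot of the data before collapsing
preserve
collapse (mean) avgbp = bp (sd) sdbp = bp (count) n = bp, by(focus certainty)
list, sep(0)

* Bring back the original observations
restore
count
```

After -restore-, the individual observations and the variable bp are back, so the summary table is only displayed and nothing is lost.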


Daniel Bakker

Join Date: Jul 2021

Posts: 4
#6

15 Jul 2021, 06:45

Thanks! Worked as well. Super grateful for all the support.

I have two types of BP: BP1 and BP2. When I run the code for avgbp again to create avgbp2, I get the error "CC already specified".

foreach q in C U {
    foreach i in C P O {
        generate `q'`i' = (strcertainty=="`q'" & strfocus=="`i'")
        egen avgbp2 `q'`i' = mean(bp2) if strcertainty=="`q'" & strfocus=="`i'"
    }
}

I want to create avgbp2 so that I can build an index from avgbp1 and avgbp2 to use in a regression.

How can I use the code provided by Sandra to create avgbp2 without getting the error "CC already specified"?

Thanks 100 times.

Daniel
Nick Cox

Join Date: Mar 2014

Posts: 35486
#7

15 Jul 2021, 06:59

The aim will prove self-defeating as each variable generated will be missing whenever any other variable is not. So the resulting set will be useless for regression.

As already implied by #3 a single variable

Code:

egen avgbp2 = mean(bp2) , by(strcertainty strfocus)

will contain all the group means compactly.
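
The same pattern extends to the standard deviation asked about in #4. A minimal sketch on invented data, assuming numeric certainty and focus (by() accepts string grouping variables such as strcertainty and strfocus equally well); the names sdbp2 and grp are made up for illustration:

```stata
* Toy data, invented for illustration only
clear
input certainty focus bp2
0 0 5
0 0 7
0 1 6
0 1 10
1 0 9
1 0 9
end

* Group mean and group SD, each stored in a single variable
egen avgbp2 = mean(bp2), by(certainty focus)
egen sdbp2  = sd(bp2),   by(certainty focus)

* A compact table of mean, SD and n per combination
egen grp = group(certainty focus), label
tabstat bp2, by(grp) statistics(mean sd n)
```

-tabstat- only displays the statistics; the two -egen- variables are what stay in the data set.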

(Although the code is not a good idea, the bug is that

Code:

avgbp2 `q'`i'

should not contain a space.)
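
For completeness, a sketch of the loop from #6 with the space removed, run on invented data. The -generate- line is left out on the assumption that the indicators CC, CP, and so on already exist from the earlier run; generating an existing variable a second time is an error in Stata, which would explain the message reported in #6.

```stata
* Toy data, invented for illustration only
clear
input certainty focus bp2
0 0 5
0 0 7
0 1 6
0 1 8
0 2 4
0 2 6
1 0 9
1 0 11
1 1 3
1 1 5
1 2 7
1 2 9
end

* Same value labels as in #2
label define certainty 0 "C" 1 "U"
label values certainty certainty
label define focus 0 "C" 1 "P" 2 "O"
label values focus focus
decode certainty, generate(strcertainty)
decode focus, generate(strfocus)

foreach q in C U {
    foreach i in C P O {
        * No space between avgbp2 and the macros; indicators are not regenerated
        egen avgbp2`q'`i' = mean(bp2) if strcertainty=="`q'" & strfocus=="`i'"
    }
}

list certainty focus bp2 avgbp2CC avgbp2UO, sepby(certainty focus)
```

Each of the six new variables is missing outside its own group, which is the limitation described above.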