  • Generating variables and computing averages with if conditions

    Dear all,
    I’m new to Stata, and for my research I need the following:
    I have two treatments:
    1) Focus (0/1/2), where 0 stands for control, 1 for process, and 2 for outcome.
    2) Certainty (0/1), where 0 stands for certain and 1 for uncertain.
    I want to generate variables for all the possible combinations (Certainty/Focus): CC = (0/0), CP = (0/1), CO = (0/2), UC = (1/0), UP = (1/1), and UO = (1/2).
    If I gen CC and then replace CC = 1 if Certainty==0 & Focus==0, I get 0 real changes. What other code can I use to fix this?
    Moreover, I have the variables BP1 and BP2.
    For each of the combinations above I want to generate the average of BP1 (e.g., egen avgBP1 = mean(CC)), so that I can compare the average of BP1 across all combinations.

    Because of the 0 real changes from the first command, this gives me means of 0.

    Instead of generating variables, should I make more use of if qualifiers?
    E.g., egen avgBP1 = rmean(BP1) if Certainty==0 & Focus==0
    This gives me 97 missing values (n=97), but I don’t know how to continue.

    Thanks in advance!
    Daniël



  • #2
    It would be best to give some example data, so that people have a better idea of what the data look like and don't have to generate data themselves to help you.
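A common way to share example data on Statalist is the -dataex- command, which prints the data as code that others can paste and run. A minimal sketch, assuming a recent version of Stata (older versions can install it with -ssc install dataex-); the lowercase variable names follow the later posts, and the values are made up for illustration:

```stata
* Made-up data with the structure described in #1 (hypothetical values)
clear
set seed 2024
set obs 60
generate byte certainty = mod(_n, 2)          // 0 = certain, 1 = uncertain
generate byte focus = mod(_n, 3)              // 0 = control, 1 = process, 2 = outcome
generate byte bp1 = 5 + floor(5*runiform())   // made-up scores from 5 to 9

* Print the first 20 observations as code that can be pasted into a post
dataex certainty focus bp1 in 1/20
```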

    This is how I generated some example data. It is important that the categories of the focus and certainty variables have value labels with the correct letters (i.e., C, P, O, and U), because the solution builds the new variable names from those labels.
    Code:
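    webuse cattaneo2.dta, clear
    
    * Keep three variables and rename them to mimic the structure in #1
    keep bweight mbsmoke prenatal
    replace prenatal = 2 if prenatal==3    // recode 3 to 2 so focus takes the values 0, 1, 2
    rename mbsmoke certainty
    rename prenatal focus
    rename bweight bp
    
    * Value labels whose letters are used to build the variable names
    label define certainty 0 "C" 1 "U"
    label values certainty certainty
    
    label define focus 0 "C" 1 "P" 2 "O"
    label values focus focus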

    This code solves your problem. I wasn't sure whether you want one avgbp variable or a separate avgbp variable for each group, so the code shows how to generate both.
    Code:
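    * Turn the value labels into string variables (C/U and C/P/O)
    decode certainty, generate(strcertainty)
    decode focus, generate(strfocus)
    
    * Option 1: one indicator and one group-specific mean variable per combination
    foreach q in C U {
        foreach i in C P O {
            generate `q'`i' = (strcertainty=="`q'" & strfocus=="`i'")
            egen avgbp`q'`i' = mean(bp) if strcertainty=="`q'" & strfocus=="`i'"
        }
    }
    
    * Option 2: a single variable holding the mean of each group
    egen groups = group(certainty focus)
    bys groups: egen avgbp = mean(bp)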
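Because a logical expression such as (certainty == 0 & focus == 0) evaluates to 1 when true and 0 when false, the six indicators asked for in #1 can also be created straight from the numeric codes, without -decode- and without a separate -replace- step. A minimal sketch on made-up data, assuming the coding in #1 (certainty 0/1, focus 0/1/2):

```stata
* Made-up data: 60 observations, 10 in each certainty x focus combination
clear
set obs 60
generate byte certainty = mod(_n, 2)   // 0 = certain (C), 1 = uncertain (U)
generate byte focus = mod(_n, 3)       // 0 = control (C), 1 = process (P), 2 = outcome (O)

* Each indicator is 1 inside its combination and 0 elsewhere
generate byte CC = (certainty == 0 & focus == 0)
generate byte CP = (certainty == 0 & focus == 1)
generate byte CO = (certainty == 0 & focus == 2)
generate byte UC = (certainty == 1 & focus == 0)
generate byte UP = (certainty == 1 & focus == 1)
generate byte UO = (certainty == 1 & focus == 2)

* Check: every observation falls in exactly one combination
assert CC + CP + CO + UC + UP + UO == 1
tabulate certainty focus
```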



    • #3
      Without a data example I am guessing, but

      Code:
      egen wanted = mean(BP), by(Certainty Focus)
      may help.
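With -egen, by()- a single variable holds the mean of every certainty x focus combination, repeated on each observation of that combination. A minimal sketch on made-up data (the seed, sample size and the names wanted and first are assumptions), which also uses -egen, tag()- to list each combination once:

```stata
* Made-up data: 10 observations in each certainty x focus combination
clear
set seed 367
set obs 60
generate byte certainty = mod(_n, 2)
generate byte focus = mod(_n, 3)
generate bp = rnormal(120, 5)

* One variable: the mean of bp within each combination
egen wanted = mean(bp), by(certainty focus)

* Tag one observation per combination and list the six means
egen first = tag(certainty focus)
list certainty focus wanted if first, noobs
```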



      • #4
        Dear Sandra and Nick,

        First of all, thanks for the replies. Next time I will specify my question better and provide data.

        @Sandra, I do want a separate avgbp variable for each group, and with the code you provided it worked. The only thing left is that I would like to compare the means and standard deviations of the different groups. The code generates the mean, as I asked; is it possible to have Stata compute the SD within the same groups over which it computes the mean? Or do I have to do this manually?
        For CC:

        . tab bp if certainty==0 & focus==0

                 bp |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  5 |          1        4.76        4.76
                  6 |          5       23.81       28.57
                  7 |          9       42.86       71.43
                  8 |          3       14.29       85.71
                  9 |          3       14.29      100.00
        ------------+-----------------------------------
              Total |         21      100.00

        CC has 21 observations and an average of 7.1.


        Nick, thanks for your help as well. Sorry that my explanation was vague.
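Stata can compute the SD over the same groups as the mean, so no manual calculation should be needed. A minimal sketch on made-up data (the seed, sample size and the names avgbp, sdbp and nbp are assumptions): -egen- has sd() and count() functions that accept the same by() option as mean(), and -tabulate- with the summarize() option displays means, SDs and frequencies for each combination without creating variables.

```stata
* Made-up data: 10 observations in each certainty x focus combination
clear
set seed 367
set obs 60
generate byte certainty = mod(_n, 2)   // 0 = certain, 1 = uncertain
generate byte focus = mod(_n, 3)       // 0 = control, 1 = process, 2 = outcome
generate bp = rnormal(120, 5)

* Mean, SD and number of nonmissing values of bp within each combination,
* each stored in one variable that is constant within the combination
egen avgbp = mean(bp), by(certainty focus)
egen sdbp  = sd(bp), by(certainty focus)
egen nbp   = count(bp), by(certainty focus)

* Display the same statistics as a two-way table, without new variables
tabulate certainty focus, summarize(bp)
```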



        • #5
          Why not just aggregate?

          Code:
          * Making up fake data for demonstration
          clear
          input focus certainty
          0 0
          1 1
          2 0
          0 1
          1 0
          2 1
          end
          
          * Assume ten cases for each combo
          expand 10
          
          * Generate BP data
          set seed 367
          gen bp = rnormal(120, 5)
          
          * Just aggregate
          collapse (mean) avgbp = bp (sd) sdbp = bp (count) case = bp, by(focus certainty)
          list, sep(0)
          Results:
          Code:
               +-----------------------------------------------+
               | focus   certai~y      avgbp       sdbp   case |
               |-----------------------------------------------|
            1. |     0          0   120.4681   4.410926     10 |
            2. |     0          1   118.5604   6.675795     10 |
            3. |     1          0   119.9203   4.711068     10 |
            4. |     1          1    120.896   3.422762     10 |
            5. |     2          0   118.9165    6.97125     10 |
            6. |     2          1   118.4124   3.642353     10 |
               +-----------------------------------------------+
          Note that -collapse- replaces the data set in memory, so make sure to save either the data or your analysis syntax before running it.
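Another way to protect the data is to wrap -collapse- in -preserve- and -restore-: -preserve- keeps a temporary copy of the data in memory and -restore- brings it back once the aggregated results have been inspected. A minimal sketch that rebuilds the fake data from #5 so it runs on its own:

```stata
* Fake data as in #5: six focus x certainty combinations, ten cases each
clear
input focus certainty
0 0
1 1
2 0
0 1
1 0
2 1
end
expand 10
set seed 367
generate bp = rnormal(120, 5)

preserve      // keep a temporary copy of the 60 observations
collapse (mean) avgbp = bp (sd) sdbp = bp (count) case = bp, by(focus certainty)
list, sep(0)
restore       // discard the collapsed results and bring back the 60 observations
```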



          • #6
            Thanks! Worked as well. Super grateful for all the support.

            I have two types of BP: BP1 and BP2. When I run the code for avgbp again to create avgbp2, I get the error "CC already specified".

            foreach q in C U {
                foreach i in C P O {
                    generate `q'`i' = (strcertainty=="`q'" & strfocus=="`i'")
                    egen avgbp2 `q'`i' = mean(bp2) if strcertainty=="`q'" & strfocus=="`i'"
                }
            }

            I want avgbp2 so that I can create an index of avgbp1 and avgbp2 to use in a regression.

            How can I use the code provided by Sandra to create avgbp2 without getting the error "CC already specified"?

            Thanks 100 times.

            Daniel



            • #7
              The aim will prove self-defeating as each variable generated will be missing whenever any other variable is not. So the resulting set will be useless for regression.

              As already implied by #3 a single variable

              Code:
              egen avgbp2 = mean(bp2) , by(strcertainty strfocus)
              will contain all the group means compactly.

              (Although the code is not a good idea, the bug is that

              Code:
              avgbp2 `q'`i'
              should not contain a space.)
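In the spirit of #3, a loop over the two outcomes can create one complete group-mean variable for each, with no clash with the indicators CC, CP, ... from the earlier loop. (If those indicators were already created by the first loop, rerunning that loop as is would also stop at its -generate- line, because the variables already exist.) A minimal sketch on made-up data, assuming the two outcomes are stored as bp1 and bp2 (names taken from #6; the values are invented):

```stata
* Made-up data: 10 observations in each certainty x focus combination
clear
set seed 367
set obs 60
generate byte certainty = mod(_n, 2)   // 0 = certain, 1 = uncertain
generate byte focus = mod(_n, 3)       // 0 = control, 1 = process, 2 = outcome
generate bp1 = rnormal(120, 5)         // made-up stand-in for BP1
generate bp2 = rnormal(80, 5)          // made-up stand-in for BP2

* One variable per outcome, each holding the six group means
foreach v of varlist bp1 bp2 {
    egen avg`v' = mean(`v'), by(certainty focus)
}

* Unlike the group-specific variables, neither has missing values
summarize avgbp1 avgbp2
```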
