Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new variable

    Hello,

    Code:
    bysort year: egen totalsum = total(A), m
    The above code gives me the overall sum of variable A for each year (my dataset has 2 years). I want to create a new variable 'B' that equals (totalsum of year 1 / totalsum of year2). How can I do that?
    How can we get the total sum of variable A for each year in a separate column as we can then calculate B easily? I have attached an example dataset below.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(year a)
    1  1
    1  2
    1  3
    1  .
    1  5
    2 10
    2  9
    2  .
    2  7
    2  6
    end
    Thank you.

  • #2
    The result you want is just a scalar, and also easily accessible after summarize, but here is some technique. See also section 9 in https://www.stata-journal.com/articl...article=dm0055 for how to populate variables so that they occur in the same observations.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(year a)
    1  1
    1  2
    1  3
    1  .
    1  5
    2 10
    2  9
    2  .
    2  7
    2  6
    end
    
    egen total1 = total(cond(year == 1, a, .)) 
    egen total2 = total(cond(year == 2, a, .)) 
    
    su total? 
    
    gen ratio = total2 / total1 
    
    list, sepby(year) 
    
         +----------------------------------------+
         | year    a   total1   total2      ratio |
         |----------------------------------------|
      1. |    1    1       11       32   2.909091 |
      2. |    1    2       11       32   2.909091 |
      3. |    1    3       11       32   2.909091 |
      4. |    1    .       11       32   2.909091 |
      5. |    1    5       11       32   2.909091 |
         |----------------------------------------|
      6. |    2   10       11       32   2.909091 |
      7. |    2    9       11       32   2.909091 |
      8. |    2    .       11       32   2.909091 |
      9. |    2    7       11       32   2.909091 |
     10. |    2    6       11       32   2.909091 |
         +----------------------------------------+
    I see you want the reciprocal of that ratio, so tweak the code accordingly.

    Comment


    • #3
      Two methods, the second being the one that creates columns the way you suggested, but which -- as the first method shows -- is not really necessary.

      Code:
      bysort year: egen totalsum = total(a), m
      
      forval i = 1/2 {
          sum totalsum if year == `i', meanonly
          local sum`i' = r(mean)
      }
      gen b1 = `sum1'/`sum2'
      
      forval i = 1/2 {
          egen totalsum`i' = max(totalsum*(year == `i'))
      }
      gen b2 = totalsum1/totalsum2
      which produces:

      Code:
      . li , noobs sepby(year) ab(20)
      
        +----------------------------------------------------------------+
        | year    a   totalsum       b1   totalsum1   totalsum2       b2 |
        |----------------------------------------------------------------|
        |    1    1         11   .34375          11          32   .34375 |
        |    1    2         11   .34375          11          32   .34375 |
        |    1    3         11   .34375          11          32   .34375 |
        |    1    .         11   .34375          11          32   .34375 |
        |    1    5         11   .34375          11          32   .34375 |
        |----------------------------------------------------------------|
        |    2   10         32   .34375          11          32   .34375 |
        |    2    9         32   .34375          11          32   .34375 |
        |    2    .         32   .34375          11          32   .34375 |
        |    2    7         32   .34375          11          32   .34375 |
        |    2    6         32   .34375          11          32   .34375 |
        +----------------------------------------------------------------+

      Comment


      • #4
        Thank you, Nick and Hemanshu.

        Nick, the end part of section 9 in https://www.stata-journal.com/articl...article=dm0055 talks about the trick of averaging. I'm not able to understand it. Can you please explain that here, if possible?

        Comment


        • #5
          The point is to compare different ways of getting a mean into a variable. If we want the mean weight -- given mpg > 25 -- one method below is right and the other quite wrong, as a direct summarize shows.

          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . egen mean1 = mean(cond(mpg > 25, weight, .))
          
          . egen mean2 = mean((mpg > 25) * weight)
          
          . su weight if mpg > 25
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                weight |         14    2122.857    386.0536       1760       3260
          
          . l mean1 mean2 in 1
          
               +---------------------+
               |    mean1      mean2 |
               |---------------------|
            1. | 2122.857   401.6216 |
               +---------------------+

          Comment


          • #6
            Understood it now. Thank you, Nick.

            Comment


            • #7
              Hello, in the attached image, there are different types of household compositions. I need all these types as separate variables. Can someone help me with this? I have attached an example dataset for the same.


              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input double(hh_id ind_id) byte(relationship sex age marital)
              1010001 101000101 1 1 58 2
              1010001 101000102 2 2 53 2
              1010001 101000103 3 1 32 1
              1010001 101000104 3 2 29 1
              1010010 101001001 1 1 38 2
              1010010 101001002 2 2 32 2
              1010010 101001003 3 2  4 .
              1010010 101001004 4 1 75 2
              1010010 101001005 4 2 68 2
              1010014 101001401 1 1 35 2
              1010014 101001402 2 2 33 2
              1010014 101001403 3 2  8 .
              1010014 101001404 3 1  6 .
              1010014 101001405 4 1 55 2
              1010014 101001406 4 2 53 2
              1010017 101001701 1 1 25 1
              1010017 101001702 4 1 23 1
              1010017 101001704 4 1 54 2
              1010017 101001705 4 2 49 2
              1010022 101002201 1 1 31 2
              1010022 101002202 2 2 29 2
              1010022 101002203 3 2 13 1
              1010022 101002204 3 1 11 .
              1010022 101002205 3 1  9 .
              1010022 101002206 4 1 56 3
              1010023 101002301 1 1 37 2
              1010023 101002302 2 2 38 2
              1010023 101002303 4 1 69 2
              1010023 101002304 4 2 64 2
              1010025 101002501 1 1 27 1
              1010025 101002502 4 1 24 1
              1010025 101002504 4 1 65 2
              1010025 101002505 4 2 57 2
              1010027 101002701 1 1 35 2
              1010027 101002702 2 2 34 2
              1010027 101002703 3 1  2 .
              1010027 101002704 4 1 73 3
              1010036 101003601 1 1 40 2
              1010036 101003602 2 2 39 2
              1010036 101003603 3 1 10 .
              1010036 101003604 3 2  8 .
              1010036 101003605 3 2  6 .
              1010037 101003701 1 1 28 1
              1010037 101003702 4 1 24 1
              1010037 101003703 4 1 57 2
              1010037 101003704 4 2 55 2
              1010044 101004401 1 1 27 1
              1010044 101004403 4 1 22 1
              1010044 101004404 4 1 66 2
              1010044 101004405 4 2 63 2
              end
              label values relationship lab_rel
              label def lab_rel 1 "head", modify
              label def lab_rel 2 "spouse", modify
              label def lab_rel 3 "child", modify
              label def lab_rel 4 "extended family", modify
              label values sex HV104
              label def HV104 1 "male", modify
              label def HV104 2 "female", modify
              label values age HV105
              label values marital lab_mar
              label def lab_mar 1 "never married", modify
              label def lab_mar 2 "currently married", modify
              label def lab_mar 3 "widowed", modify
              Thanks.
              Attached Files

              Comment


              • #8
                #7 should have been the start of a new thread.

                egen is your friend, in a nutshell. One kind of calculation is a count such as

                Code:
                egen adult_female = total(sex == 2 & inrange(age, 18, 64)), by(hhid)
                and other calculations depend on min() and max().

                Watch out that missing values count as arbitrarily large and that some variables need to be generated in two or three steps.

                Comment


                • #9
                  Thank you, Nick.

                  Comment

                  Working...
                  X