  • Standardized variable by year

    Hi everyone,
    I want to standardize a variable within each year. So I used
    Code:
    egen zCSR= std( CSR),by( year)
    but I got this message from Stata: egen ... std() may not be combined with by.
    Is there another way to do this? My variable is categorical, in case that matters.
    Thanks in advance, and sorry if my question is too simple!
    Last edited by salma ktat; 02 Dec 2014, 11:32.

  • #2
    So you have to write a loop:

    Code:
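    * Start with an empty result variable, then fill it in one year at a time
    gen zCSR = .
    tempvar junk
    levelsof year, local(years)
    foreach y of local years {
        * egen must create a new variable, so standardize into a temporary one
        egen `junk' = std(CSR) if year == `y'
        replace zCSR = `junk' if year == `y'
        * Drop it so egen can recreate it on the next pass
        drop `junk'
    }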
    And if you want to check that it was done correctly, you can see the results with -tabstat zCSR, by(year) statistics(mean sd)-. Within each year the mean should be essentially 0 and the sd 1.


    • #3
      Thanks Clyde, this works well; the means and sds look right. I just have one question: why did you use two commands,
      Code:
      egen `junk' = std(CSR) if year == `y'
      replace zCSR = `junk' if year == `y'
      Why not use just one command:
      Code:
      egen zCSR = std(CSR) if year == `y'
      Thanks again.


      • #4
        I would not write a loop here.

        Code:
        egen mean = mean(CSR), by(year)
        egen z = sd(CSR), by(year)
        replace z = (CSR - mean) / z
        Last edited by Nick Cox; 02 Dec 2014, 12:22.


        • #5
          To answer Salma's question about why I used two commands instead of one: this was necessitated by the loop. Had I just used the single command -egen zCSR = std(CSR) if year == `y'-, then the second time through the loop zCSR would already exist, and -egen-, which always wants to create a new variable, would complain and stop.

          Avoiding this complication is one advantage of Nick's approach over mine.
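
          A minimal sketch of the complaint described above, using Stata's shipped auto dataset rather than the poster's data (the variable name zprice is purely illustrative):

          Code:
          sysuse auto, clear
          * First call: zprice does not exist yet, so egen creates it
          egen zprice = std(price) if foreign == 0
          * Second call: zprice now exists, so egen refuses; capture traps the error
          capture egen zprice = std(price) if foreign == 1
          * A nonzero return code confirms that the second egen failed
          display _rc

          Run without -capture-, the second -egen- would halt a do-file or loop at that point, which is why the loop routes each year's result through a temporary variable.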


          • #6
            It seems an anomaly that std() doesn't support operations by: or by(). The idea makes perfect sense.


            • #7
              Nick, how would this code change if I wanted to calculate std() of the variable for each industry (sic) and year (year)?

              Thank you


              • #8
                Just change what you feed to by().

                By (sic) I take it you mean (SIC). See also https://en.wikipedia.org/wiki/Sic


                • #9
                  Thank you Nick
                  By (sic), I meant industry code. My question was about calculating the standardised value of a variable, grouping by both (sic) and (year).

                  Thank you


                  • #10
                    Sure, and I answered you. Just change what you feed to by(). So, the option will be by(sic year) (assuming that those are your variable names).
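
                    Spelled out, that might look like the sketch below. It assumes the variables are named CSR, sic and year as in this thread; mean_CSR, sd_CSR, zCSR and cell are illustrative names, not anything from the original posts.

                    Code:
                    * Mean and sd of CSR within each industry-year cell
                    egen mean_CSR = mean(CSR), by(sic year)
                    egen sd_CSR = sd(CSR), by(sic year)
                    * z-score relative to the observation's own industry-year cell
                    gen zCSR = (CSR - mean_CSR) / sd_CSR
                    * Check: tabstat's by() takes one variable, so build a cell identifier first
                    egen cell = group(sic year)
                    tabstat zCSR, by(cell) statistics(mean sd)

                    Cells with a single observation have a missing sd, so zCSR will be missing there.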


                    • #11
                      Thank you Nick.
