  • egen for specific time period

    Hi,

    I want to calculate the standard deviation of a variable within specific groups of years, for example:

    year   variable   SD_variable
    1990   1          a
    1991   2          a
    1992   3          a
    1993   4          b
    1994   5          b
    1995   6          b

    There is also a countrycode variable, so I think I need to use egen:

    > bysort countrycode: egen SD_variable= sd(variable)

    But the above command calculates the SD over the whole of 1990-95,

    whereas I want to calculate it separately for the periods 1990-92 and 1993-95.


    Can you please help?





  • #2
    Is this correct?
    egen meangdp = mean(gdp) if inrange(year,1963,1980), by(country)



    • #3
      From what you show, the code is

      Code:
      egen meangdp = mean(gdp), by(SD_variable)
      but it is also possible that you are not explaining very well what you want. In particular, do you have this SD_variable, or do you want to calculate it?



      • #4
        That could be correct but over any long period a geometric mean might make more sense. For periods 1990-1992, 1993-1995 consider this


        Code:
        . clear
        
        . set obs 6
        number of observations (_N) was 0, now 6
        
        . gen whatever = _n
        
        . gen year = 1989 + _n
        
        . egen mybin = seq(), block(3)
        
        . egen mean = mean(whatever), by(mybin)
        
        . l, sep(3)
        
             +--------------------------------+
             | whatever   year   mybin   mean |
             |--------------------------------|
          1. |        1   1990       1      2 |
          2. |        2   1991       1      2 |
          3. |        3   1992       1      2 |
             |--------------------------------|
          4. |        4   1993       2      5 |
          5. |        5   1994       2      5 |
          6. |        6   1995       2      5 |
             +--------------------------------+
        and you naturally may work with blocks for periods and countries at once, say

        Code:
        egen mean2 = mean(whatever), by(country mybin)
        Much more at

        SJ-18-3 dm0095 . . . . . . . . . . . Speaking Stata: From rounding to binning
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q3/18 SJ 18(3):741--754 (no commands)
        basic review of how to bin variables in Stata, meaning how to
        divide their range or support into disjoint intervals

        https://www.stata-journal.com/articl...article=dm0095
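
        A possible variation on the above, offered as a sketch rather than as part of the original answer: seq() numbers observations in their current order, so block(3) lines up with the periods only when each panel has exactly one observation per year. Binning on the value of year does not depend on that. The start year 1990 and the width 3 come from the example in #1; the names period and sd_whatever are made up for illustration.

        ```stata
        clear
        set obs 6
        gen whatever = _n
        gen year = 1989 + _n

        * label each 3-year bin by its first year: 1990, 1993, ...
        * (assumes bins start in 1990 and are 3 years wide)
        gen period = 1990 + 3 * floor((year - 1990) / 3)

        * standard deviation within each bin
        egen sd_whatever = sd(whatever), by(period)

        list, sep(3)
        ```

        With panel data, by(country period) in place of by(period) would give one SD per country and period.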



        • #5
          Hi Joro Kolev,
          I want to calculate SD_variable.
          I think inrange() could be a good way to go.
          So could the command be this?
          Code:
          egen meangdp = mean(gdp) if inrange(year,1963,1980), by(country)



          • #6
            I think you want to do what Nick explained above.

            First you need to create a variable that contains your bins for the years. For example, in the initial data that you showed, what you called SD_variable contained one value (a) for the first three years and another value (b) for the other three years. You need to create such a variable yourself, because you know what your bins are, but neither I nor Stata know them.

            There are cleverer ways to do this, but for me to suggest them you need to say exactly what your years are and how you want to bin them.

            Brute force is always an option:

            Code:
            gen bin = 1 if inrange(year,1963,1980)
            replace bin = 2 if inrange(year,1981,1985)
            replace bin = 3 if inrange(year,1986,1990)
            * ... and so on until you have created all the bins
            Once you have the bins

            Code:
            egen meangdp = mean(gdp), by(country bin)
            will give you the mean GDP in the country within the respective bin, and

            Code:
            egen sdgdp = sd(gdp), by(country bin)
            will give you the standard deviation of GDP in the country within the respective bin.
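
            As one possible shortcut for the brute-force step, the three bins shown above could be built in a single statement with irecode(). This is only a sketch: the cutoffs 1980, 1985 and 1990 are taken from the example, any further bins are unknown, and it assumes years outside 1963-1990 should belong to no bin.

            ```stata
            * irecode(year, 1980, 1985, 1990) returns
            *   0 if year <= 1980
            *   1 if 1980 < year <= 1985
            *   2 if 1985 < year <= 1990
            *   3 if year > 1990
            * adding 1 makes the codes match bins 1, 2, 3 above
            gen bin = 1 + irecode(year, 1980, 1985, 1990)

            * assumption: years outside 1963-1990 belong to no bin
            replace bin = . if !inrange(year, 1963, 1990)
            ```

            Missing values of bin are ignored as a group by nothing in particular: with by(country bin), observations with missing bin simply form their own group, so check them before using the results.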


            Originally posted by Souryabrata Mohapatra
            Hi Joro Kolev,
            I want to calculate SD_variable
            I think inrange could be a good way to go
            So the command can be:
            Code:
            egen meangdp = mean(gdp) if inrange(year,1963,1980), by(country)
            ?



            • #7
              Hi Nick Cox,
              What I did is:
              Code:
              egen SD_variable1 = sd(variable) if inrange(year,1990,1992), by(id)
              egen SD_variable2 = sd(variable) if inrange(year,1993,1995), by(id)
              gen SD_variable = max(SD_variable1, SD_variable2)
              drop SD_variable1 SD_variable2
              Is that okay?



              • #8
                So your SDs are based on at most 3 values?

                You could do this

                Code:
                gen period = cond(inrange(year, 1990, 1992), 1, cond(inrange(year, 1993, 1995), 2, .))
                egen SD = sd(variable), by(id period)



                • #9
                  No, Nick Cox.
                  That was just an example to clarify my doubts.
                  I have 30 years of panel data.
                  I divided it into two periods of 15 years each.
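
                  For a 30-year panel split into two 15-year periods, the approach from #8 might be adapted roughly as follows. This is a sketch under assumptions: the variable names year, variable and id are those used in #7, the actual years are not given in the thread so the first year is read from the data, and n_values is a made-up name.

                  ```stata
                  * find the first year in the data; summarize stores it in r(min)
                  summarize year, meanonly

                  * period 1 = first 15 years, period 2 = next 15 years
                  gen period = 1 + floor((year - r(min)) / 15)

                  * SD within each panel and period
                  egen SD_variable = sd(variable), by(id period)

                  * how many nonmissing values each SD is based on
                  egen n_values = count(variable), by(id period)

                  * check that only periods 1 and 2 occur
                  tabulate period
                  ```

                  Looking at n_values answers the question in #8 directly: it shows how many observations each SD rests on.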
