  • Generate dummy based on values of another variable

    Dear Statalist community,

    I have a dataset on the length of growing seasons (los) at different locations (id's) over multiple years. The locations shown below have two growing seasons per year. The variable meanlos indicates the long-term average los of this location's first and second season, respectively. I want to find out if it is the first or second season that typically is the long one.

    For example, I could create a new dummy variable "long" that is 1 if the season is the long one and 0 otherwise. Here the variable would be 0 if firstseason == 1 and 1 if firstseason == 2, because meanlos is higher for the second season. Do you have any advice on how to do that?

    Thanks a lot!

    Code:
    clear
    input double id int year byte(firstseason los) float meanlos
    3 1982 1  4         4
    3 1982 2 13 10.242424
    3 1983 1  4         4
    3 1983 2 13 10.242424
    3 1984 1  5         4
    3 1984 2  3 10.242424
    3 1985 1  4         4
    3 1985 2 12 10.242424
    3 1986 1  2         4
    3 1986 2  3 10.242424
    3 1987 1  7         4
    3 1987 2 10 10.242424
    3 1988 1  6         4
    3 1988 2 10 10.242424
    3 1989 1  5         4
    3 1989 2  7 10.242424
    3 1990 1  3         4
    3 1990 2 13 10.242424
    end

  • #2
    I don't see why you need a new variable. Given an appropriate model with season as an indicator (so-called dummy), the sign of the coefficient on season indicates whether season 2 tends to be longer than season 1, or conversely.
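As a rough illustration of the model-based idea, here is a minimal sketch that assumes a plain linear regression is "an appropriate model" (the post does not name one) and uses a few rows of the example data. The factor-variable notation i.firstseason is my choice, not something stated in the thread.

```stata
* Sketch only: a plain linear regression stands in for "an appropriate model".
clear
input double id int year byte(firstseason los)
3 1982 1  4
3 1982 2 13
3 1983 1  4
3 1983 2 13
3 1984 1  5
3 1984 2  3
end

* i.firstseason treats season 1 as the base level, so the coefficient on
* 2.firstseason is the mean of los in season 2 minus the mean in season 1.
regress los i.firstseason

* A positive coefficient means season 2 tends to be the longer season.
display _b[2.firstseason]
```

With several locations, the same regression would presumably be run separately for each id, since the direction can differ between locations.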


    • #3
      Thanks, Nick! I thought about maybe excluding the short seasons from my later analysis. Depending on the id's, sometimes season 2 and sometimes season 1 is the long season. This is why I wanted to create a dummy that tells me, for each id, which season is the long one.
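One way to get an indicator that is constant within each id and season, as described here, is to compare meanlos with its maximum within id. This is a minimal sketch under two assumptions: meanlos is constant within each id and season, as in the example data, and the name longseason is hypothetical. Note that "long" itself cannot be used as a variable name, because it is a reserved word in Stata (a storage type).

```stata
* Sketch only: flags the season with the higher long-term mean, per location.
clear
input double id int year byte(firstseason los) float meanlos
3 1982 1  4         4
3 1982 2 13 10.242424
3 1984 1  5         4
3 1984 2  3 10.242424
end

* Largest long-term mean within each location
bysort id : egen maxmeanlos = max(meanlos)

* 1 for every row of the season with the higher long-term mean, 0 otherwise;
* left missing where meanlos is missing.
gen byte longseason = meanlos == maxmeanlos if !missing(meanlos)

list id year firstseason los meanlos longseason, sepby(id year)
```

In 1984 the first season happened to be longer (los 5 against 3), but longseason still marks season 2, because it follows the long-term mean rather than the individual year. If both seasons of a location had the same meanlos, both would be flagged. Short seasons could then be left out of later commands with a qualifier such as if longseason == 1.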


      • #4
        That sounds like


        Code:
        clear
        input double id int year byte(firstseason los) float meanlos
        3 1982 1  4         4
        3 1982 2 13 10.242424
        3 1983 1  4         4
        3 1983 2 13 10.242424
        3 1984 1  5         4
        3 1984 2  3 10.242424
        3 1985 1  4         4
        3 1985 2 12 10.242424
        3 1986 1  2         4
        3 1986 2  3 10.242424
        3 1987 1  7         4
        3 1987 2 10 10.242424
        3 1988 1  6         4
        3 1988 2 10 10.242424
        3 1989 1  5         4
        3 1989 2  7 10.242424
        3 1990 1  3         4
        3 1990 2 13 10.242424
        end
        
        bysort id year (los) : gen longer = _n == 2 
        
        list, sepby(id year)
        
             +------------------------------------------------+
             | id   year   firsts~n   los    meanlos   longer |
             |------------------------------------------------|
          1. |  3   1982          1     4          4        0 |
          2. |  3   1982          2    13   10.24242        1 |
             |------------------------------------------------|
          3. |  3   1983          1     4          4        0 |
          4. |  3   1983          2    13   10.24242        1 |
             |------------------------------------------------|
          5. |  3   1984          2     3   10.24242        0 |
          6. |  3   1984          1     5          4        1 |
             |------------------------------------------------|
          7. |  3   1985          1     4          4        0 |
          8. |  3   1985          2    12   10.24242        1 |
             |------------------------------------------------|
          9. |  3   1986          1     2          4        0 |
         10. |  3   1986          2     3   10.24242        1 |
             |------------------------------------------------|
         11. |  3   1987          1     7          4        0 |
         12. |  3   1987          2    10   10.24242        1 |
             |------------------------------------------------|
         13. |  3   1988          1     6          4        0 |
         14. |  3   1988          2    10   10.24242        1 |
             |------------------------------------------------|
         15. |  3   1989          1     5          4        0 |
         16. |  3   1989          2     7   10.24242        1 |
             |------------------------------------------------|
         17. |  3   1990          1     3          4        0 |
         18. |  3   1990          2    13   10.24242        1 |
             +------------------------------------------------+
        
        .
        See https://www.stata-journal.com/articl...article=dm0099 for an argument in favour of the term indicator variables.
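The bysort line above sorts the observations within each id and year by los, so the second of two observations is the one with the larger los. That relies on exactly two seasons per id and year, and if the two seasons tie on los the order within the pair is arbitrary. A minimal sketch for flagging such ties follows; the variable name tie is hypothetical, and the data are a few rows of the example.

```stata
* Sketch only: same indicator as above, plus a flag for tied seasons.
clear
input double id int year byte(firstseason los)
3 1982 1  4
3 1982 2 13
3 1984 1  5
3 1984 2  3
end

* Within each id-year, sort on los; _n == _N picks the last (largest) row.
* This equals _n == 2 when there are exactly two seasons per id-year.
bysort id year (los) : gen byte longer = _n == _N

* Flag id-years in which the smallest and largest los are equal; there
* the choice of "longer" depends on an arbitrary sort order.
by id year : gen byte tie = los[1] == los[_N]

list, sepby(id year)
```

Missing values of los sort last in Stata, so a missing season would be marked as the longer one; that case would need separate handling.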


        • #5
          This approach works perfectly fine, thank you!

          I tried something else, which should give the same result:

          Code:
          bysort id year: egen maxlos = max(los) 
          sort id year
          
          gen major = 0
          replace major = 1 if los == maxlos


          • #6
            Your last two statements can be condensed to

            Code:
            gen major = los == maxlos
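The condensed form works because a true-or-false expression in Stata evaluates to 1 or 0. Below is a minimal sketch on a few rows of the example data; the byte storage type and the if !missing(los) qualifier are additions of mine, not part of the posts above.

```stata
* Sketch only: one-line indicator, checked against the sort-based one from #4.
clear
input double id int year byte(firstseason los)
3 1982 1  4
3 1982 2 13
3 1984 1  5
3 1984 2  3
end

bysort id year : egen maxlos = max(los)

* los == maxlos evaluates to 1 (true) or 0 (false), so no replace is needed.
* The if qualifier leaves major missing where los is missing; without it,
* an id-year whose los values are all missing would get major == 1.
gen byte major = los == maxlos if !missing(los)

* Compare with the sort-based indicator from #4
bysort id year (los) : gen byte longer = _n == 2
assert major == longer
```

The two indicators agree here because no id-year has tied seasons. With a tie, major would be 1 for both seasons, whereas longer would be 1 for only one of them.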


            • #7
              Great, thank you very much!
