Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating complicated indicator variable

    I have data on garden participation in a program from years 2004-2018. Gardens can have entered the program any year, and stayed in the program for any number of years..

    Variables: Garden_Active_Year, where Year equals 2004-2018. Dummy variable for garden being active in that year (1=yes, 0=no)
    Year_Started: first year garden was in the program

    I would like to make an indicator variable for gardens that entered the program, left the program at some point, and then came back.
    For example, Garden_Active_2004=1, Garden_Active_2005=0, Garden_Active_2006=1, Garden_Active_2007=1.


    I wasn't able to dataex the variables for all years, but I have included the data for years 2004-2007 below.

    Please let me know if you need any clarification for this.

    Many thanks.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float Garden_Active_2004 byte(Garden_Active_2005 Garden_Active_2006 Garden_Active_2007) float Year_Started
    1 1 1 1 2004
    1 0 0 0 2004
    1 1 1 1 2004
    1 1 1 1 2004
    1 1 1 1 2004
    1 0 1 1 2004
    1 0 0 0 2004
    1 1 0 0 2004
    1 0 0 0 2004
    1 0 1 1 2004
    1 0 0 0 2004
    1 0 0 0 2004
    1 0 1 1 2004
    1 1 1 1 2004
    1 1 1 1 2004
    1 1 0 0 2004
    1 1 1 1 2004
    1 0 0 1 2004
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 0 2005
    0 1 1 0 2005
    0 1 1 0 2005
    0 1 1 0 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 0 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 1 1 2005
    1 1 1 1 2004
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 0 2005
    0 1 0 0 2005
    0 1 1 0 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 0 2005
    0 1 0 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 0 0 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 1 1 2005
    0 1 0 1 2005
    0 1 1 1 2005
    0 1 1 0 2005
    0 1 1 1 2005
    0 0 1 0 2006
    0 0 1 0 2006
    0 0 1 1 2006
    0 0 1 0 2006
    0 0 1 0 2006
    0 0 1 1 2006
    0 0 1 1 2006
    end

  • #2
    Hey Alyssa,

    At first glance, I'd say this is a very strange way to record this data. Wouldn't it be easier to create Panel data, where you give each garden an identifier and then mark it active in each year?
    That way, you don't need to have all these variables 'Active_year_2004', 'Active_year_2005', etc. Just something to think about.

    If this is your preferred way of organizing the data, I'd solve your problem like this:
    Code:
    gen sum_active_years = Garden_Active_2004 + Garden_Active_2005 + Garden_Active_2006 + Garden_Active_2007
    gen total_active_years = 2008 - Year_Started
    gen returnedDummy = 0
    replace returnedDummy = 1 if sum_active_years < total_active_year
    Wherein you replace '2008' on the second line with the final year + 1 of your dataset.

    This solution is a bit hacky and depending on how many years you have, you could do a loop over the variables. Anyways, this should work.

    If you need a more elegant/better solution, I could potentially get you one with a little more detail about your dataset

    Comment


    • #3
      Like Jesse Tielens, I wondered if it would be better to have a LONG file format, with a GardenID variable, and multiple rows per garden. However, for the WIDE file layout you have, I think the following works:

      Code:
      generate byte left = 0
      generate byte cameback = 0
      forvalues y = 2005(1)2018 {
       local prev = `y'-1
       replace left = 1 if !left & Garden_Active_`prev' & !Garden_Active_`y'
       replace cameback = 1 if !cameback & left & !Garden_Active_`prev' & Garden_Active_`y'
      }
      For the data you posted in #1, you have to loop from 2005 to 2007, not 2018.

      HTH.
      --
      Bruce Weaver
      Email: [email protected]
      Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
      Version: Stata/MP 18.0 (Windows)

      Comment


      • #4
        If you concatenate the history to a string

        Code:
        egen history = concat(Garden_Active*)
        then you are looking for a pattern such as

        101

        or

        1001

        or

        10001

        and so on. Regular expression experts can write syntax for that. But we will use a more Stataish way. Warning: not tested.

        So let's keep track only of changes either way. That is all, of those are abbreviated to 101

        Code:
        gen history = string(Garden_Active_2004) 
        
        forval y = 5/18 
             local Y : di %02.0f `y' 
             replace history = history + string(Garden_Active_`Y') if  string(Garden_Active_`Y')  != substr(history, -1, 1) 
        } 
        
        gen wanted = strpos(history, "101") > 0

        Comment


        • #5
          Inspired by Nick’s suggestion, the below code compares the first position of “10” and the last position of “01” in history as an alternative to take you to the target.
          Code:
          egen h = concat(Garden_Active*)
          gen wanted = strpos(h,"10") >0 & strpos(h,"10") < strrpos(h,"01")
          drop h

          Comment


          • #6
            Romalpa Akzo That's nice. Also, I think

            Code:
            if strmatch(history, "*10*1*")

            Comment


            • #7
              Perfect! Nick Cox sensei. I do learn a new thing - such a convenient and sharp "tool". Please kindly allow me to rewrite down here your solution for future reference.

              Code:
              egen history = concat(Garden_Active*)
              gen wanted = strmatch(history, "*10*1*")

              Comment


              • #8
                All that said, Jesse Tielens in #2 and Bruce Weaver in #3 are right. A long data layout would be better for most purposes. I don't know why we can't see an identifier in #1.

                Comment


                • #9
                  Kudos to Nick & Romalpa--that's a neat solution. To confirm that it has worked, you could add a cross-tabulation of variables history and wanted.

                  Code:
                  . tabulate history wanted
                  
                             |        wanted
                     history |         0          1 |     Total
                  -----------+----------------------+----------
                        0010 |         4          0 |         4
                        0011 |         3          0 |         3
                        0100 |        25          0 |        25
                        0101 |         0          3 |         3
                        0110 |         9          0 |         9
                        0111 |        37          0 |        37
                        1000 |         5          0 |         5
                        1001 |         0          1 |         1
                        1011 |         0          3 |         3
                        1100 |         2          0 |         2
                        1111 |         8          0 |         8
                  -----------+----------------------+----------
                       Total |        93          7 |       100
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                  Version: Stata/MP 18.0 (Windows)

                  Comment


                  • #10
                    Thank you all for your comments. I'm not opposed to transforming the data into a long format, but in that case, how would I tabulate how many gardens entered, left, and came back, for each starting year?

                    Comment


                    • #11
                      Also that was my bad in #1. There is an identifier (Garden_ID), which I must have forgotten to include when specifying variables in dataex.

                      Comment


                      • #12
                        If your layout is


                        Code:
                        garden  date  active (0 or 1)
                        then

                        Code:
                        bysort garden (date) : gen change = active - active[n-1]
                        is -1 for left, +1 for entered. And so on.

                        Also, whatever can be calculated easily in wide layout can always be merged with the data in long layout.

                        Further even in long layout, the history can be created by concatenating rows.

                        Code:
                        gen history = "" 
                        bysort garden (year) : replace history = history[_n-1] + string(active)   
                        by garden : replace history = history[_N]

                        Comment


                        • #13
                          Just to add onto that. Your data is now on wide format, if you have a unique identifier and the garden_active_xxxx variables foor your data. You can easilyitch it to long format by typing:

                          Code:
                          reshape long Garden_Active_, i(Garden_ID) j(year)
                          rename garden_active_ garden_active
                          Most likely, you already knew this. But if you didn't this will save you a lot of time switching around columns etc. manually in Excel or some other software.

                          Comment


                          • #14
                            Putting together the pieces from various posts above:

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear *
                            input float Garden_Active_2004 byte(Garden_Active_2005 Garden_Active_2006 Garden_Active_2007) float Year_Started
                            1 1 1 1 2004
                            1 0 0 0 2004
                            1 1 1 1 2004
                            1 1 1 1 2004
                            1 1 1 1 2004
                            1 0 1 1 2004
                            1 0 0 0 2004
                            1 1 0 0 2004
                            1 0 0 0 2004
                            1 0 1 1 2004
                            1 0 0 0 2004
                            1 0 0 0 2004
                            1 0 1 1 2004
                            1 1 1 1 2004
                            1 1 1 1 2004
                            1 1 0 0 2004
                            1 1 1 1 2004
                            1 0 0 1 2004
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 0 2005
                            0 1 1 0 2005
                            0 1 1 0 2005
                            0 1 1 0 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 0 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            1 1 1 1 2004
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 0 2005
                            0 1 0 0 2005
                            0 1 1 0 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 0 2005
                            0 1 0 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 0 0 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 1 1 2005
                            0 1 0 1 2005
                            0 1 1 1 2005
                            0 1 1 0 2005
                            0 1 1 1 2005
                            0 0 1 0 2006
                            0 0 1 0 2006
                            0 0 1 1 2006
                            0 0 1 0 2006
                            0 0 1 0 2006
                            0 0 1 1 2006
                            0 0 1 1 2006
                            end
                            
                            * Reshape data to LONG format
                            generate Garden_ID = _n // Skip this if you already have a Garden ID variable
                            reshape long Garden_Active_, i(Garden_ID) j(year)
                            rename Garden_Active_  active
                            drop if year < Year_Started // Drop unneeded records
                            
                            generate history = ""
                            bysort Garden_ID (year) : replace history = history[_n-1] + string(active)   
                            by Garden_ID : replace history = history[_N]
                            generate byte wanted = strmatch(history, "*10*1*")
                            egen byte tagged = tag(Garden_ID) // Tag one record per garden
                            tabulate history wanted if tagged // Confirm that selection of records worked
                            Output from -tabulate-:

                            Code:
                            . tabulate history wanted if tagged // Confirm that selection of records worked
                            
                                       |        wanted
                               history |         0          1 |     Total
                            -----------+----------------------+----------
                                    10 |         4          0 |         4
                                   100 |        25          0 |        25
                                  1000 |         5          0 |         5
                                  1001 |         0          1 |         1
                                   101 |         0          3 |         3
                                  1011 |         0          3 |         3
                                    11 |         3          0 |         3
                                   110 |         9          0 |         9
                                  1100 |         2          0 |         2
                                   111 |        37          0 |        37
                                  1111 |         8          0 |         8
                            -----------+----------------------+----------
                                 Total |        93          7 |       100

                            --
                            Bruce Weaver
                            Email: [email protected]
                            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                            Version: Stata/MP 18.0 (Windows)

                            Comment


                            • #15
                              Thanks all for your comments. I decided to go with @Romalpa Akzo 's solution in #7.

                              Comment

                              Working...
                              X