  • compare values within one observation first, loop through all observations and create a new variable

    I am trying to do a descriptive analysis of people's moving behaviour. My data look like this (the real data contain more observations and years):

    id  address_1990  address_1991  address_1992  move
    1   a             b             b             1
    2   c             c             c             0
    3   d             e             d             2
    4   f             .             f             0
    5   g             .             h             1

    The last variable, "move", is my goal; I don't have it initially. I want to create such a variable to record the number of moves for each individual, given their past addresses. In particular, I am having trouble handling the missing addresses.

    Any help? Thanks a lot!

  • #2
    As with almost all data management and analysis in Stata this is much easier in long layout than wide:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id str1(addr1990 addr1991 addr1992) byte move_desired
    1 "a" "b" "b" 1
    2 "c" "c" "c" 0
    3 "d" "e" "d" 2
    4 "f" ""  "f" 0
    5 "g" ""  "h" 1
    end
    
    
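    reshape long addr, i(id) j(year)
    * years with a missing address say nothing about a move
    drop if missing(addr)
    * running count of address changes within id, copied to every row of that id
    * (-generate- is used because _n subscripts are not recommended inside -egen-)
    by id (year), sort: gen int moves = sum(addr != addr[_n-1] & _n > 1)
    by id (year): replace moves = moves[_N]
    reshape wide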
    I don't know what you will be doing next with this data, but it is likely that that, too, will be facilitated by a long layout. So you are probably best off not doing the -reshape wide- at the end there. But that's up to you. Note, by the way, that no explicit loops are required.
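
    A minimal sketch of how the long layout can carry later steps, assuming the example data above: it flags the year of each move rather than only counting them. The variable names moved and moves are illustrative, and a running sum() under by: is used so that _n is evaluated within each id.

    ```stata
    clear
    input byte id str1(addr1990 addr1991 addr1992) byte move_desired
    1 "a" "b" "b" 1
    2 "c" "c" "c" 0
    3 "d" "e" "d" 2
    4 "f" ""  "f" 0
    5 "g" ""  "h" 1
    end

    reshape long addr, i(id) j(year)
    drop if missing(addr)

    * moved = 1 in any year whose address differs from the previous known address
    by id (year), sort: gen byte moved = addr != addr[_n-1] & _n > 1

    * total moves per person, repeated on every row of that person
    by id (year): gen int moves = sum(moved)
    by id (year): replace moves = moves[_N]

    * check against the hand-coded target from the example
    assert moves == move_desired
    list id year addr moved moves, sepby(id)
    ```

    Keeping the per-year flag makes it possible to ask when a move happened, which the wide layout does not record.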

    In the future, please use the -dataex- command to post example data, as I have done above. You can get it by running -ssc install dataex-, and then read -help dataex- for instructions on how to use it. It took me far longer to create your data example in my Stata than it did to solve your problem. When -dataex- is used, it is a simple copy, paste, and run to create a completely faithful replica of the example data to work with. Please help those who want to help you.



    • #3
      Thanks a lot! Your code works well for my question. I have installed dataex. For future use: if I need to post some data, should I just copy the output from Stata and paste it here, like below?

      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str18 make int(price mpg rep78)
      "AMC Concord" 4099 22 3
      "AMC Pacer" 4749 17 3
      "AMC Spirit" 3799 22 .
      "Buick Century" 4816 20 3
      "Buick Electra" 7827 15 4
      end



      • #4
        Most people giving advice on Statalist prefer that you put the data example in a code block. If you click on the # sign in the toolbar above the box where you enter the text of your question (at least that's where it appears when replying to a question; it's been a while since I've started a thread myself), you get two code delimiters with the cursor in between. Just paste the example there and it will appear nicely formatted and easier to read, similar to the gray box that you see in Clyde's response in #2.



        • #5
          I got it, and it is generated by dataex. The instruction was right there in the Stata output; I was just blind and didn't follow it. Thanks for your comment, though!

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str18 make int(price mpg rep78)
          "AMC Concord"   4099 22 3
          "AMC Pacer"     4749 17 3
          "AMC Spirit"    3799 22 .
          "Buick Century" 4816 20 3
          "Buick Electra" 7827 15 4
          end
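
          For reference, an example like the one above could be produced along these lines. This is only a sketch: it assumes the rows come from the first five observations of the auto dataset shipped with Stata and that -dataex- is already installed.

          ```stata
          sysuse auto, clear
          * print the chosen variables for the first five observations as a pasteable -input- block
          dataex make price mpg rep78 in 1/5
          ```

          The text that -dataex- writes to the Results window is what gets pasted between the code delimiters.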



          • #6
            I have the same question. I am looking for an alternative that does not require converting the data to long format (in my case, reverting back to long).



            • #7
              With the example data from #2, you can concatenate the addresses and collapse runs of repeated characters. But this works only with single-character addresses, and I do not understand the reluctance to reshape.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte id str1(addr1990 addr1991 addr1992) byte move_desired
              1 "a" "b" "b" 1
              2 "c" "c" "c" 0
              3 "d" "e" "d" 2
              4 "f" ""  "f" 0
              5 "g" ""  "h" 1
              end

              * join the addresses into one string (missing years contribute nothing)
              egen all= concat(addr*)
              * collapse each run of a repeated character to a single character
              replace all= ustrregexra(all, "(.)\1{1,}", "$1")
              * moves = number of runs minus one
              gen wanted= length(all)-1
              Result:

              Code:
              . l

                   +---------------------------------------------------------------+
                   | id   addr1990   addr1991   addr1992   move_d~d   all   wanted |
                   |---------------------------------------------------------------|
                1. |  1          a          b          b          1    ab        1 |
                2. |  2          c          c          c          0     c        0 |
                3. |  3          d          e          d          2   ded        2 |
                4. |  4          f                     f          0     f        0 |
                5. |  5          g                     h          1    gh        1 |
                   +---------------------------------------------------------------+



              • #8
                Note that if moving back to an older address is still counted as a move, Andrew's code in #7 does not capture that. Moreover, it seems unlikely that addresses in your actual data are single characters. Given that, I would suggest using a loop.

                One more thing: whatever approach you use, be careful (about spaces, typos, and so on) when comparing string variables.
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte id str1(addr1990 addr1991 addr1992) byte move_desired
                1 "a" "b" "b" 1
                2 "c" "c" "c" 0
                3 "d" "e" "d" 2
                4 "f" ""  "f" 0
                5 "g" ""  "h" 1
                6 "x" "y" "x" 2
                end
                
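                * running count of moves and the most recent non-missing address
                gen countmove = 0
                gen lastaddr = addr1990

                forval y = 1991/1992 {
                    * count a move only when this year's and the last known address are both non-missing
                    replace countmove = countmove + 1 if addr`y' != lastaddr & addr`y' != "" & lastaddr != ""
                    * carry the latest known address forward across missing years
                    replace lastaddr = addr`y' if addr`y' != ""
                }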
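
                The caution about spaces and typos can be sketched as a cleaning step run before the comparison. This is only an assumption about what real address strings might need (trimming, collapsing internal blanks, ignoring case). The sample addresses and the clean suffix are made up for the illustration, and genuine typos would still need manual review.

                ```stata
                clear
                input byte id str12(addr1990 addr1991)
                1 "12 Main St"  "12 main st "
                2 "5 Oak Ave"   "5  Oak Ave"
                3 "9 Elm Rd"    "7 Elm Rd"
                end

                foreach v of varlist addr1990 addr1991 {
                    * lower-case, strip leading/trailing blanks, collapse runs of internal blanks
                    gen `v'clean = strlower(stritrim(strtrim(`v')))
                }

                * a move is flagged only when the cleaned strings differ
                gen byte moved = addr1990clean != addr1991clean & addr1991clean != ""
                list id addr1990clean addr1991clean moved
                ```

                Comparing the cleaned copies rather than the raw strings keeps the original addresses intact for later inspection.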
