Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create Indicator Based on Change in Another Variable

    Hi again, STATA friends!

    Great first experience using the forum, thank you to the respondents for the help. I now have a question regarding categorical variables based off of the content of other variables.

    I'm trying to create a field which will tell me if, and how many times, an individual in the data changed living location based on city. I could be more specific and try to obtain changes by the field containing street address, but there is input [spelling] error on the dataset's side which would taint the results. City changes will be fine for now.

    I used the following code to successfully create a variable:

    by ind_id (city), sort: gen changed = (city[1] != city[_N])

    ...but it doesn't account for an individual who may have moved more than once. I would like to have a result which displays a category for how many changes in 'city' are noted per ind_id, but then stop at the final change.

    For example, I'd like to create the variable changed below:
    ind_id visit_id city changed
    1 1 Hope 0
    1 2 Hope 0
    1 3 Aurora 1
    1 4 Aurora .
    1 5 Aurora .
    2 1 Jackson 0
    2 2 Jackson 0
    3 1 Hope 0
    3 2 Aurora 1
    3 3 Aurora .
    3 4 Jackson 2
    3 5 Jackson 2
    3 6 Hope 3
    Thanks in advance and I'm sorry I couldn't paste any code or screenshots in,data is housed on a secured remote desktop for privacy reasons.

    -Peter
    Last edited by Cody Fiduccia; 05 Jun 2019, 15:27.

  • #2
    NOTE: If it makes a difference, I could also note a change in zip code (numerical field), instead of city (string field)!

    Comment


    • #3
      Well, I don't fully understand what you want. It's something like this:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(ind_id visit_id) str7 city byte changed
      1 1 "Hope"    0
      1 2 "Hope"    0
      1 3 "Aurora"  1
      1 4 "Aurora"  .
      1 5 "Aurora"  .
      2 1 "Jackson" 0
      2 2 "Jackson" 0
      3 1 "Hope"    0
      3 2 "Aurora"  1
      3 3 "Aurora"  .
      3 4 "Jackson" 2
      3 5 "Jackson" 2
      3 6 "Hope"    3
      end
      
      by ind_id (visit_id), sort: gen wanted = sum(city != city[_n-1]) - 1
      Where this disagrees with your variable, changed, is when you have missing value for changed. At first I thought perhaps you want to put a missing value there after the final change for a given ind_id. That's what you (appear to) do with ind_id1. But then with ind_id3, you have a missing value when visit_id == 3 even though there are still changes to come. So I haven't been able to figure out what you want those missing values to signify. But apart from the missing values, this code gives you what you asked for.

      Comment


      • #4
        Hi Clyde - Thank you once again! Apologies for my error in describing the example data above. By indicating '.' I meant to signify that I wished the sequence to end after the first instance of each change in city. What you provided did work, though!

        Is there a way by which I can modify the code you provided above to provide a numerical output only for the first instance of change? For example, once someone is noted as moved, until the next instance of city change it does not repeat the number but instead marks it as null. The reason I'm asking for it this way is because I'd like to be able to tabulate 'wanted' and show how many people moved cities within one year (but as it is now it is also counting each instance of the new city, inflating my n per category within 'wanted').


        I've modified the example with which we were working in the previous reply to what I'm looking for:


        ind_id visit_id city wanted
        1 1 "Hope" 0
        1 2 "Hope" -
        1 3 "Aurora" 1
        1 4 "Aurora" -
        1 5 "Aurora" -
        2 1 "Jackson" 0
        2 2 "Jackson" -
        3 1 "Hope" 0
        3 2 "Aurora" 1
        3 3 "Aurora" -
        3 4 "Jackson" 2
        3 5 "Jackson" -
        3 6 "Hope" 3

        Thank you so much again for the help, I'm learning a lot.

        Comment


        • #5
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte(nd_id visit_id) str9 city byte wanted
          1 1 `""Hope""'    0
          1 2 `""Hope""'    .
          1 3 `""Aurora""'  1
          1 4 `""Aurora""'  .
          1 5 `""Aurora""'  .
          2 1 `""Jackson""' 0
          2 2 `""Jackson""' .
          3 1 `""Hope""'    0
          3 2 `""Aurora""'  1
          3 3 `""Aurora""'  .
          3 4 `""Jackson""' 2
          3 5 `""Jackson""' .
          3 6 `""Hope""'    3
          end
          
          by nd_id (visit_id), sort: gen change_count = sum(city != city[_n-1]) - 1
          by nd_id (visit_id): replace change_count = . if city == city[_n-1]

          Comment


          • #6
            Clyde - wonderful, exactly what I was looking for and it helped rework my tables to show a rough proxy for mobility across the year of data. You're a lifesaver! Thank you very much, I sincerely appreciate both the easy-to-follow instructions as well as the exceedingly timely responses.

            Warm regards,
            PCF

            Comment


            • #7
              Just make 1-line for the solution.
              Code:
              bys nd_id (visit_id): gen x = sum(_n>1) if city != city[_n-1]

              Comment

              Working...
              X