  • Renaming values of string variables using conditions

    Hi all,

    I am very new to Stata and, unfortunately, working against a deadline to submit a cleaned dataset.

    Can someone please help me with the following?

    I have three variables of interest: the day the observation was collected (start_day), the location as incorrectly entered (location), and the correct location (location2).

    For example, I would like to replace every observation that has "london" in the variable location with "durban" in the new variable location2 if the day was 18.

    I have looked in the Stata help and all over the internet but haven't been able to find a solution.

  • #2
    It seems unlikely that anyone would have had this precise problem before.

    Code:
    help replace
    gives pertinent examples.

    Code:
    replace location2 = "durban" if location == "london" & start_day == 18
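
    If location2 does not exist yet, one way to proceed is to copy location first and then overwrite only the cases that need correcting. This is a minimal sketch on made-up rows; only the variable names and the london/durban/18 rule come from #1.

    Code:
    clear
    input str10 location byte start_day
    "london" 18
    "london" 19
    "durban" 18
    end
    * start from a copy so that values needing no correction carry over
    generate location2 = location
    * overwrite only where both conditions hold
    replace location2 = "durban" if location == "london" & start_day == 18
    list

    Note that == on strings is case-sensitive, so "London" would not match "london".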



    • #3
      Thank you so much, Nick! That worked perfectly. Your help is indeed much appreciated.



      • #4
        Hi.

        I have a variable called location. In that variable, there is a set of observations that begin with CS. In my new variable, I want to rename all the observations that begin with CS to Community. Is there a command in Stata that does that? I have seen that I can do it for renaming variables, but I have not been able to figure it out for observations.
        Also, along the same lines, it would be helpful if someone could tell me whether it is possible to extract the last three letters from a string variable. For example, I have location names that end with -KHI and -PSH. In the new variable, I want to extract KHI and PSH from the observations, which will give me the city, as the letters KHI and PSH signify the city name.

        Help will be much appreciated.
        Thanks



        • #5
          #4 seems to be two linked questions about values in a string variable. In Stata terms, it is not a question of new names. In Stata, observations are entire rows, records, or cases in the dataset.

          What you need to know about is the key string functions. See https://journals.sagepub.com/doi/pdf...867X1101100308 for one personal overview.

          In both cases, leading and trailing spaces will mess up some operations.

          Code:
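          * strip leading and trailing blanks and collapse repeated internal blanks
          gen location2 = trim(itrim(location))

          * swap a leading "CS" for "Community", keeping the rest of the value
          replace location2 = "Community" + substr(location2, 3, .) if substr(location2, 1, 2) == "CS"

          * take the last three characters
          gen last3 = substr(location2, -3, 3)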
          If location2 is what you want,

          Code:
          replace location = location2
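
          To see what those functions return, here is a small sketch on made-up values. The place names are invented for illustration; only the CS prefix and the -KHI and -PSH suffixes come from #4.

          Code:
          clear
          input str20 location
          "CS Saddar-KHI"
          " CS  Hayatabad-PSH "
          "Clinic North-KHI"
          end
          * clean the blanks first, otherwise the second value would not start with "CS"
          gen location2 = trim(itrim(location))
          replace location2 = "Community" + substr(location2, 3, .) if substr(location2, 1, 2) == "CS"
          gen last3 = substr(location2, -3, 3)
          list location2 last3

          The third value is left alone because the comparison with "CS" is case-sensitive and it starts with "Cl".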




          • #6
            Hello everyone,

            My dataset covers the years 1998 to 2023 and 20 counties. Each county has several districts, identified by name (District_name) and a unique number (District_no). I want to assign each district's 2023 name to its observations for 1998 to 2022, so that all observations with the same district number within a given county have the same name, namely the name for 2023.

            I tried the following code, but it does not work.

            sort County District_no ( District_name) Year
            by County District_no ( District_name) Year: generate name_2023 = District_name if Year == 2023
            by County District_no ( District_name) Year: replace District_name = name_2023 if Year >= 1998 & Year <= 2022


            Thanks in advance!


            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str11 County str8 District_no str55 District_name float Year double Gross_tax_rate
            "adams" "1"  "north blue creek township"  1998 7.0968
            "adams" "1"  "north blue creek township"  1999 6.8312
            "adams" "1"  "north blue creek township"  2000 7.0852
            "adams" "1"  "north blue creek township"  2001 7.4966
            "adams" "1"  "north blue creek township"  2002 2.6538
            "adams" "1"  "north blue creek township"  2004 1.8515
            "adams" "1"  "north blue creek township"  2005 2.0251
            "adams" "1"  "north blue creek township"  2006 2.1824
            "adams" "1"  "north blue creek township"  2007 1.8156
            "adams" "1"  "north blue creek township"  2008 2.0141
            "adams" "1"  "north blue creek township"  2009      .
            "adams" "1"  "north blue creek township"  2010      .
            "adams" "1"  "north blue creek township"  2011      .
            "adams" "1"  "north blue creek township"  2012      .
            "adams" "1"  "north (blue) creek township"  2013      .
            "adams" "1"  "north blue creek township"  2014      .
            "adams" "1"  "north blue creek township"  2015      .
            "adams" "1"  "north blue creek township"  2016      .
            "adams" "1"  "north blue (creek) township"  2017      .
            "adams" "1"  "north blue creek township"  2018      .
            "adams" "1"  "north blue creek township"  2019      .
            "adams" "1"  "north blue/creek township"  2020      .
            "adams" "1"  "north blue creek township"  2021      .
            "adams" "1"  "north blue creek township"  2022      .
            "adams" "1"  "north blue creek township"  2023      .
            end



            • #7
              Hello everyone,

              The following code worked for me. Thanks to https://www.stata.com/support/faqs/d...issing-values/

              Thank you.

              Code:
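              * newest year first within each district, so 2023 (if present) leads each group
              gsort County District_no -Year
              by County District_no: generate District_name_2023 = District_name if Year == 2023
              * copy the 2023 name down to the earlier years
              by County District_no: replace District_name_2023 = District_name_2023[_n-1] if missing(District_name_2023)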
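
              A possible alternative, sketched here and not taken from the FAQ, does the same in one step by sorting on Year within district and reading the last observation of each group. It assumes at most one observation per district and year. The variable name name_2023 and the fourth data row (a district with no 2023 record) are made up for illustration.

              Code:
              clear
              input str11 County str8 District_no str55 District_name float Year
              "adams" "1" "north (blue) creek township" 2021
              "adams" "1" "north blue/creek township" 2022
              "adams" "1" "north blue creek township" 2023
              "adams" "2" "south creek township" 2022
              end
              * within each district the last observation is the latest year;
              * use its name only if that year really is 2023
              bysort County District_no (Year): generate name_2023 = District_name[_N] if Year[_N] == 2023
              list

              Districts without a 2023 observation get an empty name_2023, so they are easy to find with missing(name_2023) before overwriting District_name.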
              Last edited by Chinmay Korgaonkar; 01 Nov 2023, 01:51.
