Renaming values of string variables using conditions

Ets Bek

Join Date: May 2017

Posts: 2
#1


30 May 2017, 07:55

Hi all,

I am very new to Stata and, unfortunately, working to a deadline to submit a cleaned dataset.

Can someone please help me with the following?

I have three variables of interest: the day the observation was collected (start_day), the location as incorrectly entered (location), and the correct location (location2).

For example, I would like every observation that has "london" in location to get "durban" in the new variable location2 if the day was 18.

I have looked in the Stata help and all over the internet but haven't been able to find a solution.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

30 May 2017, 08:27

It seems unlikely that anyone would have had this precise problem before.

Code:

help replace

gives pertinent examples.

Code:

replace location2 = "durban" if location == "london" & start_day == 18
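
A minimal sketch of how this plays out on made-up data. It assumes location2 does not exist yet, so it is first created as a copy of location; replace only changes a variable that already exists. Note also that == on strings is case-sensitive, so "London" would not match "london".

Code:

* Toy data; all values here are illustrative assumptions
clear
input float start_day str10 location
18 "london"
18 "paris"
19 "london"
end

* Start from the entered values, then correct the known error
gen location2 = location
replace location2 = "durban" if location == "london" & start_day == 18

* Possible variant if case or stray spaces differ in the real data:
* replace location2 = "durban" if lower(trim(location)) == "london" & start_day == 18

list

Only the first row should change; the other two keep their original location.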
Ets Bek

Join Date: May 2017

Posts: 2
#3

30 May 2017, 13:59

Thank you so much, Nick! That worked perfectly. Your help is indeed much appreciated.
MR Jaswal

Join Date: Dec 2021

Posts: 2
#4

30 Mar 2023, 20:20

Hi.

I have a variable called location. In it, there is a set of observations that begin with CS. In my new variable, I want to rename all the observations that begin with CS to Community. Is there a command in Stata that does that? I have seen how to rename variables, but I have not been able to figure it out for observations.

Also, along the same lines, is it possible to extract the last three letters from a string variable? For example, I have location names that end with -KHI and -PSH. In the new variable, I want to extract KHI and PSH from the observations, which will give me the city, since those letters signify the city name.

Help will be much appreciated.
Thanks
Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

31 Mar 2023, 01:00

#4 seems to be two linked questions about values in a string variable. It's not for Stata a question of new names. In Stata, observations are entire rows, records, or cases in the dataset.

What you need to know is about key string functions. See https://journals.sagepub.com/doi/pdf...867X1101100308 for one personal overview.

In both cases, leading and trailing spaces will mess up some operations.

Code:

gen location2 = trim(itrim(location))
replace location2 = "Community" + substr(location2, 3, .) if substr(location2, 1, 2) == "CS"
gen last3 = substr(location2, -3, 3)

If location2 is what you want,

Code:

replace location = location2
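
#4 could also be read as wanting the whole value replaced by "Community", and city codes may not always be exactly three characters. A hedged sketch of both variants on made-up values follows; the variable names whole and city are hypothetical, and strrpos() needs Stata 14 or later.

Code:

* Toy data; all values here are illustrative assumptions
clear
input str20 location
"CS Office-KHI"
"CS  Depot-PSH"
"Main Branch-KHI"
end

* Remove leading, trailing, and repeated internal spaces first
gen location2 = trim(itrim(location))

* Variant 1: overwrite the entire value when it starts with CS
gen whole = location2
replace whole = "Community" if substr(location2, 1, 2) == "CS"

* Variant 2: take everything after the last hyphen, if there is one
gen city = substr(location2, strrpos(location2, "-") + 1, .) if strrpos(location2, "-") > 0

list

Which variant is appropriate depends on what the real values look like, so it is worth checking with tabulate before overwriting location.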

Chinmay Korgaonkar

Join Date: Dec 2022
Posts: 34

31 Oct 2023, 04:20

Hello everyone,

My dataset covers the years 1998 to 2023. It has 20 counties. Each county has several districts, identified by name (District_name) and by a unique number (District_no). I want to replace the district names for 1998 to 2022 with the district names for 2023, so that all observations with the same district number within a given county have the same name, namely the 2023 name.

I tried the following code. But it does not work.

sort County District_no ( District_name) Year
by County District_no ( District_name) Year: generate name_2023 = District_name if Year == 2023
by County District_no ( District_name) Year: replace District_name = name_2023 if Year >= 1998 & Year <= 2022

Thanks in advance!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str11 County str8 District_no str55 District_name float Year double Gross_tax_rate
"adams" "1"  "north blue creek township"  1998 7.0968
"adams" "1"  "north blue creek township"  1999 6.8312
"adams" "1"  "north blue creek township"  2000 7.0852
"adams" "1"  "north blue creek township"  2001 7.4966
"adams" "1"  "north blue creek township"  2002 2.6538
"adams" "1"  "north blue creek township"  2004 1.8515
"adams" "1"  "north blue creek township"  2005 2.0251
"adams" "1"  "north blue creek township"  2006 2.1824
"adams" "1"  "north blue creek township"  2007 1.8156
"adams" "1"  "north blue creek township"  2008 2.0141
"adams" "1"  "north blue creek township"  2009      .
"adams" "1"  "north blue creek township"  2010      .
"adams" "1"  "north blue creek township"  2011      .
"adams" "1"  "north blue creek township"  2012      .
"adams" "1"  "north (blue) creek township"  2013      .
"adams" "1"  "north blue creek township"  2014      .
"adams" "1"  "north blue creek township"  2015      .
"adams" "1"  "north blue creek township"  2016      .
"adams" "1"  "north blue (creek) township"  2017      .
"adams" "1"  "north blue creek township"  2018      .
"adams" "1"  "north blue creek township"  2019      .
"adams" "1"  "north blue/creek township"  2020      .
"adams" "1"  "north blue creek township"  2021      .
"adams" "1"  "north blue creek township"  2022      .
"adams" "1"  "north blue creek township"  2023      .
end


Chinmay Korgaonkar

Join Date: Dec 2022

Posts: 34
#7

01 Nov 2023, 01:46

Hello everyone,

The following code worked for me. Thanks to https://www.stata.com/support/faqs/d...issing-values/

Thank you.

Code:

gsort County District_no -Year
by County District_no: generate District_name_2023 = District_name if Year == 2023
by County District_no: replace District_name_2023 = District_name_2023[_n-1] if missing(District_name_2023)
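
An alternative sketch that avoids the descending sort: within each district sorted by Year, the last observation (_N) is the latest year, so its name can be copied to every row of the group. This assumes each County and District_no group has at most one observation per year; the three rows below are a made-up subset in the spirit of the example data.

Code:

* Toy subset; values are illustrative assumptions
clear
input str11 County str8 District_no str55 District_name float Year
"adams" "1" "north blue creek township" 2021
"adams" "1" "north blue/creek township" 2022
"adams" "1" "north blue creek township" 2023
end

* Copy the name from the last (latest) year, but only if that year is 2023
bysort County District_no (Year): gen District_name_2023 = District_name[_N] if Year[_N] == 2023

list

Districts with no 2023 observation are left with an empty District_name_2023, which makes them easy to find afterwards.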

Last edited by Chinmay Korgaonkar; 01 Nov 2023, 01:51.
