Trouble naming variables through loops

Michael Walters

Join Date: Aug 2018

Posts: 24
#1


13 Aug 2018, 21:30

Hello,

I am having trouble using loops to rename variables.
I am using Stata/IC 15.1 for Windows.

When I imported the data into Stata from an NCES csv file, it named the variables v1, v2, v3, etc.
I had to rename these, so I did the following:

insheet using C:\Users\Mike\Desktop\Stata\Demographics\1.csv, clear
rename v1 School
rename v2 State
rename v3 PTD1516
rename v4 PTD1415
rename v5 PTD1314
rename v6 PTD1213
rename v7 PTD1112
rename v8 PTD1011
rename v9 PTD0910
rename v10 PTD0809
rename v11 PTD0708
rename v12 PTD0607
rename v13 PTD0506
rename v14 PTD0405
rename v15 PTD0304
rename v16 PTD0203
rename v17 PTD0102
rename v18 PTD0001
rename v19 PTD9900
rename v20 PTD9899

As you can see, v3 through v20 are the PTD variables (pupil-teacher ratio for 2015-2016, 2014-2015, and so on back to 1998-1999).
Is there a simpler way to use loops to rename all of the variables? I have hundreds of variables in my do-file, including many demographic variables (Grade 1 Hispanic Males in 1998-99, etc.).

Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

13 Aug 2018, 22:35

Well, here's a way to do it:

Code:

rename v1 School
rename v2 State
forvalues i = 3/20 {
    local final = 19-`i'
    local initial = `final' - 1
    if `initial' < 0 {
        local initial = `initial' + 100
    }
    if `final' < 0 {
        local final = `final' + 100
    }
    local initial: display %02.0f `initial'
    local final: display %02.0f `final'
    rename v`i' PTD`initial'`final'
}

Evidently you will have to change the code somewhat for each series of variables.
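The rename pattern differs only in the prefix and the starting v-number. One way to avoid rewriting the loop for each series is to loop over the prefixes as well.

This is a sketch, not code from the thread. It assumes:
- every series covers exactly 18 school years (1516 back to 9899);
- each series sits in adjacent variables, starting at v3;
- the prefixes PTD and G1HM are placeholders.

Taking the two-digit years with mod() avoids the negative-number fix-ups in the loop above.

```stata
* Toy data standing in for the imported csv (assumption: v1-v38 exist)
clear
set obs 5
forvalues i = 1/38 {
    gen v`i' = runiform()
}
rename v1 School
rename v2 State

* Placeholder prefixes; each series is assumed to span 18 adjacent variables
local first = 3
foreach p in PTD G1HM {
    forvalues k = 1/18 {
        * k = 1 is 2015-16, k = 18 is 1998-99
        local yend = 2017 - `k'
        local s : display %02.0f mod(`yend' - 1, 100)
        local e : display %02.0f mod(`yend', 100)
        local j = `first' + `k' - 1
        rename v`j' `p'`s'`e'
    }
    local first = `first' + 18
}
```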

But I probably wouldn't do this at all. It's just going to leave you with a very wide data set that will probably prove unworkable for most analysis in Stata. So let's say we have a bunch of series like this: v3 through v20 are PTD1516 through PTD9899, and v21 through v38 are G1HM1516 through G1HM9899 (G1HM meaning grade 1 Hispanic males). I would, instead, do this:

Code:

//    CREATE A TOY DATA SET TO ILLUSTRATE THE CODE
clear*
set obs 10
set seed 1234
gen v1 = _n
gen v2 = cond(_n <= 5, 1, 2)
forvalues i = 3/38 {
    gen v`i' = runiform()
}

//    RENAME SCHOOL AND STATE FIRST
rename v1 school
rename v2 state

//    WORK ON ALL THE OTHER SERIES OF VARIABLES, ONE SERIES AT A TIME
//    ptd FIRST
rename (v3-v20) ptd=
rename ptdv* ptd*
rename ptd# ptd#, renumber(1)
//    g1hm NEXT
rename (v21-v38) g1hm=
rename g1hmv* g1hm*
rename g1hm# g1hm#, renumber(1)
//    ETC.

//    NOW GO TO LONG LAYOUT
reshape long ptd g1hm, i(school state) j(school_year_ending)
replace school_year_ending = 2017 - school_year_ending

//    IF YOU REALLY NEED A VARIABLE THAT LOOKS LIKE 1516, GOING DOWN TO 9899
//    YOU CAN GET IT FROM HERE AS FOLLOWS:
gen school_year_starting = school_year_ending - 1
gen schoolyear = substr(string(school_year_starting), 3, 2) ///
    + substr(string(school_year_ending), 3, 2)

Note: I assume in this code that there is at most one observation for any particular school-state combination. If that is not true, the code will break when it hits the -reshape- command. There is a fix for that; post back if you need it.
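The one-observation-per-school-state assumption can be checked before running -reshape-. Below is a minimal sketch using -duplicates report- and -isid-. The toy data are made up; with real data, only the last two commands are needed.

This only detects the problem. It is not the fix mentioned above.

```stata
* Toy data: three schools, each appearing once in each of two states
clear
set obs 6
gen school = ceil(_n / 2)
gen state = mod(_n, 2) + 1

* Reports how many school-state combinations occur more than once
duplicates report school state

* Exits with an error if school and state do not uniquely identify observations
isid school state
```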

Note also that I used all lowercase letters for the variable names. That is my habit: it makes typing the names easier. You are free to use upper case, or any mix of case, as you see fit. Nothing in the code hangs on this choice.

This code can generalize to any number of series of variables that correspond to academic years 1516 back through 9899. All you have to do is add more blocks of three -rename- commands before the line that says // ETC., to assign appropriate names to these variables. The -reshape- command will also require adding the new variable name prefixes after ptd and g1hm.
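Instead of copying the three -rename- commands per series, the prefixes can be kept in one local macro. That macro then drives both the renaming loop and the -reshape- command.

A sketch, assuming:
- every series is 18 adjacent variables;
- the series appear in the data in the same order as they are listed in the macro (ptd and g1hm are the example prefixes from above).

```stata
* Toy data: v1 = school, v2 = state, then two 18-year series (assumption)
clear
set obs 10
set seed 1234
gen v1 = _n
gen v2 = cond(_n <= 5, 1, 2)
forvalues i = 3/38 {
    gen v`i' = runiform()
}
rename v1 school
rename v2 state

* List each series prefix once, in the order the series appear in the data
local prefixes ptd g1hm
local first = 3
foreach p of local prefixes {
    local last = `first' + 17
    rename (v`first'-v`last') `p'=
    rename `p'v* `p'*
    rename `p'# `p'#, renumber(1)
    local first = `last' + 1
}

* The same macro feeds -reshape-, so the prefix list lives in one place
reshape long `prefixes', i(school state) j(school_year_ending)
replace school_year_ending = 2017 - school_year_ending
```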

The data set that results from this is in long layout and you will almost certainly find it much easier to work with in Stata than the wide layout that you started with.

I have put at the end code to create a school year name that looks like 1516 or 9899. But I think you will find that it is just a nuisance to work with that. The variable school_year_ending contains all of that same information but has the advantage of making sense as a number: it sorts properly, you can calculate time elapsed between years by subtracting, etc. You can't do any of that with 1516 through 9899.
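Here is an illustration of what the numeric year makes possible once the data are long. The names id, ptd_change and years_elapsed are hypothetical, and the toy data stand in for the reshaped data set.

```stata
* Toy long data: 2 schools x 18 school years (assumption)
clear
set obs 36
set seed 1234
gen school = ceil(_n / 18)
gen state = 1
bysort school: gen school_year_ending = 2017 - _n
gen ptd = 15 + runiform()

* Numeric years sort chronologically
sort school state school_year_ending

* Declare the panel so the lag operator L. can be used
egen id = group(school state)
xtset id school_year_ending

* Year-over-year change, and years elapsed since 1998-99
gen ptd_change = ptd - L.ptd
gen years_elapsed = school_year_ending - 1999
```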

Last edited by Clyde Schechter; 13 Aug 2018, 22:38.
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#3

14 Aug 2018, 08:26

You might also wonder why Stata doesn't use the variable names from the csv.
Note that "insheet has been superseded by import delimited." (https://www.stata.com/help13.cgi?insheet)
And both insheet and import delimited have options to preserve variable names from your csv. https://www.stata.com/help13.cgi?import+delimited
All of the variable names listed in post #1 are valid Stata variable names, so you might as well try to preserve them on import rather than recreating them, if those names are the same in the csv file being imported.
Of course the reshaping as described in post #2 would still have to be done after that.
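Here is a sketch of importing with the header row kept, assuming the first row of 1.csv holds the variable names. varnames(1) and case(preserve) are documented options of -import delimited-.

Because the real file isn't available, the sketch first writes a tiny stand-in csv with hypothetical contents to the working directory, then deletes it.

```stata
* Write a tiny stand-in csv (hypothetical contents)
local csv "toy_1.csv"
tempname fh
file open `fh' using "`csv'", write text replace
file write `fh' "School,State,PTD1516,PTD1415" _n
file write `fh' "A,CA,17.2,17.5" _n
file write `fh' "B,NY,12.9,13.1" _n
file close `fh'

* Take variable names from row 1 and keep their capitalization
import delimited using "`csv'", varnames(1) case(preserve) clear
erase "`csv'"
```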
Michael Walters

Join Date: Aug 2018

Posts: 24
#4

19 Aug 2018, 13:33

Hello,

I'm running into some issues when running that code. I have decided to use ptd1, ptd2, ptd3, etc. instead of 1516.
v3-v20 should become ptd1 through ptd18, and v21-v38 should become ptps1 through ptps18 (another variable).

Here is the code I ran:

forvalues i = 3/38 {
gen v`i' = runiform()
rename v1 school
rename v2 state
rename (v3-v20) ptd=
rename ptdv* ptd*
rename ptd# ptd#, renumber(1)
rename (v21-v38) ptps=
rename ptpsv* ptps*
rename ptps# ptps#, renumber(1)
}

Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

19 Aug 2018, 13:41

No, you've mangled the logic of the code.

First, the entire part of the code between "// CREATE A TOY DATA SET" and "// RENAME SCHOOL AND STATE FIRST" was there just to create a demonstration data set. It was not intended for you to copy and use that part of the code: use your real data set.

But given that you did copy it, you have to copy it as is. The code you show in #4 fails because you do not generate v1 and v2 first. That causes a break at -rename v1 school- and another at -rename v2 state-. Once you fix that, you also can't have the rest of the code inside the -forvalues- loop: the code from -rename (v3-v20) ptd= - on down is meant to be run just once, and only on data that already contains variables v3 through v38.
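Here is a sketch of the structure described above, adapted to Michael's ptd/ptps series. There is no loop around the renames.

With real data, the toy-data block is replaced by the -import delimited- line, which is shown here commented out.

```stata
* With real data, use this line INSTEAD of the toy-data block below:
* import delimited using C:\Users\Mike\Desktop\Stata\Demographics\1.csv, clear

* Toy data only: 38 variables v1-v38
clear
set obs 10
set seed 1234
gen v1 = _n
gen v2 = cond(_n <= 5, 1, 2)
forvalues i = 3/38 {
    gen v`i' = runiform()
}

* Run once, outside any loop
rename v1 school
rename v2 state
rename (v3-v20) ptd=
rename ptdv* ptd*
rename ptd# ptd#, renumber(1)
rename (v21-v38) ptps=
rename ptpsv* ptps*
rename ptps# ptps#, renumber(1)
```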
Michael Walters

Join Date: Aug 2018

Posts: 24
#6

19 Aug 2018, 14:36

Hi,

The code runs now with no other problems. However, it says v3 is already defined:

. import delimited using C:\Users\Mike\Desktop\Stata\Demographics\1.csv, clear
(38 vars, 4,398 obs)

.
. forvalues i = 3/38 {
2.
. gen v`i' = runiform()
3.
. }
variable v3 already defined
r(110);

Then the rest of the code works (i.e., it changes the variable names). How can I get rid of the "v3 already defined" error?

Thank you Clyde for your prompt response.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#7

19 Aug 2018, 14:46

Please read the code in #2 from top to bottom and take the time to understand what each line or block of code is doing. Also re-read what I said in #5. The entire -forvalues i = 3/38- loop is there for the sole purpose of creating a toy data set to demonstrate how the code works. The comment above the code states that in so many words, and I reiterated it in #5. It does not belong in the code you use with your actual data set, which already has the variables.
Michael Walters

Join Date: Aug 2018

Posts: 24
#8

19 Aug 2018, 14:51

I see that now. Thank you for your help
Michael Walters

Join Date: Aug 2018

Posts: 24
#9

20 Aug 2018, 16:46

Hi,

I appreciate your help and patience as I work with Stata.

Is there a way to make this count down, from 18 to 1, instead of from 1 to 18?

I've run this so far, and it numbers them from 1 to 18:

import delimited using C:\Users\Mike\Desktop\Stata\Demographics\1.csv, clear
rename v1 school
rename v2 state
rename (v3-v20) ptd=
rename ptdv* ptd*
rename ptd# ptd#, renumber(1)

Thank you.

Jorrit Gosens

Join Date: Jan 2015
Posts: 1019

#10

21 Aug 2018, 13:08

Code:

//    CREATE A TOY DATA SET TO ILLUSTRATE THE CODE
clear*
set obs 10
set seed 1234
gen v1 = _n
gen v2 = cond(_n <= 5, 1, 2)

forvalues i = 3/20 {
    gen v`i' = runiform()
}

//    RENAME SCHOOL AND STATE FIRST
rename v1 school
rename v2 state
local i=3
forvalues n = 18(-1)1 {
rename v`i' ptd`n'
local ++i
}
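A more compact variant computes the new suffix directly from the v-number using Stata's inline `=exp' macro evaluation, so no separate counter is needed. It assumes, as above, that v3 through v20 hold the ptd series, with v3 the most recent year.

```stata
* Toy data: v3-v20 only
clear
set obs 10
forvalues i = 3/20 {
    gen v`i' = runiform()
}

* v3 -> ptd18, v4 -> ptd17, ..., v20 -> ptd1
forvalues i = 3/20 {
    rename v`i' ptd`=21 - `i''
}
```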


Nick Cox

Join Date: Mar 2014

Posts: 35724
#11

21 Aug 2018, 14:01

Michael seems to have asked this twice. See https://www.statalist.org/forums/for...ological-order

Please close a thread explicitly or just keep running the same thread.
Michael Walters

Join Date: Aug 2018

Posts: 24
#12

21 Aug 2018, 14:59

I apologize, this thread is closed.