  • Trouble naming variables through loops

    Hello,

    I am having trouble using loops to rename variables.
    I am using Stata/IC 15.1 for Windows.

    When I imported the data into Stata from an NCES CSV file, Stata named the variables v1, v2, v3, etc.
    I had to rename them, so I did the following:

    insheet using C:\Users\Mike\Desktop\Stata\Demographics\1.csv, clear
    rename v1 School
    rename v2 State
    rename v3 PTD1516
    rename v4 PTD1415
    rename v5 PTD1314
    rename v6 PTD1213
    rename v7 PTD1112
    rename v8 PTD1011
    rename v9 PTD0910
    rename v10 PTD0809
    rename v11 PTD0708
    rename v12 PTD0607
    rename v13 PTD0506
    rename v14 PTD0405
    rename v15 PTD0304
    rename v16 PTD0203
    rename v17 PTD0102
    rename v18 PTD0001
    rename v19 PTD9900
    rename v20 PTD9899

    As you can see, v3 through v20 are the PTD variables (pupil/teacher ratio for 2015-2016, 2014-2015, etc.).
    Is there a simpler way to rename all of these variables with a loop? I have hundreds of variables in my do-file, including many demographic variables (Grade 1 Hispanic males in 1998-99, etc.).

    Thank you.

  • #2
    Well, here's a way to do it:
    Code:
    rename v1 School
    rename v2 State
    forvalues i = 3/20 {
        local final = 19-`i'
        local initial = `final' - 1
        if `initial' < 0 {
            local initial = `initial' + 100
        }
        if `final' < 0 {
            local final = `final' + 100
        }
        local initial: display %02.0f `initial'
        local final: display %02.0f `final'
        rename v`i' PTD`initial'`final'
    }
    Evidently you will have to change the code somewhat for each series of variables.
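    A minimal sketch of how the loop above might be generalized so that one pass handles several series, assuming each series occupies 18 consecutive v# variables ordered from 2015-16 back to 1998-99, as v3-v20 do. The prefix list (PTD G1HM) and the toy data at the top are hypothetical; with real data you would omit the clear/set obs/forvalues lines. Instead of the two -if- blocks, it uses mod() to wrap the two-digit year codes (00 becomes 99 for the year before).

    ```stata
    // TOY DATA FOR ILLUSTRATION ONLY: v1-v38 stand in for the imported CSV.
    // With real data, omit the clear/set obs/forvalues lines below.
    clear
    set obs 5
    forvalues i = 1/38 {
        generate v`i' = runiform()
    }
    rename v1 School
    rename v2 State

    // Hypothetical series prefixes, in the order the series appear in the file
    local prefixes PTD G1HM
    local first = 3
    foreach p of local prefixes {
        forvalues k = 0/17 {
            // k = 0 is 2015-16, ..., k = 17 is 1998-99
            local yend = mod(116 - `k', 100)
            local ystart = mod(`yend' + 99, 100)
            local yend : display %02.0f `yend'
            local ystart : display %02.0f `ystart'
            local j = `first' + `k'
            rename v`j' `p'`ystart'`yend'
        }
        // the next series starts 18 variables later
        local first = `first' + 18
    }
    describe, short
    ```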

    But I probably wouldn't do this at all. It's just going to leave you with a very wide data set that will probably prove unworkable for most analysis in Stata. So let's say we have several series like this: v3 through v20 are PTD1516 through PTD9899, and v21 through v38 are G1HM1516 through G1HM9899 (G1HM meaning grade 1 Hispanic males). I would, instead, do this:

    Code:
    //    CREATE A TOY DATA SET TO ILLUSTRATE THE CODE
    clear*
    set obs 10
    set seed 1234
    gen v1 = _n
    gen v2 = cond(_n <= 5, 1, 2)
    
    forvalues i = 3/38 {
        gen v`i' = runiform()
    }
    
    //    RENAME SCHOOL AND STATE FIRST
    rename v1 school
    rename v2 state
    
    //    WORK ON ALL THE OTHER SERIES OF VARIABLES, ONE SERIES AT A TIME
    //    ptd FIRST
    rename (v3-v20) ptd=
    rename ptdv* ptd*
    rename ptd# ptd#, renumber(1)
    
    //    g1hm NEXT
    rename (v21-v38) g1hm=
    rename g1hmv* g1hm*
    rename g1hm# g1hm#, renumber(1)
     
    // ETC.
    
    //    NOW GO TO LONG LAYOUT
    reshape long ptd g1hm, i(school state) j(school_year_ending)
    replace school_year_ending = 2017 - school_year_ending
    
    //    IF YOU REALLY NEED A VARIABLE THAT LOOKS LIKE 1516, GOING DOWN TO 9899
    //    YOU CAN GET IT FROM HERE AS FOLLOWS:
    gen school_year_starting = school_year_ending - 1
    gen schoolyear = substr(string(school_year_starting), 3, 2) ///
        + substr(string(school_year_ending), 3, 2)
    Note: I assume in this code that there is at most one observation for any particular school-state combination. If that is not true, the code will break when it hits the -reshape- command. There is a fix for that; post back if you need it.
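    Before running -reshape-, you can check that assumption. A minimal sketch using -duplicates- and -isid-, on hypothetical toy data in which one school-state pair appears twice. Dropping exact duplicate rows is only one possible remedy, appropriate only when the repeated rows really are copies; it is not necessarily the fix alluded to above.

    ```stata
    // TOY DATA FOR ILLUSTRATION ONLY: school 1 in state 1 appears twice
    clear
    set obs 3
    generate school = cond(_n <= 2, 1, 2)
    generate state = 1
    generate ptd1 = 15

    // How many observations does each school-state pair have?
    duplicates report school state

    // -isid- exits with an error when school and state are not a unique identifier
    capture isid school state
    if _rc {
        display as error "school and state do not uniquely identify observations"
        duplicates list school state
    }

    // One possible remedy, valid ONLY if the repeated rows are exact copies
    duplicates drop
    isid school state
    ```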

    Note also that I used all lowercase letters for the variable names. That is my habit: it makes typing the names easier. You are free to use upper case, or any mix of case, as you see fit. Nothing in the code hangs on this choice.

    This code can generalize to any number of series of variables that correspond to academic years 1516 back through 9899. All you have to do is add more blocks of three -rename- commands before the line that says // ETC., to assign appropriate names to these variables. You will also need to add the new variable-name prefixes after ptd and g1hm in the -reshape- command.
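    The block-per-series pattern can also be written as a loop over the series prefixes, so that adding a series means adding one word to a list that also feeds -reshape-. A sketch, assuming every series has exactly 18 consecutive variables in the same year order; the third prefix g1hf and the toy data are hypothetical.

    ```stata
    // TOY DATA FOR ILLUSTRATION ONLY: school, state, and three 18-variable series
    clear
    set obs 4
    generate v1 = _n
    generate v2 = cond(_n <= 2, 1, 2)
    forvalues i = 3/56 {
        generate v`i' = runiform()
    }
    rename v1 school
    rename v2 state

    // Hypothetical prefixes, one per series, in file order; add more as needed
    local stubs ptd g1hm g1hf
    local first = 3
    foreach s of local stubs {
        local last = `first' + 17
        rename (v`first'-v`last') `s'=
        rename `s'v* `s'*
        rename `s'# `s'#, renumber(1)
        local first = `last' + 1
    }

    // The same list of prefixes supplies the -reshape- stubs
    reshape long `stubs', i(school state) j(school_year_ending)
    replace school_year_ending = 2017 - school_year_ending
    ```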

    The data set that results from this is in long layout and you will almost certainly find it much easier to work with in Stata than the wide layout that you started with.

    I have put at the end code to create a school year name that looks like 1516 or 9899. But I think you will find that it is just a nuisance to work with that. The variable school_year_ending contains all of that same information but has the advantage of making sense as a number: it sorts properly, you can calculate time elapsed between years by subtracting, etc. You can't do any of that with 1516 through 9899.
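    As an illustration of why the numeric year is convenient, the sketch below (toy data with hypothetical values) declares the long data set as a panel with -xtset- and then uses the D. (difference) operator and ordinary subtraction on school_year_ending. Neither would work directly on a string such as "1516".

    ```stata
    // TOY LONG-LAYOUT DATA FOR ILLUSTRATION ONLY: two schools, two years each
    clear
    set obs 4
    generate school = cond(_n <= 2, 1, 2)
    generate state = 1
    generate school_year_ending = cond(mod(_n, 2) == 1, 2015, 2016)
    generate ptd = cond(_n == 1, 16, cond(_n == 2, 15.5, cond(_n == 3, 14, 14.5)))

    // One panel identifier per school-state pair; the numeric year is the time variable
    egen panel_id = group(school state)
    xtset panel_id school_year_ending

    // Change from the previous school year (missing for each panel's first year)
    generate ptd_change = D.ptd
    // Years elapsed since the 1998-99 school year ended
    generate years_since_1999 = school_year_ending - 1999
    ```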
    Last edited by Clyde Schechter; 13 Aug 2018, 22:38.



    • #3
      You might also wonder why Stata doesn't use the variable names from the CSV file.
      Note that "insheet has been superseded by import delimited." (https://www.stata.com/help13.cgi?insheet)
      And both insheet and import delimited have options to preserve variable names from your csv. https://www.stata.com/help13.cgi?import+delimited
      All of the variable names listed in post #1 are valid Stata variable names, so you might as well try to preserve them on import rather than recreating them, if those names are the same in the csv file being imported.
      Of course the reshaping as described in post #2 would still have to be done after that.
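      A minimal sketch of the -import delimited- route, assuming the first row of the CSV holds the variable names. To keep it self-contained, it first writes a tiny demo file (demo_varnames.csv, a hypothetical name with made-up contents) to the current directory and erases it afterwards. case(preserve) is included to keep the original capitalization; by default -import delimited- reads the names as lowercase, so PTD1516 would arrive as ptd1516.

      ```stata
      // Write a two-line demo CSV (hypothetical file and contents)
      local csv "demo_varnames.csv"
      tempname fh
      file open `fh' using "`csv'", write text replace
      file write `fh' "School,State,PTD1516,PTD1415" _n
      file write `fh' "Lincoln High,CA,18.2,18.5" _n
      file close `fh'

      // varnames(1): take the variable names from the first row
      // case(preserve): keep the capitalization used in the file
      import delimited using "`csv'", varnames(1) case(preserve) clear
      describe
      erase "`csv'"
      ```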



      • #4
        Hello,

        I'm running into some issues when running that code. I have decided to use ptd1, ptd2, ptd3, etc. instead of 1516.
        v3-v20 become ptd1 through ptd18, and v21-v38 become ptps1 through ptps18 (another variable).

        Here is the code I ran:

        forvalues i = 3/38 {
        gen v`i' = runiform()
        rename v1 school
        rename v2 state
        rename (v3-v20) ptd=
        rename ptdv* ptd*
        rename ptd# ptd#, renumber(1)
        rename (v21-v38) ptps=
        rename ptpsv* ptps*
        rename ptps# ptps#, renumber(1)
        }

        Thanks.



        • #5
          No, you've mangled the logic of the code.

          First, the entire part of the code between "// CREATE A TOY DATA SET" and "// RENAME SCHOOL AND STATE FIRST" was there to just create a demonstration data set. It was not intended for you to copy and use that part of the code: use your real data set.

          But given that you did copy it, you have to copy it as is. The code you show in #4 fails because you do not generate v1 and v2 first. That causes a break at -rename v1 school- and another at -rename v2 state-. Once you fix that, you also can't have the rest of the code inside the -forvalues- loop: the code from -rename (v3-v20) ptd=- on down is meant to be run just once, and only on data that already contains variables v3 through v38.
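          For the ptd/ptps case in #4, the logic described above amounts to running each -rename- exactly once, outside any loop. A sketch; the first block only fakes v1-v38 so that the example runs on its own, and it should be replaced by the -import delimited- command when using the real file.

          ```stata
          // TOY DATA FOR ILLUSTRATION ONLY: replace this block with -import delimited-
          clear
          set obs 5
          forvalues i = 1/38 {
              generate v`i' = runiform()
          }

          // Run each rename once, outside any loop
          rename v1 school
          rename v2 state
          rename (v3-v20) ptd=
          rename ptdv* ptd*
          rename ptd# ptd#, renumber(1)
          rename (v21-v38) ptps=
          rename ptpsv* ptps*
          rename ptps# ptps#, renumber(1)
          ```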



          • #6
            Hi,

            The code ran well and there are no problems now. However, it says v3 is already defined.

            . import delimited using C:\Users\Mike\Desktop\Stata\Demographics\1.csv, clear
            (38 vars, 4,398 obs)

            .
            . forvalues i = 3/38 {
            2.
            . gen v`i' = runiform()
            3.
            . }
            variable v3 already defined
            r(110);

            Then the rest of the code works (i.e., it changes the variable names). How can I get rid of the "v3 already defined" error?

            Thank you Clyde for your prompt response.



            • #7
              Please read the code in #2 top to bottom and take the time to understand what each line, or block of code is doing. Also re-read what I said in #5. The entire -forvalues i= 3/38- loop is there for the sole purpose of creating a toy data set to demonstrate how the code works. The comment above the code states that in so many words. And I reiterated that in #5. It does not belong in the code to use with your actual data set, which already has the variables.



              • #8
                I see that now. Thank you for your help.



                • #9
                  Hi,

                  I appreciate your help and patience as I work with Stata.

                  Is there a way to make this count down, from 18 to 1, instead of from 1 to 18?

                  I've run this so far, and it numbers the variables from 1 to 18:

                  import delimited using C:\Users\Mike\Desktop\Stata\Demographics\1.csv, clear
                  rename v1 school
                  rename v2 state
                  rename (v3-v20) ptd=
                  rename ptdv* ptd*
                  rename ptd# ptd#, renumber(1)

                  Thank you.



                  • #10
                    Code:
                    //    CREATE A TOY DATA SET TO ILLUSTRATE THE CODE
                    clear*
                    set obs 10
                    set seed 1234
                    gen v1 = _n
                    gen v2 = cond(_n <= 5, 1, 2)
                    
                    forvalues i = 3/20 {
                        gen v`i' = runiform()
                    }
                    
                    //    RENAME SCHOOL AND STATE FIRST
                    rename v1 school
                    rename v2 state
                    //    THEN RENAME v3-v20 AS ptd18 DOWN TO ptd1
                    local i = 3
                    forvalues n = 18(-1)1 {
                        rename v`i' ptd`n'
                        local ++i
                    }
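                    An alternative sketch for the count-down numbering: build the target names (ptd18, ptd17, and so on down to ptd1) with -foreach ... of numlist 18(-1)1- and pass both lists to a single group -rename-. The toy data, in which each v# holds its own index so the mapping is easy to check, are for illustration only.

                    ```stata
                    // TOY DATA FOR ILLUSTRATION ONLY: each v# holds its own index
                    clear
                    set obs 3
                    forvalues i = 3/20 {
                        generate v`i' = `i'
                    }

                    // Build the list ptd18 ptd17 ... ptd1
                    local targets
                    foreach n of numlist 18(-1)1 {
                        local targets `targets' ptd`n'
                    }
                    // Old and new lists are matched position by position
                    rename (v3-v20) (`targets')
                    ```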



                    • #11
                      Michael seems to have asked this twice. See https://www.statalist.org/forums/for...ological-order

                      Please close a thread explicitly or just keep running the same thread.



                      • #12
                        I apologize, this thread is closed.

