  • Extract partial strings and values from data set to create scalars in a for loop

    Hello, I am very new to Stata (though not to programming or data manipulation; I am a proficient R user) and, despite extensive searching, I cannot figure out how to do the following.

    I have a data set where one column contains strings with academic year, and one column contains integers with enrollment numbers.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 year int homeless long enroll float rate
    "2012-2013" 2010 107133 1.88
    "2013-2014" 2230 108902 2.05
    "2014-2015" 2287 110301 2.07
    "2015-2016" 2518 111810 2.25
    "2016-2017" 2582 113041 2.28
    "2017-2018" 2549 114174 2.23
    "2018-2019" 2616 114763 2.28
    "2019-2020" 2533 115629 2.19
    "2020-2021" 1734 111946 1.55
    "2021-2022" 2251 111897 2.01
    end
    I want to create scalars holding the value of homeless for each year (to use in multiple other places throughout a do-file), and I want the scalars to be named h[last 2 digits of year], for example, h13 for the year 2012-2013. The result should be h13 = 2010, h14 = 2230, etc. I can successfully do this with the following:
    Code:
    scalar h13 = homeless[1]
    scalar h14 = homeless[2]
    scalar h15 = homeless[3]
    scalar h16 = homeless[4]
    scalar h17 = homeless[5]
    scalar h18 = homeless[6]
    scalar h19 = homeless[7]
    scalar h20 = homeless[8]
    scalar h21 = homeless[9]
    scalar h22 = homeless[10]
    But I would like to do this in one step, and without having to name each scalar explicitly.

    First, I cannot figure out how to combine the string 'h' with an extracted substring from the 'year' column. I tried this as a one-off but it only yields h = 2010.
    Code:
    scalar h`substr(year[1],-2,2)' = homeless[1]
    This is probably really two issues: first, what is the best method to extract substrings (I have looked at others; this one seemed simpler than regex), and second, how to combine strings and use the result as the name of a scalar.

    Second, I cannot figure out how to get a for loop to generate a scalar for each row in the data set. This loop below results in the following scalars.
    Code:
    foreach x in year homeless {
        scalar `x' = `x'
    }
    
    . scalar list
      homeless =       2010
          year = 2012-2013
    Any guidance on this would be appreciated.

  • #2
    Code:
    forvalues i = 1/`c(N)' {
        scalar h`=substr(year[`i'],-2,2)' = homeless[`i']
    }
    scalar list
    Code:
    . scalar list
           h22 =       2251
           h21 =       1734
           h20 =       2533
           h19 =       2616
           h18 =       2549
           h17 =       2582
           h16 =       2518
           h15 =       2287
           h14 =       2230
           h13 =       2010
    
    .
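
    A minimal annotated sketch of how the pieces of the loop fit together, using only the first two rows of the example data. The one-off attempt in #1 failed because text between a backtick and an apostrophe is read as the name of a local macro; no such macro exists, so it expands to nothing and the command becomes scalar h = homeless[1]. Adding = inside the quotes asks Stata to evaluate the expression and substitute its result instead. The final display line is an illustrative addition, not part of the original question.

    ```stata
    * Rebuild two rows of the example data so the sketch is self-contained
    clear
    input str9 year int homeless
    "2012-2013" 2010
    "2013-2014" 2230
    end

    * substr() with a negative start position counts from the end of the string
    display substr(year[1], -2, 2)

    * The =exp macro operator evaluates the expression first and pastes the
    * result into the command, so the next line runs as: scalar h13 = homeless[1]
    scalar h`=substr(year[1], -2, 2)' = homeless[1]

    * c(N) holds the number of observations, so the loop visits every row
    forvalues i = 1/`c(N)' {
        scalar h`=substr(year[`i'], -2, 2)' = homeless[`i']
    }

    * scalar() makes explicit that the names refer to scalars, not variables
    display scalar(h13) + scalar(h14)
    ```

    Wrapping the names in scalar() when using them later is optional here, but it guards against a variable with the same (or an abbreviated) name taking precedence.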

  • #3
    Thank you! This works perfectly. The next step is to understand how it works so that I can do similar things myself in the future. The part setting up the scalar itself is pretty easy to interpret, but I need to learn more about for loops.

  • #4
    On loops, beyond entries in the Programming manual, there are various tutorials, including

    https://journals.sagepub.com/doi/ful...36867X20976340

    https://journals.sagepub.com/doi/ful...6867X211063415

  • #5
    Thank you! I will check those out.
