
  • How to refer to specific observation-variable pairs using locals, e.g., "`=price[make == "AMC Concord"]'"

    Dear all,

    I was trying to automate the creation of a two-sided bar graph and then realized that something is wrong with my code when I try to refer to the value of the variable 'infection' for regions other than the first one (i.e., region == 1).

    In the code, I try to create a variable called 'temp' that is, by design, meant to be equal to the variable 'infection', but Stata reports that it is not (via the command 'assert').

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(region infection)
     1 -259131
     2 -123457
     3  -69751
     4  -65337
     5  -33808
     6  -33306
     7  -30560
     8  -25685
     9  -11081
    10   -5423
    end
      
    gen temp = .
    qui glevelsof region, local(test)
    foreach i of local test {
        replace temp = infection[region == `i'] if region == `i'
    }

    assert temp == infection
    I noticed the following: the value of infection for the first region is displayed correctly, but for the second region onward I get only a missing value (shown below):

    Code:
    di "`=infection[region == 1]'"
    -259131
    
    di "`=infection[region == 2]'"
    .
    
    di "`=infection[region == 3]'"
    .
    I am wondering why the commands di "`=infection[region == 2]'" and di "`=infection[region == 3]'" do not show the right values, i.e., -123457 and -69751.

    Can you please help me figure out what is going on here?

    Thank you in advance.

    Best,

    Otavio

  • #2
    I don't understand what you want, but your construction infection[region == Whatever] does something entirely different from what I would guess you think it does. Numbers in square brackets after a variable name refer to observation numbers. When Stata evaluates "region == 3", the result is either 0 or 1, since it is a logical expression. So you are really asking for infection[1] in observations where region == 3, and infection[0] (which is missing) in observations where region != 3. Perhaps that will help you reframe what you are doing.
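    A minimal sketch of this behavior, using the first three regions from the -dataex- example above. The variable names -flag- and -picked- are made up for illustration. Because -region == 2- is evaluated observation by observation inside -generate-, observation 2 gets infection[1] and every other observation gets infection[0], which is missing.

```stata
clear
input float(region infection)
 1 -259131
 2 -123457
 3  -69751
end

* inside -generate-, unsubscripted region means the current observation,
* so this logical expression is 1 in observation 2 and 0 elsewhere
gen flag = (region == 2)

* the subscript is therefore 1 or 0, never the observation you wanted
gen picked = infection[region == 2]
list region flag infection picked

* observation 2 received infection[1]; the others received infection[0] (missing)
assert picked == infection[1] if region == 2
assert missing(picked) if region != 2
```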

    • #3
      Dear Mike Lacy, thank you very much!

      Sure, you're absolutely right: I thought the construction 'infection[region == Whatever]' was used for a different purpose.

      Thanks again.

      Otavio

      • #4
        So I believe all of these misunderstandings are side effects of trying to apply knowledge of R (or similar) syntax literally to Stata. Naturally, Stata is a different language with a different syntax, so there's no reason why this should work. For some general advice, I recommend that you start with -help getting started- to learn a bit about how to run basic commands and get a feel for the typical syntax. You could also look at the [P] Programming manual in the PDF documentation included with your Stata installation for more advanced topics.

        In your code:

        Code:
           gen temp = .
           qui glevelsof region, local(test)
           foreach i of local test {
               replace temp = infection[region == `i'] if region == `i'
           }
        Stata does the following. First, it creates a new variable called -temp- and fills it with system missing values. Second, it gathers a list of the unique levels of the variable -region- and puts them into a local macro called -test-. (Side note: -glevelsof- is part of the -gtools- package from SSC.) Then the loop begins, iterating over each of these levels. Within the loop, you replace the values of -temp- in all observations where -region- equals the current value of -i-. The value assigned is the value of -infection- in the observation whose number is the result of evaluating -region == `i'-. This is the main issue, and it does not translate from, say, R. -region == `i'- is a logical expression, so it evaluates to 1 or 0, never to the observation number you are looking for. Inside -replace-, an unsubscripted -region- refers to the current observation, so in every observation selected by -if region == `i'- the subscript is 1 and -temp- receives -infection[1]-. In your -display- lines, by contrast, an unsubscripted variable in a scalar expression refers to the first observation, exactly like -region[1]-; since region[1] is 1, -region == 2- evaluates to 0 and -infection[0]- is missing. So the first iteration through your loop produces exactly what you want, but only by luck: region 1 happens to be stored in observation 1.
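        A minimal sketch of the -display- case and of two ways to pull out the value for one region, using a three-region subset of the example data. It assumes each region appears in exactly one observation, as in the example; the names -obsno-, -obs2- and -val2- are hypothetical. The point is that a subscript must be an observation number, so the sketch either finds that number first or reads the value from the results that -summarize- leaves behind.

```stata
clear
input float(region infection)
 1 -259131
 2 -123457
 3  -69751
end

* scalar context: unsubscripted region means region[1], which is 1,
* so region == 2 is 0 and infection[0] is missing
display infection[region == 2]

* option 1: find the observation number that holds region 2, then subscript with it
gen long obsno = _n
summarize obsno if region == 2, meanonly
local obs2 = r(min)
display infection[`obs2']

* option 2: read the value directly from summarize's returned results
summarize infection if region == 2, meanonly
local val2 = r(min)
display `val2'
assert `val2' == -123457
```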

        Fortunately, since you stated that your goal was to copy a variable's values to a new variable, there's a much more direct way. You can make a "clone", which is a copy with the same storage type and with the value labels and variable label copied as well. This is done with -clonevar newvar = oldvar-. Alternatively, you can create a new variable and assign the values with -gen newvar = oldvar-. This method needs a bit of care with float and double storage types: the default storage type, if none is specified, is float, and going from double to float loses machine precision.
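        A minimal sketch comparing the two approaches on a two-region subset of the example data. The variable label text and the names -temp2-, -share- and -share_f- are made up for illustration, and the last part assumes the default storage type has not been changed with -set type-.

```stata
clear
input float(region infection)
 1 -259131
 2 -123457
end
label variable infection "Infections"

* clonevar copies the storage type plus the variable and value labels
clonevar temp = infection
assert temp == infection

* gen also works here, because the source and the new variable are both float
gen temp2 = infection
assert temp2 == infection

* precision pitfall: a double copied into a new (default float) variable is rounded
gen double share = infection / 7
gen share_f = share
assert share_f != share
```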

        In general, the Stata way to vectorize operations over variables is to use the -if- qualifier, as you did at the end of the -replace- line in your loop. Square brackets, however, reference specific observations and will not generalize to vectors.
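        A minimal sketch of the loop rewritten along those lines: the -if- qualifier selects the observations, and -infection- is used without a subscript, so each observation copies its own value. It uses the official -levelsof- in place of -glevelsof- so that it runs without -gtools- installed, and a three-region subset of the example data.

```stata
clear
input float(region infection)
 1 -259131
 2 -123457
 3  -69751
end

gen temp = .
levelsof region, local(test)
foreach i of local test {
    * no subscript: infection means the current observation's own value
    replace temp = infection if region == `i'
}
assert temp == infection
```

        For a plain copy, -clonevar- remains the simpler choice; the loop form is only worth it when the value assigned actually depends on the region.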

        • #5
          Sure, thanks, Leonardo Guizzetti!
