  • create new variables using a loop

    I have a dataset that contains data on foods, such as vegetables, for different stores (each observation in my dataset is a single store).

    My question is: how do I write a loop to create (1) price per pound and (2) price per piece for each vegetable?

    Here is a bulleted list of the relevant variables for one of the vegetables, carrots. Below the list is the code I use to create these two new variables without a loop.
    • Carrots_U is the units for carrots, where 1=pound and 2=piece (some vegetables are typically sold by weight and others are typically sold by piece)
    • Carrots_P_1 is the price for carrots in dollars
    • Carrots_N_1 is the number of units for the price given (for example, if the surveyor indicated 1 for Carrots_U and 2 for Carrots_N_1, the price they wrote down is for a 2-pound bag of carrots)
    • Carrots_A is the availability of carrots where 0=carrots are not available and 1=carrots are available
    • Carrots_P_Per_Pound is my new price per pound variable for carrots (only applicable if the surveyor indicated pound as unit for carrots)
    • Carrots_P_Per_Piece is my new price per piece variable for carrots (only applicable if the surveyor indicated piece as unit for carrots)
    Code I have for creating needed variables for carrots:

    gen Carrots_P_Per_Pound=.
    *next line creates price per pound for carrots only if surveyor indicated units were in pounds
    replace Carrots_P_Per_Pound=Carrots_P_1/Carrots_N_1 if Carrots_U==1 & Carrots_A==1
    gen Carrots_P_Per_Piece=.
    *next line creates price per piece for carrots only if surveyor indicated units were for a number of pieces
    replace Carrots_P_Per_Piece=Carrots_P_1/Carrots_N_1 if Carrots_U==2 & Carrots_A==1

    Each of my 13 other vegetables has a price variable of the format Vegetable_P_1, a unit variable Vegetable_U, a number variable Vegetable_N_1, and an availability variable Vegetable_A.

    Please find below an example of my data that has only two of the vegetables (carrots and tomatoes). My unique identifier is new_ResponseId, which uniquely identifies the store that was surveyed.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(Carrots_A Tomatoes_A Carrots_U Tomatoes_U) double(Carrots_P_1 Tomatoes_P_1) byte(Carrots_N_1 Tomatoes_N_1) float(Carrots_P_Per_Pound Carrots_P_Per_Piece new_ResponseId)
    1 1 1 1  .94 2.19 1 1  .94    .  8
    1 1 2 1 2.49 2.99 1 1    . 2.49  3
    1 1 2 1 2.79 3.49 1 1    . 2.79  7
    1 1 1 1 1.99 1.49 1 1 1.99    .  6
    1 1 2 1    . 1.69 1 1    .    . 14
    1 1 2 1  .99 1.99 1 1    .  .99  9
    1 1 2 1  .99 2.99 1 1    .  .99 12
    1 1 1 1    . 1.49 1 1    .    .  1
    1 0 1 2 2.99    0 1 0 2.99    . 11
    1 1 1 1 1.99  .99 1 1 1.99    . 13
    1 1 1 1 1.09 1.49 1 1 1.09    . 15
    1 1 1 1 1.16 2.99 1 1 1.16    .  2
    1 1 1 1 2.99  2.9 1 1 2.99    .  5
    1 1 1 1    1 1.99 1 1    1    . 10
    1 1 2 1  .99 1.99 1 1    .  .99  4
    end
    Sincerely,
    Alyssa Beavers
    Last edited by Alyssa Beavers; 20 Feb 2023, 10:23.

  • #2
    Code:
    ds *_U
    local vegetables `r(varlist)'
    local vegetables: subinstr local vegetables "_U" "", all
    
    foreach v of local vegetables {
        gen `v'_p_per_pound = `v'_P_1/`v'_N_1 if `v'_U == 1 & `v'_A == 1
        gen `v'_p_per_piece = `v'_P_1/`v'_N_1 if `v'_U == 2 & `v'_A == 1
    }
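    Note: There is no need to use two steps to create each variable here. Initially setting the variable to . accomplishes nothing in this situation, because -gen- with an -if- condition already leaves the variable missing in every observation that does not satisfy the condition. You may as well just use -gen- instead of -replace- in the second command.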
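
    One way to check the loop is to compare its output with the hand-built carrot variables from #1. This is only a sketch: it assumes the example data from #1 is in memory, so that Carrots_P_Per_Pound and Carrots_P_Per_Piece already exist, and that the loop above has just been run. Stata variable names are case-sensitive, so the lowercase names created by the loop do not clash with the originals.

    Code:
    * -assert- stops with an error if the two variables differ in any observation;
    * two missing values compare as equal, so stores with no price also pass
    assert Carrots_p_per_pound == Carrots_P_Per_Pound
    assert Carrots_p_per_piece == Carrots_P_Per_Piece

    If both commands finish silently, the one-step -gen- reproduces the two-step -gen-/-replace- results for carrots.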

    • #3
      Thanks so much, Clyde Schechter, this worked to calculate the two new variables. Would you be willing to explain the code before the loop so that I can apply it in similar circumstances? From my review of the Stata manual page for ds, it appears that line 1 stores the names of the variables whose names end in _U in r(varlist). However, I am unsure what lines 2 and 3 of your code do.

      Thanks again,
      Alyssa Beavers

      • #4
        You correctly understand line 1. Now, the contents of r(varlist) can be overwritten or cleared as soon as you run another command, so we can't rely on them being there when we need them. The second line therefore copies them into the local macro vegetables. The third line then goes through that macro and removes every occurrence of the character sequence _U, so that we are left with just the names of the vegetables.
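
        To watch each step happen, the macro can be displayed before and after the substitution. This is a minimal sketch, assuming the two-vegetable example data from #1 is in memory; the macro name raw is introduced here only for illustration.

        Code:
        ds *_U
        * copy r(varlist) right away, before another command can replace it
        local raw `r(varlist)'
        display "`raw'"
        * strip every "_U" from the copied list
        local vegetables : subinstr local raw "_U" "", all
        display "`vegetables'"

        With the example data, the first -display- should show Carrots_U Tomatoes_U and the second Carrots Tomatoes. Two caveats: the substitution removes _U wherever it occurs, so a variable name with _U in the middle as well as at the end would be altered in both places, and local macros exist only within the do-file run or interactive session that defines them, so these lines need to be run together with the loop.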

        • #5
          Thank you so much, that will certainly be useful in the future!

          • #6
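            This also works as an alternative to the first two lines of the code in #2 (the -ds- line and the line that copies r(varlist) into vegetables). The -subinstr- line is still needed afterwards.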

            Code:
             
             unab vegetables : *_U
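
            Put together with the rest of #2, the setup before the loop might then look like the sketch below. It assumes, as in the example data, that every variable ending in _U is a vegetable unit variable.

            Code:
            * expand the *_U pattern into full variable names, stored in local vegetables
            unab vegetables : *_U
            * remove the _U suffix to leave the bare vegetable names
            local vegetables : subinstr local vegetables "_U" "", all

            The -foreach- loop from #2 can follow unchanged.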
