  • reshape with local variables

    Hi,

    I need to use the reshape command within a program, using newly generated variables month_1 month_2, etc.

    I understand that, inside a program, newly generated variables must be first defined as tempvar, and then called as if they were local variables.

    Therefore, I am asking this question in these terms:

    How can I use the reshape command with local variables?

    For example, this is working:

    Code:
    sysuse auto, clear
    gen bene_id = _n
    
    gen month_1 = 0
    gen month_2 = 0
    
    reshape long month_, i(bene_id) j(month)
    reg price mpg weight length

    How can I make it work when month_1 and month_2 are local variables?

    The attempt below obviously does not work, but I post it here to give an idea of what I am trying to do. Thanks

    Code:
    sysuse auto, clear
    gen bene_id = _n
    
    local month_1 = 0
    local month_2 = 0
    
    reshape long month_, i(bene_id) j(month)
    reg price mpg weight length

  • #2
    I can't follow the point here. Local macros (what you call "local variables") can hold variable names, but you are using them to hold constants. I can't see that your code would do anything useful even if it worked, as

    1. If you wanted a dataset full of constants, you could create one directly.

    2. The regress command you cite does not depend on the reshape, so from that point of view why do you need to reshape?



    • #3
      Hi Nick,

      I have posted an example, which is just a simplified example of what I want to achieve. The regression part is not relevant, as you point out.

      I will try to make the point clearer:

      This works:
      Code:
      sysuse auto, clear
      gen bene_id = _n  
      gen month_1 = 0
      gen month_2 = 0  
      reshape long month_, i(bene_id) j(month)

      How can I make the reshape command work when month_1 and month_2 are local variables? e.g.:

      Code:
        
      sysuse auto, clear
      gen bene_id = _n  
      local month_1 = 0
      local month_2 = 0  
      reshape long month_, i(bene_id) j(month)
      Thanks



      • #4
        Sorry, but that just looks like the same question slightly reworded, and I do not have a different answer. What do you expect the dataset to look like after you have done this? Why do you want to do this at all?



        • #5
          I think this question would be easier to answer if you showed us what the actual program you're writing looks like, or at least the relevant portions of it. Currently you're asking the question about writing a program with tempvars and then showing example code that doesn't use any tempvars.
          If I am understanding your question, what you're trying to do might look something like this:
          Code:
          tempvar month_1 month_2
          gen `month_1'=0
          gen `month_2'=0
          
          reshape long [how do you specify the stub], i(bene_id) j(month)

          If I'm understanding correctly, you want to reshape but aren't sure how to reference your tempvars as stubs in the bracketed part of the reshape command.
          I'm assuming the constant zeros here are a red herring and you're actually doing something in that program that gets you a variable result for each of the months.
          The problem is how to reference the stub when you're using tempvars. Maybe someone will have a more satisfying answer to how to reference your tempvars as stubs in the reshape command.

          However, another way to approach the problem would be to not use tempvars. Using tempvars is best practice, but it isn't required. You'll want to make sure that your names are unlikely to conflict with existing names in the data, and be sure to drop them at the end of the program. For example, to help avoid conflicting variable names, you could use names like __month_1, __month_2, etc.
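
          A minimal sketch of that non-tempvar approach, shown outside a program for brevity. It assumes the auto dataset and the bene_id identifier from the earlier posts; the __month_ prefix and the constant zeros are only illustrative placeholders.

          ```stata
          sysuse auto, clear
          gen bene_id = _n

          * Ordinary (non-temporary) variables with a prefix unlikely to clash
          forvalues m = 1/12 {
              gen __month_`m' = 0
          }

          * The stub is the common prefix of the variable names
          reshape long __month_, i(bene_id) j(month)

          * Drop the helper variable once it is no longer needed
          drop __month_
          ```

          Because the names are ordinary variable names, reshape can use their shared prefix as the stub, which is the part that is hard to express with tempvars.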

          This doesn't address Nick's question about whether you really need to reshape but perhaps you just simplified your problem to the point of obscuring that.



          • #6
            OK, I realise I took the wrong approach in trying to explain the problem. Apologies for that. Let me post here a less simplified version of the problem.

            ***

            I use the reshape command to create a data frame with 60 rows (5 years * 12 months).

            I want to reshape the data to long because I need to account for time-varying information in a diff-in-diff model and apply a certain transformation to a certain variable if the date happens to be earlier than a threshold. In my original dataset, this threshold is determined by the occurrence of an event, whose date is stored in a variable. In this simplified code, however, I just set the threshold to be fixed at 15 Jan 2015.

            In this simplified code, therefore, there is no time-varying variable, so it may look pointless to reshape the data to long. But this is just for illustration purposes and we shall not worry about it. I am really only interested in understanding how to write the correct syntax for the presented commands, when operating within a program.

            Three more considerations:

            i) I generate the months and year variables in this very inefficient way (instead of using a loop) because I am not confident with the way the generative loop would work within the program.

            ii) I am using areg ..., absorb(bene_id) instead of reg ... i.bene_id because my original dataset contains tens of thousands of individuals, and Stata would not be able to handle it otherwise.

            iii) Please consider that I realise that I am not calling variables in the right manner when I use the rename or the tostring or the replace commands in the code posted below; I realise they are not called as they should be called inside a program, but to me it is still obscure what the correct way would be. Your input on this matter would be extremely appreciated as well.

            Code:
            **** working (outside a program)
            
            sysuse auto, clear
            
            gen bene_id = _n
            
            gen month_1 = 0
            gen month_2 = 0
            gen month_3 = 0
            gen month_4 = 0
            gen month_5 = 0
            gen month_6 = 0
            gen month_7 = 0
            gen month_8 = 0
            gen month_9 = 0
            gen month_10 = 0
            gen month_11 = 0
            gen month_12 = 0
            
            reshape long month_, i(bene_id) j(months) 
            
            gen year_2014 = 0
            gen year_2015 = 0
            gen year_2016 = 0
            gen year_2017 = 0
            gen year_2018 = 0
            
            reshape long year_, i(bene_id months) j(years)
            sort bene_id years months
            rename months month
            rename years year
            tostring(year), gen(year_str)
            tostring(month), gen(month_str)
            gen day = "15"
            gen date_str = day + "/" + month_str + "/" + year_str
            gen reference_date = date(date_str, "DMY")
            format reference_date %td
            
            replace price = price * 1.1 if reference_date < 20103 /* 15 Jan 2015 */
            
            areg price mpg weight length, absorb(bene_id)

            Code:
            **** THIS IS WHAT I WOULD LIKE TO ACHIEVE INSIDE THE PROGRAM
            
            sysuse auto, clear
            
            gen bene_id = _n
            
            capture program drop myboot
            
            program define myboot, eclass
            
            tempvar month_1 month_2 month_3 month_4 month_5 month_6 month_7 month_8 month_9 month_10 month_11 month_12 year_2014 year_2015 year_2016 year_2017 year_2018
            
            gen `month_1' = 0
            gen `month_2' = 0
            gen `month_3' = 0
            gen `month_4' = 0
            gen `month_5' = 0
            gen `month_6' = 0
            gen `month_7' = 0
            gen `month_8' = 0
            gen `month_9' = 0
            gen `month_10' = 0
            gen `month_11' = 0
            gen `month_12' = 0
            
            reshape long month_, i(bene_id) j(months)
            
            gen `year_2014' = 0
            gen `year_2015' = 0
            gen `year_2016' = 0
            gen `year_2017' = 0
            gen `year_2018' = 0
            
            reshape long year_, i(bene_id months) j(years)
            sort bene_id years months
            rename months month
            rename years year
            tostring(year), gen(`year_str')
            tostring(month), gen(`month_str')
            gen `day' = "15"
            gen `date_str' = `day' + "/" + `month_str' + "/" + `year_str'
            gen `reference_date' = date(`date_str', "DMY")
            format `reference_date' %td
            
            replace price = price * 1.1 if `reference_date' < 20103 /* 15 Jan 2015 */
            
            areg price mpg weight length, absorb(`bene_id')
            
            matrix temp = e(b)
            ereturn post temp
            end
            
            bootstrap, reps(100) seed(200) nodrop: myboot
            Last edited by dimitris karletsos; 22 May 2023, 14:16.



            • #7
              In your real application are your month variables being generated as constant 0s as shown? Or is that a simplification for purposes of the example?
              Is your end goal just to get 60 observations for each bene_id? If so, why not just expand and create your month and year variables after the expand?
              Last edited by Sarah Edgington; 22 May 2023, 13:34. Reason: edit to fix typo & punctuation



              • #8
                Hi Sarah, thanks for the input. Can you please post the code as you would write it using your suggestion? I am not sure it can achieve the same result.
                Last edited by dimitris karletsos; 22 May 2023, 14:04.
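
                For reference, the expand approach suggested in #7 might be sketched as below. This is an assumption about what was meant, not code posted in the thread: it uses the auto dataset, the fixed 15 Jan 2015 threshold and the years 2014 to 2018 from #6, and the helper variable seq is a hypothetical name.

                ```stata
                sysuse auto, clear
                gen bene_id = _n

                * 60 copies of each observation: 5 years x 12 months
                expand 60
                bysort bene_id: gen seq = _n

                * Derive year (2014-2018) and month (1-12) from the sequence number
                gen year  = 2014 + floor((seq - 1) / 12)
                gen month = mod(seq - 1, 12) + 1

                * Build the reference date directly, without string conversion
                gen reference_date = mdy(month, 15, year)
                format reference_date %td

                replace price = price * 1.1 if reference_date < td(15jan2015)

                areg price mpg weight length, absorb(bene_id)
                ```

                This yields 60 observations per bene_id without any reshape, so no stub names (and hence no tempvar stubs) are needed.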
