
  • Multiple first difference generation

    I have a number of macroeconomic variables that are common across countries and whose names differ only by an ISO country code prefix. The data are in time-series format. For example, inflation and GDP for Australia, Canada, and the USA are named
    Code:
    AUS_inflation CAN_inflation USA_inflation AUS_gdp CAN_gdp USA_gdp
    etc.
    There are about one hundred countries and fifty such variables. What I need is to rename all variables by replacing the underscore (_) with a dot (.), so that they look like
    Code:
    AUS.inflation CAN.inflation USA.inflation AUS.gdp CAN.gdp USA.gdp
    and then to create first differences for all of them. How is that done? I tried the following, for example, but it failed:

    Code:
    foreach var in AUS_inflation- USA_gdp{
        gen D`var'= D.`var'
    }
    Thank you

    Best

    Giorgio

  • #2
    You cannot have a . in a Stata variable name. It is simply a violation of the conditions of a variable name. Beyond that, I don't grasp why you would want to do that anyway.

    Your code for first differences was close to correct in some ways but misunderstands how -foreach..in...- works and also gets the syntax of time series operators wrong. Try
    Code:
    foreach var of varlist AUS_inflation-USA_gdp {
        gen D`var' = D1.`var' // NOTE CORRECT LOCATION OF . CHARACTER
    }
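    Note that time series operators such as D1. only work after the data have been declared as time series. A minimal sketch of that step, assuming the time variable is called date (a hypothetical name; the thread does not show it):
    Code:
    // ASSUMPTION: date is the (hypothetical) name of the time variable
    tsset date
    // quick check that the operator now works on one of the variables
    list date AUS_gdp D1.AUS_gdp in 1/5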
    In addition, it seems like your data set already has an enormous number of variables. Adding more will make it quite unwieldy and difficult to work with. Why do you want these variables? If your plan is just to use them as variables in, say, a regression command, there is no need to create them. The time series operators can be used "on the fly" in many Stata commands (and all Stata estimation commands). For example, you can:
    Code:
    regress outcome_var ... D1.(*_inflation) ...
    and Stata will include the first difference of every inflation variable in your data set in the regression without any need to create the first differences as variables themselves.

    Finally, although it depends on what you plan to do, it is likely that you would be better off with your data in long layout, with perhaps country as a separate variable. So think about doing:
    Code:
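    gen `c(obs_t)' id = _n  // observation identifier for reshape's i()
    ds AUS_*  // list the AUS_ variables to recover the stub names
    local stubs `r(varlist)'
    local stubs: subinstr local stubs "AUS_" "@_", all  // AUS_gdp becomes @_gdp
    reshape long `stubs', i(id) j(country_code) string
    rename _* *  // drop the leading underscore left by reshape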
    It is likely that this arrangement will be much more amenable to data analysis and management than what you have now.
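    As a rough sketch only, first differences in the long layout might then be computed as below. It assumes the data contain a time variable called date (a hypothetical name not shown in the thread) and that inflation and gdp are among the reshaped variables:
    Code:
    // ASSUMPTION: date is the (hypothetical) time variable
    encode country_code, gen(country)  // xtset needs a numeric panel identifier
    xtset country date
    foreach var of varlist inflation gdp {
        gen D`var' = D1.`var'  // differences are taken within each country
    }
    With fifty variables, this creates fifty new variables rather than five thousand.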
    Last edited by Clyde Schechter; 14 Feb 2023, 16:35.



    • #3
      Thanks so much, Clyde. I need to do that because I have code in R for an operation that Stata cannot currently perform. Thanks again!



      • #4
        OK. I can't help you with R. If you need the variable names in R to contain . characters, you will need to do that in some way outside of Stata. You might export the data as a delimited text file, and then make an edit that replaces the _'s with .'s in the row containing the variable names. Or, better, to have an audit trail of what you've done, after saving the delimited text file, make the _ to . change using Stata's -filefilter- command. Then import the data into R from that.
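        A minimal sketch of the -filefilter- route, with hypothetical file names. Be aware that -filefilter- replaces every underscore in the file, not only those in the header row, so this is only safe if no data values contain underscores:
        Code:
        // ASSUMPTION: both file names are hypothetical
        export delimited using "macro_data.csv", replace
        // replaces ALL underscores in the file with dots, not just the header row
        filefilter "macro_data.csv" "macro_data_dots.csv", from("_") to(".") replace
        The second file is the one to import into R.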

