Multiple first difference generation

Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#1

14 Feb 2023, 15:57

I have a number of macro variables that are common across countries and are distinguished only by an ISO country code prefix in their names. The variables are in time series format. For inflation and GDP in Australia, the USA, and Canada, for example, they would be

Code:

AUS_inflation CAN_inflation USA_inflation AUS_gdp CAN_gdp USA_gdp

etc.
There are about one hundred countries and fifty such variables. What I need is to rename all variables by replacing the underscore (_) with a dot (.). That should look like

Code:

AUS.inflation USA.inflation CAN.inflation AUS.gdp CAN.gdp USA.gdp

And then I need to create first differences for all of them. How is that done? I tried, for example, the following, but it failed:

Code:

foreach var in AUS_inflation- USA_gdp{
    gen D`var'= D.`var'
}

Thank you

Best

Giorgio
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#2

14 Feb 2023, 16:32

You cannot have a . in a Stata variable name. It is simply a violation of the conditions of a variable name. Beyond that, I don't grasp why you would want to do that anyway.

Your code for first differences was close to correct in some ways but misunderstands how -foreach..in...- works and also gets the syntax of time series operators wrong. Try

Code:

foreach var of varlist AUS_inflation-USA_gdp {
    gen D`var' = D1.`var' // NOTE CORRECT LOCATION OF . CHARACTER
}
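To make the difference between -foreach ... in- and -foreach ... of varlist- concrete, here is a minimal sketch on made-up data. The variable year and all values are assumptions for illustration only; the time series operators need the data to be -tsset- first.

```stata
* toy data, invented for illustration
clear
set obs 5
gen year = 2000 + _n
tsset year
gen AUS_inflation = _n^2
gen USA_gdp = 10 * _n

* -in- takes its list literally: the loop runs once, and v holds
* the single word AUS_inflation-USA_gdp, which is not a variable name
foreach v in AUS_inflation-USA_gdp {
    display "`v'"
}

* -of varlist- expands the range into the actual variable names,
* so the loop runs once per variable
foreach v of varlist AUS_inflation-USA_gdp {
    gen D`v' = D1.`v'
}
```

Note that a range like AUS_inflation-USA_gdp follows the order of the variables in the data set, so anything stored between those two variables is included as well.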

In addition, it seems like your data set already has an enormous number of variables. Adding more will make it quite unwieldy and difficult to work with. Why do you want these variables? If your plan is just to use them as variables in, say, a regression command, there is no need to create them. The time series operators can be used "on the fly" in many Stata commands (and all Stata estimation commands). For example, you can:

Code:

regress outcome_var ... D1.(*_inflation) ...

and Stata will include the first difference of every inflation variable in your data set in the regression without any need to create the first differences as variables themselves.
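A small self-contained sketch of the on-the-fly approach, using simulated data. The variable names, the seed, and the year variable are all assumptions for illustration. The inflation variables are listed explicitly inside the parentheses; a wildcard such as *_inflation should work in the same position.

```stata
* simulated data, invented for illustration
clear
set seed 12345
set obs 30
gen year = 1990 + _n
tsset year
gen AUS_inflation = rnormal()
gen CAN_inflation = rnormal()
gen outcome_var = rnormal()

* no differenced variables are created; D1. is applied
* to each variable in the parenthesized list at estimation time
regress outcome_var D1.(AUS_inflation CAN_inflation)
```

The first observation drops out of the estimation sample because the first difference is missing there.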

Finally, although it depends on what you plan to do, it is likely that you would be better off with your data in long layout, with perhaps country as a separate variable. So think about doing:

Code:

gen `c(obs_t)' id = _n
ds AUS_*
local stubs `r(varlist)'
local stubs: subinstr local stubs "AUS_" "@_", all
reshape long `stubs', i(id) j(country_code) string
rename _* *

It is likely that this arrangement will be much more amenable to data analysis and management than what you have now.
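As a rough sketch of what the long layout buys you, the toy example below reshapes a tiny wide data set and then takes first differences within country. It assumes a time variable called year is available to serve as the i() identifier (in place of id above), and the numeric country variable is a name introduced here for illustration.

```stata
* toy wide data, invented for illustration
clear
set obs 5
gen year = 2000 + _n
gen AUS_inflation = _n
gen USA_inflation = 2 * _n
gen AUS_gdp = 100 + _n
gen USA_gdp = 200 + _n

* same reshape steps as above, with year as the identifier
ds AUS_*
local stubs `r(varlist)'
local stubs: subinstr local stubs "AUS_" "@_", all
reshape long `stubs', i(year) j(country_code) string
rename _* *

* -xtset- needs a numeric panel identifier
encode country_code, gen(country)
xtset country year

* one command per indicator now covers every country
gen D_inflation = D1.inflation
gen D_gdp = D1.gdp
```

With about fifty indicators and one hundred countries, this means roughly fifty difference variables rather than five thousand.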

Last edited by Clyde Schechter; 14 Feb 2023, 16:35.
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#3

14 Feb 2023, 17:02

Thanks so much, Clyde. I need to do that because I have code in R for an operation that Stata cannot perform at the moment. Thanks again!
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#4

14 Feb 2023, 17:06

OK. I can't help you with R. If you need the variable names in R to contain . characters, you will need to do that in some way outside of Stata. You might export the data as a delimited text file, and then make an edit that replaces the _'s with .'s in the row containing the variable names. Or, better, to have an audit trail of what you've done, after saving the delimited text file, make the _ to . change using Stata's -filefilter- command. Then import the data into R from that.
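A minimal sketch of the export-then-filter route, on toy data. The file names macro_data.csv and macro_data_dots.csv are hypothetical. Be aware that -filefilter- replaces every underscore in the file, not only those in the header row, so this is safe only if underscores occur nowhere else (for example, when all the data are numeric).

```stata
* toy data, invented for illustration
clear
set obs 3
gen year = 2000 + _n
gen AUS_inflation = _n
gen USA_gdp = 10 * _n

* write a delimited text file with the variable names in the first row
export delimited using "macro_data.csv", replace

* copy the file, turning every _ into . along the way
filefilter "macro_data.csv" "macro_data_dots.csv", from("_") to(".") replace
```

Keeping both commands in a do-file gives the audit trail mentioned above, and the second file can then be read into R.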