Generating a Variable to store REGRESS R-Squared Value Across Observations for Multiple Variables

Hunter Keenan

Join Date: Apr 2020

Posts: 4
#1


18 Apr 2020, 23:54

Hello,

I am using Stata 15.1 and I have two datasets. Both cover the same set of countries and the same two economic values, but each dataset covers a different time period. One has 23 variables with 118 observations; the other has 93 variables with 118 observations. The first variable stores the country names. Each subsequent variable stores the values for a given year across these countries, in chronological order, with each country a separate observation. Half of the variables measure one economic value across the years (GDP), and the other half measure the other value across the same years (Population Density). I am looking to generate a variable that stores regress results, specifically the R-squared, between the two kinds of variables across the given years.

Conceptually it looks like this. Observation 1 (Country A) has Variable 1 (v1): Country Name; Variable 2 (v2gdp): Year 1 GDP; Variable 3: Year 2 GDP; and so on until Variable 13 (v2): Year 1 Population Density, Variable 14: Year 2 Population Density, etc. I am trying to pair each economic value with the other economic value for the same year, across the many years for that country. It should be an X and Y plot that I run regress on, where X contains the values of economic value 1 across the timeframe and Y contains the values of economic value 2 across the timeframe. I then want to store the R-squared value for each observation in a new variable.

I am not sure how to treat the data across these variables as an X and Y pair that I can perform REGRESS on. I am not sure what code to use when generating a new variable that stores the REGRESS R-Squared information. I use v1 v2 v3 formatting for the one set of variables and v1gdp v2gdp v3gdp as the name format for the second set of variables to differentiate what kind of economic value they are storing. If anything needs clarifying please let me know.

Thank You
Tags: data, regression
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

19 Apr 2020, 09:04

Welcome to Statalist.

You have what is known as longitudinal data and will want to use the tools described in the Stata Longitudinal-Data/Panel-Data Reference Manual PDF included in your Stata installation and accessible through Stata's Help menu. You should start with the section "Introduction to xt commands" paying particular attention to the Remarks and Examples portion. In the first example you will learn that you have stored your data in a wide layout, and you will want to use the reshape long command to transform each dataset to a long layout, as described by the output of the help reshape command and the PDF documentation linked to from the top of that output.
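A minimal sketch of the wide-to-long step described above, on a small made-up dataset. The names id, gdp1 to gdp3 and pop1 to pop3 are hypothetical stand-ins (the actual variable names in this thread differ); the point is that reshape long takes one stub per economic value and collects the numeric suffix into the j() variable.

```stata
* Hypothetical wide data: 3 countries, GDP and population density for 3 years
clear
input id gdp1 gdp2 gdp3 pop1 pop2 pop3
1 1.0 1.5 2.1 10 12 15
2 2.0 2.2 2.9 20 21 25
3 0.5 0.9 1.0  5  7  8
end

* One stub per economic value; the numeric suffix becomes the variable year
reshape long gdp pop, i(id) j(year)

* Expect 3 countries x 3 years = 9 observations with variables id year gdp pop
list, sepby(id)
```

After this, each country-year is one observation, which is the layout the xt commands and a per-country regress expect.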

You are far too early in the process to be thinking about storing the r-squared values. Walk before you run: first get your code to the point where you can run your regressions, then ask about storing the results.
Hunter Keenan

#3

19 Apr 2020, 20:49

Thank you for this. I am currently reading up on this and trying to get my data into long layout. I notice that my variable names are v1gdp - v16gdp and that it seems to prefer I have it in the format of vgdp1-vgdp16 for giving variable ranges when setting the stubs during reshape long. I am currently trying to figure out how to rename a range of variables in such a way that it moves the numbers to the outside. Using just the rename * * variations I cannot seem to get all v1gdp-v16gdp into vgdp1-vgdp16 format.
Hunter Keenan

#4

19 Apr 2020, 21:12

I found a way by testing other code I found online from people trying to rename variables. I first do rename (*gdp*) (gdp*[1]), then rename *gdpv* *gdp*, followed by rename *gdp* *vgdp*, although I feel there was a more direct route and I am not positive I understand what I am telling the program to do. I will now try to reshape the data.
Hunter Keenan

#5

19 Apr 2020, 22:57

I now have everything in long form exactly as needed, e.g., country year gdpgrowth popdensity for each year for each country. I now want to create a variable that stores regression results for gdpgrowth and popdensity across the years, for each country.
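One possible sketch of what post #5 asks for, under stated assumptions: long data with a numeric country identifier id (hypothetical; a string country variable could first be converted with encode country, generate(id)), plus year, gdpgrowth and popdensity. It runs one regress per country and copies e(r2), the R-squared that regress leaves behind, into a new variable r2, so every row of a country holds that country's value. The data are simulated only so the block runs, and treating gdpgrowth as the dependent variable is an assumption, since the thread has not settled which is which.

```stata
* Simulated long data: 3 hypothetical countries x 20 years
clear
set seed 12345
set obs 60
generate id = ceil(_n/20)
bysort id: generate year = _n
generate popdensity = 10 + year + rnormal()
generate gdpgrowth = 0.5*popdensity + id*rnormal()

* One regression per country; store its R-squared on that country's rows
generate r2 = .
levelsof id, local(ids)
foreach c of local ids {
    quietly regress gdpgrowth popdensity if id == `c'
    quietly replace r2 = e(r2) if id == `c'
}

* One row per country is enough to inspect the stored values
egen tag = tag(id)
list id r2 if tag
```

A design alternative: statsby r2=e(r2), by(id) clear: regress gdpgrowth popdensity replaces the data with one row per country, which could then be merged back with merge m:1 id. The loop above keeps the long data in memory instead.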
William Lisowski

#6

20 Apr 2020, 08:35

Originally posted by Hunter Keenan

I found a way through testing other code I've found online with people trying to rename variables. I first do
rename (*gdp*) (gdp*[1]) and then rename *gdpv* *gdp* followed by rename *gdp* *vgdp* although I feel there was a more direct route to do this and I am not positive I understand what I am telling the program to do.

In this post I want to address some capabilities of the rename and reshape commands that Hunter overlooked, so that others who have followed, or later find, this discussion will see a description of how the rename and reshape commands can be used to easily overcome the problem Hunter faced.

In the Description section of the output from the help rename command you will see a clickable link, "rename group". Clicking it takes you to the output of help rename group, which documents code like that which Hunter found online. There you will learn about the following technique.

Code:

. describe, simple
v1gdp   v3gdp   v5gdp   v7gdp   v9gdp   v11gdp  v13gdp  v15gdp
v2gdp   v4gdp   v6gdp   v8gdp   v10gdp  v12gdp  v14gdp  v16gdp

. rename (v#gdp) (vgdp#)

. describe, simple
vgdp1   vgdp3   vgdp5   vgdp7   vgdp9   vgdp11  vgdp13  vgdp15
vgdp2   vgdp4   vgdp6   vgdp8   vgdp10  vgdp12  vgdp14  vgdp16

But beyond this, a careful reading of the documentation for reshape would have suggested the following approach, which avoids the rename altogether.

Code:

. describe, simple
id      v2gdp   v4gdp   v6gdp   v8gdp   v10gdp  v12gdp  v14gdp  v16gdp
v1gdp   v3gdp   v5gdp   v7gdp   v9gdp   v11gdp  v13gdp  v15gdp

. reshape long v@gdp, i(id) j(year)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        5   ->      80
Number of variables                  17   ->       3
j variable (16 values)                    ->   year
xij variables:
                  v1gdp v2gdp ... v16gdp   ->   vgdp
-----------------------------------------------------------------------------

. describe, simple
id    year  vgdp

. levelsof year
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Stata supplies exceptionally good documentation that amply repays the time spent studying it.

For Hunter, I will be writing a separate post to address the questions in post #5.
William Lisowski

#7

20 Apr 2020, 08:47

Hunter Keenan -

It is not clear to me what regression results you want to store separately for each observation. The title you created in your initial post is "Generating a Variable to store REGRESS R-Squared Value Across Observations for Multiple Variables". But it seems to me you will be running just one regress command which will display the results of regressing one of your two variables on the other? If so, there is only a single R-Squared value for the regression. What would you plan to store in each observation of your dataset that would vary from observation to observation?
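To make the point above concrete, a small sketch on simulated data with hypothetical variable names: one pooled regress stores a single scalar, e(r2), so generating a variable from it just repeats the same number in every observation.

```stata
* Simulated data with hypothetical names, for illustration only
clear
set seed 2020
set obs 50
generate popdensity = runiform()*100
generate gdpgrowth = 0.02*popdensity + rnormal()

* One pooled regression leaves one R-squared in e(r2)
regress gdpgrowth popdensity
display "R-squared = " e(r2)

* Copying it into a variable repeats that one scalar on every row
generate r2_pooled = e(r2)
summarize r2_pooled
```

The summarize output shows a standard deviation of zero, which is why a per-observation variable only makes sense if there is a separate regression per group.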

Have you actually run the regression you have in mind? If so, copy the command and the output from Stata's Results window and paste it into your next post using CODE delimiters (the way my results in post #6 are shown; I describe CODE delimiters below), and then explain what you want stored in your dataset.

If you are still trying to decide what sort of regression you need to run, you need to better explain your objectives. At this point, we don't even know which is the dependent variable and which is the independent variable, and it's a lot easier to write advice about a single well-described problem than general advice.

Regarding CODE delimiters. To assure maximum readability of results that you post, please copy them from the Results window or your log file into a code block in the Forum editor using code delimiters [CODE] and [/CODE], as explained in section 12 of the Statalist FAQ linked to at the top of the page. For example, the following:

[CODE]
. sysuse auto, clear
(1978 Automobile Data)

. describe make price

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
[/CODE]

will be presented in the post as the following:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. describe make price

              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price