  • Generating a Variable to store REGRESS R-Squared Value Across Observations for Multiple Variables

    Hello,

    I am using Stata 15.1 and I have two datasets. There are two economic values in each dataset for a shared set of countries, each dataset containing one of two different time periods. One has 23 variables with 118 observations. The second dataset has 93 variables with 118 observations. The first variable stores the names of countries. Each variable after stores numerical values for a given year across each of these countries in chronological order, each country being a different observation. Half of the variables measure one economic value across the years (GDP) and the other half measures the other value across those same years (Population Density). I am looking to generate a variable which stores REGRESS values, specifically R-squared, between the two different kinds of variables across the given years.

    Conceptually it's like this as an example. Observation 1 (Country A) has Variable 1 (v1): Country Name, Variable 2 (v2gdp): Year 1 GDP, Variable 3: Year 2 GDP etc. until Variable 13 (v2): Year 1 Population Density, Variable 14: Year 2 Population Density etc. I am trying to have each economic value paired with the other economic value for the same year, and across the many years for that country. It should be an X and Y plot that I am performing the REGRESS function on where X contains the values of economic value 1 across the timeframe and Y contains the values of economic value 2 across the timeframe. I then want to store the R-Squared value for each observation in a new variable.

    I am not sure how to treat the data across these variables as an X and Y pair that I can perform REGRESS on. I am not sure what code to use when generating a new variable that stores the REGRESS R-Squared information. I use v1 v2 v3 formatting for the one set of variables and v1gdp v2gdp v3gdp as the name format for the second set of variables to differentiate what kind of economic value they are storing. If anything needs clarifying please let me know.

    Thank You

  • #2
    Welcome to Statalist.

    You have what is known as longitudinal data and will want to use the tools described in the Stata Longitudinal-Data/Panel-Data Reference Manual PDF included in your Stata installation and accessible through Stata's Help menu. You should start with the section "Introduction to xt commands" paying particular attention to the Remarks and Examples portion. In the first example you will learn that you have stored your data in a wide layout, and you will want to use the reshape long command to transform each dataset to a long layout, as described by the output of the help reshape command and the PDF documentation linked to from the top of that output.
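
    As a rough illustration of the wide-to-long step described above, here is a minimal sketch on a made-up dataset. The variable names (gdp1-gdp3, pop1-pop3) and all values are assumptions chosen so that the year number sits at the end of each name; they are not the names in the original datasets.

    ```stata
    * Toy wide dataset; every name and value here is invented for illustration
    clear
    input str9 country gdp1 gdp2 gdp3 pop1 pop2 pop3
    "A" 1.0 1.5 2.0 10 11 12
    "B" 2.0 2.5 2.0 20 22 21
    end

    * Two stubs (gdp, pop) become two variables; the numeric suffix becomes year
    reshape long gdp pop, i(country) j(year)

    * One row per country-year
    list, sepby(country)
    ```

    After the reshape, each country contributes one observation per year, with gdp and pop side by side, which is the layout a regression of one on the other expects.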

    You are far too early in the process to be thinking about storing the r-squared values. Walk before you run: first get your code to the point where you can run your regressions, then ask about storing the results.



    • #3
      Thank you for this. I am currently reading up on this and trying to get my data into long layout. I notice that my variable names are v1gdp - v16gdp and that it seems to prefer I have it in the format of vgdp1-vgdp16 for giving variable ranges when setting the stubs during reshape long. I am currently trying to figure out how to rename a range of variables in such a way that it moves the numbers to the outside. Using just the rename * * variations I cannot seem to get all v1gdp-v16gdp into vgdp1-vgdp16 format.



      • #4
        I found a way through testing other code I've found online with people trying to rename variables. I first do
        rename (*gdp*) (gdp*[1]) and then rename *gdpv* *gdp* followed by rename *gdp* *vgdp* although I feel there was a more direct route to do this and I am not positive I understand what I am telling the program to do. I will now try to reshape the data.




        • #5
          I now have everything in long form exactly as needed, e.g. country year gdpgrowth popdensity, with one observation for each year for each country. I am now looking to create a variable that stores regression results across gdpgrowth popdensity for each year for each country.



          • #6
            Originally posted by Hunter Keenan:
            I found a way through testing other code I've found online with people trying to rename variables. I first do
            rename (*gdp*) (gdp*[1]) and then rename *gdpv* *gdp* followed by rename *gdp* *vgdp* although I feel there was a more direct route to do this and I am not positive I understand what I am telling the program to do.
            In this post I want to address some capabilities of the rename and reshape commands that Hunter overlooked, so that others who have followed, or later find, this discussion will see a description of how the rename and reshape commands can be used to easily overcome the problem Hunter faced.

            In the Description section of the output from the help rename command you will see a clickable link "rename group". Clicking it takes you to the output of help rename group, which documents code like that which Hunter found online. There you will learn about the following technique.
            Code:
            . describe, simple
            v1gdp   v3gdp   v5gdp   v7gdp   v9gdp   v11gdp  v13gdp  v15gdp
            v2gdp   v4gdp   v6gdp   v8gdp   v10gdp  v12gdp  v14gdp  v16gdp
            
            . rename (v#gdp) (vgdp#)
            
            . describe, simple
            vgdp1   vgdp3   vgdp5   vgdp7   vgdp9   vgdp11  vgdp13  vgdp15
            vgdp2   vgdp4   vgdp6   vgdp8   vgdp10  vgdp12  vgdp14  vgdp16
            But beyond this, a careful reading of the documentation for reshape would have suggested the following approach, which avoids the rename altogether.
            Code:
            . describe, simple
            id      v2gdp   v4gdp   v6gdp   v8gdp   v10gdp  v12gdp  v14gdp  v16gdp
            v1gdp   v3gdp   v5gdp   v7gdp   v9gdp   v11gdp  v13gdp  v15gdp
            
            . reshape long v@gdp, i(id) j(year)
            (note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16)
            
            Data                               wide   ->   long
            -----------------------------------------------------------------------------
            Number of obs.                        5   ->      80
            Number of variables                  17   ->       3
            j variable (16 values)                    ->   year
            xij variables:
                             v1gdp v2gdp ... v16gdp   ->   vgdp
            -----------------------------------------------------------------------------
            
            . describe, simple
            id    year  vgdp
            
            . levelsof year
            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
            Stata supplies exceptionally good documentation that amply repays the time spent studying it.

            For Hunter, I will be writing a separate post to address the questions in post #5.



            • #7
              Hunter Keenan -

              It is not clear to me what regression results you want to store separately for each observation. The title you created in your initial post is "Generating a Variable to store REGRESS R-Squared Value Across Observations for Multiple Variables". But it seems to me you will be running just one regress command which will display the results of regressing one of your two variables on the other? If so, there is only a single R-Squared value for the regression. What would you plan to store in each observation of your dataset that would vary from observation to observation?
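
              To make the point about a single R-squared concrete, here is a minimal sketch using the auto dataset that ships with Stata as a stand-in; price and mpg are placeholders, not Hunter's variables.

              ```stata
              * Stand-in data: the auto dataset shipped with Stata
              sysuse auto, clear

              * One regress command fits one model over all observations
              regress price mpg

              * regress leaves its R-squared behind as the single scalar e(r2)
              display e(r2)
              ```

              Because e(r2) is one number for the whole estimation sample, copying it into a variable would simply repeat the same value on every observation.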

              Have you actually run the regression you have in mind? If so, you should copy the command and the output from Stata's Results window and paste it into your next post using CODE delimiters (as my results in post #6 are shown; I describe CODE delimiters below), and then explain what you want stored in your dataset.

              If you are still trying to decide what sort of regression you need to run, you need to better explain your objectives. At this point, we don't even know which is the dependent variable and which is the independent variable, and it's a lot easier to write advice about a single well-described problem than general advice.
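
              If the objective turns out to be one regression per country across its years, which is only a guess at what is wanted, a minimal sketch might look like the following. The data are invented, the names country, year, gdpgrowth and popdensity follow post #5, and the choice of gdpgrowth as the dependent variable is an arbitrary assumption (with a single regressor the R-squared is the same either way, since it equals the squared correlation).

              ```stata
              * Toy long-layout data; all values are invented for illustration
              clear
              input str1 country year gdpgrowth popdensity
              "A" 1 1.0 10
              "A" 2 1.5 11
              "A" 3 2.5 12
              "A" 4 2.0 13
              "B" 1 2.0 20
              "B" 2 2.5 22
              "B" 3 2.0 21
              "B" 4 3.0 25
              end

              * Will hold each country's R-squared, repeated on all of its rows
              generate double r2 = .

              * Run one regression per country and copy e(r2) into that country's rows
              levelsof country, local(countries)
              foreach c of local countries {
                  quietly regress gdpgrowth popdensity if country == "`c'"
                  quietly replace r2 = e(r2) if country == "`c'"
              }

              list country year r2, sepby(country)
              ```

              The stored value varies between countries but is constant within a country. The statsby prefix is another route to the same per-group results, producing one observation per country instead.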

              Regarding CODE delimiters: to ensure maximum readability of results that you post, please copy them from the Results window or your log file into a code block in the Forum editor using code delimiters [CODE] and [/CODE], as explained in section 12 of the Statalist FAQ linked to at the top of the page. For example, the following:

              [CODE]
              . sysuse auto, clear
              (1978 Automobile Data)

              . describe make price

              storage display value
              variable name type format label variable label
              -----------------------------------------------------------------
              make str18 %-18s Make and Model
              price int %8.0gc Price
              [/CODE]

              will be presented in the post as the following:
              Code:
              . sysuse auto, clear
              (1978 Automobile Data)
              
              . describe make price
              
                            storage   display    value
              variable name   type    format     label      variable label
              -----------------------------------------------------------------
              make            str18   %-18s                 Make and Model
              price           int     %8.0gc                Price
