
  • Saving looped regression output into a matrix and then graphing the matrix

    Hi all:

    I am a student and fairly new to Stata, so I am not as well versed in the syntax as I would like.

    I have a data set with 1000 observations (25 countries over 40 years). I am trying to loop a regression and then save the regression coefficients of each observation into a matrix. I then want to graph this matrix. Would any of you know how I should write this series of commands? I would really appreciate any help that anyone could offer.

    Thank you!
    Last edited by Stephanie Galen; 31 Mar 2018, 10:45.

  • #2
    Welcome to Statalist.

    When you say you "are trying to loop a regression" over your 1000 observations of 25 countries over 40 years, it is not at all clear what you hope to loop over - 25 countries, 40 years, or (somehow) 1000 observations?

    When you say you want to "save the regression coefficients of each observation into a matrix [and then] graph this matrix", it is not clear what you expect to have on the horizontal and vertical axes of your graph. Nor, for that matter, do we have any idea how many coefficients you are estimating in your regressions.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.



    • #3
      Hi Stephanie,

      First up, welcome to Statalist.

      Second, read the FAQ linked in the top-right of the screen. You will massively increase the likelihood of a helpful response if you follow the guidelines there on how to make posts. An important piece of this guidance is to share a snippet of your data, and any code you have already attempted.

      Third, if you want advice on how to run regressions and store results, a would-be helper will need to know what your DV and IV(s) are in order to give a more helpful response.

      Fourth, you might want to do something like this (assuming what varies in your regressions is the IV) to extract coefficients etc.:

      Code:
      sysuse auto, clear
      rename (mpg rep78 headroom weight) (iv1 iv2 iv3 iv4)
      postfile buffer N beta beta_se const const_se str20 iv using "results.dta", replace
      
      forvalues i = 1/4{
          regress price iv`i'
          mat results = r(table)
          local N = e(N)
          local beta = results[1,1]
          local beta_se = results[2,1]
          local const = results[1,2]
          local const_se = results[2,2]
          post buffer (`N') (`beta') (`beta_se') (`const') (`const_se') ("iv`i'")
      }
      postclose buffer
      
      preserve
      use results.dta, clear
      list
      restore
      I have no idea what kind of graph you want so will leave it there



      • #4
        Hi William! Thank you for the advice on how to word my question, I can see how it was confusing and I do want my question to be clear. I will try and be as specific as possible about my questions.

        I am trying to loop over the 25 countries.

        I will be running 4 regressions, all four will have the same dependent variable "a" and the same 3 independent variables "b", "c", and "d" and then each regression will have a unique fourth variable either "w", "x", "y", or "z". Each regression is describing the relationship between "a" with either "w" "x" "y" or "z". I want to make four matrices with the resulting coefficients from each respective series of regressions.

        For the graph: I want the graphs that I produce to illustrate the relationship between "a" and either "w" "x" "y" or "z". I would also love to have the graph color code the statistical significance of the results, but that's not imperative.

        I know from speaking with a professor at my university that I could use some combination of "foreach", "e(b)", and "svmat", but I am totally open to any suggestions that you and the community might have.
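
        A minimal sketch of that foreach / e(b) / svmat combination follows. It is only an illustration under stated assumptions: the auto data stands in for the panel, foreign stands in for country, and the matrix name B and the column names group and b_weight are invented for the example.

```stata
* Sketch only: auto data and -foreign- stand in for the real panel and country.
sysuse auto, clear
levelsof foreign, local(groups)
local ngroups : word count `groups'
matrix B = J(`ngroups', 2, .)      // hypothetical layout: one row per group
local row = 0
foreach g of local groups {
    local ++row
    quietly regress price weight mpg if foreign == `g'
    matrix b = e(b)                // row vector of coefficients
    matrix B[`row', 1] = `g'       // group identifier
    matrix B[`row', 2] = b[1, 1]   // coefficient on weight (first regressor)
}
matrix colnames B = group b_weight
svmat B, names(col)                // turn the matrix columns into variables
list group b_weight in 1/`ngroups'
```

        After svmat, the coefficients sit in ordinary variables and can be passed to any graph command.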

        Please, let me know if you have additional questions!
        Last edited by Stephanie Galen; 31 Mar 2018, 12:57.



        • #5
          Well, I don't see any reason to use a matrix here. Matrices are fine when you plan to do some linear algebra, but as a way of storing results, they are just awkward. You don't show example data, so I cannot test this code out, but something like this should work:

          Code:
          capture program drop one_country
          program define one_country
              foreach v of varlist w x y z {
                  regress a `v' b c d
                  gen b_`v' = _b[`v']
                  gen se_`v' = _se[`v']
              }
              exit
          end
          
          runby one_country, by(country)
          To use this code you will need to install -runby-, written by Robert Picard and me, from SSC (ssc install runby). At the end of this, your data set will contain variables for the coefficients (b_*) and standard errors (se_*) of the variables w, x, y and z for each country.

          I do not at all understand what kind of graph you want to make out of these, so I have no advice to offer you on that.

          As you did not post any example data, this code is untested and may contain typographical or other errors.



          • #6
            Originally posted by Clyde Schechter in #5:
            Thank you for your help so far. I have added a sample of my data set below. I tried running the code that you provided:

            Code:
            capture program drop one_country
            program define one_country
                foreach v of varlist intinf fininf distinf {
                    regress inf `v' l.inf dom for 
                    gen b_`v' = _b[`v']
                    gen se_`v' = _se[`v']
                }
                exit
            end
            
            runby one_country, by(country)
            It did not produce an error message, however it also did not produce any results. After running the program all of my observations and variables were cleared from the data set. I may have filled in my variables incorrectly?

            Code:
            Number of by-groups    =            25
            by-groups with errors  =            25
            by-groups with no data =             0
            Observations processed =         1,000
            Observations saved     =             0
            --------------------------------------
            To shed more light on my project, my basic regression relates the variable "inf" from my data set with a time lagged inf "l.inf", "dom", and "for". The following 4 regressions will all have a different 4th variable in the regression, either "intinf", "fininf", "totinf", or "distinf". Regarding the graph, I am just trying to generate a basic scatter plot and maybe a line of fit.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str34 country int year byte id float(inf dom for intinf fininf totinf distinf)
            "Australia"     1970  1  3.91  5.291898 1.144315  2.654155 1.9411644 2.2998424 120.03104
            "Canada"        1970  4  3.37 -1.615579 2.858498   3.40012  2.393032  2.944241 198.04306
            "Chile"         1970  5 32.51 13.954868 1.896313  .3036433 .19670624 .25099042  67.85429
            "Hungary"       1970 11  3.56 -1.656362 -.732641   .556038  .4570352 .50712395  419.2871
            "Japan"         1970 16  7.67  3.987641  .763428   7.49547  7.390033  7.457651  95.15249
            "Mexico"        1970 19  5.21 -4.196098 2.600109 1.8668767 1.8034844 1.8219602 139.05592
            "Portugal"      1970 24  4.53 -1.614741  -.78938  .8413468   .824538  .8274047 279.36856
            "Spain"         1970 25  5.76  1.088908 -.286436  4.479183  4.245802 4.3609924 308.35245
            "Turkey"        1970 28  6.93  5.461395 -.207028  .9419509  .7712067  .8693929  253.8699
            "United States" 1970 30   5.9  4.592723  .214456  23.75248 23.600687 23.645744 172.58147
            "Australia"     1971  1  6.06  5.347171 1.228021 3.4994795  2.559932 3.0253866 138.27443
            "Canada"        1971  4  2.84 -1.805837 2.304076 2.9734864  2.177877   2.60556  191.0292
            "Chile"         1971  5 20.06  23.81663 1.768904 .36796924 .22468364  .2980499  84.38907
            "Hungary"       1971 11  4.32 -2.916567 -.249195  .7131019  .5894932  .6532361  504.5839
            "Japan"         1971 16  6.35  2.015009  .656267  7.659554  7.583803   7.62578  110.6539
            "Mexico"        1971 19  5.26 -6.364313 2.151936 1.5206908 1.4117744 1.4600062 132.46614
            "Portugal"      1971 24   7.5  -.241598  .145269 1.0576829  .9944938 1.0218209  328.2196
            "Spain"         1971 25  8.24  -.148523 -.211337  5.516378  5.139913  5.332777  372.2717
            "Turkey"        1971 28 15.74  4.466906 -.719816 1.0890989  .9486948 1.0274459  309.8927
            "United States" 1971 30  4.26   3.57021  .094383  25.27516 24.313826   24.8178  173.8966
            "Australia"     1972  1  5.86  4.210366 2.427974   2.97106 2.1128817  2.562584  170.8268
            "Canada"        1972  4  4.77  -.574149 3.470739   3.81894 2.8633356 3.4014454 252.02016
            "Chile"         1972  5 77.81  20.51589 2.134173  .3439955 .21521845 .28117535  79.45746
            "Hungary"       1972 11  4.43 -4.052812  .110688  .7714205  .6240294  .7002852  580.0872
            "Japan"         1972 16  4.84  4.041126 1.284025 12.643266  11.85301  12.34029 143.16832
            "Mexico"        1972 19     5 -4.324135 3.154648   2.87022 3.0013406  2.904436 227.44777
            "Portugal"      1972 24  8.94  2.712111 1.477891  1.146711 1.0966774 1.1163825  390.2904
            "Spain"         1972 25  8.29  -.012039  .903903   7.23073  6.862173  7.045549  444.7046
            "Turkey"        1972 28 11.67  5.702386 1.052899  1.457241 1.0339427  1.283685  374.6072
            "United States" 1972 30  3.31  4.687621 1.241893  35.52935  35.94584 35.607014 268.09958
            end
            label var id "Id" 
            label var inf "Inf" 
            label var dom "Dom " 
            label var for "For"
            I have tried to include as much information about my project and questions as possible, adhering to the Statalist FAQ, but please let me know if you need any other information.

            Thank you so much for all of your help. This community is such an awesome resource for new Stata users like myself.



            • #7
              You cannot use the lag operator in
              Code:
                      regress inf `v' l.inf dom for
              without first having tsset your data. Add
              Code:
                  tsset year
              before the foreach loop in your program.
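
              A small sketch of why the time variable has to be declared first. The toy variables y and year are invented for the illustration, and -capture- is used so that the failing command does not stop the run.

```stata
* Sketch only: toy data with made-up variables y and year.
clear
set obs 10
gen year = 1969 + _n
gen y = _n^2

* Without a declared time variable the lag operator is rejected.
capture regress y L.y
scalar rc_before = _rc
display "return code before tsset: " rc_before

* Once the time variable is declared, the same command runs.
tsset year
regress y L.y
```

              The first observation has no lagged value, so the regression uses one observation fewer than the data set holds.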



              • #8
                And actually, -runby- does not "play well" with -tsset-. You actually have to do the -tsset- within the program.

                This works with your example data:

                Code:
                capture program drop one_country
                program define one_country
                    foreach v of varlist intinf fininf distinf {
                        tsset year
                        regress inf `v' l.inf dom for 
                        gen b_`v' = _b[`v']
                        gen se_`v' = _se[`v']
                    }
                    exit
                end
                
                runby one_country, by(country)
                Moral of the story: when you ask a question, ask the real question. Don't ask a "similar" question and hope that the answer you get will also apply. Omitting the fact that you had a lagged variable in your regression broke otherwise perfectly good code. In coding there are no unimportant details. To avoid wasting your time and the time of others, always ask the question you really need the answer to. Don't "simplify" it.



                • #9
                  Thank you both so much!! In asking future questions I will make sure to be transparent in my premise. I am sorry for trying to over-simplify my questions.

                  One last question: once the program is run what command should I execute to generate a scatter plot with a trend line for each set of regressions? or to analyse the results of the regressions (p value, statistical significance)? Is it possible to color code the points on the scatter plot based on statistical significance? I am trying to use the results of these regressions to compare the effects of the independent variables on domestic inflation.



                  • #10
                    once the program is run what command should I execute to generate a scatter plot with a trend line for each set of regressions?
                    I'm not sure what you want to do here. You have three regressions for each of ten countries. Either that's 30 separate graphs or one graph with 30 scatters and linear fits. The first is unmanageable and the last will be unreadable. Or do you want to make some kind of graph out of the regression coefficients themselves? If so, what would that be like? I don't get it.
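
                    If small multiples turn out to be acceptable, one possible reading of the request is a single command with by(). The sketch below assumes the auto data as a stand-in, with foreign playing the role of country; with the real data the analogous command would presumably be twoway (scatter inf intinf) (lfit inf intinf), by(country). Note that -lfit- draws a simple bivariate fit, not the fit from the multiple regression.

```stata
* Sketch only: auto data and -foreign- stand in for the real panel and country.
sysuse auto, clear

* One panel per group, each with its own scatter and bivariate linear fit.
twoway (scatter price weight) (lfit price weight), by(foreign)
```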

                    or to analyse the results of the regressions (p value, statistical significance)?
                    This is basic statistics, but rather than calculating them, we can modify the program to pull the results out in the first place.
                    Code:
                    capture program drop one_country
                    program define one_country
                        foreach v of varlist intinf fininf distinf {
                            tsset year
                            regress inf `v' l.inf dom for
                            gen b_`v' = _b[`v']
                            gen se_`v' = _se[`v']
                            matrix M = r(table)
                            gen p_`v' = M[4, 1]
                        }
                        exit
                    end
                    will give you the p-values in the results.
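
                    As a check on what M[4, 1] contains, the sketch below compares it with the two-sided p-value computed by hand from the t statistic. It runs on the auto data as a stand-in, and the scalar names p_table, t_stat and p_hand are invented for the illustration.

```stata
* Sketch only: auto data stands in for the real panel.
sysuse auto, clear
regress price weight mpg
matrix M = r(table)

* Row 4 of r(table) holds the p-values; column 1 is the first regressor.
scalar p_table = M[4, 1]

* The same two-sided p-value computed by hand from the t statistic.
scalar t_stat = _b[weight] / _se[weight]
scalar p_hand = 2 * ttail(e(df_r), abs(t_stat))

display "from r(table): " p_table "   by hand: " p_hand
```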

                    Is it possible to color code the points on the scatter plot based on statistical significance?
                    Possibly, but as I don't understand what the graphs are supposed to be in the first place, I couldn't begin to say how. Also, even without fully understanding what you want, this strikes me as a terrible idea. You have two very useful statistics, a regression coefficient and a standard error, which give you good point and interval estimates of the effects. Why would you want to then focus on the p-value, which is an uninterpretable mish-mash of sample size, measurement precision and effect size and, in particular, is useless as a measure of these effects? It's like having two adults and a five-year-old in the room and asking the five-year-old for their opinion about an important question.
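
                    A sketch of how a coefficient and its standard error turn into a point and interval estimate that can be graphed by group. It assumes the auto data as a stand-in, with foreign in place of country, and the variables df, lb, ub and tag are invented for the illustration. With the real data, the program would also need to save e(df_r), which the code in this thread does not do.

```stata
* Sketch only: auto data and -foreign- stand in for the real panel and country.
sysuse auto, clear
gen b_weight = .
gen se_weight = .
gen df = .
levelsof foreign, local(groups)
foreach g of local groups {
    quietly regress price weight mpg if foreign == `g'
    replace b_weight  = _b[weight]  if foreign == `g'
    replace se_weight = _se[weight] if foreign == `g'
    replace df        = e(df_r)     if foreign == `g'
}

* 95% confidence limits from the coefficient, its standard error, and the residual df.
gen lb = b_weight - invttail(df, 0.025) * se_weight
gen ub = b_weight + invttail(df, 0.025) * se_weight

* Plot one point and one interval per group.
egen byte tag = tag(foreign)
twoway (rcap lb ub foreign if tag) (scatter b_weight foreign if tag), xlabel(0 1, valuelabel) ytitle("Coefficient on weight") legend(off)
```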

                    Last edited by Clyde Schechter; 01 Apr 2018, 19:48.
