  • How to run regression by industry and year in STATA

    Hello, I am new to STATA and am having trouble running a regression by industry and year to estimate residuals. Any help would be appreciated. Thanks

  • #2
    M, this is far too little information to give a helpful response. Please provide an excerpt from your data (you can use dataex from SSC) and show the commands that you tried so far.
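
    A minimal sketch of the dataex workflow requested here. The variable names indcode, year, company, TAit1 and B0 to B3 are hypothetical placeholders; substitute the names in your own dataset.
    Code:
    ssc install dataex    // only needed if dataex is not already installed
    dataex indcode year company TAit1 B0 B1 B2 B3 in 1/20    // first 20 observations of the named variables

    The output can then be pasted into a post between CODE tags.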

    Please also change your name to your full first and last name, as you were asked in a response to an earlier question. The FAQ explains how you can change your name and contains other useful advice on how to use Statalist.

    Finally, some list members are sensitive about the spelling of the software discussed in this forum. It is Stata, not STATA.



    • #3
      Thanks, Friedrich, for your reply. First of all, please accept my apology for using STATA instead of Stata. I am running a model (TA = B0 + B1 + B2 + B3 + e) simply by using the command reg TA B0 B1 B2 B3, and I get the attached results. However, I am unable to figure out how a regression by industry and year is carried out in Stata. My sample consists of 9 industries (two-digit) and 10 years (2005-2014), and I need to estimate residuals for each observation. I hope I have explained my issue properly this time. Many thanks.




      • #4
        Muhammad, thank you for the additional information and for changing your name.

        Your data cannot be understood from the description in your last post. What do TAit1, B0, B1, B2 and B3 represent? Which variable is the industry? Which variable is the year?

        Please share an excerpt from your data and please use CODE tags (explained in section 12 of the FAQ) instead of screenshots.



        • #5
          I have not included industry and year as variables in the model, as they are neither dependent nor independent variables. TAit1 is the dependent variable and B0 B1 B2 B3 are the independent variables. Please find below an example of the dataset. The data consist of 204 companies in 9 industries over a period of 10 years (2005-2014).


          Ind Code  year  Company  TA (it-1)  B0       B1         B2         B3
          44        2005  1        -0.009283  0.00006  -0.089451  0.2766245  15.43
          44        2006  1        -0.002257  0.00005  0.0234633  0.2417196  9.60
          44        2007  1        -0.002363  0.00006  0.0037691  0.2719397  10.30
          44        2008  1        -0.032189  0.00005  0.0845768  0.3278619  11.66
          44        2009  1        -0.015571  0.00004  0.077783   0.2228594  10.61
          44        2010  1        -0.031224  0.00004  0.0216427  0.247351   10.02
          44        2011  1        -0.024838  0.00004  0.0175879  0.2258794  10.57
          44        2012  1        0.0051624  0.00004  -0.016225  0.2336001  11.26
          44        2013  1        0.0095876  0.00004  -0.00183   0.2223442  13.95
          44        2014  1        0.0068822  0.00004  -0.042856  0.2208623  14.40
          33        2005  2        -0.114936  0.00271  0.1705069  1.9753321  5.04
          33        2006  2        -0.178915  0.00219  0.2497813  1.7576553  8.89
          33        2007  2        -0.181965  0.00162  0.2170957  1.5729318  8.60
          33        2008  2        -0.133615  0.00136  0.2438925  2.0431282  11.95
          33        2009  2        -0.151148  0.00082  0.1073516  1.2338026  12.60
          33        2010  2        -0.157116  0.00089  0.1339453  1.598075   14.41
          33        2011  2        -0.101951  0.00071  0.066923   1.542432   16.87
          33        2012  2        -0.054728  0.00056  0.0892011  1.4105168  16.37
          33        2013  2        -0.123823  0.00047  -0.000942  1.1953861  14.17
          33        2014  2        -0.094177  0.00051  -0.018734  1.4101266  12.00






          • #6
            It is not at all clear what you are trying to do. I suggest you look at the help for "statsby" and see whether that gives you what you want.
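
            A minimal sketch of what statsby could look like here, assuming the industry variable is called indcode and the dependent variable TAit1 (both names are guesses), and using a hypothetical results file name ind_year_results. statsby collects one row of coefficients and statistics per industry-year cell; it does not by itself produce residuals for each observation.
            Code:
            * one regression per industry-year cell; coefficients, R-squared and N are saved to a new file
            statsby _b r2=(e(r2)) n=(e(N)), by(indcode year) saving(ind_year_results, replace): regress TAit1 B0 B1 B2 B3
            * to inspect the saved results (this replaces the data in memory, so save your data first)
            use ind_year_results, clear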



            • #7
              Please consider using dataex. This would allow list members to import your data easily into Stata. Please also learn how to use CODE tags.

              You can run the regression by industry and year with the commands below. The example assumes that the name of the variable with the industry code is indcode. The table in your last post shows variable labels, not variable names.
              Code:
              bysort indcode year: regress TAit1 B0 B1 B2 B3
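
              The by prefix only displays the estimates for each group; a predict issued afterwards would, as far as I know, use the last regression only. A minimal sketch for storing residuals from every industry-year regression, again assuming the names indcode and TAit1, with cell, resid and ehat as hypothetical new names:
              Code:
              egen cell = group(indcode year)    // one id per industry-year combination
              generate double resid = .          // will hold the residuals
              summarize cell, meanonly
              forvalues g = 1/`r(max)' {
                  capture regress TAit1 B0 B1 B2 B3 if cell == `g'
                  if !_rc {
                      tempvar ehat
                      predict double `ehat' if e(sample), residuals
                      replace resid = `ehat' if e(sample)
                      drop `ehat'
                  }
              }

              Cells with very few observations will still be estimated by this sketch, so a minimum cell size may need to be added to the if condition.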



              • #8
                I have just come across the same issue, raised by Amedeo, who shared the code below earlier in the year. I know very little about Stata and have no idea how to use this detailed code after importing an xls file into Stata.

                My data labels are given below:

                Ind Code Year Co Code Y X0 X1 X2 X3
                I would greatly appreciate it if anyone could show me how to use the detailed code below to estimate residuals from a regression of Y on X0 X1 X2 X3 performed by industry (two-digit SIC) and year. My sample has 1940 observations, the two-digit SIC codes run from 11 to 99, and the years span 2005 to 2014.
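
                A minimal sketch of the import step, assuming the spreadsheet has the column labels in its first row; mydata.xls is a placeholder file name.
                Code:
                import excel using "mydata.xls", firstrow clear    // first row supplies the variable names
                describe                                           // check which name Stata gave each column

                The names shown by describe are the ones to substitute for y, x1, x2, x3, sic_2_digit and year in the code below, and the year range 1999/2014 would become 2005/2014.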

                Many thanks.

                Tahir



                Hi,
                I am trying to estimate predicted values and residuals from a regression of y on x1 x2 x3 performed by industry (two-digit SIC) and year.
                Also, the condition is that there are at least 10 observations for each industry/year; otherwise no estimation should be performed.
                My sample has 29000 observations, the two-digit SIC codes run from 10 to 99, and the years span 1999 to 2014.

                I have tried the following code and it seems to work. Does it seem right to the experts?

                gen y_hat = .    // empty variable for predictions
                gen y_res = .    // empty variable for residuals
                levelsof sic_2_digit, local(levels)
                foreach x of local levels {
                    foreach z of numlist 1999/2014 {
                        // sic_year_numerosity is not created in this snippet; presumably it holds the number of observations per industry/year
                        capture regress y x1 x2 x3 if sic_2_digit == `x' & year == `z' & sic_year_numerosity > 9
                        if !_rc {
                            tempvar fitted res                        // fresh temporary names for this regression
                            predict `fitted' if e(sample)             // predictions are now in a temporary variable
                            replace y_hat = `fitted' if e(sample)     // transfer predictions from the temporary variable
                            predict `res' if e(sample), residuals     // residuals are now in a temporary variable
                            replace y_res = `res' if e(sample)        // transfer residuals from the temporary variable
                            drop `fitted' `res'                       // drop temporary variables before the next regression
                        }
                    }
                }

                Also, as an aside, I would like to pull out the average R-squared from the regressions.
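
                One possible way to get that average, sketched under the same assumptions as the code above (numeric sic_2_digit, an existing sic_year_numerosity variable); r2 and one_per_cell are hypothetical new variable names. Each regression's R-squared is stored on its own observations, and then one observation per industry/year is averaged so that every regression counts once.
                Code:
                gen r2 = .    // R-squared of the regression each observation belongs to
                levelsof sic_2_digit, local(levels)
                foreach x of local levels {
                    foreach z of numlist 1999/2014 {
                        capture regress y x1 x2 x3 if sic_2_digit == `x' & year == `z' & sic_year_numerosity > 9
                        if !_rc {
                            replace r2 = e(r2) if e(sample)
                        }
                    }
                }
                egen byte one_per_cell = tag(sic_2_digit year) if !missing(r2)    // flags one observation per estimated cell
                summarize r2 if one_per_cell
                display "Average R-squared: " r(mean)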

                Thanks.
                Kind regards
                Amedeo







                • #9
                  Muhammad:
                  you seem to have a panel dataset.
                  So why not consider the -xt- suite of Stata commands, which deal with exactly this kind of data?
                  As an aside, the regression results that you attached in post #3 are hardly convincing, because you treated repeated observations on the same panel units as if they were independent observations (i.e., as if they were one-shot measures taken from 1940 different units). As per the excerpt of your dataset, this is not the case. Following that road, the standard errors of your regression are biased.
                  Assuming you do not intend to switch to the -xt- commands (that choice would be highly advisable, though), the only way to perform that kind of (pooled) OLS is to cluster the standard errors on id (please see the -vce(cluster)- option of -regress-).
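                  A minimal sketch of both routes, assuming the company identifier is a numeric variable called cocode (a hypothetical name) and using the Y and X0 to X3 names from post #8. The fixed-effects estimator is shown only as one example from the -xt- suite, not as a recommendation for this model.
                  Code:
                  regress Y X0 X1 X2 X3, vce(cluster cocode)    // pooled OLS, standard errors clustered on company
                  xtset cocode year                             // declare the panel structure
                  xtreg Y X0 X1 X2 X3, fe                       // one possible -xt- estimator (fixed effects)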
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Thanks Carlo,
                    I will take a look at what you advised in your last post. However, my last post, which refers to Amedeo's detailed code, seems to answer my question. Could you please advise me how to use that code to estimate residuals from a regression of Y on X0 X1 X2 X3 performed by industry (two-digit SIC) and year? My sample has 1940 observations, the two-digit SIC codes run from 11 to 99, and the years span 2005 to 2014. I know I have a panel dataset.

                    Please ignore my post #3, as it does not make sense at all. Many thanks.




