  • #16
    Thank you very much Nick - I am learning a lot through the process of revising this code.



    • #17
      Originally posted by Nick Cox
      Now that Clyde has worked to get the code in very decent form, further small improvements can be suggested:

      Code:
      gen sic_2 = real(substr(sic,1,2))
      xtset sic_2 fyear
      gen uhat = .
      gen ta = (ib - oancf)/L.at
      gen x1 = 1/L.at
      gen x2 = (d.revt - d.rect)/L.at
      gen x3 = ppegt/L.at
      forvalues j = 1/`=_N' {
          capture noisily {
              reg ta x1 x2 x3 if sic_2 == sic_2[`j'] & fyear == fyear[`j'] & _n != `j', nocons
              if e(N) >= 10 {
                  predict uhat_2 in `j', resid
                  replace uhat = uhat_2 in `j'
                  drop uhat_2
              }
          }
      }
      Commentary:

      1. The call to destring can be cut. It's more direct just to use the function real(). The extra flexibility and security of destring is not needed here at all.

      2. count was used to get the number of observations, but Stata already knows that as _N.

      3. We want to use residuals if and only if they are for a regression with at least 10 observations. That decision can be made once and need not be repeated for every observation, including those not in the regression at all. Hence, replace the if qualifier with an if command.

      4. predict with in in principle allows Stata to do the calculation just once where it's needed.

      5. With a regression this simple, you could also sidestep predict. Instead of

      Code:
      predict uhat_2 in `j', resid
      replace uhat = uhat_2 in `j'
      drop uhat_2
      you could just go

      Code:
      replace uhat = ta - (_b[x1] * x1 + _b[x2] * x2 + _b[x3] * x3) in `j'
      Warning: None of this is tested.

      Naturally, it is supremely rational to spend 10 minutes explaining how to save a few seconds in computation, but some of these issues arise much more widely.
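Point 5 of the quoted commentary can be checked on a dataset that ships with Stata. This is only a sketch under stated assumptions: auto.dta and its variables price, mpg and weight stand in for ta and the regressors, and are not the thread's data. It compares the residuals from predict with the residuals computed directly from the stored coefficients.

```stata
* Sketch only: auto.dta stands in for the accruals data (an assumption)
sysuse auto, clear
regress price mpg weight, nocons

* Residuals via predict
predict double e_predict, resid

* Residuals computed directly from the stored coefficients
gen double e_direct = price - (_b[mpg]*mpg + _b[weight]*weight)

* The two should agree up to rounding error
assert reldif(e_predict, e_direct) < 1e-8
```

If the final assert passes silently, the direct calculation can replace the predict, replace, drop sequence for a regression of this form.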

      Hello everyone! Thank you for the great posts. I have just one question. The script looks as if it should work, but when the do-file reaches the xtset command I get the following message:

      xtset sic_2 fyear
      repeated time values within panel
      r(451);
      What am I doing wrong?



      • #18
        We can't see your data, but see e.g. http://www.stata.com/support/faqs/da...d-time-values/

        If you have (say) several firms with the same SIC2 code, SIC2 will not serve as a panel identifier.
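A small sketch of the problem, using made-up data: the gvkey, fyear and sic_2 values below are hypothetical. Two firms share a two-digit SIC code, so the pair sic_2 and fyear does not identify observations uniquely and xtset refuses it, while gvkey and fyear does.

```stata
* Hypothetical toy data: two firms in the same two-digit SIC industry
clear
input long gvkey int fyear byte sic_2
1001 2010 20
1001 2011 20
1002 2010 20
1002 2011 20
end

* sic_2 repeats within each year, so this fails (return code captured)
capture xtset sic_2 fyear
display "xtset return code: " _rc

* Show how many observations share each sic_2-fyear pair
duplicates report sic_2 fyear

* gvkey and fyear together identify each observation
isid gvkey fyear
xtset gvkey fyear
```

Running duplicates report on the intended panel and time variables before xtset is one way to see which combinations are repeated.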



        • #19
          Yes, thank you very much. Once more you are absolutely right; this is the case for me. Many firms can belong to the same industry (SIC). gvkey would be a viable panel identifier, but I cannot think of a way to adapt the script to it while still running industry-year regressions.
          Last edited by mike sam; 15 Nov 2014, 17:52.



          • #20
            Mike,
            As long as you have already converted gvkey to a numeric variable in your data, all you have to do is change the -xtset- statement from:

            Code:
            xtset sic_2 fyear
            to

            Code:
            xtset gvkey fyear
            This allows ta through x3 to be generated at the firm level using the appropriate lagged values, while the loop for the industry-year regressions should not be affected. It will still run over the observation numbers in the dataset.
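Putting the pieces together, a minimal sketch on simulated data might look like this. Everything about the data is an assumption for illustration: 24 hypothetical firms in two made-up industries over four years, with random values for the accounting variables. Variable names follow the thread, and the residual is computed with the direct formula from the quoted code in #17.

```stata
* Simulated panel (hypothetical): 24 firms, 2 industries, 4 fiscal years
clear
set seed 12345
set obs 24
gen long gvkey = 1000 + _n
gen byte sic_2 = cond(_n <= 12, 20, 35)
expand 4
bysort gvkey: gen int fyear = 2009 + _n
foreach v in at ib oancf revt rect ppegt {
    gen double `v' = 100 + 50*runiform()
}

* Panel is declared by firm, so lags and differences are within firm
xtset gvkey fyear
gen double ta = (ib - oancf)/L.at
gen double x1 = 1/L.at
gen double x2 = (D.revt - D.rect)/L.at
gen double x3 = ppegt/L.at
gen double uhat = .

* Regressions are still grouped by industry and year, leaving out row j
forvalues j = 1/`=_N' {
    capture {
        regress ta x1 x2 x3 if sic_2 == sic_2[`j'] & fyear == fyear[`j'] & _n != `j', nocons
        if e(N) >= 10 {
            replace uhat = ta - (_b[x1]*x1 + _b[x2]*x2 + _b[x3]*x3) in `j'
        }
    }
}
count if !missing(uhat)
```

The first fiscal year of each firm has no lagged total assets, so those rows get no residual; that is expected rather than an error.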



            • #21
              Originally posted by Clyde Schechter
              Well, 7,000 observations is not that large a data set, and regressions on 7,000 observations don't take very long. But you have a "combinatorial explosion" on your hands. Looking at your earlier output you have about 60 values of sic_2, 10 values of year, and 6,581 values of obs, for a total of just under 4,000,000 runs through the loops. But here's the thing: with 7,000 observations, nearly all of those 4,000,000 combinations will never even occur in your data. Now, because you loop over obs to exclude individual observations, I don't see an easy way to take that out. But you can certainly replace the double looping on sic_2 and year with a single loop on the combinations of them that actually exist.

              Code:
              egen combo = group(sic_2 year)
              summarize combo
              forvalues k = 1/`r(max)' {
              forvalues j= `=scalar(e)’/`=scalar(f)’ {
              capture noisily reg ta x1 x2 x3 if combo == `k' & obs != `j', nocons
              capture noisily predict uhat_2, resid
              capture noisily replace uhat_2=. if e(N) < 10
              capture noisily replace uhat= uhat_2 if combo == `k' & obs == `j'
              capture noisily drop uhat_2
              display `k', `j'
              }
              }
              This should speed things up considerably.

              Hi Clyde, I also want to run the regression by year and industry (the same thing Ally was looking for). My data also cover 7 years and SIC industry sectors. In my data, sic is a string variable and the data are an unbalanced panel. First I tried the code you gave in post #10, but it gives me a syntax error. I used the following code:

              egen combo = group(sic year)
              summarize combo
              forvalues k = 1/`r(max)' {
              forvalues j= `=scalar(e)’/`=scalar(f)’ {
              if combo[`j'] == `k' {
              capture noisily reg y x1 x2 x3 if combo == `k' & obs != `j', nocons
              capture noisily predict uhat_2, resid
              capture noisily replace uhat_2=. if e(N) < 10
              capture noisily replace uhat= uhat_2 if combo == `k' & obs == `j'
              capture noisily drop uhat_2
              }
              display `k', `j'
              }
              }

              invalid syntax
              r(198);


              Then I also tried the following code, which I got from the forum:


              egen combo = group (sic year )
              su combo, meanonly
              gen fitted1 =.
              forval g = 1/`r(max)` {
              regress y x1 x2 x3 if combo ==`g'
              predict work
              replace fitted1 = work if combo == `g'
              drop work
              }
              invalid syntax
              r(198);
              gen residual = y - fitted1

              In both cases I got a syntax error. Could you please show me where I am making a mistake, and clarify whether the two pieces of code do the same thing? I want the same output that Ally was looking for. Thanks.



                • #23
                  Now I am able to solve it. Thanks a lot!



                  • #24
                    Hi all,

                    I'm having a problem and I hope you can help me. I'm doing research on the effect of audit tenure and auditor industry specialization on audit quality. I have 1058 observations on 114 Dutch firms. I need to calculate discretionary accruals and I am using Stata to do this. I've tried this code:
                    gen sic2= substr(sic,1,2)
                    destring sic2, replace

                    egen combo= group(sic2 FYEAR)
                    levelsof combo, local(a)

                    gen uhat=.
                    xtset gvkey FYEAR

                    gen obs= [_n]
                    summ obs

                    scalar e= r(min)
                    scalar f= r(max)

                    gen TA= (NI-CFO)/lagAT
                    gen x1= 1/lagAT
                    gen x2= (dREV-dREC)/lagAT
                    gen x3= PPE/lagAT


                    foreach i in `a’ {
                    foreach x in `b’ {
                    forvalues j= `=scalar(e)’/`=scalar(f)’ {
                    capture noisily reg TA x1 x2 x3 if sic2==`i’ & FYEAR==`x’ & obs != `j’, nocons
                    capture noisily predict uhat_2, resid
                    capture noisily replace uhat_2=. if e(N) < 10
                    capture noisily replace uhat= uhat_2 if sic2==`i' & FYEAR==`x' & obs== `j'
                    capture noisily drop uhat_2
                    di `i'
                    di `x'
                    di `j'
                    }
                    }
                    }

                    And then I got the error: invalid syntax r(198).

                    Can you please help me? I really don't know what I am doing wrong.
                    Thank you in advance.



                    • #25
                      Hi all,
                      Do you think the following code would run a bit faster, since it runs fewer regressions? Let me know if you find any errors in it.
                      gen da=.
                      gen ta= (ib-oancf)/at_lag1
                      gen x1= 1/at_lag1
                      gen x2= (sales-sales_lag1-rect+rect_lag1)/at_lag1
                      gen x3= ppegt/at_lag1
                      egen id=group(sic2digit fyear )
                      bys id: egen count=count(fyear)
                      gen less20=1 if count<20
                      replace less20=0 if less20==.
                      forvalues i=1/1699 {
                      capture noisily{
                      reg ta x1 x2 x3 if id==`i' & less20==0
                      predict p if id==`i' & less20==0
                      replace da=ta-p if id==`i' & less20==0
                      drop p
                      }
                      }



                      • #26
                        Hello,
                        I have a question. I ran the following code in Stata to estimate discretionary accruals:
                        egen combo = group(sic_2 year) summarize combo forvalues k = 1/`r(max)' { forvalues j= `=scalar(e)’/`=scalar(f)’ { if combo[`j'] == `k' { capture noisily reg ta x1 x2 x3 if combo == `k' & obs != `j', nocons capture noisily predict uhat_2, resid capture noisily replace uhat_2=. if e(N) < 10 capture noisily replace uhat= uhat_2 if combo == `k' & obs == `j' capture noisily drop uhat_2 } display `k', `j' } } But I always get the following message:

                        no observations
                        last estimates not found
                        variable uhat_2 not found
                        uhat_2 not found
                        variable uhat_2 not found

                        Can anyone help me?

                        Thanks!



                        • #27
                          Your code appears as an unreadable jumble. Please read FAQ #12 to get a better understanding of how to post information in the most usable form. Then try again.



                          • #28
                            Here is a more readable version:
                            egen combo = group(sic_2 year)
                            summarize combo
                            forvalues k = 1/`r(max)' {
                            forvalues j= `=scalar(e)'/`=scalar(f)' {
                            if combo[`j'] == `k' {
                            capture noisily reg ta x1 x2 x3 if combo == `k' & obs != `j', nocons
                            capture noisily predict uhat_2, resid
                            capture noisily replace uhat_2=. if e(N) < 10
                            capture noisily replace uhat= uhat_2 if combo == `k' & obs == `j'
                            capture noisily drop uhat_2
                            }
                            display `k', `j'
                            }
                            }

                            Stata error message:

                            no observations
                            last estimates not found
                            variable uhat_2 not found
                            uhat_2 not found
                            variable uhat_2 not found

                            Thank you for your help!



                            • #29
                              That's much better. A code block would have been better still. Next time.

                              The first error message is coming from your -capture noisily reg...- command. It means that there is some combination of `k' and `j' for which there are no usable observations with combo == `k' and obs != `j'. Remember that an observation is only usable if it has no missing values for any of the variables in the regression model. The first step is probably to find out which values of `k' and `j' are causing this. You can do that by putting -display `k', `j'- before your regression command. Then you can see what's going on with -list ta x1 x2 x3 if combo == that_value_of_`k' & obs != that_value_of_`j'-. The output will either be empty because there are no such observations, or you will be able to see that each such observation has a missing value for at least one of the variables. Then you will have to figure out whether this represents an error in your data to fix, or is an expected situation, in which case you can just ignore it.

                              The other error messages are all cascading from the same event. Because the -reg- command failed, there are no estimates to use in the -predict- command. Because the -predict- command failed, there is no variable uhat_2 to do anything with. But these other error messages will all go away when you fix the first problem.
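The diagnostic steps above can be sketched on toy data. The values below are made up for illustration, and the variable names combo, obs, ta and x1 to x3 follow the posted code. Instead of running the regression, the loop counts how many usable observations each regression would have, which shows directly where "no observations" would arise.

```stata
* Toy data with made-up values; every row of combo 2 has a missing value
clear
input combo ta x1 x2 x3
1 0.10 0.01 0.20 0.5
1 0.05 0.02 0.10 0.4
1 0.08 0.03 0.15 0.3
2 .    0.03 0.30 0.6
2 0.07 .    0.10 0.2
end
gen long obs = _n

forvalues k = 1/2 {
    forvalues j = 1/`=_N' {
        if combo[`j'] == `k' {
            * Usable = in the group, not the left-out row, nothing missing
            quietly count if combo == `k' & obs != `j' & !missing(ta, x1, x2, x3)
            display "k = `k', j = `j', usable observations = " r(N)
        }
    }
}

* Inspect the group that reports zero usable observations
list ta x1 x2 x3 if combo == 2
```

Any pair of k and j reported with zero usable observations is one where the regression in the original loop would fail.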



                              • #30
                                Hi all:
                                I'm new to Stata. I was trying to estimate discretionary accruals using a command posted by a user, and I got an error message.
                                The full log is posted below.
                                Any help would be great!
                                Regards,

                                Mahmud

                                gen sic_2= substr(sic,1,2)

                                . destring sic_2, replace
                                sic_2: all characters numeric; replaced as byte

                                .
                                . egen combo= group(sic_2 fyear)
                                (575 missing values generated)

                                gen uhat=.
                                (254,697 missing values generated)

                                .
                                end of do-file

                                . do "C:\Users\mhossain\AppData\Local\Temp\STD784_00000 0.tmp"

                                . xtset gvkey fyear
                                panel variable: gvkey (unbalanced)
                                time variable: fyear, 1995 to 2018, but with gaps
                                delta: 1 unit

                                .
                                . gen obs= [_n]

                                . summ obs

                                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                                -------------+---------------------------------------------------------
                                         obs |    254,697      127349    73524.84          1     254697

                                . scalar e= r(min)

                                . scalar f= r(max)

                                .
                                . gen ta= (ib-oancf)/L.at
                                (81,677 missing values generated)

                                . gen x1= 1/L.at
                                (65,104 missing values generated)

                                . gen x2= (d.revt – d.rect)/L.at
                                d: operator invalid
                                r(198);

                                end of do-file

                                r(198);
