  • Trouble saving predicted values and residuals

    I am stuck on the code below, which was posted before and worked for the OP, but I can't seem to obtain the same success. The code runs through very quickly while creating no values. I tried playing around a bit, such as removing the sic_year_numerosity restriction, but I still get the same end result of no values being created. I am not sure what's happening.

    gen y_hat=. // empty variable for predictions
    gen y_res=. // empty variable for residuals
    tempvar acc_tot_fitted acc_tot_res // temporary variables for each set of predictions
    levelsof sic_2_digit, local(levels)
    foreach x of local levels {
    foreach z of numlist 1999/2014 {
    capture reg y x1 x2 x3 if sic_2_digit==`x' & year==`z' & sic_year_numerosity>9
    if !_rc {
    predict `y_hat' // predictions are now in temporary variable
    replace y_hat=`y_hat' if e(sample) // transfer predictions from temp variable
    predict `y_res', residuals // residuals are now in temporary variable
    replace y_res=`y_res' if e(sample) // transfer residuals from temp variable
    drop `y_hat' `acc_tot_res' // drop temporary variables in preparation for next regression
    }
    }
    }

  • #2
    First, next time please post your code in a code block so that things like indentation will be better preserved. As it is, it's a bit jumbled and hard to read.

    I see a couple of things. First, in the code shown, you never declare tempvar y_hat, nor y_res. So you should be getting a syntax error at -predict `y_hat'-. But perhaps that happens elsewhere in the code that you did not show us?

    If you have declared tempvars y_hat and y_res, then I suspect your problem is coming from the regression. Because you have it -capture-d, you get no warning if something is going wrong there, nor any idea what. And I suspect that's what is happening here. Take out the -capture- and Stata will complain about whatever the problem is. In these situations it often turns out that there are combinations of sic_2_digit and year for which there are either no observations at all, or for which the number of observations is too small for a regression to be done. Then you can explore whether that situation means there is a problem in your data set that needs fixing, or if the data should indeed have those gaps and you need to revise the code to work around this situation.
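
    As a minimal sketch of how to see what a -capture-d command is complaining about without stopping the loop, one option is -capture noisily- together with a check of _rc. The example below uses Stata's shipped auto dataset and its variables (price, mpg, weight, rep78) purely as stand-ins for the poster's data; they are assumptions for illustration only.

    Code:
    sysuse auto, clear
    levelsof rep78, local(levels)
    foreach x of local levels {
        * capture noisily shows output and any error message, but does not abort the loop
        capture noisily regress price mpg weight if rep78 == `x'
        if _rc {
            display as error "rep78 == `x': regress failed with return code " _rc
            continue
        }
        display "rep78 == `x': regression used " e(N) " observations"
    }

    Once the cause of the failures is understood, -capture noisily- can be switched back to plain -capture- if the output is too verbose.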

    • #3
      Following up on Clyde's advice, to increase the likelihood that Statalist readers will be able to assist you instead of passing over a difficult-to-understand presentation, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. In particular, FAQ #12 describes the use of dataex and CODE delimiters when posting to Statalist.

      • #4
        Thanks, Clyde. It turns out the problem was not due to the lack of observations but rather the fact that some of the x variables I was using in the regressions actually used _n within them (e.g., ocf[_n-1]); I was getting the error message "_n unknown weight type". After creating the variables outside the loop (e.g., gen ocf1=ocf[_n-1]) and putting them back into the loop under the new names, I was able to obtain values. Thanks for the help.
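
        A plausible reading of that error is that square brackets in a command line are parsed as a weight specification, so a subscripted expression cannot appear in a varlist. The sketch below illustrates the workaround with the shipped auto dataset; the variable names (price, mpg, mpg_prev) are stand-ins, not the poster's variables.

        Code:
        sysuse auto, clear
        * A subscripted expression in the varlist is rejected: [ ... ] is read as a weight
        capture noisily regress price mpg[_n-1]
        display "return code: " _rc
        * Workaround: create the shifted variable first, then use its name in the varlist
        gen mpg_prev = mpg[_n-1]
        regress price mpg_prev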

        • #5
          Originally posted by michael joe
          Thanks, Clyde. It turns out the problem was not due to the lack of observations but rather the fact that some of the x variables I was using in the regressions actually used _n within them (e.g., ocf[_n-1]); I was getting the error message "_n unknown weight type". After creating the variables outside the loop (e.g., gen ocf1=ocf[_n-1]) and putting them back into the loop under the new names, I was able to obtain values. Thanks for the help.
          Just a general word of warning: are you using panel data? In that case you should not use a bare [_n-1] to create lagged values. The problem is that for the first observation of the second panel unit, [_n-1] refers to the last observation of the previous panel unit.

          Either xtset your data and then use the lag operator, e.g. gen ocf1 = L.ocf, or place your generate command after a by: prefix, e.g. by panelunit (timevar): gen ocf1 = ocf[_n-1].
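
          A small sketch of the difference, using a made-up four-row panel (the data values and the names ocf_wrong, ocf_by, and ocf_L are hypothetical):

          Code:
          clear
          input firm year ocf
          1 2000 10
          1 2001 11
          2 2000 20
          2 2001 21
          end
          sort firm year
          gen ocf_wrong = ocf[_n-1]                // ignores firm boundaries
          by firm (year): gen ocf_by = ocf[_n-1]   // previous observation within firm
          xtset firm year
          gen ocf_L = L.ocf                        // value from year-1 within firm
          list, sepby(firm)

          For firm 2 in 2000, ocf_wrong picks up 11 from firm 1, while ocf_by and ocf_L are missing, as they should be. Note that the two correct versions can still differ when a firm has gaps in year: the by: version takes the previous observation, whereas L. takes the value from exactly one period earlier.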

          • #6
            Here is my updated code:

            Code:
            sort firm year
             
            by firm: gen ocf1=ocf[_n-1]
            by firm: gen ocf2=ocf
            by firm: gen ocf3=ocf[_n+1]
             
            gen ismissing = missing(tca, ocf1, ocf2, ocf3, change_in_revenues, ppe)
            bysort industry year: egen sic_year_numerosity = total(ismissing== 0)
             
            gen y_hat=.
            gen y_res=.
            tempvar y_hat y_res
            levelsof industry, local(levels)
            foreach x of local levels {
              foreach z of numlist 2000/2001 {
                xtset firm year
                capture xtreg tca L.ocf ocf F.ocf change_in_revenues ppe if industry==`x' & year==`z' & sic_year_numerosity>9
                if !_rc {
                  predict `y_hat'
                  replace y_hat=`y_hat' if e(sample)
                  predict `y_res', residuals
                  replace y_res=`y_res' if e(sample)
                  drop `y_hat' `y_res'
                }
              }
            }
            Now I am getting the error "insufficient observations" when I take out capture. I have over 4,500 unique firm-year observations and many unique industry-year observations as well. I'm not sure what I am doing wrong, as I am trying to replicate a prior study whose observation count is similar to mine. I basically want to run a cross-sectional regression in which all firms are pooled in the same year within each industry group, for groups that have at least 9 firms in each industry classification. I thought this loop would do that, but perhaps I am wrong.
            Last edited by michael joe; 15 Jul 2016, 14:47.

            • #7
              The problem, I think, is that the if clause on your xtreg does not prevent the xtreg from running; it just restricts the observations used. So in an industry/year where sic_year_numerosity<=9, no observations will be selected when xtreg attempts to run. I've tried my hand at repairing the logic below; no guarantees, though!

              Code:
              sort firm year
               
              by firm (year): gen ocf1=ocf[_n-1]
              by firm (year): gen ocf2=ocf
              by firm (year): gen ocf3=ocf[_n+1]
              gen ismissing = missing(tca, ocf1, ocf2, ocf3, change_in_revenues, ppe)
              
              xtset firm year
               
              gen y_hat=.
              gen y_res=.
              tempvar y_hat y_res
              levelsof industry, local(levels)
              foreach x of local levels {
                foreach z of numlist 2000/2001 {
                  count if industry==`x' & year==`z' & ismissing==0
                  if `r(N)'>9 {
                    capture xtreg tca L.ocf ocf F.ocf change_in_revenues ppe if industry==`x' & year==`z'
                    if !_rc {
                      predict `y_hat'
                      replace y_hat=`y_hat' if e(sample)
                      predict `y_res', residuals
                      replace y_res=`y_res' if e(sample)
                      drop `y_hat' `y_res'
                    }
                  }
                }
              }

              • #8
                Thanks, William. Still not working for me. However, the recommendation of placing the generate command after a by: prefix seems to be working for me. Any idea why this way produces results but the other gives me some observation error?

                • #9
                  However, the recommendation of placing the generate command after a by: prefix seems to be working for me.
                  It would be more helpful were you to copy the code that works (the entire code, not just the changed line) and paste it into a code block.

                  • #10
                    Originally posted by William Lisowski
                    The problem, I think, is that the if clause on your xtreg does not prevent the xtreg from running; it just restricts the observations used. So in an industry/year where sic_year_numerosity<=9, no observations will be selected when xtreg attempts to run. I've tried my hand at repairing the logic below; no guarantees, though!

                    Code:
                    sort firm year
                    gen ismissing = missing(tca, ocf1, ocf2, ocf3, change_in_revenues, ppe)
                    Today I learned you can place multiple variables in a missing() function. Would be nice if "help missing##functions" included a link to help missing()

                    • #11
                      Would be nice if "help missing##functions" included a link to help missing()
                      Agreed! While missing() is mentioned at the start of the help missing file, it would be better if the name were a clickable link to the complete documentation, just like the command names in the list of useful commands further down in the file. The description in help missing hints that missing() can accept more than one argument, but it is possible to read it as meaning something different.
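
                      For anyone who wants to see the multiple-argument behavior directly, here is a tiny sketch on made-up data (the variables a, b, c and anymiss are hypothetical). missing() returns 1 if any of its arguments is missing, and numeric and string arguments can be mixed:

                      Code:
                      clear
                      input a b str1 c
                      1 2 "x"
                      . 2 "y"
                      3 . "z"
                      4 5 ""
                      end
                      * 1 if any of a, b, c is missing in that observation, 0 otherwise
                      gen byte anymiss = missing(a, b, c)
                      list

                      Only the first observation is complete, so anymiss is 0 there and 1 in the other three rows.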
                      Last edited by William Lisowski; 17 Jul 2016, 09:29.
