  • Program not running at all.

    Dear All,

    I posted earlier today but did not get a response. So, probably I did not do a good job posting my question. I will try again.

    I have panel data at the individual (N)-week level, with 14 weeks (waves): 7 before and 7 after an intervention. The 10 percent sample, which is not balanced, looks as follows:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 npi int year float week int userTRA
    "J338339LLR" 2014 4 0
    "J338339LLR" 2014 6 0
    "J338339LLR" 2014 7 0
    "J33833J3J3" 2014 2 0
    "J33833J99R" 2014 2 1
    "J33833JOLJ" 2014 4 0
    "J33833NF9L" 2014 5 0
    "J33833R8F8" 2014 1 0
    "J33833RLFF" 2014 7 0
    "J33833RO8R" 2014 2 0
    "J338383FRV" 2014 7 0
    "J338383R89" 2014 2 0
    "J33838FR9R" 2014 6 0
    "J33838LJ8R" 2014 3 0
    "J33838LVFO" 2014 6 1
    "J33838RFNL" 2014 1 1
    "J338393FOR" 2014 7 0
    "J338398J88" 2014 6 0
    "J3383998JF" 2014 4 0
    "J338399JRF" 2014 5 0
    "J338399N33" 2014 2 1
    "J338399V3R" 2014 7 0
    "J33839F99O" 2014 6 1
    "J33839F99O" 2014 7 0
    "J33839FNL3" 2014 5 0
    "J33839JFRV" 2014 6 0
    "J33839JLRL" 2014 5 0
    "J33839NNOL" 2014 6 0
    "J33839O383" 2014 4 0
    "J33839O8R8" 2014 2 0
    "J33839OR8R" 2014 6 2
    "J3383F33NJ" 2014 2 0
    "J3383F38JN" 2014 4 0
    "J3383F988V" 2014 2 0
    "J3383FN3VR" 2014 2 1
    "J3383FNFNL" 2014 1 0
    "J3383FNFNL" 2014 5 0
    "J3383FR8L9" 2014 2 0
    "J3383FROOF" 2014 2 0
    "J3383FVRVO" 2014 5 0
    "J3383J3983" 2014 1 0
    "J3383J3JV8" 2014 3 1
    "J3383J88FO" 2014 3 0
    "J3383J8RJV" 2014 3 0
    "J3383J8RJV" 2014 4 0
    "J3383J8VFV" 2014 5 0
    "J3383JONVF" 2014 5 0
    "J3383JRLJ8" 2014 2 1
    "J3383JRLJ8" 2014 7 1
    "J3383L3VJV" 2014 7 0
    "J3383L88NV" 2014 5 0
    "J3383LF888" 2014 1 0
    "J3383LFR3J" 2014 7 0
    "J3383LJJFO" 2014 2 0
    "J3383LL9RN" 2014 6 1
    "J3383LLN8N" 2014 5 0
    "J3383LLVFJ" 2014 1 0
    "J3383LLVFJ" 2014 5 0
    "J3383LRVO8" 2014 2 0
    "J3383LVFOR" 2014 7 0
    "J3383LVN93" 2014 3 0
    "J3383N83R8" 2014 5 0
    "J3383N888L" 2014 2 0
    "J3383N9LFJ" 2014 5 0
    "J3383NL93R" 2014 2 0
    "J3383NLV8O" 2014 7 1
    "J3383NNJFV" 2014 1 0
    "J3383NNJFV" 2014 6 1
    "J3383NO3RF" 2014 6 0
    "J3383NVJNJ" 2014 4 0
    "J3383O3LVV" 2014 7 0
    "J3383OFJJL" 2014 5 0
    "J3383ON8LO" 2014 3 1
    "J3383OOO9F" 2014 1 2
    "J3383OORLN" 2014 1 0
    "J3383ORLLO" 2014 2 0
    "J3383OVN8F" 2014 4 0
    "J3383OVRF3" 2014 6 0
    "J3383R3F9N" 2014 2 0
    "J3383RF9O9" 2014 6 0
    "J3383RF9O9" 2014 7 0
    "J3383RNV9R" 2014 1 0
    "J3383ROLLV" 2014 6 0
    "J3383V3OOO" 2014 4 0
    "J3383V3OOO" 2014 7 0
    "J3383V83FL" 2014 6 0
    "J3383V83N9" 2014 7 0
    "J3383V9J89" 2014 3 0
    "J3383VL398" 2014 1 1
    "J3383VL398" 2014 2 1
    "J338F8FJFV" 2014 4 0
    "J338FLNVF3" 2014 5 0
    "J338FR9RLL" 2014 3 0
    "J338FRR3LF" 2014 6 0
    "J338FRV38F" 2014 5 0
    "J338J33FOV" 2014 3 0
    "J338J33RNO" 2014 5 1
    "J338J38JR3" 2014 5 1
    "J338J398ON" 2014 1 1
    "J338J39OV3" 2014 5 0
    end
    I want to run a program to calculate the mean square prediction error for panels of varying lengths.
    For prediction, I want to iteratively leave out one individual each time (drop all waves of this one individual) and then use the estimates from the remaining sample to predict outcome for the individual who was left out. I repeat this one-by-one for each individual in the panel.
    Then I add up the prediction errors for all individuals and store the result in a matrix. Next, I want to repeat this exercise for different panel lengths. So, I redo the iterative exercise, leaving out one individual each time and calculating the prediction error for that individual using the estimates from the remaining sample, for pre-intervention panels of 7, 6, 5, 4, 3 and 2 weeks.
    The idea being that I want to optimize the panel length by minimizing the MSPE for the 7 week period prior to the intervention. The program is as follows:

    Code:
    /*NOTES: cllr_crossval
    The goal is to estimate the bandwidth that minimizes the IMSE of a local linear regression.
    A grid search is used and estimation is based on the cllr program described above.
    
    Arguments
      outcome: a stata variable containing the dependent variable
            x: a stata variable containing the independent variable
        start: a hardcoded number or local variable defining start of a sequence candidate bandwidths
         step: a hardcoded number or local variable defining the stepsize of the sequence of candidate bandwidth
         stop: a hardcoded number or local variable defining the end of a sequence of candidate bandwidths.
          sub: a stata variable set to 1 if the observation should be included in the analysis
    
    Returns
      A stata matrix and set of stata variables that contain the estimated IMSE for each candidate bandwidth.
    
    */
    
    sort npi
    gen N=_n if npi[_n]~=npi[_n-1]
    bysort npi: egen maxN=max(N)
    replace N=maxN if N==.
    bysort N week: gen counter=_n
    drop if counter>1
    xtset N week
    
    gen outcome = userTRA
    gen x = week
    
    capture program drop cllr_crossval
    program define cllr_crossval
     set more off
     args outcome x start step stop sub narrowsub
     tempvar cx ew e2 e2n
    
     local stop = 7
     local start = 1
     local step = 1
     *make a matrix to store the estimated IMSE
     local size = ((`stop' - `start')/`step')+1
     matrix M = J(`size', 3, .)
     
     
     *Iterate over candidate bandwidths
     local count = 0
     forvalues h = `start'(`step')`stop'{
     
      *increment counter
      local count = `count' + 1
      
      *store location on the bandwidth grid
      matrix M[`count', 1] = `h'
      
      *initialize the residual variable
      gen `e2' = .
      
      
      *Iterate over observations
      forvalues i = 1(1)`N'{
      capture quietly reghdfe /*regress*/ `outcome' `x' if  _n~=`i' & week=<`h', absorb(npi)
        replace `e2' = (`outcome' - _b[_cons])^2 in `i'
         }
      
      *compute IMSE for the candidate bandwidth
      su `e2'
      matrix M[`count',2] = r(mean)
    
        
      drop `e2'
     }
    
     matrix list M
     svmat M
    end
    But it's not running; it does nothing. I would greatly appreciate some help.

    Sincerely,
    Sumedha.

  • #2
    I see your code defines the program -cllr_crossval- but nowhere calls it. That would certainly produce no results <grin>. Presuming that you did actually call the program: answering your question is very hard without seeing exactly what you typed, particularly your call to -cllr_crossval- and its arguments, followed by Stata's response. I'd also recommend you start debugging by inserting "display" commands into your program and seeing which parts of it, if any, get executed. That's what I'd do in your situation.



    • #3
      Thank you, Dr. Lacy, for your response. Here is what I get when I try to run it:

      Code:
      . sort npi
      . gen N=_n if npi[_n]~=npi[_n-1]
      (6,945 missing values generated)
      . bysort npi: egen maxN=max(N)
      . replace N=maxN if N==.
      (6,945 real changes made)
      . bysort N week: gen counter=_n
      . drop if counter>1
      (141 observations deleted)
      . xtset N week
      panel variable: N (unbalanced)
      time variable: week, 1 to 7, but with gaps
      delta: 1 unit
      .
      . gen outcome = userTRA
      . gen x = week
      .
      . capture program drop cllr_crossval
      . program define cllr_crossval
      1. set more off
      2. args outcome x start step stop sub narrowsub
      3. tempvar cx ew e2 e2n
      4.
      . local stop = 7
      5. local start = 2
      6. local step = 1
      7. *make a matrix to store the estimated IMSE
      . local size = ((`stop' - `start')/`step')+1
      8. matrix M = J(`size', 3, .)
      9.
      .
      . *Iterate over candidate bandwidths
      . local count = 0
      10. forvalues h = `start'(`step')`stop'{
      11. display "h"
      12. *increment counter
      . local count = `count' + 1
      13. display `count'
      14. *store location on the bandwidth grid
      . matrix M[`count', 1] = `h'
      15.
      . *initialize the residual variable
      . gen `e2' = .
      16.
      .
      . *Iterate over observations
      . forvalues i = 1(1)`N'{
      17. capture quietly reghdfe /*regress*/ `outcome' `x' if _n~=`i' & week
      > =<`h', absorb(npi)
      18. replace `e2' = (`outcome' - _b[_cons])^2 in `i'
      19. }
      20.
      . *compute IMSE for the candidate bandwidth
      . su `e2'
      21. matrix M[`count',2] = r(mean)
      22.
      .
      . drop `e2'
      23. }
      24.
      . matrix list M
      25. svmat M
      26. end
      .
      . cllr_crossval
      h
      1
      (50,810 missing values generated)
      invalid syntax
      r(198);
      end of do-file
      r(198);

      I will be grateful for any help you may be able to offer.
      Sincerely,
      Sumedha.



      • #4
        Local macro N is never defined.



        • #5
          Thank you for your response. Sorry for the dumb question, but N is defined at the top of the code, outside the program. How should I define it again as a local within the program?



          • #6
            Correct but immaterial. A reference to `N' can only be to a local macro, and in this context that can only be a local macro defined earlier in the same program. That is what local means.

            There is no connection between a local macro N and a variable N unless you make such a connection.
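To make the distinction concrete, here is a minimal sketch, assuming a throwaway five-observation dataset (the data are hypothetical and only serve the illustration). It shows that creating a variable called N does not define a local macro called N, and that `N' stays empty until a local is defined explicitly.

```stata
* Hypothetical toy data: five observations, nothing from the thread
clear
set obs 5

* A variable named N; this does NOT create a local macro named N
gen N = _n

* Local macro N is still undefined here, so `N' expands to nothing
display "[`N']"

* Make the connection explicitly: store the number of observations in local N
local N = _N
display "[`N']"
```

Run as a do-file, the first display shows empty brackets and the second shows the observation count. An empty `N' is also why a loop header written as forvalues i = 1(1)`N' fails with a syntax error.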

            Worse, in your case N is, or appears to be, a variable taking on different values, so it's hard to see what correspondence you have in mind.

            I am only reading your program looking for bugs. I have no clear idea what your program is intended to do. Perhaps you are copying someone else's program and don't fully understand it either. We've all been there, or many of us.

            Better news: if and only if you mean the number of observations, then the code should be

            Code:
            forval i = 1/`=_N'
            Looking ahead I see another problem as

            Code:
             week =<
            makes no sense: =< is not a valid operator in Stata. I guess you mean
            Code:
            week >=
            Note that it's dangerous to use capture while debugging, as Stata will eat the error and then give you a puzzling error message later because something else did not work, or perhaps no results at all.
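One way to keep capture without losing the error is to inspect the return code straight away. This is only a sketch, assuming the auto dataset shipped with Stata and a deliberately nonexistent variable named nosuchvar (both are stand-ins, not part of the original program).

```stata
* Stand-in data shipped with Stata
sysuse auto, clear

* capture noisily still shows the error message but lets the do-file continue;
* nosuchvar is a deliberately nonexistent variable used to force a failure
capture noisily regress price nosuchvar

* _rc is nonzero whenever the captured command failed
if _rc {
    display as error "regression failed with return code " _rc
}
```

While debugging, dropping capture altogether is simpler still, since Stata then stops at the first command that fails.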


            Note:

            Earlier you had


            Code:
            display "h"
            and indeed h was happily displayed. That didn't bite you but it seems to suggest that you are fairly fuzzy on local macros. I guess what was intended was

            Code:
            display "`h'"
            Last edited by Nick Cox; 25 Oct 2019, 08:26.



            • #7
              Prof. Cox,

              Everything you have said is accurate, including my being *fairly fuzzy on local macros* (an understatement).

              But, I could really use your help and will be very grateful for the same. So, what I am trying to do is as follows:

              1) I have a panel of physicians, each observed for up to 7 waves, and N is the unique physician identifier.
              2) For varying panel lengths `h' (only 2 waves, 3 waves, 4 waves, ..., all 7 waves), I want to calculate the mean square prediction error. This is the first loop.
              3) The second loop (inside the first) drops one physician at a time (one N each time), estimates the model on the remaining N-1 physicians for `h' waves, uses those estimates to predict the outcome for the physician who was left out, and stores the squared prediction error in e2. The MSPE for a given panel length (2, 3, ..., 7) is then the mean of these prediction errors across all physicians.
              4) Matrix M finally lists each panel length `h' and the corresponding MSPE.
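Steps 1) to 4) can be sketched roughly as follows. This is a hedged illustration, not the program from the thread. It assumes simulated data (20 hypothetical physicians with a numeric id pid, 7 weeks each), plain regress in place of reghdfe (a fixed effect absorbed for a held-out physician cannot be estimated from the remaining sample), and that a panel of length `h' means weeks 1 to `h'. The thread leaves that last choice open.

```stata
* Simulated stand-in data: 20 physicians, 7 weeks each (all hypothetical)
clear
set seed 12345
set obs 20
gen long pid = _n
expand 7
bysort pid: gen week = _n
gen outcome = runiform() < 0.2
gen x = week

* Number of physicians, taken from the id variable rather than a local N
quietly summarize pid, meanonly
local G = r(max)

* One row per candidate panel length: column 1 = h, column 2 = MSPE
matrix M = J(6, 2, .)
local row = 0
forvalues h = 2/7 {
    local ++row
    quietly gen double e2 = .
    forvalues g = 1/`G' {
        * Fit on everyone except physician g, using weeks 1 to h (assumed window)
        capture regress outcome x if pid != `g' & week <= `h'
        if _rc continue
        * Predict for the held-out physician and store the squared error
        quietly predict double yhat if pid == `g' & week <= `h', xb
        quietly replace e2 = (outcome - yhat)^2 if pid == `g' & week <= `h'
        drop yhat
    }
    quietly summarize e2, meanonly
    matrix M[`row', 1] = `h'
    matrix M[`row', 2] = r(mean)
    drop e2
}
matrix list M
```

On the real data, a consecutive numeric id could be built with egen pid = group(npi). With several thousand physicians the inner loop runs one regression per physician per panel length, so it may be slow.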

              I am clearly not doing what I intend to do, and my Stata programming skills are not helping. I will be grateful for your help.
              Sincerely,
              S.



              • #8
                Sorry, no. There is considerable caprice over what I do here. Sometimes if I understand quickly what's involved because I have done it before, more or less, then writing a few dozen lines is much less generous than it may look. Other times if I don't understand the problem fully then what I am being asked goes far beyond what I am able to do, let alone willing. This lies squarely in that second group.

                There is some context that you don't have to give. If you're an independent researcher, you're out of your depth unless you can find a collaborator. If you're a student at some level, then look for support in your own institution.

                More positively, I (think I) identified two bugs. So, what you should be telling people is what works or doesn't work after you have fixed them.
                Last edited by Nick Cox; 25 Oct 2019, 10:34.



                • #9
                  Sorry, it seems I unintentionally offended you. I was simply responding to your earlier comment, "I have no clear idea what your program is intended to do." I mistook that statement to mean that you wanted to have a better idea. N does not refer to the number of observations but to the individual identifier in the panel. You did help me by identifying some bugs. I am grateful for that and have fixed them as well. Thank you.



                  • #10
                    Really, no offence inferred (or intended). Sufficient for me to see reghdfe (a command I have never used) at the core of this to think that the problem is far from my usual territory.



                    • #11
                      No problem. Thank you.



                      • #12
                        If interested, please follow https://www.statalist.org/forums/for...me-not-running



                        • #13
                          Thank you Prof. Cox!
