
  • Run simple regression over a subsample by "in"

    Hello everyone, I am new to Stata and would like your help. I have a sample of size 249, and I would like to run a regression over the first 50 observations. The only way I know is to create a sequence variable and use "if", which works (the commented-out line marked "it works" below). I was wondering whether I can use "in" instead (the line marked "it does not work")?
    When I run the following code, I get the error message "'50-start_T+1' invalid observation number".


    clear all

    import excel "data/xyz.xlsx", firstrow case(lower) clear

    egen t1 = seq()

    tsset t1
    sca n1 = 65
    sca start_T = 50

    gen b_cons = .
    gen b_x1 = .
    gen b_x2 = .
    gen y_f = .

    forvalues i = `=start_T'/`=_N-n1' {
    //reg y x1 x2 if t1<= `i' & t1> `i'-start_T // it works
    reg y x1 x2 in `i'-start_T+1/`i' // it does not work; error
    predict temp in `i+n1'
    replace y_f = temp in `i+n1'
    drop temp
    }



    I would like to know how to do it because "if" does not extend to more advanced models, but I hope "in" can.


  • #2
    John:
    I think you can safely use -in- as you surmise:
    Code:
    . sysuse auto.dta
    (1978 Automobile Data)
    
    . regress price mpg in 1/24
    
          Source |       SS           df       MS      Number of obs   =        24
    -------------+----------------------------------   F(1, 22)        =      6.90
           Model |    64504736         1    64504736   Prob > F        =    0.0154
        Residual |   205717267        22  9350784.84   R-squared       =    0.2387
    -------------+----------------------------------   Adj R-squared   =    0.2041
           Total |   270222003        23  11748782.7   Root MSE        =    3057.9
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -364.3311   138.7153    -2.63   0.015    -652.0092   -76.65311
           _cons |   13521.27    2871.86     4.71   0.000     7565.402    19477.15
    ------------------------------------------------------------------------------
    
    .
    If you want to collect coefficients and the like, take a look at -statsby-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      asreg makes quick work of these sorts of problems, storing the coefficients/etc for each round.

      Code:
      search asreg



      • #4
        The problem with #1 is the problem with your previous thread: you want scalar calculations on the fly, which is possible but with different syntax. I can't test your code -- your spreadsheet isn't visible to me -- but I venture this rewriting, which is just one way to do it:


        Code:
        clear all
        
        import excel "data/xyz.xlsx", firstrow case(lower) clear
        
        gen t1 = _n
        // egen t1 = seq()
        
        tsset t1
        sca n1 = 65
        sca start_T = 50
        
        gen b_cons = .
        gen b_x1 = .
        gen b_x2 = .
        gen y_f = .
        
        forvalues i = `=start_T'/`=_N-n1' {
        
            local j = `i' - start_T + 1
            reg y x1 x2 in `j'/`i'
        
            local k = `i' + n1
            predict temp in `k'
            replace y_f = temp in `k'
            drop temp
        
        }
        Last edited by Nick Cox; 15 Oct 2021, 09:30.
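
        As a possible alternative to the intermediate locals above, here is a minimal sketch of evaluating the window bounds inline. It assumes the auto data shipped with Stata as a stand-in for the spreadsheet, so the variables and the value of i are illustrative only. The idea is that "in" expects literal observation numbers, so an expression such as `i'-start_T+1 reaches the command unevaluated, whereas the `=exp' form of macro expansion evaluates it first.

        ```stata
        * Illustrative only: auto.dta stands in for the poster's spreadsheet
        sysuse auto, clear

        * window length, as in #1
        scalar start_T = 50

        * one value of the loop index (hypothetical, for demonstration)
        local i = 60

        * `=exp' is evaluated before -regress- parses the -in- range
        regress price mpg in `=`i' - start_T + 1'/`i'
        ```

        With i = 60 and start_T = 50 the range expands to 11/60, that is, 50 observations. The same device should carry over to the prediction step, for example predict temp in `=`i' + n1'.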



        • #5
          Thank you everyone for help!! Nick, would you mind testing the code with the xlsx file I just attached and provide a modification on my previous code? Thank you!!


          • #6
            John:
            nobody on this list will ever open spreadsheets coming from unknown sources due to the risk of active contents.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              To paraphrase Carlo Lazzaro slightly, we ask that you don't post spreadsheets. See https://www.statalist.org/forums/help and especially #12. All posters are asked to read this before posting.

              I should apologise in this sense: it may have appeared that I was asking to see the spreadsheet, but I wasn't. My comment was intended as a little wry, along the lines of "You're using a spreadsheet on your machine which naturally I can't see; hence I can't be confident that this is exactly right."

              In any case, I have already offered a modification of your previous code!
