  • running a recurring regression model across numerous observations

    Hello,

    I am seeking advice on the best syntax to accomplish a series of regression analyses. My apologies, but I am new to Stata and have never run this type of analysis before. I cannot seem to determine how to apply what I think would be a loop command.

    Below, I have inserted an excerpt of 100 observations from an approximately 10,000-observation dataset in which this analysis would be implemented.

    The specific steps I am trying to accomplish are:

    1) I wish to serially run model iterations using logistic on a designated set of observations. The variable obs identifies the observation numbers. The series of models would progressively use one additional observation at a time.

    2) I wish to calculate a predicted probability of the outcome associated with each model and save that value as the variable yprob on the observation after the last observation used in the model iteration.

    3) I wish to then run a new model in which the next observation (the one on which the prior predicted value is listed) is added to those used in the model and a new predicted value is designated for yprob on the next observation.

    4) This process would continue for each subsequent observation up to a selected point in the data.

    So, if we ran models beginning with iteration 1 using observations 1 – 50, we would save the yprob value on the observation corresponding to obs = 51. Model iteration 2 would be run on observations 1 – 51 and we would save its yprob value on the observation corresponding to obs = 52, etc., to a designated stopping point or until running out of observations.

    If I were to employ only obs 1 – 52 in the process, then the following crude approach works:

    Code:
    gen yprob = .
    label var yprob "predicted probability"

    * iteration 1: fit on obs 1-50, store the prediction on obs 51
    logistic y x1 ib4.x2 ib3.x3 i.x4 if obs<=50, nolog
    predict tempval if obs==51
    replace yprob = tempval if obs==51
    drop tempval

    * iteration 2: fit on obs 1-51, store the prediction on obs 52
    logistic y x1 ib4.x2 ib3.x3 i.x4 if obs<=51, nolog
    predict tempval if obs==52
    replace yprob = tempval if obs==52
    drop tempval
    Of course, this approach would not be feasible for thousands of observations. I have attempted to use forvalues but I am frankly uncertain how to structure the syntax - what I am doing doesn't seem to match the options. I also saw the command rangestat but I understand it will not run logistic regression.

    Any advice on best syntax will be greatly appreciated, thank you.

    Code:
    obs    y    x1    x2    x3    x4
    1    0    0    3    3    3
    2    1    1    2    1    2
    3    1    0    2    2    3
    4    0    1    3    2    2
    5    1    0    1    4    3
    6    0    1    4    1    2
    7    0    0    2    4    4
    8    1    1    3    4    1
    9    0    0    2    2    1
    10    1    1    2    3    4
    11    0    0    1    3    2
    12    1    1    3    2    3
    13    1    0    3    2    4
    14    0    1    1    2    1
    15    1    0    1    4    4
    16    0    1    1    4    1
    17    1    0    2    3    2
    18    0    1    1    3    3
    19    1    0    3    1    2
    20    0    1    2    2    3
    21    0    0    3    1    3
    22    1    1    4    1    2
    23    0    0    4    2    4
    24    1    1    3    1    1
    25    0    0    1    4    3
    26    1    1    4    1    2
    27    1    0    1    3    2
    28    0    1    4    3    3
    29    1    0    4    4    2
    30    0    1    4    4    3
    31    1    0    2    3    2
    32    0    1    1    3    3
    33    0    0    3    3    3
    34    1    1    2    1    2
    35    0    0    2    2    3
    36    1    1    3    2    2
    37    0    0    3    4    2
    38    1    1    4    1    3
    39    0    0    2    2    1
    40    1    1    2    3    4
    41    0    0    1    3    2
    42    1    1    3    2    3
    43    0    0    3    2    4
    44    1    1    1    2    1
    45    0    0    1    4    4
    46    1    1    1    4    1
    47    0    0    3    1    2
    48    1    1    2    2    3
    49    0    0    3    1    3
    50    1    1    4    1    2
    51    0    0    1    4    2
    52    1    1    4    2    3
    53    1    0    2    4    4
    54    0    1    4    1    1
    55    1    0    1    3    2
    56    0    1    4    3    3
    57    0    0    4    4    2
    58    1    1    4    4    3
    59    1    0    3    4    2
    60    0    1    4    1    3
    61    0    0    3    3    3
    62    1    1    1    4    2
    63    0    0    2    3    3
    64    1    1    2    2    2
    65    0    0    1    3    3
    66    1    1    1    3    2
    67    0    0    1    2    1
    68    1    1    3    2    4
    69    0    0    1    4    3
    70    1    1    4    4    2
    71    1    0    1    3    2
    72    0    1    2    3    3
    73    1    0    3    1    4
    74    0    1    4    1    1
    75    1    0    3    1    1
    76    0    1    2    2    4
    77    0    0    1    4    2
    78    1    1    4    2    3
    79    1    0    2    4    4
    80    0    1    4    1    1
    81    0    0    3    2    1
    82    1    1    2    2    4
    83    0    0    3    1    1
    84    1    1    3    4    4
    85    0    0    1    3    2
    86    1    1    2    3    3
    87    0    0    3    1    1
    88    1    1    2    2    4
    89    0    0    3    3    3
    90    1    1    1    4    2
    91    1    0    3    2    2
    92    0    1    4    3    3
    93    1    0    1    4    1
    94    0    1    4    4    4
    95    1    0    3    1    1
    96    0    1    4    1    4
    97    0    0    2    3    3
    98    1    1    2    2    2
    99    1    0    1    3    3
    100    0    1    1    3    2

  • #2
    This is a job for -rangerun-.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(obs y x1 x2 x3 x4)
      1 0 0 3 3 3
      2 1 1 2 1 2
      3 1 0 2 2 3
      4 0 1 3 2 2
      5 1 0 1 4 3
      6 0 1 4 1 2
      7 0 0 2 4 4
      8 1 1 3 4 1
      9 0 0 2 2 1
     10 1 1 2 3 4
     11 0 0 1 3 2
     12 1 1 3 2 3
     13 1 0 3 2 4
     14 0 1 1 2 1
     15 1 0 1 4 4
     16 0 1 1 4 1
     17 1 0 2 3 2
     18 0 1 1 3 3
     19 1 0 3 1 2
     20 0 1 2 2 3
     21 0 0 3 1 3
     22 1 1 4 1 2
     23 0 0 4 2 4
     24 1 1 3 1 1
     25 0 0 1 4 3
     26 1 1 4 1 2
     27 1 0 1 3 2
     28 0 1 4 3 3
     29 1 0 4 4 2
     30 0 1 4 4 3
     31 1 0 2 3 2
     32 0 1 1 3 3
     33 0 0 3 3 3
     34 1 1 2 1 2
     35 0 0 2 2 3
     36 1 1 3 2 2
     37 0 0 3 4 2
     38 1 1 4 1 3
     39 0 0 2 2 1
     40 1 1 2 3 4
     41 0 0 1 3 2
     42 1 1 3 2 3
     43 0 0 3 2 4
     44 1 1 1 2 1
     45 0 0 1 4 4
     46 1 1 1 4 1
     47 0 0 3 1 2
     48 1 1 2 2 3
     49 0 0 3 1 3
     50 1 1 4 1 2
     51 0 0 1 4 2
     52 1 1 4 2 3
     53 1 0 2 4 4
     54 0 1 4 1 1
     55 1 0 1 3 2
     56 0 1 4 3 3
     57 0 0 4 4 2
     58 1 1 4 4 3
     59 1 0 3 4 2
     60 0 1 4 1 3
     61 0 0 3 3 3
     62 1 1 1 4 2
     63 0 0 2 3 3
     64 1 1 2 2 2
     65 0 0 1 3 3
     66 1 1 1 3 2
     67 0 0 1 2 1
     68 1 1 3 2 4
     69 0 0 1 4 3
     70 1 1 4 4 2
     71 1 0 1 3 2
     72 0 1 2 3 3
     73 1 0 3 1 4
     74 0 1 4 1 1
     75 1 0 3 1 1
     76 0 1 2 2 4
     77 0 0 1 4 2
     78 1 1 4 2 3
     79 1 0 2 4 4
     80 0 1 4 1 1
     81 0 0 3 2 1
     82 1 1 2 2 4
     83 0 0 3 1 1
     84 1 1 3 4 4
     85 0 0 1 3 2
     86 1 1 2 3 3
     87 0 0 3 1 1
     88 1 1 2 2 4
     89 0 0 3 3 3
     90 1 1 1 4 2
     91 1 0 3 2 2
     92 0 1 4 3 3
     93 1 0 1 4 1
     94 0 1 4 4 4
     95 1 0 3 1 1
     96 0 1 4 1 4
     97 0 0 2 3 3
     98 1 1 2 2 2
     99 1 0 1 3 3
    100 0 1 1 3 2
    end
    
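    capture program drop one_logistic
    program define one_logistic
        * run once per window; relies on the current observation being last:
        * fit on all but the last observation, then predict for the last one
        logistic y x1 ib4.x2 ib3.x3 i.x4 in 1/-2
        predict yprob in L
        exit
    end
    
    * obs < 51: first > last makes the interval empty, so these are meant to be skipped
    * obs >= 51: the window runs from obs 1 through the current observation
    gen first = cond(obs < 51, `=_N', 1)
    gen last = cond(obs < 51, 1, obs)
    rangerun one_logistic, interval(obs first last)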
    -rangerun- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, which, as I imagine you already know, is also available at SSC.
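
    For comparison, the same expanding-window logic can also be written as an explicit loop, which is what #1 was attempting with forvalues. The following is only a minimal sketch, not part of the -rangerun- solution above. It assumes obs runs 1, 2, ..., N with no gaps, and the local macros start, stop, and next are hypothetical names introduced for illustration (start = 50 matches the example in #1).

    ```stata
    * Expanding-window logistic: fit on obs 1..k, predict for obs k+1.
    * Assumption: obs takes the values 1, 2, ..., N with no gaps.
    gen double yprob = .
    label var yprob "predicted probability"

    local start = 50                 // hypothetical: size of the first estimation window
    quietly summarize obs, meanonly
    local stop = r(max) - 1          // the last window leaves one observation to predict

    forvalues k = `start'/`stop' {
        local next = `k' + 1
        * fit on the first k observations; capture keeps the loop alive if a fit fails
        capture quietly logistic y x1 ib4.x2 ib3.x3 i.x4 if obs <= `k'
        if _rc continue              // skip this window, leaving yprob missing
        tempvar p
        quietly predict double `p' if obs == `next', pr
        quietly replace yprob = `p' if obs == `next'
        drop `p'
    }
    ```

    To stop earlier than the end of the data, set stop to the last window you want (for example, local stop = 51 reproduces the two hand-written iterations in #1).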

    • #3
      Thank you for the rapid response! I will apply it to the total dataset and see how it goes. Bert
