Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem with running a loop through the observations of a variable

    Hi,

    I have a dataset in which I have the share price of different companies for every day for 10 years. I would like to create a variable that uses the share prices for every month and I am trying to do so with a loop. To do so, I created a new variable (date_01) with the end of month dates and I am trying to use this code below:

    "levelsof date_01, local(lista_1)
    foreach l of local lista_1 {
    if date <= `l'[_n] & date > `l'[_n-1] {
    whatever ...
    }
    }"
    Where date is a variable containing the dates for all the daily observation. The dates are in this format 19960131. This code does not work because it says that the first observation in the date_01 variable, which is 19960131, is an invalid name.

    Could you please give me some suggestions on how to fix the problem please?

    Thank you.



  • #2
    Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data. The Statalist FAQ provides advice on effectively posing your questions, posting example data with the assistance of the dataex command, and sharing Stata output.

    Comment


    • #3
      Thank you for your reply.
      This is the exact code I would like to work:
      Code:
      levelsof date_01, local(lista_1)
      foreach l of local lista_1 {
               if Date <= `l'[_n] & Date > `l'[_n-1] {
                   mkspline strike_p_`l' = strike_price , cubic  nknots(3)
                   quietly regress impl_volatility strike_p_`l'*, noconstant
                   predict y_hat_`l'
                   drop strike_p_`l'1 strike_p_`l'2 
             }
               }
      and this is the example of the data sample:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long Date float(strike_price_01 impl_volatility_01 date_01)
      19960104         0       . 19960132
      19960110  310.4686 .369596 19960228
      19960110         0       . 19960332
      19960115  426.3265 .260744 19960430
      19960115  59.92363 .190384        .
      19960205  310.4686  .37235        .
      19960205 272.24646 .170089        .
      19960205  53.83344 .102904        .
      19960213         0       .        .
      19960213         0       .        .
      19960213  33.57839  .18374        .
      19960301 363.28085 .128773        .
      19960310         0       .        .
      19960325         0       .        .
      19960225  424.2615 .211032        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
             .         .       .        .
      end
      The code that I am using for the loop says that 19960131 is an invalid name so not sure what to do. Could you please help me?

      Thank you a lot in advance.

      Comment


      • #4
        When you are writing a loop, you are generalizing a piece of code. Therefore, you need to make sure that the code runs. So you should back up a few steps and consider what you want to do. For example, the following does not make sense

        if Date <= `l'[_n] & Date > `l'[_n-1] {
        `l' is a level of your variable "Date" and not a variable itself.

        Code:
        sysuse auto, clear
        list make in 1/5
        di make[3]
        Res.:

        Code:
        . list make in 1/5
        
             +---------------+
             | make          |
             |---------------|
          1. | AMC Concord   |
          2. | AMC Pacer     |
          3. | AMC Spirit    |
          4. | Buick Century |
          5. | Buick Electra |
             +---------------+
        
        . di make[3]
        AMC Spirit
        Based on your data, explain what you want to do and better still, create a "wanted" variable with the values that you expect.

        Comment


        • #5
          Hi,
          Thank you for your message.

          I would like to create a new variable, which in the code is called y_hat_`l', but I would like this variable to be created only for the financial observations that belong to a certain month (for example January). For this reason, I created a new date variable (date_01) containing tje end of month dates. My idea was that, if a financial observation is included between the two end of month dates, it would be used to create the y_hat variable for that specific month. However, I am not able to turn the level of the variable date_01 into a nth observation. Do you have any idea on how I coudl do this?

          Thank you for your help.

          Comment


          • #6
            You can create a month variable to identify months. You need to transform your date variable to one that is recognized by Stata. Then create an indicator for last available day of the month (=1 if the observation is the last recorded day of the month, 0 otherwise)

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long Date float(strike_price_01 impl_volatility_01 date_01)
            19960104         0       . 19960132
            19960110  310.4686 .369596 19960228
            19960110         0       . 19960332
            19960115  426.3265 .260744 19960430
            19960115  59.92363 .190384        .
            19960205  310.4686  .37235        .
            19960205 272.24646 .170089        .
            19960205  53.83344 .102904        .
            19960213         0       .        .
            19960213         0       .        .
            19960213  33.57839  .18374        .
            19960301 363.28085 .128773        .
            19960310         0       .        .
            19960325         0       .        .
            19960225  424.2615 .211032        .
            end
            
            gen year= int(Date/10000)
            gen month=  int(Date/100) - (year*100)
            gen day= Date-int(Date/100)*100
            gen date= mdy(month, day, year), after(Date)
            format date %td
            bys year month (day): gen lastdayofmonth=_n==_N
            bys year month day (lastdayofmonth): replace lastdayofmonth= lastdayofmonth[_N]
            Res.

            Code:
            . l date- lastdayofmonth , sepby(year month)
            
                 +----------------------------------------------------------------------------+
                 |      date   strik~01   impl_~01    date_01   year   month   day   lastda~h |
                 |----------------------------------------------------------------------------|
              1. | 04jan1996          0          .   2.00e+07   1996       1     4          0 |
              2. | 10jan1996          0          .   2.00e+07   1996       1    10          0 |
              3. | 10jan1996   310.4686    .369596   2.00e+07   1996       1    10          0 |
              4. | 15jan1996   59.92363    .190384          .   1996       1    15          1 |
              5. | 15jan1996   426.3265    .260744   2.00e+07   1996       1    15          1 |
                 |----------------------------------------------------------------------------|
              6. | 05feb1996   310.4686     .37235          .   1996       2     5          0 |
              7. | 05feb1996   53.83344    .102904          .   1996       2     5          0 |
              8. | 05feb1996   272.2465    .170089          .   1996       2     5          0 |
              9. | 13feb1996   33.57839     .18374          .   1996       2    13          0 |
             10. | 13feb1996          0          .          .   1996       2    13          0 |
             11. | 13feb1996          0          .          .   1996       2    13          0 |
             12. | 25feb1996   424.2615    .211032          .   1996       2    25          1 |
                 |----------------------------------------------------------------------------|
             13. | 01mar1996   363.2809    .128773          .   1996       3     1          0 |
             14. | 10mar1996          0          .          .   1996       3    10          0 |
             15. | 25mar1996          0          .          .   1996       3    25          1 |
                 +----------------------------------------------------------------------------+

            However, I am not able to turn the level of the variable date_01 into a nth observation. Do you have any idea on how I coudl do this?
            I am still not clear on this. Show me an example (by hand) of one calculation that you need to do.
            Last edited by Andrew Musau; 16 May 2021, 09:38.

            Comment


            • #7
              The first problem you seem to have is vast numbers of observation with no data. You need to get rid of them. Realistically, you only want observations where Date is not missing.

              The second problem you have is that your variable Data is not a Stata Internal Format date variable suitable for use in Stata calculations.

              Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

              All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

              Because your Date variable is just one long number, the easiest way to convert it to a Stata Internal Format date is using the community contributed numdate package available from SSC. I use it in the example code below. Run the following two commands one time to install it and review the documentation.
              Code:
              ssc install numdate
              help numdate
              The fourth problem is that your data is not sorted by date.

              This code should start you in a useful direction.
              Code:
              drop if missing(Date)
              drop date_01
              numdate daily date = Date, pattern("YMD")
              format date %td
              sort date
              
              generate month = mofd(date)
              format month %tm
              order date month, after(Date)
              levelsof month, local(months)
              generate monthlymax = .
              // I don't understand what you're doing but this is similar
              foreach m of local months {
                  egen temp = max(strike_price_01) if month==`m'
                  replace monthlymax = temp if month==`m'
                  drop temp
              }
              
              list, sepby(month) abbreviate(20)
              Code:
              . list, sepby(month) abbreviate(20)
              
                   +-----------------------------------------------------------------------------------+
                   |     Date        date    month   strike_price_01   impl_volatility_01   monthlymax |
                   |-----------------------------------------------------------------------------------|
                1. | 19960104   04jan1996   1996m1                 0                    .     426.3265 |
                2. | 19960110   10jan1996   1996m1          310.4686              .369596     426.3265 |
                3. | 19960110   10jan1996   1996m1                 0                    .     426.3265 |
                4. | 19960115   15jan1996   1996m1          59.92363              .190384     426.3265 |
                5. | 19960115   15jan1996   1996m1          426.3265              .260744     426.3265 |
                   |-----------------------------------------------------------------------------------|
                6. | 19960205   05feb1996   1996m2          53.83344              .102904     424.2615 |
                7. | 19960205   05feb1996   1996m2          272.2465              .170089     424.2615 |
                8. | 19960205   05feb1996   1996m2          310.4686               .37235     424.2615 |
                9. | 19960213   13feb1996   1996m2                 0                    .     424.2615 |
               10. | 19960213   13feb1996   1996m2          33.57839               .18374     424.2615 |
               11. | 19960213   13feb1996   1996m2                 0                    .     424.2615 |
               12. | 19960225   25feb1996   1996m2          424.2615              .211032     424.2615 |
                   |-----------------------------------------------------------------------------------|
               13. | 19960301   01mar1996   1996m3          363.2809              .128773     363.2809 |
               14. | 19960310   10mar1996   1996m3                 0                    .     363.2809 |
               15. | 19960325   25mar1996   1996m3                 0                    .     363.2809 |
                   +-----------------------------------------------------------------------------------+

              Comment


              • #8
                Hi, thank you for your help.

                After your last post, I created a new version of the loop which could work:
                Code:
                levelsof year, local(year_l)
                levelsof month, local(month_l)
                foreach l of local year_l {
                     foreach s of local month_l {
                        if year == `l' & month == `s' {
                           mkspline strike_p_`l'_`s' = strike_price if days <= 60 , cubic  nknots(3)
                           quietly regress impl_volatility strike_p_`l'_`s'*, noconstant
                           predict y_hat_`l'_`s'_60d
                           *drop strike_p_`l'_`s'1 strike_p_`l'_`s'2 
                         }
                          }
                            }
                However, it predicts y_hat only for the first month and the first year.
                This is what I am trying to aim with the loop:
                I have some financial data about the volatility and I would like to fit a cubic spline to the volatility curve created with the volatility observations and this is done with the mkspline command. Then, the loop should predict what the level of volatility should be (based on the level of the cubic spline) and this is done with the command predict. I would like to do this for all the observations belonging to January 2003 for example.

                Could you tell me how I could improve the last loop?

                Thank you for your help.

                Comment


                • #9
                  Sorry, the new code is this one:
                  Code:
                   
                   levelsof year, local(year_l) levelsof month, local(month_l) foreach l of local year_l {      foreach s of local month_l {         if year == `l' & month == `s' {            mkspline strike_p_`l'_`s' = strike_price, cubic  nknots(3)            quietly regress impl_volatility strike_p_`l'_`s'*, noconstant            predict y_hat_`l'_`s'            *drop strike_p_`l'_`s'1 strike_p_`l'_`s'2           }           }             }

                  Comment


                  • #10

                    EDITED: You don't need to loop months and years separately, you can create a year-month variable. I create a single "yhat" variable to hold all the predictions.

                    Code:
                    gen yhat=.
                    gen yearmonth= ym(year, month)
                    format yearmonth %tm
                    levelsof yearmonth, local(months)
                    foreach m of local months{
                        cap mkspline strike_p_`m' = strike_price  if yearmonth==`m', cubic  nknots(3)
                        quietly cap regress impl_volatility strike_p_`m'*  if yearmonth==`m', noconstant            
                        cap predict y_hat_`m'      
                        cap replace yhat=  y_hat_`m' if yearmonth==`m'  
                        cap drop strike_p_`m'*  y_hat_`m'    
                    }
                    Last edited by Andrew Musau; 16 May 2021, 10:50.

                    Comment


                    • #11
                      Thank you for your message.

                      When I run this code, the loop works only for the first yearmonth observation.
                      Would you have any idea of why it does not work for all yearmonth observations?
                      Thank you

                      Comment


                      • #12
                        I have edited the code after posting. Check whether the last version works and let me know if not.

                        Comment


                        • #13
                          Hi, thank you for your help. The code does not work because it keeps buffering but no output was produced after 30 minutes. Could you please tell me if you know how to improve it? Thank you a lot

                          Comment


                          • #14
                            It has to be something specific to your data. First things first, you have a lot of observations with missing dates from your example in #3. You need to drop them as they are not informative. Secondly, I put in a lot of capture commands in the code as mkspline will not execute if the sample size is too small for the number of knots specified. A case in point is 1996m3 in your example dataset which has only 1 nonmissing observation. In these cases, no predictions will be generated and you may want to exclude such groups. As to why the code takes too long to execute in your full dataset, I have no specific insights. It could be the size of the dataset. You should try with a subset of the dataset and see if you get results and how long it takes. After deleting missing values and groups with very few observations, remove the capture commands and confirm that there are no errors when executing the code. For your dataset above, appending the code in #10, I get predictions for 1996m1 and 1996m2.


                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input long Date float(strike_price_01 impl_volatility_01 date_01)
                            19960104         0       . 19960132
                            19960110  310.4686 .369596 19960228
                            19960110         0       . 19960332
                            19960115  426.3265 .260744 19960430
                            19960115  59.92363 .190384        .
                            19960205  310.4686  .37235        .
                            19960205 272.24646 .170089        .
                            19960205  53.83344 .102904        .
                            19960213         0       .        .
                            19960213         0       .        .
                            19960213  33.57839  .18374        .
                            19960301 363.28085 .128773        .
                            19960310         0       .        .
                            19960325         0       .        .
                            19960225  424.2615 .211032        .
                            end
                            
                            drop if missing(Date)
                            gen year= int(Date/10000)
                            gen month=  int(Date/100) - (year*100)
                            gen day= Date-int(Date/100)*100
                            gen date= mdy(month, day, year), after(Date)
                            format date %td
                            bys year month (day): gen lastdayofmonth=_n==_N
                            bys year month day (lastdayofmonth): replace lastdayofmonth= lastdayofmonth[_N]
                            gen yhat=.
                            gen yearmonth= ym(year, month)
                            format yearmonth %tm
                            levelsof yearmonth, local(months)
                            foreach m of local months{
                                cap mkspline strike_p_`m' = strike_price  if yearmonth==`m', cubic  nknots(3)
                                quietly cap regress impl_volatility strike_p_`m'*  if yearmonth==`m', noconstant            
                                cap predict y_hat_`m'      
                                cap replace yhat=  y_hat_`m' if yearmonth==`m'  
                                cap drop strike_p_`m'*  y_hat_`m'    
                            }
                            Res.:

                            Code:
                            . l date strike_price_01 impl_volatility_01 yearmonth yhat, sepby(yearmonth)
                            
                                 +-------------------------------------------------------+
                                 |      date   strik~01   impl_~01   yearmo~h       yhat |
                                 |-------------------------------------------------------|
                              1. | 04jan1996          0          .     1996m1          0 |
                              2. | 10jan1996          0          .     1996m1          0 |
                              3. | 10jan1996   310.4686    .369596     1996m1   .3515871 |
                              4. | 15jan1996   59.92363    .190384     1996m1   .2068114 |
                              5. | 15jan1996   426.3265    .260744     1996m1   .2715498 |
                                 |-------------------------------------------------------|
                              6. | 05feb1996   310.4686     .37235     1996m2   .2626289 |
                              7. | 05feb1996   53.83344    .102904     1996m2   .1384297 |
                              8. | 05feb1996   272.2465    .170089     1996m2   .2743523 |
                              9. | 13feb1996   33.57839     .18374     1996m2   .0889732 |
                             10. | 13feb1996          0          .     1996m2          0 |
                             11. | 13feb1996          0          .     1996m2          0 |
                             12. | 25feb1996   424.2615    .211032     1996m2   .2274117 |
                                 |-------------------------------------------------------|
                             13. | 01mar1996   363.2809    .128773     1996m3          . |
                             14. | 10mar1996          0          .     1996m3          . |
                             15. | 25mar1996          0          .     1996m3          . |
                                 +-------------------------------------------------------+
                            
                            .
                            Last edited by Andrew Musau; 16 May 2021, 14:14.

                            Comment


                            • #15
                              The code now works for a smaller sample size! Thanks a lot for your help. I have one last question, sometimes the loop breaks because for a certain month there are too few observations for the mkspline function. Would it be possible to insert in the loop the option to go to the month x+1 in case the observations for month x are too few? many thanks in advance!

                              Comment

                              Working...
                              X