  • How to generate 6 lagged variables for thousands of underlying variables?

    Hi all,

    I'm new here. I know some bits and pieces about Stata, so I am not completely green. The issue I have at the moment is:

    I have a dynamic dataset with thousands of variables, which are simply the returns of some investment vehicles. I need to generate 6 lagged variables from every variable whose name begins with "lux" and ends with a number 1, 2, 3, ..., 3000. So my dataset has over 3000 variables named lux1, lux2, lux3, ..., lux3000. At the moment I generate the lags like this, by entering each line in the command window. Can someone please advise how to create a loop or a line of code that would repeat this process for every single "lux" variable?

    gen lux1lag1 = lux1[_n-1]
    gen lux1lag2 = lux1[_n-2]
    gen lux1lag3 = lux1[_n-3]
    gen lux1lag4 = lux1[_n-4]
    gen lux1lag5 = lux1[_n-5]
    gen lux1lag6 = lux1[_n-6]

    Thanks

    West


  • #2
    One option is to do the following:
    Code:
    foreach i of varlist lux* {
       forvalues j=1/6 {
        gen lag`j'_`i'=l`j'.`i'
      }
    }
    HTH



    • #3
      Hi West,

      Something like this should do it
      Code:
      forvalues i = 1/3000 {
          forvalues lag = 1/6 {
              gen lux`i'lag`lag' = lux`i'[_n-`lag']
          }
      }
      Make sure that the data is properly sorted before you do this.



      • #4
        Originally posted by FernandoRios View Post
        One option is to do the following:
        Code:
        foreach i of varlist lux* {
        forvalues j=1/6 {
        gen lag`j'_`i'=l`j'.`i'
        }
        }
        HTH
        Hi, I tried the code in #2, but unfortunately I get the error "time variable not set".

        PS.
        I have my data set in long format, with "mydate" holding dates between 1995m1 and 2016m12, then "securities" listing a specific number for each security, and then "returns".



        • #5
          All time series operators need the data to be either tsset or xtset. If your time variable is named date and your data is a single time series, you would declare the data as time series by typing
          Code:
          tsset date
          and if your data is panel data and the panel identifier is named, say, ID, then
          Code:
          tsset ID date
          Regards
          --------------------------------------------------
          Attaullah Shah, PhD.
          Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
          FinTechProfessor.com
          https://asdocx.com
          Check out my asdoc program, which sends outputs to MS Word.
          For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



          • #6
            No-one wants to bear the bad news here. Adding 18000 new variables to 3000 existing variables won't make your Stata life easier. For example, what is your strategy for getting and examining results? To write yet more loops over thousands of variables to get results for each stock?

            There should be much better news. If you reshape your data long then you don't need any new variables at all. You will have a panel identifier, a time variable, and a single variable for stocks. Then you won't need lag variables at all as time series operators will get you lagged values.

            Attaullah Shah is, I think, in agreement with this that a panel structure is better, but I can't see clearly from #1 and #4 that you are yet there. #1 seems to imply a wide layout, but #4 says long.
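
            A minimal sketch of the reshape route described above, run on a small made-up dataset. It assumes the wide variables are named lux1, lux2, ... and that the time variable is a monthly date called mydate; the names fund_no and returns are chosen here purely for illustration.

            ```stata
            * Toy wide dataset: 24 months, three hypothetical return series lux1-lux3
            clear
            set seed 12345
            set obs 24
            generate mydate = ym(1995, 1) + _n - 1
            format mydate %tm
            forvalues i = 1/3 {
                generate lux`i' = rnormal()
            }

            * Wide to long: one row per fund-month; the suffix 1, 2, 3 becomes fund_no
            reshape long lux, i(mydate) j(fund_no)
            rename lux returns

            * Declare the panel structure; after this the L. operators are available
            xtset fund_no mydate

            * Lags are computed on the fly within each fund; no new variables are created
            regress returns L(1/6).returns if fund_no == 1
            ```

            With the real data the toy-data lines would be dropped, and only the reshape, rename and xtset steps would be needed before any regression.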
            Last edited by Nick Cox; 12 Nov 2018, 17:54.



            • #7
              Hi Nick,

              Thanks for your message. I agree that it seems like an excessive approach on my side. The point is to run regressions once I have all these variables. For each stock I would then regress the stock returns on lag1, lag2, lag3, lag4, lag5 and lag6. The output of this is needed to calculate some other values, such as the average alpha and beta.

              I have reshaped the data to long and I wanted to add a screenshot, but all I get is "server error", so I am unable to upload even a tiny screenshot.

              My data is in long format with the following columns: mydate (Stata date format), fund_no (a number from 1 to 2418), returns (for all funds, naturally with missing values, i.e. dots "."), home (e.g. usa1, usa2, and so on until usa2418). It is worth mentioning that after converting this number of funds (2418) from wide to long, the data has approximately 660,000 rows.
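
              For what it is worth, here is one way the per-fund regressions described above might be sketched once the data are xtset. It uses the grunfeld example dataset so that it can be run as is; company, year and invest stand in for fund_no, mydate and returns, and two lags are used instead of six only because the example panels are short. It is a sketch, not the only route: Stata's statsby prefix is designed for collecting results by group and may be more convenient.

              ```stata
              webuse grunfeld, clear
              xtset company year

              * One regression per panel; lags are taken within each panel by L()
              levelsof company, local(ids)
              foreach id of local ids {
                  quietly regress invest L(1/2).invest if company == `id'
                  * _b[_cons] is the intercept, _b[L.invest] the first-lag slope
                  display "company `id':  alpha = " %9.4f _b[_cons] ///
                      "  first-lag beta = " %9.4f _b[L.invest]
              }
              ```

              With the real data, a fund whose returns are all missing would make regress stop with an error, so the regress line would probably need to be guarded with capture.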

              Thanks
              West



              • #8
                Thanks for the extra information. But as already explained, you really don't need the lagged variables as after a tsset or xtset the lagged values are available through time series operators.

                Here is a simple example you can run.

                Code:
                . webuse grunfeld, clear
                
                . tsset
                       panel variable:  company (strongly balanced)
                        time variable:  year, 1935 to 1954
                                delta:  1 year
                
                . regress invest L.invest L2.invest
                
                      Source |       SS           df       MS      Number of obs   =       180
                -------------+----------------------------------   F(2, 177)       =   1496.50
                       Model |   8489891.2         2   4244945.6   Prob > F        =    0.0000
                    Residual |  502074.493       177  2836.57905   R-squared       =    0.9442
                -------------+----------------------------------   Adj R-squared   =    0.9435
                       Total |  8991965.69       179  50234.4452   Root MSE        =     53.26
                
                ------------------------------------------------------------------------------
                      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      invest |
                         L1. |   1.235742   .0774943    15.95   0.000     1.082811    1.388674
                         L2. |  -.1807256   .0870669    -2.08   0.039    -.3525484   -.0089028
                             |
                       _cons |  -.3345513   4.929618    -0.07   0.946    -10.06294    9.393838
                ------------------------------------------------------------------------------
                Screenshots are less useful than you hope, as explained in the FAQ Advice #12. https://www.statalist.org/forums/help#stata
                While visiting please stop off at #18.

                Your flashing image is, frankly, disconcerting and may cause problems for some sensitive members. (Hereabouts, there are warnings on some television channels if some footage to be shown contains flashing images as some people react badly to them.) Please edit it to something static.



                • #9
                  Hi Nick,

                  Thanks. Apologies for the avatar - it was annoying to me as well - I was going to change it sooner or later.

                  With regard to your advice, now I can see what you mean. I have tried to alter this code to fit my dataset, but I think I am doing something wrong here:



                  Code:
                  . tsset
                         panel variable:  usa
                          time variable:  year, 1995 to 2016
                                  delta:  1 year (how would I change delta to 1 month? I have tried with "1 month" but it fails on delta)

                  . regress usa2 L1.usa2 L2.usa2 L3.usa2 L4.usa2
                  PS. I am still receiving the error "time variable not set".

                  Thanks
                  West
                  [Attached image: date.png]

                  Last edited by West Ray; 13 Nov 2018, 05:22. Reason: Missed the line:time variable not set



                  • #10
                    You declared year to be the time variable when you appear to have monthly data. So, don't do that. From what I can see you should go

                    Code:
                    tsset fund_no mydate
                    but I am still fuzzy about what defines your panels, partly because I never work with your kind of data but more because you're not giving any data examples.

                    In #8 I advised

                    Screenshots are less useful than you hope, as explained in the FAQ Advice #12. https://www.statalist.org/forums/help#stata
                    and that's still true.
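
                    On the question in #9 about getting a delta of 1 month: when no unit option is given, tsset takes the time unit from the display format of the time variable. The following is only a sketch on made-up data, assuming that mydate is stored as a Stata daily date (%td). If mydate is already a monthly date (%tm), the conversion step is not needed. The name mdate is invented here.

                    ```stata
                    * Toy panel: 2 funds x 12 months, dates stored as daily dates (assumption)
                    clear
                    set seed 12345
                    set obs 24
                    generate fund_no = ceil(_n/12)
                    bysort fund_no: generate mydate = mdy(_n, 1, 1995)
                    format mydate %td
                    generate returns = rnormal()

                    * Convert the daily date to a monthly date so that one period is one month
                    generate mdate = mofd(mydate)
                    format mdate %tm

                    * tsset/xtset now report a delta of 1 month and L. means "previous month"
                    xtset fund_no mdate
                    list fund_no mdate returns L.returns in 1/3
                    ```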



                    • #11
                      Hi Nick,

                      To clarify what my data looks like:

                      [Attached image: data.png]


                      Here is the result I get:

                      [Attached image: results.png]


                      I have used this:


                      Code:
                      tsset no mydate
                             panel variable:  usa
                              time variable:  month, 1995 to 2016
                                      delta:  1 month

                      . regress usa1 L.usa1 L2.usa1 L3.usa1
                      Regards
                      West



                      • #12
                        Sorry, but I can't advise further beyond these comments.

                        You {decline, refuse} to follow recommended practice on showing data examples. I ask you to read and act on the FAQ again and again and there's no sign that you even understand the request. That's making communication very difficult and suggesting to me that I am wasting my time.

                        Specifically:

                        1. My guess for a panel identifier was fund_no. You say nothing about that. No such variable appears within your screenshot.

                        2. I don't understand why the command

                        Code:
                        tsset no mydate
                        is apparently followed by a report that the panel identifier is usa and the time identifier is month. usa doesn't appear within your screenshot.

                        3. Stata is telling you that month has values 1995 to 2016 -- but those are years! month doesn't appear within your screenshot.

                        Generally, you seem very confused, and/or (wild surmise this) you have a non-standard corrupted Stata. I am sympathetic. Stata with a large complicated dataset is a nightmare for a learner, and I have to wonder which Department of Cruelty is obliging you to do this. But I'm out of ideas.

                        If you're a lone researcher, you need to study the manual sections on panel data, as it seems that you are just guessing randomly on what your identifiers are. Getting that wrong is a recipe for utter garbage at best and no results at all otherwise.

                        If you're not a lone researcher, all that applies, and you sorely, surely need a buddy at your workplace who knows more about Stata to talk you through this.
