Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Yearly to Quatery Data: Decode not working

    Hi,
    I have two variables in my data - Date and jet fuel.

    I need to extract the quarter and year out of Date variable in order to collapse the data on jetfuel prices.

    The date variable is in int format, and "encode" command is not working. Furthermore, when I use the "tostring" command. I see that the Date is labelled. Can anyone tell me what I am missing here, and how to convert the monthly data into quarterly data. Thank you.



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int Date double USGulfCoastKeroseneTypeJet
    11062   .54
    11092  .515
    11123  .494
    11153  .535
    11184  .791
    11215 1.012
    11245 1.196
    11276  .971
    11306  .803
    11337  .741
    11368  .637
    11396  .558
    11427  .552
    11457  .569
    11488  .547
    11518  .586
    11549  .623
    11580  .635
    11610  .671
    11641  .645
    11671  .522
    11702  .509
    11733  .543
    11762  .514
    11793  .543
    11823  .579
    11854  .615
    11884  .611
    11915   .59
    11946  .623
    11976  .621
    12007  .561
    12037  .544
    12068  .534
    12099  .553
    12127  .558
    12158  .551
    12188  .553
    12219  .526
    12249  .494
    12280  .501
    12311  .535
    12341  .568
    12372  .533
    12402  .446
    12433  .528
    12464  .501
    12492  .451
    12523  .469
    12553  .472
    12584  .493
    12614  .509
    12645  .498
    12676   .49
    12706  .509
    12737  .513
    12767  .487
    12798  .472
    12829  .467
    12857  .449
    12888    .5
    12918  .511
    12949  .478
    12979  .467
    13010  .496
    13041  .513
    13071  .502
    13102  .524
    13132  .553
    13163   .55
    13194  .557
    13223  .586
    13254  .617
    13284  .557
    13315  .516
    13345  .556
    13376  .615
    13407  .682
    13437  .703
    13468  .696
    13498  .693
    13529   .68
    13560  .619
    13588  .557
    13619  .541
    13649  .549
    13680  .523
    13710  .536
    13741  .559
    13772  .541
    13802  .571
    13833  .549
    13863  .496
    13894  .476
    13925  .449
    13953  .413
    13984  .426
    14014  .416
    14045  .389
    14075  .386
    end
    format %td Date

  • #2
    Thanks for the data example, making it clear that you have here a daily date variable which is (necessarily) numeric.

    Of the commands you mention, encode maps from a string variable, to a numeric variable, which doesn't apply here; decode maps from a numeric variable to a string variable, which isn't needed here, and doesn't apply either as you have no value labels to use any way; and tostring maps to a string variable, which is no use here as "14075" (say) as string is further away from a date than 14075 is, and nothing to do with quarterly dates any way.

    What you're missing is some understanding how dates are implemented in Stata, and specifically how to convert from one kind of date to another.

    Code:
    help dates and times
    is a good way in, to the documentation you need to read for full understanding.

    To wrap it up, what I think you need is qofd() followed an appropriate format assignment.

    Code:
    clear
    
    input int Date double USGulfCoastKeroseneTypeJet
    11062   .54
    11092  .515
    11123  .494
    11153  .535
    11184  .791
    11215 1.012
    11245 1.196
    end
    format %td Date
    
    gen Qdate = qofd(Date)
    format %tq Qdate
    
    list, sepby(Qdate)  
    
    
         +-------------------------------+
         |      Date   USGulf~t    Qdate |
         |-------------------------------|
      1. | 15apr1990        .54   1990q2 |
      2. | 15may1990       .515   1990q2 |
      3. | 15jun1990       .494   1990q2 |
         |-------------------------------|
      4. | 15jul1990       .535   1990q3 |
      5. | 15aug1990       .791   1990q3 |
      6. | 15sep1990      1.012   1990q3 |
         |-------------------------------|
      7. | 15oct1990      1.196   1990q4 |
         +-------------------------------+
    
    .
    How to extract the year is, as they say, left as an exercise, but it's plainly documented at the place cited.

    Comment


    • #3
      Thanks for the prompt reply Nick.
      I now understand the distinction between HRF and SIF, and was misinterpreting HRF as value labels - thus, I was not getting the desired result upon using the "decode" command.

      The documentation does help a lot, but I am afraid I am still not able to generate a unique identifier by year and quarter variables.
      After using the qofd() command, I do the following -

      Code:
      collapse jetfuel, by(qtr)
      tsset qtr, quarterly
      gen year_fuel=year(qtr)
      gen quarter_fuel=quarter(qtr)
      While I am able to generate the new year_fuel and quarter_fuel variables, I have been unable to extract the date components from SIF correctly. Could you please explain what is happening here ? Thanks again.

      Comment


      • #4
        Note incidentally that qofd() is a function, not a command.

        You don't say what you got beyond stating that you got incorrect results, but your code permits diagnosis.

        To see what you did wrong, work through a simple example using display where you know the answer in advance.

        Code:
        . di yq(2017, 1)
        228
        
        . di year(228)
        1960
        
        . di quarter(228)
        3
        
        . di quarter(dofq(228))
        1
        
        . di year(dofq(228))
        2017
        The functions year() and quarter() expect daily dates. This is documented. To apply them to quarterly dates, you must convert the latter into daily dates first. You can do that on the fly.

        Comment


        • #5
          Thanks again Nick.

          Following code does the trick -
          Code:
          format %td Date
          gen qtr=qofd(Date)
          format %tq qtr  
          list,sepby(qtr) 
          
          format %tq Date
          di yq(2000, 4)
          di year(163)
          di quarter(163)
          di quarter(dofq(163))
          di year(dofq(163))
          
          gen year=year(dofq(qtr))
          gen quarter=quarter(dofq(qtr))
          
          collapse jetfuel, by(year quarter)
          tostring *, replace

          Comment


          • #6
            Thanks for the thanks, But I don't know why you convert everything to string at the end. That is not obviously a good idea. Almost no analyses will now be possible.

            Comment


            • #7
              It has been my experience that appending or merging files is easier with string variables (less prone to errors, especially when using loops). I can simply destring my variables once the transformation is complete.

              Comment


              • #8
                You're more confident about that being a good idea than I am. It really shouldn't be necessary.

                I am the original author of destring and tostring and don't recommend what you are doing and affirm that it is no part of their intended purposes.

                Comment


                • #9
                  Okay. Will try avoiding that in the future.

                  I have had quarterly data files in which a certain variable is in numeric form. However, this variable may get imported as string in a particular year making appending/merging troublesome. Even after identifying the file with string variable, I could not destring the variable. However, by stringing this variable in all years, I was able to use the merge/append commands easily.
                  For the statistical analysis part such as running regressions after some data cleaning on the new data (say keep entries with unique identifiers only) , I could simply destring all the variables.

                  I guess what happens with many time series data is that there may be objects in a variable (or many other reasons I am not aware of) that makes Stata ascribe a string status to the variable automatically while importing. Either one can clean files individually and destring them so that merge/append run smoothly, or string everything in the loop followed by append/merge command, and clean this final data. The later approach has worked better for me and is more convenient. It may not be theoretically superior but given my Stata knowledge limitations does the trick. Any comments on why this is a bad practice? Thank you.

                  Comment


                  • #10
                    You're right that if a particular variable is string in one file and numeric in another then it needs to be standardised for file joins to work. This happens fairly often with identifiers for several reasons and ad hoc clean up, although tedious, is usually necessary. I can't comment on what has happened to your files, but a common reason for problems is that datasets have been in and out of spreadsheets and metadata has got mixed up with data. Brute force is is usually not the best long-term strategy, here as elsewhere.

                    What should never be necessary for the purpose of file management is to make all variables string. Numeric variables such as price and sales should always be numeric.

                    Comment


                    • #11
                      Got it. Thanks for the insights Nick.

                      Comment

                      Working...
                      X