Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting a time serie variable which uses a different quarter distribution than my quarterly using dataset

    Hi all,

    I would like to merge data of a time series variable into my using dataset. However, the problem is that in my using data I have quarterly data where quarters are specified as follows:

    date:
    310320xx is quarter 1
    310620xx is quarter 2
    310920xx is quarter 3
    311220xx is quarter 4

    So beforehand i had data in the form of day month year and I converted this to quarters using the following code in stata:
    Code:
    ​generate datum= qofd(date(Data, "DM20Y"))
    format %tq datum
    This is al fine. However the problem now is that my data source, uses a different distrubution of quarters. Here quarter 1 is equal to 01-01-20XX, quarter 2 01-04-20xx etc. However, monthly data is also available of this time series variable.
    Now I can apply two seperate methouds to links these two:

    1) using the monthly data and link the date 01-04-20XX to the first quarter of my dataset, 01-07-20xx to the second quarter etc.

    OR

    2) using the averaging method where the average value of the months 01-01-20XX and 01-02-20XX and 01-03-20XX is the value for the first quarter (so 31-03-20XX).

    My specific question is the following: how can I do this in Stata? So how can I convert this time serie variable in order to merge it to my using dataset?

    This is how my time serie variable looks (both in quarterly and monthly values) which I want to merge with my using dataset:
    Quarterly:
    Code:
     
    2001-01-01 4.81
    2001-04-01 3.47
    2001-07-01 2.77
    2001-10-01 1.33
    2002-01-01 1.25
    2002-04-01 1.25
    2002-07-01 1.25
    2002-10-01 0.75
    Monthly:

    Code:
     
    2001-01-01 5.52
    2001-02-01 5.00
    2001-03-01 4.81
    2001-04-01 4.28
    2001-05-01 3.73
    2001-06-01 3.47
    2001-07-01 3.25
    2001-08-01 3.16
    2001-09-01 2.77
    2001-10-01 2.02
    2001-11-01 1.58
    2001-12-01 1.33
    2002-01-01 1.25
    2002-02-01 1.25
    2002-03-01 1.25
    2002-04-01 1.25
    2002-05-01 1.25
    2002-06-01 1.25
    2002-07-01 1.25
    2002-08-01 1.25
    2002-09-01 1.25
    2002-10-01 1.25
    2002-11-01 0.83
    2002-12-01 0.75

    Thank you in advance for helping me out!

    Yannick
    Last edited by YH jordaan; 04 Feb 2016, 18:29.

  • #2
    Your question doesn't include a reproducible example with data and code. In particular, consider this attempt to reconstruct your first section:

    Code:
     
    . clear
    
    . input str8 Data 
    
              Data
      1. "31032014"  
      2. "31062014" 
      3. "31092014" 
      4. "31122014" 
      5. end
    
    . 
    . generate datum = qofd(date(Data, "DM20Y"))
    (2 missing values generated)
    
    . format %tq datum
    
    . l datum 
    
         +--------+
         |  datum |
         |--------|
      1. | 2014q1 |
      2. |      . |
      3. |      . |
      4. | 2014q4 |
         +--------+
    1. Stata does not recognise dates 31 June and 31 September, unsurprisingly. Perhaps you mean 30 June and 30 September.

    2. You are stating that you formatted a quarterly date as such, but there is no variable of that kind in the rest of your question.

    Further, your last two CODE segments display date variables with daily dates, but we can't reverse engineer from the display exactly what the underlying values are, as you don't give a dataex (SSC) or similar example to clarify whether these are daily, monthly or quarterly date variables with particular display formats. Nor do you give these variables names so that we can suggest code.

    Unfortunately, members' word descriptions of what (they think) they have are often unclear, confused or mistaken, which lies behind their questions, but that is why we ask for reproducible examples.

    Please revisit http://www.statalist.org/forums/help#stata for advice on asking self-contained questions.

    Comment


    • #3
      Thanx for your feedback Nick Cox . Actually I did not imported my data from excel into STATA, because I had no idea how to proceed in STATA. Now I did, because I understand it is more useful in getting support.

      So, now I imported data in the form of Day Month Year which changes per month. I imported the data as strings and converted it into quarterly data (because this is the fomat I have in my using dataset). Note that I refer to my using dataset as my working dataset, where al my variables are in (is this correct?) Anyhow, here is what I did in the master dataset.

      Code:
      . generate datum= qofd(date(DATE, "DMY"))
      
      . 
      . format %tq datum
      
      . l DATE datum
      
           +--------------------+
           |      DATE    datum |
           |--------------------|
        1. | 01jan2001   2001q1 |
        2. | 01feb2001   2001q1 |
        3. | 01mar2001   2001q1 |
        4. | 01apr2001   2001q2 |
        5. | 01may2001   2001q2 |
           |--------------------|
        6. | 01jun2001   2001q2 |
        7. | 01jul2001   2001q3 |
        8. | 01aug2001   2001q3 |
        9. | 01sep2001   2001q3 |
       10. | 01oct2001   2001q4 |
           |--------------------|
      While in my using dataset I have the following format:

      Code:
      . list DataDate datum
      
            +-------------------+
            | DataDate    datum |
            |-------------------|
         1. |   310301   2001q1 |
         2. |   300601   2001q2 |
         3. |   300901   2001q3 |
         4. |   311201   2001q4 |
         5. |   310302   2002q1 |
            |-------------------|
         6. |   300602   2002q2 |
         7. |   300902   2002q3 |
         8. |   311202   2002q4 |
         9. |   310303   2003q1 |
        10. |   300603   2003q2 |
            |-------------------|
      Where DataDate is in DAY MONTH YEAR format where YEAR 01 represent 2001, 02 represents 2002 etc.The code I used to convert the data from DataDate to datum is what I highlighted in my first post.

      So in my opinion there are 2 problems now:
      1) in my using dataset , 310301 is treated as the first quarter, while in my master dataset 01 apr 2001 is treated as the second quarter, while in fact those dates differ by only by one day. What I want is that 01 apr 2001 is treated as the first quarter, as in my using dataset.
      2) I have multiple values for the same quarter and year. So I have 3 values for 2001Q1. This will give me problems when merging the dataset as well when I want to use it in my using data for panel data econometrics (I think I will get the error of ''repeated observations within panel'').

      I hope I clarified my problem. Note that I have also data available of the same variable as discussed above, but then in quarterly data (so 01 jan 2001, 01 april 2001 etc). But to keep it simple I only showed the monthly data in this post now.

      Best,

      Yannick

      Comment


      • #4
        I don't know what to recommend on 2) as it depends what you want to do with the data.

        On 1) it seems that you should just subtract 1 day, as

        Code:
        . di %tq qofd(daily("1apr2001", "DMY"))
        2001q2
        
        . di %tq qofd(daily("1apr2001", "DMY") - 1 )
        2001q1

        Comment


        • #5
          Thank you Nick Cox . After your recommendation on 1, I want to use the last month of the quarter (so the end of march for Q1) as the unique value for quarter 1.

          So after the transformation you recommended for 1, I want to have the data 310320XX which is now Q1 as the unique value for Q1. How do I delete all other Q1 values (that of jan and february)?

          Do you get what I mean?

          Comment


          • #6
            As I understand it

            datum is a quarterly date

            and

            DATE is a monthly date

            (frankly, such names are all too likely to be confusing and hard work, which is difficult for others too, who you are asking to help)

            but you may have a panel identifier too. So, the principle is to sort on panel identifier, quarterly date and monthly date and keep only the last observation for each panel and quarter.

            Comment


            • #7
              Yes Nick Cox you are right. The panel indentifier is named ''gvkey''. I agree with you about the names, for me it is convenient, but for the future it is better to have more uniform names in the purpose for asking things on Statalist.

              So now I am wondering how I can do this is Stata: merging the two datasets which have a quarterly date and monthly date and keep only the last observation for each panel and quarter. I tried to search relating topics to this question and put effort in myself in trying to fix this in Stata, but unfortunately no results until now.

              Comment


              • #8
                If you are going to merge two data sets they must have a key (1 or more variables) that is common to both. When part of the key is a time variable, the time variable has to be in the same units in order for matches to be properly made. If I understand you correctly, one data set has a quarterly date, and one a monthly date. Assuming you want to match each monthly date the quarter in which that month occurs, you have to calculate the quarterly date corresponding to that month in the data set with the monthly dates. This can be done using a few of Stata's datetime functions. I suspect you are not familiar with them all. See -help datetime_functions- for more information.

                Code:
                use data_with_monthlies, clear 
                gen quarterly_date = qofd(dofm(monthly_date)) 
                format quarterly_date %tq
                merge m:1 gvkey quarterly_date using data_with_quarterlies
                Obviously, I am using generic variable names, rather than yours (which I find confusing).

                The key thing is that the quarterly date you create in the data set with monthly dates must be given the same name as the quarterly date that already exists in the other data set.

                I also assumed that the data set with monthly dates actually has something like monthly periodicity in its observations, from which I inferred that for any given quarter, there can be several observations. Hence the m:1 merge.

                Finally, let me also be clear that the quarterly date in the data set with quarterlies must be a genuine Stata quarterly date, not a daily date that marks some particular day in the quarter. If that is not the case (as I gleanfrom the earliest posts in this thread) , then before you run the above, you need to fix up that data set with something like:

                Code:
                use data_with_dailies_masquerading_as_quarterlies, clear
                gen quarterly_date = qofd(daily_date_masquerading_as_quarterly)
                format quaraterly_date %tq
                save data_with_quarterlies

                Comment


                • #9
                  Many thank again Clyde Schechter . The two codes you provided solved my problem

                  Comment

                  Working...
                  X