Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extending Yearly Data to Quarterly Data

    Hi,
    I have real per capita income for Boston and New York Metropolitan Statistical Area (MSA).

    Ideally, I would like to assign the current yearly value to a mid year quarter (say 2 or 3) and fill in the missing quarters with linear fitted values(say using ipolate command). Furthermore, I would like to extrapolate the linear fitting upto 2016 quarter 1.

    However, I am not sure what is the best method to convert this to quarterly data. Any suggestions would be highly appreciated.


    Code:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year long(pci_boston pci_newyork)
    2001 64362 61332
    2002 64223 61063
    2003 65655 61044
    2004 67421 62733
    2005 68926 65390
    2006 70115 67357
    2007 71990 68173
    2008 71445 65854
    2009 69411 65627
    2010 71696 67563
    2011 72643 67187
    2012 73815 68723
    2013 72518 68421
    2014 73595 69105
    2015 74545 69971
    end

  • #2
    Try this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year long(pci_boston pci_newyork)
    2001 64362 61332
    2002 64223 61063
    2003 65655 61044
    2004 67421 62733
    2005 68926 65390
    2006 70115 67357
    2007 71990 68173
    2008 71445 65854
    2009 69411 65627
    2010 71696 67563
    2011 72643 67187
    2012 73815 68723
    2013 72518 68421
    2014 73595 69105
    2015 74545 69971
    end
    
    //    CREATE AN OBSERVATION FOR 2016
    isid year
    expand 2 if year == 2015
    by year, sort: replace year = 2016 if _n == 2
    replace pci_boston = . if year == 2016
    replace pci_newyork = . if year == 2016
    
    //    NOW EXPAND DATA TO QUARTERLY
    expand 4
    by year, sort: gen month = 3*_n
    gen quarterly_date = qofd(mdy(month, 1, year))
    format quarterly_date %tq
    
    //    REPLACE VALUES FOR QUARTERS OTHER THAN 2 WITH
    //    MISSING
    replace pci_boston = . if inlist(month, 3, 9, 12)
    replace pci_newyork = . if inlist(month, 3, 9, 12)
    
    //    INTERPOLATE/EXTRAPOLATE
    ipolate pci_boston quarterly_date, epolate generate(pci_boston_ep)
    ipolate pci_newyork quarterly_date, epolate generate(pci_newyork_ep)

    Comment


    • #3
      Works perfectly fine Clyde. Thanks a lot for the quick reply.

      Comment


      • #4
        Hey guys, i have a similar problem. My data looks like this:
        year washington
        1972 31.7
        1973 34.9
        1974 38.5
        1975 43.7
        1976 45.7
        1977 48.4
        1978 51.5
        1979 53.9
        1980 59.6
        1981 64.9
        ...

        2014 197

        What i want to do is to interpolate those annual data to quarterly data, assuming constant growth rates between years. These data are published for January of year y, which are treated as appropriate for December of y-1.

        I tried your code, Clyde, but it doesn't work since my values are those of q1.

        Does anybody have an idea, how to change this code so that it works for my data?

        Hope you can help me with this issue.

        Thanks a lot!

        Comment


        • #5
          Master Economics Please note our longstanding explicit preference for full real names (and how to declare yours):

          https://www.statalist.org/forums/help#realnames

          https://www.statalist.org/forums/help#adviceextras #3

          Your data already show growth between years; if you want constant growth rates within years, it is not clear why you are happy with linear interpolation.

          If the yearly data apply to quarter 1 (why?) then here are two possible solutions:

          Code:
          clear
          input year washington
          1972 31.7
          1973 34.9
          1974 38.5
          1975 43.7
          1976 45.7
          1977 48.4
          1978 51.5
          1979 53.9
          1980 59.6
          1981 64.9
          end
          
          expand 4
          bysort year : gen qdate = yq(year, _n)
          by year: replace washington = . if _n > 1
          
          list, sepby(year)
          
          gen logw = log(washington)
          
          ipolate washington qdate, gen(linear) epolate
          ipolate logw qdate, gen(log) epolate
          replace log = exp(log)
          format linear log %4.1f
          list, sepby(year)
          
          
          
          
               +----------------------------------------------------+
               | year   washin~n   qdate       logw   linear    log |
               |----------------------------------------------------|
            1. | 1972       31.7      48   3.456317     31.7   31.7 |
            2. | 1972          .      49          .     32.5   32.5 |
            3. | 1972          .      50          .     33.3   33.3 |
            4. | 1972          .      51          .     34.1   34.1 |
               |----------------------------------------------------|
            5. | 1973       34.9      52   3.552487     34.9   34.9 |
            6. | 1973          .      53          .     35.8   35.8 |
            7. | 1973          .      54          .     36.7   36.7 |
            8. | 1973          .      55          .     37.6   37.6 |
               |----------------------------------------------------|
            9. | 1974       38.5      56   3.650658     38.5   38.5 |
           10. | 1974          .      57          .     39.8   39.7 |
           11. | 1974          .      58          .     41.1   41.0 |
           12. | 1974          .      59          .     42.4   42.3 |
               |----------------------------------------------------|
           13. | 1975       43.7      60   3.777348     43.7   43.7 |
           14. | 1975          .      61          .     44.2   44.2 |
           15. | 1975          .      62          .     44.7   44.7 |
           16. | 1975          .      63          .     45.2   45.2 |
               |----------------------------------------------------|
           17. | 1976       45.7      64   3.822098     45.7   45.7 |
           18. | 1976          .      65          .     46.4   46.4 |
           19. | 1976          .      66          .     47.1   47.0 |
           20. | 1976          .      67          .     47.7   47.7 |
               |----------------------------------------------------|
           21. | 1977       48.4      68     3.8795     48.4   48.4 |
           22. | 1977          .      69          .     49.2   49.2 |
           23. | 1977          .      70          .     50.0   49.9 |
           24. | 1977          .      71          .     50.7   50.7 |
               |----------------------------------------------------|
           25. | 1978       51.5      72   3.941582     51.5   51.5 |
           26. | 1978          .      73          .     52.1   52.1 |
           27. | 1978          .      74          .     52.7   52.7 |
           28. | 1978          .      75          .     53.3   53.3 |
               |----------------------------------------------------|
           29. | 1979       53.9      76    3.98713     53.9   53.9 |
           30. | 1979          .      77          .     55.3   55.3 |
           31. | 1979          .      78          .     56.8   56.7 |
           32. | 1979          .      79          .     58.2   58.1 |
               |----------------------------------------------------|
           33. | 1980       59.6      80   4.087656     59.6   59.6 |
           34. | 1980          .      81          .     60.9   60.9 |
           35. | 1980          .      82          .     62.3   62.2 |
           36. | 1980          .      83          .     63.6   63.5 |
               |----------------------------------------------------|
           37. | 1981       64.9      84   4.172848     64.9   64.9 |
           38. | 1981          .      85          .     66.2   66.3 |
           39. | 1981          .      86          .     67.6   67.7 |
           40. | 1981          .      87          .     68.9   69.2 |
               +----------------------------------------------------+
          To refer the annual growth rates e.g. to notional quarter 2.5, the code would be different in the middle:


          Code:
          expand 5
          bysort year : gen qdate = yq(year, _n) if _n <= 4
          by year : replace qdate = (yq(year, 2) + yq(year, 3))/2 if _n == 5
          by year: replace washington = . if _n < 5
          Getting the referent right is surely not trivial.
          Last edited by Nick Cox; 06 Aug 2018, 05:08.

          Comment


          • #6
            Hi Nick & Clyde, if i may ask, I followed the process of converting annual data to quarterly as described above and it actually worked out for me, however, i got stuck at the point at which i used the command
            replace pci_boston = . if inlist(month, 3, 9, 12) my data was directly entered into the data editor and is too large to be entered from the command window. how may i continue from here as the "inlist" function doesn't apply here. I await your swift reply. Thanks.

            Comment


            • #7
              Expecting a "swift reply" is optimistic here as (1) Clyde Schechter is away, although you're not expected to know that (2) I sleep sometimes, which you are expected to imagine (3) we are not at anyone's beck and call.

              More crucially, sorry, but I don't understand one bit of #6 except that you're alluding to code earlier in the thread.

              my data was directly entered into the data editor and is too large to be entered from the command window
              I don't see that how you entered data has any bearing on what code you should use.

              the "inlist" function doesn't apply here
              I have no idea what that means.

              Data example, please. Reproducible code, please.

              Comment


              • #8
                Pardon my choice of words, your timing has actually been swift. I later figured it out from retrying the codes. Thanks.

                Comment


                • #9
                  Hi,

                  I have a similar question as above, but with the difference that my data is in panel form. I have data for a number of country-pairs for a 10-year frequency that goes from 1960 to 2010. I want to transform this into annual data, for each country pair, and interpolate the missing data in between.

                  Any help would be extremely appreciated.

                  Ainhoa

                  Comment


                  • #10
                    Dear Clyde Schechter and Nick Cox,
                    Can you help me to apply seasonally adjusted on that data, please?
                    I have tried but I don't have made it. Thank you very much

                    year real_gdp_waemu
                    1994 10471,47835
                    1995 14920,82489
                    1996 17921,40278
                    1997 19774,11867
                    1998 21581,41542
                    1999 23308,64913
                    2000 23778,53439
                    2001 25244,32803
                    2002 26784,46926
                    2003 28936,71906
                    2004 30210,49259
                    2005 31326,83375
                    2006 33363,10632
                    2007 35031,68126
                    2008 37934,37124
                    2009 42162,28993
                    2010 44690,85466
                    2011 46633,4893
                    2012 52228,50797
                    2013 57326,33765
                    2014 61859,43517
                    2015 66668,67151
                    2016 72160,76177
                    2017 76872,97108
                    2018 81763,17722
                    2019 87210,8968


                    Comment


                    • #11
                      There are some things I do not understand here.

                      First and foremost, you have annual data, and so I don't see how "seasonal" adjustment applies here. Seasonal adjustment is typically adjustment for period phenomena over periods of time that are longer than the units in which the data are denominated. For example, one might take daily data and adjust them for four seasons per year, or adjust them for monthly cycles or the like. Perhaps the notion of seasonality you have in mind is some cycle with a period of several years. If that is what you have in mind, you need to state what the seasonality period is.

                      I also don't understand the gdp variable. What is the comma? Are there two different numbers, concatenated with a comma between them--and if so, which of the numbers (or both) is to be adjusted? Or is the comma a decimal point? In that case, does it really make sense to have 4 or 5 decimal place precision in these numbers? (Or equivalently, do we have accuracy to 9 or 10 significant figures in the gdp?)

                      Finally, in the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                      Comment


                      • #12
                        Good evening dear Clyde Schechter,

                        First, thank you very much for your answer.

                        The seasonality period is 1994. The variable Y has only one number. Values after the comma are decimal values and may be rounded, for example, "10471.48" or
                        "10471.478"; "14920.82" or "14920.825" and so on.

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input int Year str11 pib_real_waemu
                        1994 "10471,47835"
                        1995 "14920,82489"
                        1996 "17921,40278"
                        1997 "19774,11867"
                        1998 "21581,41542"
                        1999 "23308,64913"
                        2000 "23778,53439"
                        2001 "25244,32803"
                        2002 "26784,46926"
                        2003 "28936,71906"
                        2004 "30210,49259"
                        2005 "31326,83375"
                        2006 "33363,10632"
                        2007 "35031,68126"
                        2008 "37934,37124"
                        2009 "42162,28993"
                        2010 "44690,85466"
                        2011 "46633,4893" 
                        2012 "52228,50797"
                        2013 "57326,33765"
                        2014 "61859,43517"
                        2015 "66668,67151"
                        2016 "72160,76177"
                        2017 "76872,97108"
                        2018 "81763,17722"
                        2019 "87210,8968" 
                        end
                        In reality, I have the annual data which I have converted into quarterly data, with the "ipolate" command that you indicated.

                        I had doubts about whether I should apply the seasonal adjustment on these quarterly data or apply the seasonal adjustment on annual data before converting to quarterly data.

                        The results of converting annual data to quarterly data are attached.

                        Thank you very much.

                        Attached Files

                        Comment


                        • #13
                          I don't understand what you mean when you say the seasonality period is 1994. Seasonal adjustment means modifying the data to remove some periodic cycle of variation in the data with a period that is at least as long as the intervals between data observations. The seasonality period is the length of time from the beginning to the end of each cycle--it is not a point in time. For example, with monthly data, and seasonal adjustment referring to actual seasons (periods of 3 months) we have a 3 month cycle. What is the cycle in your data?

                          Comment


                          • #14
                            The cycle of my data behaves as follows: [1994, 1995]; [1996, 1997]; [1998, 1999] and so on. That is to say a cycle of 2 years.

                            Comment


                            • #15
                              So your cycle basically has odd and even periods. Adjusting for these periods within your two year cycle would then look like this:

                              Code:
                              destring pib_real_waemu, replace dpcomma
                              
                              tsset Year
                              
                              summ pib_real_waemu, meanonly
                              loca mean `r(mean)'
                              
                              gen seasonality = mod(Year, 2)
                              regress pib_real_waemu i.seasonality
                              predict seasonally_adjusted_pib, residual
                              replace seasonally_adjusted_pib = seasonally_adjusted_pib + `mean'

                              Comment

                              Working...
                              X