Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • match annual data with quarterly data

    Hi All,

    I have two data sets with different variables in different frequencies. The first one has quarterly data, as shown below.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str62 LOCATION str7 TIME float qdate double(CPI LT_interest)
    "Afghanistan" "2013q2" 213 . .
    "Afghanistan" "2013q3" 214 . .
    "Afghanistan" "2013q4" 215 . .
    "Afghanistan" "2014q1" 216 . .
    "Afghanistan" "2014q2" 217 . .
    "Afghanistan" "2014q3" 218 . .
    "Afghanistan" "2014q4" 219 . .
    "Afghanistan" "2015q1" 220 . .
    "Afghanistan" "2015q2" 221 . .
    "Afghanistan" "2015q3" 222 . .
    "Afghanistan" "2015q4" 223 . .
    "Afghanistan" "2016q1" 224 . .
    "Afghanistan" "2016q2" 225 . .
    end
    format %tq qdate
    The second one has annual data, as shown below.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 LOCATION int TIME double annual_pop
    "AUS" 1960    10.275
    "AUS" 1961   10.5082
    "AUS" 1962   10.7005
    "AUS" 1963   10.9069
    "AUS" 1964   11.1216
    "AUS" 1965   11.3409
    "AUS" 1966 11.599498
    "AUS" 1967 11.799078
    "AUS" 1968 12.008635
    "AUS" 1969 12.263014
    "AUS" 1970 12.507349
    "AUS" 1971 13.067265
    "AUS" 1972 13.303664
    "AUS" 1973 13.504538
    "AUS" 1974 13.722571
    "AUS" 1975 13.892995
    "AUS" 1976 14.033083
    "AUS" 1977 14.192234
    "AUS" 1978 14.359255
    "AUS" 1979 14.515729
    "AUS" 1980 14.695356
    "AUS" 1981  14.92326
    "AUS" 1982 15.184247
    "AUS" 1983 15.393472
    "AUS" 1984 15.579391
    "AUS" 1985 15.788312
    "AUS" 1986  16.01835
    "AUS" 1987 16.263874
    "AUS" 1988 16.532164
    "AUS" 1989 16.814416
    "AUS" 1990 17.065128
    "AUS" 1991 17.284036
    "AUS" 1992 17.478635
    "AUS" 1993 17.634808
    "AUS" 1994 17.805468
    "AUS" 1995 18.004882
    "AUS" 1996 18.224767
    "AUS" 1997 18.423037
    "AUS" 1998 18.607584
    "AUS" 1999 18.812264
    "AUS" 2000 19.028802
    "AUS" 2001 19.274701
    "AUS" 2002  19.49521
    "AUS" 2003 19.720737
    "AUS" 2004 19.932722
    "AUS" 2005 20.176844
    "AUS" 2006 20.450966
    "AUS" 2007 20.827622
    "AUS" 2008 21.249199
    "AUS" 2009 21.691653
    "AUS" 2010  22.03175
    "AUS" 2011 22.340024
    "AUS" 2012 22.733465
    "AUS" 2013 23.128129
    "AUS" 2014 23.475686
    "AUS" 2015 23.815995
    "AUS" 2016 24.190907
    "AUS" 2017  24.60186
    "AUS" 2018  24.99286
    "AUT" 1960  7.047539
    "AUT" 1961  7.086299
    "AUT" 1962  7.129864
    "AUT" 1963  7.175811
    "AUT" 1964  7.223801
    end
    I want to merge the second dataset which has annual data with the first dataset which has quarterly data, by using the annual number for each of the quarter. Can someone please help me with the code?

    Thank You.

  • #2
    Code:
    gen year= year(qdate)
    in the first dataset and

    Code:
    rename TIME year
    in the second, then merge one-to-many on year. To get a quarterly varying population variable, you can interpolate values in the second dataset.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 LOCATION int TIME double annual_pop
    "AUS" 1960    10.275
    "AUS" 1961   10.5082
    "AUS" 1962   10.7005
    "AUS" 1963   10.9069
    "AUS" 1964   11.1216
    "AUS" 1965   11.3409
    "AUS" 1966 11.599498
    "AUS" 1967 11.799078
    "AUS" 1968 12.008635
    "AUS" 1969 12.263014
    "AUS" 1970 12.507349
    "AUS" 1971 13.067265
    "AUS" 1972 13.303664
    "AUS" 1973 13.504538
    "AUS" 1974 13.722571
    "AUS" 1975 13.892995
    "AUS" 1976 14.033083
    "AUS" 1977 14.192234
    "AUS" 1978 14.359255
    "AUS" 1979 14.515729
    "AUS" 1980 14.695356
    "AUS" 1981  14.92326
    "AUS" 1982 15.184247
    "AUS" 1983 15.393472
    "AUS" 1984 15.579391
    "AUS" 1985 15.788312
    "AUS" 1986  16.01835
    "AUS" 1987 16.263874
    "AUS" 1988 16.532164
    "AUS" 1989 16.814416
    "AUS" 1990 17.065128
    "AUS" 1991 17.284036
    "AUS" 1992 17.478635
    "AUS" 1993 17.634808
    "AUS" 1994 17.805468
    "AUS" 1995 18.004882
    "AUS" 1996 18.224767
    "AUS" 1997 18.423037
    "AUS" 1998 18.607584
    "AUS" 1999 18.812264
    "AUS" 2000 19.028802
    "AUS" 2001 19.274701
    "AUS" 2002  19.49521
    "AUS" 2003 19.720737
    "AUS" 2004 19.932722
    "AUS" 2005 20.176844
    "AUS" 2006 20.450966
    "AUS" 2007 20.827622
    "AUS" 2008 21.249199
    "AUS" 2009 21.691653
    "AUS" 2010  22.03175
    "AUS" 2011 22.340024
    "AUS" 2012 22.733465
    "AUS" 2013 23.128129
    "AUS" 2014 23.475686
    "AUS" 2015 23.815995
    "AUS" 2016 24.190907
    "AUS" 2017  24.60186
    "AUS" 2018  24.99286
    "AUT" 1960  7.047539
    "AUT" 1961  7.086299
    "AUT" 1962  7.129864
    "AUT" 1963  7.175811
    "AUT" 1964  7.223801
    end
    
    expand 4
    bys LOCATION TIME: replace annual_pop=. if _n!=_N
    by LOCATION: gen quarter=_n
    by LOCATION: ipolate annual_pop quarter, gen(q_pop) epolate
    by LOCATION TIME: replace quarter=_n
    gen qdate= yq(TIME, quarter)
    format %tq qdate
    keep LOCATION qdate q_pop
    This now allows you to merge one-to-one on quarterly dates.
    Last edited by Andrew Musau; 09 May 2021, 08:16.

    Comment


    • #3
      Originally posted by Andrew Musau View Post
      Code:
      gen year= year(qdate)
      in the first dataset and

      Code:
      rename TIME year
      in the second, then merge one-to-many on year. To get a quarterly varying population variable, you can interpolate values in the second dataset.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str3 LOCATION int TIME double annual_pop
      "AUS" 1960 10.275
      "AUS" 1961 10.5082
      "AUS" 1962 10.7005
      "AUS" 1963 10.9069
      "AUS" 1964 11.1216
      "AUS" 1965 11.3409
      "AUS" 1966 11.599498
      "AUS" 1967 11.799078
      "AUS" 1968 12.008635
      "AUS" 1969 12.263014
      "AUS" 1970 12.507349
      "AUS" 1971 13.067265
      "AUS" 1972 13.303664
      "AUS" 1973 13.504538
      "AUS" 1974 13.722571
      "AUS" 1975 13.892995
      "AUS" 1976 14.033083
      "AUS" 1977 14.192234
      "AUS" 1978 14.359255
      "AUS" 1979 14.515729
      "AUS" 1980 14.695356
      "AUS" 1981 14.92326
      "AUS" 1982 15.184247
      "AUS" 1983 15.393472
      "AUS" 1984 15.579391
      "AUS" 1985 15.788312
      "AUS" 1986 16.01835
      "AUS" 1987 16.263874
      "AUS" 1988 16.532164
      "AUS" 1989 16.814416
      "AUS" 1990 17.065128
      "AUS" 1991 17.284036
      "AUS" 1992 17.478635
      "AUS" 1993 17.634808
      "AUS" 1994 17.805468
      "AUS" 1995 18.004882
      "AUS" 1996 18.224767
      "AUS" 1997 18.423037
      "AUS" 1998 18.607584
      "AUS" 1999 18.812264
      "AUS" 2000 19.028802
      "AUS" 2001 19.274701
      "AUS" 2002 19.49521
      "AUS" 2003 19.720737
      "AUS" 2004 19.932722
      "AUS" 2005 20.176844
      "AUS" 2006 20.450966
      "AUS" 2007 20.827622
      "AUS" 2008 21.249199
      "AUS" 2009 21.691653
      "AUS" 2010 22.03175
      "AUS" 2011 22.340024
      "AUS" 2012 22.733465
      "AUS" 2013 23.128129
      "AUS" 2014 23.475686
      "AUS" 2015 23.815995
      "AUS" 2016 24.190907
      "AUS" 2017 24.60186
      "AUS" 2018 24.99286
      "AUT" 1960 7.047539
      "AUT" 1961 7.086299
      "AUT" 1962 7.129864
      "AUT" 1963 7.175811
      "AUT" 1964 7.223801
      end
      
      expand 4
      bys LOCATION TIME: replace annual_pop=. if _n!=_N
      by LOCATION: gen quarter=_n
      by LOCATION: ipolate annual_pop quarter, gen(q_pop) epolate
      by LOCATION TIME: replace quarter=_n
      gen qdate= yq(TIME, quarter)
      format %tq qdate
      keep LOCATION qdate q_pop
      This now allows you to merge one-to-one on quarterly dates.
      Thank you for the reply. I tried the suggested code but once the two datasets are merged, it gives me the following error message.

      "variable year does not uniquely identify observations in the master data" (where master dataset is the first dataset).

      As you suggested, I first created the "year" variable in the first dataset. Sample of the results are shown below. However, for 1970, 1971... the year variable shows "1960".

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str62 LOCATION str7 TIME float(qdate year) double(CPI LT_interest)
      "Advanced economies" "1970-Q1" 40 1960 . .
      "Advanced economies" "1970-Q2" 41 1960 . .
      "Advanced economies" "1970-Q3" 42 1960 . .
      "Advanced economies" "1970-Q4" 43 1960 . .
      "Advanced economies" "1971-Q1" 44 1960 . .
      "Advanced economies" "1971-Q2" 45 1960 . .
      "Advanced economies" "1971-Q3" 46 1960 . .
      "Advanced economies" "1971-Q4" 47 1960 . .
      "Advanced economies" "1972-Q1" 48 1960 . .
      "Advanced economies" "1972-Q2" 49 1960 . .
      "Advanced economies" "1972-Q3" 50 1960 . .
      "Advanced economies" "1972-Q4" 51 1960 . .
      "Advanced economies" "1973-Q1" 52 1960 . .
      "Advanced economies" "1973-Q2" 53 1960 . .
      "Advanced economies" "1973-Q3" 54 1960 . .
      "Advanced economies" "1973-Q4" 55 1960 . .
      "Advanced economies" "1974-Q1" 56 1960 . .
      "Advanced economies" "1974-Q2" 57 1960 . .
      "Advanced economies" "1974-Q3" 58 1960 . .
      "Advanced economies" "1974-Q4" 59 1960 . .
      end
      format %tq qdate
      Can you please help with this?

      Thanks.


      Comment


      • #4
        Originally posted by Ama Perera View Post

        Thank you for the reply. I tried the suggested code but once the two datasets are merged, it gives me the following error message.

        "variable year does not uniquely identify observations in the master data" (where master dataset is the first dataset).

        As you suggested, I first created the "year" variable in the first dataset. Sample of the results are shown below. However, for 1970, 1971... the year variable shows "1960".

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str62 LOCATION str7 TIME float(qdate year) double(CPI LT_interest)
        "Advanced economies" "1970-Q1" 40 1960 . .
        "Advanced economies" "1970-Q2" 41 1960 . .
        "Advanced economies" "1970-Q3" 42 1960 . .
        "Advanced economies" "1970-Q4" 43 1960 . .
        "Advanced economies" "1971-Q1" 44 1960 . .
        "Advanced economies" "1971-Q2" 45 1960 . .
        "Advanced economies" "1971-Q3" 46 1960 . .
        "Advanced economies" "1971-Q4" 47 1960 . .
        "Advanced economies" "1972-Q1" 48 1960 . .
        "Advanced economies" "1972-Q2" 49 1960 . .
        "Advanced economies" "1972-Q3" 50 1960 . .
        "Advanced economies" "1972-Q4" 51 1960 . .
        "Advanced economies" "1973-Q1" 52 1960 . .
        "Advanced economies" "1973-Q2" 53 1960 . .
        "Advanced economies" "1973-Q3" 54 1960 . .
        "Advanced economies" "1973-Q4" 55 1960 . .
        "Advanced economies" "1974-Q1" 56 1960 . .
        "Advanced economies" "1974-Q2" 57 1960 . .
        "Advanced economies" "1974-Q3" 58 1960 . .
        "Advanced economies" "1974-Q4" 59 1960 . .
        end
        format %tq qdate
        Can you please help with this?

        Thanks.

        gen year1 = yofd(dofq(qdate)) generates correct yearly data corresponding to each quarter. But still when merging, it gives the same error message.

        Comment


        • #5
          gen year = yofd(dofq(qdate))


          rename TIME year

          merge m:1 LOCATION year using "Dataset2"

          worked for me. Thank you.

          Comment

          Working...
          X