Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • combining datasets with different formats of a variable and ts values

    I have a dataset that includes information on naic codes, years, the quarter of those years in the format 101 for 2001Q1 102 for 2001Q2 and so on for each naic code until 2012. I want to combine this with another dataset to extend it, that includes the same information for naic codes for years 1997-2000 but doesn't yet have a ts variable and yq is just in the form q= 1 for 1997Q1 q=2 for 1997Q2, for example:

    (filename dataset.dta )

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long naic float(year qtr Y M)
    311111 1997 1  8688.239  4402.139
    311111 1997 2  8688.239  4402.139
    311111 1997 3  8688.239  4402.139
    311111 1997 4  8688.239  4402.139
    311111 1998 1  8967.129  4548.587
    311111 1998 2  8967.129  4548.587
    311111 1998 3  8967.129  4548.587
    311111 1998 4  8967.129  4548.587
    311111 1999 1  8559.564  4324.871
    311111 1999 2  8559.564  4324.871
    311111 1999 3  8559.564  4324.871
    311111 1999 4  8559.564  4324.871
    311111 2000 1   8751.42  4471.446
    311111 2000 2   8751.42  4471.446
    311111 2000 3   8751.42  4471.446
    311111 2000 4   8751.42  4471.446
    311119 1997 1 19044.107 14589.377
    311119 1997 2 19044.107 14589.377
    311119 1997 3 19044.107 14589.377
    311119 1997 4 19044.107 14589.377
    311119 1998 1 17883.795 13387.547
    311119 1998 2 17883.795 13387.547
    311119 1998 3 17883.795 13387.547
    311119 1998 4 17883.795 13387.547
    311119 1999 1 15930.134  12021.11
    311119 1999 2 15930.134  12021.11
    311119 1999 3 15930.134  12021.11
    311119 1999 4 15930.134  12021.11
    311119 2000 1 15483.787 11937.382
    311119 2000 2 15483.787 11937.382
    311119 2000 3 15483.787 11937.382
    311119 2000 4 15483.787 11937.382
    end
    label values year year

    Whereas the set that I would like to put this into (filename nottb_synth.dta ) is the following:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double naic float(year yq) double(ts Y M)
    311111 2001  101 164             9734.9              4576
    311111 2001  102 165             9734.9              4576
    311111 2001  103 166             9734.9              4576
    311111 2001  104 167             9734.9              4576
    311111 2002  201 168            10662.2            4786.7
    311111 2002  202 169            10662.2            4786.7
    311111 2002  203 170            10662.2            4786.7
    311111 2002  204 171            10662.2            4786.7
    311111 2003  301 172 11006.900000000001            4713.7
    311111 2003  302 173 11006.900000000001            4713.7
    311111 2003  303 174 11006.900000000001            4713.7
    311111 2003  304 175 11006.900000000001            4713.7
    311111 2004  401 176 12127.900000000001 5405.700000000001
    311111 2004  402 177 12127.900000000001 5405.700000000001
    311111 2004  403 178 12127.900000000001 5405.700000000001
    311111 2004  404 179 12127.900000000001 5405.700000000001
    311111 2005  501 180          13141.967           5810.56
    311111 2005  502 181          13141.967           5810.56
    311111 2005  503 182          13141.967           5810.56
    311111 2005  504 183          13141.967           5810.56
    311111 2006  601 184          13596.647          5805.259
    311111 2006  602 185          13596.647          5805.259
    311111 2006  603 186          13596.647          5805.259
    311111 2006  604 187          13596.647          5805.259
    311111 2007  701 188          14393.898          6876.225
    311111 2007  702 189          14393.898          6876.225
    311111 2007  703 190          14393.898          6876.225
    311111 2007  704 191          14393.898          6876.225
    311111 2008  801 192           18582.01           8708.27
    311111 2008  802 193           18582.01           8708.27
    311111 2008  803 194           18582.01           8708.27
    311111 2008  804 195           18582.01           8708.27
    311111 2009  901 196          19690.966            9776.6
    311111 2009  902 197          19690.966            9776.6
    311111 2009  903 198          19690.966            9776.6
    311111 2009  904 199          19690.966            9776.6
    311111 2010 1001 200          20386.975         10780.551
    311111 2010 1002 201          20386.975         10780.551
    311111 2010 1003 202          20386.975         10780.551
    311111 2010 1004 203          20386.975         10780.551
    311111 2011 1101 204          20585.563         11439.632
    311111 2011 1102 205          20585.563         11439.632
    311111 2011 1103 206          20585.563         11439.632
    311111 2011 1104 207          20585.563         11439.632
    311111 2012 1201 208     21993.56640625   12634.986328125
    311111 2012 1202 209     21993.56640625   12634.986328125
    311111 2012 1203 210     21993.56640625   12634.986328125
    311119 2001  101 164              16990           12388.1
    311119 2001  102 165              16990           12388.1
    311119 2001  103 166              16990           12388.1
    311119 2001  104 167              16990           12388.1
    311119 2002  201 168            17318.2           12735.1
    311119 2002  202 169            17318.2           12735.1
    311119 2002  203 170            17318.2           12735.1
    311119 2002  204 171            17318.2           12735.1
    311119 2003  301 172 22790.600000000002           17382.9
    311119 2003  302 173 22790.600000000002           17382.9
    311119 2003  303 174 22790.600000000002           17382.9
    311119 2003  304 175 22790.600000000002           17382.9
    311119 2004  401 176            19793.7           13965.5
    311119 2004  402 177            19793.7           13965.5
    311119 2004  403 178            19793.7           13965.5
    311119 2004  404 179            19793.7           13965.5
    311119 2005  501 180          20091.555         13488.947
    311119 2005  502 181          20091.555         13488.947
    311119 2005  503 182          20091.555         13488.947
    311119 2005  504 183          20091.555         13488.947
    311119 2006  601 184          20391.109         13711.802
    311119 2006  602 185          20391.109         13711.802
    311119 2006  603 186          20391.109         13711.802
    311119 2006  604 187          20391.109         13711.802
    311119 2007  701 188          24615.987         18424.872
    311119 2007  702 189          24615.987         18424.872
    311119 2007  703 190          24615.987         18424.872
    311119 2007  704 191          24615.987         18424.872
    311119 2008  801 192            31008.4         23673.331
    311119 2008  802 193            31008.4         23673.331
    311119 2008  803 194            31008.4         23673.331
    311119 2008  804 195            31008.4         23673.331
    311119 2009  901 196          30063.958          21910.43
    311119 2009  902 197          30063.958          21910.43
    311119 2009  903 198          30063.958          21910.43
    311119 2009  904 199          30063.958          21910.43
    311119 2010 1001 200          27759.287         21079.425
    311119 2010 1002 201          27759.287         21079.425
    311119 2010 1003 202          27759.287         21079.425
    311119 2010 1004 203          27759.287         21079.425
    311119 2011 1101 204          33496.767         26584.376
    311119 2011 1102 205          33496.767         26584.376
    311119 2011 1103 206          33496.767         26584.376
    311119 2011 1104 207          33496.767         26584.376
    311119 2012 1201 208     36391.08984375   29941.099609375
    311119 2012 1202 209     36391.08984375   29941.099609375
    311119 2012 1203 210     36391.08984375   29941.099609375
    end

  • #2
    In your first dataset, do:
    Code:
    gen ts = yq(year, qtr)
    In your second dataset, the variable ts is already your year/quarter variable.

    Optionally, in both datasets do:
    Code:
    format ts %tq
    Then you can append both datasets if they are for different periods but same variables, or merge if they are for same periods but different variables.
    Appending them will mean you need to keep the same variable names in both sets for variables that hold the same information

    Comment


    • #3
      Thanks, is there a way to alter the first dataset so that qtr is in the right form that when I join the datasets it makes sense to precede the values of eg 2001Q1 - 101, 2001Q2- 1-02 etc
      as opposed to just being in 1 2 3 4 for the quarters of each year?

      Hope this makes sense

      Comment


      • #4
        Depends on what you'd make of it. If 2001q1 is 101, then 2000q1 could be 001, but what would 1999q1 be? 901 is taken by 2009q1 already.
        If ti were me i'd just forget about that variable. You have your years, quarters, and a stata quarter-variable. The variable doesn't add any info and isnt particularly intuitive.
        Last edited by Jorrit Gosens; 01 Mar 2019, 05:26.

        Comment

        Working...
        X