Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching and generating hourly data

    Dear Stata Users,

    I am working on hourly data sets. The datasets have the "zipcodes" variable as the common var to merge/joinby. In data-A I have the hourly data, which should be merged with data-B.

    Now, in both datasets, I have each zipcodes repeated more than one time, therefore, it is impossible to use merge (m:1, or 1:m). I would use "joinby" to merge the datasets.

    I would like to get some help with generating a variable (such as Uhrzeit) to get in data-B an hourly-time (Uhrzeit) for each zipcode for each year.

    Finally, do I have to generate the hourly-time before or after joining (joinby).

    data-A:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long zipcode str11 stationcode double no2 str7 Uhrzeit float(Datum datum_tag datum_monat datum_jahr)
    63741 "DEBY005" 30.49 "01:00" 20089 1 1 2015
    91522 "DEBY001" 33.75 "01:00" 20089 1 1 2015
    63741 "DEBY005" 30.25 "02:00" 20089 1 1 2015
    91522 "DEBY001" 32.98 "02:00" 20089 1 1 2015
    63741 "DEBY005" 32.94 "03:00" 20089 1 1 2015
    91522 "DEBY001" 30.58 "03:00" 20089 1 1 2015
    63741 "DEBY005" 30.54 "04:00" 20089 1 1 2015
    91522 "DEBY001" 27.21 "04:00" 20089 1 1 2015
    63741 "DEBY005" 29.84 "05:00" 20089 1 1 2015
    91522 "DEBY001" 28.95 "05:00" 20089 1 1 2015
    63741 "DEBY005" 28.39 "06:00" 20089 1 1 2015
    91522 "DEBY001"  27.5 "06:00" 20089 1 1 2015
    63741 "DEBY005" 28.89 "07:00" 20089 1 1 2015
    91522 "DEBY001" 26.32 "07:00" 20089 1 1 2015
    63741 "DEBY005" 31.39 "08:00" 20089 1 1 2015
    91522 "DEBY001" 27.57 "08:00" 20089 1 1 2015
    63741 "DEBY005" 33.36 "09:00" 20089 1 1 2015
    91522 "DEBY001" 25.35 "09:00" 20089 1 1 2015
    63741 "DEBY005" 28.75 "10:00" 20089 1 1 2015
    91522 "DEBY001" 21.24 "10:00" 20089 1 1 2015
    63741 "DEBY005" 27.13 "11:00" 20089 1 1 2015
    91522 "DEBY001" 20.72 "11:00" 20089 1 1 2015
    63741 "DEBY005" 23.34 "12:00" 20089 1 1 2015
    91522 "DEBY001" 25.09 "12:00" 20089 1 1 2015
    63741 "DEBY005" 30.87 "13:00" 20089 1 1 2015
    91522 "DEBY001" 21.51 "13:00" 20089 1 1 2015
    63741 "DEBY005" 35.34 "14:00" 20089 1 1 2015
    91522 "DEBY001" 23.25 "14:00" 20089 1 1 2015
    63741 "DEBY005" 38.24 "15:00" 20089 1 1 2015
    91522 "DEBY001"    28 "15:00" 20089 1 1 2015
    63741 "DEBY005" 34.68 "16:00" 20089 1 1 2015
    91522 "DEBY001" 21.67 "16:00" 20089 1 1 2015
    63741 "DEBY005" 42.17 "17:00" 20089 1 1 2015
    91522 "DEBY001" 24.92 "17:00" 20089 1 1 2015
    63741 "DEBY005" 43.97 "18:00" 20089 1 1 2015
    91522 "DEBY001" 29.73 "18:00" 20089 1 1 2015
    63741 "DEBY005" 33.09 "19:00" 20089 1 1 2015
    91522 "DEBY001"  27.7 "19:00" 20089 1 1 2015
    63741 "DEBY005" 20.34 "20:00" 20089 1 1 2015
    91522 "DEBY001" 23.79 "20:00" 20089 1 1 2015
    63741 "DEBY005" 17.08 "21:00" 20089 1 1 2015
    91522 "DEBY001" 18.47 "21:00" 20089 1 1 2015
    63741 "DEBY005" 18.25 "22:00" 20089 1 1 2015
    91522 "DEBY001" 18.46 "22:00" 20089 1 1 2015
    63741 "DEBY005" 18.69 "23:00" 20089 1 1 2015
    91522 "DEBY001" 17.73 "23:00" 20089 1 1 2015
    63741 "DEBY005" 21.54 "24:00" 20089 1 1 2015
    91522 "DEBY001" 13.44 "24:00" 20089 1 1 2015
    63741 "DEBY005" 15.73 "01:00" 20090 2 1 2015
    91522 "DEBY001" 14.07 "01:00" 20090 2 1 2015
    63741 "DEBY005" 13.33 "02:00" 20090 2 1 2015
    91522 "DEBY001" 15.11 "02:00" 20090 2 1 2015
    63741 "DEBY005" 12.15 "03:00" 20090 2 1 2015
    91522 "DEBY001" 18.38 "03:00" 20090 2 1 2015
    63741 "DEBY005" 15.88 "04:00" 20090 2 1 2015
    91522 "DEBY001" 21.81 "04:00" 20090 2 1 2015
    63741 "DEBY005" 39.34 "05:00" 20090 2 1 2015
    91522 "DEBY001" 23.83 "05:00" 20090 2 1 2015
    63741 "DEBY005" 36.97 "06:00" 20090 2 1 2015
    91522 "DEBY001" 21.01 "06:00" 20090 2 1 2015
    63741 "DEBY005" 29.83 "07:00" 20090 2 1 2015
    91522 "DEBY001" 40.23 "07:00" 20090 2 1 2015
    63741 "DEBY005" 30.41 "08:00" 20090 2 1 2015
    91522 "DEBY001" 45.88 "08:00" 20090 2 1 2015
    63741 "DEBY005" 23.77 "09:00" 20090 2 1 2015
    91522 "DEBY001" 53.98 "09:00" 20090 2 1 2015
    63741 "DEBY005" 35.29 "10:00" 20090 2 1 2015
    91522 "DEBY001" 39.31 "10:00" 20090 2 1 2015
    63741 "DEBY005" 19.62 "11:00" 20090 2 1 2015
    91522 "DEBY001" 44.46 "11:00" 20090 2 1 2015
    63741 "DEBY005" 15.99 "12:00" 20090 2 1 2015
    91522 "DEBY001" 44.58 "12:00" 20090 2 1 2015
    63741 "DEBY005" 20.42 "13:00" 20090 2 1 2015
    91522 "DEBY001"  33.5 "13:00" 20090 2 1 2015
    63741 "DEBY005" 19.29 "14:00" 20090 2 1 2015
    91522 "DEBY001" 35.01 "14:00" 20090 2 1 2015
    63741 "DEBY005" 22.52 "15:00" 20090 2 1 2015
    91522 "DEBY001" 34.49 "15:00" 20090 2 1 2015
    63741 "DEBY005" 25.37 "16:00" 20090 2 1 2015
    91522 "DEBY001" 33.18 "16:00" 20090 2 1 2015
    63741 "DEBY005" 25.44 "17:00" 20090 2 1 2015
    91522 "DEBY001" 32.39 "17:00" 20090 2 1 2015
    63741 "DEBY005" 32.56 "18:00" 20090 2 1 2015
    91522 "DEBY001" 35.52 "18:00" 20090 2 1 2015
    63741 "DEBY005"  35.9 "19:00" 20090 2 1 2015
    91522 "DEBY001" 29.65 "19:00" 20090 2 1 2015
    63741 "DEBY005" 23.61 "20:00" 20090 2 1 2015
    91522 "DEBY001" 23.87 "20:00" 20090 2 1 2015
    63741 "DEBY005" 33.62 "21:00" 20090 2 1 2015
    91522 "DEBY001" 22.03 "21:00" 20090 2 1 2015
    63741 "DEBY005" 38.14 "22:00" 20090 2 1 2015
    91522 "DEBY001" 17.59 "22:00" 20090 2 1 2015
    63741 "DEBY005"    32 "23:00" 20090 2 1 2015
    91522 "DEBY001" 14.65 "23:00" 20090 2 1 2015
    63741 "DEBY005" 30.47 "24:00" 20090 2 1 2015
    91522 "DEBY001" 11.15 "24:00" 20090 2 1 2015
    63741 "DEBY005" 45.94 "01:00" 20091 3 1 2015
    91522 "DEBY001" 10.94 "01:00" 20091 3 1 2015
    63741 "DEBY005" 53.28 "02:00" 20091 3 1 2015
    91522 "DEBY001" 10.15 "02:00" 20091 3 1 2015
    end
    format %td Datum


    data-B:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long zipcode str13 city float population str5 areakm2 str10 county str14 street_type str11 location str7 speedlimit
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Ortsstraße"    "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Kreisstraßen"  "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Ortsstraße"    "innerorts"   "Vz325"  
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Ortsstraße"    "innerorts"   "Vz325"  
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "60"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "60"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "70"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Ortsstraße"    "innerorts"   "Vz325"  
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Ortsstraße"    "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Ortsstraße"    "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "60"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       " innerorts" "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "Schritt"
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Ortsstraße"    "innerorts"   "30"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "100"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "außerorts"  "70"    
    91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Sonstige"       "innerorts"   "30"    
    end
    If you need further clarification, I would be happy to do it.

    Best,
    Ami

  • #2
    You perhaps want to use the joinby command to create every combination of observations, 1 from data-A and one from data-B, with the same zipcode.

    But before that, you may want to consider why data-B has a very large number of observations that are identical across all the variables.
    Code:
    . duplicates report
    
    Duplicates in terms of all variables
    
    --------------------------------------
       copies | observations       surplus
    ----------+---------------------------
            1 |            3             0
            2 |           12             6
            3 |            3             2
            4 |            4             3
            5 |           10             8
            6 |            6             5
            8 |            8             7
           10 |           20            18
    --------------------------------------

    Comment


    • #3
      Hi William,

      Thank you for recommending the joinby command which I have already tried and it worked well.

      There are no duplicates. Actually, when I created the example, using -dataex-, Stata gave me an error and I have had to withdraw some variables to create the example.

      Code:
      . dataex
      input statement exceeds linesize limit. Try specifying fewer variables
      r(1000);
      But how would I generate the hourly-time variable for each zipcode in data-B before matching it with data-A?

      Best,

      Comment


      • #4
        I do not understand your question in post #3.

        Let's take a concrete example invented from a subset of your data.
        Code:
        // set up our example data
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long zipcode str11 stationcode double no2 str7 Uhrzeit float(Datum datum_tag datum_monat datum_jahr)
        63741 "DEBY005" 30.49 "01:00" 20089 1 1 2015
        63741 "DEBY005" 30.25 "02:00" 20089 1 1 2015
        63741 "DEBY005" 32.94 "03:00" 20089 1 1 2015
        91522 "DEBY001" 33.75 "01:00" 20089 1 1 2015
        91522 "DEBY001" 32.98 "02:00" 20089 1 1 2015
        91522 "DEBY001" 30.58 "03:00" 20089 1 1 2015
        end
        format %td Datum
        tempfile dataA
        save `"`dataA'"'
        cls
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long zipcode str13 city float population str5 areakm2 str10 county str14 street_type str11 location str7 speedlimit
        63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Bundesstraßen" "innerorts"   "50"    
        63741 "Aschaffenburg" 70.527 "62,45" "kreisfreie" "Staatsstraßen" "innerorts"   "50"    
        91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Bundesstraßen" "außerorts"  "100"    
        91522 "Ansbach"       41.847 "99,91" "kreisfreie" "Staatsstraßen" "innerorts"   "30"    
        end
        tempfile dataB
        save `"`dataB'"'
        
        // join the datasets
        clear 
        use `"`dataA'"'
        joinby zipcode using `"`dataB'"'
        sort zipcode Uhrzeit
        list zipcode stationcode Uhrzeit street_type speedlimit, sepby(zipcode Uhrzeit) abbreviate(16) noobs
        Code:
        . list zipcode stationcode Uhrzeit street_type speedlimit, sepby(zipcode Uhrzeit) abbreviate(16)
        >  noobs
        
          +--------------------------------------------------------------+
          | zipcode   stationcode   Uhrzeit     street_type   speedlimit |
          |--------------------------------------------------------------|
          |   63741       DEBY005     01:00   Staatsstraßen           50 |
          |   63741       DEBY005     01:00   Bundesstraßen           50 |
          |--------------------------------------------------------------|
          |   63741       DEBY005     02:00   Staatsstraßen           50 |
          |   63741       DEBY005     02:00   Bundesstraßen           50 |
          |--------------------------------------------------------------|
          |   63741       DEBY005     03:00   Staatsstraßen           50 |
          |   63741       DEBY005     03:00   Bundesstraßen           50 |
          |--------------------------------------------------------------|
          |   91522       DEBY001     01:00   Bundesstraßen          100 |
          |   91522       DEBY001     01:00   Staatsstraßen           30 |
          |--------------------------------------------------------------|
          |   91522       DEBY001     02:00   Bundesstraßen          100 |
          |   91522       DEBY001     02:00   Staatsstraßen           30 |
          |--------------------------------------------------------------|
          |   91522       DEBY001     03:00   Bundesstraßen          100 |
          |   91522       DEBY001     03:00   Staatsstraßen           30 |
          +--------------------------------------------------------------+
        What is it that you want that is different than this?

        Comment


        • #5
          That's it! I don't know why it did not work before. I was thinking about generating the Uhrzeit var.

          Thank you, William.

          Comment


          • #6
            William, it's my bad! I dropped the var and that's why it did not appear after I "joinby" the two datasets.

            Code:
            use info_data, clear
            merge m:1 stationcode using main_outcome
            drop if _merge==2
            drop Uhrzeit Datum station_newcode
            cd "/Users/ami/Desktop/proj3/Reg_data"
            save  data-A, replace
            use data_A, clear
            joinby zipcode using data_B

            Comment

            Working...
            X