Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merge panels with different dates

    Hi there,

    I have two-panel datasets as below and I want to merge the data together. However, since these two have different dates, I want to use the dates in dataset 2 as the panel structure. I want to calculate until time t in the second dataset, what's the total sales.

    For instance, until 12-Aug-10, the sales is 2000+1000=3000 etc.

    I wonder if anyone knows the command to realize this?

    Thanks!



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte firm str9 date int sales
    1 "9-Aug-10"   2000
    1 "10-Aug-10"  1000
    1 "13-Sep-10" 12000
    1 "12-Nov-10"   300
    1 "13-Dec-10"  1000
    2 "9-Oct-10"   1000
    2 "20-Oct-10"  2000
    2 "11-Nov-10"  2000
    2 "12-Nov-10"   300
    2 "13-Dec-10"  4000
    end

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte firm str9 date int v1
    1 "12-Aug-10" 300
    1 "15-Aug-10" 100
    1 "15-Nov-10" 200
    1 "20-Dec-10" 100
    2 "25-Oct-10" 200
    2 "10-Nov-10" 100
    2 "15-Dec-10" 100
    end

  • #2
    The first obstacle to overcome is that your dates are string variables, which cannot be used for the purposes as hand. So the first thing that must happen in each data set is converting those to Stata internal format numeric daily date variables.

    The next step is to combine the two data sets, pairing up each observation in the second data set with all observations of the same firm in the first data set that precede it chronologically. Then we aggregate the data to the level of the firm and date from the second data set.

    You do not say if "until time t" includes time t itself or ends the day before. In this code I assume that time t itself is included. If that is not what you want, change 0 to -1 in the -rangejoin- command below.

    Code:
    use `dataset1', clear
    gen _date = daily(date, "DM20Y"), after(date)
    assert missing(_date) == missing(date)
    format _date %td
    drop date
    rename _date date
    isid firm date, sort
    tempfile holding
    save `holding'
    
    use `dataset2', clear
    gen _date = daily(date, "DM20Y")
    assert missing(_date) == missing(date)
    format _date %td
    drop date
    rename _date date
    isid firm date, sort
    
    rangejoin date . 0 using `holding', by(firm)
    collapse (first) v1 (sum) sales, by(firm date)
    -rangejoin- is written by Robert Picard and is available from SSC. To use -rangejoin-, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      The first obstacle to overcome is that your dates are string variables, which cannot be used for the purposes as hand. So the first thing that must happen in each data set is converting those to Stata internal format numeric daily date variables.

      The next step is to combine the two data sets, pairing up each observation in the second data set with all observations of the same firm in the first data set that precede it chronologically. Then we aggregate the data to the level of the firm and date from the second data set.

      You do not say if "until time t" includes time t itself or ends the day before. In this code I assume that time t itself is included. If that is not what you want, change 0 to -1 in the -rangejoin- command below.

      Code:
      use `dataset1', clear
      gen _date = daily(date, "DM20Y"), after(date)
      assert missing(_date) == missing(date)
      format _date %td
      drop date
      rename _date date
      isid firm date, sort
      tempfile holding
      save `holding'
      
      use `dataset2', clear
      gen _date = daily(date, "DM20Y")
      assert missing(_date) == missing(date)
      format _date %td
      drop date
      rename _date date
      isid firm date, sort
      
      rangejoin date . 0 using `holding', by(firm)
      collapse (first) v1 (sum) sales, by(firm date)
      -rangejoin- is written by Robert Picard and is available from SSC. To use -rangejoin-, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
      Hi Clyde,

      Thanks a lot for your help. The code works perfectly for me. I really appreciate it. However, I don't know how can apply this to another case I have and I wonder if you could take a look:

      I have another dataset that looks as below. In this case, I want to calculate the variance and the moving average of the ratings until t (where t is the time in dataset2 above). I can't do the daily calculation first and then merge with dataset2, but to merge to dataset2 first and calculate the average and variance directly. And I'm not sure if I can still use the previous code to realize this and I wonder if you know how to revise the previous code. Thanks a lot!

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte firm str9 date byte ratings
      1 "9-Aug-10"  3
      1 "9-Aug-10"  4
      1 "9-Aug-10"  5
      1 "13-Aug-10" 3
      1 "13-Aug-10" 4
      1 "12-Nov-10" 4
      1 "12-Nov-10" 5
      1 "13-Dec-10" 5
      2 "9-Oct-10"  3
      2 "9-Oct-10"  4
      2 "11-Nov-10" 3
      2 "11-Nov-10" 4
      2 "13-Dec-10" 4
      end

      Comment


      • #4
        I can get you most of the way there.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte firm float date int v1 double sales
        1 18486 300  3000
        1 18489 100  3000
        1 18581 200 15300
        1 18616 100 16300
        2 18560 200  3000
        2 18576 100  3000
        2 18611 100  9300
        end
        format %td date
        tempfile from_2
        save `from_2'
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte firm str9 date byte ratings
        1 "9-Aug-10"  3
        1 "9-Aug-10"  4
        1 "9-Aug-10"  5
        1 "13-Aug-10" 3
        1 "13-Aug-10" 4
        1 "12-Nov-10" 4
        1 "12-Nov-10" 5
        1 "13-Dec-10" 5
        2 "9-Oct-10"  3
        2 "9-Oct-10"  4
        2 "11-Nov-10" 3
        2 "11-Nov-10" 4
        2 "13-Dec-10" 4
        end
        tempfile dataset3
        save `dataset3'
        
        use `dataset3', clear
        gen _date = daily(date, "DM20Y")
        assert missing(_date) == missing(date)
        format _date %td
        drop date
        rename _date date
        tempfile holding
        save `holding'
        
        use `from_2', clear
        rangejoin date . 0 using `holding', by(firm)
        collapse (mean) mean_ratings = ratings (sd) sd_ratings = ratings ///
            (first) v1 sales, by(firm date)
        The data set that I have called `from_2' is the results of the code shown in #2. The code joins that with the new data set, which I'm calling dataset3, to bring in the mean and standard deviation of all ratings up to the date in `from_2' (which is the same date as in dataset2 in #1.

        But I don't know what you mean by a moving average in this context. The data in dataset3 does not lend itself to moving averages: in fact there seem to be multiple observations on each date rather than a time series, so I wouldn't even know how to make sense of the term "moving average" in this context. If you mean after combining with `from_2' you now want to do a moving average of the mean ratings within a firm, that could make sense, but then you have to specify what window you want the moving average to be taken over. And you will need to specify the window with exceptional clarity here because the dates in `from_2' are erratically spaced.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          I can get you most of the way there.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte firm float date int v1 double sales
          1 18486 300 3000
          1 18489 100 3000
          1 18581 200 15300
          1 18616 100 16300
          2 18560 200 3000
          2 18576 100 3000
          2 18611 100 9300
          end
          format %td date
          tempfile from_2
          save `from_2'
          
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte firm str9 date byte ratings
          1 "9-Aug-10" 3
          1 "9-Aug-10" 4
          1 "9-Aug-10" 5
          1 "13-Aug-10" 3
          1 "13-Aug-10" 4
          1 "12-Nov-10" 4
          1 "12-Nov-10" 5
          1 "13-Dec-10" 5
          2 "9-Oct-10" 3
          2 "9-Oct-10" 4
          2 "11-Nov-10" 3
          2 "11-Nov-10" 4
          2 "13-Dec-10" 4
          end
          tempfile dataset3
          save `dataset3'
          
          use `dataset3', clear
          gen _date = daily(date, "DM20Y")
          assert missing(_date) == missing(date)
          format _date %td
          drop date
          rename _date date
          tempfile holding
          save `holding'
          
          use `from_2', clear
          rangejoin date . 0 using `holding', by(firm)
          collapse (mean) mean_ratings = ratings (sd) sd_ratings = ratings ///
          (first) v1 sales, by(firm date)
          The data set that I have called `from_2' is the results of the code shown in #2. The code joins that with the new data set, which I'm calling dataset3, to bring in the mean and standard deviation of all ratings up to the date in `from_2' (which is the same date as in dataset2 in #1.

          But I don't know what you mean by a moving average in this context. The data in dataset3 does not lend itself to moving averages: in fact there seem to be multiple observations on each date rather than a time series, so I wouldn't even know how to make sense of the term "moving average" in this context. If you mean after combining with `from_2' you now want to do a moving average of the mean ratings within a firm, that could make sense, but then you have to specify what window you want the moving average to be taken over. And you will need to specify the window with exceptional clarity here because the dates in `from_2' are erratically spaced.
          Hi Clyde,

          Thank you so much for your help. I really appreciate it.

          This is exactly what I need and I've successfully solved the issue.

          Comment

          Working...
          X