Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging monthly and daily dataset

    I want to merge a UK cci dataset to my master dataset, but the UK cci is monthly whereas the master is daily. How can I merge them so the UK CCI values coincide with the first day of each month to my master dataset?

  • #2
    The details depend on the specific ways in which the dates are represented in your data sets. You do not show example data, so the most crucial information for solving this problem is unrevealed.

    Assuming that in your UK cci data set, the date variable is a Stata internal format monthly date, create a new date variable from that one by applying the -dofm()- function. This new variable will then contain the first day of the month as a Stata internal format daily date. Use that variable to link to the Stata internal format daily date variable in your master data set. (Also assuming that in the master data set the date variable is a Stata internal format daily date.)

    Comment


    • #3
      Thanks, Clyde, that seems to have worked!

      I am just encountering an issue now where it won't merge datasets as the date variables in the master are of 'int' type (still American date format 1/01/2004) and the other is of string type (also still the same- 1/01/2004). How do I convert the date variable in the master to of string type?

      Comment


      • #4
        -help daily()-

        Comment


        • #5
          It has come to my attention that I misread your question in #2. -daily- is used to convert strings to Stata internal format date variables.

          To go the other way, you have to use the -strofreal()- function and specify a %td type format.

          That said, dates as string variables are pretty close to useless in Stata. I would urge you to work only with the Stata internal format date variables (int types), and not with strings. So instead of converting both to strings for the purposes of merging, convert both to int type for merging.

          It is also difficult to understand how you have arrived at this impasse. According to #1, the UK CCI data set is your -using- data set for the -merge-, and if you used the dofm() function as suggested in #2, one of two things must have happened:

          1. If the CCI date variable is a string, then the use of -dofm()- would have led to a "type mismatch" error message, because its argument must be a Stata internal format date variable (type int).
          2. If the CCI date variable is, as #2 explicitly assumed, a Stata internal format monthly date variable, then the result of applying -dofm()- to it would be a Stata internal format daily date variable, and so would match with the int type in the master data.

          Bottom line: you have not shown example data for these data sets. So I am wasting my time and yours speculating about what might be in them and how they would be brought into alignment with each other. If you cannot figure it out on your own from here, you cannot expect further help to be forthcoming unless you fire up the -dataex- command and post example data from both data sets.
          Last edited by Clyde Schechter; 03 Jan 2023, 11:38.

          Comment


          • #6
            Sorry Clyde I wasn't aware you could show data using dataex

            Code:
            input float CCI str10 date
             101.244 "01/01/2004"
            101.3045 "02/01/2004"
            101.3478 "03/01/2004"
            101.2077 "04/01/2004"
            100.9792 "05/01/2004"
            100.8247 "06/01/2004"
            100.7911 "07/01/2004"
            100.7837 "08/01/2004"
            This is monthly CCI data

            Code:
            date    cost    cheap    donation
            1/1/2004    10    70    0
            1/2/2004    15    75    0
            1/3/2004    10    65    0
            1/4/2004    11    76    0
            1/5/2004    0    44    0
            1/6/2004    15    74    0
            1/7/2004    10    67    5
            1/8/2004    16    64    0
            1/9/2004    15    58    3
            1/10/2004    14    79    0
            1/11/2004    14    81    0
            1/12/2004    19    74    0
            1/13/2004    19    70    0
            1/14/2004    17    65    0
            1/15/2004    20    61    0
            1/16/2004    14    63    0
            1/17/2004    33    66    15
            1/18/2004    14    80    0
            This is my master file data, where to data is an 'int' type- I can't seem to show this using dataex as I get a file disk full error (I think the dataset is too large)

            Comment


            • #7
              What you have shown for the master file is not -dataex- output. It looks like it comes from -list-. That's not the same thing and it doesn't have all the information needed. Use -dataex-.

              Comment


              • #8
                This is from the master file

                Code:
                input int date byte(cost cheap donation)
                16071 10 70  0
                16072 15 75  0
                16073 10 65  0
                16074 11 76  0
                16075  0 44  0
                16076 15 74  0
                16077 10 67  5
                16078 16 64  0
                16079 15 58  3
                16080 14 79  0

                Comment


                • #9
                  Better. Now at least we know what we are working with. The key here is to convert the string variable named date in the CCI data to a daily Stata internal format date. Then it is a simple merge to the master data.
                  Code:
                  clear
                  input float CCI str10 date
                  101.244 "01/01/2004"
                  101.3045 "02/01/2004"
                  101.3478 "03/01/2004"
                  101.2077 "04/01/2004"
                  100.9792 "05/01/2004"
                  100.8247 "06/01/2004"
                  100.7911 "07/01/2004"
                  100.7837 "08/01/2004"
                  end
                  tempfile cci
                  save `cci'
                  
                  clear
                  input int date byte(cost cheap donation)
                  16071 10 70  0
                  16072 15 75  0
                  16073 10 65  0
                  16074 11 76  0
                  16075  0 44  0
                  16076 15 74  0
                  16077 10 67  5
                  16078 16 64  0
                  16079 15 58  3
                  16080 14 79  0
                  end
                  format date %tdnn/dd/CCYY
                  tempfile master
                  save `master'
                  
                  use `cci'
                  gen sif_date = daily(date, "MDY")
                  assert missing(sif_date) == missing(date)
                  format sif_date %tdnn/dd/CCYY
                  drop date
                  rename sif_date date
                  
                  merge 1:1 date using `master'
                  In the example data, there is only one date, 1/1/2004, that appears in both data sets, so the rest of the observations in the data go unmatched. Now, from your description, I assume that the other dates that show up in the CCI data set will appear later in the master data set. But the unmatched dates in the master data set, not being first of the month, will not encounter a match in the CCI data.

                  Comment


                  • #10
                    Thanks Clyde, that has worked for my UK cci dataset. I had another dataset 'UK EPU' which was also a string and applied the same code to convert date into 'int' type, but the merge doesn't seem to work.

                    Code:
                    input float daily_policy_index int date
                      50.7 14976
                    110.77 14977
                    179.41 14978
                    183.22 14979
                     89.31 14980
                    112.22 14981
                    114.25 14982
                    365.29 14983
                    260.13 14984
                    268.23 14985
                    140.42 14986
                    182.32 14987
                    179.62 14988
                    132.53 14989
                    249.26 14990
                    208.63 14991
                    262.46 14992
                    183.97 14993
                    190.07 14994
                    This is a subset of the entire dataset with dates from 2004-2022 that do coincide with my master. I have tried the m:1 command and I get this error-
                    Code:
                     merge m:1 date using "/Users/raulathwall/OneDrive/Dis STATA/UK EPU.dta"
                    variable date does not uniquely identify observations in the using data
                    r(459);
                    And also this with 1:1-
                    Code:
                     merge 1:1 date using "/Users/raulathwall/OneDrive/Dis STATA/UK EPU.dta"
                    variable date does not uniquely identify observations in the master data
                    r(459);
                    Not sure what is going wrong here, but would greatly appreciate any help!

                    Comment


                    • #11
                      Not sure what is going wrong here...
                      Neither am I. The situation you describe is impossible given what you say. The code in #9 relies on a -merge 1:1- command, the result of which necessarily results in a data set having date as a unique identifier. So if that -merge- actually worked, and you then tried to -merge 1:1- that result with the EPU data you could not possibly have gotten the result "variable date does not uniquely identify observations in the master data." You might well have gotten that message about the usingdata. So I am quite sure you have not actually done what you say you have done. Or you have done other things along with that which have undone the unique identification of observations by date and not disclosed them.

                      Now as for the "variable date does not uniquely identify observations in the using data" message that you got when you tried -merge m:1-, that is possible. But if true it means that you have some date for which there is more than one observation in the UK EPU data set. I don't know what an EPU is, but since the example data you show has a variable called daily_policy_index, that would suggest that there should never be more than one observation per date. If it is a daily anything, it should be uniquely defined for any given date. So something is wrong with that data set. You need to find the duplicate date observations and fix them. To do that:
                      Code:
                      duplicates tag date, gen(flag)
                      browse if flag
                      will show them to you. Then you have to figure out how those got there, and repair whatever mistakes were made in generating that data set so that they are no longer there.

                      If you are unable to work this out from here, when posting back, please use -dataex- to show new example data from all three data sets. And please choose the example data so that there will be some observations with the same date in each of the data set so that the -merge-'s will actually find some matches all the way across.

                      Comment


                      • #12
                        I am still struggling to see why this won't work, I tried finding duplicates but there weren't any.

                        Here is the master file:
                        Code:
                        input int date byte cost
                        16071 10
                        16072 15
                        16073 10
                        16074 11
                        16075  0
                        16076 15
                        16077 10
                        16078 16
                        16079 15
                        16080 14
                        16081 14
                        16082 19
                        16083 19
                        16084 17
                        Code:
                        input int date float daily_policy_index
                        16071  87.51
                        16072  54.02
                        16073  89.88
                        16074 182.99
                        16075  58.08
                        16076  84.42
                        16077  22.54
                        16078 205.08
                        16079 138.36
                        16080 221.58
                        16081      0
                        16082 142.86
                        The UK CCI dataset seems to merge fine with the master so I don't have any problems there.

                        Comment


                        • #13
                          As you can see below, the -merge- runs without errors in these example data sets.
                          Code:
                          . clear
                          
                          . input int date byte cost
                          
                                   date      cost
                            1. 16071 10
                            2. 16072 15
                            3. 16073 10
                            4. 16074 11
                            5. 16075  0
                            6. 16076 15
                            7. 16077 10
                            8. 16078 16
                            9. 16079 15
                           10. 16080 14
                           11. 16081 14
                           12. 16082 19
                           13. 16083 19
                           14. 16084 17
                           15. end
                          
                          . format date %td
                          
                          . tempfile master
                          
                          . save `master'
                          file C:\Users\clyde\AppData\Local\Temp\ST_bb68_000001.tmp saved as .dta format
                          
                          .
                          . clear
                          
                          . input int date float daily_policy_index
                          
                                   date  daily_p~x
                            1. 16071  87.51
                            2. 16072  54.02
                            3. 16073  89.88
                            4. 16074 182.99
                            5. 16075  58.08
                            6. 16076  84.42
                            7. 16077  22.54
                            8. 16078 205.08
                            9. 16079 138.36
                           10. 16080 221.58
                           11. 16081      0
                           12. 16082 142.86
                           13. end
                          
                          . format date %td
                          
                          . tempfile epu
                          
                          . save `epu'
                          file C:\Users\clyde\AppData\Local\Temp\ST_bb68_000002.tmp saved as .dta format
                          
                          .
                          . use `master', clear
                          
                          . merge 1:1 date using `epu'
                          
                              Result                      Number of obs
                              -----------------------------------------
                              Not matched                             2
                                  from master                         2  (_merge==1)
                                  from using                          0  (_merge==2)
                          
                              Matched                                12  (_merge==3)
                              -----------------------------------------
                          
                          .
                          end of do-file
                          There is no reason they should not do the same on your setup. I have no choice but to conclude that you are doing something different, or the examples you show are not reflective of your actual data sets. And, of course, it is also possible that notwithstanding your statement that the UK CCI data set "seems to merge fine with the master" that it is the source of a problem you have not recognized or not disclosed.

                          If you can post example data sets that actually exhibit the problem you are having, I'll try to help. But at this point, I consider this case closed.

                          Comment

                          Working...
                          X