Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • using monthly and yearly data

    hello everyone. my dependent variable is monthly data(birth) , and my independent variable (GDP, GDP per capita, unemployment, share of women in population, and stringency dataset). my problem and question are that GDP, GDP per capita, and population are annual datasets but other variables are monthly. how can I estimate because some are annual and some months. how can I convert them? many thanks for your advice.

    this is my model : cross-section:
    yi=a+b*Xi+c*GDPGrowth(i)+d*Containment(i)

    y=birth
    X=set of control affecting birth(gdpper capita, unemployment ,share of women in population)
    Containment=average measure of restriction
    this is some part of my data:
    Code:
     
    Country month birth unem
    AUT January 6,932 4.4
    AUT February 6,327 4.5
    AUT March 6,802 4.8
    AUT April 6,666 4.9
    AUT May 7,082 5.8
    AUT June 7,099 6
    AUT July 7,679 5.8
    AUT August 7,543 5.6
    AUT September 7,330 5.5
    AUT October 7,195 5.4
    AUT November 6,413 5.5
    AUT December 6,535 5.8
    BEL January 9,656 5.1
    BEL February 8,680 5
    BEL March 9,734 4.9
    BEL April 9,577 4.9
    and
    Code:
     
    Country year GDP
    AUT 2020 -6.997
    BEL 2020 -6.924
    BGR 2020 -4
    CHE 2020 -5.966
    CHL 2020 -4.486
    CZE 2020 -6.5
    DEUTNP 2020 ..
    DNK 2020 -6.5
    ESP 2020 -8
    EST 2020 -7.5
    I am so thankful to receive your advice.

    best regards ,

  • #2
    In general, you would want to have a year variable in both of your datasets, as you now have country in both of your datasets, and you would want to do something like
    Code:
    use montlydata, clear
    merge m:1 country year using yearlydata, keep(master match)

    Comment


    • #3
      @William Lisowski thank you so much for your reply.
      --- I cannot have a variable in both in month? (because GDP is in year? )

      ---when I follow your guidance I recieve error.

      first I import my monthly data, then I save it . (in both the Country1 is the same) , then I import the yearly data and want to merge with monthly, but :

      Code:
      m:1 Country1 year using "C:\Users\khatereh\Desktop\baselin--2\New folder\merge.dta" , keep(master match)
      Code:
      . merge m:1 Country1 year using "C:\Users\khatereh\Desktop\baselin--2\New folder\merge.dta" , keep(master match)
      variables Country1 year do not uniquely identify observations in the using data
      r(459);
      I want to estimate the effect of the gdp, gdpgrowth , unemployment and containment on the birth.

      best regards ,


      Comment


      • #4
        You tell us that your using dataset "C:\Users\khatereh\Desktop\baselin--2\New folder\merge.dt" is your monthly dataset. If that is so, then you have your merge backwards, you need
        Code:
        merge 1:m Country1 year using "C:\Users\khatereh\Desktop\baselin--2\New folder\merge.dta" , keep(master match)
        because for each 1 observation (Country1 and year) in your first dataset you have multiple observations (Country1, year, and month) in your using dataset.

        Comment


        • #5
          @William Lisowski thanks alot for your reply. I could merge. I have just "Counntry1" the same . but when I do
          Code:
            
           merge 1:m Country1 year using "C:\Users\khatereh\Desktop\baselin--2\New folder\merge.dta" , keep(master match)
          I et an error that " variable year is not recognized. so I rename it. but now the problem is that however they merge but just the data of GDP is in my dataset and repeated . and for birth and other monthly data it is emplty. I am so thankful to help me. many thanks, this is my data after merging"
          Code:
           
          Country year1 GDP Country1 GDP1 year birth unem birth1 unem1 _merge
          ALABAMA 2020 104.782 ALABAMA 104.782 matched (3)
          ALASKA 2020 87.747 ALASKA 87.747 matched (3)
          ARIZONA 2020 119.618 ARIZONA 119.618 matched (3)
          ARKANSAS 2020 106.175 ARKANSAS 106.175 matched (3)
          AUT 2020 -6.997 AUT -6.997 matched (3)
          BEL 2020 -6.924 BEL -6.924 matched (3)
          BGR 2020 -4 BGR -4 matched (3)
          CALIFORNIA 2020 126.993 CALIFORNIA 126.993 matched (3)
          CHE 2020 -5.966 CHE -5.966 matched (3)
          CHL 2020 -4.486 CHL -4.486 matched (3)
          COLORADO 2020 128.317 COLORADO 128.317 matched (3)
          CONNECTICUT 2020 98.746 CONNECTICUT 98.746 matched (3)
          CZE 2020 -6.5 CZE -6.5 matched (3)
          many thanks

          Comment


          • #6
            My mistake, I should have written
            Code:
            merge 1:m Country1 year1 using "C:\Users\khatereh\Desktop\baselin--2\New folder\merge.dta" , keep(using match)

            Comment


            • #7
              @ William Lisowski thanks alot for your quick reply. and I apologize if i take your time . I did but there is an error.

              I have one data in monthly and another in yearly. so based on your advice :
              -- I import monthly data
              --then I sort it in terms of Country1
              --then I save it
              --then I import the yearly data , so now I use 1:m (one to many )
              --some thing like this :
              Code:
              merge 1:m Country1 using "C:\Users\khatereh\Desktop\baselin--2\New folder\baseline _response to pandemic recession\sort month strin.dta", keep(using match)
              the only same in my dataset is "Country1'. so I use now just ""Country1'"

              it is just the same item , i am so thankful to help me if I tell wrong.
              but know this is :

              in every situation:
              Code:
              . merge 1:m Country1 year using "C:\Users\khatereh\Desktop\baselin--2\New folder\baseline _response to pandemic recession\sort month st
              > rin.dta", keep(using match)
              variables Country1 year do not uniquely identify observations in the master data
              r(459);
              or
              Code:
              . merge 1:m Country1  using "C:\Users\khatereh\Desktop\baselin--2\New folder\baseline _response to pandemic recession\sort month strin.
              > dta"
              variable Country1 does not uniquely identify observations in the master data
              r(459);
              many thanks for your valuable time and advice

              Comment


              • #8
                this is my dataset : (yearly)

                Code:
                Country    year    GDP    Country1    GDP1
                ALABAMA    2020    104.782    ALABAMA    104.782
                ALASKA    2020    87.747    ALASKA    87.747
                ARIZONA    2020    119.618    ARIZONA    119.618
                ARKANSAS    2020    106.175    ARKANSAS    106.175
                AUT    2020    -6.997    AUT    -6.997
                BEL    2020    -6.924    BEL    -6.924
                BGR    2020    -4    BGR    -4
                CALIFORNIA    2020    126.993    CALIFORNIA    126.993
                CHE    2020    -5.966    CHE    -5.966
                CHL    2020    -4.486    CHL    -4.486
                monthly dataset like this :

                Code:
                Country    month    birth    unem    Country1    month1    birth1    unem1    AverageofC1_Schoolclosing    AverageofC2_Workplaceclosing    AverageofC3_Cancelpubliceven    AverageofC4_Restrictionsonga    AverageofC5_Closepublictrans    AverageofC6_Stayathomerequi    AverageofC7_Restrictionsonin    AverageofC8_Internationaltrav    AverageofE1_Incomesupport    AverageofE2_Debtcontractreli    AverageofE3_Fiscalmeasures    AverageofE4_Internationalsupp    AverageofH1_Publicinformation    AverageofH2_Testingpolicy    AverageofH4_Emergencyinvestme    AverageofH3_Contacttracing    AverageofH5_Investmentinvacc    AverageofH6_FacialCoverings    AverageofH7_Vaccinationpolicy    AverageofH8_Protectionofelde    AverageofConfirmedCases    AverageofConfirmedDeaths    AverageofStringencyIndex    AverageofStringencyLegacyIndex    AverageofStringencyIndexForDis    AB    AverageofGovernmentResponseInd    AD    AverageofContainmentHealthInde    AF    AverageofEconomicSupportIndex    AverageofEconomicSupportIndexF    AI    merge1
                ALABAMA    April    4573    13.2    ALABAMA    April    4573    13.2    3    3    2    4    0    2.1    2    4    2    2    0    0    2    1    0    1    0    0    0    1    87.8    1.0333333    85.56    85.71    85.56    85.71    66.358    66.358    63.338    63.338    87.5    87.5    87.5    matched (3)
                ALABAMA    August    5200    7.1    ALABAMA    August    5200    7.1    1.3548387    1.8064516    2    3.6129032    0    1.1290323    1    3    1    2    0    0    2    2    0    1    0    1.9333333    0    2.8064516    1062.1935    5.1935484    61.855484    65.436129    61.855484    65.436129    58.668387    58.668387    58.122581    58.122581    62.5    62.5    62.5    matched (3)
                ALABAMA    December    4863    4.7    ALABAMA    December    4863    4.7    0    .4516129    1    3    0    1    0    1    1    1.3548387    0    0    2    2    0    1    0    3    0    3    5122.9032    46.677419    31.9    39.654516    31.9    39.654516    41.96871    41.96871    41.340968    41.340968    46.370968    46.370968    46.370968    matched (3)
                ALABAMA    February    4464    4.760189    ALABAMA    February    4464    4.760189    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    matched (3)
                ALABAMA    January    4928    4.911003    ALABAMA    January    4928    4.911003    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    matched (3)
                ALABAMA    July    5230    7.4    ALABAMA    July    5230    7.4    .38709677    1    2    0    0    1    1    3    1.6129032    2    0    0    2    2    0    1    0    0    0    2    110.83871    3    44.952258    58.717742    44.952258    58.717742    46.473548    46.473548    41.99129    41.99129    77.822581    77.822581    77.822581    matched (3)
                ALABAMA    June    4803    7.7    ALABAMA    June    4803    7.7    1.2666667    1.3    2    0    0    1    1    3.3    2    2    0    0    2    1.2    0    1    0    0    0    1.7    101.13333    3    50.153333    66.194    50.153333    66.194    48.318667    48.318667    42.717667    42.717667    87.5    87.5    87.5    matched (3)
                ALABAMA    March    4781    2.6    ALABAMA    March    4781    2.6    1.5483871    .80645161    .19354839    .38709677    0    .4516129    .19354839    1.9354839    0    .12903226    0    .01806452    .90322581    .70967742    0    .41935484    0    0    0    .4516129    10.903226    0    24.013226    29.491935    24.013226    29.491935    17.64    17.64    19.699355    19.699355    3.2258065    3.2258065    3.2258065    matched (3)
                ALABAMA    May    4530    7.9    ALABAMA    May    4530    7.9    2.5483871    2.0967742    2    2.1935484    0    1.8709677    1.9677419    4    2    2    0    0    2    1    0    1    0    0    0    1    100.87097    2.8064516    74.492903    80.719677    74.492903    80.719677    60.130968    60.130968    56.219677    56.219677    87.5    87.5    87.5    matched (3)
                ALABAMA    November    4514    4.7    ALABAMA    November    4514    4.7    1.3    0    1.7666667    3.7666667    0    0    0    1    1    2    0    0    2    2    0    1    0    2    0    3    4672.9667    42.8    35.464    38.264    35.464    38.264    44.426667    44.426667    41.847333    41.847333    62.5    62.5    62.5    matched (3)
                ALABAMA    October    4821    4.7    ALABAMA    October    4821    4.7    2    1.3548387    2    4    0    1.3548387    0    1    1    2    0    0    2    2    0    1    0    2.6774194    0    3    4276.6129    33.419355    49.837742    55.439677    49.837742    55.439677    53.57129    53.57129    52.291935    52.291935    62.5    62.5    62.5    matched (3)
                ALABAMA    September    4980    6.7    ALABAMA    September    4980    6.7    2.0666667    2    2    4    0    2    .2    2.3333333    1    2    0    0    2    2    0    1    0    3    0    3    3158.8667    19.966667    60.155333    67.381333    60.155333    67.381333    59.88    59.88    59.502667    59.502667    62.5    62.5    62.5    matched (3)
                ALASKA    April    761    11.8    ALASKA    April    761    11.8    3    3    2    3.4666667    2    2    1.6333333    3    0    0    0    0    2    1    6499258.3    1    0    .13333333    0    3    902.46667    29.666667    80.74    82.457333    80.74    82.457333    56.042    56.042    64.047667    64.047667    0    0    0    matched (3)
                ALASKA    August    792    7.2    ALASKA    August    792    7.2    2.1290323    1.7096774    .90322581    2.1935484    1.6129032    1.6129032    1.3548387    1    0    0    0    1270967.7    2    1.9032258    0    1.4516129    0    3    0    3    37525.71    1358.0968    55.40871    59.869032    55.40871    59.869032    50.605484    50.605484    57.836774    57.836774    0    0    0    matched (3)
                ALASKA    December    805    6.5    ALASKA    December    805    6.5    1    0    0    0    0    0    0    1    0    0    0    0    1    3    0    1    0    3    0    0    49103.516    2011.5161    12.04    20.24    12.04    20.24    20.83    20.83    23.81    23.81    0    0    0    matched (3)
                ALASKA    February    712    5.1    ALASKA    February    712    5.1    0    0    0    0    0    0    0    .72413793    0    0    0    0    0    .24137931    0    .20689655    0    0    0    0    .20689655    0    2.0106897    2.5851724    2.0106897    2.5851724    2.2817241    2.2817241    2.6068966    2.6068966    0    0    0    matched (3)
                ALASKA    January    745    5    ALASKA    January    745    5    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    matched (3)
                the only same item is "Country1"


                many thanks .

                best regards ,

                Comment


                • #9
                  First, thank you for the much more complete description. It has given me a better idea of your data, and some ideas on what is needed to move forward.

                  In your yearly data do you have observations for several years, or just for 2020? Usually we use "yearly" to mean more than just one year, but perhaps you mean that it is for the year 2020 only.

                  So that is one question I ask you to clarify. If your yearly data is for more than just 2020, then I don't understand - what year is your montly data supposed to match to?

                  If indeed your data is for 2020 only, then the next ideas might help. Certainly the first one is a good idea in any event, because you should always review the results of importing data to be sure that what you got is what you expected to get - no more and no less.

                  You tell us you "import" your data, and your monthly dataset strongly suggests you are importing Excel worksheets. If you are not using the cellrange() option to tell Stata what rows and columns to use, it is possible that Stata is guessing incorrectly based on misleading information in the worksheet. So you should try importing each of your worksheets, and then using Stata's Data Browser window, and scroll to the bottom and to the right and be sure that the last numbered observation and last named variable actually have data in them. If Stata added several rows at the end that have a missing Country1, those are likely the source of your problem.

                  Here is further "homework". If you run the command
                  Code:
                  duplicates report Country1
                  on your yearly data you might get output like the following made-up example
                  Code:
                  . duplicates report Country1
                  
                  Duplicates in terms of Country1
                  
                  --------------------------------------
                     Copies | Observations       Surplus
                  ----------+---------------------------
                          1 |           70             0
                          2 |            4             2
                  --------------------------------------
                  This tells us that 70 values appear only one time (a total of 70 observations), and that 2 values appear twice (a total of 4 observations). Now
                  Code:
                  . duplicates list Country1
                  
                  Duplicates in terms of Country1
                  
                    +------------------------+
                    | Group   Obs   Country1 |
                    |------------------------|
                    |     1    35          1 |
                    |     1    47          1 |
                    |     2    25          2 |
                    |     2    41          2 |
                    +------------------------+
                  tells us that Country 1 is 1 in both observations 35 and 47, and was 2 in observations 25 and 41. (I warned you this was made-up date, I was too lazy to use real country names.) This could help you find duplicates if browsing your imported data doesn't reveal the problem.

                  Let us know what you find!

                  Comment


                  • #10
                    thanks for your reply. both my data either monthly or yearly are for 2020.(just) . one in monthly (from January to Dec) it is included birth, unemployment ,and stringency index.
                    , and another is in yearly (GDP)
                    and the only similarity is Country1

                    regards
                    Last edited by Khati Zolfaghari; 16 Aug 2021, 15:41.

                    Comment


                    • #11
                      thanks alot. and appoloigze alot. now I did in this way but again nothing . I saw the last row and column was fill also .

                      in monthly data :
                      Code:
                       egen id=group(Country1)
                      Code:
                      . duplicates report id
                      
                      Duplicates in terms of id
                      
                      --------------------------------------
                         copies | observations       surplus
                      ----------+---------------------------
                             12 |          936           858
                      --------------------------------------
                      and in my yearly data:
                      Code:
                      egen id=group(Country1)
                      Code:
                      . duplicates report id
                      
                      Duplicates in terms of id
                      
                      --------------------------------------
                         copies | observations       surplus
                      ----------+---------------------------
                              1 |           84             0
                              3 |            3             2
                      --------------------------------------
                      but the error:
                      Code:
                      merge 1:m id using "C:\Users\khatereh\Desktop\baselin--2\New folder\baseline _response to pandemic recession\dupli-unem.dta"
                      variable Country1 does not uniquely identify observations in the master data
                      r(459);

                      or even :
                      Code:
                      . merge 1:m Country1 using "C:\Users\khatereh\Desktop\baselin--2\New folder\baseline _response to pandemic recession\dupli-unem.dta"
                      variable Country1 does not uniquely identify observations in the master data
                      r(459);
                      many thanks for your valuable time and advice.

                      I apologize a try a lot but i get error several times.

                      Comment


                      • #12
                        in monthly data :
                        Code:
                        egen id=group(Country1)
                        Code:
                        . duplicates report id

                        Duplicates in terms of id

                        --------------------------------------
                        copies | observations surplus
                        ----------+---------------------------
                        12 | 936 858
                        --------------------------------------
                        This is what we expect - each country appears exactly 12 times, as you expect.

                        and in my yearly data:
                        Code:
                        egen id=group(Country1)
                        Code:
                        . duplicates report id

                        Duplicates in terms of id

                        --------------------------------------
                        copies | observations surplus
                        ----------+---------------------------
                        1 | 84 0
                        3 | 3 2
                        --------------------------------------
                        This is not what we expect: one of your countries appears 3 times. That is what merge is telling you - that your Country1 values do not always identify just one observation apiece in your yearly data, even though you think they do.

                        You have an error in your yearly data that you need to correct.
                        Code:
                        duplicates tag Country1, generate(copied)
                        list if copied>0
                        will list the observations that have the same value of Country1.
                        Last edited by William Lisowski; 16 Aug 2021, 19:10.

                        Comment


                        • #13
                          @William Lisowski thanks alot, yes one of my countries in yearly data was repeated 3 times. thanks a lot . I could merge

                          Best regards,

                          Comment


                          • #14
                            William Lisowski ,I apologize for asking . Actually there is a problem. My data is merged based on your guidance but because one is monthly and another yearly I have lot's of missing .
                            so I cannot run regression.
                            I am so thankful to receive your advice.

                            Best regards,

                            Comment


                            • #15
                              Here is an example that shows that there are not missing data caused by one dataset being yearly and the other dataset being monthly. I used post #8 to create a yearly and monthly dataset for ALABAMA and merged them.
                              Code:
                              // create example datasets for ALABAMA data only
                              
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input str10 country int year float gdp 
                              "ALABAMA"    2020 104.782 
                              end
                              save yearly, replace
                              
                              * Example generated by -dataex-. For more info, type help dataex
                              clear
                              input str7 country str9 month int birth float(unem averageofc1_schoolclosing averageofc2_workplaceclosing)
                              "ALABAMA" "April"     4573     13.2         3         3
                              "ALABAMA" "August"    5200      7.1 1.3548387 1.8064516
                              "ALABAMA" "December"  4863      4.7         0  .4516129
                              "ALABAMA" "February"  4464 4.760189         0         0
                              "ALABAMA" "January"   4928 4.911003         0         0
                              "ALABAMA" "July"      5230      7.4  .3870968         1
                              "ALABAMA" "June"      4803      7.7 1.2666667       1.3
                              "ALABAMA" "March"     4781      2.6  1.548387  .8064516
                              "ALABAMA" "May"       4530      7.9  2.548387  2.096774
                              "ALABAMA" "November"  4514      4.7       1.3         0
                              "ALABAMA" "October"   4821      4.7         2 1.3548387
                              "ALABAMA" "September" 4980      6.7 2.0666666         2
                              end
                              save monthly, replace
                              
                              // merge the datasets
                              
                              use yearly, clear
                              merge 1:m country using monthly
                              drop _merge
                              list, clean
                              Code:
                              . list, clean
                              
                                     country   year       gdp       month   birth       unem   a~1_sc~g   a~2_wo~g  
                                1.   ALABAMA   2020   104.782       April    4573       13.2          3          3  
                                2.   ALABAMA   2020   104.782      August    5200        7.1   1.354839   1.806452  
                                3.   ALABAMA   2020   104.782    December    4863        4.7          0   .4516129  
                                4.   ALABAMA   2020   104.782    February    4464   4.760189          0          0  
                                5.   ALABAMA   2020   104.782     January    4928   4.911003          0          0  
                                6.   ALABAMA   2020   104.782        July    5230        7.4   .3870968          1  
                                7.   ALABAMA   2020   104.782        June    4803        7.7   1.266667        1.3  
                                8.   ALABAMA   2020   104.782       March    4781        2.6   1.548387   .8064516  
                                9.   ALABAMA   2020   104.782         May    4530        7.9   2.548387   2.096774  
                               10.   ALABAMA   2020   104.782    November    4514        4.7        1.3          0  
                               11.   ALABAMA   2020   104.782     October    4821        4.7          2   1.354839  
                               12.   ALABAMA   2020   104.782   September    4980        6.7   2.066667          2
                              So if you have missing values, it is not because one dataset is yearly and the other is monthly.

                              Perhaps the problem is that your monthly dataset has 936 observations - 12 months from each of 78 countries - and your yearly dataset has 85 countries. There are certainly countries in your monthly data that are not in your yearly data, and perhaps there are countries in your yearly data that are not in your monthly data.

                              I modified the example data above to include FIJI only in the yearly dataset and FRANCE only in the monthly dataset.
                              Code:
                              . use yearly, clear
                              
                              . merge 1:m country using monthly
                              
                                  Result                      Number of obs
                                  -----------------------------------------
                                  Not matched                            13
                                      from master                         1  (_merge==1)
                                      from using                         12  (_merge==2)
                              
                                  Matched                                12  (_merge==3)
                                  -----------------------------------------
                              
                              . tab country _merge
                              
                                         |    Matching result from merge
                                 country | Master on  Using onl  Matched ( |     Total
                              -----------+---------------------------------+----------
                                 ALABAMA |         0          0         12 |        12 
                                    FIJI |         1          0          0 |         1 
                                  FRANCE |         0         12          0 |        12 
                              -----------+---------------------------------+----------
                                   Total |         1         12         12 |        25
                              You see that the output of merge tells us that one observation from the master dataset (that is, the yearly data in memory) was not matched, and 12 observations from the using dataset (that is, the monthly dataset) were not matched. The results of the tab command list all the countries in either dataset, and the results for that country. We see that ALABAMA has 12 matched observations, FIJI has 1 observation that is only in the master (yearly) dataset, and FRANCE has 12 observations that are only in the using (monthly) dataset.

                              You must do something similar to see what your results are. Perhaps some country names are spelled differently and need to be corrected in one dataset. And perhaps after the merge command you should do
                              Code:
                              keep if _merge==3
                              so that only the matched observations will be kept.

                              Comment

                              Working...
                              X