Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapse Daily Panel Data by Month and Year

    Dear Statalist,

    Excuse me for this rather rudimentary question but it has puzzled me for a few days.

    I have a panel data containing daily observation for ten years (hence 3650+ observations), and I would like to find the sum and mean of certain variables by month and year so that the panel collapses to a monthly panel with 120 observations. My date variable is in the form of 2007.12.31 and is already in date format.

    For summation, I was trying to use the below command but was not sure how should I specify the
    Code:
    by
    option.

    Code:
    preserve
    collapse (sum) varlist1, by()
    Please let me know if I missed anything and thank you very much in advance!

    Yangxi

  • #2
    My date variable is in the form of 2007.12.31 and is already in date format.
    No, that's not true. Stata does not have a "date format." You may have a string variable that contains strings like "2007.12.31" or you may have a numeric variable that carries a %tdCCYY.NN.DD display format. Your description does not distinguish these possibilities. But the complete solution of your problem requires knowing this.

    This is a good example of why it is better to use the -dataex- command to show an example of your data than to attempt to describe it in words. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    That said, if I assume that your date variable is numeric, and is named date, then the solution is to extract a Stata internal format monthly date variable from it, and then -collapse- over that:

    Code:
    gen int mdate = mofd(date)
    format mdate %tm
    collapse (sum) varlist1, by(mdate)

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      No, that's not true. Stata does not have a "date format." You may have a string variable that contains strings like "2007.12.31" or you may have a numeric variable that carries a %tdCCYY.NN.DD display format. Your description does not distinguish these possibilities. But the complete solution of your problem requires knowing this.

      This is a good example of why it is better to use the -dataex- command to show an example of your data than to attempt to describe it in words. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      That said, if I assume that your date variable is numeric, and is named date, then the solution is to extract a Stata internal format monthly date variable from it, and then -collapse- over that:

      Code:
      gen int mdate = mofd(date)
      format mdate %tm
      collapse (sum) varlist1, by(mdate)
      Thank you, Clyde. The data variable I have is indeed numeric displayed in %tdCCYY.NN.DD format. Thank you very much for your code and I will go through the -dataex- command.

      Comment


      • #4
        Dear Clyde Schechter
        I have a panel trade data between i and j from 1990-2017 (this is my dependent variable) and the vector of X (x1,x2,x3....) indepedent variable. It is an Annual data on consecutive years. I need to collapse this data by 4-year average, that is average of 1990-94; 1995-1999, 2000-2004. That is., for dependent variable, each observation is average trade flows between 1990-1994 period between i and j, similarily for other intervals. While, i should get similar transformation on independent variable too (average of say x1 between 1990-194)
        Can you help me, how can we do that.

        Thanks and regards,
        (Ridwan)

        Comment


        • #5
          You're describing five-year averages (e.g. 1990 1991 1992 1993 1994 are 5 years).

          The syntax for collapse to means doesn't hinge on whether you regard variables as dependent or independent. It would be some variation on

          Code:
          collapse  y x1 x2 x3 , by(period)
          where period could be got this way, among others. See e,g, https://journals.sagepub.com/doi/pdf...867X1801800311

          Code:
          . clear
          
          . set obs 28
          Number of observations (_N) was 0, now 28.
          
          . gen year = 1989 + _n
          
          . su year
          
              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                  year |         28      2003.5    8.225975       1990       2017
          
          . gen period = 5 * floor(year/5)
          
          . l
          
               +---------------+
               | year   period |
               |---------------|
            1. | 1990     1990 |
            2. | 1991     1990 |
            3. | 1992     1990 |
            4. | 1993     1990 |
            5. | 1994     1990 |
               |---------------|
            6. | 1995     1995 |
            7. | 1996     1995 |
            8. | 1997     1995 |
            9. | 1998     1995 |
           10. | 1999     1995 |
               |---------------|
           11. | 2000     2000 |
           12. | 2001     2000 |
           13. | 2002     2000 |
           14. | 2003     2000 |
           15. | 2004     2000 |
               |---------------|
           16. | 2005     2005 |
           17. | 2006     2005 |
           18. | 2007     2005 |
           19. | 2008     2005 |
           20. | 2009     2005 |
               |---------------|
           21. | 2010     2010 |
           22. | 2011     2010 |
           23. | 2012     2010 |
           24. | 2013     2010 |
           25. | 2014     2010 |
               |---------------|
           26. | 2015     2015 |
           27. | 2016     2015 |
           28. | 2017     2015 |
               +---------------+

          Comment


          • #6
            Clearly if there are 28 years of data another possibility is of 7 periods of 4 years.

            Code:
            gen period = 2 + 4 * floor((year - 2)/4)

            Comment


            • #7
              Thank you very much Nick Cox .
              The collapse this way reduces no. of observations to just 7 based on 4-year averages.
              I have a data on blateral trade flows which is dyad i.e., trade flows (say exports) from iso_i to iso_j at year t. Therefore, after period is defined on either way (4-year average or 5 - year average), say using 4-year averages, i should write

              Code:
              collapse y x1 x2 x3 , by(iso_i iso_j period)
              Where each observation gives average trade flows between say 1990-1993 between iso_i and iso_ j at year t , similarily on other time average intervals.

              My only concern is x1, x2, x3 are all indicator (dummy) variables indicating the presence (or absence) of some agreement between iso_i and iso_j at year t. To further explain it:
              Suppose we consider an interval 1990-1993
              Let x1= 1 if there is presence of agreement at year t=1992, and zero otherwise.
              The
              collapse will also give the average value for x1 between 1990-1993 [i.e., x1=0.25 (1/4) for 1990], in which case it would no longer be dummy.

              I am not sure whether it is still consistent with the gravity model estimation, may be Tom Zylkin has more to offer here.

              Thanks very much for your help.

              regards,
              (Ridwan)

              Comment

              Working...
              X