Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining quarterly data into yearly

    Hey so I am doing a paper about mental health and how it is both affected by our living standards and how it affects our social economics, well being and productivity.

    Sadly I am fairly new to Stata and am having some trouble with it.

    My data is like so in excel
    Observation County Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010
    1 Stockholms county Q1 9334 8480 7999 7114 5830 4474
    2 Stockholms county Q2 8610 7940 7713 6216 5248 4656
    3 Stockholms county Q3 8011 7615 7178 6045 5069 4583
    4 Stockholms county Q4 8468 8019 7195 6147 5110 5458
    5 Uppsala county Q1 1661 1465 1372 1229 978 680
    6 Uppsala county Q2 1441 1311 1354 1067 847 721
    7 Uppsala county Q3 1356 1271 1283 1050 810 763
    8 Uppsala county Q4 1430 1313 1246 1054 841 958

    Which I then import into Stata the data is for active cases of people on sick leave because of mental health issues on a quarterly basis per year for a total of 24 county's (84 obs) from year 2005 - 2015 and split in gender. What I would like to do with the data is to combine the quarterly data into yearly data.
    Any help or points in the right direction on where to read about this would be greatly appreciated.

  • #2
    Welcome to Statalist.

    I see you have yet to get any answers. I'm not able to answer at the moment, but for anyone who is interested, I have transformed your data into dataex format, the requested format for posting data on Statalist. Of course, since you posted what appears to be a copy of data from a spreadsheet, it's not necessarily identical to what you have in your Stata dataset, but it's a start - and this post will bump this topic and perhaps catch someone's attention.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte observation str20 county int(year2005 year2006 year2007 year2008 year2009 year2010)
    1 "Stockholms county Q1" 9334 8480 7999 7114 5830 4474
    2 "Stockholms county Q2" 8610 7940 7713 6216 5248 4656
    3 "Stockholms county Q3" 8011 7615 7178 6045 5069 4583
    4 "Stockholms county Q4" 8468 8019 7195 6147 5110 5458
    5 "Uppsala county Q1"    1661 1465 1372 1229  978  680
    6 "Uppsala county Q2"    1441 1311 1354 1067  847  721
    7 "Uppsala county Q3"    1356 1271 1283 1050  810  763
    8 "Uppsala county Q4"    1430 1313 1246 1054  841  958
    end
    With that said, you should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. In particular, get your data into Stata and then prepare a sample of it using the dataex command. If you are running Stata 15.1 or later, it is already installed. For earlier versions of Stata, install dataex by typing ssc install dataex. Type help dataex to read the simple instructions for using it. Using dataex will enable those who want to help you to quickly and easily create a 100% faithful replica of your situation to test their ideas and code on.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

    Comment


    • #3
      Here's something that may start you in a useful direction. The trick is to get rid of the quarters from the county names.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte observation str20 county int(year2005 year2006 year2007 year2008 year2009 year2010)
      1 "Stockholms county Q1" 9334 8480 7999 7114 5830 4474
      2 "Stockholms county Q2" 8610 7940 7713 6216 5248 4656
      3 "Stockholms county Q3" 8011 7615 7178 6045 5069 4583
      4 "Stockholms county Q4" 8468 8019 7195 6147 5110 5458
      5 "Uppsala county Q1"    1661 1465 1372 1229  978  680
      6 "Uppsala county Q2"    1441 1311 1354 1067  847  721
      7 "Uppsala county Q3"    1356 1271 1283 1050  810  763
      8 "Uppsala county Q4"    1430 1313 1246 1054  841  958
      end
      // make sure each county ends in one of the quarters
      assert inlist(substr(county,-3,3)," Q1", " Q2", " Q3", " Q4")
      // they all did, so collapse across the quarters
      replace county = substr(county,1,length(county)-3)
      collapse (sum) year*, by(county)
      list, clean noobs
      Code:
      . list, clean noobs
      
                     county   year2005   year2006   year2007   year2008   year2009   year2010  
          Stockholms county      34423      32054      30085      25522      21257      19171  
             Uppsala county       5888       5360       5255       4400       3476       3122

      Comment


      • #4
        Originally posted by William Lisowski View Post
        Welcome to Statalist.

        I see you have yet to get any answers. I'm not able to answer at the moment, but for anyone who is interested, I have transformed your data into dataex format, the requested format for posting data on Statalist. Of course, since you posted what appears to be a copy of data from a spreadsheet, it's not necessarily identical to what you have in your Stata dataset, but it's a start - and this post will bump this topic and perhaps catch someone's attention.

        With that said, you should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. In particular, get your data into Stata and then prepare a sample of it using the dataex command. If you are running Stata 15.1 or later, it is already installed. For earlier versions of Stata, install dataex by typing ssc install dataex. Type help dataex to read the simple instructions for using it. Using dataex will enable those who want to help you to quickly and easily create a 100% faithful replica of your situation to test their ideas and code on.

        The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
        Thanks for the tip and link I'll have a look.


        Originally posted by William Lisowski View Post
        Here's something that may start you in a useful direction. The trick is to get rid of the quarters from the county names.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte observation str20 county int(year2005 year2006 year2007 year2008 year2009 year2010)
        1 "Stockholms county Q1" 9334 8480 7999 7114 5830 4474
        2 "Stockholms county Q2" 8610 7940 7713 6216 5248 4656
        3 "Stockholms county Q3" 8011 7615 7178 6045 5069 4583
        4 "Stockholms county Q4" 8468 8019 7195 6147 5110 5458
        5 "Uppsala county Q1" 1661 1465 1372 1229 978 680
        6 "Uppsala county Q2" 1441 1311 1354 1067 847 721
        7 "Uppsala county Q3" 1356 1271 1283 1050 810 763
        8 "Uppsala county Q4" 1430 1313 1246 1054 841 958
        end
        // make sure each county ends in one of the quarters
        assert inlist(substr(county,-3,3)," Q1", " Q2", " Q3", " Q4")
        // they all did, so collapse across the quarters
        replace county = substr(county,1,length(county)-3)
        collapse (sum) year*, by(county)
        list, clean noobs
        Code:
        . list, clean noobs
        
        county year2005 year2006 year2007 year2008 year2009 year2010
        Stockholms county 34423 32054 30085 25522 21257 19171
        Uppsala county 5888 5360 5255 4400 3476 3122
        Thanks for the tip and it helped on another section the problem with doing that for the sick leaves is that, I might not have explained it well enough but the numbers are "active" cases of mental illness we are studying so we can't simply add them up together. This is were the problem is at since the average number of active sick leave would be around 8605,75 for year 2005 in Stockholm and not the total of 34 423. Since the sick leave could be anything from 2 weeks to 10 years if we add them together there is a very high chance we count the same sick leave twice or more.


        Comment


        • #5
          I am at a loss. Is it simply that you want to average the data rather than sum it? In that case, the output of help collapse suggests you change the collapse command to
          Code:
          collapse (mean) year*, by(county)
          But if it is something else you want, you're going to have to explain how you would calculate it by hand.

          Comment


          • #6
            Originally posted by William Lisowski View Post
            I am at a loss. Is it simply that you want to average the data rather than sum it? In that case, the output of help collapse suggests you change the collapse command to
            Code:
            collapse (mean) year*, by(county)
            But if it is something else you want, you're going to have to explain how you would calculate it by hand.
            Thank you for trying to help it turned out my way preparing the data in excel was what was mainly the issue here.
            Basically what I wanted to do was to combine the data from quarterly data to yearly data if I was to do this by hand.
            It would be something like this. Q1 + Q2 + Q3 + Q4 = 2 + 2 + 2 + 2 = 8 for example but because this was active cases the numbers had to be divided on an average so (2 + 2 + 2 + 2) / 4 = 2.

            But because I had inserted the data wrongly like this.
            Year A Year B Year C
            5352 5353 5325
            5441 4521 9875

            Instead of
            Var A Var B
            Year 5352
            Year 3523
            Year 5353

            It was making it very hard to work with in stata. There were also around 2000 observations which was why we didn't want to do it by hand.

            But again thank you for trying to lend a hand.

            Comment

            Working...
            X