Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • mean of first three non-zero and non-missing observations?

    Dear All, Suppose that I have this data set:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long stkcd float(year co)
     5 2000          .
     5 2001          0
     5 2002          0
     5 2003  .10579064
     5 2004  .25719556
     5 2005      .3125
     5 2006      .3125
     5 2007          0
     5 2008          0
     5 2009          0
     5 2010   .2011278
     5 2011   .3335362
     5 2012   .3333333
     5 2013    .410831
     5 2014   .4705882
     5 2015   .4706735
     5 2016  .54362416
     5 2017   .6111111
     5 2018   .5911346
     5 2019  .58562183
     5 2020   .6002913
     5 2021   .6251925
     5 2022   .6470588
     5 2023          .
    13 2000          0
    13 2001          0
    13 2002          0
    13 2003          0
    13 2004          .
    14 2000   .1853736
    14 2001 .012834225
    14 2002          0
    14 2003  .19567125
    14 2004   .4389878
    14 2005   .5973589
    14 2006   .7038942
    14 2007   .0753814
    14 2008  .13270143
    14 2009  .18029143
    14 2010   .1818182
    14 2011  .55491906
    14 2012   .6562573
    14 2013          0
    14 2014          0
    14 2015          0
    14 2016          0
    14 2017  .20273237
    14 2018  .27272728
    14 2019  .27272728
    14 2020          0
    14 2021  .07711757
    14 2022   .2299578
    14 2023  .23076923
    end
    For each `stkcd', I wish to generate a variable `co_base', which is equal to the mean of the first three non-zero and non-missing observations of `co' variable. Any suggestions? Thanks.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    preserve
        sort stkcd year
        drop if co == . | co == 0
        
        // only the first 3 valid obs
        bys stkcd: gen row_id = _n
        drop if row_id > 3
        
        // drop stkcd that have less than 3 valid obs
        bys stkcd: egen max_id = max(row_id)
        drop if max_id < 3
        
        // take mean
        bys stkcd: egen co_f3mean = mean(co)
        
        duplicates drop stkcd, force
        keep stkcd co_f3mean
        
        save "_temp_f3mean.dta", replace
    restore
    
    merge m:1 stkcd using "_temp_f3mean.dta"
    // unable to identify the -co-f3mean- according to your approach will generate missing -co_f3mean-
    
    drop _merge
    cap drop "_temp_f3mean.dta"

    Comment


    • #3
      Also:

      Code:
      bys stkcd (year): egen wanted= max(cond(sum(!missing(co)& co)==3, sum(co)/3, .))
      Res.:

      Code:
      . l, sepby(stkcd)
      
           +------------------------------------+
           | stkcd   year         co     wanted |
           |------------------------------------|
        1. |     5   2000          .   .2251621 |
        2. |     5   2001          0   .2251621 |
        3. |     5   2002          0   .2251621 |
        4. |     5   2003   .1057906   .2251621 |
        5. |     5   2004   .2571956   .2251621 |
        6. |     5   2005      .3125   .2251621 |
        7. |     5   2006      .3125   .2251621 |
        8. |     5   2007          0   .2251621 |
        9. |     5   2008          0   .2251621 |
       10. |     5   2009          0   .2251621 |
       11. |     5   2010   .2011278   .2251621 |
       12. |     5   2011   .3335362   .2251621 |
       13. |     5   2012   .3333333   .2251621 |
       14. |     5   2013    .410831   .2251621 |
       15. |     5   2014   .4705882   .2251621 |
       16. |     5   2015   .4706735   .2251621 |
       17. |     5   2016   .5436242   .2251621 |
       18. |     5   2017   .6111111   .2251621 |
       19. |     5   2018   .5911346   .2251621 |
       20. |     5   2019   .5856218   .2251621 |
       21. |     5   2020   .6002913   .2251621 |
       22. |     5   2021   .6251925   .2251621 |
       23. |     5   2022   .6470588   .2251621 |
       24. |     5   2023          .   .2251621 |
           |------------------------------------|
       25. |    13   2000          0          . |
       26. |    13   2001          0          . |
       27. |    13   2002          0          . |
       28. |    13   2003          0          . |
       29. |    13   2004          .          . |
           |------------------------------------|
       30. |    14   2000   .1853736    .131293 |
       31. |    14   2001   .0128342    .131293 |
       32. |    14   2002          0    .131293 |
       33. |    14   2003   .1956712    .131293 |
       34. |    14   2004   .4389878    .131293 |
       35. |    14   2005   .5973589    .131293 |
       36. |    14   2006   .7038942    .131293 |
       37. |    14   2007   .0753814    .131293 |
       38. |    14   2008   .1327014    .131293 |
       39. |    14   2009   .1802914    .131293 |
       40. |    14   2010   .1818182    .131293 |
       41. |    14   2011   .5549191    .131293 |
       42. |    14   2012   .6562573    .131293 |
       43. |    14   2013          0    .131293 |
       44. |    14   2014          0    .131293 |
       45. |    14   2015          0    .131293 |
       46. |    14   2016          0    .131293 |
       47. |    14   2017   .2027324    .131293 |
       48. |    14   2018   .2727273    .131293 |
       49. |    14   2019   .2727273    .131293 |
       50. |    14   2020          0    .131293 |
       51. |    14   2021   .0771176    .131293 |
       52. |    14   2022   .2299578    .131293 |
       53. |    14   2023   .2307692    .131293 |
           +------------------------------------+

      Comment


      • #4
        Here is yet another way to do it. See https://journals.sagepub.com/doi/pdf...867X1101100210 Section 9 and https://www.stata.com/support/faqs/d...t-occurrences/ for discussion of principles.

        The code checks that there are at least 3 such values, and so catches cases with only 1 or 2.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long stkcd float(year co)
         5 2000          .
         5 2001          0
         5 2002          0
         5 2003  .10579064
         5 2004  .25719556
         5 2005      .3125
         5 2006      .3125
         5 2007          0
         5 2008          0
         5 2009          0
         5 2010   .2011278
         5 2011   .3335362
         5 2012   .3333333
         5 2013    .410831
         5 2014   .4705882
         5 2015   .4706735
         5 2016  .54362416
         5 2017   .6111111
         5 2018   .5911346
         5 2019  .58562183
         5 2020   .6002913
         5 2021   .6251925
         5 2022   .6470588
         5 2023          .
        13 2000          0
        13 2001          0
        13 2002          0
        13 2003          0
        13 2004          .
        14 2000   .1853736
        14 2001 .012834225
        14 2002          0
        14 2003  .19567125
        14 2004   .4389878
        14 2005   .5973589
        14 2006   .7038942
        14 2007   .0753814
        14 2008  .13270143
        14 2009  .18029143
        14 2010   .1818182
        14 2011  .55491906
        14 2012   .6562573
        14 2013          0
        14 2014          0
        14 2015          0
        14 2016          0
        14 2017  .20273237
        14 2018  .27272728
        14 2019  .27272728
        14 2020          0
        14 2021  .07711757
        14 2022   .2299578
        14 2023  .23076923
        end
        
        bysort stkcd (year) : gen count = sum(co != 0 & co < .) if co != 0 & co < .
        
        by stkcd : egen wanted = mean(cond(count <= 3, co, .)) 
        
        by stkcd : egen used = max(cond(count <= 3, count, .))
        
        
        list, sepby(stkcd)
        
            +---------------------------------------------------+
             | stkcd   year         co   count     wanted   used |
             |---------------------------------------------------|
          1. |     5   2000          .       .   .2251621      3 |
          2. |     5   2001          0       .   .2251621      3 |
          3. |     5   2002          0       .   .2251621      3 |
          4. |     5   2003   .1057906       1   .2251621      3 |
          5. |     5   2004   .2571956       2   .2251621      3 |
          6. |     5   2005      .3125       3   .2251621      3 |
          7. |     5   2006      .3125       4   .2251621      3 |
          8. |     5   2007          0       .   .2251621      3 |
          9. |     5   2008          0       .   .2251621      3 |
         10. |     5   2009          0       .   .2251621      3 |
         11. |     5   2010   .2011278       5   .2251621      3 |
         12. |     5   2011   .3335362       6   .2251621      3 |
         13. |     5   2012   .3333333       7   .2251621      3 |
         14. |     5   2013    .410831       8   .2251621      3 |
         15. |     5   2014   .4705882       9   .2251621      3 |
         16. |     5   2015   .4706735      10   .2251621      3 |
         17. |     5   2016   .5436242      11   .2251621      3 |
         18. |     5   2017   .6111111      12   .2251621      3 |
         19. |     5   2018   .5911346      13   .2251621      3 |
         20. |     5   2019   .5856218      14   .2251621      3 |
         21. |     5   2020   .6002913      15   .2251621      3 |
         22. |     5   2021   .6251925      16   .2251621      3 |
         23. |     5   2022   .6470588      17   .2251621      3 |
         24. |     5   2023          .       .   .2251621      3 |
             |---------------------------------------------------|
         25. |    13   2000          0       .          .      . |
         26. |    13   2001          0       .          .      . |
         27. |    13   2002          0       .          .      . |
         28. |    13   2003          0       .          .      . |
         29. |    13   2004          .       .          .      . |
             |---------------------------------------------------|
         30. |    14   2000   .1853736       1    .131293      3 |
         31. |    14   2001   .0128342       2    .131293      3 |
         32. |    14   2002          0       .    .131293      3 |
         33. |    14   2003   .1956712       3    .131293      3 |
         34. |    14   2004   .4389878       4    .131293      3 |
         35. |    14   2005   .5973589       5    .131293      3 |
         36. |    14   2006   .7038942       6    .131293      3 |
         37. |    14   2007   .0753814       7    .131293      3 |
         38. |    14   2008   .1327014       8    .131293      3 |
         39. |    14   2009   .1802914       9    .131293      3 |
         40. |    14   2010   .1818182      10    .131293      3 |
         41. |    14   2011   .5549191      11    .131293      3 |
         42. |    14   2012   .6562573      12    .131293      3 |
         43. |    14   2013          0       .    .131293      3 |
         44. |    14   2014          0       .    .131293      3 |
         45. |    14   2015          0       .    .131293      3 |
         46. |    14   2016          0       .    .131293      3 |
         47. |    14   2017   .2027324      13    .131293      3 |
         48. |    14   2018   .2727273      14    .131293      3 |
         49. |    14   2019   .2727273      15    .131293      3 |
         50. |    14   2020          0       .    .131293      3 |
         51. |    14   2021   .0771176      16    .131293      3 |
         52. |    14   2022   .2299578      17    .131293      3 |
         53. |    14   2023   .2307692      18    .131293      3 |
             +---------------------------------------------------+

        Comment


        • #5
          Dear @Xinya Hao, @Andrew Musau, and @Nick Cox, Many thanks for your helpful suggestions.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)

          Comment

          Working...
          X