Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating the AVERAGE, by dropping Missing Values

    Hello Everyone,

    I have Ten years of Data (2005 to 2015) for an X variable for 200 countries (data taken from WDI)

    I wanted to generate an average for TEN years of data. But the problem is that there are some missing values in the years for some countries.

    Now my doubt is How to drop a country with data not available for less than three years?


    Please suggest me the Stata code for Generating the average from TEN years data, by dropping the countries with less than three years missing data.

    Thanks in Advance.

  • #2
    The code depends on how your data are organized. To get a specific answer you need to show an example of your data. Post back here using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thank you very much, sir,

      I am giving an example below:



      input str52 country double(2005 2006 2007 2008 2009 2010)
      "Aruba" . . . . .
      "Afghanistan" .059468239545822144 .115722558637344 .205090675891881 .297777396438042 .446217052955417
      "Angola" 1.4621328115463257 4.43281174520707 6.27709843969893 8.39229942847176
      "Albania" 9.257523536682129 13.8688162583819 19.3615950019362 28.6265285875127 32.9414296046984
      "Andorra" . . . . .
      "Arab World" 6.544143199920654 9.522797933052514 14.1087683430021 13.78637474038145 16.00810431204975
      "United Arab Emirates" 38.284110689558 39.4213399576005 40.4785144566712 52.793756401188
      "Argentina" 2.086310386657715 2.86634464240058 4.938225033886 9.95624969394346 13.8195711182789
      "Armenia" 4.954209804534912 7.69517529609324 14.4824453908975 23.1183470295932
      "American Samoa" . . . . .
      end


      Thank you, Sir,

      Comment


      • #4
        Thanks for quick reply sir,

        Please see the below example of the data :

        ----------------------- copy starting from the next line -----------------------
        Code:
        clear
        input str52 country double(  2005                         2006                          2007                    2008                      2009                     2010)
        "Aruba"                                  .                 .                .                 .                 .
        "Afghanistan"          .059468239545822144     .115722558637344     .205090675891881    .297777396438042        .446217052955417
        "Angola"                1.4621328115463257                                          4.43281174520707       6.27709843969893             8.39229942847176
        "Albania"                9.257523536682129       13.86881625838196   19.3615950019362      28.6265285875127          32.9414296046984
        "Andorra"                                .                 .                .                 .                 .
        "Arab World"             6.544143199920654     9.522797933052514   14.1087683430021     13.78637474038145        16.00810431204975
        "United Arab Emirates"                                  38.284110689558 39.  4213399576005          40.4785144566712           52.793756401188
        "Argentina"              2.086310386657715        2.86634464240058     4.938225033886        9.95624969394346  
        "Armenia"                4.954209804534912        7.69517529609324                                           23.1183470295932            30.244723760548
        "American Samoa"                         .                 .                .                 .                 .
        end
        ------------------ copy up to and including the previous line ------------------

        Listed 10 out of 264 observations


        This is how I got when I use dataex command.

        Here,
        Data is missing for some countries completely others two or three years. I want to take only the countries which have 7 years of data.

        Comment


        • #5
          2005 and so on are not legal variable names, so you mangled good output. Please show syntax that will work.

          Comment


          • #6
            Yes, Sir, It looks like that if I use all.


            This is the original one sir,
            AX AY AZ BA BB represents years sir


            ----------------------- copy starting from the next line -----------------------
            Code:
            clear
            input str52 country double(AX AY AZ BA BB)
            "Aruba"                                  .                 .                .                 .                 .
            "Afghanistan"          .059468239545822144  .115722558637344 .205090675891881  .297777396438042  .446217052955417
            "Angola"                1.4621328115463257  3.06038785566533 4.43281174520707  6.27709843969893  8.39229942847176
            "Albania"                9.257523536682129  13.8688162583819 19.3615950019362  28.6265285875127  32.9414296046984
            "Andorra"                                .                 .                .                 .                 .
            "Arab World"             6.544143199920654 9.522797933052514 14.1087683430021 13.78637474038145 16.00810431204975
            "United Arab Emirates"  42.101505279541016   38.284110689558 39.4213399576005  40.4785144566712   52.793756401188
            "Argentina"              2.086310386657715  2.86634464240058   4.938225033886  9.95624969394346  13.8195711182789
            "Armenia"                4.954209804534912  7.69517529609324 14.4824453908975  23.1183470295932   30.244723760548
            "American Samoa"                         .                 .                .                 .                 .
            end
            ------------------ copy up to and including the previous line ------------------

            Listed 10 out of 264 observations

            Comment


            • #7
              Yes, Sir, It looks like that if I use all.


              This is the original one sir,
              AX AY AZ BA BB represents years sir


              ----------------------- copy starting from the next line -----------------------
              Code:
              clear
              input str52 country double(AX AY AZ BA BB)
              "Aruba"                                  .                 .                .                 .                 .
              "Afghanistan"          .059468239545822144  .115722558637344 .205090675891881  .297777396438042  .446217052955417
              "Angola"                1.4621328115463257  3.06038785566533 4.43281174520707  6.27709843969893  8.39229942847176
              "Albania"                9.257523536682129  13.8688162583819 19.3615950019362  28.6265285875127  32.9414296046984
              "Andorra"                                .                 .                .                 .                 .
              "Arab World"             6.544143199920654 9.522797933052514 14.1087683430021 13.78637474038145 16.00810431204975
              "United Arab Emirates"  42.101505279541016   38.284110689558 39.4213399576005  40.4785144566712   52.793756401188
              "Argentina"              2.086310386657715  2.86634464240058   4.938225033886  9.95624969394346  13.8195711182789
              "Armenia"                4.954209804534912  7.69517529609324 14.4824453908975  23.1183470295932   30.244723760548
              "American Samoa"                         .                 .                .                 .                 .
              end
              ------------------ copy up to and including the previous line ------------------

              Listed 10 out of 264 observations

              Comment


              • #8
                2005 to 2015 looks like 11 years to me. No matter; your data example allows progress now, so thanks.

                I am somewhat sceptical about 15 or 16 or 17 significant figures when perhaps only the first is reliable.

                This will need some adaptation to your full dataset, but the principle of needing to reshape and the handling of missing values are general.

                Code:
                clear
                input str52 country double(AX AY AZ BA BB)
                "Aruba"                                  .                 .                .                 .                 .
                "Afghanistan"          .059468239545822144  .115722558637344 .205090675891881  .297777396438042  .446217052955417
                "Angola"                1.4621328115463257  3.06038785566533 4.43281174520707  6.27709843969893  8.39229942847176
                "Albania"                9.257523536682129  13.8688162583819 19.3615950019362  28.6265285875127  32.9414296046984
                "Andorra"                                .                 .                .                 .                 .
                "Arab World"             6.544143199920654 9.522797933052514 14.1087683430021 13.78637474038145 16.00810431204975
                "United Arab Emirates"  42.101505279541016   38.284110689558 39.4213399576005  40.4785144566712   52.793756401188
                "Argentina"              2.086310386657715  2.86634464240058   4.938225033886  9.95624969394346  13.8195711182789
                "Armenia"                4.954209804534912  7.69517529609324 14.4824453908975  23.1183470295932   30.244723760548
                "American Samoa"                         .                 .                .                 .                 .
                end
                
                rename (AX-BB) whatever#, addnumber(2005)
                
                reshape long whatever, i(country) j(year)
                
                egen nOK = count(whatever), by(country)
                
                drop if nOK < 3
                
                egen mean = mean(whatever), by(country)
                
                format whatever mean %3.2f
                list, sepby(country)
                
                     +------------------------------------------------------+
                     |              country   year   whatever   nOK    mean |
                     |------------------------------------------------------|
                  1. |          Afghanistan   2005       0.06     5    0.22 |
                  2. |          Afghanistan   2006       0.12     5    0.22 |
                  3. |          Afghanistan   2007       0.21     5    0.22 |
                  4. |          Afghanistan   2008       0.30     5    0.22 |
                  5. |          Afghanistan   2009       0.45     5    0.22 |
                     |------------------------------------------------------|
                  6. |              Albania   2005       9.26     5   20.81 |
                  7. |              Albania   2006      13.87     5   20.81 |
                  8. |              Albania   2007      19.36     5   20.81 |
                  9. |              Albania   2008      28.63     5   20.81 |
                 10. |              Albania   2009      32.94     5   20.81 |
                     |------------------------------------------------------|
                 11. |               Angola   2005       1.46     5    4.72 |
                 12. |               Angola   2006       3.06     5    4.72 |
                 13. |               Angola   2007       4.43     5    4.72 |
                 14. |               Angola   2008       6.28     5    4.72 |
                 15. |               Angola   2009       8.39     5    4.72 |
                     |------------------------------------------------------|
                 16. |           Arab World   2005       6.54     5   11.99 |
                 17. |           Arab World   2006       9.52     5   11.99 |
                 18. |           Arab World   2007      14.11     5   11.99 |
                 19. |           Arab World   2008      13.79     5   11.99 |
                 20. |           Arab World   2009      16.01     5   11.99 |
                     |------------------------------------------------------|
                 21. |            Argentina   2005       2.09     5    6.73 |
                 22. |            Argentina   2006       2.87     5    6.73 |
                 23. |            Argentina   2007       4.94     5    6.73 |
                 24. |            Argentina   2008       9.96     5    6.73 |
                 25. |            Argentina   2009      13.82     5    6.73 |
                     |------------------------------------------------------|
                 26. |              Armenia   2005       4.95     5   16.10 |
                 27. |              Armenia   2006       7.70     5   16.10 |
                 28. |              Armenia   2007      14.48     5   16.10 |
                 29. |              Armenia   2008      23.12     5   16.10 |
                 30. |              Armenia   2009      30.24     5   16.10 |
                     |------------------------------------------------------|
                 31. | United Arab Emirates   2005      42.10     5   42.62 |
                 32. | United Arab Emirates   2006      38.28     5   42.62 |
                 33. | United Arab Emirates   2007      39.42     5   42.62 |
                 34. | United Arab Emirates   2008      40.48     5   42.62 |
                 35. | United Arab Emirates   2009      52.79     5   42.62 |
                     +------------------------------------------------------+

                Comment


                • #9
                  Thank you very much, sir, for this great hep. Now it is really working for me.

                  Sir, to an extension to this..
                  Here, we have done only for one variable, but I have FOUR different variables (X1 X2 X3 X4)

                  Then I want to merge ALL FOUR variables based on the COUNTRY to get an INDEX.

                  This INDEX is an Average of ALL the FOUR Variables.

                  Thanks a lot Sir.

                  Comment


                  • #10
                    So, your rename and reshape will be more complicated, but with no precise details on your real set-up I can't suggest specific code.

                    Comment


                    • #11
                      Thank you very much sir

                      I will get back to you with extended derails..

                      Comment

                      Working...
                      X