  • How can I create a variable that measures the number of times a data is repeated?

    Hi! I need to create 5 variables that measure the number of years a country had democracy between 1800 and each of 5 end dates: 1925, 1950, 1975, 2000, and November 13, 1999. (The variable that measures the number of years of democracy is dem.)

  • #2
    There are many ways your data could be organized and match your description, and the code would be different for each. Without example data, there is no possibility of answering this question except with vague, unhelpful generalities, or a wild guess that has some small chance of being right. Please post back showing example data, and using the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    • #3
      Hi Clyde! Thank you for your answer. I tried, but Stata says "input statement exceeds linesize limit. Try specifying fewer variables".

      • #4
        So pick the variable dem and the other variable(s) that show the year(s) and run -dataex that_list_of_variables-, replacing that_list_of_variables with the variables you picked.

        • #5
          Hi Clyde! Thank you again. This is what Stata shows with the variable democ (it wasn't dem).
          I attached the dataset in case it helps.
          dataex democ

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte democ
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          1
          end
          ------------------ copy up to and including the previous line ------------------

          Listed 100 out of 16727 observations
          Use the count() option to list more

          .
          end of do-file

          • #6
            The -dataex- of just one variable is not helpful. And as for the attached data set, I am one of many here who will not download anything from a stranger. So I'm still left to imagine what your actual data set looks like. Here is what I imagine:
            1. The variable democ is coded 1 for democracy and 0 otherwise (although as I have written the code, only the 1 is critical, and the 0 could actually be anything but 1).
            2. There is a variable, called country, which identifies each country in the data.
            3. For each country there are multiple observations, each corresponding to a single year. The single variable identifying the year is called year.
            4. There are no gaps in the years from 1800 on--every year is represented for each country after that.
            5. Each combination of country and year occurs only once: there are no two observations having the same country and the same year.
            If these assumptions are all true, then the following code will give you most of what you ask for in #1:
            Code:
            foreach y of numlist 1925 1950 1975 1999 2000 {
                by country, sort: egen democ_1800_to_`y' = total(cond(inrange(year, 1800, `y'), democ == 1, .))
            }
            You should verify all 5 assumptions before using this code, because if any of them are not true, either the code will not run at all, or, worse, will run and give incorrect results.
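
            One possible way to check those assumptions is sketched below. It is only an illustration: the toy data are invented, and the variable names country, year and democ are the assumed names from the list above, not names confirmed in the real data set. Each check stops with an error if the assumption fails.

            ```stata
            * Toy data, invented for illustration only
            clear
            input str10 country int year byte democ
            "A" 1800 1
            "A" 1801 0
            "A" 1802 1
            "B" 1800 0
            "B" 1801 1
            "B" 1802 1
            end

            * Assumption 1: democ takes only the values 0 and 1
            assert inlist(democ, 0, 1)

            * Assumptions 2, 3 and 5: country and year exist and together
            * identify each observation uniquely
            isid country year

            * Assumption 4: within each country, consecutive observations
            * are exactly one year apart (no gaps)
            by country (year), sort: assert year == year[_n-1] + 1 if _n > 1

            * With the checks passed, the same loop works on the toy data
            * (end years 1801 and 1802 are chosen only to fit the toy data)
            foreach y of numlist 1801 1802 {
                by country, sort: egen democ_1800_to_`y' = total(cond(inrange(year, 1800, `y'), democ == 1, .))
            }
            ```

            In this toy example both countries get a count of 1 for 1800 to 1801 and a count of 2 for 1800 to 1802. If the first -assert- fails on real data, -tabulate democ, missing- would show which other values are present.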

            The one part I could not imagine how to handle is November 13, 1999. It would not be possible to represent a single day in a variable that is otherwise years, and it is difficult for me to imagine that you have actual daily data going back to 1800 on any, let alone many, countries. So I just included a variable that counts the number of democracy years 1800 to 1999 for that.

            In terms of using -dataex-, what I was hoping you would do, if your data set met assumptions 1, 2, and 3 was -dataex country year democ- so that all of the variables relevant to the calculation you want would be shown.

            If you find that the 5 assumptions I have laid out are not true of your data, then it will be critical to post back with a -dataex- created example that does include all relevant variables (for example, if there are multiple variables showing the years and we have to somehow piece them together into a single variable to get the year, show all of them). Be sure also in posting the -dataex- example to choose the example observations to illustrate the failures to meet the five assumptions, and explain what the actual situation is.

            • #7
              Thanks Clyde, now I understand what you are asking for. You are right: it is not November 13, 1999; it is just 1999. Here is the dataex with the information:
              Year: goes from 1800 to 2013
              Country: there are more than 100
              Democ: it's strange. The readme file says: "2.1 DEMOC (all versions) Institutionalized Democracy: Democracy is conceived as three essential, interdependent elements. One is the presence of institutions and procedures through which citizens can express effective preferences about alternative policies and leaders. Second is the existence of institutionalized constraints on the exercise of power by the executive. Third is the guarantee of civil liberties to all citizens in their daily lives and in acts of political participation. Other aspects of plural democracy, such as the rule of law, systems of checks and balances, freedom of the press, and so on are means to, or specific manifestations of, these general principles. We do not include coded data on civil liberties. The Democracy indicator is an additive eleven-point scale (0-10). The operational indicator of democracy is derived from codings of the competitiveness of political participation (variable 2.6), the openness and competitiveness of executive recruitment (variables 2.3 and 2.2), and constraints on the chief executive (variable 2.4) using the following weights."

              However, democ goes from -88 to 10
              sum democ

                  Variable |        Obs        Mean    Std. dev.       Min        Max
              -------------+---------------------------------------------------------
                     democ |     16,727   -.2173133     17.5181        -88         10

              .
              end of do-file



              dataex year country democ

              ----------------------- copy starting from the next line -----------------------
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input int year str24 country byte democ
              1800 "Afghanistan" 1
              1801 "Afghanistan" 1
              1802 "Afghanistan" 1
              1803 "Afghanistan" 1
              1804 "Afghanistan" 1
              1805 "Afghanistan" 1
              1806 "Afghanistan" 1
              1807 "Afghanistan" 1
              1808 "Afghanistan" 1
              1809 "Afghanistan" 1
              1810 "Afghanistan" 1
              1811 "Afghanistan" 1
              1812 "Afghanistan" 1
              1813 "Afghanistan" 1
              1814 "Afghanistan" 1
              1815 "Afghanistan" 1
              1816 "Afghanistan" 1
              1817 "Afghanistan" 1
              1818 "Afghanistan" 1
              1819 "Afghanistan" 1
              1820 "Afghanistan" 1
              1821 "Afghanistan" 1
              1822 "Afghanistan" 1
              1823 "Afghanistan" 1
              1824 "Afghanistan" 1
              1825 "Afghanistan" 1
              1826 "Afghanistan" 1
              1827 "Afghanistan" 1
              1828 "Afghanistan" 1
              1829 "Afghanistan" 1
              1830 "Afghanistan" 1
              1831 "Afghanistan" 1
              1832 "Afghanistan" 1
              1833 "Afghanistan" 1
              1834 "Afghanistan" 1
              1835 "Afghanistan" 1
              1836 "Afghanistan" 1
              1837 "Afghanistan" 1
              1838 "Afghanistan" 1
              1839 "Afghanistan" 1
              1840 "Afghanistan" 1
              1841 "Afghanistan" 1
              1842 "Afghanistan" 1
              1843 "Afghanistan" 1
              1844 "Afghanistan" 1
              1845 "Afghanistan" 1
              1846 "Afghanistan" 1
              1847 "Afghanistan" 1
              1848 "Afghanistan" 1
              1849 "Afghanistan" 1
              1850 "Afghanistan" 1
              1851 "Afghanistan" 1
              1852 "Afghanistan" 1
              1853 "Afghanistan" 1
              1854 "Afghanistan" 1
              1855 "Afghanistan" 1
              1856 "Afghanistan" 1
              1857 "Afghanistan" 1
              1858 "Afghanistan" 1
              1859 "Afghanistan" 1
              1860 "Afghanistan" 1
              1861 "Afghanistan" 1
              1862 "Afghanistan" 1
              1863 "Afghanistan" 1
              1864 "Afghanistan" 1
              1865 "Afghanistan" 1
              1866 "Afghanistan" 1
              1867 "Afghanistan" 1
              1868 "Afghanistan" 1
              1869 "Afghanistan" 1
              1870 "Afghanistan" 1
              1871 "Afghanistan" 1
              1872 "Afghanistan" 1
              1873 "Afghanistan" 1
              1874 "Afghanistan" 1
              1875 "Afghanistan" 1
              1876 "Afghanistan" 1
              1877 "Afghanistan" 1
              1878 "Afghanistan" 1
              1879 "Afghanistan" 1
              1880 "Afghanistan" 1
              1881 "Afghanistan" 1
              1882 "Afghanistan" 1
              1883 "Afghanistan" 1
              1884 "Afghanistan" 1
              1885 "Afghanistan" 1
              1886 "Afghanistan" 1
              1887 "Afghanistan" 1
              1888 "Afghanistan" 1
              1889 "Afghanistan" 1
              1890 "Afghanistan" 1
              1891 "Afghanistan" 1
              1892 "Afghanistan" 1
              1893 "Afghanistan" 1
              1894 "Afghanistan" 1
              1895 "Afghanistan" 1
              1896 "Afghanistan" 1
              1897 "Afghanistan" 1
              1898 "Afghanistan" 1
              1899 "Afghanistan" 1
              end

              • #8
                However, democ goes from -88 to 10
                sum democ

                    Variable |        Obs        Mean    Std. dev.       Min        Max
                -------------+---------------------------------------------------------
                       democ |     16,727   -.2173133     17.5181        -88         10

                .
                In some statistical software, it is conventional to represent missing values by magic numbers such as -77, -88, -99 and the like. Often all of these numbers will be used, designating different reasons for the missingness (not applicable, no response obtained, inconsistent response, etc.). I suspect that accounts for the negative numbers, especially if they are just things like -77 and -88. It is less clear, however, how to interpret the 0 to 10 score. It sounds like they assessed four different aspects of each country and scored them, and then took some kind of weighted sum. But they do not say, in the text you show, whether higher scores are more democratic or less democratic. Nor do they suggest a cutoff above which a country is "democratic" and below which it is "undemocratic." (Not that using such a cutoff would necessarily be a good thing to do--probably not, in fact.)

                But all of this suggests that you need to find the documentation that describes the original source of this data and how it was put together. Perhaps the information is present in the same document from which you showed that quote and you just need to read a bit further to learn how this variable is to be used.

                From a general statistical perspective, taking a multi-valued score like this and imposing a cutoff is usually a bad idea that adds noise to the data and throws away useful information. So probably you will want to rethink your plan, and rather than adding up the number of years in which each country was "democratic" you might want to do something like look at the average democracy score over those same time intervals. Before calculating that, you need to deal with the magic number coding of missing values (assuming that's what those negative values are). The code would probably look like this:

                Code:
                replace democ = . if democ < 0 // MAGIC NUMBER MISSING VALUES DON'T WORK WELL IN STATA
                
                foreach y of numlist 1925 1950 1975 1999 2000 {
                    by country, sort: egen mean_democ_score_1800_to_`y' = mean(cond(inrange(year, 1800, `y'), democ, .))
                }
                But don't proceed until you have clarified that those negative numbers really are codes for missing values and have an understanding about how the democracy score itself is supposed to be used.
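
                As a rough sketch of how that clarification might start, the code below first tabulates the negative values to see which ones actually occur, then recodes them to extended missing values with -mvdecode- so that different codes stay distinguishable, and finally computes a mean score. The toy data are invented, the codes -88 and -77 and their pairing with .a and .b are assumptions for illustration, and the end year 1801 is chosen only to fit the toy data.

                ```stata
                * Toy data, invented for illustration only;
                * -88 and -77 stand in for suspected missing-value codes
                clear
                input str10 country int year byte democ
                "A" 1800 4
                "A" 1801 -88
                "A" 1802 6
                "B" 1800 -77
                "B" 1801 2
                "B" 1802 10
                end

                * See which negative values occur and how often
                tabulate democ if democ < 0

                * Recode each code to its own extended missing value
                * (which code maps to .a or .b is arbitrary here)
                mvdecode democ, mv(-88=.a \ -77=.b)

                * Mean score per country over 1800-1801; egen's mean()
                * ignores missing values, including .a and .b
                by country, sort: egen mean_democ_1800_to_1801 = mean(cond(inrange(year, 1800, 1801), democ, .))
                ```

                In the toy data, country A gets a mean of 4 and country B a mean of 2, because the recoded years drop out of the average. Whether dropping them is appropriate depends on what the codes mean in the source documentation.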
                Last edited by Clyde Schechter; 02 Sep 2022, 17:45.
