  • reshape issue

    I am trying to reshape variables from long to wide. I first tried:

    . reshape wide i(pop_WDI_PW - ri_out_BZ) j(year)
    option i() required

               long                                wide
    +---------------+                   +------------------+
    | i   j   a   b |                   | i   a1 a2  b1 b2 |
    |---------------| <--- reshape ---> |------------------|
    | 1   1   1   2 |                   | 1   1   3   2  4 |
    | 1   2   3   4 |                   | 2   5   7   6  8 |
    | 2   1   5   6 |                   +------------------+
    | 2   2   7   8 |
    +---------------+

    long to wide: reshape wide a b, i(i) j(j) (j existing variable)
    wide to long: reshape long a b, i(i) j(j) (j new variable)
    r(198);

    . dataex
    input statement exceeds linesize limit. Try specifying fewer variables

    When I then tried to produce a dataex example, I was told that there were too many variables. So I selected only one variable in reshape and still got an error message:

    . reshape wide, i(pop_WDI_PW) j(year)
    invalid syntax
    In the reshape command that you typed, you omitted the word wide or long, or substituted some other word for it. You should have typed

    . reshape wide varlist, ...
    or
    . reshape long varlist, ...

    You might have omitted varlist, too. The basic syntax of reshape is

               long                                wide
    +---------------+                   +------------------+
    | i   j   a   b |                   | i   a1 a2  b1 b2 |
    |---------------| <--- reshape ---> |------------------|
    | 1   1   1   2 |                   | 1   1   3   2  4 |
    | 1   2   3   4 |                   | 2   5   7   6  8 |
    | 2   1   5   6 |                   +------------------+
    | 2   2   7   8 |
    +---------------+

    long to wide: reshape wide a b, i(i) j(j) (j existing variable)
    wide to long: reshape long a b, i(i) j(j) (j new variable)
    r(198);

    . dataex
    input statement exceeds linesize limit. Try specifying fewer variables
    r(1000);

    What am I doing wrong? Many thanks for any help.

    Ric

  • #2
    This is really hard to read. No data example and no CODE formatting. But what Stata is objecting to is fairly clear:


    Code:
    reshape wide i(pop_WDI_PW - ri_out_BZ) j(year)
    reshape wide expects to see a comma followed by i() and j() options. The comma is missing.

    reshape wide also expects to see stubnames but there aren't any.

    There is an overarching problem: if you get two or more things wrong in a command, sometimes the parser in Stata doesn't see it your way and misreads what you said. As this appears to be some very large percent of politics too, the broad problem should be familiar, except that Stata has no idea of deceit, evasion or lies.

    I imagine that your command should start and end

    Code:
    reshape wide
    Code:
    j(year)
    but I can't easily guess what should be in the middle. The identifier, judging from your recent posts, is possibly a state or county or other area identifier.

    I am broadly familiar with dataex as its second author. It does try to hint what you're doing wrong.

    Code:
    input statement exceeds linesize limit. Try specifying fewer variables
    r(1000);
    Let's suppose you have electoral data for certain major parties in a certain country in different election years. Then a start might well be something like

    Code:
    reshape wide dem rep, i(county) j(year)
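
    To make that concrete, here is a minimal sketch on invented toy data. The variable names county, dem and rep follow the hypothetical election example above, and every value is made up; nothing here comes from the poster's dataset. It is meant to be run from a do-file.

    ```stata
    * Toy data in long layout: two counties, two election years (values invented)
    clear
    input county year dem rep
    1 2012 40 60
    1 2016 45 55
    2 2012 52 48
    2 2016 50 50
    end

    * dem and rep are the stubnames; the comma precedes the i() and j() options
    reshape wide dem rep, i(county) j(year)

    * One row per county, with dem2012 rep2012 dem2016 rep2016
    list
    ```

    The stubnames go before the comma, and i() and j() go after it.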


    • #3
      Here is what I wrote now for just one variable for data by year and country (trying to ignore the other variables). I am trying to get variables for pop_WDI_PW for each of the years. The data are to be organized by country.

      reshape wide pop_WDI_PW ,i(country) j(year)

      but this is what I got:

      (note: j = 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838
      > 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879
      > 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 192
      > 0 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 19
      > 61 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2
      > 002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017)
      variable gdp_WDI_PW_PW not constant within country
      variable gdppc_WDI_PW_PW not constant within country
      variable growth_WDI_PW_PW not constant within country
      variable lnpop_WDI_PW_PW not constant within country
      variable lngdp_WDI_PW_PW not constant within country
      variable lngdppc_WDI_PW_PW not constant within country
      variable rank_BZ not constant within country
      variable rank_sb_BZ not constant within country
      variable sb_proc_BZ not constant within country
      variable sb_time_BZ not constant within country
      variable sb_cost_BZ not constant within country
      variable sb_paid_BZ not constant within country
      variable rank_cp_BZ not constant within country
      variable cp_proc_BZ not constant within country
      variable cp_time_BZ not constant within country
      variable cp_cost_BZ not constant within country
      variable rank_ge_BZ not constant within country
      variable ge_proc_BZ not constant within country
      variable ge_time_BZ not constant within country
      variable ge_cost_BZ not constant within country
      variable rank_rp_BZ not constant within country
      variable rp_proc_BZ not constant within country
      variable rp_time_BZ not constant within country
      variable rp_cost_BZ not constant within country
      variable rank_gc_BZ not constant within country
      variable _BZgc_legal not constant within country
      variable gc_legal_old_BZ not constant within country
      variable gc_info_BZ not constant within country
      variable gc_info_old_BZ not constant within country
      variable gc_pub_BZ not constant within country
      variable gc_priv_BZ not constant within country
      variable rank_pi_BZ not constant within country
      variable pi_ip_BZ not constant within country
      variable pi_ip_old_BZ not constant within country
      variable pi_disc_BZ not constant within country
      variable pi_dl_BZ not constant within country
      variable pi_shsu_BZ not constant within country
      variable pi_shsu_old_BZ not constant within country
      variable rank_pt_BZ not constant within country
      variable pt_pay_BZ not constant within country
      variable pt_time_BZ not constant within country
      variable pt_tot_BZ not constant within country
      variable pt_prof_BZ not constant within country
      variable pt_lab_BZ not constant within country
      variable pt_other_BZ not constant within country
      variable rank_tr_BZ not constant within country
      variable tr_exdo_old_BZ not constant within country
      variable tr_exti_old_BZ not constant within country
      variable tr_exco_old_BZ not constant within country
      variable tr_imdo_BZ not constant within country
      variable tr_imti_old_BZ not constant within country
      variable tr_imco_old_BZ not constant within country
      variable rank_ec_BZ not constant within country
      variable ec_time_BZ not constant within country
      variable ec_cost_BZ not constant within country
      variable rank_ri_BZ not constant within country
      variable ri_reco_BZ not constant within country
      variable ri_time_BZ not constant within country
      variable ri_cost_BZ not constant within country
      variable ri_out_BZ not constant within country
      variable edu_rd_BTI not constant within country
      variable frac_DPI not constant within country
      variable oppfrac_DPI not constant within country
      variable system_DPI not constant within country
      variable gov1nat_DPI not constant within country
      variable sr_FHP not constant within country
      variable fragment_P4 not constant within country
      variable democ_P4 not constant within country
      variable polity_P4 not constant within country
      variable polity2_P4 not constant within country
      variable change_P4 not constant within country
      variable exec_crpt_WJP not constant within country
      variable judic_crpt_WJP not constant within country
      variable milt_crpt_WJP not constant within country
      variable legis_crpt_WJP not constant within country
      variable crime_ctrl_WJP not constant within country
      variable gov_reg_WJP not constant within country
      variable due_admin_WJP not constant within country
      variable cj_disc_WJP not constant within country
      variable cj_crpt_WJP not constant within country
      variable cj_enfor_WJP not constant within country
      variable corrupprev_SGI not constant within country
      variable ti_cpi_TI not constant within country
      variable tgpm_WE not constant within country
      variable v2xcl_rol_VDEM not constant within country
      variable VA_EST_WGI not constant within country
      variable PV_EST_WGI not constant within country
      variable GE_EST_WGI not constant within country
      variable RL_EST_WGI not constant within country
      variable CC_EST_WGI not constant within country
      variable lh_1519_BL not constant within country
      variable lh_2024_BL not constant within country
      variable lh_25999_BL not constant within country
      variable lh_long_BL not constant within country
      variable lh_F_1519_BL not constant within country
      variable lh_F_2024_BL not constant within country
      variable lh_F_25999_BL not constant within country
      variable lh_F_long_BL not constant within country
      variable lpc_1519_BL not constant within country
      variable lpc_25999_BL not constant within country
      variable lpc_F_1519_BL not constant within country
      variable lpc_F_25999_BL not constant within country
      variable ls_1519_BL not constant within country
      variable ls_25999_BL not constant within country
      variable ls_long_BL not constant within country
      variable ls_F_1519_BL not constant within country
      variable ls_F_25999_BL not constant within country
      variable ls_F_long_BL not constant within country
      variable lsc_1519_BL not constant within country
      variable lsc_2024_BL not constant within country
      variable lsc_25999_BL not constant within country
      variable lsc_F_1519_BL not constant within country
      variable lsc_F_2024_BL not constant within country
      variable lsc_F_25999_BL not constant within country
      variable lu_1519_BL not constant within country
      variable lu_25999_BL not constant within country
      variable lu_F_1519_BL not constant within country
      variable lu_F_25999_BL not constant within country
      variable lu_F_long_BL not constant within country
      variable lu_long_BL not constant within country
      variable pri_F_long_BL not constant within country
      variable pri_long_BL not constant within country
      variable sec_F_long_BL not constant within country
      variable sec_long_BL not constant within country
      variable syr_F_long_BL not constant within country
      variable syr_long_BL not constant within country
      variable tyr_F_long_BL not constant within country
      variable tyr_long_BL not constant within country
      variable yr_sch_1519_BL not constant within country
      variable yr_sch_25999_BL not constant within country
      variable yr_sch_F_1519_BL not constant within country
      variable yr_sch_F_25999_BL not constant within country
      variable yr_sch_sec_1519_BL not constant within country
      variable yr_sch_sec_25999_BL not constant within country
      variable yr_sch_sec_F_1519_BL not constant within country
      variable yr_sch_sec_F_25999_BL not constant within country
      variable chrpct_RCS not constant within country
      variable jewpct_RCS not constant within country
      variable muspct_RCS not constant within country
      variable hinpct_RCS not constant within country
      variable budpct_RCS not constant within country
      variable eacomppt_RCS not constant within country
      variable cnfpct_RCS not constant within country
      variable notrelpt_RCS not constant within country
      variable theftcars_rate_ODC not constant within country
      variable burg_rate_ODC not constant within country
      variable homicide_rate_ODC not constant within country
      variable hommicidempc_rate_ODC not constant within country
      variable mvtheft_rate_ODC not constant within country
      variable theft_rate_ODC not constant within country
      variable ptratio_seced_ES not constant within country
      Your data are currently long. You are performing a reshape wide. You typed something like

      . reshape wide a b, i(country) j(year)

      There are variables other than a, b, country, year in your data. They must be constant within country because that is the only way they can fit into wide data without loss of information.

      The variable or variables listed above are not constant within country. Perhaps the values are in error. Type reshape error for a list of the problem observations.

      Either that, or the values vary because they should vary, in which case you must either add the variables to the list of xij variables to be reshaped, or drop them.
      r(9);


      Stata won't let me ignore the other variables.


      • #4
        Sorry, but you have to choose. If you want a wide layout for pop_WDI_PW then either you must drop the other variables or you must reshape them too.

        reshape is being reasonable here. It just wants you to choose.

        At this point this looks like the XY problem. http://xyproblem.info/

        Why is it that you want the wide layout in the first place? I can think of a few good reasons for it and many more reasons against it.

        If you do want it, it's hard to see that you profit much from reshaping all of those variables. If I am wrong about that, then you need to include their names in the command. The number of variables in the dataset is going to go up by thousands: each variable (apart from the country identifier) will be mapped to 218 new variables, one for each year from 1800 to 2017.
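
        The two choices can be sketched on invented toy data as follows. The names country, year, pop and gdp are placeholders standing in for the real identifier and for variables such as pop_WDI_PW; the values are made up.

        ```stata
        * Toy long data: two countries, two years, two time-varying variables
        clear
        input country year pop gdp
        1 2000 10 100
        1 2001 11 110
        2 2000 20 200
        2 2001 21 210
        end

        * Choice 1: drop what you do not want in wide form, then reshape
        preserve
        keep country year pop
        reshape wide pop, i(country) j(year)
        list
        restore

        * Choice 2: name every time-varying variable as a stub
        reshape wide pop gdp, i(country) j(year)
        list
        ```

        Leaving gdp in the data while reshaping only pop would reproduce the "not constant within country" error, because gdp varies by year.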


        • #5
          I do want to reshape all of the variables if Stata will let me. I want each variable by country for each year. I am not looking for reasons not to do this. It should be possible to have the data organized as follows:

          For each year, the country and all of the other variables: For each year X, the data should be: country var1X through var219X in separate data sets (at least ultimately)


          • #6
            As said, you can do it. You just need to make that explicit. You don't need to type all the variable names.

            Code:
            ds country year, not
            reshape wide `r(varlist)', i(country) j(year)
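
            A small sketch of what those two lines do, on invented toy data (the names pop, gdp and growth are placeholders). The only addition is that the list left behind by ds is copied into a local macro, here called xij, so that it can be inspected before it is passed to reshape.

            ```stata
            * Toy long data (values invented)
            clear
            input country year pop gdp growth
            1 2000 10 100 1.5
            1 2001 11 110 1.6
            2 2000 20 200 2.5
            2 2001 21 210 2.6
            end

            * List every variable except country and year; ds leaves the list in r(varlist)
            ds country year, not

            * Copy the list into a local macro and look at it
            local xij `r(varlist)'
            display "`xij'"

            * Reshape all of those variables at once
            reshape wide `xij', i(country) j(year)
            describe
            ```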


            • #7
              Thanks Nick. Worked very well!


              • #8
                Sorry, one more question. I want to get files for each year, so I typed this and got:

                . foreach num of numlist 1900/2017 {
                2. save wepwide'num'
                3. }
                invalid '''
                r(198);

                end of do-file

                r(198);




                • #9
                  Code:
                   save wepwide'num'
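                  uses the same quotation mark on both sides of num. A local macro reference needs a left quotation mark (the backtick) before the name and a right quotation mark (the apostrophe) after it: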
                  Code:
                  ` '
                  Note: with nothing else said, this saves 118 identical copies of the current dataset. Presumably you omitted the code that makes a difference.
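
                  As a quick check that does not write any files, a loop like the following (a sketch with a shortened year range) shows the difference between the two kinds of quotation marks:

                  ```stata
                  foreach num of numlist 1900/1902 {
                      * Backtick before and apostrophe after: the macro is expanded
                      display "wepwide`num'"
                      * Apostrophe on both sides: the text is taken literally
                      display "wepwide'num'"
                  }
                  ```

                  The first display shows wepwide1900, wepwide1901 and wepwide1902 in turn; the second shows wepwide'num' each time.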


                  • #10
                    How do I get a data set for each year?


                    • #11
                      I have to strain to see your dataset from here but I imagine that what you seek is something like

                      Code:
                      forval y = 1900/2017 {
                            use main, clear
                            keep country *’y’
                            save wepwide’y’
                      }
                      where, crucially,

                      1. I can test nothing at the moment.

                      2. I am imputing a name main for your main dataset and yours will presumably differ.

                      3. I use ‘ ‘ around local macros because I am using a phone and don’t know a way to get the left quotation mark.





                      • #12
                        Thanks, one more question: what is y?


                        • #13
                          The name of the local macro in the loop I defined.


                          • #14
                            OK, understood, but is there any way to get a data set for each year rather than for the entire set of years (e.g., one data set for 1901, one for 1902, etc.)?


                            • #15
                              That is precisely what you asked in #10 and what was coded up in #11. I am using a computer now, so can fix the quotation marks in the code.


                              Code:
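                              forval y = 1900/2017 {
                                   * go back to the full wide dataset (main is an assumed name)
                                   use main, clear
                                   * keep the identifier plus every variable whose name ends in this year
                                   keep country *`y'
                                   * one file per year, e.g. wepwide1952
                                   save wepwide`y'
                              }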
                              To understand why it is a little complicated, reading the help for save is a good idea, even if you think you know the command. save doesn't allow specification of variables. Therefore you need to keep what you want beforehand (or equivalently drop what you don't want, which isn't easier in this case). For 1952, or whatever, you need all the variables whose names end in 1952, or whatever, and then you will always want country, or so I presume.

                              Once you have done that, you need to go back to the original dataset to be able to save the next lot.

                              There is savesome (SSC), but that just automates part of this, and can't be faster.

                              All that said, I will say just once (and no more) that this seems an odd thing to do, but I am explaining what you are asking, which is your business. (There might be a teaching need to give students different datasets, but even then giving all the students the same dataset and telling them which year to use would be much simpler for you.)
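
                              For anyone who wants to try the whole sequence without touching real files, here is a self-contained sketch on invented toy data. The file names main_toy and wepwide_toy are made up for the illustration, and replace is added so that the do-file can be rerun.

                              ```stata
                              * Toy long data (values invented)
                              clear
                              input country year pop gdp
                              1 2000 10 100
                              1 2001 11 110
                              2 2000 20 200
                              2 2001 21 210
                              end

                              * Long to wide, then save the wide dataset once
                              reshape wide pop gdp, i(country) j(year)
                              save main_toy, replace

                              * One file per year: reload, keep that year's variables, save
                              forval y = 2000/2001 {
                                   use main_toy, clear
                                   keep country *`y'
                                   save wepwide_toy`y', replace
                              }

                              * Check one of the results: country pop2001 gdp2001
                              use wepwide_toy2001, clear
                              describe
                              ```

                              Note that the wildcard *`y' keeps any variable whose name happens to end in those digits, so it relies on the year suffix being the only thing that ends a name that way.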
                              Last edited by Nick Cox; 10 Oct 2019, 07:06.
