Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape Long

    Good Evening Statalist,

    I'm trying to take the values of the following variables, and stack them into one column.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(v1 v2 v3 v4 v5 v6 v7 v8 v9 v10)
     98583.65  2689.404  39363.97  956.384 198349.94  19176.32  41658.11 14610875  2956.389  31561.92
    102967.47  1425.272 66995.945  1043.32 173146.16 12588.352  20920.59 15258652   397.702 12350.628
     83608.82   787.546 23464.066  972.375  166769.4 16800.342  30197.59 16721873   421.001  41698.17
    106049.66   682.952 15559.797  527.214 191980.97  12614.84  14221.04 18962966  1292.407 36579.727
    113240.53  1637.298  25359.62  232.069  273081.7 12200.224  74230.52 22733624   591.017 18066.707
     104162.4   572.742  88162.01  667.438  307729.8  72783.21 179674.44 26220594   484.496  34124.22
    155877.78   633.475 16615.023 1107.778    377619 27026.953 253609.83 28031248   830.957  27532.68
    113431.47   5120.94  24307.29  837.954 376820.75 26385.254  188838.2 27193410    44.141   43695.1
    163516.67 32832.684 140704.28  463.761 526008.06  27973.33 219713.38 32072884  2121.059  11183.41
     108074.3   474.194 13613.933  219.972  522529.9  26076.73 132737.53 21017804   403.216  6264.399
    130121.13   301.185  23073.35  406.481  694166.9  26864.92  170456.1 25009190  1477.915  19216.52
    199976.23  2018.589  60304.96  543.454  915072.3  51573.54 294171.88 32838644  6083.098  6703.542
     161832.1   2673.58  83778.19  568.747 1682149.4   51579.9 221039.06 32123374 19452.854 13451.702
    127129.43  3128.498   60814.6  538.582   1251391   57733.7 277714.44 27630990 17739.893  6361.995
     103531.1  5534.069 37125.605  306.482   1943178  86158.93  323786.1 31607114 14304.836  11046.25
     81461.02  5111.324 109290.13 3621.451   1014381  81000.57 220334.34 28517866 16154.084 16795.736
     58080.41  7272.705  25207.12 4272.858  983769.9  78399.47  444441.1 27057612  12222.92  31385.52
    end
    I actually have over two-hundred similar variables. I would like to stack them into one column. I tried to use the
    Code:
    reshape long
    command, and tried to follow some of the posts here, but I keep getting invalid syntax. I'm not sure what I'm doing wrong. I want them into one long column, because the data is ordered by county and years. (For example, the first column are values for Country A between 2000-2016). Thank you in advance.

  • #2
    The obstacle to using -reshape long- here is that you have nothing to put into the -i()- option. You need a variable that uniquely identifies the observations. So you just create one, and then you can use -reshape long-.

    Code:
    gen original_obs_no = _n
    reshape long v, i(original_obs_no) j(_j)
    At that point it will probably be important for you to in some way make the new variable _j, which ranges from 1 to 10 (or 1 to 200 in your real data) indicate which countries and years correspond to which values of _j, unless that information is actually not of importance for your later analyses.

    Added: By the way, in Stata we do not speak of rows and columns we speak of observations and variables. If you think of a Stata data set as a spreadsheet and try to work with it as if it were one, you will get yourself into trouble, because the underlying data models and approaches are quite different. Your instincts and habits acquired from using spreadsheets are generally not helpful in Stata and sometimes even get in the way. In order to keep your mind focused on Stata when you are using Stata, it is better to avoid using spreadsheet terminology. The more you stick to Stata-appropriate terminology, the less likely you are to mess things up with spreadsheet-think.
    Last edited by Clyde Schechter; 27 May 2018, 21:45.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      The obstacle to using -reshape long- here is that you have nothing to put into the -i()- option. You need a variable that uniquely identifies the observations. So you just create one, and then you can use -reshape long-.

      Code:
      gen original_obs_no = _n
      reshape long v, i(original_obs_no) j(_j)
      At that point it will probably be important for you to in some way make the new variable _j, which ranges from 1 to 10 (or 1 to 200 in your real data) indicate which countries and years correspond to which values of _j, unless that information is actually not of importance for your later analyses.

      Added: By the way, in Stata we do not speak of rows and columns we speak of observations and variables. If you think of a Stata data set as a spreadsheet and try to work with it as if it were one, you will get yourself into trouble, because the underlying data models and approaches are quite different. Your instincts and habits acquired from using spreadsheets are generally not helpful in Stata and sometimes even get in the way. In order to keep your mind focused on Stata when you are using Stata, it is better to avoid using spreadsheet terminology. The more you stick to Stata-appropriate terminology, the less likely you are to mess things up with spreadsheet-think.
      Hello Clyde,
      Thank you so much. I agree with a lot of things you said.
      Thank you for helping me with the code, but what the code did was stack the horizontal lines. I don't think I was clear in my question. What I need is all the columns to be stacked into one column. So, for example, the first column ends with 58080.412. I need the next column (starting with 2689.404) right under it and so forth.

      Example:

      ​​​​​​98583.647
      102967.47
      83608.817
      106049.656
      [...]
      58080.412
      ​​​​​​2689.404
      1425.272
      787.546
      682.952
      [...]

      Comment


      • #4
        Maybe stack is what you're looking for.
        Code:
        help stack
        for more information.

        .ÿclearÿ*

        .ÿ
        .ÿinputÿbyteÿaÿb

        ÿÿÿÿÿÿÿÿÿÿÿÿaÿÿÿÿÿÿÿÿÿÿb
        ÿÿ1.ÿ1ÿ3
        ÿÿ2.ÿ2ÿ4
        ÿÿ3.ÿend

        .ÿ
        .ÿstackÿaÿb,ÿinto(c)ÿclear

        .ÿ
        .ÿlist,ÿnoobs

        ÿÿ+------------+
        ÿÿ|ÿ_stackÿÿÿcÿ|
        ÿÿ|------------|
        ÿÿ|ÿÿÿÿÿÿ1ÿÿÿ1ÿ|
        ÿÿ|ÿÿÿÿÿÿ1ÿÿÿ2ÿ|
        ÿÿ|ÿÿÿÿÿÿ2ÿÿÿ3ÿ|
        ÿÿ|ÿÿÿÿÿÿ2ÿÿÿ4ÿ|
        ÿÿ+------------+

        .ÿ
        .ÿexit

        endÿofÿdo-file


        .

        Comment


        • #5
          Or, building on the code I provided in #2, just follow that with
          Code:
          sort _j original_obs_no
          to get the order you are looking for.

          Comment


          • #6
            Originally posted by Clyde Schechter View Post
            Or, building on the code I provided in #2, just follow that with
            Code:
            sort _j original_obs_no
            to get the order you are looking for.
            Clyde,
            Thank you!
            I'm sorry for the silly questions, but I'm always trying to understand what the code does.

            Does the "v" in
            Code:
             reshape long v, i(original_obs_no) j(_j)
            mean all variables?

            And why does
            Code:
            sort _j original_obs_no
            get the job done by putting i and then j in that order?

            Comment


            • #7
              Originally posted by Monica White View Post
              Does the "v" in
              Code:
               reshape long v, i(original_obs_no) j(_j)
              mean all variables?
              It literally means "all variables with names starting with the letter 'v', immediately followed by a running number".
              If your variable names would have read "var_1 var_2 var_3 ...", you would have had to execute
              Code:
              reshape long var_ , i(...) (j...)
              There are more examples in -help reshape-.

              Originally posted by Monica White View Post
              And why does
              Code:
              sort _j original_obs_no
              get the job done by putting i and then j in that order?
              sort does not change the data's contents, just the ordering of observations in the dataset. See -help sort-.

              Regards
              Bela

              Comment


              • #8
                Originally posted by Daniel Bela View Post

                It literally means "all variables with names starting with the letter 'v', immediately followed by a running number".
                If your variable names would have read "var_1 var_2 var_3 ...", you would have had to execute
                Code:
                reshape long var_ , i(...) (j...)
                There are more examples in -help reshape-.



                sort does not change the data's contents, just the ordering of observations in the dataset. See -help sort-.

                Regards
                Bela
                Thank you, Bela!

                Comment


                • #9
                  Originally posted by Daniel Bela View Post

                  It literally means "all variables with names starting with the letter 'v', immediately followed by a running number".
                  If your variable names would have read "var_1 var_2 var_3 ...", you would have had to execute
                  Code:
                  reshape long var_ , i(...) (j...)
                  There are more examples in -help reshape-.



                  sort does not change the data's contents, just the ordering of observations in the dataset. See -help sort-.

                  Regards
                  Bela
                  This may be a bit of a dumb question, but what if the variables start with different letters. I have one where the variable names start from A through S.

                  I tried
                  Code:
                    reshape long A-S, i(original_obs_no) j(_j)
                  Obviously, not the right way to go.

                  Not sure if I missed something here: https://www.stata.com/manuals13/dreshape.pdf

                  Comment


                  • #10
                    There is no general solution to this situation. You have to customize the code to the particular variables you are working with.

                    The general principle is to mentally gather the variables that you need to -reshape long- and then parse them into stubs followed by suffixes. Then you have to list all of the stubs before the comma, and then specify a -j()- option that will receive the suffixes. The stubs are not, themselves, variable names, so you cannot list them using wildcards like *, ?, or the - notation for a range of variables. You have to spell them out.

                    Now, if there is a large number of them, there is usually a way to get Stata to build a list of those variables in a local macro, and then you can feed that local macro into -reshape-. So, if your variables are abc1 abc2 abc3 def1 def2 def3 ... xyz1 xyz2 xyz3, and if no other variables end in 1, you could do something like this:

                    Code:
                    //  PUT ALL THE VARIABLES ENDING IN 1 IN A LOCAL MACRO
                    ds *1
                    local stubs `r(varlist)'
                    
                    //  NOW STRIP THE 1s OFF TO GET THE ACTUAL STUBS
                    local stubs: subinstr local stubs "1" "", all
                    
                    //   READY FOR RESHAPE
                    reshape long `stubs', i(identifier_variable) j(_j)
                    But clearly this approach is idiosyncratic and requires specific code to deal with it. You have to deal with each such instance on its own terms, I'm afraid.

                    Comment


                    • #11
                      Originally posted by Clyde Schechter View Post
                      There is no general solution to this situation. You have to customize the code to the particular variables you are working with.

                      The general principle is to mentally gather the variables that you need to -reshape long- and then parse them into stubs followed by suffixes. Then you have to list all of the stubs before the comma, and then specify a -j()- option that will receive the suffixes. The stubs are not, themselves, variable names, so you cannot list them using wildcards like *, ?, or the - notation for a range of variables. You have to spell them out.

                      Now, if there is a large number of them, there is usually a way to get Stata to build a list of those variables in a local macro, and then you can feed that local macro into -reshape-. So, if your variables are abc1 abc2 abc3 def1 def2 def3 ... xyz1 xyz2 xyz3, and if no other variables end in 1, you could do something like this:

                      Code:
                      // PUT ALL THE VARIABLES ENDING IN 1 IN A LOCAL MACRO
                      ds *1
                      local stubs `r(varlist)'
                      
                      // NOW STRIP THE 1s OFF TO GET THE ACTUAL STUBS
                      local stubs: subinstr local stubs "1" "", all
                      
                      // READY FOR RESHAPE
                      reshape long `stubs', i(identifier_variable) j(_j)
                      But clearly this approach is idiosyncratic and requires specific code to deal with it. You have to deal with each such instance on its own terms, I'm afraid.
                      Indeed. Thank you, Clyde. I think the quickest thing is just to rename the variables at the moment to v1, v2, etc.

                      Comment

                      Working...
                      X