Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping from wide to long

    Hi everyone,

    I'm trying to reshape my dataset from wide to long, but am running into some issues. I'm using StataSE15 on Windows 7.

    I imported it as:
    Code:
     import delimited "X:\My Documents\filename.csv",
    delimiter(";") varnames(1) case(lower) numericcols(_all)
    The numericcols(_all) is to prevent Stata from importing everything as strings.

    The current format is:

    unit_id series_id b01-01-2016 ... b14-11-2017
    1 1
    1 2
    1 3
    1 4
    1 5
    1 6
    1 7
    1 8
    ... ... ... ...
    1000 1
    1000 2
    1000 3
    1000 4
    1000 5
    1000 6
    1000 7
    1000 8

    Where unit_id stands for a firm (1000 total) and series_id stands for various equity observations such as stock price, it's daily data ranging from 01-01-2016 to 14-11-2017.

    I want it to take this format:

    unit_id date series_id(1) series_id(2) ... series_id(8)
    1 01-01-2016
    1 02-02-2016
    ... ...
    1 14-11-2017

    etc

    I tried
    Code:
     reshape long date, i(unit_id series_id) j(date)
    but got the error:
    "no xij variables found" r(111)
    which says: error . . . . . . . . . . . . . . . . . . . . . . . . Return code 111
    __________ not found
    no variables defined
    The variable does not exist. You may have mistyped the
    variable's name.
    But I'm pretty sure I have not mistyped anything.

    I tried
    Code:
    reshape long b, i(unit_id series_id) j(date)
    Where the b refers to the b before each date. This gives a list of errors saying (note: b1012016 not found)
    (note: b1022016 not found)
    (note: b1022017 not found)
    (note: b1032016 not found)
    (note: b1032017 not found)
    (note: b1042016 not found)
    (note: b1052017 not found)
    (note: b1062016 not found)

    which I eventually have to press break on, otherwise the computer crashes.

    Does anyone have any idea how to reshape my data? Please let me know!

  • #2
    Please give a genuine example using dataex as the FAQ Advice requests of you. Section #12 applies.

    Working out what your data are exactly like becomes too much of a challlenge once it is noticed that b01-01-2016 can't be a legal variable name,
    Last edited by Nick Cox; 28 Nov 2017, 10:53.

    Comment


    • #3
      Welcome to Statalist.

      Your presentation of your data leaves ... to the imagination. Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

      It would be particularly helpful to post a small subset of your data, perhaps with just a 5 values of your unit_id, 3 values of your series_id, and 3 of your b variables. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting sample data to Statalist.

      I can't speak for others, but I find it much easier to present sample code to show technique, that to spell it out in prose, and without sample data, I can't test sample code.

      The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

      Comment


      • #4
        I'm sorry for formulating my question badly. I'm completely new to Stata.
        I read through the FAQ and downloaded dataex but I'm unable to figure out how it works. I put in
        Code:
         dataex [unit_id series_id b01012016 b04012016 b05012016]
        but it keeps giving me the error "weights not allowed" r(101) and I've been unable to find much more on the internet.

        Below is a crudely copy-pasted sample of my data. Sorry for that.

        Code:
        unit_id    series_id    b01012016    b04012016    b05012016    b06012016
        1    1    1596.04    1549.01    1555.56    1512.99
        1    2    0    -2.95    .42    -2.74
        1    3    536.2    520.4    522.6    508.3
        1    4    0    -2.95    .42    -2.74
        1    5    3444.26    3367.42    3389.41    3356
        1    6    0    -2.23    .65    -.99
        1    7        23664.1    18505.2    40641
        1    8    1.03    1.03    1.03    1.03
        2    1    1.0e+06    1.0e+06    1.0e+06    1.0e+06
        2    2    0    -2.27    .64    .51
        2    3    3771    3685.5    3709    3728
        2    4    0    -2.27    .64    .51
        2    5    3444.26    3367.42    3389.41    3356
        2    6    0    -2.23    .65    -.99
        2    7        2011.8    1887.6    2083.1
        2    8    1.26    1.26    1.26    1.26
        I read somewhere else on Statalist that the dates are not allowed to start with a number, hence the b in front of them. I changed my csv file to dates at the top of each column, but when I imported it using
        Code:
         import delimited "X:\My Documents\Filename.csv", delimiter(";")
        varnames(1) case(lower) numericcols(_all)
        (490 vars, 7,816 obs)

        it showed them as v3 all the way to v490.

        I hope this makes it a bit clearer

        Comment


        • #5
          You may try this:

          Code:
          reshape long b, i(unit_id series_id) j(dt) string
          gen date = date(dt, "DMY")
          format %td date
          drop dt
          
          reshape wide b, i(unit_id date) j(series_id)
          rename b* series_id*
          Using your copy-pasted dataset, the first part produces the following dataset (output is truncated):

          Code:
          . list
          
               +------------------------------------------+
               | unit_id   series~d         b        date |
               |------------------------------------------|
            1. |       1          1   1596.04   01jan2016 |
            2. |       1          1   1549.01   04jan2016 |
            3. |       1          1   1555.56   05jan2016 |
            4. |       1          1   1512.99   06jan2016 |
            5. |       1          2         0   01jan2016 |
               |------------------------------------------|
            6. |       1          2     -2.95   04jan2016 |
            7. |       1          2       .42   05jan2016 |
            8. |       1          2     -2.74   06jan2016 |
            9. |       1          3     536.2   01jan2016 |
               ............................................
          After the second reshape:

          Code:
          . list
          
               +-------------------------------------------------------------------------------------------------------------+
               | unit_id        date   series~1   series~2   series~3   series~4   series~5   series~6   series~7   series~8 |
               |-------------------------------------------------------------------------------------------------------------|
            1. |       1   01jan2016    1596.04          0      536.2          0    3444.26          0    23664.1       1.03 |
            2. |       1   04jan2016    1549.01      -2.95      520.4      -2.95    3367.42      -2.23    18505.2       1.03 |
            3. |       1   05jan2016    1555.56        .42      522.6        .42    3389.41        .65      40641       1.03 |
            4. |       1   06jan2016    1512.99      -2.74      508.3      -2.74       3356       -.99          .       1.03 |
            5. |       2   01jan2016    1000000          0       3771          0    3444.26          0     2011.8       1.26 |
               |-------------------------------------------------------------------------------------------------------------|
            6. |       2   04jan2016    1000000      -2.27     3685.5      -2.27    3367.42      -2.23     1887.6       1.26 |
            7. |       2   05jan2016    1000000        .64       3709        .64    3389.41        .65     2083.1       1.26 |
            8. |       2   06jan2016    1000000        .51       3728        .51       3356       -.99          .       1.26 |
               +-------------------------------------------------------------------------------------------------------------+
          Hope this helps,

          Jean-Claude Arbaut
          Last edited by Jean-Claude Arbaut; 28 Nov 2017, 14:52.

          Comment


          • #6
            Since Jean-Claude Arbaut has given you what seems to be a good solution to your problem, let me address a few other things

            First, it's not that dates cannot start with a number, but rather that Stata variable names cannot start with a number. As a consequence, when Stata imported your data it was not able to use the contents of the first row as a variable name from column 3 onward, so it gave you the default variable names.

            Secondly, I'm sorry you found the syntax for the dataex command as presented by help dataex to not be understandable. But because the help files are crucial to making effective use of Stata, that points to a bigger problem: you haven't understood the conventions of Stata documentation.

            I'm sympathetic to you as a new user of Stata - it's a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

            When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

            All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

            Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

            At a minimum, do reach Chapter 10 in the Getting Started manual to understand Stata command syntax and how it is represented in print. Good luck!

            Comment


            • #7
              To Jean-Claude Arbaut, thank you so so much! It worked! You have no idea how happy I am!

              To William Lisowski, thank you, I'm indeed under a lot of pressure, but your comment made me realise that taking a step back and reading the manual might be faster than getting completely stuck on elementary subjects. I'll start reading first.

              Thanks again to both of you, it's amazing to see such a helpful and fast responding community!

              Comment

              Working...
              X