Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshaping long to wide, values not unique. general solution?

    Hi everyone,

    dataex:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 EventRootCode float real_month long monthly_ERC_count double monthly_ERC_event_total
    "15" 448 1 1
    "20" 480 1 1
    "20" 496 1 1
    "20" 474 1 1
    "20" 637 1 1
    "20" 440 1 1
    "20" 472 1 1
    "20" 528 1 1
    "20" 533 1 1
    "20" 518 1 1
    "20" 500 1 1
    "20" 531 1 1
    "20" 636 1 1
    "20" 604 1 1
    "20" 456 1 1
    "20" 499 1 1
    "20" 601 1 1
    "20" 503 1 1
    "20" 479 1 1
    "20" 482 1 1
    "20" 494 1 1
    "20" 436 1 1
    "20" 602 1 1
    "20" 461 1 1
    "20" 492 1 1
    "20" 562 1 1
    "20" 478 1 1
    "20" 489 1 2
    "20" 505 1 2
    "20" 447 1 2
    "20" 488 1 2
    "20" 491 1 2
    "20" 600 1 2
    "20" 580 1 2
    "20" 559 1 2
    "20" 417 1 2
    "18" 441 1 2
    "20" 624 1 2
    "20" 413 1 2
    "14" 408 1 2
    "20" 507 1 2
    "20" 457 1 2
    "20" 408 1 2
    "20" 475 1 2
    "15" 425 1 2
    "15" 643 1 2
    "20" 490 1 2
    "20" 515 1 2
    "20" 434 1 2
    "20" 593 1 2
    "20" 451 1 2
    "20" 506 1 2
    "20" 409 1 2
    "20" 508 1 2
    "20" 420 1 2
    "20" 462 1 3
    "20" 566 1 3
    "20" 416 1 3
    "20" 466 1 3
    "20" 473 1 3
    "20" 509 1 3
    "20" 590 1 3
    "15" 552 1 3
    "14" 553 1 4
    "09" 425 1 4
    "20" 439 1 4
    "20" 535 1 4
    "20" 540 1 5
    "20" 471 1 5
    "20" 519 2 2
    "14" 419 2 2
    "09" 430 2 2
    "20" 464 2 2
    "20" 453 2 2
    "20" 510 2 2
    "14" 423 2 2
    "20" 579 2 2
    "20" 455 2 2
    "20" 621 2 3
    "15" 420 2 3
    "20" 484 2 3
    "20" 536 2 3
    "20" 568 2 3
    "20" 497 2 3
    "20" 517 2 3
    "14" 539 2 3
    "20" 616 2 3
    "14" 428 2 3
    "20" 599 2 3
    "18" 412 2 4
    "20" 501 2 4
    "15" 431 2 4
    "15" 427 2 4
    "18" 410 2 4
    "20" 529 2 4
    "20" 569 2 4
    "20" 470 2 5
    "20" 574 2 5
    "14" 431 2 5
    "20" 560 2 5
    end
    format %tm real_month
    [/CODE]
    ------------------ copy up to and including the previous line ------------------


    reshape wide EventRootCode real_month, i( monthly_ERC_count ) j( monthly_ERC_event_total )
    Error:
    values of variable monthly_ERC_event_total not unique within monthly_ERC_count
    Your data are currently long. You are performing a reshape wide. You specified i(monthly_ERC_count) and j(monthly_ERC_event_total). There are observations
    within i(monthly_ERC_count) with the same value of j(monthly_ERC_event_total). In the long data, variables i() and j() together must uniquely identify the
    observations.

    long wide
    +---------------+ +------------------+
    | i j a b | | i a1 a2 b1 b2 |
    |---------------| <--- reshape ---> |------------------|
    | 1 1 1 2 | | 1 1 3 2 4 |
    | 1 2 3 4 | | 2 5 7 6 8 |
    | 2 1 5 6 | +------------------+
    | 2 2 7 8 |
    +---------------+
    Type reshape error for a list of the problem variables.


    Read the manual on reshaping , however did not see a solution thus far.

    Stata`s error does give me the clear message that the reasons is that monthly_ERC_event_total does not uniquely identify observations and therefore then by itself cannot be used as the id for
    Code:
    reshape
    .

    can someone give me a general way to go about tackling this problem or concrete example.

    reshaping stata manual: https://www.stata.com/manuals13/dreshape.pdf
    my problem is best reflected as similar to example 3.

    Any comments or suggestions are welcome.

  • #2
    Let's start by looking at just two of your observations, for which the observations are uniquely identified.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 EventRootCode float real_month long monthly_ERC_count double monthly_ERC_event_total
    "20" 478 1 1
    "20" 489 1 2
    end
    format %tm real_month
    reshape wide EventRootCode real_month, i( monthly_ERC_count ) j( monthly_ERC_event_total )
    list, clean noobs abbreviate(20)
    Code:
    . reshape wide EventRootCode real_month, i( monthly_ERC_count ) j( monthly_ERC_event_total )
    (note: j = 1 2)
    
    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                        2   ->       1
    Number of variables                   4   ->       5
    j variable (2 values)monthly_ERC_event_total-> (dropped)
    xij variables:
                              EventRootCode   ->   EventRootCode1 EventRootCode2
                                 real_month   ->   real_month1 real_month2
    -----------------------------------------------------------------------------
    
    . list, clean noobs abbreviate(20)
    
        monthly_ERC_count   EventRootCode1   real_month1   EventRootCode2   real_month2  
                        1               20       1999m11               20       2000m10
    Is this the result you expected? Perhaps you misstated the reshape wide command.

    If it is what you expected, consider adding this third observation.
    Code:
    "20" 505 1 2
    Since monthly_ERC_event_total=2, the reshape wide command will want to identify the reshaped variables as EventRootCode2 and real_month2, but those names have already been used.

    So the general solution to your problem is to figure out how you want your reshaped variables identified, and then construct the suitable reshape wide command. Without a further explanation of what you hope to accomplish, it's difficult to guess an answer.

    However, it is possible you are solving the wrong problem. Instead of reshaping your data into a wide layout, perhaps you need to figure out how to accomplish your objective using your data in its current long layout. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of the data rather than a wide layout of the same data.

    Comment


    • #3
      Hey,

      Okay yes I did misstate the reshape wide command.

      I was aiming for the following structure: | = column

      real month| monthly_ERC_count| monthly_ERC_event_total|

      Where both monthly_ERC_count and monthly_ERC_event_total would be specific for every Eventrootcodes. eventrootcodes is numbered from 1 through 20. thus I was aiming for 20 x 2 specific column.

      thank you for the suggestion.

      #
      On keeping the long format I think is a better way. I basically would like to construct new variables from the specific eventrootcodes for 1 through 20. accomplishing this in a logn format might be a better idea afterall.

      thank you for the headsup and advice.

      Comment

      Working...
      X