Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long table and wide table

    Hi , all!
    If I have two sheets ,sheet1 shows every students‘ score got in the exam and sheet2 contains their physical characteristics.
    sheet1
    name gender year Math English Science
    Tom m 2018 66 93 89
    Tom m 2019 79 100 74
    Tom m 2020 74 84 85
    Jerry m 2018 78 64 63
    Jerry m 2019 77 64 98
    Jerry m 2020 96 60 94
    Alice f 2018 97 85 100
    Alice f 2019 66 70 100
    Alice f 2020 65 63 94
    sheet2
    name gender year height weight
    Tom m 2019 178 75
    Tom m 2020 178 78
    Jerry m 2019 173 70
    Jerry m 2020 173 72
    Alice f 2019 165 58
    Alice f 2020 165 60
    Should I reshape them to wide table and merge them to one sheet like below ? I can easily see 3 observations here and directly study the relationship between scores and body features without calling different dtas saved in different folders, but it makes variable's name more complex. Is there any rule
    about when to use long table and when to use wide table? Any suggestions ? Thanks in advance.
    name gender Math_2018 Math_2019 Math_2020 English_2018 English_2019 English_2020 Science_2018 Science_2019 Science_2020 height_2018 height_2019 weight_2018 weight_2019
    Tom m 66 79 74 93 100 84 89 74 85 178 178 75 78
    Jerry m 78 77 96 64 64 60 63 98 94 173 173 70 72
    Alice f 97 66 65 85 70 63 100 100 94 165 165 58 60

  • #2
    I am not entirely sure what you are asking.

    You can equally well merge if your datasets are in wide, and in long format. Merging does not become any easier (or harder) if you make them wide. Hence merging is not a reason to prefer one over the other.

    The main reason to prefer wide over long or the other way round is the model you are going to fit to your data.

    For example if I regress the test score in particular year on the weight of the student in this and in the other years, then I would have to use the wide form.

    If I regress score on weight but only in the particular year, then long would do.

    In short, long vs wide is mainly determined by the analysis you want to carry out.

    Comment


    • #3
      I didn't give a good example.

      I mean when there are many sheets and all of them are complex , for example, the values of some variable in each sheet are not unique, the command "merge m:m " may cause mistakes.

      I already have the separate tables , and I want to put them together , but I am not sure if this action is reasonable.

      Maybe I should just keep them in separate and call the needed data when running the models.

      So I want to learn about a general way to deal with this situation.

      Comment


      • #4
        #2

        For example if I regress the test score in particular year on the weight of the student in this and in the other years, then I would have to use the wide form.
        Not so, as you can use time series operators directly. Long is fine for that kind of problem.

        Comment

        Working...
        X