
  • Reshape to wide format and data splitting?

    Hello,

    I have a dataset that comes in long format and has 8863 observations. For my analysis, I have reshaped the data from long to wide (see tables below), which gives me one row per case. However, after I reshaped to wide format the data were split, so the numbers of observations differ from the original. For example, the variables fcacc_mean_acc_24h_1 and fcacc_mean_acc_24h_2 now have 4750 and 4113 observations, respectively. I need to have one variable that has all the observations together rather than two. Does anyone have any suggestions? I have tried the stack command, but it clears the variables in memory, and I need to keep the original variables in memory to merge with my master file using the id variable.

    Long format:
    MSCID     fcaccad  fcaccmonth  fcaccweekday  fcacc_mean_acc_24h
    M1111111  1        April       Monday        25
    M1111111  2        April       Tuesday       24
    M2222222  1        November    Wednesday     12
    M2222222  2        November    Friday        13
    M3333333  1        December    Thursday      15
    M3333333  2        December    Monday        16

    Wide format:
    MSCID     Fcaccmonth_1  Fcaccweekday_1  fcacc_mean_acc_24h_1  Fcaccmonth_2  Fcaccweekday_2  fcacc_mean_acc_24h_2
    M1111111  April         Monday          25                    April         Tuesday         24
    M2222222  November      Wednesday       12                    November      Friday          13
    M3333333  December      Thursday        15                    December      Monday          16

    Best,
    Mohammed

  • #2
    I don't see anything wrong here. The examples you show are a perfectly correct transformation of long data into wide, and all of the information that appears in the long data set also appears in the wide version.

    When working with the long version, how many non-missing observations do you have for variable fcacc_mean_acc_24h? I suspect it is the sum of 4750 and 4113. (In fact you do say you started out with 8863 observations.) The conclusion I am led to is that in the original long data set, the observations are not evenly distributed among the two values of fcaccad, but rather that you started with 4,750 having fcaccad = 1 and 4,113 with fcaccad = 2. Run -tab fcaccad- in the long data set to verify or refute my prediction.
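
The check suggested above can be sketched as follows. This is only an illustration, assuming the long data from #1 are in memory; n_records and first_row are made-up helper names, not variables in the original data.

```stata
* Sketch only: run on the LONG data, before any reshape.
* How many rows belong to each measurement occasion?
tabulate fcaccad

* How many non-missing values does the outcome have in long layout?
count if !missing(fcacc_mean_acc_24h)

* How many records does each person contribute?
* (n_records and first_row are hypothetical helper variables)
bysort MSCID: generate n_records = _N
egen first_row = tag(MSCID)
tabulate n_records if first_row
```

If some people have only one record, the _1 and _2 variables in the wide data will necessarily have different numbers of non-missing values.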

    I need to have one variable that has all the observation together rather than two?
    If you want one variable that has all the observations of fcacc_mean_acc_24h, then you should stay with the original long layout--that is not possible with wide data by definition.

    Actually, why are you reshaping wide in the first place? There are only a small number of things that work better, or even at all, with wide data in Stata. Unless you know for a fact that you are going to be doing some of those, you will find that your work is much easier with long data.
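
As an illustration of how little is needed in long layout, here is a minimal sketch of summary statistics that use all observations of the variable at once. It assumes the long data from #1 are in memory; cluster_id is a made-up name, and the ci means syntax assumes a reasonably recent Stata (older versions use ci followed directly by the variable name).

```stata
* Sketch only: run on the LONG data.
* Mean and 95% CI over all observations, treating them as independent
ci means fcacc_mean_acc_24h

* Mean and 95% CI separately for each value of fcaccad
mean fcacc_mean_acc_24h, over(fcaccad)

* Same overall mean, with standard errors that allow for repeated
* measurements on the same person (cluster_id is a hypothetical name)
egen cluster_id = group(MSCID)
mean fcacc_mean_acc_24h, vce(cluster cluster_id)
```

Whether the clustered version is the appropriate one depends on the analysis; it is shown only because each MSCID can contribute more than one row.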
    Last edited by Clyde Schechter; 09 May 2023, 20:39.



    • #3
      [QUOTE=Clyde Schechter]Run -tab fcaccad- in the long data set to verify or refute my prediction.[/QUOTE]

      tab fcaccad

         sortkey: |
                  |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |      4,551       51.35       51.35
                2 |      4,312       48.65      100.00
      ------------+-----------------------------------
            Total |      8,863      100.00

      It is as you say: the data are not evenly distributed.


      [QUOTE=Clyde Schechter;n1713041]

      Actually, why are you reshaping wide in the first place? There are only a small number of things that work better, or even at all, with wide data in Stata. Unless you know for a fact that you are going to be doing some of those, you will find that your work is much easier with long data.
      [/QUOTE]

      I am performing an analysis on a longitudinal dataset. So, I need to merge the above dataset with my master file, which I have also reshaped from long to wide format. This is done so that I can perform regression and other advanced statistical methods using wide format.

      At this point, all I want to do is compute descriptive statistics, so that I can obtain the mean and 95% CI for each variable of interest using the total number of observations in wide format.



      • #4
        So, I need to merge the above dataset with my master file, which I have also reshaped from long to wide format.
        It is possible, though unlikely, that you needed to go to wide layout to do this. More likely, it could have been done from long. Be that as it may, what is done is done, and if you have a correctly constructed merged data set there is no reason to go back.
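
For reference, a merge in long layout might look like the sketch below. It rests on assumptions not stated in the thread: the file name master_long.dta is hypothetical, and the master file is assumed to be long as well, with one row per combination of MSCID and fcaccad.

```stata
* Sketch only: the accelerometer data in long layout are in memory.
* master_long.dta is a hypothetical file name.
* Assumes both files are uniquely identified by MSCID and fcaccad.
merge 1:1 MSCID fcaccad using "master_long.dta"

* Inspect which rows matched and which came from only one file
tabulate _merge
```

If the master file instead had a single row per person, merge m:1 MSCID using the same file would be the corresponding form.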

        This is done so that I can perform regression and other advanced statistical methods using wide format.
        I cannot think of even one Stata command for regression analysis of longitudinal data that will even work at all in wide layout. All Stata commands for regression in longitudinal data that I know of require long layout.
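
To make the point concrete, here is a minimal sketch of two longitudinal regression commands as they would be run on the long data. The model itself (the outcome regressed on measurement occasion only) is purely illustrative and is not the poster's analysis; panel_id is a made-up name.

```stata
* Sketch only: run on the LONG data.
* xtset needs a numeric panel identifier, so derive one from the
* string MSCID (panel_id is a hypothetical name)
egen panel_id = group(MSCID)

* Declare the panel structure: person, then measurement occasion
xtset panel_id fcaccad

* Random-effects panel regression (illustrative model only)
xtreg fcacc_mean_acc_24h i.fcaccad, re

* Equivalent-in-spirit mixed model with a random intercept per person
mixed fcacc_mean_acc_24h i.fcaccad || panel_id:
```

Both commands expect one row per person per occasion, which is exactly the long layout shown in #1.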
