Reshape to wide format and data splitting?

Mohammed Al-Saffar

Join Date: May 2023
Posts: 2

09 May 2023, 19:23

Hello,

I have a dataset in long format with 8863 observations. For my analysis, I reshaped the data from long to wide (see the tables below) so that I have one row per case. However, after reshaping, the data appear to be split: the number of observations per variable no longer matches the original. For example, fcacc_mean_acc_24h_1 now has 4750 observations and fcacc_mean_acc_24h_2 has 4113. I need to have one variable that has all the observations together rather than two. Does anyone have any suggestions? I have tried the stack command, but it clears the variables in memory, and I need to keep the original variables so that I can merge with my master file using the id variable.

Long format:

MSCID	fcaccad	fcaccmonth	fcaccweekday	fcacc_mean_acc_24h
M1111111	1	April	Monday	25
M1111111	2	April	Tuesday	24
M2222222	1	November	Wednesday	12
M2222222	2	November	Friday	13
M3333333	1	December	Thursday	15
M3333333	2	December	Monday	16

Wide format:

MSCID	fcaccmonth_1	fcaccweekday_1	fcacc_mean_acc_24h_1	fcaccmonth_2	fcaccweekday_2	fcacc_mean_acc_24h_2
M1111111	April	Monday	25	April	Tuesday	24
M2222222	November	Wednesday	12	November	Friday	13
M3333333	December	Thursday	15	December	Monday	16

Best,
Mohammed

Clyde Schechter

Join Date: Apr 2014

Posts: 30058
#2

09 May 2023, 20:35

I don't see anything wrong here. The examples you show are a perfectly correct transformation of long data into wide, and all of the information that appears in the long data set also appears in the wide version.

When working with the long version, how many non-missing observations do you have for variable fcacc_mean_acc_24h? I suspect it is the sum of 4750 and 4113. (In fact, you do say you started out with 8863 observations.) The conclusion I am led to is that in the original long data set, the observations are not evenly distributed between the two values of fcaccad, but rather that you started with 4,750 having fcaccad = 1 and 4,113 having fcaccad = 2. Run -tab fcaccad- in the long data set to verify or refute my prediction.
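The point above can be checked on a toy example. This is only a sketch: the five rows are made up, and M3333333 is deliberately given a single record so that the two values of fcaccad are unbalanced. Note also that -reshape wide- with these stubs names the new variables fcacc_mean_acc_24h1 and fcacc_mean_acc_24h2, without the underscore that appears in the original post.

```stata
* Toy long data (illustrative values only); M3333333 has no fcaccad == 2 record
clear
input str8 MSCID byte fcaccad str9 fcaccmonth str9 fcaccweekday float fcacc_mean_acc_24h
"M1111111" 1 "April"    "Monday"    25
"M1111111" 2 "April"    "Tuesday"   24
"M2222222" 1 "November" "Wednesday" 12
"M2222222" 2 "November" "Friday"    13
"M3333333" 1 "December" "Thursday"  15
end

* Long layout: one variable holds all 5 non-missing values
tab fcaccad
count if !missing(fcacc_mean_acc_24h)

* Wide layout: one row per MSCID, one variable per value of fcaccad
reshape wide fcaccmonth fcaccweekday fcacc_mean_acc_24h, i(MSCID) j(fcaccad)

* The non-missing counts are now 3 and 2, which sum to the original 5
count if !missing(fcacc_mean_acc_24h1)
count if !missing(fcacc_mean_acc_24h2)
```

Nothing is lost in the reshape; the per-variable counts simply mirror the frequencies that -tab fcaccad- reports in the long data.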

[QUOTE]I need to have one variable that has all the observations together rather than two.[/QUOTE]

If you want one variable that has all the observations of fcacc_mean_acc_24h, then you should stay with the original long layout--that is not possible with wide data by definition.
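As a rough illustration of staying in long layout, the sketch below computes a mean and 95% CI from the single long variable. The data are made up. The clustered version is an addition of mine, on the assumption that repeated records from the same MSCID should not be treated as independent. -ci means- is the syntax in recent Stata releases; older releases use -ci varname- instead.

```stata
* Toy long data (illustrative values only)
clear
input str8 MSCID byte fcaccad str9 fcaccmonth str9 fcaccweekday float fcacc_mean_acc_24h
"M1111111" 1 "April"    "Monday"    25
"M1111111" 2 "April"    "Tuesday"   24
"M2222222" 1 "November" "Wednesday" 12
"M2222222" 2 "November" "Friday"    13
"M3333333" 1 "December" "Thursday"  15
"M3333333" 2 "December" "Monday"    16
end

* Mean and 95% CI using every observation of the variable
ci means fcacc_mean_acc_24h

* Assumption: allow for repeated measures within person
* by clustering on a numeric id built from MSCID
egen long id = group(MSCID)
mean fcacc_mean_acc_24h, vce(cluster id)
```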

Actually, why are you reshaping wide in the first place? There are only a small number of things that work better, or even at all, with wide data in Stata. Unless you know for a fact that you are going to be doing some of those, you will find that your work is much easier with long data.

Last edited by Clyde Schechter; 09 May 2023, 20:39.
Mohammed Al-Saffar

Join Date: May 2023

Posts: 2
#3

10 May 2023, 09:16

[QUOTE=Clyde Schechter]Run -tab fcaccad- in the long data set to verify or refute my prediction.[/QUOTE]

tab fcaccad

   sortkey: |
            |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      4,551       51.35       51.35
          2 |      4,312       48.65      100.00
------------+-----------------------------------
      Total |      8,863      100.00

It is as you say: the data are not evenly distributed.

[QUOTE=Clyde Schechter;n1713041]

Actually, why are you reshaping wide in the first place? There are only a small number of things that work better, or even at all, with wide data in Stata. Unless you know for a fact that you are going to be doing some of those, you will find that your work is much easier with long data.
[/QUOTE]

I am performing analysis on a longitudinal dataset. So, I need to merge the above dataset with my master file, which I have also reshaped from long to wide format. This is done so that I can perform regression and other advanced statistical methods using the wide format.

At this point, all I want to do is compute descriptive statistics: the mean and 95% CI for each variable of interest, using the total number of observations, in wide format.
Clyde Schechter

Join Date: Apr 2014

Posts: 30058
#4

10 May 2023, 10:27

[QUOTE]So, I need to merge the above dataset with my master file, which I have also reshaped from long to wide format.[/QUOTE]

It is possible, though unlikely, that you needed to go to wide layout to do this. More likely, it could have been done from long. Be that as it may, what is done is done, and if you have a correctly constructed merged data set there is no reason to go back.
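For what it is worth, here is a minimal sketch of merging from long layout. It assumes the master file has exactly one row per MSCID. The master data and the variable age are hypothetical stand-ins, since the real master file is not shown in this thread. Run it as a single do-file so that the local macro created by -tempfile- stays in scope.

```stata
* Hypothetical master file: one row per MSCID (age is a made-up variable)
clear
input str8 MSCID byte age
"M1111111" 54
"M2222222" 61
"M3333333" 47
end
tempfile master
save `master'

* Toy long data (illustrative values only)
clear
input str8 MSCID byte fcaccad str9 fcaccmonth str9 fcaccweekday float fcacc_mean_acc_24h
"M1111111" 1 "April"    "Monday"    25
"M1111111" 2 "April"    "Tuesday"   24
"M2222222" 1 "November" "Wednesday" 12
"M2222222" 2 "November" "Friday"    13
"M3333333" 1 "December" "Thursday"  15
"M3333333" 2 "December" "Monday"    16
end

* Many long records per person matched to one master record each
merge m:1 MSCID using `master'
assert _merge == 3
drop _merge
```

With -merge m:1-, the long data keep their one-record-per-measurement structure and each record picks up the person-level variables from the master file.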

[QUOTE]This is done so that I can perform regression and other advanced statistical methods using the wide format.[/QUOTE]

I cannot think of even one Stata command for regression analysis of longitudinal data that will even work at all in wide layout. All Stata commands for regression in longitudinal data that I know of require long layout.
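To illustrate, here is a sketch of going back from the wide layout shown in #1 to long and then fitting a panel model. The values are the three example rows from #1. The choice of -xtreg, fe- is mine, purely as an example of a command that expects long data; it is not a recommendation for this analysis, and with three people the estimates mean nothing.

```stata
* Wide data as laid out in #1 (three example rows)
clear
input str8 MSCID str9 fcaccmonth_1 str9 fcaccweekday_1 float fcacc_mean_acc_24h_1 str9 fcaccmonth_2 str9 fcaccweekday_2 float fcacc_mean_acc_24h_2
"M1111111" "April"    "Monday"    25 "April"    "Tuesday" 24
"M2222222" "November" "Wednesday" 12 "November" "Friday"  13
"M3333333" "December" "Thursday"  15 "December" "Monday"  16
end

* Back to long: the stubs include the trailing underscore
reshape long fcaccmonth_ fcaccweekday_ fcacc_mean_acc_24h_, i(MSCID) j(fcaccad)
rename fcaccmonth_ fcaccmonth
rename fcaccweekday_ fcaccweekday
rename fcacc_mean_acc_24h_ fcacc_mean_acc_24h

* xtset needs a numeric panel identifier, so build one from the string MSCID
egen long id = group(MSCID)
xtset id fcaccad

* Example panel regression in long layout
xtreg fcacc_mean_acc_24h i.fcaccad, fe
```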