Reshape problems

Jonas Meyer

Join Date: Jul 2014

Posts: 20
#1

12 Aug 2014, 02:18

Hello everybody.

I have a problem when I try to reshape my dataset. My data contain information about high-school grades, but because some students have retaken one or more classes in a different school year, I cannot get the dataset reshaped: my year variable is not uniquely identified by the student's ID number (pnr). In almost all cases it concerns only one class that the student has retaken in another year, so if I could just delete the class grade taken in a different year from the student's other grades, that would be fine. Perhaps my example below can do some explaining:

ID year Class1 Class2 Class3
1 2000 . A .
1 2000 A . .
1 2003 A
2 2001 A . .
2 2005 . A .
2 2001 . . A
3 2000 A . .
3 2000 . A .
3 2000
4 2003 . A .
4 2003 A . .
4 2008 . . A

As you can see, the problem arises for students 1, 2 and 4, but not for student 3. The problem is that when I try to reshape from long to wide, the year variable has to be uniquely identified by the ID. I want a dataset with one observation per student (ID) and one grade variable per class in the wide form. But I still want the year variable to be part of the dataset, so the year obviously needs to be consistent within the student ID.

Can anyone help me out with an idea? Or just show me how I can delete the year observation (per student) that does not equal the other year-observations for that student?

Kind regards
Jonas
Nick Cox

Join Date: Mar 2014

Posts: 35754
#2

12 Aug 2014, 02:25

The prior question is why you think you need to reshape. In general, a long shape or structure is preferred in Stata. In this case, reshape wide is only possible by omitting data, in other words an act of violence to the data that you should undertake only when absolutely necessary.

Tell us what you want to do that you think requires reshape wide.
Comment
Jonas Meyer

Join Date: Jul 2014

Posts: 20
#3

12 Aug 2014, 04:18

The reshape is necessary because of the way the case I'm working on is built up. The information lost by manipulating the year variable will be restored later on, when I merge the dataset with other datasets. So for now the only important thing for me is to get the dataset reshaped from long to wide. I think the best way forward is, for every student ID, to make the latest year observation count. I mean, if a student has taken classes in both 2000 and 2002, then I would like to change all the year observations for that student to 2002.

Can you help with some code that can fix this, Professor?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35754
#4

12 Aug 2014, 04:24

It's your call, but "I want it because I need it" is to me no kind of explanation.

Sorry, but I have no interest in providing code for what, on the evidence here, seems to me to be bad practice in data analysis.
Comment
Jonas Meyer

Join Date: Jul 2014

Posts: 20
#5

12 Aug 2014, 05:01

Dear Professor, the reshape of the dataset (including the year variable) is very important and a necessity for me to move forward with this case, while the information contained in the year variable is only of transitory importance. The observations affected by the change in the year variable will be dropped later on, after the merge with the other datasets. I need the reshaped year variable purely as a backup check later in the process. You have to trust me when I say that the manipulation of the year variable will not affect the final results at all! I hope you can understand my problem from this (rather poor) explanation.

Your help has been highly appreciated so far, and I would be very sorry if it were to end here.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35754
#6

12 Aug 2014, 05:08

Sorry, but like anybody else I pick and choose what I answer and I don't want to take this further. The question remains open to those minded to answer it.
Comment
Jonas Meyer

Join Date: Jul 2014

Posts: 20
#7

12 Aug 2014, 05:27

I'm very sorry you feel that way, Professor Cox.

Can anybody else help me out?
Comment
Stephen Jenkins

Join Date: Apr 2014

Posts: 1438
#8

12 Aug 2014, 05:39

For what it's worth I would endorse Nick's point about picking and choosing what to respond to. Everybody does it; he's simply honest enough to say it to you explicitly! No one has an obligation to respond to anything. (See the Forum FAQ.)

To me, the problem is as much that I don't really understand what you want to do (or indeed why -- Nick's point). Your existing explanations are insufficiently clear to me.

One way of moving forward might be to (a) repost your example data in a more user-friendly manner (e.g. using CODE delimiters, via the advanced editing features accessible via the underlined-A button). At present, the information is not in a fixed-width font and is hard to digest. (b) Then also add another version of these example data, transformed in the manner you are looking for (also formatted for legibility). And (c) revise the commentary about what you want done. [Do all this within the existing thread; don't start a new one.]
Comment

Jonas Meyer

Join Date: Jul 2014

Posts: 20
#9

13 Aug 2014, 06:01

Thank you for your attention, Professor Jenkins. I will try to clarify my request:

My dataset contains information about students' high-school grades and the year in which they took each class. Below I've tried to create an example of the dataset.

ID	Class ID	Class-grade	School ID	Year
1	CL1	A	SC1	2000
1	CL2	B	SC1	2001
1	CL3	C	SC1	2001
2	CL1	C	SC2	2002
2	CL2	A	SC2	2004
2	CL3	B	SC2	2004
3	CL1	A	SC3	2003
3	CL2	D	SC3	2003
3	CL3	A	SC3	2003
4	CL1	A	SC2	2008
4	CL2	C	SC2	2007
4	CL3	A	SC2	2006

As you can see, there are multiple observations per student, each containing one grade observation and one year observation (and a school ID observation). Now I would like to reshape this dataset from long to wide because I need to merge it with other datasets.

The problem is that, because of the different year observations per student, Stata (obviously) cannot perform a reshape operation: it does not know which of the year observations to use.

I would like to make the dataset look like the following one:

ID	CL1-grade	CL2-grade	CL3-grade	School ID	Year
1	A	B	C	SC1	2001
2	C	A	B	SC2	2004
3	A	D	A	SC3	2003
4	A	C	A	SC2	2008

Here I have used the latest year observation per student.

The problem is not the reshape part of the syntax. Rather, I don't know how to write code that replaces the older year observations with the newest year observation per student, so that I can perform the reshape of the dataset.

I hope that this post was more informative than the former ones.

/Jonas

Comment

Stephen Jenkins

Join Date: Apr 2014

Posts: 1438
#10

13 Aug 2014, 07:47

Thanks for reformatting and reformulation. How about something based on the commands in the following?

Code:

. li

     +----------------------------------------+
     | ID   classID   grade   schoolID   year |
     |----------------------------------------|
  1. |  1       CL1       A        SC1   2000 |
  2. |  1       CL2       B        SC1   2001 |
  3. |  1       CL3       C        SC1   2001 |
  4. |  2       CL1       C        SC2   2002 |
  5. |  2       CL2       A        SC2   2004 |
     |----------------------------------------|
  6. |  2       CL3       B        SC2   2004 |
  7. |  3       CL1       A        SC3   2003 |
  8. |  3       CL2       D        SC3   2003 |
  9. |  3       CL3       A        SC3   2003 |
 10. |  4       CL1       A        SC2   2008 |
     |----------------------------------------|
 11. |  4       CL2       C        SC2   2007 |
 12. |  4       CL3       A        SC2   2006 |
     +----------------------------------------+

. destring classID, ignore(CL) replace
classID: characters C L removed; replaced as byte

. destring schoolID, ignore(SC) replace
schoolID: characters S C removed; replaced as byte

. sort ID year

. list , sepby(ID)

     +----------------------------------------+
     | ID   classID   grade   schoolID   year |
     |----------------------------------------|
  1. |  1         1       A          1   2000 |
  2. |  1         3       C          1   2001 |
  3. |  1         2       B          1   2001 |
     |----------------------------------------|
  4. |  2         1       C          2   2002 |
  5. |  2         2       A          2   2004 |
  6. |  2         3       B          2   2004 |
     |----------------------------------------|
  7. |  3         3       A          3   2003 |
  8. |  3         2       D          3   2003 |
  9. |  3         1       A          3   2003 |
     |----------------------------------------|
 10. |  4         3       A          2   2006 |
 11. |  4         2       C          2   2007 |
 12. |  4         1       A          2   2008 |
     +----------------------------------------+

. bys ID (year): ge obs_no = _n

. bys ID (year): ge last_grade_observed = grade[_N]

. bys ID (year): ge last_year_observed = year[_N]

. bys ID (year): ge last_school_observed = schoolID[_N]

. list , sepby(ID) noobs

  +----------------------------------------------------------------------------------+
  | ID   classID   grade   schoolID   year   obs_no   last_g~d   last_y~d   last_s~d |
  |----------------------------------------------------------------------------------|
  |  1         1       A          1   2000        1          B       2001          1 |
  |  1         3       C          1   2001        2          B       2001          1 |
  |  1         2       B          1   2001        3          B       2001          1 |
  |----------------------------------------------------------------------------------|
  |  2         1       C          2   2002        1          B       2004          2 |
  |  2         2       A          2   2004        2          B       2004          2 |
  |  2         3       B          2   2004        3          B       2004          2 |
  |----------------------------------------------------------------------------------|
  |  3         3       A          3   2003        1          A       2003          3 |
  |  3         2       D          3   2003        2          A       2003          3 |
  |  3         1       A          3   2003        3          A       2003          3 |
  |----------------------------------------------------------------------------------|
  |  4         3       A          2   2006        1          A       2008          2 |
  |  4         2       C          2   2007        2          A       2008          2 |
  |  4         1       A          2   2008        3          A       2008          2 |
  +----------------------------------------------------------------------------------+

. bys ID (year): keep if _N == _n
(8 observations deleted)

. drop grade schoolID year obs_no

. list , sepby(ID) noobs

  +-----------------------------------------------+
  | ID   classID   last_g~d   last_y~d   last_s~d |
  |-----------------------------------------------|
  |  1         2          B       2001          1 |
  |-----------------------------------------------|
  |  2         3          B       2004          2 |
  |-----------------------------------------------|
  |  3         1          A       2003          3 |
  |-----------------------------------------------|
  |  4         1          A       2008          2 |
  +-----------------------------------------------+

No reshape is involved here, just by-group operations. Note that I converted your string ID variables, which contained redundant characters, to numeric.
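The by-group step can also be written as a short do-file. This is only a minimal sketch: it assumes the variable names from the listing (ID, classID, grade, schoolID, year), rebuilds the example data with input, and keeps all 12 rows instead of dropping any, so it is not a replacement for the commands shown in the log.

```stata
* Sketch only: rebuild the example data, then tag each student's latest year.
clear
input byte ID str3 classID str1 grade str3 schoolID int year
1 "CL1" "A" "SC1" 2000
1 "CL2" "B" "SC1" 2001
1 "CL3" "C" "SC1" 2001
2 "CL1" "C" "SC2" 2002
2 "CL2" "A" "SC2" 2004
2 "CL3" "B" "SC2" 2004
3 "CL1" "A" "SC3" 2003
3 "CL2" "D" "SC3" 2003
3 "CL3" "A" "SC3" 2003
4 "CL1" "A" "SC2" 2008
4 "CL2" "C" "SC2" 2007
4 "CL3" "A" "SC2" 2006
end

* Within each ID, sort by year and copy the last (latest) year to every row.
* _N is the number of observations in the current by-group.
bysort ID (year): generate int last_year_observed = year[_N]

list, sepby(ID) noobs
```

Under bysort, year[_N] refers to the last observation of the current student, not of the whole dataset, which is why a single line is enough.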

Comment

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#11

13 Aug 2014, 07:51

Jonas, the last example is not illustrative. You have 12x1=12 grades to begin with and 4x3=12 grades after the transformation. This is a classical example of reshape to wide, with no data loss. If this is the data you have, just follow the manual for reshape.
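As a rough illustration of a reshape with no data loss on the posted example, one option is to treat year as varying by class and list it as a stub next to grade. This is a sketch under that assumption; the variable names are taken from the listing in #10, and the wide names (gradeCL1, yearCL1, and so on) are simply what reshape builds from them.

```stata
* Sketch only: the example data from #9, with string class and school IDs.
clear
input byte ID str3 classID str1 grade str3 schoolID int year
1 "CL1" "A" "SC1" 2000
1 "CL2" "B" "SC1" 2001
1 "CL3" "C" "SC1" 2001
2 "CL1" "C" "SC2" 2002
2 "CL2" "A" "SC2" 2004
2 "CL3" "B" "SC2" 2004
3 "CL1" "A" "SC3" 2003
3 "CL2" "D" "SC3" 2003
3 "CL3" "A" "SC3" 2003
4 "CL1" "A" "SC2" 2008
4 "CL2" "C" "SC2" 2007
4 "CL3" "A" "SC2" 2006
end

* grade and year both vary by class, so both are given as stubs.
* schoolID is constant within ID in this example and is carried along as is.
reshape wide grade year, i(ID) j(classID) string

list, noobs
```

This keeps every year (one per class) instead of choosing one year per student, so nothing has to be overwritten before the reshape.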

Do you have anywhere in your data something like

Code:

id year class grade
1  2000 CL1   B
1  2000 CL1   A

If so, how do you want to break the tie (which grade, A or B, will you select)? Do you have reasons to believe that observations are ordered chronologically?

It is a common practice in some countries to grade a discipline (e.g. math) every year a student studies and sits the course, but only the last year's grade is carried over to the certificate. Is this what you are doing? It implies data loss, and that is fine as long as everybody understands it (you can later see only the final grade, not any intermediate grades).

If not, you can still reshape long to wide with multiple slots per year, which you can create artificially (if you know that grades come from semesters, 2 per year, or that there are at most 3 attempts to pass the exam, etc.).
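One way the "multiple slots" idea might look in practice is sketched below. The data are hypothetical (a student who took CL1 twice, in different years), and the names attempt and slot are made up for the illustration. If two attempts fall in the same year, as in the tie example above, sorting on year alone does not decide their order, so some other tie-breaker would be needed.

```stata
* Sketch only: hypothetical data in which student 1 took CL1 twice.
clear
input byte ID str3 classID str1 grade int year
1 "CL1" "B" 2000
1 "CL1" "A" 2002
1 "CL2" "C" 2000
2 "CL1" "A" 2001
2 "CL2" "B" 2001
end

* Number the attempts within student and class, earliest year first.
bysort ID classID (year): generate byte attempt = _n

* Combine class and attempt number into one slot label, e.g. CL1_2.
generate slot = classID + "_" + string(attempt)

* classID and attempt vary within ID, so drop them before reshaping.
drop classID attempt

reshape wide grade year, i(ID) j(slot) string

list, noobs
```

Slots that a student never used (for example a second attempt at CL1 for student 2) are simply left missing in the wide data.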

Hope this helps.
Best Sergiy Radyakin
(not a professor)
Comment
Jonas Meyer

Join Date: Jul 2014

Posts: 20
#12

13 Aug 2014, 09:01

Thank you Professor Jenkins, the problem has been solved.
Comment
Jonas Meyer

Join Date: Jul 2014

Posts: 20
#13

13 Aug 2014, 09:15

Sergiy, I'm not sure I quite understand your post. The essential problem was that the year observations weren't uniquely identified by the student ID, which made it impossible to reshape the dataset and still keep the year variable. With help from Professor Jenkins I managed to create a unique year observation per student, so that the year variable became uniquely identified by the student ID, which in turn made a reshape from long to wide possible.
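For readers following along, the two steps described here might be sketched as follows. The exact commands used are not shown in the thread, so this is an assumption: it overwrites year with each student's latest year (which discards the earlier years) and then reshapes grade to wide, using the variable names from #10.

```stata
* Sketch only: the example data from #9.
clear
input byte ID str3 classID str1 grade str3 schoolID int year
1 "CL1" "A" "SC1" 2000
1 "CL2" "B" "SC1" 2001
1 "CL3" "C" "SC1" 2001
2 "CL1" "C" "SC2" 2002
2 "CL2" "A" "SC2" 2004
2 "CL3" "B" "SC2" 2004
3 "CL1" "A" "SC3" 2003
3 "CL2" "D" "SC3" 2003
3 "CL3" "A" "SC3" 2003
4 "CL1" "A" "SC2" 2008
4 "CL2" "C" "SC2" 2007
4 "CL3" "A" "SC2" 2006
end

* Step 1: make year constant within ID by overwriting it with the latest year.
* Note that this throws away the earlier years.
bysort ID (year): replace year = year[_N]

* Step 2: year and schoolID are now constant within ID, so only grade is a stub.
reshape wide grade, i(ID) j(classID) string

list, noobs
```

On the example data this gives one row per student with gradeCL1, gradeCL2, gradeCL3, schoolID and the latest year, matching the target table in #9.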

Anyhow, thanks a lot for your attention and involvement within this thread.

Best Jonas
Comment
