Hi all
I am trying to make a dataset showing some statistics of the collective of graduates from different universities in different years. Currently I have one observation and then a lot of variables in the form 'variablejx', where j is the university and x is the year. So for example the proportion of males among students who graduated from university 7 in year 2006 is shown by 'male_7_6' and the grade point average from highschool among the students who graduated from university 2 in year 2008 is shown by 'gpa_2_8'. As I have 8 variables, 8 universities and 16 years I have 8*8*16=1.024 variables. I want to reshape this very wide dataset into long format with each row corresponding to a university-year observation. I suppose I could do this by first reshaping into wide having the "j-variable" be the year and then reshaping into wide again having the "j-variable" be the university. My problem is that I don't want to type in 1.024 different stubs in the first reshape command, but wildcards are - as far as I have understood - not allowed in reshape commands.
Here is an example of what my data looks like (where I have restricted the number of variables to 2, universities to 3 and years to 3, so to make the data-example not too comphrehensive):
Is there some workaround the no-wildcard rule, which can be applied in my case?
****
In case anybody wonders how I got into this akward situation (and in case someone perhaps could provide an easier solution to my "real" task by looking at the steps before) I started by having a dataset with individual data on all individuals graduating from the 8 universities. I am trying to get from there, where my rows correspond to individuals observed in years, to a dataset where my rows correspond to graduate-years from the universities. I have currently made variables for each university in each graduate-year by
...where male shows the sex of the individual (1=male) and 'university' and 'gradyear' shows the university and year the individual graduated from. Then after doing this for each of my 8 variables I wrote (just writing male in the code as an example, so not to put too much text in this post):
And now I am in the situation with 1 observation and 1.024 variables which I hope to reduce to 8 variables after two succesful reshape commands. I would be happy if anyone could see a smarter solution to my problem which in generic terms could be stated as "how to go from a dataset on individuals clustered in institutions to a dataset on institutions", but for the purpose of knowing how to work around the problem of not being able to use wildcards in reshape-commands I would also much appreciate if anyone could point me in the direction of a solution to first problem.
I am trying to make a dataset showing some statistics of the collective of graduates from different universities in different years. Currently I have one observation and then a lot of variables in the form 'variablejx', where j is the university and x is the year. So for example the proportion of males among students who graduated from university 7 in year 2006 is shown by 'male_7_6' and the grade point average from highschool among the students who graduated from university 2 in year 2008 is shown by 'gpa_2_8'. As I have 8 variables, 8 universities and 16 years I have 8*8*16=1.024 variables. I want to reshape this very wide dataset into long format with each row corresponding to a university-year observation. I suppose I could do this by first reshaping into wide having the "j-variable" be the year and then reshaping into wide again having the "j-variable" be the university. My problem is that I don't want to type in 1.024 different stubs in the first reshape command, but wildcards are - as far as I have understood - not allowed in reshape commands.
Here is an example of what my data looks like (where I have restricted the number of variables to 2, universities to 3 and years to 3, so to make the data-example not too comphrehensive):
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte(gpa_1_1 gpa_1_2 gpa_1_3 gpa_2_1 gpa_2_2 gpa_2_3 gpa_3_1 gpa_3_2 gpa_3_3) float(male_1_1 male_1_2 male_1_3 male_2_1 male_2_2 male_2_3 male_3_1 male_3_2 male_3_3 i) 1 2 3 4 5 6 7 8 9 .1 .2 .3 .4 .5 .6 .7 .8 .9 1 end
****
In case anybody wonders how I got into this akward situation (and in case someone perhaps could provide an easier solution to my "real" task by looking at the steps before) I started by having a dataset with individual data on all individuals graduating from the 8 universities. I am trying to get from there, where my rows correspond to individuals observed in years, to a dataset where my rows correspond to graduate-years from the universities. I have currently made variables for each university in each graduate-year by
Code:
forval j = 1/8{
forval x = 1/16{
egen male_`j'_`x' = mean(male) if university==`j' & gradyear==`x'
}
}
Code:
foreach x of varlist male*{
sort `x'
replace `x' = `x'[1]
}
keep in 1

Comment