  • generating a new education variable by combining three variables

    Dear Statalist,

    STATA 11.2 is installed on my computer.

    I want to generate a new variable for completed years of education using the following three variables. I have tried many things and read previous posts, but I could not find an answer.


    Level of education: What is the last school you attended?
    1 Primary School - 5 years
    2 Secondary School - 3 years
    3 High School - 3 or 4 years
    4 University - 4 to 6 years
    5 Master - 2 years
    6 PhD - min 3 years

    Highest grade: What is the highest grade completed in that education level?
    0 the respondent went to preparation class or did not finish any grade
    7 inconsistent answer
    8 does not know

    Graduated from the school: Have you graduated from this school?
    1 Yes
    2 No

    P.S. The highest grade variable is not itself the years of education variable. It shows how many years each respondent completed in that education level.
    Therefore, a new variable showing years of education is needed.

    Thank you very much for your help,

    Kind regards,



    Last edited by Mustafa Ozer; 07 Feb 2016, 12:10.

  • #2
    So how would you handle a case where someone finished a PhD within 6 years of starting their undergraduate program (e.g., 3 years of undergrad and 3 years in grad school)? Nothing provided here gives constraints clear enough to identify consistently and accurately how that would be coded. I would also argue that you could get more reliable information using the first and last items (e.g., if high school was the last school attended but they did not graduate, the overall educational level attained could be coded as middle school). For example, would you expect there to be a difference in your outcome of interest between respondents who've finished 10 vs. 11 years of school? Would that be the same effect you would expect to see between a respondent who finished 11 years of school and one who graduated from high school (i.e., is the schooling effect you are measuring on an ordinal, interval, or ratio scale)?


    • #3
      Dear wbuchanan,

      The highest grade variable does tell you how many years the respondent spent in that education category. For instance, if the last education level is PhD, the variable says she spent one, three, or four years in that category. The same applies to the other education categories. However, if the respondent's last education category is high school, the variable does not tell us how many years he/she spent in secondary school. Even though we do not know how many years she spent in the previous education categories, we do know that primary school is 5 years, secondary school is 3 years, high school is 4 years, university is 4 years, and so on, so we can approximately estimate how many years the respondent spent in education.

      The education variable I am primarily interested in is continuous, i.e., single years of education completed.

      What would be the Stata code to construct this variable?

      Thank you in advance.


      • #4
        The mean time to completion for PhD students in the US is somewhere around the 6-year mark, exclusive of any prior training. Undergraduate programs can also vary in length. An associate's degree, for example, is typically a two-year course of study, but the way you intend to handle this would bias the number of years of education the subject completed by 2-3 years on average. The same could be said for any subjects who were retained in grade during their primary/secondary education years. Additionally, there are growing numbers of master's degree programs that span only a single academic year. Given the data you have available, it makes more sense not to add unnecessary noise to already coarse-grained measurements. So, unless there are other data/coding that you are able to share, there isn't a clear way to handle this. If you really wanted to, you could potentially treat it as a missing data problem and impute things, but again this would probably do more harm than good in this case.


        • #5
          Dear Statalist,

          My individual-level survey is from Turkey. If the last level of education is PhD or undergrad, we know how many years she spent there. But because we do not know how many years she spent in the previous education levels, the measure of education would of course carry some bias. If we accept some bias, is there any way to do it?

          And why do I need imputation? I already know how many years they spent in the last education level, and I also know how many years she would have spent in the previous education levels.

          Secondly, I will also generate a dummy variable that equals one if the individual completed secondary school or above, and zero otherwise. How would I do it with the above variables?

          Regards
          Last edited by Mustafa Ozer; 07 Feb 2016, 15:53.


          • #6
            Mustafa Ozer, as I mentioned previously, the information that you shared provides insufficient detail for anyone to help. From your initial post:
            Highest grade: What is the highest grade completed in that education level?
            0 the respondent went to preparation class or did not finish any grade
            7 inconsistent answer
            8 does not know
            P.S. The highest grade variable is not itself the years of education variable. It shows how many years each respondent completed in that education level.
            Therefore, a new variable showing years of education is needed.


            Nothing you provided here explains how this highest grade variable shows the number of years the respondent completed in that education level; however, it does show what amount to three missing-value codes. So, I'd ask again: given that your University categorical response indicates a span of 4-6 years (I assume this also implies that this length of time is required for completion), how would you identify whether the subject completed 4, 5, or 6 years at the undergraduate level? If you can provide those details, you are almost certain to get some help.
            Last edited by wbuchanan; 08 Feb 2016, 02:57.


            • #7
              Dear Statalist,

              I have provided all the information I have for generating the completed single years of education variable and the education dummy variable, which takes the value 0 if the respondent completed less than secondary school and 1 if the respondent graduated from secondary school or a higher level.

              The tables below show tabulations of the relevant variables, as well as a cross-tabulation of highest grade completed against level of education.

              Moreover, I have attached the data set, which consists of three id variables and the 4 variables tabulated below. I hope this helps us to generate the variables.


              Code:
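              tab everschoolattended

              Ever attended any school |  Freq.  Percent    Cum.
              -------------------------+------------------------
              No                       |    964    12.92   12.92
              Yes                      |   6497    87.07   99.99
              Missing                  |      1     0.01  100.00
              -------------------------+------------------------
              Total                    |   7462   100.00

              tab levelofeducation

              Level Of Education             |  Freq.
              -------------------------------+-------
              Did not go to formal education |    965
              Primary School                 |   3301
              Secondary School               |    855
              High School                    |   1456
              University                     |    833
              Master                         |     44
              PhD                            |      8
              -------------------------------+-------
              Total                          |   7462

              * highestgradecompleted: how many years the respondent spent in the last education level he/she attended
              tab highestgradecompleted

              highest grade completed in the last level of education
              --------------+-------
                            |  Freq.
              --------------+-------
              0             |   1281
              1             |    483
              2             |    593
              3             |   1600
              4             |    577
              5             |   2916
              6             |      6
              Does not know |      2
              Missing       |      4
              --------------+-------
              Total         |   7462

              tab highestgradecompleted levelofeducation

                          | notattend  PrimarySc  Secodarys  Highschoo  Universit  Master  PhD |  Total
              ------------+--------------------------------------------------------------------+-------
              0           |       965         35         48        138         89       3    3 |   1281
              1           |         0         54        103        209        106      11    0 |    483
              2           |         0         74         82        175        232      30    0 |    593
              3           |         0        123        618        818         39       0    2 |   1600
              4           |         0         99          2        114        359       0    3 |    577
              5           |         0       2914          0          0          2       0    0 |   2916
              6           |         0          0          0          0          6       0    0 |      6
              doesnotknow |         0          0          1          1          0       0    0 |      2
              missing     |         0          2          1          1          0       0    0 |      4
              ------------+--------------------------------------------------------------------+-------
              total       |       965       3301        855       1456        833      44    8 |   7462

              tab graduatedfromthelastleveledc

              whether the respondent graduated from the last education level attended
              ---------------------+-------
                                   |  Freq.
              ---------------------+-------
              Did not go to school |    965
              No                   |   1521
              Yes                  |   4974
              Missing              |      2
              ---------------------+-------
              Total                |   7462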

              Thank you very much,

              P.S. Please see the attached STATA data set used to generate the above tables.

              Regards


              • #8
                Let me ask Mustafa Ozer some questions in the hopes of setting this discussion on track:

                Essentially, what is needed is to know, for each cell in your tabulation in #7 above, what number you would want to see as the number of years of school completed. At this point there is some uncertainty.

                Q: How do you want to calculate the number of years of school completed for somebody who has 2 years of university?

                I think you make the assumption that to progress to any level, the person will have completed all the previous levels, and I agree with that simplification, even though there may be some exceptions.

                According to post #1 and the tabulation in post #7, they have 5 years of primary school and 3 years of secondary school, and we know they have 2 years of university. The question is, how many years of high school do you want to assume: 3 or 4? Depending on what you assume, the answer will be either 5+3+3+2=13 or 5+3+4+2=14. Based on the tabulation, 3 years seems like the most common result, but we don't know whether students who go on to university are different from students who stop at the end of high school; perhaps pre-university students need an extra year of high school, or perhaps they take one fewer year of high school.

                Q: The same is true for someone who has completed university and started graduate school: how many years of university do you want to assume, 4, 5, or 6?

                In this case, based on the tabulation, 4 seems like the most common result and a good assumption.


                • #9
                  Dear Statalist,

                  What I assumed is 5 years for primary school, 3 years for secondary school, 3 years for high school, 4 years for university, and 2 years for a master's degree.
                  High school used to be 3 years in Turkey, but very recently it became 4 years. Most of the cohort here went to school before the change. This is why the frequency of 3 years is higher.
                  I think it would be more suitable to choose 3 years. And as a robustness check, I can drop people who completed 4 years of high school.

                  For instance, if somebody completed 2 years of a university degree, then she has 5+3+3+2=13 years of education.

                  And most university programs in Turkey last 4 years. So it makes sense to assume 4 years of university.


                  • #10
                    Dear Lisowski and Wbuchanan,

                    In light of the information given above, what would be the STATA code to generate the single years of education variable?

                    Regards,


                    • #11
                      Here is a start at Stata (not STATA) code to generate what I understand you to want.
                      Code:
                      use "statalistdataset.dta", clear
                      replace highestgradecompleted = . if highestgradecompleted>=8
                      generate long yearsofschool = .
                      replace yearsofschool = 5+3+3+4+2 if levelofeducation==6
                      replace yearsofschool = 5+3+3+4   if levelofeducation==5
                      replace yearsofschool = 5+3+3     if levelofeducation==4
                      replace yearsofschool = 5+3       if levelofeducation==3
                      replace yearsofschool = 5         if levelofeducation==2
                      replace yearsofschool = 0         if levelofeducation==1
                      replace yearsofschool = yearsofschool + highestgradecompleted if everschoolattended==1
                      tab yearsofschool levelofeducation
                      Code:
                      yearsofsch |                      RECODE of lofedc (W108)
                             ool | PrimarySc  Secodarys  Highschoo  Universit     Master        PhD |     Total
                      -----------+------------------------------------------------------------------+----------
                               0 |        35          0          0          0          0          0 |        35
                               1 |        54          0          0          0          0          0 |        54
                               2 |        74          0          0          0          0          0 |        74
                               3 |       123          0          0          0          0          0 |       123
                               4 |        99          0          0          0          0          0 |        99
                               5 |     2,914         48          0          0          0          0 |     2,962
                               6 |         0        103          0          0          0          0 |       103
                               7 |         0         82          0          0          0          0 |        82
                               8 |         0        618        138          0          0          0 |       756
                               9 |         0          2        209          0          0          0 |       211
                              10 |         0          0        175          0          0          0 |       175
                              11 |         0          0        818         89          0          0 |       907
                              12 |         0          0        114        106          0          0 |       220
                              13 |         0          0          0        232          0          0 |       232
                              14 |         0          0          0         39          0          0 |        39
                              15 |         0          0          0        359          3          0 |       362
                              16 |         0          0          0          2         11          0 |        13
                              17 |         0          0          0          6         30          3 |        39
                              20 |         0          0          0          0          0          2 |         2
                              21 |         0          0          0          0          0          3 |         3
                      -----------+------------------------------------------------------------------+----------
                           Total |     3,299        853      1,454        833         44          8 |     6,491
                      There's more work to be done - if you include the missing option on the tab command, you'll see the cases with missing yearsofschool that need to be dealt with, but the code above should start you in the right direction.
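
                      A minimal sketch of that check, assuming the variable names from the code above:
                      Code:
                      * sketch only: count and inspect the observations left with missing yearsofschool
                      tab yearsofschool levelofeducation, missing
                      count if missing(yearsofschool)
                      tab levelofeducation highestgradecompleted if missing(yearsofschool), missing
                      Judging from the tabulations in #7, these should mostly be respondents who never attended school, plus the few with an unknown or missing highest grade.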
                      Last edited by William Lisowski; 08 Feb 2016, 13:12.


                      • #12
                        It can be a bit much to read at times, but an alternative to the solution suggested by William Lisowski would be to wrap the multiple replace calls in a single call using the cond() function. The biggest benefit I've found from using cond() in these situations is that it helps to ensure the logic/rules are consistently implemented and not overridden. If more than one condition can be true for the same observation but one should take precedence over another, the cond() function would definitely be a good idea. The biggest downside is making sure a ton of different parentheses are all balanced.


                        • #13
                          Dear Statalist,

                          Thank you very much for your answers.

                          I would like to ask a question regarding the code provided by Mr. Lisowski.

                          Why are people in the category of secondary school in 5 years of education? It seems like it is overwritten. The same problem applies to the other education categories too. For instance, 8 years of education appears in both secondary school and high school. In Turkey, primary school is 5 years and secondary school is 3 years. So if you are in high school you should be in the 9th grade.

                          This is a problem because I also want to calculate a dummy variable for completion of secondary school. The dummy will be one for people who completed secondary school or above, and zero otherwise. However, it will be problematic to separate them if the table looks like the one above, won't it?

                          Moreover, if I understand correctly, Mr. Wbuchanan is addressing this issue. However, I have not quite understood how to use his suggestion. Could you please show it with code? I am new to Stata and therefore not really familiar with writing code.

                          Regards
                          Last edited by Mustafa Ozer; 09 Feb 2016, 17:29.


                          • #14
                            Why are people in the category of secondary school in 5 years of education?
                            When you have results you don't understand, the first step is always to look at the data. It is a sad fact that too many analysts never actually look at their data; only at tabulations, regressions, and the like.

                            If you follow the code I showed above with
                            Code:
                            list if levelofeducation==2 & yearsofschool==5, clean
                            you will see the 48 observations of individuals who responded that their last education level was secondary school, but they completed 0 years at that level. So their years of school completed = 5 in primary school plus 0 in secondary school.

                            So if you are in high school you should be in the 9th grade.
                            You are in the 9th grade, but maybe you do not complete it, so you only have 8 years completed.

                            This is a problem because I also want to calculate a dummy variable for completion of secondary school.
                            Something like the following would do that.
                            Code:
                            generate levelcompleted = levelofeducation
                            replace levelcompleted = levelcompleted-1 if graduatedfromthelastleveledc!=2
                            generate completedsecondary = levelcompleted>=2
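                            To check the result, a sketch along these lines may help; it assumes, as the code above does, that graduatedfromthelastleveledc is coded 2 for Yes:
                            Code:
                            * sketch only: cross-check the dummy against the variables it was built from
                            tab levelofeducation completedsecondary, missing
                            tab graduatedfromthelastleveledc completedsecondary if levelofeducation==2, missing
                            Note that the != 2 condition also treats respondents with a missing graduation status as not having graduated.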


                            • #15
                              Here is an example of what I was mentioning previously, even though William Lisowski's response may satisfy your immediate needs:

                              Code:
                              . use statalistdataset.dta, clear
                              
                              . replace highestgradecompleted = . if highestgradecompleted >= 8
                              (6 real changes made, 6 to missing)
                              
                              . 
                              . qui: g byte yrsch = cond(levelofeducation == 6 & everschoolattended == 1, 17 + highestgradecompleted, /// 
                              >                                         cond(levelofeducation == 6 & everschoolattended != 1, 17, ///   
                              >                                         cond(levelofeducation == 5 & everschoolattended == 1, 15 + highestgradecompleted, ///           
                              >                                         cond(levelofeducation == 5 & everschoolattended != 1, 15, ///   
                              >                                         cond(levelofeducation == 4 & everschoolattended == 1, 11 + highestgradecompleted, ///           
                              >                                         cond(levelofeducation == 4 & everschoolattended != 1, 11, ///   
                              >                                         cond(levelofeducation == 3 & everschoolattended == 1, 8 + highestgradecompleted, ///            
                              >                                         cond(levelofeducation == 3 & everschoolattended != 1, 8, ///   
                              >                                         cond(levelofeducation == 2 & everschoolattended == 1, 5 + highestgradecompleted, ///            
                              >                                         cond(levelofeducation == 2 & everschoolattended != 1, 5, ///   
                              >                                         cond(levelofeducation == 1 & everschoolattended == 1, 0 + highestgradecompleted, ///            
                              >                                         cond(levelofeducation == 1 & everschoolattended != 1, 0, .))))))))))))
                              
                              . 
                              . qui: g byte secondary = cond(graduatedfromthelastleveledc < 2, (levelofeducation - 1) >= 3, ///   
                              >                                                 cond(graduatedfromthelastleveledc == 2, levelofeducation >= 3, ///   
                              >                                                 cond(graduatedfromthelastleveledc == 9, ., .)))
                              
                              .                                                 
                              . ta yrsch levelofeducation
                              
                                         |                      RECODE of lofedc (W108)
                                   yrsch | PrimarySc  Secodarys  Highschoo  Universit     Master        PhD |     Total
                              -----------+------------------------------------------------------------------+----------
                                       0 |        35          0          0          0          0          0 |        35 
                                       1 |        54          0          0          0          0          0 |        54 
                                       2 |        74          0          0          0          0          0 |        74 
                                       3 |       123          0          0          0          0          0 |       123 
                                       4 |        99          0          0          0          0          0 |        99 
                                       5 |     2,914         48          0          0          0          0 |     2,962 
                                       6 |         0        103          0          0          0          0 |       103 
                                       7 |         0         82          0          0          0          0 |        82 
                                       8 |         0        618        138          0          0          0 |       756 
                                       9 |         0          2        209          0          0          0 |       211 
                                      10 |         0          0        175          0          0          0 |       175 
                                      11 |         0          0        818         89          0          0 |       907 
                                      12 |         0          0        114        106          0          0 |       220 
                                      13 |         0          0          0        232          0          0 |       232 
                                      14 |         0          0          0         39          0          0 |        39 
                                      15 |         0          0          0        359          3          0 |       362 
                                      16 |         0          0          0          2         11          0 |        13 
                                      17 |         0          0          0          6         30          3 |        39 
                                      20 |         0          0          0          0          0          2 |         2 
                                      21 |         0          0          0          0          0          3 |         3 
                              -----------+------------------------------------------------------------------+----------
                                   Total |     3,299        853      1,454        833         44          8 |     6,491 
                              
                              
                              . ta yrsch secondary
                              
                                         |       secondary
                                   yrsch |         0          1 |     Total
                              -----------+----------------------+----------
                                       0 |        35          0 |        35 
                                       1 |        54          0 |        54 
                                       2 |        74          0 |        74 
                                       3 |       123          0 |       123 
                                       4 |        99          0 |        99 
                                       5 |     2,962          0 |     2,962 
                                       6 |       103          0 |       103 
                                       7 |        82          0 |        82 
                                       8 |       753          3 |       756 
                                       9 |       202          9 |       211 
                                      10 |       169          6 |       175 
                                      11 |       110        797 |       907 
                                      12 |         0        220 |       220 
                                      13 |         0        232 |       232 
                                      14 |         0         39 |        39 
                                      15 |         0        362 |       362 
                                      16 |         0         13 |        13 
                                      17 |         0         38 |        38 
                                      20 |         0          2 |         2 
                                      21 |         0          3 |         3 
                              -----------+----------------------+----------
                                   Total |     4,766      1,724 |     6,490 
                              
                              
                              . ta levelofeducation secondary
                              
                                   RECODE of |       secondary
                               lofedc (W108) |         0          1 |     Total
                              ---------------+----------------------+----------
                                 notattended |       965          0 |       965 
                               PrimarySchool |     3,301          0 |     3,301 
                              Secodaryschool |       855          0 |       855 
                                  Highschool |       614        841 |     1,455 
                                  University |         0        832 |       832 
                                      Master |         0         44 |        44 
                                         PhD |         0          8 |         8 
                              ---------------+----------------------+----------
                                       Total |     5,735      1,725 |     7,460
                              As an additional FYI, you'll have a much easier time with your analysis if you map missing/unknown values to system and/or extended missing values; this will help to ensure that they are not included in models and things like that.
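                              A minimal sketch of such a mapping, to be run on the raw data in place of the replace ... if highestgradecompleted >= 8 line. The numeric codes are assumptions (8 for "does not know" and 9 for "missing" in highestgradecompleted, 9 for missing in graduatedfromthelastleveledc), so check them with the nolabel tabulations first:
                              Code:
                              * assumed codes, not confirmed by the data shown: 8 = does not know, 9 = missing
                              tab highestgradecompleted, nolabel missing
                              tab graduatedfromthelastleveledc, nolabel missing
                              mvdecode highestgradecompleted, mv(8=.a \ 9=.b)
                              mvdecode graduatedfromthelastleveledc, mv(9=.a)
                              Extended missing values such as .a and .b are excluded from estimation commands like the system missing value, while still recording why the value is missing.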
