  • IF One variable - Multiple Observations

    Hi there,

    I need your help creating two binary variables, dropout and transition.

    This is my dataset:
    Year UE St_ID Area Grade DROPOUT TRANSITION
    2009 5355 116 301 1 0 1
    2010 5355 116 301 2 0 1
    2011 5355 116 301 3 0 1
    2009 5355 116 401 1 0 1
    2010 5355 116 401 2 0 1
    2011 5355 116 401 3 0 1
    2008 6181 140 301 1 0
    2009 6181 140 301 2 0
    2010 6181 140 301 3 0
    2008 6181 140 401 1 0
    2009 6181 140 401 2 0
    2010 6181 140 401 3 0
    2009 5150 604 301 1 1 0
    2009 5150 604 401 1 1 0

    2010 5150 612 301 1 1 1
    2011 5150 612 301 2 1 0
    2010 5150 612 401 1 1 1
    2011 5150 612 401 2 1 0
    2010 102652 680 301 3 0 1
    2010 102652 680 401 3 0 1
    2011 102652 744 301 1 1 0
    2011 102652 744 401 1 1 0


    a) Creating DROPOUT variable

    If St_ID has grade 3, that means he finished school, so dropout = 0; otherwise dropout = 1. In my dataset, St_ID 604 and 744 don't have grade 3, so dropout = 1 for all of their observations (lines).

    Because I have multiple observations per student, this is what I tried, but it's not working:

    // CREATE HIGH SCHOOL DROPOUT DUMMY

    // FIRST VERIFY GRADE IS CONSISTENT FOR ANY
    // STUDENT WITHIN THE COURSE OF A YEAR
    by St_ID Ano (Serie), sort: assert Serie[1] == Serie[_N]

    // if there is a contradiction - command to identify the contradiction in the data
    // by St_ID Ano (Serie), sort: gen byte flag = (Serie[1] != Serie[_N])
    // browse if flag

    // NOW REDUCE TO ONE OBSERVATION PER STUDENT-YEAR
    tempfile holding
    save `holding'
    collapse (first) Serie, by(St_ID Ano)

    // IDENTIFY WHEN DROPOUT OCCUR
    by St_ID (Ano), sort: gen byte dropout if Serie != 3

    HERE MY CONDITION DOESN'T WORK....


    b) TRANSITION VARIABLE (grade to grade)
    This is a little more complicated because I have to look to the future [_n+1] to answer for today [_n]. I need to create a variable that answers the question: did St_ID go on to the next grade? If the following grade appears in my dataset, transition = 1, else 0. The highest grade level is 3, and for grade 3 transition is always 1.

    For this case I wasn't able to come up with any IF syntax that works.

    If someone could help me create these two binary variables, I would appreciate it.

    Max


  • #2
    This should do it:

    Code:
    clear*
    input Year UE St_ID Area Grade
    2009 5355 116 301 1
    2010 5355 116 301 2
    2011 5355 116 301 3
    2009 5355 116 401 1
    2010 5355 116 401 2
    2011 5355 116 401 3
    2008 6181 140 301 1
    2009 6181 140 301 2
    2010 6181 140 301 3
    2008 6181 140 401 1
    2009 6181 140 401 2
    2010 6181 140 401 3
    2009 5150 604 301 1
    2009 5150 604 401 1
    2010 5150 612 301 1
    2011 5150 612 301 2
    2010 5150 612 401 1
    2011 5150 612 401 2
    2010 102652 680 301 3
    2010 102652 680 401 3
    2011 102652 744 301 1
    2011 102652 744 401 1
    end
    
    * dropout = 1 when no observation of the student has Grade == 3
    by St_ID, sort: egen dropout = min(Grade != 3)
    
    rangestat (count) transition = UE, interval(Grade 1 1) by(St_ID)
    replace transition = !missing(transition)
    replace transition = 1 if Grade == 3
    Note: -rangestat- is written by Robert Picard, Nick Cox and Roberto Ferrer. You can get it by running -ssc install rangestat-. Do read -help rangestat- to understand how this works. Grade 3 required special handling here because transition status is not definable by the presence of another record where Grade = 4.
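
    To make the interval() logic concrete, here is a minimal sketch on made-up data (the four rows and the name n_next are illustrative assumptions, not part of the example above). It assumes -rangestat- is installed from SSC and relies on -rangestat- returning missing when no observation falls inside the interval:

    ```stata
    * Hypothetical toy data; requires: ssc install rangestat
    clear
    input St_ID Grade UE
    1 1 10
    1 2 10
    1 3 10
    2 1 20
    end

    * For each row, count rows of the same St_ID whose Grade equals Grade + 1
    rangestat (count) n_next = UE, interval(Grade 1 1) by(St_ID)

    * n_next is missing when the student has no row in the next grade
    gen byte transition = !missing(n_next)

    * Grade 3 is the top grade, so it is set to 1 by rule
    replace transition = 1 if Grade == 3
    list, sepby(St_ID) noobs
    ```

    In this sketch student 1 has a record in every next grade, while student 2 has only Grade 1, so only student 2 should end up with transition = 0.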

    By the way, in your example data, St_ID 612 is also a dropout--and this code correctly captures that.
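
    One way to check the dropout rule (no Grade 3 record means dropout = 1) on a few of the rows above is sketched here with built-in commands only. The helper name has_g3 is hypothetical:

    ```stata
    clear
    input St_ID Grade
    116 1
    116 2
    116 3
    604 1
    612 1
    612 2
    end

    * has_g3 = 1 on every row of a student with at least one Grade 3 record
    by St_ID, sort: egen byte has_g3 = max(Grade == 3)

    * A student without any Grade 3 record is counted as a dropout
    gen byte dropout = !has_g3
    list, sepby(St_ID) noobs
    ```

    Here St_ID 116 reaches Grade 3, while 604 and 612 do not, so only the latter two are flagged.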

    In the future, please use the -dataex- command when posting example data, as I have done here. It took me much longer to import your listing of the data into Stata than it did for me to write and test the code solving your problem. The only truly helpful way to show data examples is with the -dataex- command. Run -ssc install dataex- to get the -dataex- command, and then run -help dataex- to read the simple instructions for using it. When you use -dataex- you enable those who want to help you to create a complete and faithful replica of your Stata example to work with.
    Last edited by Clyde Schechter; 02 Aug 2017, 17:43.



    • #3
      I did everything you told me to do, and I also installed dataex. The reason I don't have grade = 4 is that this study focuses on Brazilian high school students. In Brazil, high school lasts 3 years, not 4 as here in the USA.

      Thank you once again Clyde.



      • #4
        Hi Clyde,

        I need your help once again.

        Because of the period covered by my dataset (2008-2012), the dummy variables that you helped me create earlier gave me some ambiguous interpretations. To resolve some of those issues, I have been trying to create two new dummies related to grade retention.

        To do that, I created a fail dummy variable that answers the question: is the grade that St_ID is taking a repeated one? Yes == 1, else 0. As you can see in the dataset below, I got that.

        The DataSet

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double(Ano St_ID serie) byte fail float Retention
        2009 116 1 0 0
        2010 116 2 0 0
        2011 116 3 0 0
        2008 140 1 0 0
        2009 140 2 0 0
        2010 140 3 0 0
        2009 604 1 0 0
        2010 612 1 0 0
        2011 612 2 0 0
        2012 612 3 0 0
        2010 680 3 0 0
        2011 744 1 0 0
        2009 752 1 0 1
        2010 752 1 1 1
        2011 752 2 0 1
        2012 752 2 1 1
        2009 779 1 0 0
        2010 779 2 0 0
        2009 795 1 0 0
        2010 795 2 0 0
        2011 795 3 0 0
        end

        // CREATE FAIL DUMMY

        // FIRST VERIFY GRADE IS CONSISTENT FOR ANY
        // STUDENT WITHIN THE COURSE OF A YEAR
        by St_ID Ano (serie), sort: assert serie[1] == serie[_N]

        // NOW REDUCE TO ONE OBSERVATION PER STUDENT-YEAR
        tempfile holding
        save `holding'
        collapse (first) serie, by(St_ID Ano)

        // IDENTIFY WHEN FAIL OCCUR
        by St_ID (Ano), sort: gen byte fail = (serie == serie[_n-1]) & _n > 1

        // MERGE BACK TO ORIGINAL DATA
        merge 1:m St_ID Ano using `holding', assert(match) nogenerate

        Now come my questions:

        1) How can I identify that St_ID is a grade-retention student? I would like to have 1's for all of his years, not just for the year he is repeating. It would be the Retention column in the dataset above.

        2) I want to create another dummy that tells whether St_ID has failed his grade. To do that, I need to look at the next grade (serie in this data) to get the answer.

        I tried doing the opposite of creating the fail dummy, but it gave me the same answer.

        // IDENTIFY WHEN GRADEFAIL OCCUR
        by St_ID (Ano), sort: gen byte grade_failed = (serie[_n-1] == serie) & _n > 1

        Could you give any suggestion on that also?

        Once again,

        Thanks for all the help!!!

        Max






        • #5
          So
          Code:
          by St_ID, sort: egen Retention = max(fail)
          In general, when you want to create a variable that is constant over a group of observations (i.e. the observations of a single St_ID) and is calculated from those observations, look for an -egen- function to do that. -help egen-.
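
          As a small illustration of that -egen- pattern, here is a sketch on made-up rows. The names ever_fail, n_fail and first_year are hypothetical:

          ```stata
          clear
          input St_ID Ano fail
          752 2009 0
          752 2010 1
          752 2011 0
          795 2009 0
          795 2010 0
          end

          * Each egen function returns one value per St_ID, repeated on all of its rows
          bysort St_ID: egen byte ever_fail = max(fail)    // 1 if any fail == 1
          bysort St_ID: egen byte n_fail = total(fail)     // number of repeated years
          bysort St_ID: egen int first_year = min(Ano)     // first year observed
          list, sepby(St_ID) noobs
          ```

          The by-group prefix is what makes each result constant within a student rather than over the whole dataset.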

          I don't understand your second question. I don't see how or why this variable you describe would be different from the variable fail that you already have created. How is "has failed his grade" different from "is in the same grade the next year?" Please clarify.

          By the way, the -& _n > 1- in your code is not needed. When _n == 1, serie[_n-1] becomes serie[0], and the 0'th observation of any variable is always taken to be a missing value. So serie == serie[_n-1] will always be false in the first observation (unless you have an St_ID where serie is missing in the first observation, but that does not appear to occur in your data).
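
          A quick way to convince yourself of this is the following sketch on three made-up rows. The names prev, fail_a and fail_b are hypothetical:

          ```stata
          clear
          input St_ID Ano serie
          752 2009 1
          752 2010 1
          752 2011 2
          end

          * Under by:, _n restarts at 1 within each St_ID, so serie[_n-1] is serie[0] there
          by St_ID (Ano), sort: gen prev = serie[_n-1]
          by St_ID (Ano): gen byte fail_a = (serie == serie[_n-1]) & _n > 1
          by St_ID (Ano): gen byte fail_b = (serie == serie[_n-1])

          * The first record has no predecessor, and both versions agree everywhere
          assert missing(prev) if Ano == 2009
          assert fail_a == fail_b
          list, noobs
          ```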


          Last edited by Clyde Schechter; 08 Aug 2017, 16:30.



          • #6
            1) The code ran well for the Retention variable.


            Originally posted by Clyde Schechter
            So

            I don't understand your second question. I don't see how or why this variable you describe would be different from the variable fail that you already have created. How is "has failed his grade" different from "is in the same grade the next year?" Please clarify.
            Look at St_ID == 752.

            The fail variable says that in 2010 he is taking grade 1 again, and the same for 2012 with grade 2. So the fail variable tells me whether he is a repeating student in that year.

            But he failed in 2009, not in 2010. He failed in 2011 and repeated grade 2 in 2012.

            I think that instead of naming it fail I will switch to repeater.

            So the question should be: did St_ID == 752 fail in 2009? Yes -> Fail == 1
            Did St_ID fail in 2010? No -> Fail == 0

            Was my explanation clearer now?

            Thanks again,

            Max



            • #7
              Now I get it; that was much clearer. So, first generate the repeater variable (that you currently call fail but will rename). Then

              Code:
              by St_ID (Ano), sort: gen failed = (repeater[_n+1] == 1)
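
              A minimal sketch of the two steps together, on a few rows typed in from the example (one row per student-year is assumed; with several subjects per year the data would first need to be reduced to one row per St_ID and Ano):

              ```stata
              clear
              input Ano St_ID serie
              2009 752 1
              2010 752 1
              2011 752 2
              2012 752 2
              2009 795 1
              2010 795 2
              2011 795 3
              end

              * repeater: same serie as in the student's previous record
              by St_ID (Ano), sort: gen byte repeater = (serie == serie[_n-1])

              * failed: the student's next record is a repeated year
              by St_ID (Ano): gen byte failed = (repeater[_n+1] == 1)
              list, sepby(St_ID) noobs
              ```

              On the last record of each student repeater[_n+1] is missing, so failed is 0 there, which matches having no further information about the student.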



              • #8
                Clyde,

                Thank you once again. All these new variables seem fine, but I need your help with the grade transition variable.


                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input double(Ano St_ID UE serie Disciplina transition)
                2009 116   5355 1 301 1
                2011 116   5355 3 301 1
                2010 116   5355 2 301 1
                2011 116   5355 3 401 1
                2010 116   5355 2 401 1
                2009 116   5355 1 401 1
                2008 140   6181 1 301 1
                2010 140   6181 3 301 1
                2009 140   6181 2 301 1
                2009 140   6181 2 401 1
                2008 140   6181 1 401 1
                2010 140   6181 3 401 1
                2009 604   5150 1 301 0
                2009 604   5150 1 401 0
                2010 612   5150 1 301 1
                2011 612   5150 2 301 1
                2012 612   5150 3 301 1
                2010 612   5150 1 401 1
                2012 612   5150 3 401 1
                2011 612   5150 2 401 1
                2010 680 102652 3 301 1
                2010 680 102652 3 401 1
                2011 744 102652 1 301 0
                2011 744 102652 1 401 0
                2009 752   2070 1 301 1
                2011 752   2070 2 301 0
                2012 752 102652 2 301 0
                2010 752   2070 1 301 1
                2009 752   2070 1 401 1
                2012 752 102652 2 401 0
                2011 752   2070 2 401 0
                2010 752   2070 1 401 1
                2009 779   3522 1 301 1
                2010 779   3522 2 301 0
                2010 779   3522 2 401 0
                2009 779   3522 1 401 1
                end

                The syntax that we used to create it was:

                // CREATE TRANSITION DUMMY
                rangestat (count) transition = UE, interval(serie 1 1) by(St_ID)
                replace transition = !missing(transition)
                replace transition = 1 if serie == 3

                If you look at the data, you will see that the syntax gives the wrong outcome for the grade transition variable in some cases (those involving a failed year).

                Look again at St_ID == 752. When I thought of that variable, the question was: did St_ID pass to the next grade? In 2009, transition == 1, but it should be 0, because he didn't go on to the next grade. For 2010, it gave me the right answer.

                It should be:
                2009, grade 1, transition == 0 (failed)
                2010, grade 1, transition == 1 (passed)
                2011, grade 2, transition == 0 (failed)
                2012, grade 2, transition == 0 (this is right because I don't have any further information about St_ID).

                I have tried a few things, but none of them worked.

                Thank you once again for all that help.

                Max



                • #9
                  Yes, I see the problem. I think this does it:

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input double(Ano St_ID UE serie Disciplina)
                  2009 116   5355 1 301
                  2011 116   5355 3 301
                  2010 116   5355 2 301
                  2011 116   5355 3 401
                  2010 116   5355 2 401
                  2009 116   5355 1 401
                  2008 140   6181 1 301
                  2010 140   6181 3 301
                  2009 140   6181 2 301
                  2009 140   6181 2 401
                  2008 140   6181 1 401
                  2010 140   6181 3 401
                  2009 604   5150 1 301
                  2009 604   5150 1 401
                  2010 612   5150 1 301
                  2011 612   5150 2 301
                  2012 612   5150 3 301
                  2010 612   5150 1 401
                  2012 612   5150 3 401
                  2011 612   5150 2 401
                  2010 680 102652 3 301
                  2010 680 102652 3 401
                  2011 744 102652 1 301
                  2011 744 102652 1 401
                  2009 752   2070 1 301
                  2011 752   2070 2 301
                  2012 752 102652 2 301
                  2010 752   2070 1 301
                  2009 752   2070 1 401
                  2012 752 102652 2 401
                  2011 752   2070 2 401
                  2010 752   2070 1 401
                  2009 779   3522 1 301
                  2010 779   3522 2 301
                  2010 779   3522 2 401
                  2009 779   3522 1 401
                  end
                  
                  preserve
                  by St_ID Ano (serie),sort: assert serie[1] == serie[_N]
                  collapse (first) serie, by(St_ID Ano)
                  tempfile next_year
                  save `next_year'
                  restore
                  rangejoin Ano 1 1 using `next_year', by(St_ID)
                  
                  gen byte transitioning = (serie < serie_U) & !missing(serie_U)
                  replace transitioning = 1 if serie == 3



                  • #10
                    Clyde, it didn't work. Now it gives the opposite of the last syntax.

                    And I also realized that we have another issue.


                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input double(Ano St_ID UE serie Disciplina) byte transitioning
                    2009  116   5355 1 301 1
                    2011  116   5355 3 301 1
                    2010  116   5355 2 301 1
                    2011  116   5355 3 401 1
                    2010  116   5355 2 401 1
                    2009  116   5355 1 401 1
                    2010  612   5150 1 301 1
                    2011  612   5150 2 301 1
                    2012  612   5150 3 301 1
                    2010  612   5150 1 401 1
                    2012  612   5150 3 401 1
                    2011  612   5150 2 401 1
                    2010  680 102652 3 301 1
                    2010  680 102652 3 401 1
                    2011  744 102652 1 301 0
                    2011  744 102652 1 401 0
                    2009  752   2070 1 301 0
                    2010  752   2070 1 301 1
                    2011  752   2070 2 301 0
                    2012  752 102652 2 301 0
                    2009  752   2070 1 401 0
                    2010  752   2070 1 401 1
                    2011  752   2070 2 401 0
                    2012  752 102652 2 401 0
                    2009  779   3522 1 301 1
                    2010  779   3522 2 301 0
                    2010  779   3522 2 401 0
                    2009  779   3522 1 401 1
                    2009  795   2453 1 301 1
                    2010  795   2860 2 301 1
                    2011  795   2860 3 301 1
                    2011  795   2860 3 401 1
                    2009  795   2453 1 401 1
                    2010  795   2860 2 401 1
                    2011  914 102652 2 301 1
                    2012  914 102652 3 301 1
                    2010  914 102652 1 301 1
                    2012  914 102652 3 401 1
                    2010  914 102652 1 401 1
                    2011  914 102652 2 401 1
                    2008 1201   4510 1 301 0
                    2010 1201   4510 3 301 1
                    2008 1201   4510 1 401 0
                    2010 1201   4510 3 401 1
                    end
                    And when the data skip a year, as for St_ID == 1201, for any reason (St_ID moved out of the state, went to private school, etc.), the code does not compare the grades.

                    Once again thank you for helping me.

                    Max
                    Last edited by Max Resende; 09 Aug 2017, 14:04.



                    • #11
                      I'm sorry, but I don't understand what is wrong here.

                      For ID 752 we have serie 1 for both 2009 and 2010, then serie 2 for 2011 and 2012, so he transitions in 2010, but not in 2009, or 2011, nor 2012. Why isn't that correct?

                      For ID 1201, it is serie 1 in 2008 and serie 3 in 2010. Why do you assume there is a transition in 2008? You don't know what happened in 2009. It is possible that serie 1 is repeated in 2009 and then in 2010 there is a "skip" to serie 3. It is also possible that this student wasn't in school at all in 2009. You may think you want to impute a serie = 2 in 2009 here, but then what would you do with this:

                      2008 serie 1
                      2009 no information
                      2010 no information
                      2011 serie 2

                      There is no way to know whether the transition occurs 2008, 2009, or 2010.

                      So if you want to get into the business of imputing serie when there are gaps in the data, you need to think it through very carefully and come up with a complete, consistent, explicit set of rules how you do this. If you post back with that, I'll do my best to implement them in code.
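
                      If it helps in drafting those rules, one way to see where the gaps are is sketched below. It assumes the data have already been reduced to one row per St_ID and Ano; gap and has_gap are hypothetical names:

                      ```stata
                      clear
                      input Ano St_ID serie
                      2008 1201 1
                      2010 1201 3
                      2010 612 1
                      2011 612 2
                      2012 612 3
                      end

                      * Years elapsed since the student's previous record (missing on the first record)
                      by St_ID (Ano), sort: gen gap = Ano - Ano[_n-1]

                      * Flag students with at least one gap longer than one year;
                      * the !missing() guard is needed because missing counts as larger than any number
                      by St_ID (Ano), sort: egen byte has_gap = max(gap > 1 & !missing(gap))
                      list, sepby(St_ID) noobs
                      ```

                      Listing the flagged students alongside their serie values is a reasonable first look at which gap patterns actually occur before writing any rule for them.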



                      • #12
                        Originally posted by Clyde Schechter
                        I'm sorry, but I don't understand what is wrong here.

                        For ID 752 we have serie 1 for both 2009 and 2010, then serie 2 for 2011 and 2012, so he transitions in 2010, but not in 2009, or 2011, nor 2012. Why isn't that correct?
                        You are right. That's correct.


                        Originally posted by Clyde Schechter

                        For ID 1201, it is serie 1 in 2008 and serie 3 in 2010. Why do you assume there is a transition in 2008? You don't know what happened in 2009. It is possible that serie 1 is repeated in 2009 and then in 2010 there is a "skip" to serie 3. It is also possible that this student wasn't in school at all in 2009. You may think you want to impute a serie = 2 in 2009 here, but then what would you do with this:

                        2008 serie 1
                        2009 no information
                        2010 no information
                        2011 serie 2

                        There is no way to know whether the transition occurs 2008, 2009, or 2010.

                        So if you want to get into the business of imputing serie when there are gaps in the data, you need to think it through very carefully and come up with a complete, consistent, explicit set of rules how you do this. If you post back with that, I'll do my best to implement them in code.
                        I wasn't thinking about imputing data. If we add data, that will change my research. As you said, I would need very strong reasons to do that.

                        The reason I said that is that I know for sure that, in Brazil:

                        Students can't skip grades in high school.
                        It takes 3 years to graduate.

                        So if St_ID is in serie 1 in 2008 and serie 3 in 2010, I can say that he made all the transitions and didn't fail a year. Because high school takes a minimum of 3 years, and St_ID started in 2008 and finished in 2010 (3 years, one year per grade), I can assume that transition = 1 in 2008.


                        2008 serie 1
                        2009 no information
                        2010 serie 3

                        I know that this is one rule inside another rule, but was my idea clear?

                        Please, if you think that this is ambiguous and can lead to many interpretations, let me know.


                        Thank you once again,

                        Max
                        Last edited by Max Resende; 09 Aug 2017, 16:03.



                        • #13
                          OK, so there is really only that one special case. The following, then, I think is your solution:

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input double(Ano St_ID UE serie Disciplina) 
                          2009  116   5355 1 301 
                          2011  116   5355 3 301 
                          2010  116   5355 2 301 
                          2011  116   5355 3 401 
                          2010  116   5355 2 401 
                          2009  116   5355 1 401 
                          2010  612   5150 1 301 
                          2011  612   5150 2 301 
                          2012  612   5150 3 301 
                          2010  612   5150 1 401 
                          2012  612   5150 3 401 
                          2011  612   5150 2 401 
                          2010  680 102652 3 301 
                          2010  680 102652 3 401 
                          2011  744 102652 1 301 
                          2011  744 102652 1 401 
                          2009  752   2070 1 301 
                          2010  752   2070 1 301 
                          2011  752   2070 2 301 
                          2012  752 102652 2 301 
                          2009  752   2070 1 401 
                          2010  752   2070 1 401 
                          2011  752   2070 2 401 
                          2012  752 102652 2 401 
                          2009  779   3522 1 301 
                          2010  779   3522 2 301 
                          2010  779   3522 2 401 
                          2009  779   3522 1 401 
                          2009  795   2453 1 301 
                          2010  795   2860 2 301 
                          2011  795   2860 3 301 
                          2011  795   2860 3 401 
                          2009  795   2453 1 401 
                          2010  795   2860 2 401 
                          2011  914 102652 2 301 
                          2012  914 102652 3 301 
                          2010  914 102652 1 301 
                          2012  914 102652 3 401 
                          2010  914 102652 1 401 
                          2011  914 102652 2 401 
                          2008 1201   4510 1 301 
                          2010 1201   4510 3 301 
                          2008 1201   4510 1 401 
                          2010 1201   4510 3 401 
                          end
                          preserve
                          by St_ID Ano (serie),sort: assert serie[1] == serie[_N]
                          collapse (first) serie, by(St_ID Ano)
                          by St_ID (Ano): gen byte skip_advance = (Ano+2 == Ano[_n+1]) & (serie+2 == serie[_n+1])
                          tempfile next_year
                          save `next_year'
                          restore
                          rangejoin Ano 1 1 using `next_year', by(St_ID)
                          merge m:1 St_ID Ano using `next_year', keepusing(skip_advance) update replace ///
                              assert(2 3 4 5) nogenerate
                          
                          
                          gen byte transitioning = (serie < serie_U) & !missing(serie_U)
                          replace transitioning = 1 if serie == 3 | skip_advance == 1
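
                          For readers without -rangejoin-, the same rules can be sketched with plain subscripting on data already collapsed to one row per St_ID and Ano. This is an alternative formulation, not the code above, and next_up is a hypothetical helper name:

                          ```stata
                          clear
                          input Ano St_ID serie
                          2008 1201 1
                          2010 1201 3
                          2009 752 1
                          2010 752 1
                          2011 752 2
                          end

                          * Ordinary case: a record exists for the next year, in a higher serie
                          by St_ID (Ano), sort: gen byte next_up = (Ano[_n+1] == Ano + 1) & (serie[_n+1] > serie)

                          * Special case: one missing year, with serie advancing by exactly two
                          by St_ID (Ano): gen byte skip_advance = (Ano[_n+1] == Ano + 2) & (serie[_n+1] == serie + 2)

                          * serie 3 is always counted as a transition
                          gen byte transitioning = next_up | skip_advance | serie == 3
                          list, sepby(St_ID) noobs
                          ```

                          On these rows ID 1201 gets transitioning = 1 in both 2008 and 2010, while ID 752 gets it only in 2010.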



                          • #14
                            Clyde,

                            Everything looks fine.

                            Thank you once again.

                            Max



                            • #15
                              Clyde,

                              I need your help once again. As you can see, my dataset is organized as panel data, and because I want to control for fixed effects of high school characteristics and work with an AR residual structure, I need to declare my data as panel data.

                              To do that, I first tried the basic command:

                              xtset St_ID year
                              repeated time values within panel
                              r(451);
                              This happened because I have multiple observations for the same year (2 subjects per year) for each St_ID.

                              by St_ID year (Serie), sort: assert Serie[1] == Serie[_N]

                              // NOW REDUCE TO ONE OBSERVATION PER STUDENT-YEAR
                              tempfile holding
                              save `holding'
                              collapse (count) n_events = event_date (first) Disciplina, by(St_ID year)

                              // MERGE BACK TO ORIGINAL DATA
                              merge 1:m St_ID year using `holding', assert(match) nogenerate
                              But it didn't work.

                              So my question is: how can I define panel data with multiple observations for the same student-year? In other words, I have panel data with 3 dimensions: student, school, and year.
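
                              For anyone reproducing the error, a minimal sketch on four made-up rows follows. The second half is only one possible starting point and rests on an assumption that may not fit the analysis, namely that the student-subject pair is the panel unit; panel_id is a hypothetical name:

                              ```stata
                              clear
                              input St_ID year Disciplina
                              116 2009 301
                              116 2009 401
                              116 2010 301
                              116 2010 401
                              end

                              * Two rows per student-year, so St_ID and year do not identify a row;
                              * capture noisily shows the error message without stopping the do-file
                              capture noisily xtset St_ID year

                              * Assumption: treat each student-subject pair as its own panel
                              egen long panel_id = group(St_ID Disciplina)
                              xtset panel_id year
                              ```

                              Whether the student-subject pair is the right unit depends on the model; if the outcome varies only by student and year, reducing the data to one row per student-year would be the other obvious route.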


                              Thank you once again.

                              Max
                              Last edited by Max Resende; 10 Aug 2017, 14:19.

