  • generating a new dummy variable based on the median

    I have unbalanced panel data in which many firms (each with its own id) exist in each time period t.
    I also have a variable x. I want to create a dummy variable that equals 1 if the firm's x is lower than the median x of the firms that exist at time t.
    How can I generate this dummy?

  • #2
    I think the simplest way to do this is:

    Code:
    rangestat (median) med_x_this_year = x, interval(year 0 0)
    gen byte desired = (x < med_x_this_year) if !missing(med_x_this_year)
    To do this you will need to install Robert Picard, Nick Cox, and Roberto Ferrer's -rangestat- command from SSC, if you don't already have it.
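
    A minimal, self-contained sketch of the same steps on a built-in dataset, in case it helps to see them run end to end. The dataset (grunfeld) and the variable names med_invest and below_med are illustrative assumptions, not part of the original question; substitute your own x and time variable.

    ```stata
    * one-time install from SSC (needs internet access)
    ssc install rangestat

    * illustrative panel: 10 firms observed over 20 years
    webuse grunfeld, clear

    * median of invest across all firms observed in the same year
    rangestat (median) med_invest = invest, interval(year 0 0)

    * 1 if below that year's median, 0 otherwise, missing if invest is missing
    gen byte below_med = (invest < med_invest) if !missing(invest, med_invest)
    tab below_med
    ```

    The explicit !missing() guard matters because Stata treats missing values as larger than any number, so a bare comparison would code firms with missing x as 0 rather than missing.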



    • #3
      Thanks. It works nicely.



      • #4
        Sorry to bother you again about this.
        If I want the dummy to equal 1 when x is lower than the 25th percentile instead of the median, how do I do that? I looked at the help file for rangestat, and it appears to support only mean, median, min, and max.



        • #5
          You missed this example in the help:

          Code:
            Moving quantiles using a user-supplied Mata function
          
              Given a response variable and a predictor variable, we might be
              interested in plotting conditional quantiles, particular quantiles for
              the response calculated within moving windows of the predictor. The
              function mm_quantile() (Jann 2005) is a very suitable general tool for
              calculating quantiles. All we need is to decide which quantiles we want
              and specify those in a wrapper function.  Then we call up a graph.
          
                  --------------------------- example do-file content ---------------------------
                  webuse nlswork, clear
          
                  * ssc inst moremata needed for -mm_quantile()-
                  mata:  
                      mata clear
                      real rowvector myquantile(real colvector X) {
                          return(mm_quantile(X, 1, (0.1, 0.25, 0.5, 0.75, 0.9)))
                      }
                  end
          
                  rangestat (myquantile) ln_wage, interval(age -2 2)
          
                  label var myquantile1 "p10"
                  label var myquantile2 "p25"
                  label var myquantile3 "p50"
                  label var myquantile4 "p75"
                  label var myquantile5 "p90"
          
                  set scheme s1color
                  scatter ln_wage age, ms(oh) mc(gs8) || ///
                  line myquantile? age, sort legend(order(6 5 4 3 2) col(1) pos(3)) ///
                      ytitle("`: var label ln_wage'") yla(, ang(h)) xla(15(5)45)
                  --------------------------------------------------------------------------------
          In your case, you just want 0.25.



          • #6
            If I have interpreted the code correctly, then the following
            Code:
                    mata:  
                        mata clear
                        real rowvector myquantile(real colvector X) {
                            return(mm_quantile(X, 1, (0.1, 0.25, 0.5, 0.75, 0.9)))
                        }
                    end
                    
            rangestat (myquantile) x, interval(year 0 0)
            // to be followed by
            gen byte desired = (x < <name of the 0.25 quantile variable>) if !missing(<name of the 0.25 quantile variable>)
            This returns an error.
            [Attached image: 1.png]



            • #7
              Wesso: I was not aware of rangestat until now, and am glad to learn of it, but I often create variables like the one I think you are trying to create. I may be misinterpreting your question, but here is how I would typically do this:
              Code:
              sort timevar
              by timevar: egen p25x = pctile(x), p(25)
              * missing x counts as larger than any number, so exclude it explicitly
              gen byte dx = x < p25x if !missing(x)
              So, for example
              Code:
              . set obs 1000
              number of observations (_N) was 0, now 1,000
              
              . gen timevar=_n<=500
              
              . sort timevar
              
              . gen x=runiform()
              
              . by timevar: egen p25x=pctile(x), p(25)
              
              . gen dx=x<p25x
              
              . by timevar: sum dx
              
              ------------------------------------------------------------------------------------------------------------------------------
              -> timevar = 0
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                        dx |        500         .25    .4334464          0          1
              
              ------------------------------------------------------------------------------------------------------------------------------
              -> timevar = 1
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                        dx |        500         .25    .4334464          0          1



              • #8
                John is right. The case for rangestat here rests on its greater flexibility and its speed on large datasets, but there are several ways to get quartiles into variables.

                As said, you don't need as many quantiles as in the example in the help. More crucially, I can't reproduce the problem of #6:

                Code:
                webuse grunfeld, clear 
                
                mata:  
                mata clear
                real rowvector loq(real colvector X) {
                     return(mm_quantile(X, 1, 0.25))
                }
                end
                        
                rangestat (loq) invest, interval(year 0 0)
                gen byte desired = invest < loq1 
                
                tab desired 
                
                
                    desired |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          0 |        160       80.00       80.00
                          1 |         40       20.00      100.00
                ------------+-----------------------------------
                      Total |        200      100.00
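
                For comparison, a sketch of the -egen- route from #7 applied to the same dataset, which needs neither rangestat nor moremata. The names p25_invest and below_p25 are illustrative assumptions. No expected counts are shown, because the two routines need not use the same quantile definition and results could differ in small groups or with ties.

                ```stata
                webuse grunfeld, clear

                * 25th percentile of invest within each year
                bysort year: egen p25_invest = pctile(invest), p(25)

                * 1 if below that year's 25th percentile, missing if invest is missing
                gen byte below_p25 = (invest < p25_invest) if !missing(invest)
                tab below_p25
                ```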
