  • Using -xtile- to assess the impact of a variable on an outcome

    Hi All,

    I would like to understand at what number of children (or at what point in the distribution of children) a household member goes out to work instead of devoting time to child care at home. I have variables on labor market outcomes and on the total number of children in a household.
    I used -xtile- to create quantile categories of the number of children, for 2, 3, 4, 5, 6 and 7 quantile bins, using the following code:

    Code:
     forval i = 2(1)7{
     xtile quantchild`i' = tot_child_born, nq(`i')
      }
    This creates quantchild2, quantchild3, ..., quantchild7, each assigning households to quantile bins of the number of children. Now I want to find the point at which a woman leaves the house to work in order to meet the financial need of having x children. Would it make sense to just run a regression with factor-variable notation for quantchild`i'?
    For example running the following regression:

    Code:
     forval i = 2(1)7{
     reg paidwork i.quantchild`i' $controls
      }
    This should give me a coefficient for the indicator at each level of the distribution, no? Each indicator would then have a positive or negative coefficient, pointing to where the household's financial need starts to outweigh its childcare needs. I am uncertain whether this is the right method, i.e. using -xtile- and -reg- to get at the problem I am describing.

    Thanks a lot
    Lori

  • #2
    What do you hope to gain from all that xtile stuff that you cannot get by just a single regression with i.tot_child_born?
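
    A minimal sketch of that single regression, run on simulated data because the data from #1 are not available here. The variable names paidwork and tot_child_born are taken from #1; the data-generating values are made up purely for illustration, and the controls are left out.

    ```stata
    clear
    set seed 12345
    set obs 2000
    * simulated stand-ins for the variables named in #1 (not real data)
    gen byte tot_child_born = min(floor(-ln(runiform()) * 2), 6)
    gen byte paidwork = runiform() < 0.6 - 0.05 * tot_child_born
    * one regression: each coefficient compares that number of children
    * with the base category (0 children)
    regress paidwork i.tot_child_born
    ```

    Each distinct number of children gets its own indicator, so no binning step is needed before the regression.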
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      I was hoping to use quantiles instead of factor-variable notation to see at which part of the distribution of children the coefficient changes.



      • #4
        Sounds a good idea, but quantiles can't make that easier. For a start, there can't be more quantile bins than there are distinct values. A discrete variable with (I guess here) moderate skew can't map cleanly to quantile bins.

        Otherwise put, quantile bins can't use more information than is in the original data and they can't use it more directly either.
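
        A small sketch of the first point, using a made-up discrete variable (kids is hypothetical, with only three distinct values) and asking -xtile- for five bins:

        ```stata
        clear
        set obs 100
        * hypothetical discrete variable with only three distinct values
        gen byte kids = cond(_n <= 50, 0, cond(_n <= 80, 1, 2))
        * ask for five quantile bins
        xtile q5 = kids, nq(5)
        * tied values always land in the same bin, so at most three
        * of the five requested bins can be occupied
        tabulate kids q5
        ```

        The cross-tabulation shows each value of kids falling into exactly one bin, so the binned variable carries no information beyond kids itself.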



        • #5
          Maybe what Lorien is looking for is plotting positions: https://www.stata.com/support/faqs/s...ons/index.html ? You can think of those as the percentage of the sample that has fewer children than you do. This might be useful when comparing societies that have very different norms on what the "normal" number of children is.

          If you want "to see at what part of the distribution of children changes the coeff", then I would say that that is exactly what the factor variable notation is for. Since this is clearly ordered, it might be clearer to use contrast with the ar. operator for the number of children after the regression. Consider the example below:

          Code:
          . // open example data
          . sysuse nlsw88, clear
          (NLSW, 1988 extract)
          
          .
          . // prepare the data
          .
          . gen byte edcat = cond(grade <  12, 1,     ///
          >                  cond(grade == 12, 2,     ///
          >                  cond(grade <  16, 3,4))) ///
          >                  if !missing(grade)
          (2 missing values generated)
          
          . label variable edcat "respondent's education"
          
          . label define edcat 1 "< highschool"    ///
          >                    2 "highschool"      ///
          >                    3 "some college"    ///
          >                    4 "college"            
          
          . label value edcat edcat
          
          .
          . reg wage i.edcat i.race ttl_exp i.union, base
          
                Source |       SS           df       MS      Number of obs   =     1,876
          -------------+----------------------------------   F(7, 1868)      =    107.55
                 Model |  9365.12892         7  1337.87556   Prob > F        =    0.0000
              Residual |  23236.3456     1,868  12.4391572   R-squared       =    0.2873
          -------------+----------------------------------   Adj R-squared   =    0.2846
                 Total |  32601.4745     1,875  17.3874531   Root MSE        =    3.5269
          
          -------------------------------------------------------------------------------
                   wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
                  edcat |
          < highschool  |          0  (base)
            highschool  |   .7051334   .2562023     2.75   0.006     .2026605    1.207606
          some college  |   2.147486    .288125     7.45   0.000     1.582405    2.712566
               college  |   4.013855   .2830484    14.18   0.000     3.458731    4.568979
                        |
                   race |
                 white  |          0  (base)
                 black  |  -.8072133   .1895417    -4.26   0.000    -1.178949   -.4354775
                 other  |   .4483814   .7287102     0.62   0.538    -.9807903    1.877553
                        |
                ttl_exp |   .2785421   .0181598    15.34   0.000     .2429266    .3141577
                        |
                  union |
              nonunion  |          0  (base)
                 union  |   1.099521   .1913122     5.75   0.000     .7243131    1.474729
                        |
                  _cons |   2.250475   .2983421     7.54   0.000     1.665356    2.835594
          -------------------------------------------------------------------------------
          
          . contrast ar.edcat
          
          Contrasts of marginal linear predictions
          
          Margins      : asbalanced
          
          -----------------------------------------------------------------
                                        |         df           F        P>F
          ------------------------------+----------------------------------
                                  edcat |
          (highschool vs < highschool)  |          1        7.57     0.0060
          (some college vs highschool)  |          1       42.62     0.0000
             (college vs some college)  |          1       57.22     0.0000
                                 Joint  |          3      102.23     0.0000
                                        |
                            Denominator |       1868
          -----------------------------------------------------------------
          
          -------------------------------------------------------------------------------
                                        |   Contrast   Std. Err.     [95% Conf. Interval]
          ------------------------------+------------------------------------------------
                                  edcat |
          (highschool vs < highschool)  |   .7051334   .2562023      .2026605    1.207606
          (some college vs highschool)  |   1.442352   .2209292      1.009058    1.875646
             (college vs some college)  |   1.866369   .2467259      1.382482    2.350257
          -------------------------------------------------------------------------------
          At the bottom you can see that finishing highschool gets you about 71 cents per hour, entering college gets you an additional 1 dollar and 44 cents, finishing college gets you 1 dollar and 87 cents on top of that.




          • #6
            Thanks a lot Maarten, I will give this a try.



            • #7
              I'd suggest thinking of ridit scores as calculated (e.g.) by this egen function (part of egenmore from SSC)


              Code:
              . ssc type _gridit.ado
              *! NJC 1.0.0 19 Oct 2000                  
              program define _gridit 
                      version 6.0
                      gettoken type 0 : 0 
                      gettoken g 0 : 0 
                      gettoken eqs 0 : 0 
                      syntax varname [if] [in] [, by(varlist) MISSing REVerse PERCent]
                      marksample touse
                      if "`missing'" == "" & "`by'" != "" { markout `touse' `by', strok } 
                      sort `touse' `by' `varlist' 
                      tempvar total pr    
                      qui by `touse' `by': gen `total' = _N     
                      qui by `touse' `by' `varlist': gen `pr' = _N / `total' 
                      qui by `touse' `by': gen `type' `g' = 0.5 * `pr' if `touse' 
                      qui by `touse' `by' `varlist': replace `pr' = `pr' * (_n == _N) 
                      qui by `touse' `by': replace `g' = `g' + sum(`pr'[_n-1])   
                      if "`reverse'" != "" { replace `g' = 1 - `g' } 
                      if "`percent'" != "" { replace `g' = 100 * `g' } 
              end
              The following notes are based on the help for distplot (Stata Journal)


              The cumulative probability is defined under the -midpoint- option of -distplot- as

              SUM counts in categories below + (1/2) count in this category
              -------------------------------------------------------------
              SUM counts in all categories

              With terminology from Tukey (1977, 496-497), this could be called a "split fraction" below. It is also a "ridit" as defined by Bross (1958);
              see also Fleiss, Levin, and Paik (2003, 198-205), Flora (1988), or Beder and Heim (1990). Yet again, it is also the mid-distribution
              function of Parzen (1993, 3295) and the grade function of Haberman (1996, 240-241). The numerator is a split count. Using this numerator,
              rather than

              SUM counts in categories below

              or

              SUM counts in categories below + count in this category

              means that more use is made of the information in the data. Either alternative would always mean that some probabilities are identically 0
              or 1, which tells us nothing about the data. Also, there are fewer problems in showing the cumulative distribution on any transformed scale
              (e.g., logit) for which the transform of 0 or 1 is not plottable. Using this approach for graded data was suggested by Cox (2001, 2004).
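
              As a worked check of the definition above, here is a minimal sketch on six made-up observations (kids is hypothetical). It uses official -egen-'s rank() function rather than ridit() from egenmore, and assumes there are no missing values, so that _N is the total count.

              ```stata
              clear
              set obs 6
              * made-up data: two 0s, three 1s, one 2
              gen byte kids = cond(_n <= 2, 0, cond(_n <= 5, 1, 2))
              * rank() gives tied values their mean rank by default
              egen double meanrank = rank(kids)
              * (mean rank - 0.5) / N equals
              * (counts below + half the count in this category) / N
              gen double ridit = (meanrank - 0.5) / _N
              list kids meanrank ridit, sepby(kids)
              ```

              By the formula, the three categories get (0 + 1)/6, (2 + 1.5)/6 and (5 + 0.5)/6, so none of them is exactly 0 or 1.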

              Aside on the term ridit: The term was originally explained as meaning "relative to an identified distribution", but Bross (1981)
              explained later that the name honored his wife Rida. See also Tannen (2004).

              "Because the rationale for ridit analysis was an acronym ('Relative to an Identified Distribution') plus the productive suffix '-it'
              which denotes a transformation, this may have avoided this confusion. A short and simple name seems to have survival value and to be
              preferred to personal names. Actually, however, ridit analysis was named for my wife, Rida."

              (Irwin Dudley Jackson Bross, 1921-2004; Rida Singer Bross, 1929-2012)



              References

              Beder, J. H., and R. C. Heim. 1990. On the use of ridit analysis. Psychometrika 55: 603-616.

              Bross, I. D. J. 1958. How to use ridit analysis. Biometrics 14: 18-38.

              ------. 1981. This Week's Citation Classic: Bross I D J. How to use ridit analysis. Biometrics 14: 18-38, 1958. Current Contents Life
              Sciences 24: 17. http://garfield.library.upenn.edu/cl...LS07400002.pdf

              Cox, N. J. 2001. Plotting graded data: A Tukey-ish approach. Presentation to UK Stata Users Group meeting, Royal Statistical Society,
              London, 14-15 May. http://www.stata.com/support/meeting/7uk/cox1.pdf.

              ------. 2004. Speaking Stata: Graphing categorical and compositional data. Stata Journal 4: 190-215.

              Fleiss, J. L., B. Levin, and M. C. Paik. 2003. Statistical Methods for Rates and Proportions. 3rd ed. New York: Wiley.

              Flora, J. D. 1988. Ridit analysis. In Encyclopedia of Statistical Sciences, ed. S. Kotz and N. L. Johnson, vol. 8, 136-139. New York:
              Wiley.

              Haberman, S. J. 1996. Advanced Statistics Volume I: Description of Populations. New York: Springer.

              Parzen, E. 1993. Change PP plot and continuous sample quantile function. Communications in Statistics -- Theory and Methods 22: 3287-3304.

              Tannen, T. 2004. Obituary: Irwin D J Bross. Lancet 364: 1212.

              Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.



              • #8
                , the process you mentioned makes the most sense to me. Is there any way to export contrast output using -outreg2-, or something similar? Thanks



                • #9
                  Lorien Nair I suspect some of your answer in #8 got cut off. There were several suggestions made previously, can you tell us which one you wanted to export?



                  • #10
                    My apologies, I am not certain how my response got cut before posting.

                    I ran the following regression:

                    Code:
                     
                     reg paidwork i.quantchild5 $controls
                     contrast ar.quantchild5
                    This is how I want to go about the analysis. But is there a way to export the output that -contrast- generates? Is there an option similar to -outreg2- for -contrast-? Thanks!



                    • #11
                      Within the contrast command, there is the post option. If you specify that, then contrast will act like a regular estimation command, and outreg2 should be able to find those estimates.
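
                      A minimal sketch of the post option, reusing the nlsw88 example from #5. The final export call is left out: outreg2 is a community-contributed command that may not be installed, so this only shows that the contrasts end up in e(b).

                      ```stata
                      sysuse nlsw88, clear
                      * same education categories as in #5
                      gen byte edcat = cond(grade <  12, 1,     ///
                                       cond(grade == 12, 2,     ///
                                       cond(grade <  16, 3, 4))) ///
                                       if !missing(grade)
                      regress wage i.edcat i.race ttl_exp i.union
                      * post replaces the regression results in e() with the contrasts
                      contrast ar.edcat, post
                      * the three adjacent-category contrasts are now the current estimates
                      matrix list e(b)
                      ```

                      Because post overwrites the regression results, store them first with -estimates store- if they are still needed afterwards.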
