Using -xtile- to assess the impact of a variable on an outcome

Lorien Nair

Join Date: May 2019

Posts: 115
#1

18 Mar 2021, 03:40

Hi All,

I am trying to understand at what number of children (or at what point in the distribution of children) a household member goes out to work rather than devoting time to child care at home. I have variables on labor market outcomes and on the total number of children in a household.
I used -xtile- to create quantile categories (2, 3, 4, 5, 6, 7 bins, etc.) for the number of children, using the following code:

Code:

forval i = 2(1)7 {
    xtile quantchild`i' = tot_child_born, nq(`i')
}

This creates quantchild2, quantchild3, ... with the corresponding quantiles of children and the households within each quantile. Now I want to see at what point a woman leaves the house to work in order to meet the financial need of having x children. Would it make sense to just run a regression with factor-variable notation for quantchild`i'?
For example, running the following regression:

Code:

forval i = 2(1)7 {
    reg paidwork i.quantchild`i' $controls
}

This should give me a coefficient for an indicator at each level of the distribution, no? That is, for each indicator I get a positive or negative coefficient, which would point to where the financial need of the household trades off against childcare needs. I am uncertain whether this is the right method, i.e. using -xtile- and -reg- to get at the problem I am describing.

Thanks a lot
Lori
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#2

18 Mar 2021, 06:34

What do you hope to gain from all that xtile stuff that you cannot get by just a single regression with i.tot_child_born?

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Lorien Nair

Join Date: May 2019

Posts: 115
#3

18 Mar 2021, 08:44

I was hoping to use quantiles instead of factor-variable notation to see at what part of the distribution of children the coefficient changes.
Nick Cox

Join Date: Mar 2014

Posts: 35709
#4

18 Mar 2021, 08:56

Sounds a good idea, but quantiles can't make that easier. For a start, there can't be more quantile bins than there are distinct values. A discrete variable with (I guess here) moderate skew can't map cleanly to quantile bins.

Otherwise put, quantile bins can't use more information than is in the original data and they can't use it more directly either.
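
A minimal sketch of that point, not run on the thread's data: rep78 from the auto dataset shipped with Stata stands in here for tot_child_born (an assumption made only for illustration). It has five distinct nonmissing values, so asking -xtile- for seven bins cannot yield seven populated bins.

```stata
* Sketch: quantile bins cannot outnumber the distinct values of a discrete variable.
* rep78 is a stand-in for tot_child_born (assumption for illustration only).
sysuse auto, clear
tabulate rep78              // five distinct nonmissing values
xtile q7 = rep78, nq(7)     // ask for seven quantile bins
tabulate rep78 q7           // fewer than seven bins are populated
```

The cross-tabulation shows each value of the original variable landing in exactly one bin, so the bins carry no information beyond the raw values.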

Maarten Buis

Join Date: Mar 2014
Posts: 3458
#5

18 Mar 2021, 09:21

Maybe what Lorien is looking for is plotting positions: https://www.stata.com/support/faqs/s...ons/index.html ? You can think of those as the percentage of the sample that has fewer children than you do. This might be useful when comparing societies that have very different norms on what the "normal" number of children is.

If you want "to see at what part of the distribution of children changes the coeff", then I would say that that is exactly what the factor variable notation is for. Since this is clearly ordered, it might be clearer to use contrast with the ar. operator for the number of children after the regression. Consider the example below:

Code:

. // open example data
. sysuse nlsw88, clear
(NLSW, 1988 extract)

.
. // prepare the data
.
. gen byte edcat = cond(grade <  12, 1,     ///
>                  cond(grade == 12, 2,     ///
>                  cond(grade <  16, 3,4))) ///
>                  if !missing(grade)
(2 missing values generated)

. label variable edcat "respondent's education"

. label define edcat 1 "< highschool"    ///
>                    2 "highschool"      ///
>                    3 "some college"    ///
>                    4 "college"            

. label value edcat edcat

.
. reg wage i.edcat i.race ttl_exp i.union, base

      Source |       SS           df       MS      Number of obs   =     1,876
-------------+----------------------------------   F(7, 1868)      =    107.55
       Model |  9365.12892         7  1337.87556   Prob > F        =    0.0000
    Residual |  23236.3456     1,868  12.4391572   R-squared       =    0.2873
-------------+----------------------------------   Adj R-squared   =    0.2846
       Total |  32601.4745     1,875  17.3874531   Root MSE        =    3.5269

-------------------------------------------------------------------------------
         wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        edcat |
< highschool  |          0  (base)
  highschool  |   .7051334   .2562023     2.75   0.006     .2026605    1.207606
some college  |   2.147486    .288125     7.45   0.000     1.582405    2.712566
     college  |   4.013855   .2830484    14.18   0.000     3.458731    4.568979
              |
         race |
       white  |          0  (base)
       black  |  -.8072133   .1895417    -4.26   0.000    -1.178949   -.4354775
       other  |   .4483814   .7287102     0.62   0.538    -.9807903    1.877553
              |
      ttl_exp |   .2785421   .0181598    15.34   0.000     .2429266    .3141577
              |
        union |
    nonunion  |          0  (base)
       union  |   1.099521   .1913122     5.75   0.000     .7243131    1.474729
              |
        _cons |   2.250475   .2983421     7.54   0.000     1.665356    2.835594
-------------------------------------------------------------------------------

. contrast ar.edcat

Contrasts of marginal linear predictions

Margins      : asbalanced

-----------------------------------------------------------------
                              |         df           F        P>F
------------------------------+----------------------------------
                        edcat |
(highschool vs < highschool)  |          1        7.57     0.0060
(some college vs highschool)  |          1       42.62     0.0000
   (college vs some college)  |          1       57.22     0.0000
                       Joint  |          3      102.23     0.0000
                              |
                  Denominator |       1868
-----------------------------------------------------------------

-------------------------------------------------------------------------------
                              |   Contrast   Std. Err.     [95% Conf. Interval]
------------------------------+------------------------------------------------
                        edcat |
(highschool vs < highschool)  |   .7051334   .2562023      .2026605    1.207606
(some college vs highschool)  |   1.442352   .2209292      1.009058    1.875646
   (college vs some college)  |   1.866369   .2467259      1.382482    2.350257
-------------------------------------------------------------------------------

At the bottom you can see that finishing highschool gets you about 71 cents per hour, entering college gets you an additional 1 dollar and 44 cents, finishing college gets you 1 dollar and 87 cents on top of that.


Lorien Nair

Join Date: May 2019

Posts: 115
#6

18 Mar 2021, 10:19

Thanks a lot Maarten, I will give this a try.
Nick Cox

Join Date: Mar 2014

Posts: 35709
#7

18 Mar 2021, 10:36

I'd suggest thinking of ridit scores as calculated (e.g.) by this egen function (part of egenmore from SSC)

Code:

. ssc type _gridit.ado
*! NJC 1.0.0 19 Oct 2000
program define _gridit
        version 6.0

        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0

        syntax varname [if] [in] [, by(varlist) MISSing REVerse PERCent]

        marksample touse
        if "`missing'" == "" & "`by'" != "" {
                markout `touse' `by', strok
        }

        sort `touse' `by' `varlist'
        tempvar total pr
        qui by `touse' `by': gen `total' = _N
        qui by `touse' `by' `varlist': gen `pr' = _N / `total'
        qui by `touse' `by': gen `type' `g' = 0.5 * `pr' if `touse'
        qui by `touse' `by' `varlist': replace `pr' = `pr' * (_n == _N)
        qui by `touse' `by': replace `g' = `g' + sum(`pr'[_n-1])

        if "`reverse'" != "" {
                replace `g' = 1 - `g'
        }
        if "`percent'" != "" {
                replace `g' = 100 * `g'
        }
end

The following notes are based on the help for distplot (Stata Journal)

The cumulative probability is defined under the -midpoint- option of -distplot- as

SUM counts in categories below + (1/2) count in this category
-------------------------------------------------------------
SUM counts in all categories

With terminology from Tukey (1977, 496-497), this could be called a "split fraction below". It is also a "ridit" as defined by Bross (1958);
see also Fleiss, Levin, and Paik (2003, 198-205), Flora (1988), or Beder and Heim (1990). Yet again, it is also the mid-distribution
function of Parzen (1993, 3295) and the grade function of Haberman (1996, 240-241). The numerator is a split count. Using this numerator,
rather than

SUM counts in categories below

or

SUM counts in categories below + count in this category

means that more use is made of the information in the data. Either alternative would always mean that some probabilities are identically 0
or 1, which tells us nothing about the data. Also, there are fewer problems in showing the cumulative distribution on any transformed scale
(e.g., logit) for which the transform of 0 or 1 is not plottable. Using this approach for graded data was suggested by Cox (2001, 2004).

Aside on the term ridit: The term was originally explained as meaning "relative to an identified distribution", but Bross (1981)
explained later that the name honored his wife Rida. See also Tannen (2004).

"Because the rationale for ridit analysis was an acronym ('Relative to an Identified Distribution') plus the productive suffix '-it'
which denotes a transformation, this may have avoided this confusion. A short and simple name seems to have survival value and to be
preferred to personal names. Actually, however, ridit analysis was named for my wife, Rida."

(Irwin Dudley Jackson Bross, 1921-2004; Rida Singer Bross, 1929-2012)

References

Beder, J. H., and R. C. Heim. 1990. On the use of ridit analysis. Psychometrika 55: 603-616.

Bross, I. D. J. 1958. How to use ridit analysis. Biometrics 14: 18-38.

------. 1981. This Week's Citation Classic: Bross I D J. How to use ridit analysis. Biometrics 14: 18-38, 1958. Current Contents Life
Sciences 24: 17. http://garfield.library.upenn.edu/cl...LS07400002.pdf

Cox, N. J. 2001. Plotting graded data: A Tukey-ish approach. Presentation to UK Stata Users Group meeting, Royal Statistical Society,
London, 14-15 May. http://www.stata.com/support/meeting/7uk/cox1.pdf.

------. 2004. Speaking Stata: Graphing categorical and compositional data. Stata Journal 4: 190-215.

Fleiss, J. L., B. Levin, and M. C. Paik. 2003. Statistical Methods for Rates and Proportions. 3rd ed. New York: Wiley.

Flora, J. D. 1988. Ridit analysis. In Encyclopedia of Statistical Sciences, ed. S. Kotz and N. L. Johnson, vol. 8, 136-139. New York:
Wiley.

Haberman, S. J. 1996. Advanced Statistics Volume I: Description of Populations. New York: Springer.

Parzen, E. 1993. Change PP plot and continuous sample quantile function. Communications in Statistics -- Theory and Methods 22: 3287-3304.

Tannen, T. 2004. Obituary: Irwin D J Bross. Lancet 364: 1212.

Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.
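
The midpoint definition in the notes above can also be checked by hand. This is a minimal sketch using rep78 from the auto dataset as a stand-in for the number of children (an assumption for illustration); the variable names n, below, N and ridit are made up here. It reduces the data to one row per category and applies (counts below + half the count in this category) / total count.

```stata
* Sketch: ridit (midpoint cumulative probability) computed by hand.
* rep78 is a stand-in for tot_child_born (assumption for illustration only).
sysuse auto, clear
drop if missing(rep78)
contract rep78, freq(n)                  // one row per category, n = count
sort rep78
gen below = sum(n) - n                   // counts in categories below this one
egen N = total(n)                        // count over all categories
gen ridit = (below + 0.5 * n) / N
list rep78 n below ridit
```

Because half of each category's own count is included, no category gets a ridit of exactly 0 or 1, which is the property the notes emphasize.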
Lorien Nair

Join Date: May 2019

Posts: 115
#8

22 Mar 2021, 05:55

, the process you mentioned makes the most sense to me. Is there any way to export contrast output using outreg2? Or something similar? Thanks
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#9

22 Mar 2021, 08:11

Lorien Nair, I suspect some of your answer in #8 got cut off. Several suggestions were made previously; can you tell us which one you wanted to export?

Lorien Nair

Join Date: May 2019

Posts: 115
#10

22 Mar 2021, 09:43

My apologies, I am not certain how my response got cut before posting.

I ran the following regression:

Code:

reg paidwork i.quantchild5 $controls
contrast ar.quantchild5

This is how I want to go about the analysis. But is there a way to export the output that -contrast- generates? Is there an option similar to -outreg2- for -contrast-? Thanks!
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#11

22 Mar 2021, 11:55

Within the contrast command, there is the post option. If you specify that, then contrast will act like a regular estimation command, and outreg2 should be able to find those estimates.
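
A minimal sketch of what that could look like, using the nlsw88 data from #5 rather than the poster's own variables. The -recode- line is a compact stand-in for the edcat preparation shown earlier, and the output file name is a placeholder. outreg2 is user-written (ssc install outreg2), and how well it formats posted contrasts is worth checking on your own output.

```stata
* Sketch: post the contrasts so they become the active estimation results.
sysuse nlsw88, clear
recode grade (min/11 = 1) (12 = 2) (13/15 = 3) (16/max = 4), gen(edcat)
reg wage i.edcat i.race ttl_exp i.union
contrast ar.edcat, post
matrix list e(b)            // the adjacent-category contrasts now sit in e(b)

* outreg2 is user-written; the file name is a placeholder (assumption)
outreg2 using contrast_results.doc, replace
```

Note that post replaces the regression results in memory, so use -estimates store- before -contrast- if the original regression is still needed afterwards.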
