
  • local polynomial smoothed graph

    I am trying to create a graph of kernel-weighted local polynomial smoothed values:

    1.) at the mean value of y at each x, by group.

    An example graph is attached below the dataex output.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float y byte(x group)
    -1.48  1 1
      .04  8 2
      .04  8 2
     -.92  9 1
    -1.66  1 1
      .78  4 2
      1.3  5 2
      1.9  3 2
    -2.14  4 1
    -2.21  4 1
    -1.71  8 1
     -.79  8 1
      .39  7 1
      .24  4 1
      .31  7 2
     2.88  6 2
      .53  3 2
     -.89  2 2
    -1.57  3 1
    -1.28  3 1
     -.39  4 1
     -.53  9 1
    -1.13  1 1
    -1.78  3 2
      .32  4 2
     -.07  5 2
    -1.56  2 2
      .03  5 2
     -.69  5 1
     1.74 10 1
     1.39  6 1
      .78  7 1
      .51  1 1
      .53 10 1
     -.32 10 1
    -1.28  9 2
    -1.82  3 2
      .72  5 2
      .28 10 2
     -.72  5 2
     -.93  2 1
     3.97  1 1
    -4.17  4 1
      .85  3 2
     -.47  5 1
    -1.38  3 1
    -1.63  2 1
     -1.2  8 1
     -.79  3 1
     -.23 10 1
     1.29  3 2
     1.94  6 1
      .12  3 1
     -.56  3 1
      .71  1 1
      .76  8 1
       .9  5 1
     -.43  4 1
     1.63  3 1
     1.19  7 1
     5.11  5 1
     1.22  9 1
       .8  2 1
     -.75  7 2
     -1.3  3 2
    -1.77  5 1
     -.75  3 1
     -.29 10 1
    -3.15  2 1
     -.63 10 2
     -.85  5 1
      .26  2 1
     -.67  2 1
     1.62  2 1
     -.92  4 1
     1.83  6 1
     -.61  7 1
     -.15  4 1
    -1.42 10 1
      .02  9 1
      .22  5 1
    -1.17 10 1
     -.63  6 1
     -.12  3 1
       .6  8 1
      .37  9 1
     -.04 10 2
     1.01  2 1
     1.35  7 1
    -1.93  3 1
    -2.21  9 1
     -.56  2 1
     1.52  2 1
     1.05  2 1
      .27  7 2
    -1.46  3 1
      .48  6 1
      .46  1 2
    -2.89  3 1
      -.7  9 1
      .82  8 1
     1.05  8 1
     1.29  9 1
     1.26  6 2
     1.92  1 1
     1.58  6 1
     1.39  3 1
     -.04  1 1
     1.66  7 1
     1.93  9 1
       -1  3 1
     -.21  3 1
      .18  3 1
     2.31  4 1
     2.36 10 1
      1.6  2 1
     -.09 10 1
     -.28  2 2
     -.17  1 1
      .67  7 2
     -2.2  5 1
     -.48  5 1
     1.35  9 2
     3.25  9 2
     1.85  5 1
      .34  6 1
      .16  6 1
      .09  8 1
    -1.87  7 1
      .66  8 1
     1.24  6 1
      .31  3 1
      .53  7 2
      .69  6 1
     -.66  7 1
      .03  3 1
     1.45  6 1
    -1.05 10 1
    -2.14  4 1
      .38  4 2
     -.48  8 1
     -.43  3 1
    -2.01  7 2
    -2.19 10 1
     -2.9  3 1
      .07  7 2
     1.73  8 1
      -.6  7 1
     -.38  9 1
      .91  4 2
    -2.28  7 1
       .3  8 1
      .51  8 1
    -1.57  2 1
    -1.07  1 1
      .83  9 1
    -3.01  3 1
      .54 10 1
    -1.44  2 1
      .58  9 1
     -.24  6 1
     -.36  5 1
     -.94  5 1
     2.24  7 1
     -.65  7 1
     -.02  1 1
    -2.35  1 1
    -1.81  8 1
      1.1  8 1
      .13  4 1
     -.26 10 1
     -.03  2 1
      -.3  8 1
     -.22  9 1
     2.25  6 2
     -1.2  8 2
      .44 10 2
      .83  7 1
     -.66  7 2
     -.78  9 1
    -1.21  5 1
    -1.59  5 1
     1.09  4 1
     1.48 10 1
    -2.96  1 1
     -.06  3 1
      .06  2 1
      .27  6 1
    -2.24  1 1
     -.81  4 2
     -.54  7 1
     -.77  7 1
     -1.1  4 1
    -1.52  9 1
    -2.87  5 1
    -1.54 10 1
     -.81 10 1
    -1.64  9 1
      .73 10 1
     1.17  4 1
    end
    [Attachment: Screenshot 2025-06-20 130944.png — example graph]
    Best regards,
    Mukesh

  • #2
    Code:
    twoway (lpoly y x if group == 1) (lpoly y x if group == 2)
    If you want the raw values on top, you can also add a scatterplot with
    Code:
    scatter y x if group == 1
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      Thank you, Felix Bittmann, for your response.
      I want the smoothing applied to the mean value of y at each x.
      Best regards,
      Mukesh



      • #4
        So, calculate the means first using egen or collapse and then fire up a smoother.
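
        For completeness, a sketch of the egen route (the variable name ymean is my own choice), which keeps the original data in memory, unlike collapse:

        Code:
        egen ymean = mean(y), by(x group)
        twoway (lpoly ymean x if group == 1) (lpoly ymean x if group == 2)

        Because each observation in a cell carries the same mean, the smoother here implicitly weights each (x, group) cell by its number of observations.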



        • #5
          Following Nick Cox 's suggestion, the command then becomes:

          Code:
          collapse (mean) y, by(x group)
          twoway (lpoly y x if group == 1) (lpoly y x if group == 2)
          Keep in mind that lpoly also has the degree option.

          HTML Code:
          degree(#) specifies the degree of the polynomial to be used in the
                  smoothing.  The default is degree(0), meaning local-mean smoothing.
          I was not sure whether your initial request was about this type of smoothing or the explicit two-step version just shown.
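
          For example, a sketch of local-linear rather than the default local-mean smoothing (the degree value here is purely illustrative):

          Code:
          twoway (lpoly y x if group == 1, degree(1)) (lpoly y x if group == 2, degree(1))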

          Best wishes

          Stata 18.0 MP | ORCID | Google Scholar



          • #6
            This statement relates to #1: "First, we computed the mean of y by x and graphed the variable and its smoothed values (using the kernel-weighted local polynomial smoothing algorithm) by x."

            If I understood properly, the statement implies:

            Code:
            collapse (mean) y, by(x group)
            twoway (lpoly y x if group == 1) (lpoly y x if group == 2) (scatter y x if group == 2) (scatter y x if group == 1)
            Dear Felix Bittmann, are there any specific ways or rules for choosing the degree or bandwidth? The study mentioned in #1 says nothing about either.

            I am asking because my real data have 200,000 observations on y, ranging from -6 to 6, over 60 discrete x values. Approximately 30% of the y values are below -2.

            Greatly thankful for the responses so far.
            Last edited by Mukesh Punia; 20 Jun 2025, 04:20.
            Best regards,
            Mukesh



            • #7
              I don't think there is an easy answer to your questions. The Stata manual provides some guidance and references: https://www.stata.com/manuals13/rlpoly.pdf
              Personally, I would play around with the options and see how this changes your graph. As long as you report transparently what you are doing you should be fine.
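
              As a sketch of such experimentation (the bandwidth values 1 and 3 are arbitrary, purely for illustration), one could overlay two bandwidths for the same group and compare:

              Code:
              twoway (lpoly y x if group == 1, bwidth(1)) (lpoly y x if group == 1, bwidth(3))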
              Last edited by Felix Bittmann; 20 Jun 2025, 05:53.
              Best wishes

              Stata 18.0 MP | ORCID | Google Scholar



              • #8
                Dear Felix Bittmann your responses are greatly appreciated!

                I was looking at your paper on BMI and happiness. I find it interesting both conceptually and methodologically, particularly the robustness part.
                I am trying to establish relationships between childhood (mal)nutrition and well-being in adolescence in LMICs. I am about to finish my PhD. May I send you an e-mail, if needed, for help, discussion, or collaboration?
                Thank you
                Last edited by Mukesh Punia; 20 Jun 2025, 05:37.
                Best regards,
                Mukesh



                • #9
                  To me, reduction to means before smoothing adds an extra and arbitrary step and raises a variety of issues, such as: why not medians, or trimmed means? And how many observations go into each mean (or other summary), and should that be taken into account?

                  More generally, whatever works well for your data and purpose can well be the prime consideration, but there are others, such as whether you want a smooth curve that is a weighted moving average or one that is a local linear regression.

                  You have given us no information that I can see on what your variables are. Sometimes that does guide what you're seeking, depending on the nature of the generating process, and whether you expect kinks or even jumps in the relationship.
                  Last edited by Nick Cox; 20 Jun 2025, 06:38.



                  • #10
                    Dear Nick Cox, in #6 I described my variables and n. More precisely, y is the standardised height-for-age z-score of children under five, as per the WHO 2006 growth standards, and x is age in completed months (0-59). If you wish to see the graph, I will post it after some time; I am a bit away from my PC. I am doing this because Jeff Leroy at IFPRI, in his two papers in 2014-15, used the height-for-age deficiency (HAD) metric to assess how stunting cumulates at different ages. Sharing the full data here is, I think, not possible because of the 2,000,000 observations.

                    Thank you - Mukesh
                    Best regards,
                    Mukesh



                    • #11
                      You'd need a subject-matter expert to say more. I'd expect that while individuals may have slightly irregular growth curves, that would be averaged out over such a large sample.



                      • #12
                        True!
                        Based on 161,160 observations over 59 months and 5 wealth index quintiles, the graph for the highest vs lowest quintiles is below:

                        Code:
                        collapse (mean) HAZ, by(age wquintile)
                        twoway (lpoly HAZ age if wquintile == 1) (lpoly HAZ age if wquintile == 5) (scatter HAZ age if wquintile == 1) (scatter HAZ age if wquintile == 5)

                        with other options at their defaults.

                        [Attachment: Graph.png — smoothed HAZ by age, lowest vs highest wealth quintiles]



                        Best regards,
                        Mukesh



                        • #13
                          These children are presumably born at different times of year, so what's the story there?



                          • #14
                            Thank you, dear Nick, for an important question. I will post a detailed comment in the coming days; I think your concern is seasonality.
                            A quick response is that, at each point, the children compared are of the same age (e.g. 3 months old) but come from the two (lowest and highest) economic strata.
                            Best regards,
                            Mukesh
