How about a -merge- syntax that lets us specify which variable in the master links to which variable in the using dataset, instead of requiring both key variables to have the same name? For example (proposed syntax):
Code:
merge 1:1 patientid == patient_num using "demo_data.dta", keep(master match)
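Until something like that exists, one workaround is to give the master's key the using file's name for the duration of the merge. The sketch below is self-contained: the two small datasets, the variables bp and age, and the tempfile standing in for demo_data.dta are all hypothetical.

```stata
* Hypothetical using data: key stored as patient_num (stands in for demo_data.dta)
clear
input patient_num bp
1 120
2 135
4 110
end
tempfile demo_data
save `demo_data'

* Hypothetical master data: key stored as patientid
clear
input patientid age
1 40
2 55
3 61
end

* Workaround: temporarily rename the master key to match the using file,
* merge on the shared name, then restore the original name
rename patientid patient_num
merge 1:1 patient_num using `demo_data', keep(master match)
rename patient_num patientid
```

Renaming in the master (rather than editing the using file) avoids writing a modified copy of the using data to disk.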
1. I often find myself re-running the same section of code when I'm debugging. If it is a large section of code, this can be inconvenient. Being able to set something like bookmarks and run the code between two bookmarks would be nice.
2. While the eregress estimators are great, it would be nice if we had something similar for fixed effects, where possible.
3. xtregar - could it be extended to (i) handle more than one lag in the serial correlation, (ii) report robust standard errors, or (iii) allow for endogeneity?
AFAIK, -predict- still has no option to save (raw) deleted residuals (a.k.a. PRESS residuals) following -regress-. I would like to see that option added. For that matter, an option to report PRESS (the prediction sum of squares) would be nice! At the moment, I think one has to roll their own, using methods like those shown in the Stack Overflow thread below.
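One way to roll your own without refitting the model n times is the standard hat-matrix identity for linear regression: the deleted residual is e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage. This is a minimal sketch; the auto data and the model price on mpg and weight are illustrative choices only.

```stata
* Sketch: deleted (PRESS) residuals after -regress- via e_i / (1 - h_ii)
sysuse auto, clear
regress price mpg weight

predict double h, leverage      // diagonal of the hat matrix, h_ii
predict double e, residuals     // ordinary residuals, y_i - yhat_i

* Deleted residual: observed y minus the prediction from a fit that omits obs i
generate double press_res = e / (1 - h) if e(sample)

* PRESS statistic: sum of squared deleted residuals over the estimation sample
generate double press_sq = press_res^2
quietly summarize press_sq
scalar PRESS = r(sum)
display "PRESS = " scalar(PRESS)
```

The identity is exact for OLS, so press_res for any observation should match the prediction error from refitting without that observation.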
William Lisowski
I like your suggestion for the notational convention. As for how often this comes up: effects coding uses negative integers to define factor levels, and in difference-in-differences models that use relative time, negative integers are the only sensible way to encode the timing information (e.g., positive integers identify relative time points after the intervention is implemented and negative integers identify the periods before it). I don't think the burden argument is much of an issue, since there is already a burden to develop alternative coding schemes that do not preserve the information as well as negative integers do.
I imagine that if it were easy for Stata developers to allow this, they would have already done so. However, the inability of suest to handle the "nonstandard" VCEs produced by margins (as well as by some other commands) somewhat diminishes its value.
Might the v18 developers consider whether suest (or possibly estimates store) could be modified to accommodate results from margins?
This recent paper by Mize, Doan, and Long demonstrates a rather ingenious approach for doing suest with margins (actually, instead of suest, you use gsem):
I know there are user-written commands for this, but two things on the spatial front would be RIDICULOUSLY helpful. The first is being able to reproject coordinates from a shapefile. Sometimes they're in Mercator projection, and while I don't mind going to Python's geopandas from within Stata, others may not have Python or want to learn Python for it.
So reprojection would be awesome. The second, on a related front, is point-in-polygon merges. Presumably the infrastructure for these already exists (at least for the second one), but it would be nice to have it extended.
It could still be easily addressed with a notational convention like -(1.rep78) conveying the negative value of 1.rep78 while -1.rep78 is used to identify the value of rep78 assigned to -1.
We all have peculiarities in our data that we have to program around, and I'm not in favor of introducing a special case into the construction of expressions.
My preference is to not expand factor variables notation to allow negative values.
If they were to be allowed, I would require that the "i" be explicitly included when selecting a negative value, e.g. i-2.fvar, and would continue to treat -2.fvar as -1 * 2.fvar.
That places the burden on the user taking advantage of the (infrequently needed) capability for negative values to remember that they do not have the convenience of omitting the optional i in those cases. The alternative places the burden on the naive user to avoid the well-camouflaged trap where -2.fvar != -1 * 2.fvar in the (frequently seen) case where fvar takes only nonnegative values.
daniel klein
It could still be easily addressed with a notational convention like -(1.rep78) conveying the negative value of 1.rep78 while -1.rep78 is used to identify the value of rep78 assigned to -1. The two could then be combined unambiguously: -(-1.rep78) would be the negated value of -1.rep78. Forcing all of the values to be strictly >= 0 is unnecessary.
Originally posted by Enrique Pinzon (StataCorp)
Negative values are not allowed with 'i.'. Stata cannot support negative values in factor variables because the expanded list of indicator variables for i.fvar,
0.fvar
1.fvar
2.fvar
3.fvar
are valid variables you can put in Stata expressions, such as
gen mpg_minus_1rep78 = mpg - 1.rep78
Suppose negative values were allowed. Then
-1.rep78
would be ambiguous, because it might mean (1) the negative of 1.rep78 or (2) the indicator for when rep78 takes on the value -1. Since negative values are not allowed, the meaning is unambiguously (1).
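To make the ambiguity concrete, this sketch evaluates the expressions discussed above on the auto data, relying on the StataCorp statement that factor-variable indicators are valid in expressions and that -1.rep78 currently means the negative of 1.rep78. The variable names neg_1rep78 and neg_paren are illustrative. Under the convention proposed in this thread, only the parenthesized form -(1.rep78) would keep today's meaning.

```stata
* Current meaning of factor-variable indicators inside expressions
sysuse auto, clear
generate mpg_minus_1rep78 = mpg - 1.rep78   // 1.rep78 is the indicator for rep78 == 1
generate neg_1rep78 = -1.rep78              // today: the negative of that indicator
generate neg_paren  = -(1.rep78)            // explicit parentheses, same value today
tabulate rep78 neg_1rep78
```

The checks are restricted to nonmissing rep78 because the sketch does not rely on how indicators treat missing values.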
I can't believe I forgot this previously, but please go back to allowing factor variables to take negative values. Since factor-variable notation only creates indicators for the distinct values, the assumption that values are >= 0 is unnecessary. There are quite a few use cases where negative integers preserve the ordered nature of the values and support natural meanings. For example, in the education sector, coding pre-school as -1, kindergarten as 0, and the other grades as positive integers both preserves the order in which the grade levels occur over the lifespan and maps naturally onto most of the codes (e.g., 1 = 1st grade, 2 = 2nd grade, etc.). I imagine one potential reason for the restriction is related to identifying base/reference levels, but it seems like it should be easy to add a notational convention distinguishing positive from negative values that could be used in factor-variable notation to identify the desired reference level.
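For comparison, the kind of alternative coding scheme the thread describes as a burden looks roughly like this: shift the codes so they are nonnegative and carry the meaning in value labels. The dataset, the variable grade_fv, and the label name grade_fv_lbl are hypothetical.

```stata
* Hypothetical grade data: pre-school = -1, kindergarten = 0, grades 1, 2, ...
clear
input grade
-1
0
1
2
end

* Factor variables must be nonnegative integers, so shift by a constant
* and move the meaning of each code into value labels
generate byte grade_fv = grade + 1
label define grade_fv_lbl 0 "Pre-school" 1 "Kindergarten" 2 "1st grade" 3 "2nd grade"
label values grade_fv grade_fv_lbl
list grade grade_fv
```

The shift keeps the ordering, and a base level can still be chosen (e.g., ib1.grade_fv for kindergarten), but the natural mapping is lost: 1st grade is now stored as 2, which is exactly the trade-off argued against above.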