rowmin

Fadzai Chikwava

Join Date: Jul 2016

Posts: 91
#1

14 Jan 2021, 20:25

I would like to create a new variable that takes the minimum value in each row, ignoring the zeros. How do you do this?
I tried the code below and it didn't work:

foreach var of varlist ethos_1rough ethos_2emerg ethos_3temp ethos_5immigrant {
egen higher_ethos = rowmin(`var') if `var' ~=0
}

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ethos_1rough ethos_2emerg ethos_3temp ethos_5immigrant)
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 2 3 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 3 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 3 0
0 0 3 0
0 0 3 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
1 0 0 0
1 0 0 0
1 0 0 0
0 0 0 0
end
Raymond Zhang

Join Date: Jan 2021

Posts: 349
#2

14 Jan 2021, 21:10

Maybe you can try:

Code:

egen ethos_5immigrant=rowmin(ethos_1rough ethos_2emerg ethos_3temp ) if ethos_1rough!=0 | ethos_2emerg!=0 | ethos_3temp!=0

Best regards.

Raymond Zhang
Stata 17.0,MP
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#3

14 Jan 2021, 21:32

The code in #2 can't be correct because the variable ethos_5immigrant already exists. #1 requests a new variable higher_ethos.

But if I understand #1 correctly, even making that change will not give the right result, because I believe what is wanted is the minimum of the non-zero values, and the code in #2 will return 0 if an observation contains any zeroes and at least one non-zero. I recommend:

Code:

gen higher_ethos = .
foreach v of varlist ethos_1rough ethos_2emerg ethos_3temp ethos_5immigrant {
    replace higher_ethos = `v' if `v' != 0 & `v' < higher_ethos
}

This returns missing value for any observation where all of the ethos_* variables are zero, and the lowest non-zero value if there are any such.

Another approach would be to recode the ethos_* variables, replacing all 0s with missing values, and then use -egen, rowmin()-. This might be simpler to code, but it has the drawback of making unnecessary side-effect changes to the data. Whether that side effect is undesirable, only O.P. could say.

Joro Kolev

Join Date: Aug 2018
Posts: 3050
#4

14 Jan 2021, 22:39

I think the last suggestion of Clyde is the easiest, and the recoding to missing can be undone if undesirable:

Code:

. recode ethos* (0 = .)
(ethos_1rough: 48 changes made)
(ethos_2emerg: 50 changes made)
(ethos_3temp: 46 changes made)
(ethos_5immigrant: 51 changes made)

. egen min = rowmin(ethos*)
(43 missing values generated)

. recode ethos* (. = 0)
(ethos_1rough: 48 changes made)
(ethos_2emerg: 50 changes made)
(ethos_3temp: 46 changes made)
(ethos_5immigrant: 51 changes made)


Fadzai Chikwava

Join Date: Jul 2016

Posts: 91
#5

14 Jan 2021, 23:37

Thanks very much for your help
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#6

15 Jan 2021, 06:11

A general caveat on the solution posted in #4: if the full data set has missing values as well as 0s, the final recode will change those original missing values to 0, and that may not be what is desired.
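One way to sidestep that problem is to leave the original variables alone and compute the row minimum on copies in which only the zeros are set to missing. What follows is a minimal sketch under stated assumptions, not something posted in this thread: the toy data are made up for illustration, and the `tmp_` prefix is a hypothetical name chosen here (it must not clash with existing variables).

```stata
* Toy data (hypothetical), including genuine missing values
clear
input float(ethos_1rough ethos_2emerg ethos_3temp ethos_5immigrant)
0 2 3 0
0 0 0 0
. 0 4 0
1 0 . 0
end

* Copy each variable, turning zeros (and only zeros) into missing in the copy
foreach v of varlist ethos_* {
    generate double tmp_`v' = cond(`v' == 0, ., `v')
}

* rowmin() ignores missings; it returns missing only if every copy is missing
egen higher_ethos = rowmin(tmp_*)

* Remove the copies; the original zeros and missings are untouched
drop tmp_*
```

With this approach nothing has to be recoded back afterwards, so original missing values cannot be turned into zeros by accident.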

Nick Cox

Join Date: Mar 2014
Posts: 35724
#7

15 Jan 2021, 07:35

More for amusement or bemusement than as a serious suggestion. The trick is that the reciprocal of 0 will be returned as missing and that max() will ignore missings to the extent possible.

Code:

. gen wanted = 1/max(1/ethos_1rough, 1/ethos_2emerg, 1/ethos_3temp, 1/ethos_5immigrant) 
(43 missing values generated)

* -groups- is from the Stata Journal 
. groups ethos* wanted, missing

  +----------------------------------------------------------------------+
  | ethos_~h   ethos_~g   ethos_~p   ethos_~t   wanted   Freq.   Percent |
  |----------------------------------------------------------------------|
  |        0          0          0          0        .      43     84.31 |
  |        0          0          3          0        3       4      7.84 |
  |        0          2          3          0        2       1      1.96 |
  |        1          0          0          0        1       3      5.88 |
  +----------------------------------------------------------------------+


Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

15 Jan 2021, 08:52

Indeed, if there are missings to start with, setting 0 to missing would not be a good idea.

If there are missings, one can set the 0s to some number which is larger than any of the numbers over which the minimum is computed, e.g., c(maxfloat).


Fadzai Chikwava

Join Date: Jul 2016
Posts: 91
#9

17 Jan 2021, 23:48

As a follow-up to the above, could you help me create 4 variables (the first 2 and the last 2 are similar):
1. Minimum score (as above), but for the variable in column "highest ethos", for each individual (personid) within 90 days based on "date_housing".
2. Minimum score (as above), but for the variable in column "highest ethos", for each individual (personid) within 365 days based on "date_housing".
3. Most frequently occurring score for the variable in column "highest ethos" for each individual (personid) within 90 days based on "date_housing".
4. Most frequently occurring score for the variable in column "highest ethos" for each individual (personid) within 365 days based on "date_housing".

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double personid float(date_housing highest_ethos)
2337 19690  .
2337 19701  .
2337 20106  .
2337 20214  .
2337 20396  3
2337 20412  3
2337 20507 11
2337 20584  .
2337 20597 10
2337 20618  .
2337 20674  .
2337 20759  .
2337 20842  .
2337 20856 10
2337 20859  .
2337 20877  .
2337 20888  9
2337 20893 10
2337 20905  8
2337 20907  8
2337 20914  .
2337 20920  8
2337 20922  .
2337 20927  .
2337 20930  .
2337 20932  .
2337 20943  8
2337 20962  8
2337 21028  .
2337 21094  .
2337 21128  .
2337 21145  .
2337 21178  .
2337 21189  .
2337 21215  .
2337     .  .
3172 19548  .
3172 19654  .
3172 19758  3
3172 19847  .
3172 20062  .
3172 20328  3
3172 20348  3
3172 20370  3
3172 20445  3
3172 20473  .
3172 20482  3
3172 20489  9
3172 20499  .
3172 20565  3
end
format %d date_housing


Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#10

18 Jan 2021, 19:51

Your request is incomplete and unclear in several ways. Does "within 90 days" mean:
1. Between 90 days before and 1 day before date_housing
2. Between 89 days before date_housing and date_housing
3. Between date_housing and 89 days after
4. Between the day after date_housing and 90 days from date_housing
5. Between 45 days before and 44 days after date_housing
6. Between 89 days before and 89 days after date_housing
7. Some other range including or bordering on date_housing that has something to do with 90 days?

And is this to be done separately for each personid, or for the data set as a whole?

On the assumption that it is separately per personid and option 2 from that list:

Code:

rangestat (min) wanted1 = highest_ethos, by(personid) interval(date_housing -89 0)
rangestat (min) wanted2 = highest_ethos, by(personid) interval(date_housing -364 0)

You can adjust the options in these commands to correspond to your responses to these questions.

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.
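To illustrate how the interval() bounds map onto other readings of "within 90 days", here is a small sketch on made-up data. The toy observations and the names wanted_before and wanted_after are hypothetical, chosen only for this example. The two bounds are offsets added to each observation's date_housing, so a window ending the day before uses -90 -1, and a window starting on the day itself and running forward uses 0 89.

```stata
* Requires -rangestat- from SSC: ssc install rangestat
* Toy data (hypothetical): one person, four housing dates
clear
input double personid float(date_housing highest_ethos)
1 100 5
1 150 3
1 200 .
1 260 8
end

* From 90 days before to 1 day before date_housing
rangestat (min) wanted_before = highest_ethos, by(personid) interval(date_housing -90 -1)

* From date_housing itself to 89 days after
rangestat (min) wanted_after = highest_ethos, by(personid) interval(date_housing 0 89)

list, sepby(personid)
```

An observation gets a missing result when its window contains no observations with a non-missing highest_ethos, as happens for the first date under the backward-looking window.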

The third and fourth entail yet more underspecification: it is possible, even likely, that there will be ties for the "most frequent" value. What rule would you apply to choose among such tied values?

The code for your third and fourth variables is more complicated than that for the first two, so I await your clarifications before responding to that part of your question.