
  • rowmin

    I would like to create a new variable that takes the minimum value in each row, ignoring the zeros. How do you do this?
    I tried the code below, and it didn't work:

    foreach var of varlist ethos_1rough ethos_2emerg ethos_3temp ethos_5immigrant {
    egen higher_ethos = rowmin(`var') if `var' ~=0
    }


    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(ethos_1rough ethos_2emerg ethos_3temp ethos_5immigrant)
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 2 3 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 3 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 3 0
    0 0 3 0
    0 0 3 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    1 0 0 0
    1 0 0 0
    1 0 0 0
    0 0 0 0
    end

  • #2
    Maybe you can try:
    Code:
    egen ethos_5immigrant=rowmin(ethos_1rough ethos_2emerg ethos_3temp ) if ethos_1rough!=0 | ethos_2emerg!=0 | ethos_3temp!=0
    Best regards.

    Raymond Zhang
    Stata 17.0,MP



    • #3
      The code in #2 can't be correct because the variable ethos_5immigrant already exists. #1 requests a new variable higher_ethos.

      But if I understand #1 correctly, even making that change will not give the right result, because I believe what is wanted is the minimum of the non-zero values, and the code in #2 will return 0 if an observation contains any zeroes and at least one non-zero. I recommend:

      Code:
      gen higher_ethos = .
      foreach v of varlist ethos_1rough ethos_2emerg ethos_3temp ethos_5immigrant {
          replace higher_ethos = `v' if `v' != 0 & `v' < higher_ethos
      }
      This returns a missing value for any observation where all of the ethos_* variables are zero, and otherwise the lowest non-zero value.

      Another approach would be to recode the ethos_* variables, replacing all 0s with missing values, and then use -egen, rowmin()-. This might be simpler to code, but it has the drawback of making unnecessary side-effect changes to the data. Whether that side effect is undesirable, only the O.P. could say.



      • #4
        I think the last suggestion of Clyde is the easiest, and the recoding to missing can be undone if undesirable:

        Code:
        . recode ethos* (0 = .)
        (ethos_1rough: 48 changes made)
        (ethos_2emerg: 50 changes made)
        (ethos_3temp: 46 changes made)
        (ethos_5immigrant: 51 changes made)
        
        . egen min = rowmin(ethos*)
        (43 missing values generated)
        
        . recode ethos* (. = 0)
        (ethos_1rough: 48 changes made)
        (ethos_2emerg: 50 changes made)
        (ethos_3temp: 46 changes made)
        (ethos_5immigrant: 51 changes made)



        • #5
          Thanks very much for your help



          • #6
            A general caveat on the solution posted in #4: if the full data set has missing values as well as 0s, this will change those missing values to 0, and that may not be what is desired.
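To make the caveat concrete, here is a minimal sketch on a made-up two-observation data set. The variables x1, x2, x3, wanted, and n_missing_before are hypothetical names, not from the thread. It counts genuine missing values first, then runs the same recode round trip as in #4 and shows that a value that started as missing ends up as 0.

```stata
* Toy data (hypothetical): observation 2 has a genuine missing value in x1
clear
input float(x1 x2 x3)
0 2 3
. 5 0
end

* Check for pre-existing missing values before relying on the round trip
egen n_missing_before = rowmiss(x1 x2 x3)
count if n_missing_before > 0
display "observations with genuine missing values: " r(N)

* The round trip from #4: 0 -> missing, row minimum, missing -> 0
recode x1 x2 x3 (0 = .)
egen wanted = rowmin(x1 x2 x3)
recode x1 x2 x3 (. = 0)

* x1 was missing in observation 2 at the start, but it is 0 now
list x1 x2 x3 wanted n_missing_before
```

If the count is zero, the round trip restores the data exactly; if it is positive, the last recode cannot tell original zeros from original missings.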



            • #7
              More for amusement or bemusement than as a serious suggestion. The trick is that the reciprocal of 0 will be returned as missing and that max() will ignore missings to the extent possible.


              Code:
              . gen wanted = 1/max(1/ethos_1rough, 1/ethos_2emerg, 1/ethos_3temp, 1/ethos_5immigrant) 
              (43 missing values generated)
              
              * -groups- is from the Stata Journal 
              . groups ethos* wanted, missing
              
                +----------------------------------------------------------------------+
                | ethos_~h   ethos_~g   ethos_~p   ethos_~t   wanted   Freq.   Percent |
                |----------------------------------------------------------------------|
                |        0          0          0          0        .      43     84.31 |
                |        0          0          3          0        3       4      7.84 |
                |        0          2          3          0        2       1      1.96 |
                |        1          0          0          0        1       3      5.88 |
                +----------------------------------------------------------------------+



              • #8
                Indeed, if there are missings to start with, setting 0 to missing would not be a good idea.

                If there are missings, one can set the 0s to some number which is larger than any of the numbers over which the minimum is computed, e.g., c(maxfloat).
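A minimal sketch of that idea, assuming a made-up data set (x1, x2, x3, and wanted are hypothetical names) and assuming that no genuine value equals c(maxfloat). It works on temporary copies, so neither the zeros nor the missing values in the original variables are touched.

```stata
* Toy data (hypothetical): zeros and genuine missing values mixed together
clear
input float(x1 x2 x3)
0 2 3
0 0 0
. 5 0
1 0 .
end

* Temporary copies in which every 0 is replaced by the sentinel c(maxfloat)
local copies
foreach v of varlist x1 x2 x3 {
    tempvar t
    generate double `t' = cond(`v' == 0, c(maxfloat), `v')
    local copies `copies' `t'
}

* rowmin() ignores missings; the sentinel can never win against a real value
egen double wanted = rowmin(`copies')

* A row minimum equal to the sentinel means the row held no nonzero value
replace wanted = . if wanted == c(maxfloat)
drop `copies'
```

In this sketch a row of all zeros and a row of all missings both end up with a missing result; a separate indicator would be needed to tell them apart.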


                Originally posted by Rich Goldstein
                a general caveat on the solution posted in #4 - if the full data set has missing values as well as 0's, this will change those missing values to 0 and that may not be what is desired



                • #9
                  As a follow-up to the above, could you help me create 4 variables (the first 2 are similar to each other, as are the last 2):
                  1. Minimum score (as above) of the variable in column "highest ethos" for each individual (personid) within 90 days based on "date_housing".
                  2. Minimum score (as above) of the variable in column "highest ethos" for each individual (personid) within 365 days based on "date_housing".
                  3. Most frequently occurring score of the variable in column "highest ethos" for each individual (personid) within 90 days based on "date_housing".
                  4. Most frequently occurring score of the variable in column "highest ethos" for each individual (personid) within 365 days based on "date_housing".


                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input double personid float(date_housing highest_ethos)
                  2337 19690  .
                  2337 19701  .
                  2337 20106  .
                  2337 20214  .
                  2337 20396  3
                  2337 20412  3
                  2337 20507 11
                  2337 20584  .
                  2337 20597 10
                  2337 20618  .
                  2337 20674  .
                  2337 20759  .
                  2337 20842  .
                  2337 20856 10
                  2337 20859  .
                  2337 20877  .
                  2337 20888  9
                  2337 20893 10
                  2337 20905  8
                  2337 20907  8
                  2337 20914  .
                  2337 20920  8
                  2337 20922  .
                  2337 20927  .
                  2337 20930  .
                  2337 20932  .
                  2337 20943  8
                  2337 20962  8
                  2337 21028  .
                  2337 21094  .
                  2337 21128  .
                  2337 21145  .
                  2337 21178  .
                  2337 21189  .
                  2337 21215  .
                  2337     .  .
                  3172 19548  .
                  3172 19654  .
                  3172 19758  3
                  3172 19847  .
                  3172 20062  .
                  3172 20328  3
                  3172 20348  3
                  3172 20370  3
                  3172 20445  3
                  3172 20473  .
                  3172 20482  3
                  3172 20489  9
                  3172 20499  .
                  3172 20565  3
                  end
                  format %d date_housing



                  • #10
                    Your request is incomplete and unclear in several ways. Does "within 90 days" mean:
                    1. Between 90 days before and 1 day before date_housing
                    2. Between 89 days before date_housing and date_housing
                    3. Between date_housing and 89 days after
                    4. Between the day after date_housing and 90 days from date_housing
                    5. Between 45 days before and 44 days after date_housing
                    6. Between 89 days before and 89 days after date_housing
                    7. Some other range including or bordering on date_housing that has something to do with 90 days?
                    And is this to be done separately for each personid, or for the data set as an entirety?

                    On the assumption that it is done separately for each personid and that option 2 from that list is meant:
                    Code:
                    rangestat (min) wanted1 = highest_ethos, by(personid) interval(date_housing -89 0)
                    rangestat (min) wanted2 = highest_ethos, by(personid) interval(date_housing -364 0)
                    You can adjust the options in these commands to correspond to your responses to these questions.
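As an illustration of such adjustments, the sketch below maps three of the other readings listed above onto interval() bounds. It assumes -rangestat- is installed from SSC and uses a made-up four-observation data set; the new variable names min_prev90, min_next90, and min_around are hypothetical.

```stata
* Requires -rangestat- from SSC: ssc install rangestat
* Toy data (hypothetical): one person, four dates stored as plain day numbers
clear
input double personid float(date_housing highest_ethos)
1 100 5
1 150 3
1 200 8
1 400 2
end

* Reading 1: from 90 days before through 1 day before date_housing
rangestat (min) min_prev90 = highest_ethos, by(personid) interval(date_housing -90 -1)

* Reading 3: from date_housing through 89 days after
rangestat (min) min_next90 = highest_ethos, by(personid) interval(date_housing 0 89)

* Reading 6: from 89 days before through 89 days after date_housing
rangestat (min) min_around = highest_ethos, by(personid) interval(date_housing -89 89)

list, sepby(personid)
```

The two numbers after the key variable in interval() are offsets added to each observation's own date_housing, so a window that excludes the current date simply ends at -1 or starts at 1.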

                    -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

                    The third and fourth entail yet more underspecification: it is possible, even likely, that there will be ties for the "most frequent" value. What rule would you apply to choose among such tied values?

                    The code for your third and fourth variables is more complicated than that for the first two, so I await your clarifications before responding to that part of your question.

