Dear Statalist users,
Some time ago, when I was about to graduate from university, I attended a virtual course on working with household survey data. In that course, the instructor shared some code using the `rangestat` command, but unfortunately, it wasn't explained in detail.
What confused me is that `rangestat` was used in a way that seemed equivalent to the `match()` function in R, or the `MATCH()` function in Excel. I have read the documentation for `rangestat`, and I can't figure out how it is supposed to replicate that kind of functionality. I'm not sure if I misunderstood something or if there is a specific use case that allows for this behavior.
For context, here is what each variable represents:
- `DIRECTORIO`: unique dwelling identifier (each physical housing unit).
- `SECUENCIA_P`: household number within the dwelling (in some cases, more than one household lives in the same housing unit).
- `ORDEN`: the person number or ID within the household (e.g., the household head is usually 1).
- `P6040`: the age of the person.
- `P1134S1A1`: the `ORDEN` number of the household member that this person helps to dress (i.e., a caregiving relationship).
In other words, for each person in the dataset, `P1134S1A1` refers to another member of the *same household*, and the goal is to retrieve that person’s age (`P6040`) and assign it to a new variable.
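To make the goal concrete, here is a minimal sketch of the lookup I am after, written with built-in commands only (a self-merge) rather than `rangestat`. It uses a six-row subset of the example data. The variable name `edad_vestir` and the tempfile name `lookup` are my own placeholders and were not part of the course code. I am not claiming this is what `rangestat` does internally, only that this is the result I would expect from a `match()`-style lookup.

```stata
* Sketch only: run as a single do-file so the tempfile macro stays defined.
clear
input long DIRECTORIO byte(SECUENCIA_P ORDEN) int P6040 byte P1134S1A1
2915602 1 1 23 .
2915602 1 2 26 4
2915602 1 3 8 .
2915602 1 4 1 .
2915605 1 1 24 2
2915605 1 2 5 .
end

* Build a lookup table: one row per person, keyed by household and ORDEN.
* ORDEN is renamed to P1134S1A1 so that it can serve as the merge key.
preserve
keep DIRECTORIO SECUENCIA_P ORDEN P6040
rename (ORDEN P6040) (P1134S1A1 edad_vestir)
tempfile lookup
save `lookup'
restore

* Attach the age of the household member referenced by P1134S1A1.
* Rows with a missing P1134S1A1 find no match and keep a missing age.
merge m:1 DIRECTORIO SECUENCIA_P P1134S1A1 using `lookup', keep(master match) nogenerate
sort DIRECTORIO SECUENCIA_P ORDEN

* Checks on the two caregivers in this subset.
assert edad_vestir == 1 if DIRECTORIO == 2915602 & ORDEN == 2
assert edad_vestir == 5 if DIRECTORIO == 2915605 & ORDEN == 1
assert missing(edad_vestir) if missing(P1134S1A1)

list, sepby(DIRECTORIO SECUENCIA_P)
```

If `vestir1` from the course code is doing the same job, I would expect it to agree with `edad_vestir` on every row where `P1134S1A1` is not missing.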
Below, I provide a minimal example of my dataset and the code used, so I can better understand how `rangestat` is functioning in this context.
```stata
* Example generated by -dataex-. To install: ssc install dataex
clear
input long DIRECTORIO byte(SECUENCIA_P ORDEN) int P6040 byte P1134S1A1
2915602 1 1 23 .
2915602 1 2 26 4
2915602 1 3 8 .
2915602 1 4 1 .
2915602 1 5 57 .
2915602 1 6 22 .
2915603 1 1 44 .
2915603 1 2 21 .
2915604 1 1 51 .
2915604 2 1 20 .
2915604 2 2 18 .
2915604 2 3 12 .
2915605 1 1 24 2
2915605 1 2 5 .
2915605 1 3 2 .
2915606 1 1 42 .
2915606 1 2 28 .
2915606 1 3 12 .
2915607 1 1 34 .
2915607 1 2 33 .
2915608 1 1 67 .
2915608 1 2 61 .
2915608 1 3 31 .
2915608 1 4 8 .
2915609 1 1 65 .
2915609 1 2 53 .
2915609 1 3 19 .
2915610 1 1 79 .
2915610 1 2 81 .
2915610 1 3 13 .
2915611 1 1 38 .
2915611 1 2 14 .
2915611 1 3 7 .
2915611 1 4 75 .
2915611 1 5 23 .
2915612 1 1 46 .
2915612 1 2 43 .
2915612 1 3 19 .
2915612 1 4 14 .
2915614 1 1 36 .
2915614 1 2 31 .
2915614 1 3 11 .
2915614 1 4 3 .
2915615 1 1 64 .
2915615 1 2 35 .
2915615 1 3 30 .
2915615 1 4 33 5
2915615 1 5 4 .
2915616 1 1 31 .
2915616 1 2 29 .
2915616 1 3 11 .
2915618 1 1 42 .
2915618 1 2 20 .
2915618 1 3 15 .
2915618 1 4 20 5
2915618 1 5 1 .
2915619 1 1 52 .
2915619 1 2 34 .
2915619 1 3 31 .
2915619 1 4 2 .
2915620 1 1 54 .
2915620 1 2 78 .
2915621 1 1 36 .
2915621 1 2 35 .
2915622 1 1 28 4
2915622 1 2 31 .
2915622 1 3 5 .
2915622 1 4 0 .
2915622 1 5 46 .
end
clonevar lookup_P1134S1A1 = P1134S1A1
replace lookup_P1134S1A1 = 0 if missing(P1134S1A1)
rangestat vestir1 = P6040, by(DIRECTORIO SECUENCIA_P) int(ORDEN lookup_P1134S1A1 lookup_P1134S1A1)
```