Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Trying to isolate stubs using -unab- and -subinstr- leads to unexpected result

    Hello, I am sorry if the title is unclear.

    I have an important number of country-specific datasets organized in wide, but I would like 1) to reshape them from wide to long 2) to append every dataset together, but this is not the subject of this thread.

    Each country has a specific number of rounds that can vary from a country to another, which is why I must reshape them individually using a loop and then appending everything instead of reshaping the appended dataset.

    Since I have many different stubs to use, I am following the article written by Nicholas J. Cox : https://www.stata.com/support/faqs/d...-with-reshape/

    However there is something wrong in my code that doesn't leadme to the expected result. Please have a look at my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long idstd int(COVd2_r1 COVd2_r2 COVd2_r3) byte(COVb2a_r1 COVb2a_r2 COVb2a_r3)
    674431   .  .   . . .  .
    674432  50 50  -9 2 3 -9
    674433   6  7   8 3 2  2
    674434   0  1   1 3 3  3
    674435   7  .   7 3 .  2
    674436   1  1   . 3 3  .
    674437   .  .   1 . .  2
    674438  15 12   . 3 3  .
    674439   0  1   0 1 1  2
    674440   .  9   . . 1  .
    674441   .  1   1 . 3  3
    674442 123 65 120 3 3  1
    674443 240  .   . 3 .  .
    674444  50 20   . 3 3  .
    674445   2  2   . 2 3  .
    674446   . 11  14 . 2  2
    674447  16 17   . 3 3  .
    674448   1  .   . 3 .  .
    674449   . 15   . . 3  .
    674450   8  6   8 3 3  3
    end
    label values idstd IDSTD
    label values COVd2_r1 COVD2
    label values COVd2_r2 COVD2
    label values COVd2_r3 COVD2
    label values COVb2a_r1 COVB2A
    label values COVb2a_r2 COVB2A
    label values COVb2a_r3 COVB2A

    Let's suppose I would like to have two variables named "COVd2" and "COVb2a" with 3 observations per value of idstd, a unique identifier, each of these observations denoting the rounds _r1 _r2 _r3. I have many other stubs but not all of them start with "COV".

    I used the commands

    Code:
    unab vars_r: *_r?
    global stubs_r: subinstr local vars_r "_r`?'" "", all
    However when using the command -macro list- to check if the globals are correct I get the following outputs:

    Code:
    macro list
    stubs_r:   COVd21 COVb2a1 COVd22 COVb2a2
    which is not what I want since these are not stubs I can use to reshape my data. I suspect this has to do with the "?" in my code but I don't know how to fix it in a way that the global stubs_r would have just COVd2 and COVb2a in store.

    Does anyone know how I could fix this issue?

    Thanks a lot for the help!




  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long idstd int(COVd2_r1 COVd2_r2 COVd2_r3) byte(COVb2a_r1 COVb2a_r2 COVb2a_r3)
    674431   .  .   . . .  .
    674432  50 50  -9 2 3 -9
    674433   6  7   8 3 2  2
    674434   0  1   1 3 3  3
    674435   7  .   7 3 .  2
    674436   1  1   . 3 3  .
    674437   .  .   1 . .  2
    674438  15 12   . 3 3  .
    674439   0  1   0 1 1  2
    674440   .  9   . . 1  .
    674441   .  1   1 . 3  3
    674442 123 65 120 3 3  1
    674443 240  .   . 3 .  .
    674444  50 20   . 3 3  .
    674445   2  2   . 2 3  .
    674446   . 11  14 . 2  2
    674447  16 17   . 3 3  .
    674448   1  .   . 3 .  .
    674449   . 15   . . 3  .
    674450   8  6   8 3 3  3
    end
    label values idstd IDSTD
    label values COVd2_r1 COVD2
    label values COVd2_r2 COVD2
    label values COVd2_r3 COVD2
    label values COVb2a_r1 COVB2A
    label values COVb2a_r2 COVB2A
    label values COVb2a_r3 COVB2A
    
    qui ds idstd, not
    local stubs
    foreach var in `r(varlist)'{
        local stubs "`stubs' `=ustrregexra("`var'", "(\w+)(\_[r]\d+)", "$1")'"
    }
    local stubs: list uniq stubs
    reshape long `stubs', i(idstd) j(which) string
    Res.:

    Code:
    . l, sep(0)
    
         +---------------------------------+
         |  idstd   which   COVd2   COVb2a |
         |---------------------------------|
      1. | 674431     _r1       .        . |
      2. | 674431     _r2       .        . |
      3. | 674431     _r3       .        . |
      4. | 674432     _r1      50        2 |
      5. | 674432     _r2      50        3 |
      6. | 674432     _r3      -9       -9 |
      7. | 674433     _r1       6        3 |
      8. | 674433     _r2       7        2 |
      9. | 674433     _r3       8        2 |
     10. | 674434     _r1       0        3 |
     11. | 674434     _r2       1        3 |
     12. | 674434     _r3       1        3 |
     13. | 674435     _r1       7        3 |
     14. | 674435     _r2       .        . |
     15. | 674435     _r3       7        2 |
     16. | 674436     _r1       1        3 |
     17. | 674436     _r2       1        3 |
     18. | 674436     _r3       .        . |
     19. | 674437     _r1       .        . |
     20. | 674437     _r2       .        . |
     21. | 674437     _r3       1        2 |
     22. | 674438     _r1      15        3 |
     23. | 674438     _r2      12        3 |
     24. | 674438     _r3       .        . |
     25. | 674439     _r1       0        1 |
     26. | 674439     _r2       1        1 |
     27. | 674439     _r3       0        2 |
     28. | 674440     _r1       .        . |
     29. | 674440     _r2       9        1 |
     30. | 674440     _r3       .        . |
     31. | 674441     _r1       .        . |
     32. | 674441     _r2       1        3 |
     33. | 674441     _r3       1        3 |
     34. | 674442     _r1     123        3 |
     35. | 674442     _r2      65        3 |
     36. | 674442     _r3     120        1 |
     37. | 674443     _r1     240        3 |
     38. | 674443     _r2       .        . |
     39. | 674443     _r3       .        . |
     40. | 674444     _r1      50        3 |
     41. | 674444     _r2      20        3 |
     42. | 674444     _r3       .        . |
     43. | 674445     _r1       2        2 |
     44. | 674445     _r2       2        3 |
     45. | 674445     _r3       .        . |
     46. | 674446     _r1       .        . |
     47. | 674446     _r2      11        2 |
     48. | 674446     _r3      14        2 |
     49. | 674447     _r1      16        3 |
     50. | 674447     _r2      17        3 |
     51. | 674447     _r3       .        . |
     52. | 674448     _r1       1        3 |
     53. | 674448     _r2       .        . |
     54. | 674448     _r3       .        . |
     55. | 674449     _r1       .        . |
     56. | 674449     _r2      15        3 |
     57. | 674449     _r3       .        . |
     58. | 674450     _r1       8        3 |
     59. | 674450     _r2       6        3 |
     60. | 674450     _r3       8        3 |
         +---------------------------------+

    Comment


    • #3
      Andrew :

      Your code works well if I change
      qui ds idstd, not by

      qui ds COV*
      Indeed, the only stubs I need to use in the reshaping process are the ones that start with COV, even if my dataset has time-invariant variables. Could you confirm this is fine in this context?

      Comment


      • #4
        Originally posted by Julia Simon View Post
        Indeed, the only stubs I need to use in the reshaping process are the ones that start with COV, even if my dataset has time-invariant variables. Could you confirm this is fine in this context?
        Yes, that's fine. In fact, you could cut to the chase and start at

        Code:
        local stubs
        foreach var of varlist COV*{
            local stubs "`stubs' `=ustrregexra("`var'", "(\w+)(\_[r]\d+)", "$1")'"
        }
        local stubs: list uniq stubs
        reshape long `stubs', i(idstd) j(which) string

        Comment

        Working...
        X