Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape only a group of variables within a dataset

    Dear Stata users,

    I would like to create a variable named region in one column. This is an example of my dataset:

    id issue ind scopes_text
    20181213_002 ADVOCACY_AND_MEDIA A2 Local
    CAPACITY_DEVELOPMENT A3
    CHILDREN_AND_YOUTH
    20190114_001 ADVOCACY_AND_MEDIA A1 Global
    A2
    20190131_001 ADVOCACY_AND_MEDIA F1 Regional - Americas, Oceania
    CAPACITY_DEVELOPMENT F2
    F3
    20190131_002 CAPACITY_DEVELOPMENT B2 Regional - Africa, Asia, Oceania
    CHILDREN_AND_YOUTH B4


    I applied the following code to extract the regions from scopes_text:
    Code:
    ge wherenum = .
    quietly forval j = 0/9 {
    replace wherenum = min(wherenum, strpos(scopes_text, "`j'")) if strpos(scopes_text, "`j'")
    }
    gen scope, parse(- ,)

    Then, I obtained the following:

    id issue ind scopes_text scope1 scope2 scope3 scope4
    20181213_002 ADVOCACY_AND_MEDIA A2 Local Local
    CAPACITY_DEVELOPMENT A3
    CHILDREN_AND_YOUTH
    20190114_001 ADVOCACY_AND_MEDIA A1 Global Global
    A2
    20190131_001 ADVOCACY_AND_MEDIA F1 Regional - Americas, Oceania Regional Americas Oceania
    CAPACITY_DEVELOPMENT F2
    F3
    20190131_002 CAPACITY_DEVELOPMENT B2 Regional - Africa, Asia, Oceania Regional Africa Asia Oceania
    CHILDREN_AND_YOUTH B4


    I would like to create a variable named region (in one column) containing scope from 2 to N as below (it is important that I can accommodate N number of regions). How could I create this variable region?Thank you in advance for your orientation!

    id issue ind scopes_text scope1 scope2 scope3 scope4 region
    20181213_002 ADVOCACY_AND_MEDIA A2 Local Local
    CAPACITY_DEVELOPMENT A3
    CHILDREN_AND_YOUTH
    20190114_001 ADVOCACY_AND_MEDIA A1 Global Global
    A2
    20190131_001 ADVOCACY_AND_MEDIA F1 Regional - Americas, Oceania Regional Americas Oceania Americas
    CAPACITY_DEVELOPMENT F2 Oceania
    F3
    20190131_002 CAPACITY_DEVELOPMENT B2 Regional - Africa, Asia, Oceania Regional Africa Asia Oceania Africa
    CHILDREN_AND_YOUTH B4 Asia
    Oceania

  • #2
    This is going to be much more easy to understand if you provide dataex examples for data both before and after the transformation. It is not clear what observations some of your values belong to.

    Comment


    • #3
      Dear Jorrit,

      Thank you very much for your kind suggestion. Below, I provide dataex examples. I hope this helps. I will be happy to provide further information.

      I would like to create a variable named region in one column. This is an example of my dataset:
      Code:
      clear
      input str12 id str20 issue str2 ind str42 scopes_text
      "20181213_002" "ADVOCACY_AND_MEDIA"   "A2" "Local"                                    
      ""             "CAPACITY_DEVELOPMENT" "A3" ""                                         
      ""             "CHILDREN_AND_YOUTH"   ""   ""                                         
      "20190114_001" "ADVOCACY_AND_MEDIA"   "A1" "Global"                                    
      ""             ""                     "A2" ""                                         
      "20190131_001" "ADVOCACY_AND_MEDIA"   "F1" "Regional - Americas, Oceania, Asia, Europe"
      ""             "CAPACITY_DEVELOPMENT" "F2" ""                                          
      ""             ""                     "F3" ""                                         
      "20190131_002" "CAPACITY_DEVELOPMENT" "B2" "Regional - Africa, Asia, Oceania"         
      ""             "CHILDREN_AND_YOUTH"   "B4" ""                                          
      end
      I applied the following code to extract the regions from scopes_text:
      Code:
      ge wherenum = .
      quietly forval j = 0/9 {
      replace wherenum = min(wherenum, strpos(scopes_text, "`j'")) if strpos(scopes_text, "`j'")
      }
      gen scope, parse(- ,)
      drop wherenum
      Then, I obtained the following:
      Code:
      clear
      input str12 id str20 issue str2 ind str42(scopes_text scope) str9(scope1 scope2) str8(scope3 scope4) str7 scope5
      "20181213_002" "ADVOCACY_AND_MEDIA"   "A2" "Local"                                      "Local"                                      "Local"     ""          ""         ""         ""      
      ""             "CAPACITY_DEVELOPMENT" "A3" ""                                           ""                                           ""          ""          ""         ""         ""      
      ""             "CHILDREN_AND_YOUTH"   ""   ""                                           ""                                           ""          ""          ""         ""         ""      
      "20190114_001" "ADVOCACY_AND_MEDIA"   "A1" "Global"                                     "Global"                                     "Global"    ""          ""         ""         ""      
      ""             ""                     "A2" ""                                           ""                                           ""          ""          ""         ""         ""      
      "20190131_001" "ADVOCACY_AND_MEDIA"   "F1" "Regional - Americas, Oceania, Asia, Europe" "Regional - Americas, Oceania, Asia, Europe" "Regional " " Americas" " Oceania" " Asia"    " Europe"
      ""             "CAPACITY_DEVELOPMENT" "F2" ""                                           ""                                           ""          ""          ""         ""         ""      
      ""             ""                     "F3" ""                                           ""                                           ""          ""          ""         ""         ""      
      "20190131_002" "CAPACITY_DEVELOPMENT" "B2" "Regional - Africa, Asia, Oceania"           "Regional - Africa, Asia, Oceania"           "Regional " " Africa"   " Asia"    " Oceania" ""      
      ""             "CHILDREN_AND_YOUTH"   "B4" ""                                           ""                                           ""          ""          ""         ""         ""      
      end

      I would like to create a variable named region (in one column) containing scope from 2 to N as below (it is important that I can accommodate N number of regions). It should look as below:
      Code:
      clear
      input str12 id str20 issue str2 ind str42(scopes_text scope) str8(scope1 scope2) str7(scope3 scope4) str6 scope5 str8 region
      "20181213_002" "ADVOCACY_AND_MEDIA"   "A2" "Local"                                      "Local"                                      "Local"    ""         ""        ""        ""       ""       
      ""             "CAPACITY_DEVELOPMENT" "A3" ""                                           ""                                           ""         ""         ""        ""        ""       ""       
      ""             "CHILDREN_AND_YOUTH"   ""   ""                                           ""                                           ""         ""         ""        ""        ""       ""       
      "20190114_001" "ADVOCACY_AND_MEDIA"   "A1" "Global"                                     "Global"                                     "Global"   ""         ""        ""        ""       ""       
      ""             ""                     "A2" ""                                           ""                                           ""         ""         ""        ""        ""       ""       
      "20190131_001" "ADVOCACY_AND_MEDIA"   "F1" "Regional - Americas, Oceania, Asia, Europe" "Regional - Americas, Oceania, Asia, Europe" "Regional" "Americas" "Oceania" "Asia"    "Europe" "Americas"
      ""             "CAPACITY_DEVELOPMENT" "F2" ""                                           ""                                           ""         ""         ""        ""        ""       "Oceania"
      ""             ""                     "F3" ""                                           ""                                           ""         ""         ""        ""        ""       "Asia"   
      ""             ""                     ""   ""                                           ""                                           ""         ""         ""        ""        ""       "Europe" 
      "20190131_002" "CAPACITY_DEVELOPMENT" "B2" "Regional - Africa, Asia, Oceania"           "Regional - Africa, Asia, Oceania"           "Regional" "Africa"   "Asia"    "Oceania" ""       "Africa" 
      ""             "CHILDREN_AND_YOUTH"   "B4" ""                                           ""                                           ""         ""         ""        ""        ""       "Asia"   
      ""             ""                     ""   ""                                           ""                                           ""         ""         ""        ""        ""       "Oceania"
      end

      Comment


      • #4
        Okay, so this is going to be fairly tricky. The reason it would be tricky is because your asking for something Stata isn't really designed for.
        The starting data is already a bit of a mess for most tasks you would want to do in Stata, but the transformation you are asking for is not helping either.
        It'd be better if you explain whether that starting data really is the stating point for you (or if you already do some transformations). Also good to explain the purpose of your analysis. It's likely people will be able to advise you on better data layouts for that purpose, and also quite likely that the code for getting there will be simpler than what youre trying to achieve.

        Comment


        • #5
          Dear Jorrit,
          Thank you very much for your kind reply. This is information coming directly from a dataset. I find it a little difficult to believe that Stata cannot allow me to, in my limited understanding, reshape a number of variables (in columns) to one variable (in one column). Please correct me if I am wrong, but several household datasets have layouts similar to the one in this case where families, for example, are identified by id and then other variables (such as number of children) vary in number (extension) for each id. Generally, to analyze certain variables, I assume you can collapse and that is why I would like to create the region variable. Thank you again. I will be happy to know your comments and I hope some solution can be found eventually.

          Comment


          • #6
            Stata could make any transformation, that's not the issue. The question is what a useful layout would be.
            In almost any scenario where you need to split up your scopes variable, you would want to either create the individual scope variables as you have, or create dummy variables for each scope.
            You could have them listed one below the other, but only if each scope would refer to another ID, and that does not seem to be the case here.

            Comment

            Working...
            X