  • Loading in blocks of data

    Hello, I have the following data.

    It is a .txt file with 7 variables listed vertically with a line separating each observation.

    A:Alpha9
    B:Beta3
    C:1
    D:11
    E:19
    F:13:11, 14 May 2019
    G:Hello world

    A:Alpha11
    B:Beta31
    C:11
    D:112
    E:190
    F:13:21, 14 May 2019
    G:Hello world again

    ...

    I want to turn this into a long format.

    I tried reading the help file for infix, but I don't think it is the right tool for this.


  • #2
    The following may point you in a useful direction.
    Code:
    // show example data
    type example.txt
    // this assumes your data does not have the string GNXL in any observation
    // note that blank lines are automatically dropped
    import delimited example.txt, delimiters("GNXL",asstring)
    // create a dataset in a long layout
    generate varname = substr(v1,1,1)
    generate value = substr(v1,3,.)
    drop v1
    generate id = sum(varname=="A")
    list, sepby(id)
    // change the dataset to a wide layout
    reshape wide value, i(id) j(varname) string
    rename (value*) (*)
    list, clean
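
    The substr() calls above rely on every variable name being a single character. If the real file has longer names, a minimal sketch of a variant that splits at the first colon could look like the following. The three input lines are made up for illustration, and the tag "Name" is hypothetical; it does not appear in the original data.

```stata
clear
// hypothetical sample lines, standing in for the single variable v1
// that import delimited creates
input str40 v1
"A:Alpha9"
"F:13:11, 14 May 2019"
"Name:Hello world"
end
// position of the first colon in each line
generate colon = strpos(v1, ":")
// everything before the first colon is the variable name
generate varname = substr(v1, 1, colon - 1)
// everything after the first colon is the value; later colons are kept
generate value = substr(v1, colon + 1, .)
drop colon
list, clean
```

    Because only the first colon is used as the split point, the time in F keeps its own colon.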
    Code:
    . // show example data
    . type example.txt
    A:Alpha9
    B:Beta3
    C:1
    D:11
    E:19
    F:13:11, 14 May 2019
    G:Hello world
    
    A:Alpha11
    B:Beta31
    C:11
    D:112
    E:190
    F:13:21, 14 May 2019
    G:Hello world again
    
    . // this assumes your data does not have the string GNXL in any observation
    . // note that blank lines are automatically dropped
    . import delimited example.txt, delimiters("GNXL",asstring)
    (encoding automatically selected: ISO-8859-1)
    (1 var, 14 obs)
    
    . // create a dataset in a long layout
    . generate varname = substr(v1,1,1)
    
    . generate value = substr(v1,3,.)
    
    . drop v1
    
    . generate id = sum(varname=="A")
    
    . list, sepby(id)
    
         +-----------------------------------+
         | varname                value   id |
         |-----------------------------------|
      1. |       A               Alpha9    1 |
      2. |       B                Beta3    1 |
      3. |       C                    1    1 |
      4. |       D                   11    1 |
      5. |       E                   19    1 |
      6. |       F   13:11, 14 May 2019    1 |
      7. |       G          Hello world    1 |
         |-----------------------------------|
      8. |       A              Alpha11    2 |
      9. |       B               Beta31    2 |
     10. |       C                   11    2 |
     11. |       D                  112    2 |
     12. |       E                  190    2 |
     13. |       F   13:21, 14 May 2019    2 |
     14. |       G    Hello world again    2 |
         +-----------------------------------+
    
    . // change the dataset to a wide layout
    . reshape wide value, i(id) j(varname) string
    (j = A B C D E F G)
    
    Data                               Long   ->   Wide
    -----------------------------------------------------------------------------
    Number of observations               14   ->   2           
    Number of variables                   3   ->   8           
    j variable (7 values)           varname   ->   (dropped)
    xij variables:
                                      value   ->   valueA valueB ... valueG
    -----------------------------------------------------------------------------
    
    . rename (value*) (*)
    
    . list, clean 
    
           id         A        B    C     D     E                    F                   G  
      1.    1    Alpha9    Beta3    1    11    19   13:11, 14 May 2019         Hello world  
      2.    2   Alpha11   Beta31   11   112   190   13:21, 14 May 2019   Hello world again  
    
    .
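
    After the reshape, every variable except id is a string, because value was a string. As a possible next step, and only a sketch, the numeric-looking variables could be converted with destring and F could be turned into a Stata datetime with clock(). The mask "hm DMY" is my assumption about how F is laid out (hour, minute, then day, month, year); check the result against the raw text before relying on it. The block recreates a few of the variables with input so that it runs on its own.

```stata
clear
// a subset of the wide data from above, entered by hand for illustration
input str7 A str3 C str18 F
"Alpha9"  "1"  "13:11, 14 May 2019"
"Alpha11" "11" "13:21, 14 May 2019"
end
// convert the numeric-looking string to a numeric variable
destring C, replace
// parse the time and date; datetimes should be stored as doubles
generate double when = clock(F, "hm DMY")
format when %tc
list, clean
```

    In the full dataset the same destring call would take C D E together.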
