Loading in blocks of data

Laura Freds

Join Date: Oct 2020

Posts: 25
#1

06 Dec 2021, 18:49

Hello, I have the following data.

It is a .txt file with 7 variables listed vertically, with a blank line separating each observation.

A:Alpha9
B:Beta3
C:1
D:11
E:19
F:13:11, 14 May 2019
G:Hello world

A:Alpha11
B:Beta31
C:11
D:112
E:190
F:13:21, 14 May 2019
G:Hello world again

...

I want to turn this into a long format.

I tried reading the infix help file, but I don't think that is the right tool for this.

William Lisowski

Join Date: Dec 2014
Posts: 10150

06 Dec 2021, 19:20

The following may point you in a useful direction.

Code:

// show example data
type example.txt
// this assumes your data does not have the string GNXL in any observation
// note that blank lines are automatically dropped
import delimited example.txt, delimiters("GNXL",asstring)
// create a dataset in a long layout
generate varname = substr(v1,1,1)
generate value = substr(v1,3,.)
drop v1
generate id = sum(varname=="A")
list, sepby(id)
// change the dataset to a wide layout
reshape wide value, i(id) j(varname) string
rename (value*) (*)
list, clean
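
The two substr() calls above rely on every variable name being exactly one character long. If the real file has longer names, one possible variation is to split each line at its first colon instead. This is only a sketch under that assumption: the name LongName and the helper variable pos are made up for illustration, and the three input lines stand in for what import delimited would leave in v1.

```stata
// minimal sketch: split each line at its first colon rather than at fixed positions
// (LongName is a hypothetical variable name, not one from the original data)
clear
input str40 v1
"A:Alpha9"
"LongName:Beta3"
"F:13:11, 14 May 2019"
end
// strpos() returns the position of the first colon, so colons inside the value are kept
generate pos = strpos(v1, ":")
generate varname = substr(v1, 1, pos - 1)
generate value = substr(v1, pos + 1, .)
drop pos
list, clean
```

Because only the first colon is used, a value such as the time stamp in F keeps its own colon. The rest of the code (id, reshape, rename) should carry over unchanged, provided the names are valid Stata variable names.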

Code:

. // show example data
. type example.txt
A:Alpha9
B:Beta3
C:1
D:11
E:19
F:13:11, 14 May 2019
G:Hello world

A:Alpha11
B:Beta31
C:11
D:112
E:190
F:13:21, 14 May 2019
G:Hello world again

. // this assumes your data does not have the string GNXL in any observation
. // note that blank lines are automatically dropped
. import delimited example.txt, delimiters("GNXL",asstring)
(encoding automatically selected: ISO-8859-1)
(1 var, 14 obs)

. // create a dataset in a long layout
. generate varname = substr(v1,1,1)

. generate value = substr(v1,3,.)

. drop v1

. generate id = sum(varname=="A")

. list, sepby(id)

     +-----------------------------------+
     | varname                value   id |
     |-----------------------------------|
  1. |       A               Alpha9    1 |
  2. |       B                Beta3    1 |
  3. |       C                    1    1 |
  4. |       D                   11    1 |
  5. |       E                   19    1 |
  6. |       F   13:11, 14 May 2019    1 |
  7. |       G          Hello world    1 |
     |-----------------------------------|
  8. |       A              Alpha11    2 |
  9. |       B               Beta31    2 |
 10. |       C                   11    2 |
 11. |       D                  112    2 |
 12. |       E                  190    2 |
 13. |       F   13:21, 14 May 2019    2 |
 14. |       G    Hello world again    2 |
     +-----------------------------------+

. // change the dataset to a wide layout
. reshape wide value, i(id) j(varname) string
(j = A B C D E F G)

Data                               Long   ->   Wide
-----------------------------------------------------------------------------
Number of observations               14   ->   2           
Number of variables                   3   ->   8           
j variable (7 values)           varname   ->   (dropped)
xij variables:
                                  value   ->   valueA valueB ... valueG
-----------------------------------------------------------------------------

. rename (value*) (*)

. list, clean 

       id         A        B    C     D     E                    F                   G  
  1.    1    Alpha9    Beta3    1    11    19   13:11, 14 May 2019         Hello world  
  2.    2   Alpha11   Beta31   11   112   190   13:21, 14 May 2019   Hello world again  

.
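
After the reshape every variable is still a string, because all of them came from the single string variable value. A possible follow-up, sketched here rather than taken from the original post, is to convert C, D and E to numeric with destring and to turn F into a Stata datetime with clock(). The new variable name when is hypothetical, and the mask "hm DMY" assumes every value of F looks like the two shown above. The input block simply retypes the wide dataset from the listing so the sketch can be run on its own.

```stata
// minimal sketch: retype the wide dataset listed above, then fix the storage types
clear
input id str8 A str8 B str4 C str4 D str4 E str20 F str20 G
1 "Alpha9"  "Beta3"  "1"  "11"  "19"  "13:11, 14 May 2019" "Hello world"
2 "Alpha11" "Beta31" "11" "112" "190" "13:21, 14 May 2019" "Hello world again"
end
// C, D and E hold only digits in the example, so destring can convert them
destring C D E, replace
// F is hour:minute followed by day, month, year; datetimes must be stored as doubles
// (when is a hypothetical name chosen for this sketch)
generate double when = clock(F, "hm DMY")
format when %tc
list, clean
```

If any value of F does not match the mask, clock() returns missing for that observation, so it is worth running count if missing(when) afterwards.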
