File Write: fill in columns in a consecutive manner

Veronika Valcikova

Join Date: Dec 2019
Posts: 34

File Write: fill in columns in a consecutive manner

03 Mar 2023, 10:48

Dear Stata Users / Enthusiasts
I would very much appreciate help with filling a text file columnwise without loosing the already filled in information.
Please, see below the information to my dataset:
- I have a large dataset (over 30 millions children of 2 sexes - males are recoded as 0 and female children as 1)
- This data concerns 50 samples from 46 countries. The samples are recoded as integers, their labels contain the name of the particular sample/country.

Task: I am computing sex ratios for all births, firstborns, secondborns, thirdborns and fourthborns
The code below shows how I was proceeding in the beginning:

Code:

** sex ratio per sample
levelsof sample, local (levels) // 50 labels with sample code
local vlname: value label sample
file open SexRatioPerSample using "SexRatioPerSample.txt", read write
file write SexRatioPerSample  "sample code/sample name        " _column(11) "sex ratio" _n
foreach sample in `r(levels)' {
    quietly: logit sex if sample==`sample'
    local vl: label `vlname' `sample' // 50 labels with sample name
    file write SexRatioPerSample %9s "`sample'/`vl'        " _column(11) %7.3f (exp(_b[_cons])*100) _n
}
file close SexRatioPerSample


** first born sex ratio per sample
levelsof sample
local vlname: value label sample
file open SexRatioPerSample1 using "SexRatioPerSample1.txt", read write
file write SexRatioPerSample1  "sample code/sample name        " _column(11) "sex ratio" _n
foreach sample in `r(levels)' {
    quietly: logit sex if flag_mom==1 & sample==`sample'
    local vl: label `vlname' `sample'
    file write SexRatioPerSample1 %9s "`sample'/`vl'        " _column(11) %7.3f (exp(_b[_cons])*100) _n
}
file close SexRatioPerSample1

repeat the same for children of order 2, 3, 4

Righ now, I would like to code in a more compact way. Yet, I struggle with that. Below, is my try to achieve more proffesional / compact code. First, I am aware of the fact that in the 2nd iteration, the levels of sample are not available as the macro was local (how can I store them globally?) Second of all, I only manage to fill in the first column with the labels of the sample and the second column of the table with the sex ratio for the whole sample.

Code:

levelsof sample, local(levels)
local vlname: value label sample
file open SexRatioPerSampleAll using "SexRatioPerSampleAll.txt", read write
file write SexRatioPerSampleAll  "Sample name      " _column(1) "Sex ratio " _column(2) "Sex ratio 1 " _column(3) "Sex ratio 2 " _column(4) "Sex ratio 3 " _column(5) "Sex ratio 4 "_n

forvalues i = 1/5 {  
    
    foreach sample in `r(levels)' {
        quietly: logit sex if flag_mom==`i' & sample==`sample'
        local vl: label `vlname' `sample'
        file write SexRatioPerSampleAll %9s "`vl'      " _column(`i') %7.3f (exp(_b[_cons])*100) _n
    
    }
    
}
    
file close SexRatioPerSampleAll

I desire to report the 50 labels of the variable sample in the first column. Then, I would like to fill in columns 2-6 with the sex ratios (in column 2: for all births, in column 3: for first borns, column 4: second borns etc. )

Desired format (just an example, no real data)

Sample label	Sex Ratio	Sex Ratio 1	Sex Ratio 2	Sex Ratio 3	Sex Ratio 4
Ghana 2015	105	115	107	97	102
Guinea 2014	105	105	106	104	105
Iran 2011	..
Indonesia 2010	..

Any help is much appreciated.
Thank you!
Best regards,
Veronika

Tags: None

Mike Lacy

Join Date: Apr 2014

Posts: 2411
#2

03 Mar 2023, 11:24

To get a useful answer quickly, I'd advise you to re-read the StataList FAQ for new members, with particular attention to section 12.2 on using -dataex- to present a data example. Without that, your chances of getting some help are not good.

In your new posting in this thread, I'd also encourage you to focus on the result you want and say almost nothing about "how" you are trying to do it. From what I can see above of your code, I can't see how any of it is relevant to what I (vaguely) understand what you want. What people here need in order to help you is 1) a good understanding of the nature of the Stata data file you currently have; 2) what you would like as a result or output from it. My best guess is that -file write- and -logit- will be of no help in finding and displaying sex ratios, or at least are very difficult ways to get that.
1 like
Comment

Veronika Valcikova

Join Date: Dec 2019
Posts: 34

03 Mar 2023, 13:16

Dear Mike
Thank you very much for the valuable input! Therefore, please, find below a more proper post. Again, thank you very much for any help!

Task: For each sample, compute sex ratios for all children no matter the birth order (Sex Ratio All), firstborn children (Sex Ratio 1stborn), secondborn children (Sex Ratio 2ndborn), ...
For example: Sample = Ghana
Sex Ratio All = number of all male children / number of all female children
Sex Ratio 1stborn children = number of all firstborn male children / number of all firstborn female children
...
The order of the child is stored in the variable flag_mom.

Desired output (please denote that the numbers are random):

Sample	Sex Ratio All	Sex Ratio 1stborn	Sex Ratio 2ndborn	Sex Ratio 3rdborn	Sex Ratio 4thborn
Ghana 2010	105	107	103	106	105
Brazil 2015	103	105	102	103	103

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long sample byte(sex flag_mom)
288201001 0 1
288201001 1 2
288201001 0 1
288201001 0 2
288201001 0 1
288201001 0 2
288201001 0 1
288201001 1 2
288201001 1 1
288201001 1 2
288201001 1 1
288201001 0 1
288201001 1 1
288201001 1 2
288201001 1 3
288201001 1 1
288201001 0 2
288201001 1 1
288201001 1 .
288201001 0 1
288201001 0 1
288201001 1 2
288201001 0 1
288201001 0 2
288201001 1 3
288201001 0 1
288201001 1 2
288201001 1 3
288201001 1 4
288201001 1 5
288201001 1 1
288201001 0 2
288201001 0 3
288201001 0 1
288201001 0 1
288201001 1 1
288201001 1 1
288201001 0 2
288201001 1 1
288201001 0 2
288201001 0 3
288201001 1 1
288201001 0 2
288201001 0 3
288201001 0 4
288201001 1 1
288201001 0 1
288201001 0 1
288201001 1 1
288201001 0 2
288201001 0 1
288201001 0 1
288201001 0 2
288201001 0 1
288201001 1 2
288201001 1 1
288201001 0 1
288201001 0 1
288201001 0 1
288201001 0 2
288201001 0 3
288201001 0 1
288201001 1 2
288201001 0 3
288201001 1 1
288201001 0 1
288201001 1 2
288201001 1 3
288201001 0 4
288201001 1 1
288201001 1 1
288201001 1 2
288201001 1 1
288201001 0 2
288201001 1 3
288201001 1 1
288201001 1 1
288201001 0 2
288201001 1 1
288201001 1 2
288201001 0 1
288201001 1 1
288201001 1 1
288201001 0 1
288201001 1 2
288201001 1 1
288201001 0 1
288201001 1 1
288201001 1 2
288201001 0 3
288201001 0 1
288201001 1 2
288201001 0 1
288201001 1 2
288201001 0 3
288201001 1 1
288201001 0 2
288201001 1 3
288201001 1 1
288201001 1 1
end
label values sample sample_lbl
label def sample_lbl 288201001 "Ghana 2010", modify
label values sex sex_lbl_new
label def sex_lbl_new 0 "Female", modify
label def sex_lbl_new 1 "Male", modify

Comment

Veronika Valcikova

Join Date: Dec 2019

Posts: 34
#4

06 Mar 2023, 00:33

Mike Lacy Dear Mike, I provided a better explanation of my problem, or is still something missing? Could you/anybody please take a look on it? Thank you!
Comment

Maarten Buis

Join Date: Mar 2014
Posts: 3449

06 Mar 2023, 04:56

Here is an example that uses collect. I have used different example data, as your example only contained information on one sample (country/years).

Code:

. // create some example data
. clear

. set seed 12345

. set obs 5
Number of observations (_N) was 0, now 5.

. gen sample:sample_lb = _n

. label define sample_lb 1 "Tookland" ///
>                        2 "Eastfarthing" ///
>                        3 "Marish" ///
>                        4 "North Moors" ///
>                        5 "Bridgfields"

. expand 1000
(4,995 observations created)

. bys sample: gen id = _n

. gen sex = runiform() < .5+sample/100

. gen flag_mom = ceil(3*runiform())

.
. // collect the sex ratios
. collect clear

. collect create sexratios
(current collection is sexratios)

. qui: collect _r_b, name(sexratios) : logit sex ibn.sample , nocons or

. forvalues i = 1/3 {
  2.     qui: collect _r_b, name(sexratios) : logit sex ibn.sample if flag_mom==
> `i', nocons or
  3. }

. collect label levels cmdset 1 "all" ///
>                             2 "1st born" ///
>                             3 "2nd born" ///
>                             4 "3rd born" , name(sexratios)

. collect style header result, level(hide)

. collect style cell, nformat(%5.2f)

. collect style cell border_block, border(right, pattern(nil))

. collect layout (colname#result)(cmdset), name(sexratios)

Collection: sexratios
      Rows: colname#result
   Columns: cmdset
   Table 1: 5 x 4

--------------------------------------------
              all 1st born 2nd born 3rd born
--------------------------------------------
Tookland     1.10     0.99     1.14     1.17
Eastfarthing 1.17     1.24     1.14     1.13
Marish       1.14     1.07     1.12     1.23
North Moors  1.19     1.26     1.16     1.17
Bridgfields  1.16     1.22     1.01     1.28
--------------------------------------------

Notice that I did not loop over the samples (country/years). Instead I estimated the sexratios for each sample with logit sex ibn.sample, nocons . To read more about this trick read this Stata tip: https://www.stata-journal.com/articl...article=st0250

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

Comment

Veronika Valcikova

Join Date: Dec 2019

Posts: 34
#6

14 Mar 2023, 11:21

Dear @Marteen Buis, thank you very much for your help! I really appreciate it as I was struggling with that ..
Comment

Veronika Valcikova

Join Date: Dec 2019
Posts: 34

03 Apr 2023, 04:52

Originally posted by Maarten Buis View Post

Here is an example that uses collect. I have used different example data, as your example only contained information on one sample (country/years).

Code:

. // create some example data
. clear

. set seed 12345

. set obs 5
Number of observations (_N) was 0, now 5.

. gen sample:sample_lb = _n

. label define sample_lb 1 "Tookland" ///
> 2 "Eastfarthing" ///
> 3 "Marish" ///
> 4 "North Moors" ///
> 5 "Bridgfields"

. expand 1000
(4,995 observations created)

. bys sample: gen id = _n

. gen sex = runiform() < .5+sample/100

. gen flag_mom = ceil(3*runiform())

.
. // collect the sex ratios
. collect clear

. collect create sexratios
(current collection is sexratios)

. qui: collect _r_b, name(sexratios) : logit sex ibn.sample , nocons or

. forvalues i = 1/3 {
2. qui: collect _r_b, name(sexratios) : logit sex ibn.sample if flag_mom==
> `i', nocons or
3. }

. collect label levels cmdset 1 "all" ///
> 2 "1st born" ///
> 3 "2nd born" ///
> 4 "3rd born" , name(sexratios)

. collect style header result, level(hide)

. collect style cell, nformat(%5.2f)

. collect style cell border_block, border(right, pattern(nil))

. collect layout (colname#result)(cmdset), name(sexratios)

Collection: sexratios
Rows: colname#result
Columns: cmdset
Table 1: 5 x 4

--------------------------------------------
all 1st born 2nd born 3rd born
--------------------------------------------
Tookland 1.10 0.99 1.14 1.17
Eastfarthing 1.17 1.24 1.14 1.13
Marish 1.14 1.07 1.12 1.23
North Moors 1.19 1.26 1.16 1.17
Bridgfields 1.16 1.22 1.01 1.28
--------------------------------------------

Again, thank you very much!
I wanted to ask whether there is a possibility to weight the logistic regression. In my data, I have individual sampling weights of a float type, for example, 8.92 or 10.00.

Comment

Maarten Buis

Join Date: Mar 2014

Posts: 3449
#8

03 Apr 2023, 05:20

of course, just use the weights in the logit models

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Veronika Valcikova

Join Date: Dec 2019

Posts: 34
#9

03 Apr 2023, 12:22

Originally posted by Maarten Buis View Post

of course, just use the weights in the logit models

Thank you! I used:

Code:

... logit sex ibn.sample [pw = perwt] ...

Everything works well.
Comment

Announcement

File Write: fill in columns in a consecutive manner

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment