Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • File Write: fill in columns in a consecutive manner

    Dear Stata Users / Enthusiasts
    I would very much appreciate help with filling a text file columnwise without loosing the already filled in information.
    Please, see below the information to my dataset:
    - I have a large dataset (over 30 millions children of 2 sexes - males are recoded as 0 and female children as 1)
    - This data concerns 50 samples from 46 countries. The samples are recoded as integers, their labels contain the name of the particular sample/country.

    Task: I am computing sex ratios for all births, firstborns, secondborns, thirdborns and fourthborns
    The code below shows how I was proceeding in the beginning:
    Code:
    ** sex ratio per sample
    levelsof sample, local (levels) // 50 labels with sample code
    local vlname: value label sample
    file open SexRatioPerSample using "SexRatioPerSample.txt", read write
    file write SexRatioPerSample  "sample code/sample name        " _column(11) "sex ratio" _n
    foreach sample in `r(levels)' {
        quietly: logit sex if sample==`sample'
        local vl: label `vlname' `sample' // 50 labels with sample name
        file write SexRatioPerSample %9s "`sample'/`vl'        " _column(11) %7.3f (exp(_b[_cons])*100) _n
    }
    file close SexRatioPerSample
    
    
    ** first born sex ratio per sample
    levelsof sample
    local vlname: value label sample
    file open SexRatioPerSample1 using "SexRatioPerSample1.txt", read write
    file write SexRatioPerSample1  "sample code/sample name        " _column(11) "sex ratio" _n
    foreach sample in `r(levels)' {
        quietly: logit sex if flag_mom==1 & sample==`sample'
        local vl: label `vlname' `sample'
        file write SexRatioPerSample1 %9s "`sample'/`vl'        " _column(11) %7.3f (exp(_b[_cons])*100) _n
    }
    file close SexRatioPerSample1
    
    repeat the same for children of order 2, 3, 4
    Righ now, I would like to code in a more compact way. Yet, I struggle with that. Below, is my try to achieve more proffesional / compact code. First, I am aware of the fact that in the 2nd iteration, the levels of sample are not available as the macro was local (how can I store them globally?) Second of all, I only manage to fill in the first column with the labels of the sample and the second column of the table with the sex ratio for the whole sample.


    Code:
    levelsof sample, local(levels)
    local vlname: value label sample
    file open SexRatioPerSampleAll using "SexRatioPerSampleAll.txt", read write
    file write SexRatioPerSampleAll  "Sample name      " _column(1) "Sex ratio " _column(2) "Sex ratio 1 " _column(3) "Sex ratio 2 " _column(4) "Sex ratio 3 " _column(5) "Sex ratio 4 "_n
    
    forvalues i = 1/5 {  
        
        foreach sample in `r(levels)' {
            quietly: logit sex if flag_mom==`i' & sample==`sample'
            local vl: label `vlname' `sample'
            file write SexRatioPerSampleAll %9s "`vl'      " _column(`i') %7.3f (exp(_b[_cons])*100) _n
        
        }
        
    }
        
    file close SexRatioPerSampleAll
    I desire to report the 50 labels of the variable sample in the first column. Then, I would like to fill in columns 2-6 with the sex ratios (in column 2: for all births, in column 3: for first borns, column 4: second borns etc. )

    Desired format (just an example, no real data)
    Sample label Sex Ratio Sex Ratio 1 Sex Ratio 2 Sex Ratio 3 Sex Ratio 4
    Ghana 2015 105 115 107 97 102
    Guinea 2014 105 105 106 104 105
    Iran 2011 ..
    Indonesia 2010 ..
    Any help is much appreciated.
    Thank you!
    Best regards,
    Veronika

  • #2
    To get a useful answer quickly, I'd advise you to re-read the StataList FAQ for new members, with particular attention to section 12.2 on using -dataex- to present a data example. Without that, your chances of getting some help are not good.

    In your new posting in this thread, I'd also encourage you to focus on the result you want and say almost nothing about "how" you are trying to do it. From what I can see above of your code, I can't see how any of it is relevant to what I (vaguely) understand what you want. What people here need in order to help you is 1) a good understanding of the nature of the Stata data file you currently have; 2) what you would like as a result or output from it. My best guess is that -file write- and -logit- will be of no help in finding and displaying sex ratios, or at least are very difficult ways to get that.

    Comment


    • #3
      Dear Mike
      Thank you very much for the valuable input! Therefore, please, find below a more proper post. Again, thank you very much for any help!

      Task: For each sample, compute sex ratios for all children no matter the birth order (Sex Ratio All), firstborn children (Sex Ratio 1stborn), secondborn children (Sex Ratio 2ndborn), ...
      For example: Sample = Ghana
      Sex Ratio All = number of all male children / number of all female children
      Sex Ratio 1stborn children = number of all firstborn male children / number of all firstborn female children
      ...
      The order of the child is stored in the variable flag_mom.

      Desired output (please denote that the numbers are random):
      Sample Sex Ratio All Sex Ratio 1stborn Sex Ratio 2ndborn Sex Ratio 3rdborn Sex Ratio 4thborn
      Ghana 2010 105 107 103 106 105
      Brazil 2015 103 105 102 103 103

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long sample byte(sex flag_mom)
      288201001 0 1
      288201001 1 2
      288201001 0 1
      288201001 0 2
      288201001 0 1
      288201001 0 2
      288201001 0 1
      288201001 1 2
      288201001 1 1
      288201001 1 2
      288201001 1 1
      288201001 0 1
      288201001 1 1
      288201001 1 2
      288201001 1 3
      288201001 1 1
      288201001 0 2
      288201001 1 1
      288201001 1 .
      288201001 0 1
      288201001 0 1
      288201001 1 2
      288201001 0 1
      288201001 0 2
      288201001 1 3
      288201001 0 1
      288201001 1 2
      288201001 1 3
      288201001 1 4
      288201001 1 5
      288201001 1 1
      288201001 0 2
      288201001 0 3
      288201001 0 1
      288201001 0 1
      288201001 1 1
      288201001 1 1
      288201001 0 2
      288201001 1 1
      288201001 0 2
      288201001 0 3
      288201001 1 1
      288201001 0 2
      288201001 0 3
      288201001 0 4
      288201001 1 1
      288201001 0 1
      288201001 0 1
      288201001 1 1
      288201001 0 2
      288201001 0 1
      288201001 0 1
      288201001 0 2
      288201001 0 1
      288201001 1 2
      288201001 1 1
      288201001 0 1
      288201001 0 1
      288201001 0 1
      288201001 0 2
      288201001 0 3
      288201001 0 1
      288201001 1 2
      288201001 0 3
      288201001 1 1
      288201001 0 1
      288201001 1 2
      288201001 1 3
      288201001 0 4
      288201001 1 1
      288201001 1 1
      288201001 1 2
      288201001 1 1
      288201001 0 2
      288201001 1 3
      288201001 1 1
      288201001 1 1
      288201001 0 2
      288201001 1 1
      288201001 1 2
      288201001 0 1
      288201001 1 1
      288201001 1 1
      288201001 0 1
      288201001 1 2
      288201001 1 1
      288201001 0 1
      288201001 1 1
      288201001 1 2
      288201001 0 3
      288201001 0 1
      288201001 1 2
      288201001 0 1
      288201001 1 2
      288201001 0 3
      288201001 1 1
      288201001 0 2
      288201001 1 3
      288201001 1 1
      288201001 1 1
      end
      label values sample sample_lbl
      label def sample_lbl 288201001 "Ghana 2010", modify
      label values sex sex_lbl_new
      label def sex_lbl_new 0 "Female", modify
      label def sex_lbl_new 1 "Male", modify

      Comment


      • #4
        Mike Lacy Dear Mike, I provided a better explanation of my problem, or is still something missing? Could you/anybody please take a look on it? Thank you!

        Comment


        • #5
          Here is an example that uses collect. I have used different example data, as your example only contained information on one sample (country/years).

          Code:
          . // create some example data
          . clear
          
          . set seed 12345
          
          . set obs 5
          Number of observations (_N) was 0, now 5.
          
          . gen sample:sample_lb = _n
          
          . label define sample_lb 1 "Tookland" ///
          >                        2 "Eastfarthing" ///
          >                        3 "Marish" ///
          >                        4 "North Moors" ///
          >                        5 "Bridgfields"
          
          . expand 1000
          (4,995 observations created)
          
          . bys sample: gen id = _n
          
          . gen sex = runiform() < .5+sample/100
          
          . gen flag_mom = ceil(3*runiform())
          
          .
          . // collect the sex ratios
          . collect clear
          
          . collect create sexratios
          (current collection is sexratios)
          
          . qui: collect _r_b, name(sexratios) : logit sex ibn.sample , nocons or
          
          . forvalues i = 1/3 {
            2.     qui: collect _r_b, name(sexratios) : logit sex ibn.sample if flag_mom==
          > `i', nocons or
            3. }
          
          . collect label levels cmdset 1 "all" ///
          >                             2 "1st born" ///
          >                             3 "2nd born" ///
          >                             4 "3rd born" , name(sexratios)
          
          . collect style header result, level(hide)
          
          . collect style cell, nformat(%5.2f)
          
          . collect style cell border_block, border(right, pattern(nil))
          
          . collect layout (colname#result)(cmdset), name(sexratios)
          
          Collection: sexratios
                Rows: colname#result
             Columns: cmdset
             Table 1: 5 x 4
          
          --------------------------------------------
                        all 1st born 2nd born 3rd born
          --------------------------------------------
          Tookland     1.10     0.99     1.14     1.17
          Eastfarthing 1.17     1.24     1.14     1.13
          Marish       1.14     1.07     1.12     1.23
          North Moors  1.19     1.26     1.16     1.17
          Bridgfields  1.16     1.22     1.01     1.28
          --------------------------------------------
          Notice that I did not loop over the samples (country/years). Instead I estimated the sexratios for each sample with logit sex ibn.sample, nocons . To read more about this trick read this Stata tip: https://www.stata-journal.com/articl...article=st0250
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

          Comment


          • #6
            Dear @Marteen Buis, thank you very much for your help! I really appreciate it as I was struggling with that ..

            Comment


            • #7
              Originally posted by Maarten Buis View Post
              Here is an example that uses collect. I have used different example data, as your example only contained information on one sample (country/years).

              Code:
              . // create some example data
              . clear
              
              . set seed 12345
              
              . set obs 5
              Number of observations (_N) was 0, now 5.
              
              . gen sample:sample_lb = _n
              
              . label define sample_lb 1 "Tookland" ///
              > 2 "Eastfarthing" ///
              > 3 "Marish" ///
              > 4 "North Moors" ///
              > 5 "Bridgfields"
              
              . expand 1000
              (4,995 observations created)
              
              . bys sample: gen id = _n
              
              . gen sex = runiform() < .5+sample/100
              
              . gen flag_mom = ceil(3*runiform())
              
              .
              . // collect the sex ratios
              . collect clear
              
              . collect create sexratios
              (current collection is sexratios)
              
              . qui: collect _r_b, name(sexratios) : logit sex ibn.sample , nocons or
              
              . forvalues i = 1/3 {
              2. qui: collect _r_b, name(sexratios) : logit sex ibn.sample if flag_mom==
              > `i', nocons or
              3. }
              
              . collect label levels cmdset 1 "all" ///
              > 2 "1st born" ///
              > 3 "2nd born" ///
              > 4 "3rd born" , name(sexratios)
              
              . collect style header result, level(hide)
              
              . collect style cell, nformat(%5.2f)
              
              . collect style cell border_block, border(right, pattern(nil))
              
              . collect layout (colname#result)(cmdset), name(sexratios)
              
              Collection: sexratios
              Rows: colname#result
              Columns: cmdset
              Table 1: 5 x 4
              
              --------------------------------------------
              all 1st born 2nd born 3rd born
              --------------------------------------------
              Tookland 1.10 0.99 1.14 1.17
              Eastfarthing 1.17 1.24 1.14 1.13
              Marish 1.14 1.07 1.12 1.23
              North Moors 1.19 1.26 1.16 1.17
              Bridgfields 1.16 1.22 1.01 1.28
              --------------------------------------------
              Notice that I did not loop over the samples (country/years). Instead I estimated the sexratios for each sample with logit sex ibn.sample, nocons . To read more about this trick read this Stata tip: https://www.stata-journal.com/articl...article=st0250
              Again, thank you very much!
              I wanted to ask whether there is a possibility to weight the logistic regression. In my data, I have individual sampling weights of a float type, for example, 8.92 or 10.00.

              Comment


              • #8
                of course, just use the weights in the logit models
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------

                Comment


                • #9
                  Originally posted by Maarten Buis View Post
                  of course, just use the weights in the logit models
                  Thank you! I used:
                  Code:
                  ​​​​​​​...
                  logit sex ibn.sample [pw = perwt] 
                  ...
                  Everything works well.

                  Comment

                  Working...
                  X