Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tabulating multiple variables with a baseline (row) variable

    Hello everyone,

    I am working with a panel dataset consisting of firm-year observations spanning from 2018 to 2021. Within this dataset, I have several categorical variables (e.g., question1, question2, etc.) that vary in their responses ("Y", "N", missing) across different years. My goal is to create tables for each of these variables, displaying the evolution of these responses over time. I believe providing the data structure I'm working with and outlining the desired table will clarify the issue at hand.

    Here's a glimpse of the dataset structure:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float individual_id double year str1(question1_prec question1_ld1c question1_ld2c question1_ld3c)
     1 2018 "Y" ""  ""  ""
     2 2018 "Y" "N" "Y" "Y"
     3 2018 ""  "Y" "Y" "Y"
     4 2018 "Y" "Y" "Y" "Y"
     5 2018 "N" "N" "N" "N"
     8 2018 "Y" "Y" "Y" "Y"
     9 2018 ""  "Y" "Y" "Y"
    10 2018 ""  ""  ""  ""
    11 2018 "N" "N" ""  ""
    12 2018 ""  "N" "N" "N"
    13 2018 "N" "N" "N" "Y"
    14 2018 "Y" "N" "N" "Y"
    end
    In this dataset, the variable question1_prec represents the response to question1 in 2018, while question1_ld1c denotes the response to the same question in 2019, and so forth. I am using the lead versions of these variables to facilitate tabulation, aiming to use the condition if year==2018.

    The table I am aiming to construct looks like this:


    question1_ld1c question1_ld2c question1_ld3c total
    question1_pre N Y . dropped N Y . dropped N Y . dropped
    N 20 40 20 20 20 20 20 40 20 10 10 60 100
    Y 0 50 5 5 5 40 5 10 5 25 10 20 60
    . 0 0 40 0 0 5 35 0 0 10 10 20 40
    200

    Essentially, I aim to visualize the changes in responses for question1 over time. I'm using question1_prec (i.e., the 2018 response) as the baseline and seeking to showcase what happened to firms that responded "N" in 2018: did their responses change to "Y" or missing, were they dropped from the dataset, or did they maintain the "N" response in 2019, 2020, and 2021? What about firms that responded "Y" in 2018? And so on.

    To elucidate further, I am essentially seeking a similar outcome as the following code: tab2 question1_prec question1_ld1c; tab2 question1_prec question1_ld2c; tab2 question1_prec question1_ld3c, but merged into a single comprehensive table.

    Additionally, I would greatly appreciate any advice on exporting such a table. Until now, I have been using asdoc, but I'm uncertain if it can support the solutions provided.

    Despite potentially seeming trivial, the solutions I've attempted so far have not yielded the desired results.

    Thank you sincerely for your assistance!
    Claudio




  • #2
    If you have Stata 17 or newer, you can use the new Tables features to
    build collections of your statistics, combine the collections, and
    define your table's layout. Said table can then be exported to MS Word,
    MS Excel, PDF, LaTeX, HTML, Markdown, or plain text files.

    Here is the code I used to try to reproduce the example table.

    Code:
    clear all
    
    * Example generated by -dataex-. For more info, type help dataex
    input float individual_id double year str1(question1_prec question1_ld1c question1_ld2c question1_ld3c)
     1 2018 "Y" ""  ""  ""
     2 2018 "Y" "N" "Y" "Y"
     3 2018 ""  "Y" "Y" "Y"
     4 2018 "Y" "Y" "Y" "Y"
     5 2018 "N" "N" "N" "N"
     8 2018 "Y" "Y" "Y" "Y"
     9 2018 ""  "Y" "Y" "Y"
    10 2018 ""  ""  ""  ""
    11 2018 "N" "N" ""  ""
    12 2018 ""  "N" "N" "N"
    13 2018 "N" "N" "N" "Y"
    14 2018 "Y" "N" "N" "Y"
    end
    
    * empty strings are not supported by -table- or -collect-, so encode
    * string variables so we can use numeric missing values
    label define noyes 0 "N" 1 "Y"
    foreach x of varlist question1* {
        encode `x', generate(num_`x') label(noyes)
        local lab : var label `x'
        if `"`lab'"' == "" {
            local lab : copy local x
        }
        label var num_`x' `"`lab'"'
    }
    
    * constant variable for the overall totals of the row variable
    gen total = 1
    label var total "Total"
    
    * list of the column variables to tabulate
    unab after : num_question1_ld* total
    
    * loop over column variables, building a collection of cross-tabulations
    * for each
    foreach x of local after {
        table num_question1_prec `x', ///
            name(`x') ///
            zerocounts ///
            missing ///
            nototals
    }
    
    * combine the collections
    collect combine all = `after'
    
    * hide some redundant/unhelpful header information
    collect style header total, level(hide)
    collect style header result, level(hide)
    
    * layout a table of the combined results
    collect layout (num_question1_prec#result) (`after')
    Here is the resulting table.

    Code:
    --------------------------------------------------------------------------------------
                   |    question1_ld1c       question1_ld2c       question1_ld3c     Total
                   |     N      Y      .      N      Y      .      N      Y      .
    ---------------+----------------------------------------------------------------------
    question1_prec |
      N            |     3      0      0      2      0      1      1      1      1       3
      Y            |     2      2      1      1      3      1      0      4      1       5
      .            |     1      2      1      1      2      1      1      2      1       4
    --------------------------------------------------------------------------------------
    See the documentation for collect export for information on how
    to publish your table to a supported document type.

    Comment

    Working...
    X