Tabulating multiple variables with a baseline (row) variable

Claudio Ferrantino

Join Date: Nov 2022
Posts: 3

Tabulating multiple variables with a baseline (row) variable

25 Nov 2023, 11:22

Hello everyone,

I am working with a panel dataset consisting of firm-year observations spanning from 2018 to 2021. Within this dataset, I have several categorical variables (e.g., question1, question2, etc.) that vary in their responses ("Y", "N", missing) across different years. My goal is to create tables for each of these variables, displaying the evolution of these responses over time. I believe providing the data structure I'm working with and outlining the desired table will clarify the issue at hand.

Here's a glimpse of the dataset structure:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float individual_id double year str1(question1_prec question1_ld1c question1_ld2c question1_ld3c)
 1 2018 "Y" ""  ""  ""
 2 2018 "Y" "N" "Y" "Y"
 3 2018 ""  "Y" "Y" "Y"
 4 2018 "Y" "Y" "Y" "Y"
 5 2018 "N" "N" "N" "N"
 8 2018 "Y" "Y" "Y" "Y"
 9 2018 ""  "Y" "Y" "Y"
10 2018 ""  ""  ""  ""
11 2018 "N" "N" ""  ""
12 2018 ""  "N" "N" "N"
13 2018 "N" "N" "N" "Y"
14 2018 "Y" "N" "N" "Y"
end

In this dataset, the variable question1_prec represents the response to question1 in 2018, while question1_ld1c denotes the response to the same question in 2019, and so forth. I am using the lead versions of these variables to facilitate tabulation, aiming to use the condition if year==2018.

The table I am aiming to construct looks like this:

	question1_ld1c				question1_ld2c				question1_ld3c				total
question1_pre	N	Y	.	dropped	N	Y	.	dropped	N	Y	.	dropped
N	20	40	20	20	20	20	20	40	20	10	10	60	100
Y	0	50	5	5	5	40	5	10	5	25	10	20	60
.	0	0	40	0	0	5	35	0	0	10	10	20	40
													200

Essentially, I aim to visualize the changes in responses for question1 over time. I'm using question1_prec (i.e., the 2018 response) as the baseline and seeking to showcase what happened to firms that responded "N" in 2018: did their responses change to "Y" or missing, were they dropped from the dataset, or did they maintain the "N" response in 2019, 2020, and 2021? What about firms that responded "Y" in 2018? And so on.

To elucidate further, I am essentially seeking a similar outcome as the following code: tab2 question1_prec question1_ld1c; tab2 question1_prec question1_ld2c; tab2 question1_prec question1_ld3c, but merged into a single comprehensive table.

Additionally, I would greatly appreciate any advice on exporting such a table. Until now, I have been using asdoc, but I'm uncertain if it can support the solutions provided.

Despite potentially seeming trivial, the solutions I've attempted so far have not yielded the desired results.

Thank you sincerely for your assistance!
Claudio

Tags: None

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 704

12 Dec 2023, 13:39

If you have Stata 17 or newer, you can use the new Tables features to
build collections of your statistics, combine the collections, and
define your table's layout. Said table can then be exported to MS Word,
MS Excel, PDF, LaTeX, HTML, Markdown, or plain text files.

Here is the code I used to try to reproduce the example table.

Code:

clear all

* Example generated by -dataex-. For more info, type help dataex
input float individual_id double year str1(question1_prec question1_ld1c question1_ld2c question1_ld3c)
 1 2018 "Y" ""  ""  ""
 2 2018 "Y" "N" "Y" "Y"
 3 2018 ""  "Y" "Y" "Y"
 4 2018 "Y" "Y" "Y" "Y"
 5 2018 "N" "N" "N" "N"
 8 2018 "Y" "Y" "Y" "Y"
 9 2018 ""  "Y" "Y" "Y"
10 2018 ""  ""  ""  ""
11 2018 "N" "N" ""  ""
12 2018 ""  "N" "N" "N"
13 2018 "N" "N" "N" "Y"
14 2018 "Y" "N" "N" "Y"
end

* empty strings are not supported by -table- or -collect-, so encode
* string variables so we can use numeric missing values
label define noyes 0 "N" 1 "Y"
foreach x of varlist question1* {
    encode `x', generate(num_`x') label(noyes)
    local lab : var label `x'
    if `"`lab'"' == "" {
        local lab : copy local x
    }
    label var num_`x' `"`lab'"'
}

* constant variable for the overall totals of the row variable
gen total = 1
label var total "Total"

* list of the column variables to tabulate
unab after : num_question1_ld* total

* loop over column variables, building a collection of cross-tabulations
* for each
foreach x of local after {
    table num_question1_prec `x', ///
        name(`x') ///
        zerocounts ///
        missing ///
        nototals
}

* combine the collections
collect combine all = `after'

* hide some redundant/unhelpful header information
collect style header total, level(hide)
collect style header result, level(hide)

* layout a table of the combined results
collect layout (num_question1_prec#result) (`after')

Here is the resulting table.

Code:

--------------------------------------------------------------------------------------
               |    question1_ld1c       question1_ld2c       question1_ld3c     Total
               |     N      Y      .      N      Y      .      N      Y      .
---------------+----------------------------------------------------------------------
question1_prec |
  N            |     3      0      0      2      0      1      1      1      1       3
  Y            |     2      2      1      1      3      1      0      4      1       5
  .            |     1      2      1      1      2      1      1      2      1       4
--------------------------------------------------------------------------------------

See the documentation for collect export for information on how
to publish your table to a supported document type.

Announcement

Tabulating multiple variables with a baseline (row) variable

Comment