  • Are two variables forming a bijection?

    I am interested in understanding the "correspondence" between two variables in a dataset that I have to clean. My two variables have no missing values but are not unique identifiers.
    For example (see the MWE below), if I could show that one value of var_a is always associated with the same value of var_b, then the two variables are redundant and I could simply drop one of them. I don't expect this to be the case for all pairs, but for a large number of them.

    Alternatively, I might have a surjection in one direction or the other (or some messier situation). I have tried:

    Code:
    clear all
    input var_a str16 var_b
    1 a //var_a and var_b are bijective
    1 a
    2 b
    3 c
    1 a
    2 b
    3 c
    4 d //var_a is surjective onto var_b
    5 d
    5 d
    6 e //var_b is surjective onto var_a
    6 f
    6 f
    // 7 e
    end
    
    bysort var_a: gen count_a = _N
    bysort var_b: gen count_b = _N
    gen cat = ""
    replace cat = "bij" if count_a == count_b
    replace cat = "a_onto_b" if count_a < count_b
    replace cat = "b_onto_a" if count_a > count_b
    This gives me the result I want, assuming that all pairwise combinations are either bijective or surjective (so that the last observation, 7-e, never occurs).
    1. Is there a simpler method to achieve the result?
    2. How could I report more complicated cases? For example, including the last pair (7-e) means that the mapping from the domain (6-7) to (f-e) is not an onto mapping. That is typically the type of case that I would like to report in my data if it exists (which my code does not do so far).

    Thank you for your help

  • #2
    Next up Nicolas Bourbaki with a Stata question!


    A couple of suggestions:

    1. FAQ https://www.stata.com/support/faqs/d...ions-in-group/

    2. groups from the Stata Journal

    The otherwise unpredictable incantation is as follows. Read the article under st0496 but, if interested, download the software from st0496_1.

    Code:
     search st0496, entry
    
    Search of official help files, FAQs, Examples, and Stata Journals
    
    SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
            (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
            Q1/18   SJ 18(1):291
            groups exited with an error message if weights were specified;
            this has been corrected
    
    SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
            (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
            Q3/17   SJ 17(3):760--773
            presents command for listing group frequencies and percents and
            cumulations thereof; for various subsetting and ordering by
            frequencies, percents, and so on; for reordering of columns;
            and for saving tabulated data to new datasets
    Sample results:

    Code:
    clear all
    input var_a str16 var_b
    1 a //var_a and var_b are bijective
    1 a
    2 b
    3 c
    1 a
    2 b
    3 c
    4 d //var_a is surjective onto var_b
    5 d
    5 d
    6 e //var_b is surjective onto var_a
    6 f
    6 f
    // 7 e
    end
    
    groups var* , sepby(var_a)
    
      +---------------------------------+
      | var_a   var_b   Freq.   Percent |
      |---------------------------------|
      |     1       a       3     23.08 |
      |---------------------------------|
      |     2       b       2     15.38 |
      |---------------------------------|
      |     3       c       2     15.38 |
      |---------------------------------|
      |     4       d       1      7.69 |
      |---------------------------------|
      |     5       d       2     15.38 |
      |---------------------------------|
      |     6       e       1      7.69 |
      |     6       f       2     15.38 |
      +---------------------------------+
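
    Following the first suggestion (counting distinct values within groups), here is a minimal sketch of how the pairs could be classified, including the messier 7-e case from the original question. The variable names first_pair, n_b_per_a, n_a_per_b and relation are illustrative assumptions, not part of the original post, and the labels are only one possible way of naming the cases.

```stata
* Sketch: classify each (var_a, var_b) pair by counting distinct partners.
* first_pair, n_b_per_a, n_a_per_b and relation are hypothetical names.
clear all
input var_a str16 var_b
1 a
1 a
2 b
3 c
4 d
5 d
5 d
6 e
6 f
6 f
7 e
end

* Flag the first observation of each distinct (var_a, var_b) pair
bysort var_a var_b: gen byte first_pair = _n == 1

* Distinct var_b values seen with each var_a, and vice versa
bysort var_a: egen n_b_per_a = total(first_pair)
bysort var_b: egen n_a_per_b = total(first_pair)

* Label each observation according to the two counts
gen relation = "one-to-one"
replace relation = "many a to one b" if n_b_per_a == 1 & n_a_per_b > 1
replace relation = "one a to many b" if n_b_per_a > 1 & n_a_per_b == 1
replace relation = "many-to-many" if n_b_per_a > 1 & n_a_per_b > 1

* Report one line per distinct pair
sort var_a var_b
list var_a var_b n_b_per_a n_a_per_b relation if first_pair, sepby(var_a)
```

    With the 7-e observation included, the pair 6-e should come out as many-to-many, because var_a 6 occurs with two values of var_b and var_b e occurs with two values of var_a. Dropping one of the variables is only safe for the values labelled one-to-one.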
