
  • Creating a Sample from two categorical variables. Looking for replication.

    Hello Statalist,

    This is my second post, sorry in advance if I broke a rule.

    I have a dataset with 140 observations and over 30 variables, but only three of those variables matter here: corpname (Corporation Name), LPLLC and PBC. The last two variables are categorical (they define my individual groups).

    I'm trying to draw a sample within each group (each level of the categorical variables) that can be replicated, but every time I run the code I get different observations. What can I do to get a sample that can be replicated over time?

    I'm using the following code to generate the samples for each subgroup.

    Code:
    set seed 988
    set obs 140
    
    // Create a Random Sample of LPLLC
    
    sort LPLLC
    by LPLLC: count
    
    sample 2 if LPLLC==1, count
    sample 2 if LPLLC==2, count
    sample 2 if LPLLC==3, count
    sample 2 if LPLLC==4, count
    sample 2 if LPLLC==5, count
    sample 2 if LPLLC==6, count
    
    // Create a Random Sample of PBC
    
    sort PBC
    by PBC: count
    
    sample 2 if PBC==1, count
    sample 1 if PBC==2, count
    sample 1 if PBC==3, count
    sample 2 if PBC==4, count
    sample 2 if PBC==5, count
    sample 1 if PBC==6, count
    
    //Show the List for LPLLC
    
    list corpname LPLLC if LPLLC1==1
    list corpname LPLLC if LPLLC2==1
    list corpname LPLLC if LPLLC3==1
    list corpname LPLLC if LPLLC4==1
    list corpname LPLLC if LPLLC5==1
    list corpname LPLLC if LPLLC6==1
    
    //Show the List for PBC
    
    list corpname PBC if PBC1==1
    list corpname PBC if PBC2==1
    list corpname PBC if PBC3==1
    list corpname PBC if PBC4==1
    list corpname PBC if PBC5==1
    list corpname PBC if PBC6==1
    An example of my data follows (I believe I did not add it correctly here):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(LPLLC PBC) str89 corpname
    5 . "2730 L3C"                                                                                
    4 . "AQUARYTHM LPLLC"                                                                          
    2 . "PANADERIA Y CAFETERIA DON NESTOR LOW PROFIT LIMITE..."                                    
    6 . "PAWSOME BLISS LPLLC"                                                                      
    2 . "PLANACTION L3C"                                                                          
    4 . "PORTAL SOCIAL L3C"                                                                        
    4 . "PRCOMMERCE L3C"                                                                          
    3 . "PROSEDUC COMPAÑÍA DE RESPONSABILIDAD LIMITADA CON..."                                  
    . 1 "PROYECTO BUCARABON C.B.S."                                                                
    . 1 "PROYECTO COMUNITARIO PORTUGUÉS CBS"                                                      
    1 . "PUBLLISHERS EDITORIAL BIEKE LIBRE LPLLC"                                                  
    4 . "PUERTA AZUL L.P.L.L.C."                                                                  
    2 . "ARAYAN CONSTRUCTION LOW PROFIT LIMITED LIABILITY C..."                                    
    . 1 "PUERTO RICO SPECIALTY COFFEE ASOCIATION C.B.S."                                          
    . 3 "RECICLA TU CEL, INC. C.B.S."                                                              
                                     
    end
    label values LPLLC LPLLC
    label def LPLLC 1 "0y LPLLC !=SJ", modify
    label def LPLLC 2 "0y LPLLC ==SJ", modify
    label def LPLLC 3 "1y LPLLC !=SJ", modify
    label def LPLLC 4 "1y LPLLC ==SJ", modify
    label def LPLLC 5 "2y+ LPLLC !=SJ", modify
    label def LPLLC 6 "2y+ LPLLC ==SJ", modify
    label values PBC PBC
    label def PBC 1 "0y PBC !=SJ", modify
    label def PBC 2 "0y PBC ==SJ", modify
    label def PBC 3 "1y PBC !=SJ", modify
    label def PBC 4 "1y PBC ==SJ", modify
    label def PBC 5 "2y+ PBC !=SJ", modify
    label def PBC 6 "2y+ PBC ==SJ", modify
    Thanks,

    Félix Quiñones.
    Last edited by Felix Quinones; 03 May 2019, 08:10.

  • #2
    Code:
    help sortseed
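
    A minimal sketch of what that help entry points to, assuming a Stata release that provides -set sortseed-. The value 988 is just the seed reused from #1, and only the first subgroup is shown:

    ```stata
    * Assumption: this Stata release supports -set sortseed- (see -help sortseed-).
    set seed 988        // seed for the random draws made by -sample-
    set sortseed 988    // seed for how -sort- orders tied observations

    sort LPLLC                      // ties in LPLLC are now ordered reproducibly
    sample 2 if LPLLC==1, count     // keep 2 observations from subgroup 1
    ```

    Both seeds have to be set before the first -sort-, and the data must be in the same state at the start of each run for the results to match.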



    • #3
      If you sort on a variable with ties, i.e. a variable where multiple observations have the same value, then there is a problem: How do you sort the numbers 2 and 2? There is no way to decide which 2 should come first, so Stata orders the tied observations randomly. Moreover, that randomness is governed by its own seed, separate from the one controlled by set seed. So set seed won't make the ordering reproducible across runs, as you found out. I would say that normally that is a good thing, but if you wish to fix the order you could add the stable option to sort; see help sort.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
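
      The suggestion in #3 could be applied to the code from #1 roughly as follows. This is only a sketch: it shows a single subgroup, and the commented-out alternative assumes that corpname uniquely identifies observations, which the thread does not confirm.

      ```stata
      set seed 988

      * Option 1: tied observations keep their current relative order
      sort LPLLC, stable

      * Option 2 (assumption: corpname is unique; -isid- stops with an error if not)
      * isid corpname
      * sort LPLLC corpname

      sample 2 if LPLLC==1, count     // keep 2 observations from subgroup 1
      ```

      With the stable option the result still depends on the order the data were in before the sort, so the data should be loaded fresh from the same file on every run.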



      • #4
        Maarten,

        Thanks for your reply.

        The numbers 2 and 2 are the numbers of sampled corporations (observations) that I want from each "bucket" (subgroup). This is because I need a representative sample with the same number of observations across years of establishment, type of organization, and location. Those constraints are already built into the categorical variables.

        Your suggestion to add the -stable- option to -sort- works perfectly for me.

        Thanks.
