
  • New package: xtbalance2 - Create a balanced subsample from unbalanced panel data.

    Thanks to Kit Baum, a new package called xtbalance2 is available on SSC.

    xtbalance2 creates an indicator variable that identifies a balanced subsample of an unbalanced dataset. The program tries to maximise the number of observations with respect to either the time dimension or the number of cross-sections/groups.

    Example:
    Code:
    use http://www.stata-journal.com/software/sj12-1/st0246/manu_prod, clear
    xtbalance2 , generate(balanceN) optimisation(N)
    xtbalance2 lO lL lY, generate(balanceT) optimisation(T)
    More examples and a detailed description are available in the help file.
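
To make concrete what a balanced subsample means, here is a minimal sketch in plain Stata that flags the units observed in every period of a small made-up panel. The dataset and the names id, t, y, first_t, n_t and balanced are hypothetical. This is not xtbalance2's algorithm, which also searches for the subsample that maximises N or T; the sketch only illustrates the kind of indicator variable described above.

```stata
* Hypothetical toy panel: unit 2 is missing period 3
clear
input id t y
1 1 10
1 2 11
1 3 12
2 1 20
2 2 21
3 1 30
3 2 31
3 3 32
end

isid id t                          // assumes one row per unit-period
egen first_t = tag(t)              // marks one row per distinct period
count if first_t
local T = r(N)                     // number of distinct periods

bysort id: gen n_t = _N            // periods observed for each unit
gen byte balanced = (n_t == `T')   // 1 if the unit is observed in every period

tabulate id balanced
```

Keeping only the rows with balanced equal to 1 would leave units 1 and 3, each observed in all three periods.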

    How to install:
    xtbalance2 can be installed via SSC
    Code:
    ssc install xtbalance2
    or via my GitHub page, which will be updated frequently:
    Code:
    net from https://github.com/JanDitzen/xtbalance2
    xtbalance2 is a new package and errors might occur. I would be grateful for any reports about bugs or problems here, by mail to me or via my Github page (https://github.com/JanDitzen/xtbalance2).

    Thanks!

  • #2
    Dear @JanDitzen, I have tried this new command with a simple example, but I get an error:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(var1 var2)
    1 1
    1 2
    1 3
    1 4
    2 1
    2 2
    3 1
    3 2
    3 3
    3 4
    4 2
    4 3
    4 4
    4 5
    4 6
    5 3
    end
    
    xtset var1 var2
    
    . xtbalance2 ,gen(BT) o(T)
                  GenTouse():  3301  subscript invalid
              BalancePanel():     -  function returned error
                     <istmt>:     -  function returned error
    r(3301);
    Best regards.

    Raymond Zhang
    Stata 17.0,MP
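
A quick way to inspect the structure of the example in #2 is to count the periods per unit and the units per period. The sketch below uses only built-in commands on the same data; n_periods and n_units are hypothetical helper names. On these data no period is observed for all five units (periods 2 and 3 come closest, with four units each). Whether that is related to the r(3301) error is not established here.

```stata
clear
input double(var1 var2)
1 1
1 2
1 3
1 4
2 1
2 2
3 1
3 2
3 3
3 4
4 2
4 3
4 4
4 5
4 6
5 3
end

xtset var1 var2
xtdescribe                              // participation pattern by unit

bysort var1: gen n_periods = _N         // periods observed per unit
bysort var2: gen n_units   = _N         // units observed per period
sort var1 var2
list var1 var2 n_periods n_units, sepby(var1)
```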



    • #3
      @JanDitzen Hi JanDitzen, can you help me with the problem in #2?

      Best
      Raymond
      Best regards.

      Raymond Zhang
      Stata 17.0,MP



      • #4
        Hi guys,

        I am new to Stata and have been working with these data for about 2 weeks. I have 3 independent files that together contain 162 million observations, and I need to get a balanced panel.

        I have the same issue with the xtbalance2 package. Could somebody please help me?

        The code below gives an idea of my data. After running for 3 days in Stata 17, xtbalance2 returned r(3301):

        .....


        . duplicates tag id_grupo ndate2, gen(isdup)

        Duplicates in terms of id_grupo ndate2

        . edit if isdup

        . export delimited using "C:\convergencia\salida1.csv", replace
        (file C:\convergencia\salida1.csv not found)
        file C:\convergencia\salida1.csv saved

        . br

        . drop if isdup==1
        (2,955,790 observations deleted)

        . xtset id_grupo ndate2
        repeated time values within panel
        r(451);

        . duplicates report id_grupo ndate2

        Duplicates in terms of id_grupo ndate2

        --------------------------------------
           Copies | Observations       Surplus
        ----------+---------------------------
                1 |     22704716             0
                3 |       131499         87666
                4 |        15880         11910
                5 |         3220          2576
                6 |         1614          1345
                7 |          959           822
                8 |          496           434
                9 |          180           160
               10 |          100            90
               11 |           44            40
               12 |           36            33
               13 |           13            12
        --------------------------------------

        . duplicates tag id_grupo ndate2, gen(isdup2)

        Duplicates in terms of id_grupo ndate2

        . edit if isdup2

        . drop if isdup>1
        (154,041 observations deleted)

        . xtset id_grupo ndate2

        Panel variable: id_grupo (unbalanced)
        Time variable: ndate2, 2002m10 to 2021m6, but with gaps
        Delta: 1 month

        . summarize

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                  v1 | 22,704,716    2.03e+07    2.69e+07          5   9.41e+07
                  v2 | 22,704,716      201340    489.4354     200210     202106
                  v3 | 22,704,716    1.344135    .4843213          1          4
                  v4 | 22,704,716    .0033951    .0581686          0          1
                  v5 | 22,704,716    9.067859     5.30689          1         21
        -------------+---------------------------------------------------------
                  v6 | 22,703,442    10626.01    3724.339          0      16305
                  v7 | 22,704,716    589461.6      598870          0   8.50e+07
                  v8 | 22,704,716    5.257182    2.387976          1          8
                  v9 | 22,704,716    569884.1    435544.4          0    7912074
                 v10 | 22,704,716    319493.1    245391.4          0   1.48e+07
        -------------+---------------------------------------------------------
                 v11 | 22,704,716    .0000101    .0031828          0          1
                 v12 | 22,704,716    .0259429    .1589651          0          1
                 v13 | 22,704,716    .0458401    .2091383          0          1
                 v14 | 22,704,716    5.26e+07    3.06e+07          4   1.00e+08
              ndate2 | 22,704,716    645.4778    58.60598        513        737
        -------------+---------------------------------------------------------
            id_grupo | 22,704,716    139876.6    92437.04          1     336712
               isdup | 22,704,716           0           0          0          0
              isdup2 | 22,704,716           0           0          0          0

        . br

        . save "C:\convergencia\datos_renta3porciento_sin_dupli1.dta"
        file C:\convergencia\datos_renta3porciento_sin_dupli1.dta saved

        . keep if ndate2 >= 2007m1
        2007m1 invalid name
        r(198);

        . keep if ndate2 >= "2007m1"
        type mismatch
        r(109);

        . keep if v2 >= 200701
        (2,510,320 observations deleted)

        . xtbalance2 id_grupo ndate2 v7, generate(balanceT2) optimisation(T)
        GenTouse(): 3301 subscript invalid
        BalancePanel(): - function returned error
        <istmt>: - function returned error
        r(3301);

        . describe

        Contains data from C:\convergencia\datos_renta3porciento_sin_dupli1.dta
         Observations:    20,194,396
            Variables:            19                  27 Jul 2022 17:11
        ----------------------------------------------------------------------------------------------------------------------------------------
        Variable      Storage   Display    Value
            name         type    format    label      Variable label
        ----------------------------------------------------------------------------------------------------------------------------------------
        v1              long    %12.0g
        v2              long    %12.0g
        v3              byte    %8.0g
        v4              byte    %8.0g
        v5              byte    %8.0g
        v6              int     %8.0g
        v7              long    %12.0g
        v8              byte    %8.0g
        v9              float   %9.0g
        v10             float   %9.0g
        v11             byte    %8.0g
        v12             byte    %8.0g
        v13             byte    %8.0g
        v14             long    %12.0g
        ndate2          float   %tm
        id_grupo        float   %9.0g                 group(v1)
        isdup           byte    %12.0g
        isdup2          byte    %12.0g
        balanceT2       double  %10.0g
        ----------------------------------------------------------------------------------------------------------------------------------------
        Sorted by: id_grupo ndate2
        Note: Dataset has changed since last saved.


        . list in 1/6

        +---------------------------------------------------------------------------------------------------------------------------+
        1. | v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 | v10 | v11 | v12 | v13 | v14 | ndate2 | id_grupo |
        | 5 | 201102 | 1 | 0 | 1 | 7301 | 107920 | 7 | 133666.5 | 157971.8 | 0 | 0 | 0 | 24355712 | 2011m2 | 1 |
        |---------------------------------------------------------------------------------------------------------------------------|
        | isdup | isdup2 | balanc~2 |
        | 0 | 0 | . |
        +---------------------------------------------------------------------------------------------------------------------------+

        +---------------------------------------------------------------------------------------------------------------------------+
        2. | v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 | v10 | v11 | v12 | v13 | v14 | ndate2 | id_grupo |
        | 5 | 201111 | 2 | 0 | 1 | 7306 | 82360 | 3 | 269808.4 | 78350.64 | 0 | 0 | 0 | 22637047 | 2011m11 | 1 |
        |---------------------------------------------------------------------------------------------------------------------------|
        | isdup | isdup2 | balanc~2 |
        | 0 | 0 | . |
        +---------------------------------------------------------------------------------------------------------------------------+

        +---------------------------------------------------------------------------------------------------------------------------+
        3. | v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 | v10 | v11 | v12 | v13 | v14 | ndate2 | id_grupo |
        | 3050 | 200707 | 1 | 0 | 3 | 6101 | 187440 | 2 | 169250 | 30155.85 | 0 | 0 | 0 | 6278274 | 2007m7 | 2 |
        |---------------------------------------------------------------------------------------------------------------------------|
        | isdup | isdup2 | balanc~2 |
        | 0 | 0 | . |
        +---------------------------------------------------------------------------------------------------------------------------+
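
A note on the duplicates step in the log above: duplicates tag stores the number of other copies of each observation, so isdup==1 marks only pairs. That is why xtset still failed after the first drop and why duplicates report then showed only groups of 3 or more copies. The sketch below, on made-up data with the hypothetical names id, t, x and dup, shows the tag values and one way to keep a single row per id-period. Which copy to keep, or whether to drop all copies as the log above ends up doing, is a data decision that the sketch does not settle.

```stata
* Hypothetical data: (1,2) appears twice, (2,1) appears three times
clear
input id t x
1 1 5
1 2 6
1 2 7
2 1 8
2 1 9
2 1 10
2 2 11
end

duplicates tag id t, generate(dup)
list, sepby(id)            // dup = 0 (unique), 1 (pair), 2 (triple)

* Keep the first row of each id-t group; force is required
* because duplicates are defined on a subset of the variables
duplicates drop id t, force

isid id t                  // errors out if any duplicate id-t remains
xtset id t
```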


        Please help, and my apologies for my writing. Thank you.

        Xion L.

        Stata 17.0 mp
        Last edited by Xion Larqui; 30 Jul 2022, 21:20.
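
On the two failed keep commands in the log above: ndate2 is a numeric monthly date (format %tm), so the date literal has to be written with the tm() pseudofunction, or built with ym(), rather than typed as a bare 2007m1 or as a string. A minimal sketch on made-up data, where only the variable name ndate2 is taken from the post:

```stata
clear
set obs 6
gen ndate2 = ym(2006, 10) + _n - 1    // 2006m10 through 2007m3
format ndate2 %tm

keep if ndate2 >= tm(2007m1)          // same value as ym(2007, 1)
list
```

This keeps 2007m1 through 2007m3 and avoids going through a separate yyyymm variable such as v2.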
