Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with collapsing data

    Good day,

    I have the following data.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int CNTRY_NAME double Lev2 float(internal_reasons immi_reasons)    long    pernum    float(IN_work    IN_family    IN_educ    IN_marriage    IN_mwar_disaster    IN_housingr    IN_other    imm_work    imm_family)
    116 116001001 1 .  26  26   .  .  . . .  .  . .
    116 116001001 2 .  36   .  36  .  . . .  .  . .
    116 116001001 3 .   5   .   .  5  . . .  .  . .
    116 116001001 4 .  56   .   .  . 56 . .  .  . .
    116 116001001 7 .  16   .   .  .  . . . 16  . .
    116 116001001 . 7   4   .   .  .  . . .  .  . .
    116 116001002 1 .  11  11   .  .  . . .  .  . .
    116 116001002 2 .  12   .  12  .  . . .  .  . .
    116 116001002 3 .   2   .   .  2  . . .  .  . .
    116 116001002 4 .  13   .   .  . 13 . .  .  . .
    116 116001002 7 .  12   .   .  .  . . . 12  . .
    116 116001003 1 . 254 254   .  .  . . .  .  . .
    116 116001003 2 . 185   . 185  .  . . .  .  . .
    116 116001003 3 .  30   .   . 30  . . .  .  . .
    116 116001003 4 .  93   .   .  . 93 . .  .  . .
    116 116001003 5 .   1   .   .  .  . 1 .  .  . .
    116 116001003 7 .  41   .   .  .  . . . 41  . .
    116 116001003 . 1  13   .   .  .  . . .  . 13 .
    116 116001003 . 2   3   .   .  .  . . .  .  . 3
    116 116001003 . 7   1   .   .  .  . . .  .  . .
    end
    [/CODE]

    I am trying have all the data for variables IN_work IN_family IN_educ IN_marriage IN_mwar_disaster IN_housingr IN_other imm_work imm_family on one row for each unique Lev2 and not cascading as they appear now. Something like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int CNTRY_NAME double Lev2 float(internal_reasons immi_reasons)    long    pernum    float(IN_work    IN_family    IN_educ    IN_marriage    IN_mwar_disaster    IN_housingr    IN_other    imm_work    imm_family)
    116 116001001 1 .  26  26   36 5  56 . .  .  . .
    [/CODE]

    How would i do this correctly?

    Thank you in advance
    Sunganani Kalemba
    PhD Student.
    Queensland

  • #2
    How about

    Code:
    drop *reasons pernum 
    unab work : IN* imm* 
    collapse `work' , by(CNTRY_NAME Lev2)
    format Lev2 %10.0f 
    edit
    ?

    Comment


    • #3
      Thanks Nick for your prompt response. But it seems the collapse command returns the means for each grouping of IN* imm* and not the original values as required.
      Here is part of the transformed dataset:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int CNTRY_NAME double Lev2 float(IN_work IN_family IN_educ IN_marriage IN_mwar_disaster IN_housingr IN_other imm_work imm_family)
      116 116001001 4.3333335         6  .8333333  9.333333         0         0  2.666667         0         0
      116 116001002       2.2       2.4        .4       2.6         0         0       2.4         0         0
      116 116001003  28.22222 20.555555  3.333333 10.333333 .11111111         0 4.5555553 1.4444444  .3333333
      116 116001004       5.2       3.8        .4         1         0        .2       1.6        .2        .1
      116 116001005       5.1       6.4        .4        .6         0        .1       2.2        .1        .4
      116 116001006       6.8       6.4        .5         1        .1        .7        .9        .1         0
      116 116001007      23.2      28.2        .4       2.3        .1        .1       3.4        .1        .2
      116 116001099      96.1      61.5         2       1.9        .2        .4       9.4       3.8        .5
      116 116002001 4.4545455  6.363636  .6363636 1.1818181         0  .1818182 1.5454545  .0909091  .3636364
      116 116002003 19.384615 16.615385 13.153846  7.153846 .23076923 .23076923  9.615385         1 2.1538463
      116 116002004      13.2        20       1.1       3.4        .8       4.6       3.5        .2         0
      116 116002005       2.8       2.4        .4       3.4         0         0       4.6         0         0
      116 116002006  6.909091 12.636364 .27272728  4.090909  .0909091  .4545455  4.818182  .1818182  .0909091
      116 116002007 36.333332  52.77778         2 2.3333333         1  2.777778         8 .11111111 .22222222
      116 116002008       4.5     4.375      .125      3.25         0         0     4.375         0       .25
      116 116002009  61.88889  85.33334  .3333333  4.777778 .11111111  1.888889  3.222222 .44444445 .11111111
      116 116002010  33.77778  51.66667 .44444445 2.3333333 1.4444444 .11111111 4.5555553  .3333333 .22222222
      116 116002099      11.9      36.4        .4        .8        .4       2.8       2.7        .2        .7
      116 116003001  4.571429  3.857143 .14285715         3         0 .14285715 1.2857143         0         0
      116 116003002         7      3.25        .5     3.125      .125      .125      3.25         0         0
      end
      Sunganani Kalemba
      PhD Student.
      Queensland

      Comment


      • #4
        Correct that collapse by default produces means, but the suggestion in #2 was deliberate.

        What I suggested seemed to work fine for the example you showed, as the mean over one non-missing value is precisely that value.

        I have to conclude that your data example in #1 was misleading. Perhaps there is some dimension of variability you're not explaining, e.g. a time variable.

        Comment


        • #5
          You are absolutely right. I have replaced the missing values with zeros prior to running the collapse command. My apologies. As you can see, your code works brilliantly. Thanks again.

          ----------------------- copy starting from the next line -----------------------
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int CNTRY_NAME double Lev2 float(IN_work IN_family IN_educ IN_marriage IN_mwar_disaster IN_housingr IN_other imm_work imm_family)
          116 116001001  26  36   5 56  .  .  16  .  .
          116 116001002  11  12   2 13  .  .  12  .  .
          116 116001003 254 185  30 93  1  .  41 13  3
          116 116001004  52  38   4 10  .  2  16  2  1
          116 116001005  51  64   4  6  .  1  22  1  4
          116 116001006  68  64   5 10  1  7   9  1  .
          116 116001007 232 282   4 23  1  1  34  1  2
          116 116001099 961 615  20 19  2  4  94 38  5
          116 116002001  49  70   7 13  .  2  17  1  4
          116 116002003 252 216 171 93  3  3 125 13 28
          116 116002004 132 200  11 34  8 46  35  2  .
          116 116002005  14  12   2 17  .  .  23  .  .
          116 116002006  76 139   3 45  1  5  53  2  1
          116 116002007 327 475  18 21  9 25  72  1  2
          116 116002008  36  35   1 26  .  .  35  .  2
          116 116002009 557 768   3 43  1 17  29  4  1
          116 116002010 304 465   4 21 13  1  41  3  2
          116 116002099 119 364   4  8  4 28  27  2  7
          116 116003001  32  27   1 21  .  1   9  .  .
          116 116003002  56  26   4 25  1  1  26  .  .
          Sunganani Kalemba
          PhD Student.
          Queensland

          Comment

          Working...
          X