Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace the value of a variable with a new value

    Hi everyone!

    I have a variable named "n_sector" which receives value from 35 to 98. Now I want to classify the values of this variable into 10 groups (which means I want to replace the value of this variable so that there are only 10 values: SA, SB..... SL), according to this criterion:
    value range new value
    35-50 (from 35 to 50) SA
    51-55 SB
    56 SC
    57-58 SD
    59-61 SE
    62-69 SF
    70-76 SG
    77-86 SH
    87-93 SK
    94-98 SL
    I want to repeat this command for 64 provinces (64 province codes). Could you please help me with this? I guess that we need the loop here but I do not know how. I am a starter of SAtata. Thank you so much!

    Here is a sample of my data set for province 01

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 province_code int n_sector
    "01" 53
    "01" 78
    "01" 49
    "01" 94
    "01" 76
    "01" 76
    "01" 69
    "01" 76
    "01" 44
    "01" 56
    "01" 49
    "01" 92
    "01" 42
    "01" 77
    "01" 76
    "01" 54
    "01" 94
    "01" 48
    "01" 56
    "01" 53
    "01" 92
    "01" 76
    "01" 53
    "01" 81
    "01" 94
    "01" 49
    "01" 53
    "01" 66
    "01" 76
    "01" 90
    "01" 94
    "01" 76
    "01" 71
    "01" 76
    "01" 94
    "01" 54
    "01" 76
    "01" 53
    "01" 91
    "01" 81
    "01" 76
    "01" 53
    "01" 76
    "01" 76
    "01" 53
    "01" 94
    "01" 87
    "01" 76
    "01" 94
    "01" 97
    "01" 57
    "01" 73
    "01" 98
    "01" 98
    "01" 94
    "01" 96
    "01" 76
    "01" 94
    "01" 76
    "01" 53
    "01" 63
    "01" 94
    "01" 76
    "01" 76
    "01" 82
    "01" 74
    "01" 69
    "01" 94
    "01" 86
    "01" 53
    "01" 94
    "01" 69
    "01" 94
    "01" 76
    "01" 98
    "01" 66
    "01" 76
    "01" 53
    "01" 76
    "01" 71
    "01" 51
    "01" 93
    "01" 66
    "01" 73
    "01" 77
    "01" 69
    "01" 92
    "01" 92
    "01" 67
    "01" 56
    "01" 92
    "01" 66
    "01" 92
    "01" 80
    "01" 92
    "01" 90
    "01" 69
    "01" 76
    "01" 66
    "01" 92
    end

  • #2
    I recommend storing your "new values" (what I call codes, below) in a separate dataset for reference, and then merging the codes dataset with your main dataset. Something along the lines of the output shown below.

    .ÿ
    .ÿversionÿ17.0

    .ÿ
    .ÿclearÿ*

    .ÿ
    .ÿ//ÿCreateÿaÿseparateÿ"crosswalk"ÿdatasetÿforÿreferenceÿandÿauditing
    .ÿinputÿstr5ÿrgeÿstr2ÿcod

    ÿÿÿÿÿÿÿÿÿÿÿrgeÿÿÿÿÿÿÿÿcod
    ÿÿ1.ÿ35-50ÿÿÿSA
    ÿÿ2.ÿ51-55ÿÿÿSB
    ÿÿ3.ÿ56ÿÿÿÿÿÿÿÿÿÿSC
    ÿÿ4.ÿ57-58ÿÿÿSD
    ÿÿ5.ÿ59-61ÿÿÿSE
    ÿÿ6.ÿ62-69ÿÿÿSF
    ÿÿ7.ÿ70-76ÿÿÿSG
    ÿÿ8.ÿ77-86ÿÿÿSH
    ÿÿ9.ÿ87-93ÿÿÿSK
    ÿ10.ÿ94-98ÿÿÿSL
    ÿ11.ÿend

    .ÿ
    .ÿquietlyÿsplitÿrge,ÿgenerate(rge)ÿparse(-)ÿdestring

    .ÿquietlyÿreplaceÿrge2ÿ=ÿrge1ÿifÿmissing(rge2)

    .ÿgenerateÿbyteÿepdÿ=ÿrge2ÿ-ÿrge1ÿ+ÿ1

    .ÿquietlyÿexpandÿepd

    .ÿquietlyÿbysortÿcod:ÿreplaceÿrge1ÿ=ÿrge1[_n-1]ÿ+ÿ1ÿifÿ_nÿ>ÿ1

    .ÿbyÿcod:ÿassertÿrge1[_N]ÿ==ÿrge2

    .ÿrenameÿrge1ÿn_sector

    .ÿ
    .ÿ//ÿForÿillustrationÿofÿwhatÿtheÿcodeÿaboveÿdoes:
    .ÿlistÿrgeÿcodÿn_sectorÿifÿinlist(cod,ÿ"SC",ÿ"SD"),ÿnoobsÿsepby(cod)

    ÿÿ+------------------------+
    ÿÿ|ÿÿÿrgeÿÿÿcodÿÿÿn_sectorÿ|
    ÿÿ|------------------------|
    ÿÿ|ÿÿÿÿ56ÿÿÿÿSCÿÿÿÿÿÿÿÿÿ56ÿ|
    ÿÿ|------------------------|
    ÿÿ|ÿ57-58ÿÿÿÿSDÿÿÿÿÿÿÿÿÿ57ÿ|
    ÿÿ|ÿ57-58ÿÿÿÿSDÿÿÿÿÿÿÿÿÿ58ÿ|
    ÿÿ+------------------------+

    .ÿ
    .ÿkeepÿcodÿn_sector

    .ÿ
    .ÿtempfileÿCodes

    .ÿquietlyÿsaveÿ`Codes'

    .ÿ
    .ÿ//ÿNowÿ-merge-ÿwithÿmainÿdataset
    .ÿdropÿ_all

    .ÿquietlyÿinputÿstr2ÿprovince_codeÿbyteÿn_sector

    .ÿ
    .ÿmergeÿm:1ÿn_sectorÿusingÿ`Codes',ÿassert(matchÿusing)ÿnogenerateÿnoreport

    .ÿ
    .ÿtabulateÿcod,ÿmissing

    ÿÿÿÿÿÿÿÿcodÿ|ÿÿÿÿÿÿFreq.ÿÿÿÿÿPercentÿÿÿÿÿÿÿÿCum.
    ------------+-----------------------------------
    ÿÿÿÿÿÿÿÿÿSAÿ|ÿÿÿÿÿÿÿÿÿ18ÿÿÿÿÿÿÿ13.64ÿÿÿÿÿÿÿ13.64
    ÿÿÿÿÿÿÿÿÿSBÿ|ÿÿÿÿÿÿÿÿÿ15ÿÿÿÿÿÿÿ11.36ÿÿÿÿÿÿÿ25.00
    ÿÿÿÿÿÿÿÿÿSCÿ|ÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿÿÿÿÿ2.27ÿÿÿÿÿÿÿ27.27
    ÿÿÿÿÿÿÿÿÿSDÿ|ÿÿÿÿÿÿÿÿÿÿ2ÿÿÿÿÿÿÿÿ1.52ÿÿÿÿÿÿÿ28.79
    ÿÿÿÿÿÿÿÿÿSEÿ|ÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿÿÿÿÿ2.27ÿÿÿÿÿÿÿ31.06
    ÿÿÿÿÿÿÿÿÿSFÿ|ÿÿÿÿÿÿÿÿÿ16ÿÿÿÿÿÿÿ12.12ÿÿÿÿÿÿÿ43.18
    ÿÿÿÿÿÿÿÿÿSGÿ|ÿÿÿÿÿÿÿÿÿ29ÿÿÿÿÿÿÿ21.97ÿÿÿÿÿÿÿ65.15
    ÿÿÿÿÿÿÿÿÿSHÿ|ÿÿÿÿÿÿÿÿÿ12ÿÿÿÿÿÿÿÿ9.09ÿÿÿÿÿÿÿ74.24
    ÿÿÿÿÿÿÿÿÿSKÿ|ÿÿÿÿÿÿÿÿÿ15ÿÿÿÿÿÿÿ11.36ÿÿÿÿÿÿÿ85.61
    ÿÿÿÿÿÿÿÿÿSLÿ|ÿÿÿÿÿÿÿÿÿ19ÿÿÿÿÿÿÿ14.39ÿÿÿÿÿÿ100.00
    ------------+-----------------------------------
    ÿÿÿÿÿÿTotalÿ|ÿÿÿÿÿÿÿÿ132ÿÿÿÿÿÿ100.00

    .ÿ
    .ÿexit

    endÿofÿdo-file


    .


    I also recommend not overwriting your n_sector variable with the alphabetical codes, but rather keeping them in separate variables, like above.

    Comment


    • #3
      Here is an alternative way:

      Code:
      #delimit ;
          recode n_sector
              (35/50 = 1)
              (51/55 = 2)
              (56 =    3)
              (57/58 = 4)
              (59/61 = 5)
              (62/69 = 6)
              (70/76 = 7)
              (77/86 = 8)
              (87/93 = 9)
              (94/98 = 10)
              , gen(wanted)
          ;
          
          label define SECTORS
              1  "SA"
              2  "SB"
              3  "SC"
              4  "SD"
              5  "SE"
              6  "SF"
              7  "SG"
              8  "SH"
              9  "SK"
              10 "SL"
              ;
      #delimit cr
      
      label values wanted SECTORS
      I wasn't sure whether you wanted to create a string variable or numeric. The method above creates a numeric variable; you can always use decode to make it a string, if needed.

      With the community-contributed command groups (from SSC), you can see the group construction:

      Code:
      . groups wanted n_sector, show(freq) sepby(wanted)
      
        +---------------------------+
        | wanted   n_sector   Freq. |
        |---------------------------|
        |     SA         42       1 |
        |     SA         44       1 |
        |     SA         48       1 |
        |     SA         49       3 |
        |---------------------------|
        |     SB         51       1 |
        |     SB         53      10 |
        |     SB         54       2 |
        |---------------------------|
        |     SC         56       3 |
        |---------------------------|
        |     SD         57       1 |
        |---------------------------|
        |     SF         63       1 |
        |     SF         66       5 |
        |     SF         67       1 |
        |     SF         69       5 |
        |---------------------------|
        |     SG         71       2 |
        |     SG         73       2 |
        |     SG         74       1 |
        |     SG         76      21 |
        |---------------------------|
        |     SH         77       2 |
        |     SH         78       1 |
        |     SH         80       1 |
        |     SH         81       2 |
        |     SH         82       1 |
        |     SH         86       1 |
        |---------------------------|
        |     SK         87       1 |
        |     SK         90       2 |
        |     SK         91       1 |
        |     SK         92       8 |
        |     SK         93       1 |
        |---------------------------|
        |     SL         94      13 |
        |     SL         96       1 |
        |     SL         97       1 |
        |     SL         98       3 |
        +---------------------------+
      Last edited by Hemanshu Kumar; 23 Nov 2022, 22:08.

      Comment


      • #4
        Not only isHemanshu's approach cleverer, but I forgot to add the keep() option to my merge line. It should've been
        Code:
        merge m:1 n_sector using `Codes', assert(match using) keep(match) nogenerate noreport
        .ÿ
        .ÿlocalÿNÿ=ÿ_N

        .ÿmergeÿm:1ÿn_sectorÿusingÿ`Codes',ÿassert(matchÿusing)ÿkeep(match)ÿ///
        >ÿÿÿÿÿÿÿÿÿnogenerateÿnoreport

        .ÿassertÿ_Nÿ==ÿ`N'

        .ÿtabulateÿcod,ÿmissing

        ÿÿÿÿÿÿÿÿcodÿ|ÿÿÿÿÿÿFreq.ÿÿÿÿÿPercentÿÿÿÿÿÿÿÿCum.
        ------------+-----------------------------------
        ÿÿÿÿÿÿÿÿÿSAÿ|ÿÿÿÿÿÿÿÿÿÿ6ÿÿÿÿÿÿÿÿ6.00ÿÿÿÿÿÿÿÿ6.00
        ÿÿÿÿÿÿÿÿÿSBÿ|ÿÿÿÿÿÿÿÿÿ13ÿÿÿÿÿÿÿ13.00ÿÿÿÿÿÿÿ19.00
        ÿÿÿÿÿÿÿÿÿSCÿ|ÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿÿÿÿÿ3.00ÿÿÿÿÿÿÿ22.00
        ÿÿÿÿÿÿÿÿÿSDÿ|ÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿÿÿÿ1.00ÿÿÿÿÿÿÿ23.00
        ÿÿÿÿÿÿÿÿÿSFÿ|ÿÿÿÿÿÿÿÿÿ12ÿÿÿÿÿÿÿ12.00ÿÿÿÿÿÿÿ35.00
        ÿÿÿÿÿÿÿÿÿSGÿ|ÿÿÿÿÿÿÿÿÿ26ÿÿÿÿÿÿÿ26.00ÿÿÿÿÿÿÿ61.00
        ÿÿÿÿÿÿÿÿÿSHÿ|ÿÿÿÿÿÿÿÿÿÿ8ÿÿÿÿÿÿÿÿ8.00ÿÿÿÿÿÿÿ69.00
        ÿÿÿÿÿÿÿÿÿSKÿ|ÿÿÿÿÿÿÿÿÿ13ÿÿÿÿÿÿÿ13.00ÿÿÿÿÿÿÿ82.00
        ÿÿÿÿÿÿÿÿÿSLÿ|ÿÿÿÿÿÿÿÿÿ18ÿÿÿÿÿÿÿ18.00ÿÿÿÿÿÿ100.00
        ------------+-----------------------------------
        ÿÿÿÿÿÿTotalÿ|ÿÿÿÿÿÿÿÿ100ÿÿÿÿÿÿ100.00

        .ÿ
        .ÿexit

        endÿofÿdo-file


        .

        Comment


        • #5
          Thanks for the mention of groups in #3.

          There is a copy on SSC, but the source to be cited, please, is the Stata Journal. I used a very common English word as command name, which I don't feel especially embarrassed about, but a side-effect is that the references are a little elusive.

          .
          Code:
            search st0496, entry
          
          Search of official help files, FAQs, Examples, and Stata Journals
          
          SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
                  (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                  Q1/18   SJ 18(1):291
                  groups exited with an error message if weights were specified;
                  this has been corrected
          
          SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
                  (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                  Q3/17   SJ 17(3):760--773
                  presents command for listing group frequencies and percents and
                  cumulations thereof; for various subsetting and ordering by
                  frequencies, percents, and so on; for reordering of columns;
                  and for saving tabulated data to new datasets
          In fact there was an earlier publication, but citing this is entirely optional.


          Code:
          . search makematrix, sj
          
          Search of official help files, FAQs, Examples, and Stata Journals
          
          SJ-3-4  pr0011  . . . . . . . .  Speaking Stata: Problems with tables, Part II
                  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                  Q4/03   SJ 3(4):420--439                                 (no commands)
                  reviews three user-written commands (tabcount, makematrix,
                  and groups) as different approaches to tabulation problems
          makematrix had some fair ideas, but holds no appeal now given Mata.

          Comment


          • #6
            Thank you so much for your reply Joseph Coveney Hemanshu Kumar .It really helps me. Thank you for your clarification Nick Cox !

            Comment

            Working...
            X