  • Creating a new variable from new information and an existing variable

    Hi, hope you are all ok!

    I am trying to create a variable named "conflicto" with 6 categories (1, 2, 3, 4, 6, 7; note that there is no category 5). I am creating "conflicto" from an existing variable called P756S2, using information from outside my dataset to assign the categories.





    gen conflicto= 1 if P756S2==5001 | P756S2==5040 |/*
    > */ P756S2==5045 | P756S2==5361 |/*
    > */ P756S2==5649 | P756S2== 8001 | /*
    > */ P756S2==11001 | P756S2==13001 |/*
    > */ P756S2==13670 | P756S2==18001 | /*
    > */ P756S2==18592 | P756S2==18753 | /*
    > */ P756S2==19001 | P756S2==19256 | /*
    > */ P756S2==19532| P756S2==19698 | /*
    > */ P756S2==20011 | P756S2==23807 | /*
    > */ P756S2==41001 | P756S2==41020 | /*
    > */ P756S2==44001| P756S2==50001 | /*
    > */ P756S2==50590| P756S2==52079 | /*
    > */ P756S2==52612 | P756S2==52835 | /*
    > */ P756S2==54001| P756S2==54250 | /*
    > */ P756S2==54810 | P756S2==66001 | /*
    > */ P756S2==68001 | P756S2==68081 | /*
    > */ P756S2==70001 | P756S2==73001 | /*
    > */ P756S2==76001| P756S2==76109 | /*
    > */ P756S2==76275 | P756S2==76834 | /*
    > */ P756S2==81001 | P756S2==81065 | /*
    > */ P756S2==81736 | P756S2==81794
    (265,846 missing values generated)

    . replace conflicto= 2 if P756S2==52356 | P756S2==54670
    (266 real changes made)

    .
    . replace conflicto= 3 if P756S2==5197 |P756S2==5234 | /*
    > */ P756S2==5250 | P756S2==5313 | /*
    > */ P756S2==5579 | P756S2==5604 |/*
    > */ P756S2==5652 | P756S2==5660 | /*
    > */ P756S2==5756 | P756S2==5790 | /*
    > */ P756S2==5847 | P756S2==5854 | /*
    > */ P756S2==5887 | P756S2==5893 | /*
    > */ P756S2==13244 | P756S2==17541 | /*
    > */ P756S2==17662 | P756S2==18410 |/*
    > */ P756S2==19050 | P756S2==19142 |/*
    > */ P756S2==19212 | P756S2==19821 |/*
    > */ P756S2==20001 | P756S2==20013| /*
    > */ P756S2==23001 | P756S2==23466 | /*
    > */ P756S2==23580 | P756S2==27001 | /*
    > */ P756S2==47001 | P756S2==47189 |/*
    > */ P756S2==50251 | P756S2==50711 |/*
    > */ P756S2==52001 | P756S2==52540 |/*
    > */ P756S2==52678 | P756S2==54206|/*
    > */ P756S2==54498 | P756S2==54720 |/*
    > */ P756S2==54800 | P756S2==66594 |/*
    > */ P756S2==68655 | P756S2==70508 |/*
    > */ P756S2==73168 | P756S2==73555 |/*
    > */ P756S2==81300 | P756S2==86568 |/*
    > */ P756S2==86865 | P756S2==95001
    (8,229 real changes made)

    .
    . replace conflicto= 4 if P756S2==5002 |P756S2==5004 | /*
    > */ P756S2==5021 | P756S2==5030 | /*
    > */ P756S2==5031 | P756S2==5034 |/*
    > */ P756S2==5036 | P756S2==5038 | /*
    > */ P756S2==5042 | P756S2==5044 | /*
    > */ P756S2==5051 | P756S2==5055 | /*
    > */ P756S2==5079 | P756S2==5086 | /*
    > */ P756S2==5088 | P756S2==5091 | /*
    > */ P756S2==5093 | P756S2==5101 |/*

    ..........................................


    Category 4 of conflicto has too many values to be assigned, and I get this error:

    too many numeric literals
    r(130);



    What do you suggest I do?
    Thanks in advance
    Last edited by Silvana Builes; 17 Feb 2023, 09:18.

  • #2
    In principle what you are doing could work, although it appears from Stata's error message that you attempted to include too many expressions in one command.

    I'd prefer to use the built-in -recode- command instead (-help recode-), and I'd use /// to break long lines rather than /* and */, which is messy and error-prone:

    Code:
    recode P756S2 (5045 5361 5649 = 1) ///
                  (52356 54670 = 2) ///
                  (5197 5234 5250 = 3), generate(conflicto)
    Another possibility is the -inlist()- function (-help inlist()-) rather than a long list of logical expressions:

    Code:
    gen conflicto = 1 if ///
       inlist(P756S2, 5045, 5361, 5649, 8001,    ///
              11001, 13001, 13670, 18001, 18592, ///
              18753, 19001, 19256)
    replace conflicto = 2 if inlist .....
    In both approaches, I've only covered a small number of all the values you're working with, but the structure I've used can be extended as necessary.
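
    As a rough sketch of how the -inlist()- structure might be extended to a long category such as 4: the codes below are the first ones listed for category 4 in the original post, while the toy dataset and the choice to split the codes over several -replace- commands are assumptions made for illustration. Splitting keeps each expression short, and -inlist()- itself has a cap on the number of arguments (see -help inlist()-).

    ```stata
    * Toy stand-in for the real data (assumption: P756S2 is numeric)
    clear
    input long P756S2
    5002
    5044
    5101
    5001
    end

    * Start with all missing, then fill category 4 in several short steps
    gen byte conflicto = .
    replace conflicto = 4 if inlist(P756S2, 5002, 5004, 5021, 5030, 5031, 5034)
    replace conflicto = 4 if inlist(P756S2, 5036, 5038, 5042, 5044, 5051, 5055)
    replace conflicto = 4 if inlist(P756S2, 5079, 5086, 5088, 5091, 5093, 5101)

    * Codes not listed above (here 5001) stay missing in this sketch
    list
    ```

    Each -replace- is an independent command, so further chunks can be added without lengthening any single expression.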



    • #3
      Answers on this will depend on taste as well as technique. I would make the translation a matter of merging with a dictionary file that carries the translation. So, that file would have two variables

      Code:
      P756S2  conflicto
      and observations like



      Code:
      5001 1
      5040 1
      ...
      5002 4
      5004 4 
      ....
      See https://www.stata.com/support/faqs/d...s-for-subsets/ for the big idea.

      I think there are these advantages to this device:

      1. It is relatively easy to check the file against an original.

      2. It is easy to modify the file if you revise your ideas or seek a different analysis.

      3. It is easy to explain to people looking at your work, especially if they don't use Stata much or at all.

      4. Complicated code with many logical operators is harder to read, to write, to check and to maintain.
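
      A minimal sketch of the dictionary idea, assuming the main dataset has many observations per value of P756S2 and the dictionary has exactly one row per value. The toy observations, the code 99999, and the tempfile name `dict' are hypothetical; in practice the dictionary would be typed once or imported from a file.

      ```stata
      * Dictionary: one row per code, carrying the category it maps to
      clear
      input long P756S2 byte conflicto
      5001 1
      5040 1
      5002 4
      5004 4
      end
      tempfile dict
      save `dict'

      * Toy stand-in for the main dataset (99999 is deliberately not in the dictionary)
      clear
      input long P756S2
      5001
      5004
      5040
      99999
      end

      * Many-to-one merge: each observation picks up conflicto from the dictionary
      merge m:1 P756S2 using `dict', keep(master match)

      * _merge == 1 flags codes that the dictionary does not cover
      tabulate conflicto, missing
      list if _merge == 1
      ```

      Listing the observations with _merge == 1 gives a quick check of the dictionary against the data, which is the first advantage mentioned above.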





      • #4
        Dear Mike, I just tried your first option and it worked.
        Dear Nick, great suggestion as well.

        I was wondering if there is a quicker way of doing this. It took me the whole afternoon to assign the values, and the full recode of P756S2 runs to 104 lines:


        recode P756S2 (5001 5040 5045 5361 5649 8001 11001 13001 13670 =1) ///
        (18001 18592 18753 19001 19256 19532 19698 20011 =1) ///
        (23807 41001 =1) ///
        .
        .
        .
        .
        .
        .
        (91407 91430 91460 91530 91540 91669 94663 94883 94886 94885 68549 94887 94888 97511 97666 97889 =7), generate(conflicto)



        • #5
          The implication of your posts is that you can produce your categorical variables only by typing out long lists of identifiers. If so, whether you type them out as part of commands or let them be data is incidental to the fact that you need to type them out at least once (although important otherwise).

          The alternative would be defining the categories by rules or criteria and advice would depend on knowing what those are.



          • #6
            Thank you Nick!!



            • #7
              I'd agree with Nick's advice here, and also say that, in the context of a large number of values to be recoded/replaced, I'd prefer the approach he recommended above, i.e., merging with a dictionary file. In such cases, I like that better than -recode-. Also, in contexts in which I encounter the kind of problem Silvana describes, dictionary information might be available in a downloadable file. I've encountered something like that in (e.g.) coding identifiers for smaller geographical areas into larger ones (e.g., regions), where government agencies might already have prepared such a file. Using that in a -merge- might well be easier than creating a long -recode- statement.
