  • Create a variable summing the values of another variable

    Hi all STATA list members,

    I have the following problem (which could seem trivial for many of you).

    I have a variable expressed in terms of 1 and 0 values:

    E.g

    ID months_in_occupation months

    1 0111100000 4
    2 1000001111 5
    3 0000111100 4
    4 1111111100 8

    I want to create a variable (months) that counts the 1s in months_in_occupation for each individual (ID). Any tips on how to do it?
    Thanks in advance,
    Rezart


  • #2
    Rezart:
    welcome to this forum.
    See -egen- and its -rowtotal- function, like in the following toy-example:
    Code:
    . set obs 3
    
    . g id=_n
    
    . g month_1=1 in 1/2
    
    . replace month_1=0 if month_1 ==.
    
    . g month_2=1 in 2/3
    
    . replace month_2=0 if month_2 ==.
    
    . egen flag=rowtotal(month_*)
    
    . list
    
         +-------------------------------+
         | id   month_1   month_2   flag |
         |-------------------------------|
      1. |  1         1         0      1 |
      2. |  2         1         1      2 |
      3. |  3         0         1      1 |
         +-------------------------------+
    Last edited by Carlo Lazzaro; 01 Aug 2018, 11:35.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Try this code:

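      forvalues n = 1/10 {
          * extract the n-th character (one month) and convert it to numeric
          gen month_`n' = substr(months_in_occupation, `n', 1)
          destring month_`n', replace
      }
      * count the months in occupation across the ten month variables
      egen months = rowtotal(month_1-month_10)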



      • #4
        Rezart, please follow the instructions in the FAQ 12.2 about posting data (https://www.statalist.org/forums/help#stata).

        I have to guess, but it seems like you have a string variable. In that case, I would suggest:

        Code:
        *clean the string to remove any unintentional space before or after the string
        gen clean=strtrim(months_in_occupation)
        *remove all the zeros
        gen no_zeros=subinstr(clean, "0", "", .)
        *count the length of the remaining string
        gen num_ones=length(no_zeros)
        
        **or, in one command**
        gen months=length(subinstr(strtrim(months_in_occupation), "0", "", .))
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1
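
        As a quick check of the one-line command above, here is a minimal sketch that rebuilds the four rows from post #1 as a string variable (an assumption, since the storage type was never confirmed in the thread) and counts the 1s. The data are typed in with input purely for illustration.

        ```stata
        clear
        * toy data copied from post #1, assumed here to be a str10 variable
        input id str10 months_in_occupation
        1 "0111100000"
        2 "1000001111"
        3 "0000111100"
        4 "1111111100"
        end

        * strip the zeros; the length of what remains is the number of 1s
        gen months = length(subinstr(strtrim(months_in_occupation), "0", "", .))
        list, noobs
        ```

        The resulting months values should match the 4, 5, 4, 8 shown in post #1.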



        • #5
          As a footnote to Carole's nice solution, see also https://www.stata-journal.com/sjpdf....iclenum=dm0056



          • #6
            Thank you all. I did not solve the problem as proposed by Carole since the variable "months_in_occupation" is not a string variable, but a byte. Do you know how to do it for a byte variable?



            • #7
              Rezart:
              you can convert it into a -string- and apply Carole's solution:
              Code:
              tostring months_in_occupation, replace
              Kind regards,
              Carlo
              (Stata 19.0)
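
              If the variable turns out to be genuinely numeric and holds values such as 1111111100 (an assumption; it would then have to be a long or double rather than a byte), one alternative sketch skips tostring and converts with string() and an explicit format. Leading zeros are lost in a numeric variable, but that does not change the count of 1s. The name digits is made up for this example.

              ```stata
              clear
              * toy data from post #1, assumed here to be stored as a double
              input double months_in_occupation
              0111100000
              1000001111
              0000111100
              1111111100
              end

              * %12.0f writes every digit, with no decimals and no scientific notation
              gen digits = strtrim(string(months_in_occupation, "%12.0f"))
              * number of 1s = total length minus the length once the 1s are removed
              gen months = length(digits) - length(subinstr(digits, "1", "", .))
              list
              ```

              Counting the 1s directly, rather than removing the 0s, keeps the result unaffected by any padding the format might add.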



              • #8
                Thanks Carlo, I tried to do it before but it is not possible without using the "force" option. Using force I lose some information on that variable.



                • #9
                  If the values of months_in_occupation are as you say they are in post #1, it's not a byte, and almost surely is a string.
                  The loss of precision from tostring-ing a variable such as your months_in_occupation also shouldn't matter for the purpose here.

                  That said, please help clear up the confusion by showing an example of your dataset with dataex. That gives everyone all the info they need.
                  do:
                  Code:
                  ssc install dataex
                  dataex in 1/20
                  No need to install dataex if you are on Stata 15.
                  And read more about the why and how of dataex in the FAQ: https://www.statalist.org/forums/help#stata

                  Edit: the variable months_in_occupation could theoretically be a byte with value labels that look like what you show, but that would be quite an unconventional way of storing that info.
                  Last edited by Jorrit Gosens; 02 Aug 2018, 05:00.



                  • #10
                    Thank you for your patience.

                    The variable was set automatically as byte since I am importing the data from a .csv file. When I convert the variable (months_in_occupation) from byte to string, the result has one digit fewer than the byte version.
                    I suppose this happens because the length of the string is set to "str11" while the maximum length of my byte variable is 12 digits (one for each month of the year). I think that one solution is the command:

                    tostring months_in_occupation, generate(months_in_occupation_str) force
                    + a command telling STATA to set the length of "months_in_occupation_str" to str12.

                    Thanks for helping



                    • #11
                      This still seems very confused. Unfortunately my guess is that Carlo's advice in #7 won't help here.

                      1. If the variable looked like a string to Stata, it would have been imported as a string. If it looked numeric, then with values like 1111111100 it would not have been imported as byte, as a decimal integer that big can't fit into a byte. Stata doesn't have a variable type that is binary or binary string.

                      2. I don't think there is any implication: import from .csv => read in as byte. Or, at least, I don't understand your argument or inference there.

                      3. "transform" could mean numerous different things. It is not self-explanatory unless you explain precisely what was done.

                      4. tostring cannot restore detail lost on input or generation. If the variable is a byte, as you are telling us, then the most you could get out of tostring is string values that could not be longer than "100".

                      My wild guess is that somehow you encoded a string as numeric. If so, then the information you want is within the value labels.

                      The quickest ways to a solution here are that we get information on some or all of the following.

                      * You show us enough of the .csv that we can understand the original form of the data.

                      * You show us the exact code that led to the variable concerned.

                      * You've been asked (twice now!) to give a data example (and are quoting that back to us) but the request still stands. This should be easiest to satisfy and is where you should start.

                      The FAQ Advice has been honed and tuned over 20 years or so as a distillation of advice on what works (and doesn't work) well here in getting good, quick, helpful, correct answers. A major reason you haven't solved your problem yet is that you're ignoring the most important advice it contains.

                      Not the question at all, just terminology: in Stata [sic] string and numeric are groups of data or storage types, not (display) formats. Naturally people talk about file formats outside Stata. In cases of confusion, the only evidence we trust on how a variable is stored is examples and the information given by describe.
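
                      The guess above, that a string was somehow encoded as numeric, can be checked along these lines. This is only a sketch under that assumption: the toy data, the use of encode and compress to reproduce a labelled byte, and the names s and months_str are all made up for illustration and are not taken from the poster's dataset.

                      ```stata
                      clear
                      * toy strings copied from post #1
                      input str10 s
                      "0111100000"
                      "1000001111"
                      "0000111100"
                      "1111111100"
                      end

                      * reproduce the suspected situation: a small integer with value labels
                      encode s, generate(months_in_occupation)
                      compress months_in_occupation

                      * describe shows the storage type and the attached value label
                      describe months_in_occupation

                      * decode recovers the label text, which tostring cannot do
                      decode months_in_occupation, generate(months_str)
                      gen months = length(subinstr(strtrim(months_str), "0", "", .))
                      list s months_in_occupation months, nolabel
                      ```

                      If describe reports a value label on the real variable, decode rather than tostring is the route to the original digits.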



                      • #12
                        Solved by using Carole's suggestion and adding:

                        Code:

                        tostring months_occupation, generate(months_occupation_str) force format(%13.0g)

                        I was losing one digit while converting from byte to string. Using "format(%13.0g)" I obtained a str12.

                        Thank you all for the help and patience,

                        Rezart
                        Last edited by Rezart Hoxhaj; 02 Aug 2018, 11:25.



                        • #13
                          Good to hear this has been solved, but:
                          It wasn't a byte variable. Bytes can't be more than 3 characters long.
                          If it was a byte with a label, tostring wouldn't get you anywhere. You'd need decode.
                          Stata would not reduce the format length of the string when that would mean characters get dropped. These three:
                          Code:
                          tostring months_occupation, generate(months_occupation_str) force format(%13.0g)
                          tostring months_occupation, generate(months_occupation_str) force format(%12.0g)
                          tostring months_occupation, generate(months_occupation_str) force
                          would give the exact same result, a string with format %12.0g if the original variable was 12 characters long.
                          Stata would not take a 12-character value and decide to tostring it as an 11-character string.
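
                          A minimal sketch of how that claim could be checked, assuming the original variable is a double holding a 12-digit integer (the thread never confirms the storage type). The names x, s_default and s_13 are hypothetical.

                          ```stata
                          clear
                          set obs 1
                          * a 12-digit integer needs a double; a float cannot hold it exactly
                          gen double x = 111111111100

                          * default format versus the wider format used in #12
                          tostring x, generate(s_default)
                          tostring x, generate(s_13) format(%13.0g)

                          * compare the storage types and the contents of the two strings
                          describe s_default s_13
                          assert s_default == s_13
                          ```

                          If the assert passes on the real data as well, the extra format() option was not what fixed the problem.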


                          Again, for future posts, always give a data example with dataex, and post the exact code that you used.
                          Both your description of your data and the things that you claim Stata did have been confusing.
                          With a good data example this would have been solved with a single reply only.



                          • #14
                            Thank you all,

                            I'll keep in mind your suggestions for next time.

