
  • Sorry for the question from a newbie

    Hello everyone, I am taking my first steps in Stata. I have been trying several things but have run into a difficulty. Since I know little about the subject, I would like to ask you a question.

    I apologize in advance if this has already been asked; if so, thank you for pointing me to the right thread.

    I have more than 20 str2 variables (car, pets, g_married, phy_act, ...). Each variable records one of the answers Yes, No or Na.

    So I used the "replace" command, but the problem is that I have to type 3 lines of code per variable to change the values to 0, 1 or . My question is whether there is a way to ask Stata to modify all 20 variables at once, without writing each time:

    replace car="0" if car =="No"
    replace car ="1" if car =="Yes"
    replace car ="." if car =="NA"

    Thank you in advance to those who will give me an answer.

  • #2
    No need to apologise for being a learner. But a data example would help and please use a full real name, unless your family name really is ZZ. See #12 and #6 at https://www.statalist.org/forums/help

    How can "Yes" fit into a str2 variable? Let's guess that your real data are in Italian. Also, string values "0" "1" are even less useful than what you have already. For almost all statistical purposes you need rather (0, 1) indicator variables. Don't try to overwrite your original data.


    Code:
    clear
    set obs 10
    set seed 31459265
    foreach v in car pets g_married {
        gen `v' = cond(runiform() < 0.6, "Si", cond(runiform() < 0.9, "No", "NA"))
    }
    
    list
    
    * You start here
    label def whatever 1 "Si" 0 "No" .a "NA"
    
    foreach v in car pets g_married {
        encode `v', gen(I_`v') label(whatever)
        label var I_`v' "`v'"
    }
    Results:

    Code:
    . list I_*
    
         +---------------------------+
         | I_car   I_pets   I_g_ma~d |
         |---------------------------|
      1. |    No       No         NA |
      2. |    Si       No         Si |
      3. |    No       NA         No |
      4. |    No       No         Si |
      5. |    Si       Si         No |
         |---------------------------|
      6. |    Si       NA         No |
      7. |    No       NA         Si |
      8. |    Si       No         Si |
      9. |    Si       No         Si |
     10. |    Si       Si         No |
         +---------------------------+
    
    . list I_*, nola
    
         +---------------------------+
         | I_car   I_pets   I_g_ma~d |
         |---------------------------|
      1. |     0        0         .a |
      2. |     1        0          1 |
      3. |     0       .a          0 |
      4. |     0        0          1 |
      5. |     1        1          0 |
         |---------------------------|
      6. |     1       .a          0 |
      7. |     0       .a          1 |
      8. |     1        0          1 |
      9. |     1        0          1 |
     10. |     1        1          0 |
         +---------------------------+
    I've not used an accent above, but you know exactly what the real data look like.
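
    With more than 20 variables, typing every name into the loop gets tedious too. A minimal sketch, assuming that every string variable in the dataset is one of these Yes/No/NA answers (the toy data and the label name yesno are invented for illustration): ds with has(type string) collects the names, and the loop then runs over that list.

    ```stata
    * Sketch only: toy data standing in for the real dataset
    clear
    input str3 car str3 pets str3 g_married
    "Yes" "No"  "NA"
    "No"  "Yes" "Yes"
    "NA"  "No"  "No"
    end

    * yesno is a hypothetical label name
    label define yesno 0 "No" 1 "Yes" .a "NA"

    * ds leaves the names of all string variables in r(varlist)
    ds, has(type string)
    local strvars `r(varlist)'

    foreach v of local strvars {
        encode `v', gen(I_`v') label(yesno)
        label var I_`v' "`v'"
    }

    list I_*, nolabel
    ```

    If the dataset holds other string variables as well, the list in strvars would need trimming before the loop.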



    • #3
      Thank you for your help. But that didn’t work.

      So I come back to you with the type and name of each variable.

      Currently I use the command: replace var ="1" if var =="Yes" and it works, but it is quite a tedious method.
      So I would like to be able to do this faster for these variables:

      - ct_chgt str3 %9s
      - ct_med str3 %9s
      - ct_norm str3 %9s
      Following the earlier advice, I tried to write a loop:
      foreach var of varlist ct_chgt ct_med ct_norm {
      replace `var' = "0" if `var' == "Yes"
      replace `var' = "1" if `var' == "No"
      replace `var' = "." if `var' == "Na"
      }

      but I get a type mismatch error, r(109). I don't understand why, because normally this should work, as they are str variables. Or perhaps I have misunderstood foreach or replace.
      Thank you in advance for your help.
      Last edited by Alex Klein; 22 Mar 2023, 08:38.



      • #4
        Please give us a data example using
        Code:
        dataex car pets g_married phy_act



        • #5
          Chen Samulsion is right. You’re wasting your own time and ours too if you post incomplete or inaccurate details.



          • #6
            Trying again. Your story has changed, but on the understanding that you have data like this

            Code:
            clear 
            set obs 10 
            foreach v in chgt med norm {
                gen ct_`v' = cond(runiform() < 0.7, "Yes", cond(runiform() < 0.9, "No", "Na"))
            }
            
            list 
            
                 +----------------------------+
                 | ct_chgt   ct_med   ct_norm |
                 |----------------------------|
              1. |     Yes       Na       Yes |
              2. |      No       No       Yes |
              3. |     Yes      Yes        No |
              4. |     Yes       No       Yes |
              5. |      No      Yes       Yes |
                 |----------------------------|
              6. |     Yes      Yes        No |
              7. |     Yes      Yes       Yes |
              8. |     Yes      Yes       Yes |
              9. |      Na      Yes       Yes |
             10. |     Yes      Yes       Yes |
                 +----------------------------+
            then this works fine:

            Code:
             
            foreach var of varlist ct_chgt ct_med ct_norm {
                replace `var' = "0" if `var' == "Yes"
                replace `var' = "1" if `var' == "No"
                replace `var' = "." if `var' == "Na"
            }
            
                 +----------------------------+
                 | ct_chgt   ct_med   ct_norm |
                 |----------------------------|
              1. |       0        .         0 |
              2. |       1        1         0 |
              3. |       0        0         1 |
              4. |       0        1         0 |
              5. |       1        0         0 |
                 |----------------------------|
              6. |       0        0         1 |
              7. |       0        0         0 |
              8. |       0        0         0 |
              9. |       .        0         0 |
             10. |       0        0         0 |
                 +----------------------------+
            I would not do that, but what you show is legal. As already pointed out, numeric (0, 1) indicator variables are much more useful.


            Code:
            foreach v in chgt med norm {
                
                gen `v' = 1 if ct_`v' == "No"
                replace `v' = 0 if ct_`v' == "Yes"
                replace `v' = . if ct_`v' == "Na"
            
            }
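
            As a hedged aside, the same recoding can be sketched in a single statement with cond(). The toy data below are invented for illustration and keep the coding used above (No is 1, Yes is 0). The last lines show one way that a type mismatch r(109) can arise, namely assigning a string to a numeric variable; whether that was the cause in #3 cannot be told without a data example.

            ```stata
            * Sketch only: one toy variable standing in for ct_chgt
            clear
            input str3 ct_chgt
            "Yes"
            "No"
            "Na"
            end

            * nested cond(): anything that is neither "No" nor "Yes" stays missing
            gen chgt = cond(ct_chgt == "No", 1, cond(ct_chgt == "Yes", 0, .))

            list

            * chgt is numeric, so a string on the right-hand side is rejected
            capture replace chgt = "1" if chgt == 0
            display _rc
            ```

            Running describe first shows the storage type of each variable, which is the quickest check when replace complains about types.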



            • #7
              Hello everyone! You are right and I apologize for my attitude; I have understood my mistake and do not wish to waste any more of your time. Thank you for your contributions as well. In fact, in my Excel document some variables contain text and I wanted to replace them with numerical values. I agree with you that it is probably better to work with numerical values. Finally, thanks to your code and several attempts, I was able to solve the problem with my variables using the following command:

              foreach var of varlist chgt med norm {

              replace `var' = "0" if `var' == "No"
              replace `var' = "1" if `var' == "yes"
              replace `var' = "" if `var' == "NA"
              }

              My new difficulty concerns one variable:
              date type: str10 format: %10s
              As its name indicates, this variable contains dates in day/month/year format.

              My goal is to create a "semi_annual" variable in which the dates are represented by numeric values such as 1, 2, 3, 4.
              I expect there are other and better ways to present these data, but working through this helps me to understand Stata's functions and tools better.

              I have not come empty-handed; I wrote this code:

              foreach date of date {
              gen semi_annual = date(date, "DMY")
              replace date = 1 if semi_annual >= "01/01/2020" & semi_annual <= "31/06/2020"
              replace date = 2 if semi_annual >= "01/07/2020" & semi_annual <= "31/12/2020"
              }

              Also, please do not assume that the variable date contains every date: for example, between 01/03/2020 and 31/06/2020 there is no 05/02/2020, 1/05/2020 or 14/07/2020.

              But I get an error message: invalid syntax r(198);

              Thank you in advance for your help
              Last edited by Alex Klein; 23 Mar 2023, 07:56.



              • #8
                Thanks for these comments, although we are left with no explanation for the problem you posted in #3 and there is still no data example.

                Nevertheless #7 can be answered, if only partially.

                Code:
                foreach date of date {
                gen semi_annual = date(date, "DMY")
                replace date = 1 if semi_annual >= "01/01/2020" & semi_annual <= "31/06/2020"
                replace date = 2 if semi_annual >= "01/07/2020" & semi_annual <= "31/12/2020"
                }
                contains three errors that I can see.

                1. The syntax

                Code:
                foreach date of date
                matches none of the possibilities for foreach. The loop is not needed anyway, as you appear to be looping over one variable. Possibly you are bringing to Stata, from experience in some other software, the idea that a loop is needed here because it is a loop over observations. But that's Stata's default mode: to work on all observations of a variable anyway.

                2. So, let's focus on

                Code:
                gen semi_annual = date(date, "DMY")
                replace date = 1 if semi_annual >= "01/01/2020" & semi_annual <= "31/06/2020"
                replace date = 2 if semi_annual >= "01/07/2020" & semi_annual <= "31/12/2020"
                The first statement looks fine. The second statement is not at all what you want, and the error is more subtle. As written, it compares the numeric variable semi_annual with string literals and assigns a number to the string variable date, so Stata should reject it with a type mismatch. Even if both sides were strings, the inequalities would be based on alphanumeric sorting. Stata has no sense whatsoever that the strings are dates and should be interpreted as such. So, for example, the date string "01/07/2020" counts as being "less than" "31/06/2020", in the only sense that matters: if you sorted this date variable, all values starting "01" would sort before all values starting "02", and so on up to all values starting "31".
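
                A small sketch of the difference between comparing date strings and comparing daily dates. The two literal dates are arbitrary examples; "30/06/2020" is used in the second line because it is a real date.

                ```stata
                * strings are compared character by character, so "01..." sorts before "31..."
                display "01/07/2020" < "31/06/2020"

                * numeric daily dates are compared chronologically: 1 July is not before 30 June
                display daily("01/07/2020", "DMY") < daily("30/06/2020", "DMY")
                ```

                The first display shows 1 (true) and the second shows 0 (false), which is why the comparison has to be made on numeric dates.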

                I think what you seek is just

                Code:
                gen semi_annual = 1 + (daily(date, "DMY") >= mdy(7, 1, 2020)) 
                although a variable

                Code:
                gen ddate = daily(date, "DMY") 
                would likely be useful or even needed sooner or later.

                3. The third error is to refer to a date "31/06/2020". There was no such date. This error won't bite you as such, because the code is wrong anyway for the reason just explained. But a very subtle error would be if you referred to
                Code:
                daily("31/06/2020", "DMY")
                which would be returned as missing and that could lead to hard-to-spot or hard-to-trace errors.
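
                A sketch of how such a problem might be caught early, on invented toy data: convert once, then count the observations where the string is non-empty but the conversion came back missing.

                ```stata
                * Sketch only: toy data with one impossible date
                clear
                input str10 date
                "30/06/2020"
                "31/06/2020"
                "01/07/2020"
                end

                gen ddate = daily(date, "DMY")
                format ddate %td

                * non-empty strings that failed to convert deserve a look
                count if missing(ddate) & date != ""
                list if missing(ddate) & date != ""
                ```

                Here the count is 1, for the nonexistent "31/06/2020".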

                Note: I recommend the use of daily() not date() because the name date() leads all too many Stata users to imagine that it is a general date function, whereas it only yields daily dates. It's the same code underneath.

                EDIT Stata also supports half-yearly dates, so a variable

                Code:
                gen hdate = hofd(daily(date, "DMY"))
                could also be useful.
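
                Putting the pieces together, here is a minimal sketch on invented toy data. It assumes that the numbering 1, 2, 3, 4 asked for in #7 should start with the first half of 2020; yh(2020, 1) is the half-yearly date for that period, so subtracting it and adding 1 gives a running half-year number.

                ```stata
                * Sketch only: toy dates spanning three half-years
                clear
                input str10 date
                "15/02/2020"
                "01/07/2020"
                "31/12/2020"
                "10/03/2021"
                end

                gen ddate = daily(date, "DMY")
                format ddate %td

                gen hdate = hofd(ddate)
                format hdate %th

                * 1 = first half of 2020 (the starting period is an assumption)
                gen semi_annual = hdate - yh(2020, 1) + 1

                list
                ```

                With these toy dates semi_annual comes out as 1, 2, 2 and 3. If only the half within each year is wanted, halfyear(ddate) returns 1 or 2 directly.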
                Last edited by Nick Cox; 23 Mar 2023, 08:36.
