  • Need help to add observations for each category.

    Hi Everyone,

    I am trying to find the total of the variables in stata but keep getting error message "that one of the variables is a string variable". Below is the dataset, which shows the programs in rows and the variables local, state and local in columns. There are over 200 observations, which are grouped like the way i have presented it here, but just selected two for this post.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str26 program str2 ParticipantsPublic str3 local str1 state str2 national
    ""                           ""   ""    ""  ""  
    ""                           ""   ""    ""  ""  
    ""                           ""   ""    ""  ""  
    "Coverage"                   ""   ""    ""  ""  
    "Screening"                  ""   ""    ""  ""  
    "Lead"                       "0"  "2"   "0" "0" 
    "Breastfeeding"              ""   ""    ""  ""  
    "Family"                     ""   ""    ""  ""  
    " Home"                      ""   ""    ""  ""  
    "Transition"                 ""   ""    ""  ""  
    "Abroad"                     "2"  "12"  "3" "23"
    "Evaluation"                 "0"  "3"   "0" "0" 
    "Developmental Screening"    ""   ""    ""  ""  
    "Genetics"                   "0"  "0"   "0" "0" 
    "Health Equity"              "7"  "6"   "5" "11"
    "Prevention"                 ""   ""    ""  ""  
    "Screening"                  ""   ""    ""  ""  
    "Newborn Screening"          ""   ""    ""  ""  
    "Nutrition"                  "0"  "0"   "0" "0" 
    "Health"                     ""   ""    ""  ""  
    "Perinatal/ Postpartum Care" ""   ""    ""  ""  
    "Prenatal Care"              ""   ""    ""  ""  
    "Child"                      ""   ""    ""  ""  
    "Sleep"                      ""   ""    ""  ""  
    "eCigarette Use"             ""   ""    ""  ""  
    "Visit"                      ""   ""    ""  ""  
    ""                           ""   ""    ""  ""  
    ""                           ""   ""    ""  ""  
    ""                           ""   ""    ""  ""  
    ""                           ""   ""    ""  ""  
    "Coverage"                   ""   ""    ""  ""  
    "Screening"                  ""   ""    ""  ""  
    "Lead"                       "0"  "20"  "0" "1" 
    "Breastfeeding"              ""   ""    ""  ""  
    "Family"                     ""   ""    ""  ""  
    " Home"                      ""   ""    ""  ""  
    "Transition"                 "0"  "5"   "0" "1" 
    "Abroad"                     "2"  "70"  "2" "2" 
    "Evaluation"                 "6"  "100" "0" "6" 
    "Developmental Screening"    ""   ""    ""  ""  
    "Genetics"                   "0"  "10"  "0" "0" 
    "Health Equity"              "0"  "10"  "0" "0" 
    "Prevention"                 ""   ""    ""  ""  
    "Screening"                  ""   ""    ""  ""  
    "Newborn Screening"          ""   ""    ""  ""  
    "Nutrition"                  "10" "3"   "0" "0" 
    "Health"                     ""   ""    ""  ""  
    "Perinatal/ Postpartum Care" ""   ""    ""  ""  
    "Prenatal Care"              ""   ""    ""  ""  
    "Child"                      ""   ""    ""  ""  
    "Sleep"                      ""   ""    ""  ""  
    "eCigarette Use"             ""   ""    ""  ""  
    "Visit"                      ""   ""    ""  ""  
    ""                           ""   ""    ""  ""  
    end

    I want to add them up and present the output in the format below. Any ideas on how to code this in Stata?
    program Participants/ Public local state national
    Coverage
    Screening
    Lead
    Breastfeeding
    Family
    Home
    Transition
    Abroad
    Evaluation
    Developmental Screening
    Genetics
    Health Equity
    Prevention
    Screening
    Newborn Screening
    Nutrition
    Health
    Perinatal/ Postpartum Care
    Prenatal Care
    Child
    Sleep
    eCigarette Use
    Visit

  • #2
    All of the variables ParticipantsPublic through national are string variables. You can see that by running -des ParticipantsPublic-national-: in the Storage Type column of the output, those variables have str storage types. So they have to be converted to numeric before you can do any calculations with them. The data look to me like they were imported into Stata from a spreadsheet, and the import for some reason brought those variables in as text instead of numbers.

    Code:
    destring ParticipantsPublic-national, replace
    collapse (sum) ParticipantsPublic-national, by(program)
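
    A caveat about -by(program)-: in the example data "Screening" appears twice within each group, " Home" has a leading space, and blank rows separate the groups, so collapsing on program alone would merge the two "Screening" rows and sort the results alphabetically. The following is only a sketch of one way to keep the original row order. It assumes that every group lists the same programs in the same order and that groups are separated by rows with an empty program; obs, block and pos are hypothetical helper names, and the toy data are a subset of the rows posted in #1.

    ```stata
    clear
    input str26 program str2 ParticipantsPublic str3 local str1 state str2 national
    ""          ""  ""   ""  ""
    "Coverage"  ""  ""   ""  ""
    "Screening" ""  ""   ""  ""
    "Lead"      "0" "2"  "0" "0"
    " Home"     ""  ""   ""  ""
    "Abroad"    "2" "12" "3" "23"
    "Screening" ""  ""   ""  ""
    ""          ""  ""   ""  ""
    "Coverage"  ""  ""   ""  ""
    "Screening" ""  ""   ""  ""
    "Lead"      "0" "20" "0" "1"
    " Home"     ""  ""   ""  ""
    "Abroad"    "2" "70" "2" "2"
    "Screening" ""  ""   ""  ""
    end

    * convert the count variables from string to numeric
    destring ParticipantsPublic-national, replace

    * remove stray leading/trailing blanks (use trim() in older Stata versions)
    replace program = strtrim(program)

    * remember the original order and number the groups:
    * block increases by one at every blank separator row
    gen long obs = _n
    gen long block = sum(program == "")
    drop if program == ""

    * position of each program within its group
    bysort block (obs): gen long pos = _n

    * total over groups, one row per position, in the original order
    * note: -collapse (sum)- returns 0, not missing, when every value is missing
    collapse (sum) ParticipantsPublic-national, by(pos program)
    sort pos
    list, noobs
    ```

    If the groups in the full data set do not all have the same layout, pos will not line up across groups, and a different grouping variable will be needed.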
    Now, it is possible that in the full data set there are observations where those variables contain something other than an empty string ("") or the text representation of a number. I'm thinking of things like "N/A" or "--". In fact, the very fact that the import process left those variables as strings suggests that this has happened somewhere. If it has, this code will not work: -destring- will refuse to convert any variable that is "contaminated" in this way, and then -collapse- will again tell you that there is a type mismatch. If that happens, you have to find the offending observations and deal with them in some way. To find them you can do something like this:
    Code:
    gen bad = 0
    foreach v of varlist ParticipantsPublic-national {
        replace bad = 1 if missing(real(`v')) & !missing(`v')
    }
    browse if bad
    Then you can examine these and figure out how to correct or eliminate them.
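
    As a rough illustration of one way to deal with them, the sketch below tabulates the offending values and then blanks them out before converting. It assumes that the non-numeric entries really do mean "no data"; the "N/A" and "--" values in the toy data are hypothetical and do not come from the posted example.

    ```stata
    clear
    input str26 program str3 local
    "Lead"   "2"
    "Abroad" "N/A"
    "Family" "--"
    "Health" ""
    end

    * flag values that are non-empty but cannot be read as a number
    gen byte bad = missing(real(local)) & !missing(local)

    * see which distinct values are causing the trouble
    tabulate local if bad

    * assumption: these entries mean "no data", so set them to empty
    replace local = "" if bad

    * the variable now contains only numerals or empty strings
    destring local, replace
    drop bad
    ```

    If some of the flagged values turn out to be typos for real numbers, correct them individually instead of blanking them.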



    • #3
      Thanks Clyde
