Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Missing Data - Replace Variable Command Issue - Using ds and foreach commands

    I am using Stata version 15.1 and I have 47 variables and 1,010 observations in my dataset. I am doing educational research with 112 colleges included. I have values that are missing and I want to do a mean replacement of some of these values by the institution. I do not want to just take the general mean and apply it since the mean per institution would be best suited to help with the missing data. I acknowledge the shortcomings of mean replacement, but I am looking at how this might change my outcomes with it included.


    What I have researched - I have read two very helpful articles to address this and I feel certain my problem should be solved, but yet I am having technical issues. 1. https://www.statalist.org/forums/for...mean-or-median 2. https://www.stata.com/statalist/arch.../msg00062.html. The first article provided a code solution and the second article helped me to look at how to do a mean replacement based on insitution type.

    My code (I tried to model it on article one and the code utilized there with the addition of "by unitid" which I understood to be supporting my approach by producing a mean replacement for missing values based on the mean of the insitutions values and not just the general mean)

    Nick Cox recommended using the ds command instead of dropping string variables so I utilized this and then I tried to merge it with the code recommended in the first post by Joro Kolev.

    . ds, has (type numeric)
    . foreach var of varlist * {
    2. summ `var'
    3. replace `var' =r(mean), by unitid if missing (`var')
    4. }

    . ds, has (type numeric)
    unitid total_~75000 total_stu_~v tuition_fe~d act_~75_pctl exp_aux_en~l net_revenue number_adm~d ne~e_missing act_compos.. number_app~g
    year net_pr~75000 fips sa~d_25_pctl rev_total_~t exp_total_~t merge_admi~e number_enr~l tuition_fe~g act_compos.. number_adm~g
    total_~30000 total~110000 type_of_aid sa~d_75_pctl exp_instru~l endowment~eg composite_~x merge_admi~t sat_crit_r.. exp_acad_s~g number_enr~g
    net_pr~30000 net_p~110000 avg_tuitio~t sat_math_2~l exp_acad_s~l assets_net merge_fina~i net_price_.. sat_crit_r.. endowment~ng
    total_~48000 total_stu_~e tuition_pu~d sat_math_7~l exp_studen~l cpi sex net_price_.. sat_math_2~g composite_~s
    net_pr~48000 net_price_~e fees_publi~d act_~25_pctl exp_inst_s~l hepi number_app~d net_price_.. sat_math_7~g sex_missing

    . foreach var of varlist * {
    foreach command may not result from a macro expansion interactively or in do files
    r(198);

    I read in the article below that there may be an issue pasting from word. I did not do that and instead took my code directly from post #1 so I am hoping I haven't overlooked something quite obvious.
    --https://www.statalist.org/forums/forum/general-stata-discussion/general/1423435-foreach-in-do-edit

    This is the mdesc for the first few variables I have:

    . mdesc

    Variable | Missing Total Percent Missing
    ----------------+-----------------------------------------------
    unitid | 0 1,010 0.00
    year | 0 1,010 0.00
    inst_name_~a | 14 1,010 1.39
    opeid | 14 1,010 1.39
    total_~30000 | 2 1,010 0.20
    net_pr~30000 | 2 1,010 0.20
    total_~48000 | 2 1,010 0.20
    net_pr~48000 | 3 1,010 0.30
    total_~75000 | 2 1,010 0.20
    net_pr~75000 | 5 1,010 0.50
    total~110000 | 2 1,010 0.20
    net_p~110000 | 12 1,010 1.19
    total_stu_~e | 2 1,010 0.20
    net_price_~e | 28 1,010 2.77

    ------sorry the pasting did not come through very well. I tried a table insert and Excel, but it didn't make it work any better.

    -->Can I provide anything else to help reach a solution? I tried my best to be thorough and to read the posting guidelines before sending across my first question, but I would be very happy to update or expand any information to reach a resolution. I really appreciate any help anyone might offer.

  • #2
    Your code contains several errors. Try this:
    Code:
    ds, has (type numeric)
    foreach var of varlist `r(varlist)' {
        by unitid, sort: egen temp = mean(`var')
        replace `var' = temp if missing(`var')
        drop temp
    }
    When you wrote -foreach var of varlist * {-, you ignored the output from -ds-, which is saved in r(varlist). Instead you told Stata to loop over *, which means all variables in the data set.

    Within the loop, the summ `var' command would do a single summarization of `var' without regard to unitid. The r(mean) picked up in the subsequent -replace- command would be the grand mean, again without regard to unitid. Then again, the -by unitid- located after the comma in that -replace- command is a syntax error, so nothing would have run anyway.

    Comment


    • #3
      . foreach var of varlist * {
      foreach command may not result from a macro expansion interactively or in do files
      r(198);

      I read in the article below that there may be an issue pasting from word. I did not do that and instead took my code directly from post #1 so I am hoping I haven't overlooked something quite obvious.
      It is not obvious to me what has produced that error message here. The post to which you linked about pasting from Word dealt with the issue of "smart quotes." But there are no quotes, smart or otherwise, in what you wrote. So that isn't the problem. The most likely cause is that there are non-printing "control characters" in what you copy/pasted from this Forum. It doesn't happen all that often, but it does happen. These are characters that this web site uses (many web sites use them) to control the display of information on the computer screen . They do not produce anything visible on the screen--but Stata "sees" them and they mislead its parser into thinking something other than what you intended was typed. The solution is to delete the command entirely (including the end of line sequence) and then just retype it by hand. Typing from the keyboard will not introduce non-printing characters unless you specifically go out of your way to introduce them using your numeric keypad.

      ------sorry the pasting did not come through very well. I tried a table insert and Excel, but it didn't make it work any better.
      The way to show Stata code and results nicely aligned and easily readable is with code delimiters. When you are typing into a post, just above your edit window there should be a toolbar that includes, among other things a # button. Click on that and it will create a pair of code delimiters. Paste the code or Stata results directly between those. If you do not see a toolbar that contains a # button, you will nevertheless see a shorter toolbar offset to the right that contains an Upload Attachments button, an A button (underlined and in blue) and a smiley face button. Click on the A button and the toolbar that contains (among man other things) the # button will appear.
      Last edited by Clyde Schechter; 01 Sep 2022, 19:51.

      Comment


      • #4
        Clyde,

        I am very sorry for my delay in posting back. I was able to try the code you helped me correct and bottom line, you are a lifesaver. This code works well. One factor that I appreciate is that when it looks for the mean for each institution (Using the variable unitid) to do a mean replacement if blank (for numeric variables) it skips those where there is no data. This is especially important since my data looks at ACT score data and some institutions do not require an ACT for admissions and do not have any data at all so this data being missing and not filled with any mean data is the most appropriate outcome.

        Thanks for the tip about inputting the code. This was a real rookie move, but with your help, I won't have this issue moving forward.

        Using your approach this is how it should have been posted initially

        Code:
        . ds, has (type numeric)
        . foreach var of varlist * {
        2. summ `var'
        3. replace `var' =r(mean), by unitid if missing (`var')
        4. }
        This was the successful code you sent my way (When I use the Code command it always saves with it put on one line instead of laid out like you did below. My code looks fine and is laid out like yours, but when it saves it transposes it to one line. I apologize for not being able to format this appropriately. If you let me know the issue with how I pasted it in here I can fix it for the future.):

        Code:
         ds, has (type numeric) foreach var of varlist `r(varlist)' {     by unitid, sort: egen temp = mean(`var')     replace `var' = temp if missing(`var')     drop temp }
        Thank you again, Clyde, for your assistance with this issue.
        Last edited by Rocky Christensen; 07 Sep 2022, 18:32.

        Comment

        Working...
        X