Filling in missing values: how to implement the logic?

Fred Lee

Join Date: Nov 2017
Posts: 473


22 Mar 2019, 21:45

Take the example below. If stress is missing for an observation (say observation 1), I want to check whether that observation's projName appears anywhere in the firmName variable. If it does (say in observation 2), the missing stress should be replaced with the stress value of observation 2. (I am not sure I have explained this clearly.)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str102 firmName str123 projName byte(firmAge stress)
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"基于人脸识别技术的企业智能管理集成系统" ""                                                          . 3
end

The original data: [image: 2.png]

Here is what I want: [image: 3.png]

Andrew Musau

Join Date: Oct 2014
Posts: 10214

23 Mar 2019, 05:07

Thanks for the data example. The following assumes that every group is either entirely missing or entirely non-missing on stress. Your data example satisfies this condition; if your real data do not, you must address that before running the code. Note also that if you have two consecutive groups of missing values, the second cannot be filled, because the group before it is itself missing.
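The all-or-nothing condition described here can be checked before going further. The following is only a sketch, assuming the data example is already in memory; the variable names n_miss, n_obs and mixed are made up for illustration.

```stata
* Count missing stress values and observations in each firm-project group
bysort firmName projName: egen n_miss = total(missing(stress))
bysort firmName projName: gen n_obs = _N

* A group is "mixed" if some, but not all, of its stress values are missing
gen byte mixed = n_miss > 0 & n_miss < n_obs

* Reports 0 when every group is all missing or all non-missing
count if mixed
list firmName projName stress if mixed

* Remove the helper variables again
drop n_miss n_obs mixed
```

A non-zero count means the mixed groups need attention before the fill code is run.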

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str102 firmName str123 projName byte(firmAge stress)
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"三角兽（北京）科技有限公司"                   "三角兽人工智能语义交互系统"                   1 .
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"世优（北京）科技有限公司"                      "虚拟角色动画实时制作平台"                      2 3
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"北京越视科技有限公司"                            "基于人脸识别技术的企业智能管理集成系统" 2 .
"基于人脸识别技术的企业智能管理集成系统" ""                                                          . 3
end

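* Flag the first observation of each firm-project group whose stress is missing
bys firmName projName: gen tag=1 if _n==1 & stress==.
* Number the flagged observations 1, 2, ... in data order
replace tag=sum(tag) if !missing(tag)
* Copy each number to the observation just before it (the end of the preceding group)
replace tag=tag[_n+1] if missing(tag)
* Within each pair, the non-missing value sorts first: copy it to the flagged observation
bys tag (stress): replace stress= stress[_n-1] if missing(stress) & !missing(tag)
* Spread the filled value to the rest of its firm-project group
bys firmName projName (stress): replace stress= stress[1] if missing(stress)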

Result:

Code:

. l firmName firmAge stress tag , sepby( firmName )

     +-----------------------------------------------------------------+
     |                               firmName   firmAge   stress   tag |
     |-----------------------------------------------------------------|
  1. |             三角兽（北京）科技有限公司         1        .     . |
  2. |             三角兽（北京）科技有限公司         1        .     . |
  3. |             三角兽（北京）科技有限公司         1        .     . |
  4. |             三角兽（北京）科技有限公司         1        .     . |
  5. |             三角兽（北京）科技有限公司         1        .     1 |
     |-----------------------------------------------------------------|
  6. |               世优（北京）科技有限公司         2        3     . |
  7. |               世优（北京）科技有限公司         2        3     2 |
  8. |               世优（北京）科技有限公司         2        3     . |
  9. |               世优（北京）科技有限公司         2        3     . |
 10. |               世优（北京）科技有限公司         2        3     . |
     |-----------------------------------------------------------------|
 11. |                   北京越视科技有限公司         2        3     2 |
 12. |                   北京越视科技有限公司         2        3     . |
 13. |                   北京越视科技有限公司         2        3     . |
 14. |                   北京越视科技有限公司         2        3     . |
 15. |                   北京越视科技有限公司         2        3     . |
     |-----------------------------------------------------------------|
 16. | 基于人脸识别技术的企业智能管理集成系统         .        3     . |
     +-----------------------------------------------------------------+
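The code above fills an all-missing group from the group that sorts immediately before it, which gives the requested values in this example. If the rule should instead be a literal lookup (find the observation whose firmName equals this observation's projName and take its stress), one possible sketch uses merge. It assumes the data example has just been loaded and that each firmName carries a single non-missing stress value; stress_lookup and the tempfile name lookup are illustrative names only.

```stata
* Build a lookup table holding one stress value per firmName
preserve
keep if !missing(stress) & firmName != ""
keep firmName stress
duplicates drop
* Stops with an error if a firmName has more than one stress value
isid firmName
* Rename so that the key lines up with projName in the main data
rename (firmName stress) (projName stress_lookup)
tempfile lookup
save `lookup'
restore

* Attach the looked-up value wherever projName matches a firmName
merge m:1 projName using `lookup', keep(master match) nogenerate

* Fill only the observations whose stress is missing
replace stress = stress_lookup if missing(stress)
drop stress_lookup
```

Observations whose projName never appears as a firmName, such as the first group in the example, are left missing under this approach.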


Fred Lee

Join Date: Nov 2017
Posts: 473

23 Mar 2019, 05:53

Thanks, Andrew. Your code really helps a lot. In groups where I have missing values, all values are missing. It would be even better if the code also worked for two consecutive groups of missing values.
Thanks again!


Andrew Musau

Join Date: Oct 2014
Posts: 10214

23 Mar 2019, 08:02

Just iterate the code. For example, with a maximum of 5 consecutive groups of missing values:

Code:

* Remove any tag variable left over from the earlier run
capture drop tag
forv i=1/5 {
    di in red "This is iteration # `i'"
    bys firmName projName: gen tag=1 if _n==1 & stress==.
    replace tag=sum(tag) if !missing(tag)
    replace tag=tag[_n+1] if missing(tag)
    bys tag (stress): replace stress= stress[_n-1] if missing(stress) & !missing(tag)
    bys firmName projName (stress): replace stress= stress[1] if missing(stress)
    drop tag
}
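
If the maximum number of consecutive missing groups is not known in advance, one alternative is to repeat the same steps until a pass fills nothing more. This is only a sketch of that idea; the locals before and after are illustrative names, and the loop body is the code from this thread unchanged.

```stata
* Start clean in case tag is left over from an earlier run
capture drop tag

* Track how many stress values are missing before and after each pass
quietly count if missing(stress)
local after = r(N)
local before = `after' + 1

* Repeat while the previous pass reduced the number of missing values
while `after' < `before' {
    local before = `after'
    bys firmName projName: gen tag=1 if _n==1 & stress==.
    replace tag=sum(tag) if !missing(tag)
    replace tag=tag[_n+1] if missing(tag)
    bys tag (stress): replace stress= stress[_n-1] if missing(stress) & !missing(tag)
    bys firmName projName (stress): replace stress= stress[1] if missing(stress)
    drop tag
    quietly count if missing(stress)
    local after = r(N)
}
```

The loop stops as soon as a pass changes nothing, so a missing group at the very top of the sort order, which has no preceding group to copy from, stays missing rather than causing an endless loop.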


Fred Lee

Join Date: Nov 2017
Posts: 473

23 Mar 2019, 08:03

Oh, yes! Thanks a lot, Andrew!
