Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fill missing values by groups

    Hi there,

    I have read quite some posts on the topic but I don't seem to find a solution. Please consider the sample below.

    In this sample, I have one collaborator ("10065968") which is associated with two inventors (as also indicated by count_invid). In some years, the collaborator is active (case_collab!="") while neither of the inventors is (case_invid==""), in this case in 1997.


    Using expand I duplicated (by a factor count_invid) such years in order to be able to have one of these observation per inventor, tag indicates the duplicated observation in this case. Since I am running an analysis at the dyadic level inventor-collaborator, I would like to associate the activity of the collaborator in year 1997 to each of the two inventors, as in "wanted".

    In my dataset, I have up to 15 inventors per collaborator so I would need a scalable way to do so.

    Thank you so much for any suggestion you can provide,

    Best wishes,
    Giovanna Capponi



    clear
    input str10 case_invid str23(inventor wanted) float year str23 collaborator str10 case_collab float count_invid byte tag
    "" "" "10000709" 1997 "10065968" "12954" 2 1
    "" "" "10158528" 1997 "10065968" "12954" 2 0
    "25766" "10000709" "10000709" 2000 "10065968" "" 2 0
    "25763" "10000709" "10000709" 2000 "10065968" "" 2 0
    "25915" "10000709" "10000709" 2001 "10065968" "" 2 0
    "25982" "10000709" "10000709" 2001 "10065968" "" 2 0
    "29357" "10000709" "10000709" 2001 "10065968" "" 2 0
    "44819" "10000709" "10000709" 2004 "10065968" "48769" 2 0
    "44803" "10000709" "10000709" 2004 "10065968" "44803" 2 0
    "44819" "10000709" "10000709" 2004 "10065968" "44803" 2 0
    "44803" "10000709" "10000709" 2004 "10065968" "48769" 2 0
    "44819" "10000709" "10000709" 2004 "10065968" "48471" 2 0
    "44803" "10000709" "10000709" 2004 "10065968" "48770" 2 0
    "44819" "10000709" "10000709" 2004 "10065968" "48770" 2 0
    "44803" "10000709" "10000709" 2004 "10065968" "48471" 2 0
    "44803" "10000709" "10000709" 2004 "10065968" "48927" 2 0
    "44819" "10000709" "10000709" 2004 "10065968" "48927" 2 0
    "44819" "10000709" "10000709" 2004 "10065968" "44819" 2 0
    "44803" "10000709" "10000709" 2004 "10065968" "44819" 2 0
    "50217" "10000709" "10000709" 2005 "10065968" "50217" 2 0
    "50217" "10000709" "10000709" 2005 "10065968" "50213" 2 0
    "60823" "10000709" "10000709" 2007 "10065968" "62944" 2 0
    "60823" "10000709" "10000709" 2007 "10065968" "57293" 2 0
    "60823" "10000709" "10000709" 2007 "10065968" "61848" 2 0
    "67824" "10000709" "10000709" 2009 "10065968" "67824" 2 0
    "67824" "10000709" "10000709" 2009 "10065968" "68070" 2 0
    "67824" "10000709" "10000709" 2009 "10065968" "71017" 2 0
    "67824" "10000709" "10000709" 2009 "10065968" "68865" 2 0
    "67824" "10000709" "10000709" 2009 "10065968" "70374" 2 0
    "67824" "10000709" "10000709" 2009 "10065968" "71003" 2 0
    "67824" "10000709" "10000709" 2009 "10065968" "70725" 2 0
    "36266" "10158528" "10158528" 2002 "10065968" "35960" 2 0
    "36024" "10158528" "10158528" 2002 "10065968" "36030" 2 0
    "36266" "10158528" "10158528" 2002 "10065968" "36266" 2 0
    "36266" "10158528" "10158528" 2002 "10065968" "36030" 2 0
    "36024" "10158528" "10158528" 2002 "10065968" "35960" 2 0
    "36024" "10158528" "10158528" 2002 "10065968" "36266" 2 0
    "36472" "10158528" "10158528" 2003 "10065968" "36472" 2 0
    "36471" "10158528" "10158528" 2003 "10065968" "36472" 2 0
    "36472" "10158528" "10158528" 2003 "10065968" "36471" 2 0
    "36662" "10158528" "10158528" 2003 "10065968" "36662" 2 0
    "36471" "10158528" "10158528" 2003 "10065968" "36411" 2 0
    "36662" "10158528" "10158528" 2003 "10065968" "36411" 2 0
    "36662" "10158528" "10158528" 2003 "10065968" "36471" 2 0
    "36662" "10158528" "10158528" 2003 "10065968" "36472" 2 0
    "36472" "10158528" "10158528" 2003 "10065968" "36411" 2 0
    "36622" "10158528" "10158528" 2003 "10065968" "36413" 2 0
    "36347" "10158528" "10158528" 2003 "10065968" "36472" 2 0
    "36347" "10158528" "10158528" 2003 "10065968" "36471" 2 0
    "36413" "10158528" "10158528" 2003 "10065968" "36413" 2 0
    "36413" "10158528" "10158528" 2003 "10065968" "36472" 2 0
    "36472" "10158528" "10158528" 2003 "10065968" "36662" 2 0
    "36622" "10158528" "10158528" 2003 "10065968" "36347" 2 0
    "36472" "10158528" "10158528" 2003 "10065968" "36347" 2 0
    "36471" "10158528" "10158528" 2003 "10065968" "36347" 2 0
    "36471" "10158528" "10158528" 2003 "10065968" "36413" 2 0
    "36411" "10158528" "10158528" 2003 "10065968" "36662" 2 0
    "36347" "10158528" "10158528" 2003 "10065968" "36662" 2 0
    "36662" "10158528" "10158528" 2003 "10065968" "36347" 2 0
    "36411" "10158528" "10158528" 2003 "10065968" "36413" 2 0
    "36413" "10158528" "10158528" 2003 "10065968" "36347" 2 0
    "36347" "10158528" "10158528" 2003 "10065968" "36411" 2 0
    "36413" "10158528" "10158528" 2003 "10065968" "36471" 2 0
    "36411" "10158528" "10158528" 2003 "10065968" "36471" 2 0
    "36471" "10158528" "10158528" 2003 "10065968" "36662" 2 0
    "36471" "10158528" "10158528" 2003 "10065968" "36471" 2 0

  • #2
    You can try fillmissing (from SSC), which works with groups and without groups. For this specific problem, I am not clear which variables do you want to fill and with what values, neither the grouping variables are clear to me.

    Code:
    ssc install fillmissing
    You can find more on fillmissing here.
    https://fintechprofessor.com/2019/12...lues-in-stata/
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

    Comment


    • #3
      Thank you for your reply! Actually I think it is more of a "sorting" problem. I want to fill missing values of "inventor" with different values of the same variable. The grouping variable is "collaborator". I hope this clarifies.
      Best,
      Giovanna

      Comment


      • #4
        To understand more clearly what you want to achieve, can you please use the following code and tell us whether it does what you want or there was something else you wanted.
        Code:
        bys collaborator: fillmissing inventor
        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

        Comment


        • #5
          Dear Attaullah,

          Thanks for your help, I tried the code but what it does (that I don't what) is to replace the missing values with the closest inventor ID. Back to my sample above, I want the two missing values to be filled with the two inventors ID associated with "collaborator", meaning "10000709" and "10158528" as in variable wanted. With fillmissing the missing values are both replaced by "10000709", which when the variable is sorted is the closest to the missing cases. I hope this clarifies! Thanks!

          Giovanna

          Comment


          • #6
            Real data are fine and we shoot if people don't give concrete examples, but on this occasion the example is rather scary-looking. I think you'd get faster answers with a minimal example showing what you have for much simplified identifiers say 1 2 3 and years say 1997 1998 and then what you want. Are all the variables needed to understand the problem?

            Comment

            Working...
            X