  • Drop observations whose variable value equals one contained in a vector created with putmata or in a local

    Dear All,

    I need to sample 40 observations from a dataset of 80 and then reload the full dataset to get the other 40 by dropping the ones in the first group.
    Code:
    import excel "...db.xlsx", sheet("...1") firstrow
    generate n=_n
    sample 50
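The `sample 50` line keeps 50 percent of the observations (40 of 80 here), and it draws a different 40 on every run. The sketch below is a minimal illustration that simulates an 80-observation dataset in place of the Excel import and uses an arbitrary seed value. It shows `set seed` for a reproducible draw and the `count` option of `sample` for stating the target size directly:

```stata
* Simulated stand-in for the 80-observation Excel sheet (assumption)
clear
set obs 80
generate n = _n

* Fix the seed (arbitrary value) so the same 40 observations are drawn each run
set seed 12345

* Draw exactly 40 observations instead of 50 percent of them
sample 40, count
count
```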
    Now I need to save the values of the variable n for these sampled observations, re-import the dataset, and drop the observations whose n equals one of these values.
    I thought of two ways to do this:
    1 - save the values in a Mata vector with putmata and re-import the dataset
    Code:
    putmata group1=(n)
    clear
    import excel "...db.xlsx", sheet("...") firstrow
    generate n=_n
    but then I don't know how to drop the observations whose n equals one of the values in group1.
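For approach 1, one way to finish the job is to compare each re-imported n against group1 inside Mata with `anyof()` and store the result as a flag variable. This is an untested sketch: the data are simulated with `set obs 80` instead of re-importing the Excel sheet, and the names `ids`, `flag` and `in_group1` are hypothetical. It relies on `clear` leaving Mata objects such as group1 in memory. Only `clear mata` or `clear all` remove them.

```stata
* Simulated stand-in for the data (assumption): 80 observations with id n
clear
set obs 80
generate n = _n
set seed 12345
sample 40, count

* Copy the sampled ids into the Mata column vector group1
putmata group1=(n), replace

* Rebuild the full dataset (in practice: re-import the Excel sheet);
* clear does not remove Mata objects, so group1 is still available
clear
set obs 80
generate n = _n

mata:
    ids = st_data(., "n")
    flag = J(rows(ids), 1, 0)
    // flag[i] = 1 if the i-th value of n appears anywhere in group1
    for (i = 1; i <= rows(ids); i++) flag[i] = anyof(group1, ids[i])
    st_store(., st_addvar("byte", "in_group1"), flag)
end

* Drop the first group, keeping the 40 observations that were not sampled
drop if in_group1
```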

    2 - alternatively, I thought I could store these values in a local macro by running
    Code:
    levelsof n
    after sampling, but I don't know how to store that output in a local.
    I have made many attempts without success.

    Thanks in advance for the support.
    Last edited by Valentino Santori; 21 Jul 2023, 08:35.

  • #2
    Code:
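    import excel "...db.xlsx", sheet("...1") firstrow clear
    * Check that all variables together uniquely identify each observation
    isid *
    sample 50
    * Save the sampled observations to a temporary file
    tempfile selected
    save `selected'
    import excel "...db.xlsx", sheet("...1") firstrow clear
    * Match on all variables: _merge==3 marks sampled obs, _merge==1 the rest
    merge 1:1 * using `selected'
    tab _merge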
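To end up with the other 40 observations after the merge, keep those found only in the full data (`_merge == 1`). The sketch below uses simulated data and assumes n is a unique identifier, so it serves as the merge key instead of `*`. The tempfile name `full` is hypothetical. All lines must run in one go, because the tempfile names live in local macros.

```stata
* Simulated stand-in for the Excel sheet (assumption)
clear
set obs 80
generate n = _n
tempfile full selected
save `full'

* Draw the first group and save it
set seed 12345
sample 40, count
save `selected'

* Reload the full data and match it against the first group
use `full', clear
merge 1:1 n using `selected'

* _merge==3: drawn in the first group; _merge==1: only in the full data
keep if _merge == 1
drop _merge
```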


    • #3
      Hi Andrew Musau, and thanks for your answer. The save command returns this error:
      Code:
      . save `selected'
      invalid file specification
      EDIT: My bad, I was executing one line at a time.
      Last edited by Valentino Santori; 21 Jul 2023, 09:57.


      • #4
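        You should run the code all at once from a do-file. You are probably running it in parts, and that causes the error: the local macro selected, which holds the tempfile name, exists only while the lines that define it are running, so it is empty when save is run on its own. But it should be more efficient to save the file to disk or to use frames.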

        Code:
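        import excel "...db.xlsx", sheet("...1") firstrow clear
        isid *
        * Copy all variables to a new frame named match and sample there
        frame put *, into(match)
        frame match: sample 50
        * Link on all variables; frlink names the link variable match
        frlink 1:1 *, frame(match)
        * Turn the link variable into a 0/1 indicator of being sampled
        replace match= !missing(match)
        tab match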
        Last edited by Andrew Musau; 21 Jul 2023, 09:46.
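The frames version can likewise be finished by dropping the linked (sampled) observations from the current frame. This sketch rests on the same assumptions (simulated data, arbitrary seed) and uses n as the link key. Frames require Stata 16 or newer.

```stata
* Simulated stand-in for the Excel sheet (assumption)
clear all
set obs 80
generate n = _n
set seed 12345

* Copy the data to frame match and draw the first group there
frame put *, into(match)
frame match: sample 40, count

* Link on n; by default frlink names the link variable after the frame
frlink 1:1 n, frame(match)
replace match = !missing(match)

* Keep the second group in the current frame
drop if match
```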


        • #5
          Thanks Andrew Musau, I was editing the post but you replied faster.
          May I ask whether it is possible to do what I want in the ways I was trying?


          • #6
            This is not tested.

            Code:
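            * Import, number the rows, and draw the first group
            import excel "...db.xlsx", sheet("...1") firstrow clear
            gen n=_n
            sample 50
            * Store the sampled n values as a comma-separated list in local levels
            levelsof n, local(levels) sep(,)
            
            * Re-import the full data; clear is needed because data are in memory
            import excel "...db.xlsx", sheet("...1") firstrow clear
            gen n=_n
            gen match= inlist(n, `levels')
            tab match
            * Drop the first group to keep the other 40
            drop if match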
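`inlist()` accepts a limited number of arguments (the documented maximum is 250 for numeric values). That is fine for 40 values but not for much larger samples. The sketch below is one alternative, assuming the same simulated data as a stand-in for the Excel sheet. It loops over a space-separated `levelsof` list and flags matches one value at a time:

```stata
* Simulated stand-in for the Excel sheet (assumption)
clear
set obs 80
generate n = _n
set seed 12345
sample 40, count

* Store the sampled ids as a space-separated list in local macro levels
levelsof n, local(levels)

* Rebuild the full dataset (in practice: re-import the Excel sheet)
clear
set obs 80
generate n = _n

* Flag each observation whose n is in the list, one value per pass
generate byte match = 0
foreach v of local levels {
    quietly replace match = 1 if n == `v'
}

* Keep the other group
drop if match
```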
