
  • Rounding off or changing data type for household id

    Hello
    This is a conceptual question. I have a dataset of about 6000 household ids, some of which are in the form 10.1 or 10.2. I had to drop the observations with decimal parts above .1 and convert the remaining ids into integers (say, 10.1 to 10).
    In order to append the data with same dataset from another year, I need the household ids in integers and not decimals. I tried rounding off 10.1 to 10 with
    replace hhid = round(hhid, 0.0)
    (cue taken from a previous Statalist query). However, this did not work. I then changed the data type of household id from double to int, which serves my purpose, but I am not sure how right that is. The household ids don't go beyond 6000, so I don't think I am losing precision here. I have read Stata's help on data_types, but my doubt remains. Please correct me where I am wrong. Thanks, Smriti

  • #2
    Originally posted by Saini Smriti
    In order to append the data with same dataset from another year, I need the household ids in integers and not decimals.
    That's certainly not a requirement from Stata.

    append will automatically promote the datatype from integer to floating point if needed, and there is no issue the other way for the range you have. See below.

    . version 15.1

    . clear *

    . quietly set obs 2

    . generate float idn = 10.1

    . quietly replace idn = 6000 in l

    . tempfile tmpfil0

    . quietly save `tmpfil0'

    . quietly replace idn = floor(idn)

    . compress
      variable idn was float now int
      (4 bytes saved)

    . tempfile tmpfil1

    . quietly save `tmpfil1'

    . *
    . * Begin here
    . *
    . // Appending floating point ID dataset into integer ID dataset
    . append using `tmpfil0'
    (note: variable idn was int, now float to accommodate using data's values)

    . list, noobs

      +------+
      |  idn |
      |------|
      |   10 |
      | 6000 |
      | 10.1 |
      | 6000 |
      +------+

    . // The reverse: integer ID dataset appended to floating point ID dataset
    . use `tmpfil0', clear

    . append using `tmpfil1'

    . list, noobs

      +------+
      |  idn |
      |------|
      | 10.1 |
      | 6000 |
      |   10 |
      | 6000 |
      +------+

    . exit

    end of do-file


    Anyway, you commanded the round() function to round to one decimal place, and not to an integer; if that's what you mean by "this did not work", then try using floor() as above.

    As an uninvited aside, I recommend keeping IDs as strings. You're not going to be doing any arithmetic on them, so the only loss is with those multilevel / hierarchical estimation commands (e.g., -xtset-, -anova-) that require numeric variables for the identifier. For those, you can -encode- the strings.
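
    A minimal sketch of the string-then-encode route, using made-up IDs. The variable names hhid_str and hhid_num are hypothetical and not from the original data. -encode- assigns each distinct string an integer code and keeps the original text as a value label.

    ```stata
    clear
    input str6 hhid_str
    "10.1"
    "10.2"
    "6000"
    end

    * integer codes, with the original strings attached as value labels
    encode hhid_str, generate(hhid_num)

    * nolabel shows the underlying integer codes instead of the labels
    list, noobs nolabel
    ```

    The codes are arbitrary labels for the strings, not the original numbers, so this suits commands that only need a numeric identifier. It is probably not what you want for matching ids by value across separately encoded files.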



    • #3
      round(whatever, 0) should mean rounding to the nearest multiple of zero, except that it is interpreted (in practice if not in principle) as no rounding at all. round(whatever, 1) should mean rounding to the nearest integer. But then 10.1 and 10.2 would both get rounded to 10; is that really what you want?
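
      To see the difference between the two calls, here is a small do-file sketch on made-up values (the variable names r0 and r1 are only for illustration):

      ```stata
      clear
      input double hhid
      10.1
      10.2
      10.6
      end

      generate double r0 = round(hhid, 0)   // second argument 0: hhid comes back unmodified
      generate double r1 = round(hhid, 1)   // rounds to the nearest integer
      list, noobs
      ```

      Here r0 simply repeats hhid, while r1 is 10, 10 and 11.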

      I don't know how non-integer identifiers arise, but they will prove problematic for some Stata purposes. Although Joseph Coveney makes a good case for string identifiers, there are contexts where Stata insists on integers.



      • #4
        Originally posted by Nick Cox
        round(whatever, 0) . . . is interpreted (in practice if not in principle) as no rounding at all.
        Whoops! Missed that. From the helpfile: "For y = 0, the function is defined as returning x unmodified." Sorry about the misstep above.



        • #5
          Thanks for the help.

          Joseph, when I wrote that I need it to append the dataset, I meant that I needed it to make a comparison between the same ids (one of which has become 10.1 in the later round). I understand that appending would not require any such rounding off of the id. I am trying to wrap my head around the floor command and how I can use it. Thanks for the introduction to the command.

          Nick, thanks for the input on rounding. I was not able to understand how to use the command correctly. The hhid was in decimals because the households split between the two rounds (hence, 10.1 and 10.2). I only need the main household (10.1) for my analysis. Hence, I need to round it off to 10 to facilitate comparison between the two rounds. The command has served my purpose.
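
          For the record, the steps described in this thread (keep only the main household, then turn its id into an integer) could be sketched as below. This assumes split households are coded with a single decimal digit (.1, .2, ...). The data are made up and the variable name part is hypothetical.

          ```stata
          clear
          input double hhid
          9
          10.1
          10.2
          11
          end

          * first decimal digit, rounded to guard against floating-point noise
          generate byte part = round((hhid - floor(hhid)) * 10)

          * keep unsplit households (part 0) and main households (part 1)
          drop if part > 1

          * strip the decimal part, then let Stata pick the smallest safe type
          replace hhid = floor(hhid)
          compress hhid

          * check that the resulting ids are unique
          isid hhid
          list, noobs
          ```

          Using floor() followed by -compress- means the storage type is only changed once the values really are integers.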

          Thanks
          Smriti



          • #6
            OK, but round() is a function, not a command. The difference is a bit more than just terminology. See e.g. https://www.stata-journal.com/articl...article=dm0058

            Also, if you had household 10.6 and so forth, you would need floor(), not round(), as Joseph suggests.
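
            A quick check of that point, as a do-file sketch:

            ```stata
            display floor(10.6)                  // 10
            display round(10.6)                  // 11, since round(x) is round(x, 1)
            display floor(10.1) == round(10.1)   // 1 (true): both give 10
            ```

            The two functions agree only while the decimal part stays below .5.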

