  • Dealing with a variable that has both numeric and categorical observations

    Hi everyone. I am new to this site so please bear with me.

    I am running a regression of property value against several house characteristics such as the number of bedrooms, bathrooms etc. One of the variables is 'propertysize'.

    When collecting data for 'propertysize', some observations had a numerical value available in metres squared such as '100' whilst some observations had no number available so I had to estimate whether the property was 'small', 'medium', or 'large'. I collected all observations under the same variable 'propertysize'.

    I want to run several regressions.

    a) a regression where 'propertysize' only includes the numbers, thus removing my estimates
    b) a regression where 'propertysize' only includes the original 'small', 'medium' and 'large' estimates I made, thus removing the numerical data
    c) a regression where the numerical values are transformed to one of the size categories 'small', 'medium' and 'large'. So between the size of 0-65 metres squared a property is classed as 'small', 65-100 is 'medium', and >100 is 'large'. These transformed numbers would then be in the same format as my estimates and a regression would be run on the categories.

    I have been successful in part a).

    However, I cannot find a method to work b) or c).




  • #2
    I should add that my attempt for b) is as follows:

    clonevar propertysize2 = propertysize
    clonevar propertysize3 = propertysize2 if propertysize2== "SMALL"
    clonevar propertysize4 = propertysize2 if propertysize2== "MEDIUM"
    clonevar propertysize5 = propertysize2 if propertysize2== "LARGE"

    I was then trying to stack propertysize3 propertysize4 and propertysize5 but doing so deleted my dataset.



    • #3
      Data example please: see https://www.statalist.org/forums/help#stata



      At a guess

      Code:
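      * copy the size estimates that are already categories
      gen wanted = propertysize if inlist(propertysize, "SMALL", "MEDIUM", "LARGE")
      * real() returns missing for non-numeric strings, so those are left alone
      replace wanted = "SMALL" if real(propertysize) < 65
      replace wanted = "MEDIUM" if inrange(real(propertysize), 65, 100)
      * this runs last, so a size of exactly 100 ends up LARGE
      replace wanted = "LARGE" if inrange(real(propertysize), 100, .)
      
      * what didn't work? this lists nothing if every non-empty value was classified
      tab wanted if !inlist(wanted, "SMALL", "MEDIUM", "LARGE")
      
      * encode to a numeric variable with the categories in size order
      label def wanted 1 SMALL 2 MEDIUM 3 LARGE
      encode wanted, gen(desired) label(wanted)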
      Your definitions 0-65 and 65- leave ambiguity about which way 65 jumps. One possibility above.
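
      A possible next step, sketched on the assumption that the code above has run and created desired, and that the outcome variable is named soldprice as in the data example in #4. The choice of SMALL as the base category is just Stata's default for the lowest code, not something settled in this thread.

      Code:
      * distribution of the combined categories, including unclassified observations
      tab desired, missing
      
      * factor-variable notation: SMALL (coded 1) is the base category
      regress soldprice i.desired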
      Last edited by Nick Cox; 14 Mar 2019, 07:04.



      • #4
        Below is an example of my data

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long soldprice str7 postcode str28 buildingnameornumber str25 street str17 propertysize
        175000 "SS0 0AA" "202"                    "SOUTHBOURNE GROVE"  ""                 
        357500 "SS0 0AB" "3"                      "MERILIES GARDENS"   ""                 
        225000 "SS0 0AF" "256"                    "SOUTHBOURNE GROVE"  ""                 
        243000 "SS0 0AQ" "334"                    "SOUTHBOURNE GROVE"  ""                 
        385000 "SS0 0BD" "5"                      "KENILWORTH GARDENS" ""                 
        340000 "SS0 0BG" "25"                     "MANNERING GARDENS"  "198"              
        490000 "SS0 0BJ" "30"                     "ARUNDEL GARDENS"    ""                 
        320000 "SS0 0BL" "5"                      "ARUNDEL GARDENS"    ""                 
        400000 "SS0 0BT" "124"                    "KENILWORTH GARDENS" ""                 
        249000 "SS0 0DL" "74"                     "SOMERSET AVENUE"    ""                 
        329995 "SS0 0DR" "49"                     "WINSFORD GARDENS"   ""                 
        165000 "SS0 0DS" "170"                    "BRIDGWATER DRIVE"   "MEDIUM"           
        330000 "SS0 0DW" "23"                     "SOMERSET AVENUE"    ""                 
        210000 "SS0 0DY" "8"                      "BRAMPTON CLOSE"     ""                 
         70000 "SS0 0DZ" "3"                      "EXFORD AVENUE"      ""                 
        250000 "SS0 0EB" "20"                     "LANGPORT DRIVE"     ""                 
        180000 "SS0 0EB" "26"                     "LANGPORT DRIVE"     "74.09999999999999"
        220000 "SS0 0ED" "9"                      "SOMERTON AVENUE"    ""                 
        290000 "SS0 0EF" "20"                     "EXFORD AVENUE"      ""                 
        227000 "SS0 0EW" "500"                    "PRINCE AVENUE"      ""                 
        153000 "SS0 0EX" "634"                    "PRINCE AVENUE"      "66.2"             
        188000 "SS0 0HA" "293"                    "BRIDGWATER DRIVE"   ""                 
        174995 "SS0 0HA" "307"                    "BRIDGWATER DRIVE"   "70.91"            
        175000 "SS0 0HA" "317"                    "BRIDGWATER DRIVE"   ""                 
        150000 "SS0 0HE" "264"                    "MENDIP CRESCENT"    ""                 
        172000 "SS0 0HN" "10"                     "MENDIP CRESCENT"    ""                 
        175000 "SS0 0HN" "78"                     "MENDIP CRESCENT"    "MEDIUM"           
        205000 "SS0 0HP" "21"                     "DULVERTON AVENUE"   ""                 
        155000 "SS0 0HS" "7"                      "DULVERTON CLOSE"    ""                 
        145000 "SS0 0HS" "20"                     "DULVERTON CLOSE"    "45.7"             
        159995 "SS0 0HW" "17"                     "BRUTON AVENUE"      ""                 
        175000 "SS0 0JB" "633"                    "PRINCE AVENUE"      "MEDIUM"           
        210000 "SS0 0JE" "13"                     "LYMPSTONE CLOSE"    ""                 
        195000 "SS0 0JE" "18"                     "LYMPSTONE CLOSE"    "122.8"            
        175000 "SS0 0JQ" "587"                    "PRINCE AVENUE"      "MEDIUM"           
        147500 "SS0 0JQ" "595"                    "PRINCE AVENUE"      "107.42"           
        135500 "SS0 0LA" "43"                     "DENTON AVENUE"      ""                 
         66500 "SS0 0LE" "3"                      "HORNBY AVENUE"      ""                 
        130000 "SS0 0LF" "113"                    "HORNBY AVENUE"      "41.2"             
        150000 "SS0 0ND" "SOUTH POINT, 374 - 386" "PRINCE AVENUE"      ""                 
        175000 "SS0 0NF" "302"                    "PRINCE AVENUE"      ""                 
        225000 "SS0 0NP" "59"                     "MIDHURST AVENUE"    ""                 
        332500 "SS0 0NR" "64"                     "MIDHURST AVENUE"    ""                 
        156000 "SS0 0NU" "17"                     "COLEMANS AVENUE"    ""                 
        175000 "SS0 0NW" "194"                    "PRINCE AVENUE"      "MEDIUM"           
        235000 "SS0 0PA" "2"                      "OAK TREE GARDENS"   ""                 
        315000 "SS0 0PA" "3"                      "OAK TREE GARDENS"   ""                 
        365000 "SS0 0PA" "4"                      "OAK TREE GARDENS"   ""                 
        190000 "SS0 0PB" "11"                     "THEAR CLOSE"        "MEDIUM"           
        250000 "SS0 0PL" "287"                    "PRITTLEWELL CHASE"  ""                 
        185000 "SS0 0PS" "213"                    "WESTBOURNE GROVE"   ""                 
        392000 "SS0 0PT" "322A"                   "WESTBOURNE GROVE"   ""                 
         94950 "SS0 0PU" "319"                    "WESTBOURNE GROVE"   ""                 
        168500 "SS0 0PW" "245"                    "PRITTLEWELL CHASE"  ""                 
        242500 "SS0 0PX" "285"                    "CARLTON AVENUE"     ""                 
        247000 "SS0 0PY" "380A"                   "WESTBOURNE GROVE"   ""                 
        220000 "SS0 0PZ" "419"                    "WESTBOURNE GROVE"   "MEDIUM"           
        145000 "SS0 0QA" "THE GABLES, 27"         "NORTHVILLE DRIVE"   ""                 
        129000 "SS0 0QA" "THE GABLES, 27"         "NORTHVILLE DRIVE"   ""                 
        156000 "SS0 0QH" "143"                    "CARLTON AVENUE"     ""                 
         52000 "SS0 0QH" "193"                    "CARLTON AVENUE"     ""                 
        110000 "SS0 0QL" "78"                     "CARLTON AVENUE"     ""                 
        180000 "SS0 0QN" "5"                      "CARLTON AVENUE"     ""                 
        160000 "SS0 0RJ" "130"                    "HOBLEYTHICK LANE"   ""                 
        230000 "SS0 0RS" "15"                     "CHASE GARDENS"      ""                 
        225000 "SS0 0RT" "184"                    "PRITTLEWELL CHASE"  ""                 
        200000 "SS0 0SD" "117"                    "CARLINGFORD DRIVE"  ""                 
        200000 "SS0 0SE" "211"                    "CARLINGFORD DRIVE"  ""                 
        117500 "SS0 0SN" "BERKLEY COURT, 140"     "GAINSBOROUGH DRIVE" ""                 
        180000 "SS0 0SU" "8"                      "CLEVELAND DRIVE"    ""                 
        175000 "SS0 0TE" "32"                     "HIGHFIELD GROVE"    "72.94"            
         87500 "SS0 7AB" "180"                    "NORTH ROAD"         ""                 
        122500 "SS0 7AE" "5"                      "AVEBURY ROAD"       ""                 
        125000 "SS0 7AF" "165A"                   "NORTH ROAD"         ""                 
        133000 "SS0 7AF" "175"                    "NORTH ROAD"         ""                 
        175000 "SS0 7AG" "90"                     "NORTH ROAD"         ""                 
        132625 "SS0 7AJ" "17"                     "CLIFF AVENUE"       ""                 
         80000 "SS0 7BB" "80A"                    "SALISBURY AVENUE"   ""                 
        135000 "SS0 7BB" "121"                    "SALISBURY AVENUE"   "70.3"             
        150000 "SS0 7BB" "123"                    "SALISBURY AVENUE"   "MEDIUM"           
         52500 "SS0 7BH" "BLACKDOWN"              "NORTH ROAD"         ""                 
        126000 "SS0 7DD" "291A"                   "HAMLET COURT ROAD"  "57"               
        117500 "SS0 7DE" "254A"                   "HAMLET COURT ROAD"  ""                 
        115000 "SS0 7DF" "18"                     "WINDSOR ROAD"       ""                 
         79500 "SS0 7DG" "HOWARDS COURT"          "BALMORAL ROAD"      ""                 
        123000 "SS0 7DG" "HOWARDS COURT"          "BALMORAL ROAD"      ""                 
        108500 "SS0 7DS" "13"                     "RAYLEIGH AVENUE"    ""                 
        119000 "SS0 7DS" "32A"                    "RAYLEIGH AVENUE"    "MEDIUM"           
        190000 "SS0 7DT" "22"                     "ROCHFORD AVENUE"    "SMALL"            
        172000 "SS0 7DT" "38"                     "ROCHFORD AVENUE"    ""                 
        116250 "SS0 7DU" "25"                     "CARISBROOKE ROAD"   ""                 
        168000 "SS0 7DW" "25"                     "OSBORNE ROAD"       ""                 
        124000 "SS0 7DW" "38A"                    "OSBORNE ROAD"       ""                 
         97000 "SS0 7DX" "19A"                    "CLAREMONT ROAD"     "56.6"             
        135000 "SS0 7DZ" "8"                      "CLAREMONT ROAD"     ""                 
        110000 "SS0 7EX" "MONTAGUE COURT"         "HAMLET COURT ROAD"  ""                 
        137500 "SS0 7EX" "MONTAGUE COURT"         "HAMLET COURT ROAD"  ""                 
         92000 "SS0 7HL" "41"                     "ARGYLL ROAD"        ""                 
        150000 "SS0 7HN" "18A"                    "ARGYLL ROAD"        "55.4"             
        140000 "SS0 7HP" "4"                      "CEYLON ROAD"        ""                 
        end
        label var soldprice "Sold Price" 
        label var postcode "Postcode" 
        label var buildingnameornumber "Building Name or Number" 
        label var street "Street" 
        label var propertysize "Lot Size (sq m) "



        • #5
          Thank you so much Nick!

          The code you provided works exactly in the way I had hoped for part c)

          For part b), where I exclude any numeric values from 'propertysize' (for example, deleting 198 for Mannering Gardens or 74.0999 for Langport Drive), leaving only the estimated 'SMALL', 'MEDIUM' and 'LARGE', should I use a different method from the one I attempted before?



          • #6
            That's a matter of your goals and I can't easily advise. From what I can see you only have data for some of your properties, so you start out badly off on using size for any purpose and have no guarantee of things getting much better.

            Non-British readers may not recognise postcodes in Southend, Essex: https://en.wikipedia.org/wiki/SS_postcode_area

            The postcode of my home ends with NJ, which is fortuitous but satisfying.



            • #7
              I've managed to come up with a solution for part b. Simply using the first line of your code gives the desired solution.
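
              A minimal sketch of how part b) might look end to end, assuming the outcome is soldprice as in the data example in #4. The names estimate, sizecat and estimate_cat are made up for illustration.

              Code:
              * keep only the original estimates; numeric sizes become empty strings
              gen estimate = propertysize if inlist(propertysize, "SMALL", "MEDIUM", "LARGE")
              
              * encode with an explicit label so the codes follow size order
              label define sizecat 1 "SMALL" 2 "MEDIUM" 3 "LARGE"
              encode estimate, gen(estimate_cat) label(sizecat)
              
              * observations without an estimate are dropped from the regression automatically
              regress soldprice i.estimate_cat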

              The dataset is much larger than the sample I provided, with over 1,500 observations. Once the dataset is complete I shall eliminate houses that do not have data, leaving me with around 600 observations.

              Thanks for your help



              • #8
                I hesitate to come in this late, but the data posted does suggest you may have a real sample selection bias problem - you have a lot of missing size data.
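
                One rough way to gauge the extent of the problem, assuming the variable names in the data example in #4. The indicator has_size is a hypothetical name, and this is only a descriptive comparison, not a formal test for selection bias.

                Code:
                * how many observations have no size information at all
                count if missing(propertysize)
                
                * flag observations with any size information (numeric or estimated)
                gen byte has_size = !missing(propertysize)
                
                * compare sold prices for properties with and without size data
                tabstat soldprice, by(has_size) statistics(n mean median)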



                • #9
                  Thanks for noting the lack of 'size' data Phil. I will be running regressions without 'size' as I realise the sample selection bias issue. I merely recorded the data to indicate my ideal data set and how my results would be impacted by including an observation which suffers from selection bias. Thanks!
