I'm trying to import an excel dataset of Belgian surnames into Stata (Source of dataset: Family names | Statbel (fgov.be), but I'm encountering issues when sorting data, probably regarding their format.
import excel "C:$path\Family_names_2023.xlsx", sheet("top10000_2023") firstrow clear
* Example generated by -dataex-. For more info, type help dataex
clear
input int Belgium str20 B int(C FlemishRegion) str19 E int(F WalloonRegion) str20 H int(I BrusselsCapitalRegion) str14 K int L
1 "Peeters" 30716 1 "Peeters" 27459 1 "Dubois" 9096 1 "Diallo" 4798
2 "Janssens" 28115 2 "Janssens" 24554 2 "Lambert" 8311 2 "Bah" 2682
3 "Maes" 24392 3 "Maes" 21324 3 "Martin" 6578 3 "Barry" 1831
4 "Jacobs" 18846 4 "Jacobs" 16244 4 "Dupont" 6126 4 "Sow" 1353
5 "Mertens" 17725 5 "Mertens" 14632 5 "Simon" 5988 5 "Nguyen" 965
So far so good, the problem is, when I try sorting by C, the order is ascending, so that I get:
sort by C
Example generated by -dataex-. For more info, type help dataex
input int Belgium str20 B int(C FlemishRegion) str19 E int(F WalloonRegion) str20 H int(I BrusselsCapitalRegion) str14 K int L
9994 "Bulckaen" 181 9918 "Van Steyvoort" 104 . "" . . "" .
9994 "Davidts" 181 9918 "Werniers" 104 . "" . . "" .
9994 "Dehez" 181 . "" . . "" . . "" .
9994 "Van Boxem" 181 . "" . . "" . . "" .
9994 "Scauflaire" 181 . "" . . "" . . "" .
I tried to see whether applying absolute value works, nothing.
Codebook also seems to show range and percentiles as I expect them to be.
codebook C
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
C #
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Type: Numeric (int)
Range: [181,30716] Units: 1
Unique values: 1,768 Missing .: 2/10,048
Mean: 634.564
Std. dev.: 1093.88
Percentiles: 10% 25% 50% 75% 90%
200 234 333 595 1219
This problem seems present in the native excel data too and applies to all the numeric variables, but I would like to find a solution in Stata since I'm not familiar with excel at all.
Thank you in advance
import excel "C:$path\Family_names_2023.xlsx", sheet("top10000_2023") firstrow clear
* Example generated by -dataex-. For more info, type help dataex
clear
input int Belgium str20 B int(C FlemishRegion) str19 E int(F WalloonRegion) str20 H int(I BrusselsCapitalRegion) str14 K int L
1 "Peeters" 30716 1 "Peeters" 27459 1 "Dubois" 9096 1 "Diallo" 4798
2 "Janssens" 28115 2 "Janssens" 24554 2 "Lambert" 8311 2 "Bah" 2682
3 "Maes" 24392 3 "Maes" 21324 3 "Martin" 6578 3 "Barry" 1831
4 "Jacobs" 18846 4 "Jacobs" 16244 4 "Dupont" 6126 4 "Sow" 1353
5 "Mertens" 17725 5 "Mertens" 14632 5 "Simon" 5988 5 "Nguyen" 965
So far so good, the problem is, when I try sorting by C, the order is ascending, so that I get:
sort by C
Example generated by -dataex-. For more info, type help dataex
input int Belgium str20 B int(C FlemishRegion) str19 E int(F WalloonRegion) str20 H int(I BrusselsCapitalRegion) str14 K int L
9994 "Bulckaen" 181 9918 "Van Steyvoort" 104 . "" . . "" .
9994 "Davidts" 181 9918 "Werniers" 104 . "" . . "" .
9994 "Dehez" 181 . "" . . "" . . "" .
9994 "Van Boxem" 181 . "" . . "" . . "" .
9994 "Scauflaire" 181 . "" . . "" . . "" .
I tried to see whether applying absolute value works, nothing.
Codebook also seems to show range and percentiles as I expect them to be.
codebook C
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
C #
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Type: Numeric (int)
Range: [181,30716] Units: 1
Unique values: 1,768 Missing .: 2/10,048
Mean: 634.564
Std. dev.: 1093.88
Percentiles: 10% 25% 50% 75% 90%
200 234 333 595 1219
This problem seems present in the native excel data too and applies to all the numeric variables, but I would like to find a solution in Stata since I'm not familiar with excel at all.
Thank you in advance
Comment