Dear all,
I have a dataset with two string variables, Input_address and Output_address. I need to convert them from string to integer to save memory (the dataset has about 100 million rows).
Some values appear in both Input_address and Output_address, and I want them to stay identical after the variables are encoded.
For example, if Input_address (first column) is "hanb23bd4.."* and Output_address contains the same "hanb23bd4.."*, I want both to be converted to the same number when I run encode.
I don't know how to do that.
*fictitious example
The above was my previous post, and I received a good suggestion to use "multencode".
The problem is that I quickly hit encode's limit on the number of unique values (about 65 thousand), whereas I need approximately 18 million unique values.
How can I solve this? I need something that works exactly like multencode but supports far more unique values.
I also need to export the resulting dataset to a .csv file, but after running multencode and exporting to .csv, the file still contains the strings instead of the integers.
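To make the requirement concrete, here is a minimal sketch of the behaviour I am after, written in Python rather than Stata purely for illustration. The function name build_shared_codes, the sample addresses, and the in-memory CSV are all hypothetical; it only shows the idea of one shared string-to-integer mapping for both columns and is not a solution sized for 100 million rows.

```python
import csv
import io


def build_shared_codes(rows):
    """Give every distinct address one integer, shared by both columns."""
    codes = {}      # address string -> integer code
    encoded = []    # rows with both addresses replaced by their codes
    for src, dst in rows:
        # setdefault assigns the next integer only the first time a string is seen
        a = codes.setdefault(src, len(codes) + 1)
        b = codes.setdefault(dst, len(codes) + 1)
        encoded.append((a, b))
    return codes, encoded


# Fictitious addresses, in the spirit of the example above
rows = [
    ("hanb23bd4", "kq81zz"),
    ("xx19", "hanb23bd4"),
    ("kq81zz", "xx19"),
]

codes, encoded = build_shared_codes(rows)

# Write the integer codes (not the strings) as CSV text
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Input_address", "Output_address"])
writer.writerows(encoded)
csv_text = buffer.getvalue()
print(csv_text)
```

Here "hanb23bd4" receives the same code whether it appears as an input or an output address, and the exported file contains only the integers.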
Thank you in advance,
Marco
