  • Wrong values when generating a new variable in Stata

    Hello there,

    I am generating new variables based on existing variables. The new variables contain different values, whereas they should contain the same values as the existing variables.

    I wonder whether the problem comes from the variable's type. The newly generated variable is "float" whereas the existing variable is "long".

    I had to change the format of the existing variables because I was not able to generate new variables from them. So I destringed them using the following commands:

    Code:
    *destring the variables needed to use the generate command as it works only with numeric values, drop and rename them automatically
    
    foreach var of varlist TotalAssets Totaldebt Cash  BetaWS NetIncome EBITDA Governancepillarscore {
        encode `var', generate(`var'_encoded)
        rename `var' `var'_old
        rename `var'_encoded `var'
        drop `var'_old
    }
    How I generated my variables:

    Code:
    gen leverage = Totaldebt/TotalAssets
    gen CashSponsors = Cash/TotalAssets
    gen Beta = HistoricBeta
    gen Beta2 = BetaWS
    gen Profitability = NetIncome/TotalAssets
    gen Profitability2 = EBITDA/TotalAssets
    gen Size = ln(TotalAssets)
    gen Governance = Governancepillarscore
    I think the problem comes from the fact that the existing values are shown in blue, so is it maybe showing labels?

    Take a look at the "Governance" variable: its values do not match those of "Governancepillarscore".

    [Screenshot: Capture d’écran 2023-05-31 154607.png]

    Thanks a lot for your help!

    Omar
    Last edited by Omar Bader; 31 May 2023, 07:49.

  • #2
    No data example here but I can guess at the problem: encode generates garbage from arbitrary string variables even if they include numeric characters only.

    Here is a demonstration:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 badvar
    "1"       
    "10"      
    "2"       
    "29"      
    "100"     
    "200"     
    "10000000"
    end
    
    encode badvar, gen(stillbad)
    
    l, nola
    
         +---------------------+
         |   badvar   stillbad |
         |---------------------|
      1. |        1          1 |
      2. |       10          2 |
      3. |        2          5 |
      4. |       29          7 |
      5. |      100          3 |
         |---------------------|
      6. |      200          6 |
      7. | 10000000          4 |
         +---------------------+
    The new values are not even in numeric order because by default encode sorts in dictionary order first, so strings 1 10 100 10000000 2 200 29 are mapped to 1 2 3 4 5 6 7, not what you want.

    At a guess, you needed to apply destring, not encode.
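
    As a minimal sketch of that suggestion, the same example data can be converted with destring instead of encode. The variable names fixed and viareal are made up for illustration and do not come from the original post.

```stata
* Same example data as in the demonstration above
clear
input str8 badvar
"1"
"10"
"2"
"29"
"100"
"200"
"10000000"
end

* destring converts numeric text to the numbers it represents
destring badvar, generate(fixed)

* real() gives the same numbers; double avoids any rounding
generate double viareal = real(badvar)

list, nolabel
```

    Here fixed and viareal hold the numbers that the strings spell out, rather than arbitrary integer codes. If the string originals have already been dropped after encode, decode can turn the value labels back into a string variable, to which destring can then be applied.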

    This is explained in the help for encode:


    Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
    As above, destring and encode are different commands for different purposes. You cannot "destring" by using encode!
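
    Applied to the situation in the question, a hedged sketch might look like this. The three observations are invented toy values, only two of the original variable names are used, and the sketch assumes the variables are still strings containing nothing but numeric text (that is, the data have been reloaded before the encode loop was run).

```stata
* Toy data: numbers stored as strings (values are invented)
clear
input str8 TotalAssets str8 Totaldebt
"1000"  "250"
"200"   "50"
"10000" "4000"
end

* destring accepts a varlist, so no loop, rename or drop is needed
destring TotalAssets Totaldebt, replace

* The ratio is now computed from the actual values
generate leverage = Totaldebt/TotalAssets

list
```

    If the real variables contain non-numeric characters such as thousands separators, destring will refuse to replace them. Its ignore() and force options exist for such cases, but whether they are appropriate depends on what the stray characters mean.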
