Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • counting observations per string variable, generating mean per string variable

    Hi all!

    I am currently writing my master's thesis using stata for the empirical part and I am stuck when it comes to the following "exercise". I was wondering if maybe someone has a solution to my problem?
    I have two variables: "company" which are tickers of various stocks, and "css" which is a number ranging from 0 to 100.

    I would now like to count for each company how often css is greater than 50, less than 50 or equal to 50. Additionally, I would like to have the mean of css per company. The data looks as follows:

    Company CSS
    AAPL 30
    AMZN 46
    MSFT 66
    ROG 50
    AAPL 60
    ROG 21
    ......

    I tried to generate a new variable containing the mean using:

    bysort company: gen a = mean(css)

    but after applying this code the column company was suddenly empty. Regarding the counting part (css > 50, <50, =50) I am completely lost.

    I would really appreciate any kind of help

    Thanks in advance and all the best,
    Nicole

  • #2
    Here's one approach.

    .ÿ
    .ÿversionÿ17.0

    .ÿ
    .ÿclearÿ*

    .ÿ
    .ÿ//ÿseedem
    .ÿsetÿseedÿ1658699370

    .ÿ
    .ÿinputÿstr4ÿcompanyÿintÿcss

    ÿÿÿÿÿÿÿcompanyÿÿÿÿÿÿÿcss
    ÿÿ1.ÿAAPLÿ30
    ÿÿ2.ÿAMZNÿ46
    ÿÿ3.ÿMSFTÿ66
    ÿÿ4.ÿROGÿ50
    ÿÿ5.ÿAAPLÿ60
    ÿÿ6.ÿROGÿ21
    ÿÿ7.ÿend

    .ÿ
    .ÿquietlyÿexpandÿ100

    .ÿquietlyÿreplaceÿcssÿ=ÿruniformint(0,ÿ100)

    .ÿ
    .ÿ*
    .ÿ*ÿBeginÿhere
    .ÿ*
    .ÿbysortÿcompany:ÿegenÿdoubleÿmean_cssÿ=ÿmean(css)

    .ÿgenerateÿbyteÿlevel_cssÿ=ÿcond(cssÿ<ÿ50,ÿ1,ÿcond(cssÿ==ÿ50,ÿ2,ÿ3))ÿifÿ!mi(css)

    .ÿ
    .ÿ//ÿMeans
    .ÿpreserve

    .ÿbyÿcompany:ÿkeepÿifÿ_nÿ==ÿ1
    (596ÿobservationsÿdeleted)

    .ÿlistÿcompanyÿmean_cssÿ,ÿnoobsÿseparator(0)

    ÿÿ+--------------------+
    ÿÿ|ÿcompanyÿÿÿmean_cssÿ|
    ÿÿ|--------------------|
    ÿÿ|ÿÿÿÿAAPLÿÿÿÿÿ48.955ÿ|
    ÿÿ|ÿÿÿÿAMZNÿÿÿÿÿÿ49.44ÿ|
    ÿÿ|ÿÿÿÿMSFTÿÿÿÿÿÿÿ48.1ÿ|
    ÿÿ|ÿÿÿÿÿROGÿÿÿÿÿ45.695ÿ|
    ÿÿ+--------------------+

    .ÿ
    .ÿ//ÿBreakdown
    .ÿrestore

    .ÿtabulateÿcompanyÿlevel_css,ÿrowÿnokey

    ÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿlevel_css
    ÿÿÿcompanyÿ|ÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿÿÿÿÿÿ2ÿÿÿÿÿÿÿÿÿÿ3ÿ|ÿÿÿÿÿTotal
    -----------+---------------------------------+----------
    ÿÿÿÿÿÿAAPLÿ|ÿÿÿÿÿÿÿ106ÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿÿÿÿÿ93ÿ|ÿÿÿÿÿÿÿ200ÿ
    ÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿ53.00ÿÿÿÿÿÿÿ0.50ÿÿÿÿÿÿ46.50ÿ|ÿÿÿÿ100.00ÿ
    -----------+---------------------------------+----------
    ÿÿÿÿÿÿAMZNÿ|ÿÿÿÿÿÿÿÿ48ÿÿÿÿÿÿÿÿÿÿ0ÿÿÿÿÿÿÿÿÿ52ÿ|ÿÿÿÿÿÿÿ100ÿ
    ÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿ48.00ÿÿÿÿÿÿÿ0.00ÿÿÿÿÿÿ52.00ÿ|ÿÿÿÿ100.00ÿ
    -----------+---------------------------------+----------
    ÿÿÿÿÿÿMSFTÿ|ÿÿÿÿÿÿÿÿ54ÿÿÿÿÿÿÿÿÿÿ0ÿÿÿÿÿÿÿÿÿ46ÿ|ÿÿÿÿÿÿÿ100ÿ
    ÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿ54.00ÿÿÿÿÿÿÿ0.00ÿÿÿÿÿÿ46.00ÿ|ÿÿÿÿ100.00ÿ
    -----------+---------------------------------+----------
    ÿÿÿÿÿÿÿROGÿ|ÿÿÿÿÿÿÿ112ÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿÿÿÿÿÿ85ÿ|ÿÿÿÿÿÿÿ200ÿ
    ÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿ56.00ÿÿÿÿÿÿÿ1.50ÿÿÿÿÿÿ42.50ÿ|ÿÿÿÿ100.00ÿ
    -----------+---------------------------------+----------
    ÿÿÿÿÿTotalÿ|ÿÿÿÿÿÿÿ320ÿÿÿÿÿÿÿÿÿÿ4ÿÿÿÿÿÿÿÿ276ÿ|ÿÿÿÿÿÿÿ600ÿ
    ÿÿÿÿÿÿÿÿÿÿÿ|ÿÿÿÿÿ53.33ÿÿÿÿÿÿÿ0.67ÿÿÿÿÿÿ46.00ÿ|ÿÿÿÿ100.00ÿ

    .ÿ
    .ÿexit

    endÿofÿdo-file


    .

    Comment


    • #3
      I'm not sure what you mean by "column company was completely empty," but the problem with your code is that mean() is an egen function:

      Code:
      bysort company: egen mean = mean(css)
      As for counting:

      Code:
      local conditions >50 <50 ==50
      local names greaterthan lowerthan equalto
      forv x = 1/3{
      bysort company: egen `:word `x' of `names''_50 = total(css`:word `x' of `conditions'')
      }

      Comment


      • #4
        That table would be clearer if you label the values. So, instead try something like this.
        Code:
        *
        * Begin here
        *
        // Means
        bysort company: egen double mean_css = mean(css)
        by company: generate byte first = _n == 1
        list company mean_css if first, noobs separator(0)
        
        // Breakdown
        generate byte level_css = cond(css < 50, 1, cond(css == 50, 2, 3)) if !mi(css)
        label define Levels 1 "<50" 2 "=50" 3 ">50"
        label values level_css Levels
        
        tabulate company level_css, row nokey
        
        exit

        Comment


        • #5
          thank you Joseph and Ali! I ended up using Alis code and it worked perfectly, @Jospeh I tried your code as well and got the feedback that there were too many values in the dataset for the last step (tabulate company) otherwise it worked as well.
          thanks again I really appreciate it!

          Comment

          Working...
          X