
  • How to multiply variables in a dataset by a matrix

    Hi Stata users,

    I have a problem doing a calculation with a matrix and my dataset. For example, I have a dataset with 100 variables (x1-x100) and 200 observations. I also have a matrix with values beta1-beta100, and I need to do this calculation:

    For each observation, I need to generate a new variable equal to x1*beta1 + x2*beta2 + x3*beta3 + ... + x100*beta100. This calculation has to be repeated for all 200 observations.
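In matrix notation (a restatement of the formula above, with X the 200 x 100 matrix whose rows are the observations and beta the 100 x 1 vector of coefficients), the new variable for every observation comes from a single matrix-vector product:

```latex
\text{newvar}_i = \sum_{j=1}^{100} x_{ij}\,\beta_j , \qquad i = 1,\dots,200,
\qquad\text{i.e.}\qquad
\underbrace{\text{newvar}}_{200\times 1} = \underbrace{X}_{200\times 100}\,\underbrace{\beta}_{100\times 1}.
```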

    So my questions are: (1) how do I write the code to calculate the above formula, and (2) more generally, what if my matrix has more values than there are variables in the dataset? For example, I have variables x1-x100, but the matrix has values beta1-beta105, and the x variables can be matched to the matrix only partly, by name. How should I write the code?

    Thank you very much.

  • #2
    Here's a way to do what I think you want. I'm assuming your Beta matrix has only one column.

    Beyond that, if my approach is too slow, there are likely more purely matrix-based solutions, possibly using Stata's Mata language, that would do what you want faster. Also, what you describe sounds like you might be generating predicted values after an estimation command. If that is true, there are easier and faster ways to do that, such as -predict- with its xb option.
    Finally, note that -generate- and -replace- always apply to *all* observations (unless you restrict them with -if- or -in-), as is true in most data analysis languages, so there is no need to repeat the command for each observation. This is a fundamental feature of Stata.
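A minimal sketch of that shortcut, assuming the betas come from an estimation command; the auto dataset and the -regress- model below are hypothetical stand-ins for your own data and model. -predict, xb- and -matrix score- applied to e(b) should give the same linear combination, computed for every observation at once:

```stata
* Hypothetical illustration: the auto data and this model stand in for your own.
sysuse auto, clear
regress price mpg weight

* Linear prediction computed by -predict-
predict double xb_hat, xb

* The same linear combination built from the coefficient row vector e(b).
* -matrix score- matches the columns of b to variables by column name
* and treats the _cons column as the constant term.
matrix b = e(b)
matrix score double xb_score = b
```

Because -matrix score- works from column names rather than positions, it does not depend on the order of the variables in the dataset.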

    Code:
    // Create example data like what is described, just for demonstration purposes.
    set seed 475
    clear
    local N = 200
    local nvar = 100
    local nrow = 105 // more matrix rows than variables
    set obs `N'
    forval i = 1/`nvar' {
       gen x`i' = ceil(runiform() * 3)
    }
    mat Beta = J(`nrow', 1, .)
    forval i = 1/`nrow' {
       mat Beta[`i', 1] = ceil(runiform() * 6)
    }
    // end creating example data
    //
    // Real work starts here.
    gen double newvar = 0
    forval i = 1/`nrow' {
       // if the variable corresponding to this matrix row exists, accumulate the product
       // (-exact- stops -confirm- from accepting an abbreviated variable name)
       capture confirm variable x`i', exact
       if (_rc == 0) {
          replace newvar = newvar + x`i' * Beta[`i', 1]
       }
    }
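A sketch of the Mata route mentioned above, on a smaller stand-in for the example data (5 variables, 7 matrix rows, 10 observations). It assumes row j of Beta belongs to variable xj, so only the first 5 rows of Beta are used. The loop is repeated (with -exact- added) so the two results can be compared; the name newvar_mata is just for illustration.

```stata
* Small stand-in for the example data: 5 variables, 7 matrix rows.
clear
set seed 475
set obs 10
forval i = 1/5 {
    gen x`i' = ceil(runiform() * 3)
}
matrix Beta = J(7, 1, .)
forval i = 1/7 {
    matrix Beta[`i', 1] = ceil(runiform() * 6)
}

* Loop version, for comparison
gen double newvar = 0
forval i = 1/7 {
    capture confirm variable x`i', exact
    if (_rc == 0) {
        replace newvar = newvar + x`i' * Beta[`i', 1]
    }
}

* Mata version: one matrix-vector product for all observations.
* Assumption: row j of Beta belongs to xj, so rows 6 and 7 are ignored.
unab xvars : x1-x5
mata:
    X  = st_data(., st_local("xvars"))   // 10 x 5 data matrix
    b  = st_matrix("Beta")               // 7 x 1 coefficient vector
    xb = X * b[|1 \ cols(X)|]            // use rows 1..5 of Beta only
    st_store(., st_addvar("double", "newvar_mata"), xb)
end
```

The Mata version replaces one -replace- per variable with a single product, which is where the speed gain would come from on large data.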



    • #3
      Thank you so much, Mike. It solves my problem. Yes, my matrix has only one column, but it has rownames: each row has a rowname that matches one of the x variable names.

      Could I ask what to do if `i' stands for characters in the variable names, rather than the numbers 1/`nrow' in your syntax? For example, my variables are xaa, xab, xac, xxe, xxf..., and the matrix rownames match the variable names: aa, ab, ac, xe, xf...

      Sorry for my poor coding skills.

      Xueying



      • #4
        See -help extended_fcn- and -help foreach-.

        You could do something like this, which I have not tested:

        Code:
        gen double newvar = 0
        local rnames: rownames Beta
        foreach row of local rnames {
           // -exact- prevents a short rowname from matching an abbreviated variable name
           capture confirm variable x`row', exact
           if (_rc == 0) {
              local i = rownumb(Beta, "`row'")
              replace newvar = newvar + x`row' * Beta[`i', 1]
           }
        }
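For comparison, a sketch that does the same name matching in Mata. The variables xaa, xab, xxe and the matrix values are made up, and the row zz deliberately has no matching variable. _st_varindex() returns missing for a name that is not in the dataset (and does not accept abbreviations by default), so unmatched rows are skipped. The loop from this post is included, with -exact- added, so the two results can be checked against each other.

```stata
* Hypothetical example data: variables xaa, xab, xxe; Beta has rows
* aa, ab, xe and an extra row zz that matches no variable.
clear
set obs 10
set seed 2024
foreach v in xaa xab xxe {
    gen double `v' = runiform()
}
matrix Beta = (1.5 \ -2 \ 0.25 \ 10)
matrix rownames Beta = aa ab xe zz

* Loop from the post above
gen double newvar = 0
local rnames : rownames Beta
foreach row of local rnames {
    capture confirm variable x`row', exact
    if (_rc == 0) {
        local i = rownumb(Beta, "`row'")
        replace newvar = newvar + x`row' * Beta[`i', 1]
    }
}

* Mata alternative: look up each "x" + rowname and skip rows with no variable
mata:
    b      = st_matrix("Beta")              // k x 1 coefficients
    stripe = st_matrixrowstripe("Beta")     // k x 2: equation names, rownames
    xb     = J(st_nobs(), 1, 0)             // running total for each observation
    for (j = 1; j <= rows(b); j++) {
        idx = _st_varindex("x" + stripe[j, 2])   // missing if no such variable
        if (!missing(idx)) xb = xb + st_data(., idx) * b[j]
    }
    st_store(., st_addvar("double", "newvar_mata"), xb)
end
```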



        • #5
          Thank you so much, Mike.



          • #6
            Hi Mike, sorry to interrupt you again. Since the rownames of the Beta matrix are exactly the same as the variable names, I wrote the code as follows (dropping the x prefix used in your code), but got the error message "type mismatch".

            gen newvar=0
            local rnames: rownames Beta
            foreach row of local rnames {
               capture confirm variable `row'
               if (_rc == 0) {
                  local i = rownumb(Beta, "`row'")
                  replace alcohol_score = alcohol_score + `row' * Beta[`i', 1]
               }
            }

            I have checked the dataset, and the Beta matrix and the variables are all in double format, so I don't know what's wrong with the code. I also tried changing `row' to "`row'"; then there is no error message, but newvar is 0 for all observations, which cannot be right.

            I would appreciate it if you could check the code for me. I am very confused by these symbols.

            Thank you very much again.

            Xueying



            • #7

              The code you show would cause a different problem from the one you report. You create "newvar", as I did before, but then try to update "alcohol_score". With the code as shown, the -replace- command would fail because alcohol_score would not exist. So you must have omitted some relevant code and information.

              I will continue helping you on the condition that you re-read the Statalist FAQ and follow the suggestions and norms there, namely that:

              1) You show an example of your data using the -dataex- command.
              2) You copy your code and Stata's error messages directly from the Results window and paste them here as text. Use the "#" code marker on Statalist to make them readable. In general, diagnosing a problem is impossible without such material.
              3) In your case, you also need to show your Beta matrix: paste a sample of it here as text.
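A sketch of what points 1) and 3) might look like in practice; cg01, cg02 and the matrix values are hypothetical. -dataex- prints code that others can run to recreate your data (it is built into recent versions of Stata; older versions can get it with -ssc install dataex-), and -matrix list- shows the matrix together with its rownames:

```stata
* Hypothetical stand-ins for the poster's variables and matrix
clear
set obs 20
set seed 1
gen double cg01 = runiform()
gen double cg02 = runiform()
matrix Beta = (0.3 \ -0.7 \ 1.1)
matrix rownames Beta = cg01 cg02 cg99

* 1) A data example: -dataex- prints code that recreates these observations
dataex cg01 cg02 in 1/5

* 3) The matrix, including its rownames, as plain text
matrix list Beta
```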

              "Type mismatch" often means that you are trying to do numerical calculations with a string, but it might also mean some other things, and these are not determinable from what you have shown.

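A minimal illustration of the most common cause, using a made-up string variable s that holds numeric-looking text. Multiplying it by a number fails with return code 109, Stata's "type mismatch" error; converting it with real() first works:

```stata
clear
set obs 3
gen str5 s = "1.5"            // looks numeric, but is stored as a string
gen double y = 2

capture gen double z = s * y  // string times number
local rc = _rc
display "return code: `rc'"   // 109 is the code for "type mismatch"

gen double z = real(s) * y    // convert the string to a number first
```

For a whole variable, -destring- is the usual alternative to real().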


              • #8
                Sorry, Mike, I caused a misunderstanding. I solved the problem yesterday with a different method, based on the code you suggested earlier. The current code is shown below:

                gen double alcohol_score=0
                foreach var of varlist cg* {
                   replace alcohol_score = alcohol_score + `var' * X["`var'", 1]
                }

                In the code, alcohol_score is the score I needed to calculate, cg* is the varlist in my dataset, and X is the matrix, some of whose rownames match the names of the variables in that varlist.
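Because the rownames of X are exactly the variable names, -matrix score- can compute the same sum of products without one -replace- per variable. A sketch, with made-up cg01-cg03 data and an extra row cg99 that matches no variable: only rows of X with a matching variable are copied into a row vector b, whose column names -matrix score- then uses to find the variables. The loop from this post is included so the two results can be compared; alcohol_score2 is a hypothetical name.

```stata
* Hypothetical example: three cg* variables; X has an extra row (cg99)
* that matches no variable.
clear
set obs 10
set seed 12345
foreach v in cg01 cg02 cg03 {
    gen double `v' = runiform()
}
matrix X = (0.5 \ -1.2 \ 2 \ 7)
matrix rownames X = cg01 cg02 cg03 cg99

* Loop from the post above
gen double alcohol_score = 0
foreach var of varlist cg* {
    replace alcohol_score = alcohol_score + `var' * X["`var'", 1]
}

* -matrix score- alternative: keep the rows of X that match a variable,
* laid out as a 1 x k row vector b with the variable names as column names.
capture matrix drop b
local keep
local rnames : rownames X
foreach r of local rnames {
    capture confirm variable `r', exact
    if (_rc == 0) {
        matrix b = nullmat(b), X["`r'", 1]
        local keep `keep' `r'
    }
}
matrix colnames b = `keep'
matrix score double alcohol_score2 = b
```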

                Sorry that I mixed my own variable names into your code without checking for the inconsistency. I am really very sorry.

                Thank you very much for your help.

                Xueying

