How to generate an empty matrix and fill it row by row in a loop?

Liz Broom

Join Date: May 2023

Posts: 32
#1

26 May 2023, 03:23

Hi guys, I'm very new to Stata and very stuck! I am trying to generate an empty matrix (which I have filled with '.' missing values) and then fill it row by row vertically with 18 values generated for 5 categories of a variable in a loop. I have done this by first creating the empty J(5, 18, .) matrix, then running ladder command in a loop to generate the data which is to be stored in the matrix, then generating a matrix from this data (I also wonder if it is better at this stage to make a vector instead of a matrix?), then merging the 2 matrices. However, the new values do not replace the '.' missing values of the empty matrix. My code is as follows:

/////////////////////

matrix varsub_trans_matrix = J(5, 18, .)

foreach var_sub in `var_subs' {
ladder `var_sub'
matrix `var_sub'_matrix = (r(ident), r(P_ident), r(square), r(P_square), r(cube), r(P_cube), r(sqrt), r(P_sqrt), r(inv), r(P_inv), r(invsq), r(P_invsq), r(invcube), r(P_invcube), r(invsqrt), r(P_invsqrt), r(log), r(P_log))
matrix `var'_matrix = varsub_trans_matrix \ `var_sub'_matrix
}

/////////////////////

Is there a different way of merging the matrices to input my newly generated data and replace the missing values, so that the new data is inputted in iterative rows?

In the following step, I plan to then combine the 5 x 18 matrices of 9 different variables in a larger J(45, 18, .) matrix, which I also generated with missing values.

Many thanks for your help!!

Last edited by Liz Broom; 26 May 2023, 03:40.
Nick Cox

Join Date: Mar 2014

Posts: 35676
#2

26 May 2023, 03:44

I can't easily follow what you are trying to do without a data example.

The natural order of transformations is the one that ladder uses, from cube downwards.

For an alternative to ladder see transplot from SSC. https://www.statalist.org/forums/for...dable-from-ssc

There was a detailed critique of ladder and its siblings in the presentation cited in that thread. Here is the reference again:

The slides are accessible at https://www.stata.com/meeting/uk19/slides/uk19_cox.pptx

Last edited by Nick Cox; 26 May 2023, 04:22.

Nick Cox

Join Date: Mar 2014
Posts: 35676
#3

26 May 2023, 07:07

In essence, your code adds extra elements to the matrix; it doesn't replace anything.

This code works in the sense that it gets a result.

Code:

sysuse auto, clear 

local vars mpg weight price turn trunk 

capture matrix drop A 

foreach v of local vars { 
    ladder `v'
    matrix A = nullmat(A) , (r(invcube), r(P_invcube), r(invsq), r(P_invsq), r(inv), r(P_inv), r(invsqrt), ///
                r(P_invsqrt), r(log), r(P_log), r(sqrt), r(P_sqrt), r(ident), r(P_ident), r(square),  /// 
                r(P_square), r(cube), r(P_cube))' 
} 

* Build the row names from the list of r() results. The final ")" is left off
* on purpose, so that stripping "r(" and ")," leaves space-separated names.
local rownames r(invcube), r(P_invcube), r(invsq), r(P_invsq), r(inv), r(P_inv), r(invsqrt), r(P_invsqrt), r(log), r(P_log), r(sqrt), r(P_sqrt), r(ident), r(P_ident), r(square), r(P_square), r(cube), r(P_cube
local rownames : subinstr local rownames "r(" "", all 
local rownames : subinstr local rownames ")," "", all 

mat colnames A = `vars'
mat rownames A = `rownames'

mat li A 

A[18,5]
                 mpg     weight      price       turn      trunk
  invcube  24.296495  12.366109  6.7738147  5.1172701  62.503264
P_invcube  5.298e-06  .00206411  .03381309  .07741033  2.677e-14
    invsq  11.987886  8.0306501  1.7523758   5.442786  41.465625
  P_invsq  .00249381  .01803709  .41636712  .06578306  9.905e-10
      inv  2.3593071  8.0371655  4.7115097  6.3145239  18.993858
    P_inv  .30738522  .01797843   .0948219  .04254206  .00007508
  invsqrt  .19910439  9.5796074  6.6180213  6.3303476  9.9333864
P_invsqrt   .9052427  .00831409  .03655232  .04220681  .00696615
      log  .86951019  10.369083  10.487306  5.8491466  4.6114645
    P_log   .6474232   .0056025  .00528093   .0536876  .09968578
     sqrt  4.9426961  8.8087444  15.815618  4.9348465   3.749922
   P_sqrt  .08447091  .01222378  .00036786  .08480309  .15336095
    ident  10.949392  5.6597478  21.767246  3.7600703  4.1940211
  P_ident   .0041915   .0590203  .00001876  .15258474  .12282305
   square  27.027402  4.4900511  33.773732  2.2164426  4.5459277
 P_square  1.352e-06  .10592483  4.636e-08  .33014567  .10300643
     cube  43.592814  12.937007  44.972316  4.3309401  12.174223
   P_cube  3.419e-10  .00155155  1.715e-10  .11469601  .00227196

Minimally, 18 rows and 5 columns in my view work better than the transpose.

I wouldn't want to have to read this. If you remain wedded to this approach -- manifestly from #2 I am not -- I would suggest splitting the P-values from the rest.
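
On the original question of replacing the missing values in a preallocated matrix rather than appending to it, here is a minimal sketch. It assumes the auto data and the r() names that ladder returns, as used in #1; the matrix names filled and newrow are made up for illustration. The \ operator stacks one matrix under another, so the result grows, whereas assigning to a subscripted element overwrites in place, and a matrix on the right-hand side is placed with its upper-left corner at that element.

```stata
sysuse auto, clear

local vars mpg weight price turn trunk
local nvars : word count `vars'

* Preallocate: one row per variable, 18 columns
* (9 chi-squared statistics and 9 P-values from ladder)
matrix filled = J(`nvars', 18, .)

local i = 0
foreach v of local vars {
    local ++i
    quietly ladder `v'
    * 1 x 18 row vector of results for this variable
    matrix newrow = (r(cube), r(P_cube), r(square), r(P_square), r(ident), r(P_ident), ///
                     r(sqrt), r(P_sqrt), r(log), r(P_log), r(invsqrt), r(P_invsqrt),   ///
                     r(inv), r(P_inv), r(invsq), r(P_invsq), r(invcube), r(P_invcube))
    * Overwrite row i in place, starting at column 1
    matrix filled[`i', 1] = newrow
}

matrix rownames filled = `vars'
matrix list filled
```

The same subscripted assignment should carry over to the larger 45 x 18 matrix mentioned in #1, with the row index advancing by 5 for each block, although the nullmat() approach above avoids having to know the dimensions in advance.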


Nick Cox

Join Date: Mar 2014
Posts: 35676
#4

26 May 2023, 08:00

Here is the last suggestion taken forward:

Code:

sysuse auto, clear 

local vars mpg weight price turn trunk 

capture matrix drop A 
capture matrix drop A1
capture matrix drop A2


foreach v of local vars { 
    ladder `v'
    matrix A = nullmat(A) , (r(invcube), r(P_invcube), r(invsq), r(P_invsq), r(inv), r(P_inv), r(invsqrt), ///
                r(P_invsqrt), r(log), r(P_log), r(sqrt), r(P_sqrt), r(ident), r(P_ident), r(square),  /// 
                r(P_square), r(cube), r(P_cube))' 
    matrix A1 = nullmat(A1) , (r(invcube), r(invsq), r(inv), r(invsqrt), r(log), r(sqrt), r(ident), r(square), r(cube))' 
    matrix A2 = nullmat(A2) , (r(P_invcube), r(P_invsq), r(P_inv), ///
                r(P_invsqrt), r(P_log), r(P_sqrt), r(P_ident), r(P_square), r(P_cube))' 
} 

local rownames r(invcube), r(P_invcube), r(invsq), r(P_invsq), r(inv), r(P_inv), r(invsqrt), r(P_invsqrt), r(log), r(P_log), r(sqrt), r(P_sqrt), r(ident), r(P_ident), r(square), r(P_square), r(cube), r(P_cube 
local rownames : subinstr local rownames "r(" "", all 
local rownames : subinstr local rownames ")," "", all 

mat colnames A = `vars'
mat rownames A = `rownames'

mat li A 

local rownames r(invcube), r(invsq), r(inv), r(invsqrt), r(log), r(sqrt), r(ident), r(square), r(cube
local rownames : subinstr local rownames "r(" "", all 
local rownames : subinstr local rownames ")," "", all 
              
matrix rownames A1 = `rownames'
matrix colnames A1 = `vars' 
matrix rownames A2 = `rownames'
matrix colnames A2 = `vars'

mat li A1, format(%4.1f)

mat li A2, format(%4.3f)

Code:

 
. mat li A1, format(%4.1f)

A1[9,5]
            mpg  weight   price    turn   trunk
invcube    24.3    12.4     6.8     5.1    62.5
  invsq    12.0     8.0     1.8     5.4    41.5
    inv     2.4     8.0     4.7     6.3    19.0
invsqrt     0.2     9.6     6.6     6.3     9.9
    log     0.9    10.4    10.5     5.8     4.6
   sqrt     4.9     8.8    15.8     4.9     3.7
  ident    10.9     5.7    21.8     3.8     4.2
 square    27.0     4.5    33.8     2.2     4.5
   cube    43.6    12.9    45.0     4.3    12.2

. 
. mat li A2, format(%4.3f)

A2[9,5]
            mpg  weight   price    turn   trunk
invcube   0.000   0.002   0.034   0.077   0.000
  invsq   0.002   0.018   0.416   0.066   0.000
    inv   0.307   0.018   0.095   0.043   0.000
invsqrt   0.905   0.008   0.037   0.042   0.007
    log   0.647   0.006   0.005   0.054   0.100
   sqrt   0.084   0.012   0.000   0.085   0.153
  ident   0.004   0.059   0.000   0.153   0.123
 square   0.000   0.106   0.000   0.330   0.103
   cube   0.000   0.002   0.000   0.115   0.002


Nick Cox

Join Date: Mar 2014

Posts: 35676
#5

29 May 2023, 01:34

The example is arbitrary -- just the good old auto data -- but it underlines the perils and pitfalls of letting any command automate transformation choices for you (and, despite its wonderful name, I am not much keener on Box-Cox).

In particular ladder suggests or at least implies inverse square root for mpg as getting you closer to a normal distribution. I don't recollect anyone showing enthusiasm for inverse square roots. It's true that mpg is right-skewed and has higher kurtosis than a normal, but the marginal distribution is not problematic for most analyses.

What can be an issue is when you look at relationships, say trying to predict mpg from weight where moderate but definite curvature limits the utility or success of a linear fit. But that issue is solved by using the reciprocal of mpg -- labelled inv in ladder output -- which has a much simpler rationale in terms of units and dimensions than inverse square root, as gallons per mile is an easy reworking of miles per gallon. Metric equivalents of miles per gallon are sensibly used in many countries. What's more, the physical or engineering interpretation of the relationship is now easier to approach.
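
A minimal sketch of that comparison, assuming the auto data as above; the variable name gpm is made up for illustration. It fits mpg and its reciprocal on weight in turn, so that the two fits can be compared and the residuals checked for curvature.

```stata
sysuse auto, clear

* Reciprocal of miles per gallon: gallons per mile
generate gpm = 1 / mpg
label variable gpm "Gallons per mile"

* Linear fit on the original scale
regress mpg weight

* Linear fit on the reciprocal scale
regress gpm weight
```

Plotting residuals against fitted values after each regress, say with rvfplot, is one way to see whether the curvature has gone.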