Solving for the stationary distribution of an irreducible Markov chain amounts to computing a positive solution vector to a homogeneous system of linear equations with a singular coefficient matrix, A, subject to a normalization constraint. Despite recent advances, practicing performance analysts generally prefer iterative methods based on splittings when they want to compare the performance of newly devised algorithms against existing ones, or when they need candidate solvers to evaluate the performance of a systems model at hand. Experimental results for large, sparse Markov chains, especially the ill-conditioned nearly completely decomposable (NCD) ones, are few. We believe there is need for further research in this area, specifically to help in understanding the effects of the degree of coupling of NCD Markov chains and their nonzero structure on the convergence characteristics and space requirements of iterative solvers. The work of several researchers ,, ,,,, has raised important and interesting questions that led to research in a related direction. These questions are the following: "How must one go about partitioning the global coefficient matrix A into blocks when (the system is NCD and) a two-level iterative solver (such as block SOR) is to be employed? Are block partitionings dictated by the NCD normal form of P, the stochastic one-step transition probability matrix, necessarily superior to others? Is it worth investing alternative partitionings? Better yet, for a fixed labeling and partitioning of the states, how does the performance of block SOR (or even that of point SOR) compare to the performance of the iterative aggregation-disaggregation (IAD) algorithm ? Finally, is there any merit in using two-level iterative solvers when preconditioned Krylov subspace methods , , , ,  are available?"
When seeking answers to these questions, we have not considered two-level solvers of the inner-outer iteration type , but have attempted at solving diagonal blocks (and the coupling matrix  in IAD) directly by Gaussian elimination. The memory needed to solve the coupling matrix is set aside at the beginning and what is left is used for diagonal blocks. Blocks of order 1 and 2 are treated separately. We obtain the LU factorizations of as many diagonal blocks as possible given available memory and do this in such a way that smaller blocks are treated first, leaving the big blocks to be solved using point SOR when there is insufficient memory. Currently, we use a considerably larger tolerance (i.e., 0.001), a relaxation parameter of 1.0 (hence, Gauss-Seidel), and a maximum number of iterations of 100 with the point SOR algorithm when solving diagonal blocks. Furthermore, the block Gauss-Seidel correction  in the disaggregation step of IAD is replaced by block SOR.
Four block partitioning techniques are considered. The first one results from the near-complete decomposability test (ncdtest) of the MARkov Chain Analyzer (MARCA) . It determines the strongly connected components of the transition probability matrix by ignoring the nonzeros less than a prespecified decomposability parameter. Then symmetric permutations are performed to put the matrix into the form in which the diagonal blocks form the strongly connected components. In a recent paper , it is shown that the ncdtest algorithm may fail to produce a correct NCD partitioning of the state space. The same paper gives an improved NCD partitioning algorithm, which has the same run-time complexity as that of ncdtest. We name this new NCD partitioning algorithm newncd and experiment with it. Also two straightforward partitionings are investigated. The equal partitioning forms (approximately) equal order blocks. The second straightforward partitioning, other, uses blocks of order respectively 1,2,3,... Finally, the Threshold PABLO (TPABLO) partitioning algorithm  is considered on some of the test problems.
Results of experiments on a large suite of Markov chains  show that two-level iterative solvers are in most cases superior to incomplete LU (ILU) preconditioned Krylov subspace solvers. For two-level iterative solvers, there are cases in which a straightforward partitioning of the coefficient matrix gives a faster solution than can be obtained using the ncdtest or newncd partitioning algorithms. However, in between newncd and ncdtest, the former gives faster converging iterations than the latter in a larger number of the test cases. In general, it is possible to solve each of the problems (except one which takes a minimum of about 82 seconds to solve) in less than 1 minute. This includes time spent for partitioning or preconditioning.