HPSC -- High Performance Statistical Computing for Data Intensive Research

Section: About HPSC

What Is This?

This web page introduces a simple computing framework for "Big Data" called single program multiple data (SPMD). Many statistical methods can be redesigned fairly easily to fit this framework. We aim to introduce the ideas from the viewpoint of STATISTICS, and we provide a Cookbook that illustrates the framework with examples ranging from fundamental statistics to advanced methodology. Tentatively, the pages will cover the basic ideas of parallel computing, statistical computing, and R programming, each illustrated in a simple manner. "Have a Big dream of Bigger than Big."
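
To make the SPMD idea concrete, here is a minimal sketch of a statistic (the mean) computed in SPMD style. It is written in plain Python rather than R with Rmpi so that it runs without any MPI installation, and the "ranks" are simulated with an ordinary loop; the function names local_chunk, local_summary, and spmd_mean are hypothetical and do not come from this website or from Rmpi. In a real SPMD program each rank would be a separate process and the final combination would be a reduction across processes.

```python
# Minimal SPMD-style sketch: every "rank" runs the same function on its own
# slice of the data, then the partial results are combined (a reduction).
# Assumption: ranks are simulated with a plain loop; no MPI library is used.


def local_chunk(data, rank, size):
    """Return the contiguous block of `data` owned by `rank` out of `size` ranks."""
    base, extra = divmod(len(data), size)
    # The first `extra` ranks each take one additional element.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return data[start:stop]


def local_summary(data, rank, size):
    """The single program: each rank computes (sum, count) for its own chunk."""
    chunk = local_chunk(data, rank, size)
    return sum(chunk), len(chunk)


def spmd_mean(data, size):
    """Simulate `size` ranks and reduce their partial results into a global mean."""
    partials = [local_summary(data, rank, size) for rank in range(size)]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    values = list(range(1, 11))
    print(spmd_mean(values, size=3))
```

The point of the sketch is that no rank ever needs the whole data set: each one only needs a sufficient summary of its own chunk, and the global statistic is recovered from those summaries.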

About Computing Environment

By default, all examples on this website are illustrated on a Unix/Linux system with Rmpi for R. Rmpi was originally developed on the LAM/MPI system, but it now works with most MPI systems. OpenMPI works well, with a few minor changes needed to run SPMD code in R's interactive mode. All examples are assumed to run under the single program multiple data (SPMD) framework, and all have been tested in both LAM/MPI and OpenMPI environments.

Currently, LAM/MPI is not available for MS Windows. For MS Windows users, MPICH2 is suggested, although it requires a little more configuration. There are reports of success using it with Rmpi, and also with SPMD code run through Rmpi.

If you don't have many machines/processors, the easiest way to test and learn is to install VirtualBox with a Unix/Linux system. VirtualBox can run multiple virtual computers simultaneously on most common systems, and you can duplicate the virtual machines/processors inside VirtualBox as many times as you want. A parallel computing environment can therefore be set up on a single machine. Computing performance aside, this is helpful for testing programs and for building projects in a consistent environment.

Need Help

Authors

Wei-Chen Chen and George Ostrouchov.

Acknowledgment

Wei-Chen thanks Dr. George Ostrouchov of Oak Ridge National Laboratory for helpful discussions and for providing insightful suggestions and materials about general parallel computing. The contents are part of the outcomes of the project "Visual Data Exploration and Analysis of Ultra-large Climate Data" supported by the U.S. DOE Office of Science.

Wei-Chen also thanks Dr. Hao Yu, the author of Rmpi, for great discussions about Rmpi design and parallel programming in Rmpi. Wei-Chen also thanks Stephen Weston, one of the authors of "Parallel R: Data Analysis in the Distributed World", for sharing information about MPI and snow in R.

This website is built on a machine located in the Department of Statistics at Iowa State University in Ames, Iowa, USA.

Created: Oct 19 2011
Last Revised: Feb 13 2012, 13:40 (EST Oak Ridge, TN, USA)
Maintained: Wei-Chen Chen
E-Mail: wccsnow @ gmail.com