This is the draft home page for the Tufts UIT Research Computing space. To be reorganized later.
UIT Research computing options:
- Linux research cluster
- Bioinformatic server
- CarmaWeb server
- Visualization Center
- GIS Center
- Cluster-attached database node
What is a Cluster?
Cluster computing is the result of connecting many local computers (nodes) together via a high speed connection to provide a single shared resource. Its distributed processing system allows complex computations to run in parallel as the tasks are shared among the individual processors and memory. Applications that are capable of utilizing cluster systems break down the large computational tasks into smaller components that can run in serial or parallel across the cluster systems, enabling a dramatic improvement in the time required to process large problems and complex tasks.
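The decomposition described above can be sketched on a single machine. This is only an illustration under stated assumptions, not how jobs are run on the cluster: the function names and the sum-of-squares workload are hypothetical, and a thread pool stands in for the cluster's nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    size, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        stop = start + size + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_of_squares(chunk):
    """The small, independent task that each worker runs."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Break one large sum into chunks, run them concurrently, then combine."""
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

The same pattern (split, compute independently, combine) is what a cluster-aware application does, with the workers being processes on separate compute nodes.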
Tufts Linux Research Cluster
The Tufts Linux Research Cluster consists of 40 identical IBM Linux systems (compute nodes) interconnected via an InfiniBand network. Each cluster node has eight 2.8 GHz Intel Xeon CPUs and 16 or 32 gigabytes of memory, for a total of 320 compute cores. Each node runs Red Hat Linux, configured identically across every machine. In addition, a login node and a management node support the compute node array. Client/user workstations access the cluster via the Tufts 220.127.116.11 LAN. The user/login node, however, has an additional network interface card that connects to the compute nodes using private, non-routable IP addressing. This scheme allows the compute nodes to be a "virtualized" resource managed by LSF and abstracted away behind the user node. This approach lets the cluster scale by adding compute nodes and provides the structure for future growth.
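The core count quoted above follows directly from the node figures. As a small worked check (the variable names are illustrative; the split between 16 GB and 32 GB nodes is not stated here, so only bounds on total memory are computed):

```python
NODES = 40            # compute nodes, from the description above
CORES_PER_NODE = 8    # CPUs per node

total_cores = NODES * CORES_PER_NODE

# Each node has 16 or 32 GB of memory; without the exact mix,
# only the lower and upper bounds on the total are known.
min_total_memory_gb = NODES * 16
max_total_memory_gb = NODES * 32

print(total_cores)                               # 320
print(min_total_memory_gb, max_total_memory_gb)  # 640 1280
```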
Access to EMBOSS software (http://emboss.sourceforge.net/index.html) is available on the server emboss.uit.tufts.edu, which provides both shell and web access. In both cases you will need an account.
For shell access to command line tools:
> ssh -Y emboss.uit.tufts.edu
For access to the web interface, wEmboss, you will need an account as well.
For access to emboss web documentation:
CarmaWeb server (http://carmaweb.uit.tufts.edu)
UIT and the Medical School host and support a web-based service known as CarmaWeb. The focus of CarmaWeb is genetic microarray analysis. Its tools are built upon Bioconductor and R. An account may be requested via the website.
Research database (HPCdb) node
Cluster users may request access to a MySQL database to support their research computing needs. Requests are treated like software requests. Please refer to the Software Request Policy statement in this document.
Installed Cluster Software:
Platform Computing, Inc.'s LSF (Load Sharing Facility) software is a distributed load sharing and batch queuing suite of applications that dispatches user requests to compute nodes in accordance with a Tufts-defined policy. It manages, monitors, and analyzes resources and load on the cluster. Platform LSF is layered so that it sits on top of and extends the operating system services, addressing the competing demands for resources on the cluster. LSF commands must be used to submit batch jobs and to assign interactive jobs to processors; bsub and lsrun are the usual command tools for this. Note that the cluster compute nodes are the only targets under LSF control. Jobs are not submitted to computers outside of the cluster. For more information about LSF command usage and job submission, read the man pages (for example, type man lsrun at the cluster prompt) or the cluster tipsheet for the commands bsub, bkill, lsrun, and bjobs.
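As a rough sketch of what a batch submission looks like, the helper below assembles a bsub command line without running it. The options used (-q queue, -J job name, -n number of processors, -o output file, with %J expanding to the job ID) are standard bsub options, but the queue name, program name, and helper function are hypothetical; check man bsub and the cluster tipsheet for the values that apply here.

```python
import shlex


def build_bsub_command(command, queue=None, cores=1, output_file=None, job_name=None):
    """Return a bsub argument list for `command`; nothing is submitted."""
    args = ["bsub"]
    if queue:
        args += ["-q", queue]          # target queue (name is site-specific)
    if job_name:
        args += ["-J", job_name]       # label shown by bjobs
    if cores > 1:
        args += ["-n", str(cores)]     # number of processors requested
    if output_file:
        args += ["-o", output_file]    # file that receives the job's output
    args += shlex.split(command)       # the program to run and its arguments
    return args


if __name__ == "__main__":
    # "example_queue" and "./my_program" are placeholders, not real names.
    cmd = build_bsub_command(
        "./my_program input.dat",
        queue="example_queue",
        cores=8,
        output_file="job.%J.out",
    )
    print(shlex.join(cmd))
```

The printed line can be pasted at the cluster prompt, or the list can be passed to subprocess.run on the login node once the placeholders are replaced.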
Ansys is a suite of finite element based applications that provide real-world simulations of the structural, thermal, electromagnetic, and fluid-flow behavior of 3-D products. All Ansys products integrate with CAD environments.
Abaqus is a suite of applications used by many in the engineering community for the analysis of multi-body dynamics problems that aid the medical, automotive, aerospace, defense, and manufacturing communities.
Fluent is a Computational Fluid Dynamics (CFD) software package commonly used in engineering education and for research in fluid mechanics. The Fluent University Program provides universities with special, low-cost access to many of Fluent's full-featured general use products. Each package includes a preprocessor, solver, and postprocessor.
MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation. Using MATLAB, you can solve technical computing problems faster than with traditional programming languages, such as C, C++, and Fortran. The following MATLAB toolboxes are licensed:
Control System Toolbox
Distributed Computing Toolbox
Fuzzy Logic Toolbox
Image Processing Toolbox
Neural Network Toolbox
Partial Differential Equation Toolbox
Signal Processing Toolbox
Simulink Control Design
System Identification Toolbox
Virtual Reality Toolbox
Comsol is specifically designed to easily couple transport phenomena, including computational fluid dynamics (CFD) as well as mass and energy transport to chemical-reaction kinetics and process-related modeling. Licensed Modules include:
MultiPhysics, Chemical Engineering, Acoustics, Structural Mechanics, Script
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Stata is an integrated statistical package for Windows, Macintosh, and Unix platforms. More than just a statistical package, Stata is also a full data-management system with complete statistical and graphical capabilities. It features both X-window and text user interfaces.
DEFORM (Design Environment for FORMing) is an engineering software environment that enables designers to analyze metal forming processes. DEFORM-3D is also a simulation system that is designed to analyze the three-dimensional flow of complex metal forming processes, allowing for a more complex analysis of shapes than 2D models can provide.
R is a widely available object-oriented statistical package. The current list of installed packages can be found in the directory /usr/lib/R/library/. This represents a base installation suitable for most routine tasks; however, not all of the packages available on the web site are installed. If some other R package is needed, please make a software installation request as outlined above. Extensive user documentation and tutorials are also available on the web site.
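One way to see what is installed is to list that directory. The sketch below assumes the usual R layout, in which each installed package is a subdirectory containing a DESCRIPTION file; the function name is illustrative.

```python
import os


def list_r_packages(library_dir="/usr/lib/R/library/"):
    """Return sorted names of subdirectories that contain a DESCRIPTION file."""
    if not os.path.isdir(library_dir):
        return []
    packages = []
    for name in os.listdir(library_dir):
        # An installed R package directory carries a DESCRIPTION file.
        if os.path.isfile(os.path.join(library_dir, name, "DESCRIPTION")):
            packages.append(name)
    return sorted(packages)


if __name__ == "__main__":
    for package in list_r_packages():
        print(package)
```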
Materials Studio® is a validated software environment that brings together materials simulation and informatics technology. It enhances your ability to mine, analyze, present, and communicate data and information relating to chemicals and materials. Materials Studio's prediction and representation of materials structure, properties, and inter-relationships provide valuable insight. The following Materials Studio products are available: CASTEP, DMol.
(available Fall 2008)
Dacapo is a total energy program based on density functional theory. It uses a plane wave basis for the valence electronic states and describes the core-electron interactions with Vanderbilt ultrasoft pseudo-potentials.
Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles. High-quality images and animations can be generated.
Maple is a well-known environment for mathematical problem-solving, exploration, data visualization, and technical authoring. In many ways it is similar to Mathematica and MATLAB.
Star-P software is a client-server parallel-computing platform that's been designed to work with multiple Very High Level Language (VHLL) client applications such as MATLAB®, Python, or R, and has built-in tools to expand VHLL computing capability through addition of libraries and hardware-based accelerators.
MCCE (Multi-Conformation Continuum Electrostatics) is a biophysics simulation program combining continuum electrostatics and molecular mechanics.
WPP is a parallel computer program for simulating time-dependent elastic and viscoelastic wave propagation, with some provisions for acoustic wave propagation. WPP solves the governing equations in displacement formulation using a node-based finite difference approach on a Cartesian grid. WPP implements substantial capabilities for 3-D seismic modeling.
Mathematica and gridMathematica
Mathematica, advertised as a "one-stop shop" for technical work, "integrates a numeric and symbolic computational engine, graphics system, programming language, documentation system, and advanced connectivity to other applications". Not only does this application have parallel functionality built into it from the ground up, but the wolfram.com web site (http://documents.wolfram.com/applications/parallel/) has extensive documentation, including numerous detailed tutorials.
ImageMagick® is a software suite to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.
Installed Python modules: matplotlib, numpy, NetworkX, Biopython
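To confirm which of these modules are visible to the Python interpreter you are using, a quick check along the following lines may help. The import names are assumptions based on how these projects are normally imported (Biopython is imported as Bio), and the function name is illustrative.

```python
import importlib.util

# Import names for the modules listed above. Biopython is imported as "Bio";
# the other three use their lowercase project names.
MODULES = ["matplotlib", "numpy", "networkx", "Bio"]


def check_modules(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_modules(MODULES).items():
        print(f"{name}: {'available' if available else 'missing'}")
```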
Portland Compilers http://www.pgroup.com/
Portland Group compilers are available for use on the cluster. They are not part of the default environment on the head node, but they can be accessed by use of the module command. Fortran, C and C++ compilers and development tools enable use of networked compute nodes of Intel x64 processor-based workstations and servers to tackle serious scientific computing applications. PGI compilers offer world-class performance and features including auto-parallelization for multi-core, OpenMP directive-based parallelization, and support for the PGI Unified Binary™ technology.
GCC (C, C++, Fortran) compilers
The cluster's 64-bit login node requires the 64-bit GNU GCC compiler, which is therefore the default native compiler. No module setup is required.
Documentation is available at GCC online documentation or from the following man pages:
> man gcc
> man g77
Tufts licenses the Intel compilers for use on the cluster. Access is via the following two commands:
ifort - Intel Fortran compiler
icc - Intel C compiler
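Before building, it can be useful to confirm which of the compiler commands named on this page are on your PATH in the current session. The sketch below is only a convenience check; the function name is illustrative, and a compiler reported as missing may simply need its module loaded first.

```python
import shutil

# Compiler commands mentioned on this page.
COMPILERS = ["gcc", "g77", "ifort", "icc"]


def locate_commands(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_commands(COMPILERS).items():
        print(f"{name}: {path or 'not found on PATH'}")
```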
Local documentation in HTML format can be found at:
or via man pages, depending on which module is loaded:
> man icc
> man ifc
Fortran quick reference is available by typing
> man ifort
Text Editing tools:
emacs, vi, vim, nano, nedit