The Tufts High Performance Compute (HPC) cluster delivers 35,845,920 CPU hours and 59,427,840 GPU hours of free compute time per year to the user community.

Teraflops: 60+ (60+ trillion floating point operations per second)
CPU: 4,000 cores
GPU: 6,784 cores
Interconnect: 40GB low-latency Ethernet

For additional information, please contact Research Technology Services at tts-research@tufts.edu



This is the draft home page for the Tufts UIT Research Computing space.  To be reorganized later.

UIT Research computing options:

  • Linux research cluster
  • Bioinformatic server
  • CarmaWeb server
  • Visualization Center
  • GIS Center
  • Cluster attached Database node
     

What is a Cluster?
Cluster computing is the result of connecting many local computers (nodes) together via a high speed connection to provide a single shared resource. Its distributed processing system allows complex computations to run in parallel as the tasks are shared among the individual processors and memory. Applications that are capable of utilizing cluster systems break down the large computational tasks into smaller components that can run in serial or parallel across the cluster systems, enabling a dramatic improvement in the time required to process large problems and complex tasks.


 

Tufts Linux Research Cluster
The Tufts Linux Research Cluster comprises 40 identical IBM Linux systems (compute nodes) interconnected via an Infiniband network. Each cluster node has eight 2.8GHz Intel Xeon CPUs and 16 or 32 gigabytes of memory, for a total of 320 compute cores. The Linux operating system on each node is RedHat 5, configured identically across every machine. In addition, there is a login node and a management node supporting the compute node array. Client/user workstations access the cluster via the Tufts 130.64.0.0 LAN or remotely with ssh. The user/login node has an additional network interface that connects to the compute nodes using private non-routable IP addressing via the Infiniband hardware. This scheme allows the compute nodes to be a "virtualized" resource managed by LSF and abstracted away behind the user node. This approach also allows the cluster to scale to a large number of nodes and provides the structure for future growth.
 

Bioinformatic services:

Access to the EMBOSS software suite is available on the server emboss.uit.tufts.edu, which provides both shell and web access. In both cases you will need an account.

For shell access to command line tools:
> ssh -Y  emboss.uit.tufts.edu
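
Once logged in, the EMBOSS command-line programs can be run directly. A minimal sketch (the file name below is illustrative):

> wossname alignment                     lists EMBOSS programs related to alignment
> seqret mysequences.fasta stdout        reads a sequence file and writes it to the screen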

For access to the web interface, see wEMBOSS.

For EMBOSS web documentation, see the EMBOSS documentation pages.

Carmaweb server:
UIT and the Medical School host and support a web-based service known as CarmaWeb. The focus of CarmaWeb is genetic microarray analysis. These tools are built upon Bioconductor and R software. One may request an account via the website. Additional information is available here.
 

Research database (HPCdb) node

Cluster users may request access to a MySQL database to support their research computing needs. Requests are treated like software requests; please reference the Software Request Policy statement in this document.
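
Once access has been granted, the database can be reached from a cluster login session with the standard MySQL client. A minimal sketch (the host and database names below are illustrative placeholders, not the actual HPCdb settings):

> mysql -h hpcdb-hostname -u utln -p my_research_db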

Tufts Visualization Center

A description may be found here. The user guide is available here.

The research cluster is available to VisWall users for additional computational resources. Current connectivity follows standard practice using ssh and X11 forwarding. VisWall users with a cluster account may forward cluster-based application graphical output for display on the VisWall. Plans to integrate high-speed network connectivity between the VisWall and the research cluster are in development.

GIS Center
Several GIS links can be found here

The Tufts Research Cluster indirectly supports GIS spatial statistical computation through the modern spatial statistics packages available in R. This is a useful resource when faced with complex estimation tasks, long runtimes, or a need for more memory than is often available on desktop workstations. R packages such as the following are available:

fields, ramps, spatial, geoR, geoRglm, RandomFields, sp, spatialCovariance, spatialkernel, spatstat, spBayes, splancs
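
As a minimal sketch of batch use (the module name and script name are assumptions; check module avail for the actual name), a spatial analysis could be run as follows:

> module load R
> R CMD BATCH my_spatial_job.R           my_spatial_job.R is a hypothetical script that begins with library(spatstat)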

For additional information, contact cluster01-support@tufts.edu.


Policy oriented topics: 

Account Information

The UIT Support Center provides account management assistance. Access to the cluster requires a Research Cluster account. Faculty, other staff, and students are required to fill out a UIT Research Computer Account Application form, available at the following locations:

  • Medford campus:
    Contact the University IT Support Center at x7-3376 or by email at uitsc@tufts.edu for further assistance.
  • Boston campus:
    Present your Tufts ID during normal business hours at the Health Sciences Library 5th floor Learning Resources Center Helpdesk.
  • Grafton campus:
    Present your Tufts ID during normal business hours at the Veterinary Library.

Note: Students must have their application form signed by a faculty member or advisor.

 You can also download a copy of the UIT Research Computer Account Application:

Click here to download a copy in Microsoft Word format

Click here to download a copy in Adobe PDF format


Grant Funding related topics: 

Researchers who require dedicated access to compute node resources and are seeking funding for those resources may wish to consider contributing additional nodes to the cluster. Tufts' research cluster has been designed to allow for compute node expansion. The obvious advantage to a researcher is that one does not have to support a separate computing resource, obtain additional licensing, etc. In order to participate, additional nodes need to be of a certain kind, consistent with the current cluster design. In addition, a special LSF queue will be structured to allow one or more designated researchers priority access to the contributed nodes. In return, when those nodes are unused, they will become part of the larger pool of LSF-managed compute node resources available to the Tufts research community. For additional information contact Lionel Zupan, Associate Director for Research Computing, at x74933 or via email: Lionel.Zupan@Tufts.edu.

Software request policy:
Please send your request via email to  cluster01-support@tufts.edu  and address the following questions:

  • What is the name of the software?
  • Where can additional information about the software be found?
  • Who are the intended users of the software?
  • When is it needed by?
  • Will it be used in support of a grant and if so what grant?
  • What if any special requirements are needed? 

Cluster Storage Options: 

All accounts are created with a fixed 200 megabyte home directory disk quota. In addition, a directory is automatically created in the /scratch filesystem on the head node and on each compute node. The directory is named with your Tufts UTLN, such as /scratch/utln/. There is no quota and no backup for files located there, and typically 100+ GB is available. This filesystem is also subject to automated cleaning. You may use it for temporary storage supporting your research computing needs. Note that this storage is not available to your desktop as a mounted filesystem; access is via your ssh login or through a file transfer program. For additional temporary storage beyond what is offered in /scratch, UIT provides a 2TB filesystem called /cluster/shared/. Access to this storage is by request and requires a suitable application requirement/justification. The naming convention is the same as /scratch/: /cluster/shared/utln/. Please work out of your named directory.
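
For example, staging a large working file in scratch space might look like the following (the file name is illustrative; substitute your own UTLN):

> cp large_dataset.tar /scratch/utln/
> df -h /scratch                         shows how much scratch space remains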

Research Storage Solution: 

Tufts UIT provides a network storage based solution to faculty researchers. This solution is often used in support of grant-based projects where the amount of data, multiple access needs, and backup requirements are best met by a network storage solution. A common mode of use is to support a small lab where multiple people may interact with research data from several computers. Another possibility is to request that the storage be made available to one or more accounts on the research cluster. For additional information contact Lionel Zupan, Associate Director for Research Computing, at x74933 or via email: Lionel.Zupan@Tufts.edu. Click here to request Storage.

Network Concurrent Software Licenses:

Software on the research cluster is supported by a triple-redundant FlexLM license server. This makes uninterrupted 24x7 license service possible for software clients. In addition, most of the licensed software on the cluster is "shared" with Tufts; in effect, access by Tufts-owned workstations is available via various computer labs and faculty machines. Authenticated access is restricted to the Tufts network domain. Setup and additional information can be found here.

Support venue:

If you have any questions about cluster related usage, applications, or assistance with software, contact: cluster01-support@tufts.edu.

Cluster user software environment:
Each cluster shell account defaults to the bash shell upon login. This should meet the needs of most users. To change to another shell, such as csh, zsh, or tcsh, use the chsh command. Additional information on chsh is available through the man pages.
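
For example, to switch your login shell to tcsh (assuming the usual /bin/tcsh path):

> chsh -s /bin/tcsh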

Module management for software package user environment:
Software environments are managed through the use of the module approach. Many commercial packages and some public domain software require various settings that can often lead to clashes in the user shell environment. In order to use such a package on the cluster, you must load that package's module. For example, to use Matlab:

> module load matlab

To see what packages are under module control:

> module avail

To unload a package from your environment:

> module unload matlab
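
To see which modules are currently loaded in your session:

> module list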


 


Installed 64bit Cluster Software:  
LSF

Platform Computing, Inc.'s LSF (Load Sharing Facility) software is a distributed load sharing and batch queuing suite of applications that can dispatch user requests to compute nodes in accordance with a Tufts-defined policy. It manages, monitors, and analyzes resources and load on the cluster. Platform LSF is layered in a way that allows it to sit on top of and extend the operating system services, addressing the competing needs of resource management on the cluster. LSF commands must be used to submit batch jobs and assign interactive jobs to processors; bsub and lsrun are the usual command tools for this. It's important to note that cluster compute nodes are the only targets under LSF control; jobs are not submitted to computers outside of the cluster. For more information about LSF command usage and job submission, you can read the man pages (example: type man lsrun at the cluster prompt) or the cluster tipsheet for the commands bsub, bkill, lsrun, and bjobs.
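
A minimal sketch of typical usage (program names and the output file are illustrative; queues and options vary by site policy):

> bsub -o myjob.out ./my_program         submits a batch job; output is written to myjob.out
> bjobs                                  checks the status of your jobs
> lsrun ./short_task                     runs a short interactive task on a compute node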
    

Ansys

Ansys is a suite of finite element based applications that provide real-world simulations of the structural, thermal, electromagnetic, and fluid-flow behavior of 3-D products. All Ansys products integrate with CAD environments.

Abaqus

Abaqus is a suite of applications used by many in the engineering community for the analysis of multi-body dynamics problems that aid the medical, automotive, aerospace, defense, and manufacturing community.

Fluent 
Fluent is a Computational Fluid Dynamics (CFD) software package commonly used in engineering education and research in fluid mechanics. The Fluent University Program provides universities with special, low-cost access to many of Fluent's full-featured general-use products. Each package includes a preprocessor, solver, and postprocessor.

Matlab

MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation. Using MATLAB, you can solve technical computing problems faster than with traditional programming languages, such as C, C++, and Fortran. Extensive documentation and tutorials are provided within Matlab.  The following Matlab toolboxes are licensed:

MATLAB
Simulink
Control System Toolbox
Distributed Computing Toolbox
Financial Toolbox
Fuzzy Logic Toolbox
Image Processing Toolbox
MATLAB Compiler
Neural Network Toolbox
Optimization Toolbox
Partial Differential Equation Toolbox
Real-Time Workshop
Signal Processing Toolbox
Simulink Control Design
Spline Toolbox
Statistics Toolbox
System Identification Toolbox
Virtual Reality Toolbox
Wavelet Toolbox
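
As a minimal sketch of running Matlab non-interactively through LSF (the script name and output file are illustrative):

> bsub -o matlab.out matlab -nodisplay -r "my_analysis, exit"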

Comsol  

Comsol is specifically designed to easily couple transport phenomena, including computational fluid dynamics (CFD) as well as mass and energy transport, to chemical-reaction kinetics and process-related modeling. Licensed modules include:
MultiPhysics, Chemical Engineering, Acoustics, Structural Mechanics, Script 

Weka

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. 

Stata

Stata is an integrated statistical package for Windows, Macintosh, and Unix platforms. More than just a statistical package, Stata is also a full data-management system with complete statistical and graphical capabilities.  It features both X-window and text user interfaces. 

DEFORM-3D 
DEFORM (Design Environment for FORMing) is an engineering software environment that enables designers to analyze metal forming processes. DEFORM-3D is also a simulation system that is designed to analyze the three-dimensional flow of complex metal forming processes, allowing for a more complex analysis of shapes than 2D models can provide. 

R
R is a widely available object-oriented statistical package. The current list of installed packages can be found in the directory /usr/lib/R/library/. This represents a base installation suitable for most routine tasks; however, not all packages available on the R web site are installed. If another R package is needed, please make a software installation request as outlined above. Extensive user documentation and tutorials are available on the R web site.
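
For example, to list the installed packages and run a script in batch mode (the script name is illustrative):

> ls /usr/lib/R/library
> R CMD BATCH my_script.R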

Materials Studio

Materials Studio® is a validated software environment that brings together the world's most advanced materials simulation and informatics technology.
It dramatically enhances your ability to mine, analyze, present, and communicate data and information relating to chemicals and materials. Materials Studio's accurate prediction and representation of materials structure, properties, and inter-relationships provides valuable insight. The following Materials Studio products are available: CASTEP, DMol.
 

Dacapo (available Fall 2008)

Dacapo is a total energy program based on density functional theory. It uses a plane wave basis for the valence electronic states and describes the core-electron interactions with Vanderbilt ultrasoft pseudo-potentials. 

Chimera

Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles. High-quality images and animations can be generated.  

Maple
Maple is a well-known environment for mathematical problem-solving, exploration, data visualization, and technical authoring. In many ways it is similar to Mathematica and Matlab.

Star-P

Star-P software is a client-server parallel-computing platform that's been designed to work with multiple Very High Level Language (VHLL) client applications such as MATLAB®, Python, or R, and has built-in tools to expand VHLL computing capability through addition of libraries and hardware-based accelerators.

MCCE

MCCE (Multi-Conformation Continuum Electrostatics) is a biophysics simulation program combining continuum electrostatics and molecular mechanics.

WPP
WPP is a parallel computer program for simulating time-dependent elastic and viscoelastic wave propagation, with some provisions for acoustic wave propagation. WPP solves the governing equations in displacement formulation using a node-based finite difference approach on a Cartesian grid. WPP implements substantial capabilities for 3-D seismic modeling.

Mathematica  and gridMathematica
Mathematica, advertised as a "one-stop shop" for technical work, "integrates a numeric and symbolic computational engine, graphics system, programming language, documentation system, and advanced connectivity to other applications". Not only does this application have parallel functionality built into it from the ground up, but the wolfram.com web site (http://documents.wolfram.com/applications/parallel/) has extensive documentation, including numerous detailed tutorials.

Imagemagick 

ImageMagick® is a software suite to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves. 
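
For example, to convert a JPEG to a PNG at half its original size (file names are illustrative):

> convert input.jpg -resize 50% output.png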
 

Paraview

ParaView is a multi-platform visualization application designed to visualize large data sets.

Python Compiler

Installed Python modules: matplotlib, NumPy, NetworkX, Biopython
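
A quick way to confirm that a module is available is to import it from the command line, for example:

> python -c "import numpy; print numpy.__version__"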

Perl Compiler 

Perl is a stable, cross platform programming language. Perl is extensible. There are over 500 third party modules available from the Comprehensive Perl Archive Network (CPAN).

Portland Compilers 

Portland Group compilers are available for use on the cluster. They are not part of the default environment on the head node, but they can be accessed by use of the module command. The Fortran, C, and C++ compilers and development tools enable use of networked compute nodes of Intel x64 processor-based workstations and servers to tackle serious scientific computing applications. PGI compilers offer world-class performance and features, including auto-parallelization for multi-core, OpenMP directive-based parallelization, and support for the PGI Unified Binary™ technology.
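
A minimal sketch (the module name below is an assumption; check module avail for the actual name, and the source file names are illustrative):

> module load pgi
> pgcc -fast -o my_program my_program.c
> pgf90 -fast -o my_model my_model.f90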

GCC(C, C++, Fortran) compilers
The cluster's 64-bit login node uses the GNU GCC 64-bit compiler suite, which is therefore the default native compiler. No module setup is required.
Documentation is available at GCC online documentation or from the following man pages:
> man gcc
> man g77  
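
For example, to compile a simple C program with optimization (file names are illustrative):

> gcc -O2 -o my_program my_program.c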
 

Intel compilers

Tufts licenses the Intel compilers for use on the cluster.  Access is via the following two commands:

ifort - Intel fortran compiler
icc   - Intel C compiler

Local documentation in HTML format can be found at:
/opt/intel/cc/9.1.038/doc/main_cls/index.htm
or via man pages, depending on which module is loaded:
> man icc
> man ifc
Fortran quick reference is available by typing
> man ifort 
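
For example, to compile Fortran and C sources with the Intel compilers (file names are illustrative):

> ifort -O2 -o my_model my_model.f90
> icc -O2 -o my_program my_program.c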

Text Editing tools:

           emacs, vi, vim, nano, nedit   

Firefox browser:

A web browser is provided to allow viewing of locally installed software product documentation. Access to the internet is restricted.



 

Frequently Asked Questions - FAQs: 

What are some reasons for using the cluster: 

  • access to MPI based parallel programs
  • access to larger amounts of memory than 32bit computers offer
  • access to the large public domain of scientific computing programs
  • access to compilers
  • access to large amounts of storage
  • access to batch processing for running numerous independent serial jobs
  • access to 64bit versions of programs you may already have on your 32bit desktop

What is MPI: 

MPI stands for Message Passing Interface. The goal of MPI is to develop a widely used standard for writing message-passing programs.
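
A minimal sketch of compiling and submitting a simple MPI program (file names and core count are illustrative, and the exact launch command depends on the cluster's MPI/LSF integration):

> mpicc -o hello_mpi hello_mpi.c
> bsub -n 8 -o mpi.out mpirun ./hello_mpi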

What installed programs provide a parallel solution: 

The following provide MPI-based solutions: Abaqus, Ansys, Fluent, gridMathematica, Star-P/Matlab, Star-P/Python, ParaView, Materials Studio, Dacapo

The following provide thread-based parallelism: Comsol, Matlab

When does 64bit computing matter: 

When there is a need for memory and storage beyond the 32bit barriers.

Is it possible to run linux 32bit executables on the cluster: 

There is a good chance that it will succeed. But there might be other issues preventing it from running.  Try it out...

Where can I find additional information about MPI: 

http://www-unix.mcs.anl.gov/mpi/

http://www.nersc.gov/nusers/resources/software/libs/mpi/

http://www.faqs.org/faqs/mpi-faq/

http://www.redbooks.ibm.com/abstracts/sg245380.html

What is a good web based tutorial for MPI:

 http://ci-tutor.ncsa.uiuc.edu/login.php

What email and web services are available on the cluster:

The cluster does not accept incoming mail, nor is a webserver available for public use. These services are provided elsewhere by UIT.


What is the Tufts responsible use policy:

Find it here

How do I login to the cluster:

Use your Tufts UTLN and LDAP password associated with your Tufts email.


What is a Tufts UTLN:

This is your Tufts username issued for purposes of Tufts email.

How to connect to the cluster with a PC:

UIT research servers require an ssh connection to a host providing shell access. Cygwin ssh, SecureCRT, PuTTY, etc., will work. Other forms of connection such as XDM, rsh, rlogin, and telnet are not supported.

What connection protocols are supported on the cluster: 

     ssh, ssh2, sftp, scp

What is the name of the cluster:

cluster02.uit.tufts.edu 

How to connect to the cluster with a Mac:
Open an xterm window and use ssh.

   > ssh -Y  cluster02.uit.tufts.edu 

Provide your username and password. 

How to connect to the cluster with a linux OS:

 Open a local shell window or xterm and connect with:

    > ssh -Y  cluster02.uit.tufts.edu   

Provide your username and password. 

Can I connect to the cluster from home or while I am traveling: 

Yes, use an ssh-based login solution such as OpenSSH, SecureCRT, etc.

What is an xServer:

This is a program that runs on your workstation/desktop OS and 'listens' for X Window transmissions sent from the cluster, redisplaying them on your workstation. These transmissions are generated by an application running on a host that you are connected to. For example, if you intend to use Ansys on the cluster, you need an X server to display the Ansys GUI locally on your desktop.
 

What xserver is needed for  WinXP desktops:  

There are many free and commercial xserver programs.  UIT recommends Cygwin.  Commercial options include Exceed, XWin32, and others...

Where do I get  Cygwin:

A&S Cygwin installation documentation can be obtained here

What Cygwin programs do I install:

Install base Cygwin, OpenSSH, and OpenGL at a minimum.

How do I connect to the cluster using Cygwin:

Connect with ssh to the headnode of the cluster:

> ssh -Y -C  yourusername@cluster02.uit.tufts.edu

How can I make sure Cygwin is working with the cluster:

To test the Cygwin X server, try a simple cluster-side X Window application:

> xclock

A clock should appear on your desktop.  
 

What xserver is needed for a Mac:

An X server is provided when you install the Mac OS X X11/developer tools.

What xserver is needed for linux: 

Linux distributions come with the X Window System, which provides X server support.

Do I need SecureCRT to connect to  a host:

No, if you use Cygwin you do not need SecureCRT. If you choose to use SecureCRT, you will likely need a Windows-based X server such as Exceed, XWin32, or similar if you expect to display graphics.
 

How to transfer files:  

Any file transfer program supporting either the scp or sftp protocol will work. There are many freeware choices; WinSCP for Windows XP is very good.

SecureCRT also provides sftp file transfers. The graphical file transfer program FileZilla is available to Linux and Unix users.
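
For example, to copy a result file from the cluster to the current directory on your local machine (the file name is illustrative; substitute your own UTLN):

> scp utln@cluster02.uit.tufts.edu:results.dat .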
 

What is my home directory:
This is where you have access to storage that allows you to read, write and delete files.  The path to your home directory is: /cluster/home/xx/yy/utln

where xx is the first letter of your first name

where yy is the first letter of your last name

where utln is your Tufts issued username for purposes of email. 
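
For example, a hypothetical user John Doe with the UTLN jdoe01 would have the home directory /cluster/home/j/d/jdoe01.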

What is the diskquota on home directories:

All accounts are created with a 200 megabyte quota.

May I request a disk quota increase to my home directory:

Often the best solution is to use cluster-specific temporary storage options in conjunction with your desktop. Programs such as WinSCP allow you to drag and drop files between the cluster and your desktop.

How do I find out how much total diskspace I have used:

 Login to your account and type the following:

> du -sk  .

How do I obtain access to additional storage on /cluster/shared/ : 

Please send your request via email to  cluster01-support@tufts.edu.

Where do I find basic unix/linux resources:

There are many web-based tutorials and how-tos for anything Linux-oriented. Some sites of interest:

  • http://www.linux-tutorial.info/
  • http://www.ee.surrey.ac.uk/Teaching/Unix/
  • http://www.linux.org/lessons/beginner/toc.html

What are some of the basic linux and related commands:

Most usage is centered around  a dozen or so commands:

 ls, more, less, cat, nano, pwd, cd, man, bsub, bkill, bjobs, ps, scp, ssh, cp, chmod, rm, mkdir, passwd, history, zip, unzip, tar, df, du

What is a man page:

man pages are the standard Linux/Unix text-based documentation. To obtain documentation on the command cat:

> man cat

> xman                   the X-based interface to man

> man man                the man page for the man command itself

> man -k graphics        lists all commands related to graphics


What is the backup policy:

Your data residing in your home directory is automatically backed up by UIT. The policy adheres to industry-standard best backup practices. To request a retrieval of data, contact the UIT Support Center at x73376. You should have basic information available, such as the file name(s), the directory, and approximately when the files existed.

What is the local storage automated cleaning policy:

The head node of the cluster and the compute nodes provide two areas of temporary storage separate from your home directory: the file systems /scratch/utln/ and /tmp. Each is subject to automated cleaning rules; all files older than 20 days are deleted, and these file systems are not backed up.

Are the compute nodes named differently from the old cluster compute nodes:

Yes.  You should not hard code the names anywhere.  

Some applications required a login to former compute01; is this still the case:

No. 

Why do I have to submit jobs to compute nodes:

The cluster has been configured to allocate work to compute nodes in a manner that provides efficient and fair use of resources. A job queueing system called LSF is provided as the work interface to the compute nodes. Your work is then distributed to queues that provide compute node resources. Logging in to compute nodes via ssh is discouraged, and you will be asked to refrain from using the resources in that manner; let LSF do it!
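
For example, instead of ssh-ing to a compute node, an interactive session can be requested through LSF (a minimal sketch; your site may direct you to a specific queue):

> bsub -Ip bash                          starts an interactive shell on a compute node under LSF control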

Is SPSS or SAS available on the cluster:
Neither is available on the cluster.  Software packages R and Stata provide that functionality instead.

Where can I find information about PC/Mac based Tufts software licenses:

Here
