UIT Research Computing Resources
For additional information, please contact Lionel Zupan, Associate Director for Research Computing, at x74933 or via email Lionel.Zupan@Tufts.edu.
Tufts UIT Research computing options
- High-performance computing research cluster
- Bioinformatics server
- CarmaWeb server
- Visualization Center
- GIS Center
1. Tufts High-performance computing research cluster
What is a Cluster?
Cluster computing connects many local computers (nodes) via a high-speed connection to provide a single shared resource. This distributed processing system allows complex computations to run in parallel, with tasks shared among the individual processors and memory. Applications capable of using cluster systems break large computational tasks into smaller components that can run in serial or in parallel across the cluster, which can dramatically reduce the time required to process large problems and complex tasks.
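The decomposition described above can be illustrated on a single machine. The following is only a minimal sketch: the function names are hypothetical, and a thread pool stands in for the cluster's compute nodes. On the real cluster, the pieces would be dispatched to nodes through the queueing software rather than to local threads.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Serial work performed on one piece of the problem."""
    return sum(n * n for n in chunk)


def split(data, parts):
    """Divide data into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4):
    """Split the task, process the chunks concurrently, combine the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


numbers = list(range(1, 1001))
total = parallel_sum_of_squares(numbers)
print(total)
```

The pattern is the same at any scale: split the input, run independent pieces of work, then combine the partial results.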
Tufts Linux Research Cluster
The Tufts Linux Research Cluster consists of 40 identical IBM Linux systems (compute nodes) interconnected via an InfiniBand network. Each cluster node has eight 2.8 GHz Intel Xeon CPU cores and 16 or 32 gigabytes of memory, for a total of 320 compute cores. The Linux operating system on each node is Red Hat Enterprise Linux (RHEL) 5, configured identically across every machine. In addition, a login node and a management node support the compute node array. Client/user workstations access the cluster via the Tufts network or remotely with ssh. The user/login node has an additional network interface that connects to the compute nodes using private non-routable IP addressing via the InfiniBand hardware. This scheme allows the compute nodes to be a "virtualized" resource managed by the queueing software LSF and abstracted away behind the user node. This approach also allows the cluster to scale to a large number of nodes and provides the structure for future growth.
The login node of the cluster is reserved for using compilers, running shell tools, and launching and submitting programs to the compute nodes. It is not for long-running programs; for computation, please use the compute nodes and the various queues.
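The capacity figures quoted in the cluster description can be checked with a few lines of arithmetic. This is only a sketch: the page does not say how many nodes have 16 GB and how many have 32 GB, so only lower and upper bounds on total memory are computed.

```python
NODES = 40                  # compute nodes, from the description above
CORES_PER_NODE = 8          # cores per node
MEMORY_OPTIONS_GB = (16, 32)  # each node has one of these memory sizes

total_cores = NODES * CORES_PER_NODE

# The 16 GB / 32 GB split is not stated, so only bounds can be given.
min_total_memory_gb = NODES * min(MEMORY_OPTIONS_GB)
max_total_memory_gb = NODES * max(MEMORY_OPTIONS_GB)

print(f"Total compute cores: {total_cores}")
print(f"Total memory: between {min_total_memory_gb} and {max_total_memory_gb} GB")
```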
Cluster User Accounts
Click Account Information for additional information about cluster accounts.
Contribute your own nodes to the new research cluster
Researchers who need their own high-performance computing (HPC) resources (and are applying for external grant funding to obtain them) may wish to consider contributing additional nodes to the research cluster rather than developing and supporting their own HPC infrastructure. The research cluster has been designed to allow for this kind of compute node expansion. The obvious advantage to a researcher is that one does not have to support a separate computing resource, obtain additional licensing, etc.
In order to participate, additional nodes need to be of a certain kind, consistent with the current cluster design (as described above). In addition, a special LSF queue will be structured to give one or more designated researchers priority access to the contributed cores. In return, when those cores are unused, they will become part of the larger pool of LSF-managed compute node resources available to the Tufts research community.
For additional information, please contact Lionel Zupan, Associate Director for Research Computing, at x74933 or via email Lionel.Zupan@Tufts.edu.
Research Cluster Restrictions
Conditions of use for the research cluster include, but are not limited to, the following expectations. Additional related details may be found throughout this page.
Expectations |
---|
no user root access |
supported OS is RedHat 5 Enterprise version |
no user ability to reboot node(s) |
all cluster login access is via the headnode |
no user machine room access to cluster hardware |
no alternative Linux kernels other than that provided by RHEL 5 |
no access to Infiniband or Ethernet network hardware or software |
no user cron or at access |
no user servers/daemons such as HTTP, FTP, etc. |
all user jobs destined for compute nodes are submitted via LSF's bsub command |
all compute nodes follow one naming convention |
only UIT NFS storage is supported |
unused contributed node CPU time reverts to cluster user community |
no user contributed direct connect storage |
only limited outgoing Internet access from the headnode will be allowed; exceptions must be reviewed |
allow 2-week turnaround for software requests |
Only user home directories are backed up |
temporary public storage file systems have no quota and are subject to automated file deletions |
Cluster quality of service is managed through LSF queues and priorities |
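Since all compute-node jobs go through LSF's bsub command, a submission can be sketched as follows. This is a hedged illustration, not the cluster's documented procedure: the helper names, the queue name `example_queue`, and the program `./my_program` are hypothetical placeholders. The options used (`-q` for queue, `-n` for number of job slots, `-J` for job name, `-o` for the output file, with `%J` expanding to the job ID) are standard bsub options.

```python
import shlex
import subprocess


def build_bsub_command(program_args, queue=None, slots=1,
                       job_name=None, output_file=None):
    """Assemble a bsub command line as a list of arguments."""
    cmd = ["bsub"]
    if queue:
        cmd += ["-q", queue]        # target LSF queue
    cmd += ["-n", str(slots)]       # number of job slots requested
    if job_name:
        cmd += ["-J", job_name]     # job name shown by LSF
    if output_file:
        cmd += ["-o", output_file]  # file receiving the job's output
    cmd += list(program_args)       # the program to run and its arguments
    return cmd


def submit(cmd):
    """Run bsub; only meaningful on a host where LSF is installed."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Hypothetical queue and program names, for illustration only.
command = build_bsub_command(
    ["./my_program", "input.dat"],
    queue="example_queue",
    slots=8,
    job_name="demo",
    output_file="demo.%J.out",
)
print(shlex.join(command))
```

Calling `submit(command)` from the login node would hand the job to LSF; the queue names actually available should be confirmed with cluster-support@tufts.edu.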
Software request policy
Please send your request via email to cluster-support@tufts.edu and address the following questions:
- What is the name of the software?
- Where can additional information about the software be found?
- Who are the intended users of the software?
- When is it needed by?
- Will it be used in support of a grant and if so what grant?
- What special requirements, if any, does it have?
Cluster Storage Options
Click here for details.
Network Concurrent Software Licenses
Support venue
If you have any questions about cluster related usage, applications, or assistance with software, please contact cluster-support@tufts.edu.
Cluster software environment
Installed Cluster Software
Compilers, Editors, etc...
Frequently Asked Questions - FAQs:
Cluster Connections/Logins
Parallel programming related information
Account related FAQs:
X based graphics FAQs
Application specific Information FAQs
Linux and LSF information FAQs
Compilation FAQs
Miscellaneous FAQs
How busy is the cluster:
One way to get a sense of this is from the Ganglia link.
What email and web services are available on the cluster:
The cluster does not accept incoming mail, nor is a webserver available for public use. These services are provided elsewhere by Tufts.
What is the backup policy:
Data in your home directory is backed up automatically by UIT using a moving window of one year, so files up to one year old can be retrieved. Files less than one month old can be restored almost immediately. The policy adheres to industry-standard backup practices. To request a restore, contact the UIT support center at x73376. Have basic information ready, such as the file name(s), the directory they were in, and approximately when they existed.
Is SPSS or SAS available on the cluster:
Neither is available on the cluster. Software packages R and Stata provide that functionality instead.
Where can I find information about PC/Mac based Tufts software licenses such as SAS or SPSS:
Can I connect to the license server from home via my ISP to use Matlab on my Tufts laptop:
Programs such as Matlab that check out FlexLM-based network concurrent licenses cannot be used directly over the Internet the way they can on campus, because IP filtering limits license check-outs to the Tufts network domain. From off campus, you may use the Tufts VPN to obtain check-outs.
Cluster user use cases (please click on link)
2. Bioinformatics services
a. Emboss and wEmboss:
Access to Emboss software is available on the server http://emboss.uit.tufts.edu, which provides both shell and web access. In both cases you will need an account. The server hardware is a single quad-core 64-bit host with 4 GB of RAM.
For shell access to command line tools:
> ssh -Y emboss.uit.tufts.edu
For access to the web interface wEmboss.
For access to emboss web documentation.
Former GCG/Seqweb users can find equivalent Emboss functionality here:
Emboss tutorial
If you have any questions about Emboss related usage, applications, or assistance with software, please contact bio-support@tufts.edu.
Bioinformatics-related FAQs
Where are my old seqweb sequences:
Your old seqweb data is at: /nfshome/seqweb/users/your-user-name/
There you will find three directories with your data:
result state work
You may retrieve these with a file transfer program such as WinSCP (http://www.winscp.org) and store them locally on your PC/Mac. You may then use a local web browser to look at the old seqweb data. You may also cut and paste sequence data into a wEmboss web session.
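For users who prefer the command line to a graphical transfer program, the retrieval can be sketched with scp. This is only an illustration under stated assumptions: it assumes the old seqweb path is reachable through a shell account on emboss.uit.tufts.edu, which this page does not state outright, and the function name and the user name `jdoe` are hypothetical.

```python
import shlex

# The three directories named above that hold old seqweb data.
SEQWEB_DIRS = ("result", "state", "work")


def build_scp_commands(username, local_dir, host="emboss.uit.tufts.edu"):
    """Return one recursive scp command per seqweb directory.

    The default host is an assumption; confirm it with bio-support@tufts.edu.
    """
    remote_base = f"/nfshome/seqweb/users/{username}"
    return [
        ["scp", "-r", f"{username}@{host}:{remote_base}/{name}", local_dir]
        for name in SEQWEB_DIRS
    ]


# Print the commands for a hypothetical user instead of running them.
for cmd in build_scp_commands("jdoe", "seqweb_backup"):
    print(shlex.join(cmd))
```

Each printed command can be run from a local terminal that has scp installed, or passed to `subprocess.run` once the host has been confirmed.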
If I use the web interface to emboss where is data stored:
wEmboss data is written into a directory called wProjects under your shell account. The path will be: /home/your-user-name/wProjects/
Will I have access to my old gcg shell account and data:
Your home directory on the old bioinformatics server is mounted as your home directory on the new emboss server. However, access is via a shell login, not via the web interface, wEmboss.
b. CarmaWeb server
UIT and the Medical School host and support a web-based service known as CarmaWeb. The focus of CarmaWeb is genetic microarray analysis. CARMAweb is a web-based tool for the analysis of Affymetrix GeneChip, ABI microarrays and two-color microarrays. The analysis includes normalization and data preprocessing, detection of differentially expressed genes, cluster analysis and GO analysis. These tools are built upon Bioconductor and R software. One may request an account via the website.
Additional information here.
A CARMAweb tutorial is available here
The server hardware is a single quad-core 64-bit host with 4 GB of RAM.
If you have any questions about CarmaWeb related usage, applications, or assistance with software, please contact bio-support@tufts.edu.
3. Tufts Center for Scientific Visualization (or VisWall)
A description may be found here. The user guide is available here.
The research cluster is available to VisWall users for additional computational resources. Current connectivity follows standard practices using ssh and X11 forwarding. VisWall users with a cluster account may forward cluster-based application graphics output for display on the VisWall. Plans to integrate high-speed network connectivity between the VisWall and the research cluster are in development.
The schedule of monthly training classes on the use of the facility can be checked here.
4. GIS Center
Several GIS links can be found here.
The Tufts Research Cluster indirectly supports GIS spatial statistical computation through the modern spatial statistics packages available in R. This is a useful resource when faced with complex estimation tasks, long runtimes, or the need for more memory than is often available on desktop workstations. R packages such as the following are available:
fields, ramps, spatial, geoR, geoRglm, RandomFields, sp, spatialCovariance, spatialkernel, spatstat, spBayes, splancs
For additional information please contact cluster-support@tufts.edu.
5. Tufts ICPSR data subscription
The Inter-university Consortium for Political and Social Research (ICPSR) is a unit of the Institute for Social Research at the University of Michigan. ICPSR was established in 1962 to serve social scientists around the world by providing a central repository and dissemination service for computer-readable social science data, training facilities in basic and advanced techniques of quantitative social analysis, and resources that facilitate the use of advanced computer technology by social scientists.
The Tufts community may obtain research data and related web services from the ICPSR while one's computer is in the Tufts network domain. This is required for license authentication purposes. Special-case exceptions are possible, but need to be arranged ahead of time. For additional information please contact cluster-support@tufts.edu.