The dedicated big data cluster is designed to support rapid prototyping of data-intensive work at a smaller scale than the Cedar supercomputer.
Benefits of the interactive big data cluster include:
- The MapReduce programming model offers a simplified approach to storing, accessing, and distributing large datasets across multiple servers
- Spark lets you process large datasets and is well suited to machine learning and graph algorithms. It also includes Spark SQL, which helps extract valuable information from large datasets by running SQL queries on distributed data
- Support for machine learning, natural language processing, and graph workloads
- An interactive cluster powered by Hadoop, an open-source distributed processing framework that supports big data tools
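To make the map-reduce model mentioned above more concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop or Spark, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across nodes and handle the grouping itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, mimicking what the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data cluster", "big data tools"]
    print(word_count(docs))
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be spread across many servers, which is the property the model relies on.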
Learn how you can accelerate your research with this free, dedicated resource.
Hadoop Cluster Specs
TB disk space across 7 nodes, connected via a 100 Gbit/s network
shelves with 84 drives, for a total of 2.3 PB of usable storage
GPU Node Specs
Large GPU node with 4 NVIDIA Pascal P100 16 GB graphics cards
Base node with 48 cores and 192 GB memory
General Compute Node Specs