Description:
The emergence of distributed, extreme-scale science applications is generating significant challenges for data transfer. Ideally, high-performance data transfer should reach terabit/s throughput to make full use of the underlying networks and to support real-time and deadline-bound transfers.
Although significant improvements have been made in the area of bulk data transfer, currently available data transfer tools and services cannot meet these challenges, for the following reasons:
- Existing data transfer tools and services lack a data-transfer-centric approach to seamlessly and effectively integrate and coordinate the various entities in an end-to-end data transfer loop.
- Existing data transfer tools and services lack effective mechanisms to minimize cross-interference between data transfers.
- Existing data transfer tools and services are oblivious to user (or user application) requirements (e.g., deadlines and QoS requirements).
- Inefficiencies arise when existing data transfer tools are run on data transfer nodes (DTNs).
These are common and fundamental problems for bulk data transfer in the extreme-scale era.
BigData Express provides:
- A data-transfer-centric architecture to seamlessly integrate and effectively coordinate the various resources in an end-to-end data transfer loop
- SDN and SDS to improve network and storage I/O performance
- A time-constraint-based scheduler to schedule data transfer tasks
- An admission control mechanism to provide guaranteed resources for admitted data transfer tasks
- A rate control mechanism to improve data transfer schedulability and reduce cross-interference between data transfers
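To illustrate how a time-constraint-based scheduler, admission control, and rate control can fit together, the following is a minimal sketch, not BigData Express code. It assumes preemptible transfers that all start now and share one link of known capacity; the task names, sizes, and the `admit`/`minimum_rates` helpers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TransferTask:
    name: str
    size_bytes: float   # total bytes to move
    deadline_s: float   # deadline, in seconds from now

def admit(admitted, candidate, capacity_Bps):
    """Admission test (hypothetical sketch): accept the candidate only if
    every deadline can still be met on a shared link of capacity_Bps.
    Feasibility check: walking deadlines earliest-first, the cumulative
    bytes due by each deadline d must not exceed capacity_Bps * d."""
    proposed = sorted(admitted + [candidate], key=lambda t: t.deadline_s)
    bytes_due = 0.0
    for t in proposed:
        bytes_due += t.size_bytes
        if bytes_due > capacity_Bps * t.deadline_s:
            return False  # candidate would cause a deadline miss
    return True

def minimum_rates(admitted):
    """Per-task pacing rate that just meets each deadline; capacity left
    over stays unallocated, damping cross-interference between transfers."""
    return {t.name: t.size_bytes / t.deadline_s for t in admitted}
```

For example, on a hypothetical 10 Gb/s DTN link (1.25e9 bytes/s), a 5 TB transfer due in two hours can coexist with a 4 TB transfer due in one hour, so `admit` accepts the second task and `minimum_rates` paces each one at just the rate its deadline requires.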
BigData Express is a schedulable, predictable, and high-performance data transfer service for large-scale scientific computing facilities such as LCF, NERSC, and US-LHC, and for their many collaborators in industry and academia.