NVIDIA founder and CEO Jen-Hsun Huang has unveiled the company’s new inference platform for accelerated computing and AI: NVIDIA TensorRT 3.
NVIDIA TensorRT™ is used by developers to run high-performance inference on NVIDIA GPUs. It takes a network definition and optimizes it by transforming weights, merging layers and tensors, and choosing efficient intermediate data formats.
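TensorRT’s fusion passes are internal to the library, but the idea behind “merging layers” can be illustrated with a toy example: two back-to-back linear layers collapse algebraically into one, so inference performs a single matrix multiply instead of two. The sketch below is plain Python with hypothetical names (`W1`, `b1`, `W2`, `b2`), not TensorRT code:

```python
def matmul(A, B):
    # Naive matrix multiply: A (m x n) times B (n x p).
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def fuse_linear(W1, b1, W2, b2):
    # Two layers y = W2 @ (W1 @ x + b1) + b2 fold into one:
    #   y = (W2 @ W1) @ x + (W2 @ b1 + b2)
    W = matmul(W2, W1)
    b = [sum(W2[i][k] * b1[k] for k in range(len(b1))) + b2[i]
         for i in range(len(b2))]
    return W, b
```

The fused layer produces identical outputs while halving the per-inference matrix multiplies; TensorRT applies the same principle (along with kernel fusion and precision selection) at a much larger scale.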
TensorRT and AI
With TensorRT, developers can focus on building AI-powered applications rather than on performance tuning for inference deployment.
What’s new in TensorRT 3?
TensorRT 3, the third generation of TensorRT, is the key to unlocking optimal inference performance on Volta GPUs. NVIDIA reports up to 40x higher throughput at under 7 ms real-time latency versus CPU-only inference.
- 3.7x faster inference on Tesla V100 vs. Tesla P100 at under 7 ms real-time latency.
- 18x faster deployment and optimization of TensorFlow models.
- A Python API for ease of use and improved productivity.
Also, a traditional CPU-based inference platform of 160 dual-CPU servers draws 65 kW, while NVIDIA claims its GPU-based solution is 2,066% more efficient. This is why NVIDIA is pushing into the AI market.
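The 2,066% figure can be unpacked: “X% more efficient” means the same work for 1/(1 + X/100) of the energy. A quick back-of-the-envelope calculation (only the 65 kW draw and the percentage come from the article; the derived power figure follows from those numbers alone):

```python
cpu_power_kw = 65.0        # 160 dual-CPU servers (figure from the article)
more_efficient_pct = 2066  # claimed efficiency gain of the GPU solution

ratio = 1 + more_efficient_pct / 100  # ~21.7x the efficiency
gpu_equiv_kw = cpu_power_kw / ratio   # power for the same throughput, derived
print(round(ratio, 2), round(gpu_equiv_kw, 1))  # prints: 21.66 3.0
```

In other words, the claim implies the GPU platform would draw roughly 3 kW for the same inference throughput.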
In conclusion, TensorRT 3 is a huge leap for artificial intelligence. NVIDIA says it has what it takes to lead in artificial intelligence (AI), and it surely does!