The big picture: After attempting (and failing) to position Watson as the next-generation platform for AI applications, IBM is now focusing on building hardware for the latest generative AI models. The market is evolving, AI technology is moving into production, and Big Blue is eager to claim a share of the market Nvidia dominates sooner rather than later.
IBM recently announced the Telum II Processor and the Spyre Accelerator, two chip designs aimed at helping customers run modern AI workloads. The corporation, naturally, prioritizes selling its own hardware, which is why both chips are exclusively compatible with IBM Z mainframe computers.
Telum II is the latest iteration of the Telum architecture, which IBM introduced in 2021. IBM stated that the new chip was developed using Samsung's 5nm manufacturing process and features eight high-performance cores running at 5.5GHz. The company also revealed a 40 percent increase in on-chip cache memory, with virtual L3 and L4 capacities expanding to 360MB and 2.88GB, respectively.
The Telum II chip also includes a novel data processing unit, designed to accelerate I/O operations directly within the CPU. "These hardware enhancements are designed to provide significant performance improvements for clients over previous generations," IBM stated. The AI accelerator integrated in each Telum II processor is expected to deliver a 4x increase in compute capacity over the previous generation, reaching 24 trillion operations per second (TOPS).
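The quoted figures can be turned into a quick back-of-envelope check. The short Python sketch below uses only the numbers cited above; it assumes (this is not stated by IBM here) that the 40 percent cache increase applies to the 360MB figure and that 1GB is counted as 1,000MB. The derived values are rough implications, not official specifications.

```python
# Back-of-envelope arithmetic using only figures quoted in the article.
# Assumptions (not from IBM): the 40 percent growth applies to the 360MB
# figure, and 1 GB is treated as 1000 MB.
TELUM2_TOPS = 24.0        # quoted AI throughput per chip
TOPS_SPEEDUP = 4.0        # quoted generational increase
TELUM2_CACHE_MB = 360.0   # quoted virtual L3 capacity
CACHE_GROWTH = 0.40       # quoted 40 percent increase
VIRTUAL_L4_GB = 2.88      # quoted virtual L4 capacity


def implied_previous(value, factor):
    """Return the prior-generation value implied by a growth factor."""
    return value / factor


prev_tops = implied_previous(TELUM2_TOPS, TOPS_SPEEDUP)
prev_cache_mb = implied_previous(TELUM2_CACHE_MB, 1.0 + CACHE_GROWTH)
l4_to_l3_ratio = (VIRTUAL_L4_GB * 1000.0) / TELUM2_CACHE_MB

print(f"Implied previous-generation throughput: {prev_tops:.1f} TOPS")
print(f"Implied previous-generation cache: {prev_cache_mb:.0f} MB")
print(f"Virtual L4 is {l4_to_l3_ratio:.1f}x the virtual L3 capacity")
```

Under these assumptions, the previous generation works out to roughly a quarter of the new throughput, and the virtual L4 capacity is a whole multiple of the virtual L3, which would be consistent with L4 pooling the cache of several chips.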
TOPS alone don't tell the whole story, IBM stated. The Telum architecture has been improved and optimized for today's AI ecosystem, with high throughput and low-latency inferencing. The new chip also supports INT8 data types, which should improve efficiency for workloads that use 8-bit integer arithmetic, such as newer AI models.
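For readers unfamiliar with INT8, the idea is to store and compute with 8-bit integers instead of wider floating-point values, trading a small amount of precision for speed and memory savings. The following is a minimal, generic sketch of symmetric per-tensor quantization in plain Python. It illustrates the general technique only and says nothing about how IBM's hardware or software implements it; the function names and sample weights are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 values back to approximate floats."""
    return [q * scale for q in quantized]


# Hypothetical model weights, for illustration only.
weights = [0.82, -1.27, 0.05, 0.0, 0.64]
q_weights, scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale)

# The rounding error per value is bounded by half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights, scale, max_error)
```

Each weight now fits in a single byte, and the worst-case error stays within half a quantization step, which is why INT8 is a common choice for inference.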
The second piece of AI hardware introduced by IBM at Hot Chips 2024 is the Spyre Accelerator, a PCIe card containing 32 AI accelerator cores whose architecture is similar to that of the AI accelerator included in the Telum II processor. IBM suggests that potential customers use both the Telum II and Spyre to run larger sets of AI models in what the company calls "ensemble AI" use cases.
The ensemble AI method leverages multiple AI models to enhance performance and accuracy in the final results. IBM explained this technology using a claims fraud detection example, where the initial risk assessment made by traditional neural networks is combined with large language models. According to IBM, ensemble AI techniques can help organizations comply with regulatory requirements while mitigating financial crimes.
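To make the ensemble idea concrete, here is a minimal Python sketch of one possible arrangement: a fast model scores every claim, and only borderline cases are passed to a second, heavier model whose score is blended in. Everything in it is hypothetical, including the `Claim` fields, the two stand-in scoring functions, the thresholds, and the weighting. It is not IBM's implementation, and real systems would call trained models rather than hand-written rules.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    amount: float
    prior_claims: int
    description: str


def fast_risk_score(claim):
    """Stand-in for a small traditional model: returns a risk score in [0, 1]."""
    score = 0.1 + 0.00002 * claim.amount + 0.1 * claim.prior_claims
    return min(1.0, score)


def language_model_score(claim):
    """Stand-in for a large language model reading the free-text description."""
    suspicious_terms = ("cash only", "no receipt", "lost paperwork")
    hits = sum(term in claim.description.lower() for term in suspicious_terms)
    return min(1.0, 0.2 + 0.3 * hits)


def ensemble_score(claim, low=0.3, high=0.7, llm_weight=0.6):
    """Blend both models, but only consult the heavier one for borderline claims."""
    first = fast_risk_score(claim)
    if first < low or first > high:
        return first  # The fast model is confident enough on its own.
    second = language_model_score(claim)
    return (1.0 - llm_weight) * first + llm_weight * second


routine = Claim(amount=5000, prior_claims=0, description="Cracked windshield")
borderline = Claim(amount=10000, prior_claims=2,
                   description="No receipt, cash only repair")

print(ensemble_score(routine))
print(ensemble_score(borderline))
```

The design choice here is that the cheaper model handles the clear-cut cases, so the more expensive model only runs where a second opinion is most likely to change the outcome.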
The Telum II processor and Spyre Accelerator have broad use cases. IBM highlighted that its new chips can support fraud detection, advanced anti-money laundering models, and more. They can also be used to develop AI assistants, the company added.