
Traditional AI projects face hardware limitations such as memory bandwidth constraints, the lack of dedicated machine learning hardware in CPUs, and limited parallel processing capability [1]. These limitations can lead to bottlenecks, slower processing times, and reduced model accuracy, especially for large-scale and complex machine learning tasks.

GPUs have also fallen short for scaling AI workloads, often because of poor utilization and overspending on hardware that did not match the requirements of large-scale AI applications [5]. In addition, the global chip shortage and surging demand for AI made GPUs expensive and hard to obtain, further hindering the scaling of AI inference.