inference

Stable · 19 articles · 8 sources · Last updated: 1 day ago

Related topics: artificial intelligence, anthropic, google tpu, GPU, AI, Nvidia, AI agents

Latest articles

China's AI Hardware Lags US Despite Model Competitiveness

Chinese AI models are competitive with US counterparts, but hardware lags. While domestic chips are used for inference, pre-training still relies on foreign silicon. China is pushing for self-sufficiency in AI hardware across all development stages.

SCMP Tech

Xiaomi Breaks AI Inference Speed Record with MiMo-V2.5-Pro-UltraSpeed

Developing

Tech · 10.06.2026 · AI summary

Xiaomi achieves AI inference speed record with MiMo-V2.5-Pro-UltraSpeed, processing over 1,000 tokens per second on a single 8-GPU node, outperforming competitors with software innovations.

Decrypt

Chinese Researchers Achieve Breakthrough in AI Model Post-Training with Huawei Chips

Tech · 06.06.2026 · AI summary

Researchers from Huawei and several Chinese institutes carried out full-parameter post-training of a 1.6-trillion-parameter AI model on more than 1,000 Huawei chips, a step toward China's AI self-reliance.

SCMP Tech

Perplexity Unveils Hybrid AI Inference Orchestrator at Computex 2026

Developing

Tech · 03.06.2026 · AI summary

Perplexity CEO Aravind Srinivas announced a new hybrid local-server AI inference orchestrator at Computex 2026. The system automatically decides where to run AI tasks, balancing local processing for privacy with cloud power for complex computations, aiming for optimal "token value per watt."

Decrypt
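The summary describes an orchestrator that decides, task by task, whether to run a model on the device or on a server. The Python sketch below illustrates one possible routing rule. It is not Perplexity's implementation: the `Task` fields, the `route` function and the `LOCAL_MAX_TOKENS` limit are all hypothetical. The sketch assumes that privacy-sensitive work stays local whenever the local model can handle its size, and that large or complex tasks otherwise go to the cloud.

```python
from dataclasses import dataclass

# Hypothetical context limit for the on-device model (an assumption, not a real spec).
LOCAL_MAX_TOKENS = 8_000


@dataclass
class Task:
    # All fields are illustrative; Perplexity has not published such a schema.
    prompt_tokens: int        # size of the input in tokens
    privacy_sensitive: bool   # e.g. touches local files or personal data
    needs_large_model: bool   # e.g. complex reasoning beyond the local model


def route(task: Task) -> str:
    """Return "local" or "cloud" for a task under simple, assumed rules."""
    # Privacy takes priority: keep sensitive work on-device if it fits the local model.
    if task.privacy_sensitive and task.prompt_tokens <= LOCAL_MAX_TOKENS:
        return "local"
    # Complex tasks, or inputs too large for the local model, go to the server.
    if task.needs_large_model or task.prompt_tokens > LOCAL_MAX_TOKENS:
        return "cloud"
    # Otherwise prefer local execution to save server compute.
    return "local"


print(route(Task(prompt_tokens=500, privacy_sensitive=False, needs_large_model=True)))
```

A production orchestrator would presumably weigh measured energy, latency and cost, in the spirit of the "token value per watt" metric Srinivas mentioned, instead of relying on fixed thresholds. The fixed rules here only illustrate the local-versus-cloud decision.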

MiniMax Unveils M3 AI Model for Coding and Automation

Tech · 01.06.2026 · AI summary

Chinese AI startup MiniMax launched its M3 model. The company says M3 needs less compute, runs faster and can process up to 1 million tokens. It reportedly outperforms rivals such as GPT-5.5 and Gemini 3.1 Pro on coding benchmarks.

SCMP Tech

Groq Seeks $650 Million in New Funding for Inference Neocloud Business

News · 31.05.2026 · AI summary

Groq aims to raise $650 million from existing investors to expand its inference neocloud business, which runs on the company's own AI chips and systems. The fundraising follows a $20 billion licensing deal with Nvidia.

TechCrunch

China Adds AI Chips to 'Secure' Tech Assessments, Expanding Domestic Tech Drive

Tech · 27.05.2026 · AI summary

For the first time, China has included AI chips in its 'secure and reliable' technology assessments. The move is part of the Xinchuang initiative to replace foreign hardware and software with domestic alternatives, and it targets AI infrastructure amid US export controls.

SCMP Tech

Nvidia Expands Its Inference Kingdom, Report Says

Developing

Tech · 24.05.2026 · AI summary

A report by SemiAnalysis suggests Nvidia is dominating the AI inference market, focusing on cost-effectiveness, speed, and volume rather than just model performance. The company offers integrated solutions from GPUs to data center design, aiming to lower the cost per token.

ITmedia

NVIDIA's "AI Factory" Strategy: Leading the "Inference Race"

Developing

Tech · 22.05.2026 · AI summary

NVIDIA is shifting from a GPU manufacturer to an "AI factory" provider, optimizing entire data centers for AI inference. This strategy addresses the explosive growth in inference demand, driven by lower costs and the rise of AI agents, marking a transition from the "learning race" to the "inference race."

ITmedia

Alibaba Cloud Summit: AI as 'Manufacturing', New LLM Launched

Tech · 20.05.2026 · AI summary

At the Alibaba Cloud Summit, Alibaba senior vice president Liu Weiguang described AI as a new form of manufacturing and called the company's operations "China's AI factory." Alibaba also launched Qwen3.7-Max, a new LLM built for AI agents.

SCMP Tech

How ByteDance plans to turn OpenClaw craze into a profitable AI business

Tech · 13.05.2026

ByteDance’s Volcano Engine, the cloud unit that released an OpenClaw-based cloud agent tool ArkClaw, is betting that the next phase of artificial intelligence will hinge on cheaper tokens, higher inference efficiency and longer context windows. “Agent-related token consumption still accounts for a single-digit percentage of total token usage, but it is growing,” said Li Guodong, chief architect of ArkClaw, on Tuesday on the sidelines of OpenClaw’s first mainland China event since the open-source...

SCMP Tech

As Anthropic announces partnership with SpaceX, Elon Musk shares ‘background check’ of Claude team

News · 07.05.2026

Elon Musk's xAI has leased its Colossus 1 supercomputer to Anthropic, a move that follows Musk's public shift from criticizing the AI lab to expressing confidence in its leadership. This deal provides Anthropic with significant GPU capacity, addressing its urgent compute needs and enabling immediate inference workloads. The partnership also hints at future collaborations for orbital AI compute infrastructure.

Times of India

Anthropic in Talks to Acquire AI Inference Chips from UK Startup Fractile

Developing

Business · 02.05.2026 · AI summary

AI company Anthropic is reportedly negotiating a deal to purchase specialized AI inference chips from the British startup Fractile.

Economic Times

How Google's Secret Chip Empire Quietly Became AI's Biggest Competitive Weapon

Developing

Tech · 29.04.2026 · AI summary

The article describes how Google quietly developed custom AI chips (TPUs) from 2016 onward, which positioned it to compete with Nvidia once the AI boom arrived. Google appeared to be caught off guard by ChatGPT in November 2022, but by then it had already spent years building its own silicon. The company now offers TPU access through Google Cloud and has signed Meta as a customer, news that sent Nvidia's stock lower. Google recently announced TPU v8, with separate configurations for training and inference workloads, as part of its push to become a full-stack AI company.

Times of India

GitHub Copilot Shifts to Usage-Based Pricing as AI Costs Surge

Developing

Tech · 28.04.2026 · AI summary

GitHub announced it will transition GitHub Copilot to a usage-based billing model starting June 1, replacing the current flat subscription allocation system with AI Credits tied to monthly payments. The change comes as inference costs have nearly doubled since January, driven by agentic AI assistants consuming massive token volumes. Additional usage beyond credits will be priced based on token consumption at varying API rates depending on model sophistication.

Ars Technica
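Under the usage-based model described above, the amount billed beyond the included credits depends on how many tokens were consumed and on which model processed them. The Python sketch below works through that arithmetic with hypothetical numbers. The model tiers, the per-million-token rates in `RATES_PER_MTOK` and the dollar value of the included credit are assumptions, not GitHub's published prices.

```python
# Hypothetical per-million-token rates in USD by model tier (not GitHub's real prices).
RATES_PER_MTOK: dict[str, float] = {"small": 0.50, "large": 5.00}


def monthly_overage(usage: dict[str, int], included_credit_usd: float) -> float:
    """Return the USD amount billed beyond the plan's included credit.

    `usage` maps a model tier to the number of tokens consumed this month.
    """
    # Price each tier's tokens at that tier's rate, then add up the totals.
    cost = sum(
        tokens / 1_000_000 * RATES_PER_MTOK[tier] for tier, tokens in usage.items()
    )
    # Only spending above the included credit is billed as overage.
    return max(0.0, cost - included_credit_usd)


print(monthly_overage({"small": 4_000_000, "large": 3_000_000}, included_credit_usd=10.0))
# prints 7.0
```

Under these assumed rates, 4 million tokens on the small tier and 3 million on the large tier cost $2 + $15 = $17. A plan with $10 of included credit would therefore be billed $7 of overage.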

Google Unveils Eighth-Generation TPUs Split Into Training and Inference Variants

Tech · 22.04.2026 · AI summary

Google announced its eighth-generation Tensor Processing Units and split the lineup into TPU8t for training and TPU8i for inference. The training variant delivers 121 EFLOPS of FP4 compute per 9,600-chip pod, nearly triple Ironwood's capacity, and Google claims it offers twice the performance per watt. The inference variant has 384 MB of on-chip SRAM and runs in pods of 1,152 chips. Google presents the lineup as a response to the 'agent era', which it says requires specialized hardware, and as a direct competitor to Nvidia's AI accelerators.

Ars Technica
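The pod figures can be converted into rough per-chip and per-pod numbers. The arithmetic below rests on two readings of the summary, not on confirmed specifications. First, it assumes the 121 EFLOPS is spread evenly across all 9,600 TPU8t chips. Second, it assumes the 384 MB of SRAM is a per-chip figure for TPU8i. It also takes 1 GB = 1,000 MB.

```latex
\begin{aligned}
\frac{121 \times 10^{18}\ \text{FLOPS}}{9{,}600\ \text{chips}} &\approx 1.26 \times 10^{16}\ \text{FLOPS} = 12.6\ \text{PFLOPS (FP4) per TPU8t chip} \\
1{,}152\ \text{chips} \times 384\ \text{MB} &= 442{,}368\ \text{MB} \approx 442\ \text{GB of SRAM per TPU8i pod}
\end{aligned}
```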

Google splits AI training and inference into separate TPU chips in latest Nvidia challenge

Tech · 22.04.2026 · AI summary

Google said its eighth-generation TPU family will separate AI training and inference into different chips, aiming to improve efficiency as demand grows for specialized AI hardware and agent-based systems.

CNBC

Jensen Huang backs Nvidia over TPUs, Google DeepMind CEO Demis Hassabis pushes back

News · 20.04.2026

The AI race is shifting from training to inference, with Google challenging Nvidia's dominance. Google is doubling down on its custom TPUs for faster, cheaper AI responses, aiming to power the next wave of AI agents. This strategic move, leveraging a decade of chip design, positions Google as a formidable competitor as AI usage scales globally.

Times of India

Anthropic Signs Massive AI Compute Deal with Google, Potentially Worth Hundreds of Billions

Developing

Tech · 13.04.2026 · AI summary

Anthropic has secured a multi-gigawatt AI compute deal with Google and Broadcom, potentially worth hundreds of billions, favoring Google's TPUs over Nvidia's GPUs. This move highlights the growing importance of specialized chips for AI inference.

Times of India