API Gateway now supports response streaming, letting you send partial responses to clients immediately instead of waiting for full payloads. That is critical for LLM-powered apps and long-running operations.
Amazon API Gateway's new response streaming feature changes how REST APIs deliver data. Instead of buffering entire responses server-side before transmission, API Gateway now streams bytes to clients as they become available. This drops time-to-first-byte dramatically (the post cites a 98% reduction in Total Blocking Time for one customer) and extends the payload limit beyond the previous 10MB cap, with request timeouts up to 15 minutes.
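On the client side, the benefit only shows up if you read the body incrementally instead of waiting for all of it. Here is a minimal sketch using only the Python standard library. The helper names (`stream_text`, `print_stream`) and any URL you pass in are hypothetical, and nothing here is specific to API Gateway; it applies to any HTTP endpoint that streams its response.

```python
import codecs
import io
import urllib.request
from typing import BinaryIO, Iterator


def stream_text(body: BinaryIO, chunk_size: int = 256) -> Iterator[str]:
    """Yield decoded text as soon as each chunk of the body can be read."""
    # An incremental decoder copes with multi-byte UTF-8 characters
    # that happen to be split across two chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # empty bytes means the stream has ended
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def print_stream(url: str) -> None:
    """Print a streamed response piece by piece (url is a placeholder)."""
    with urllib.request.urlopen(url) as response:
        for piece in stream_text(response):
            print(piece, end="", flush=True)


if __name__ == "__main__":
    # Offline demo: an in-memory byte stream stands in for the network body.
    fake_body = io.BytesIO("partial results arrive early".encode("utf-8"))
    for piece in stream_text(fake_body, chunk_size=8):
        print(repr(piece))
```

Note that `read(chunk_size)` may wait until that many bytes have arrived, so a small chunk size keeps the first output prompt at the cost of more read calls.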

Google just open-sourced an MCP server that lets you manage Kubernetes clusters through natural language prompts.
The new GKE Gemini CLI extension bundles GKE-specific context, prompts, and tools into Gemini CLI, letting you interact with clusters conversationally. It also works as a standalone MCP server with any MCP-compatible client. Install takes a single command, and the extension adds GKE resources, pre-built prompts for common workflows, and integrated tooling for Cloud Observability.

Security Incident Response now auto-gathers evidence from CloudTrail, IAM, EC2, and cost data—then correlates it into a timeline before you even start investigating.
AWS has added an "investigative agent" to its Security Incident Response service. When you create a case (manually or triggered from GuardDuty/Security Hub), the agent asks clarifying questions (which credentials? what timeframe? which account?), then automatically queries CloudTrail logs, pulls IAM configurations, checks EC2 instance details, and analyzes cost patterns for anomalies. Minutes later, you get a summary with critical findings, an events timeline, and affected resources—ready for containment decisions or escalation to AWS CIRT.
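To make "correlates it into a timeline" concrete, here is a minimal sketch of the general idea: merging events from several sources into one chronological list. This is not AWS's implementation. The `Event` type, the `build_timeline` helper, and the sample records are all hypothetical and exist only for illustration.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str  # e.g. "cloudtrail", "iam", "ec2" (illustrative labels)
    detail: str


def build_timeline(*sources: Iterable[Event]) -> List[Event]:
    """Merge per-source event lists into one list ordered by timestamp."""
    # Sort each source on its own, then do a k-way merge across sources.
    ordered = [sorted(events, key=lambda e: e.time) for events in sources]
    return list(heapq.merge(*ordered, key=lambda e: e.time))


if __name__ == "__main__":
    # Made-up sample records, not real log output.
    trail = [Event(datetime(2025, 1, 1, 10, 5), "cloudtrail", "CreateAccessKey")]
    iam = [Event(datetime(2025, 1, 1, 10, 2), "iam", "policy attached")]
    ec2 = [Event(datetime(2025, 1, 1, 10, 9), "ec2", "instance launched")]
    for event in build_timeline(trail, iam, ec2):
        print(event.time.isoformat(), event.source, event.detail)
```

A real investigation would also need to normalize time zones and link events by principal or resource; the sketch covers only the ordering step.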
If you're building internal platforms for ML/AI teams, this addresses the "how do we safely let agents run arbitrary code" problem at the infrastructure layer. The separation between the Python SDK (for AI engineers) and the underlying Kubernetes primitives (for platform ops) is intentional. It's being built as a CNCF project, so you're not locked into GKE, though the snapshot and pre-warming features are GKE-exclusive for now.


