Posts

This week in AI updates: GitHub Copilot SDK, Claude’s new constitution, and more (January 23, 2026)

GitHub Copilot SDK now in technical preview

The SDK allows developers to embed agentic capabilities into their applications using the same execution loop used by the GitHub Copilot CLI. The SDK repository includes setup instructions, starter examples, and SDK references for all of the supported languages. GitHub recommends starting by defining a single task, such as updating files or running a command, and letting Copilot plan and execute steps while the application supplies domain-specific tools and constraints.

Anthropic drafts new constitution for Claude models

The constitution is Anthropic’s vision for Claude’s values and behavior. The main sections in this updated version include specifications related to helpfulness, ethics, safety, nature, and guidelines for how to handle specific issues, like medical advice or cybersecurity requests. “The constitution is a crucial part of our model training process, and its content directly shapes Claude’s behavior. Training models is a ...
Recent posts

New Relic adds monitoring for ChatGPT apps

New Relic customers will now be able to monitor their custom ChatGPT apps to ensure they’re delivering the intended performance, reliability, and user experience. “Bringing business services into the natural flow of a ChatGPT conversation is a powerful, intuitive, and revenue-generating strategy,” said Brian Emerson, chief product officer of New Relic. “But once your carefully crafted application instantiates inside ChatGPT, it traditionally enters a black box where standard browser monitoring tools can fail.” The company went on to explain that when an app is rendered in a conversation, developers can’t see things like layout shifts or broken buttons. Additionally, security headers, content security policies, iframe sandbox rules, and limitations on client-side storage can hide important performance and user experience data. New Relic’s answer to this problem is to send in an agent that can collect and analyze data. It can track PageViews, PageViewtimings, and AjaxRequests, provi...

Testlio launches new AI-powered QA analysis solution

Testlio has announced the release of a new AI-driven QA analysis solution called LeoInsights. The new platform is powered by the company’s intelligence layer LeoAI Engine, which was trained on 13 years of testing data, 2.6+ million test cases, and 600,000+ devices. It can provide executive summaries featuring key changes, emerging risks, and critical issues, simplifying multiple QA reports into one that can be shared with leaders. LeoInsights also offers a value calculator that quantifies efficiency gains, cost savings, and quality impact, helping QA teams better demonstrate their value to leadership. The calculator can aggregate data across workspaces, do scenario modeling with adjustable inputs, and generate PDFs that can be shared with executives for budgeting and investment discussions. The tool can also provide alerts when unusual trends and anomalies are spotted, helping QA teams discover risks that they might not have otherwise noticed. It also provides app review and senti...

Codenotary updates its free SBOM scanning tool with capabilities that better support AI apps

Codenotary is adding new capabilities to its SBOM.sh service, which provides free analysis of software bills of materials (SBOMs). According to the company, the updates were made in consideration of AI applications, and the tool now treats datasets as software supply chain artifacts. “Traditional SBOM tools were built for an earlier era – focusing primarily on source code to improve visibility into the software supply chain,” said Moshe Bar, CEO and co-founder of Codenotary. “Security teams are swimming in SBOMs, but they’re not getting the actionable clarity they need — especially as AI transforms software with AI applications are built on datasets which are entirely ignored by traditional SBOMs.” It now provides documentation of dataset sources, licensing terms, and governance controls, which helps organizations be more audit-ready. SBOM.sh also now captures lineage metadata, such as base-model origins, fine-tuning history, version identifiers, and update pathways. Additionally...

AI lacks theory of mind – why that matters

Large language models (LLMs) and their abilities have garnered a lot of interest, but there’s one ability that remains solely human. We don’t share it with mammals or machines. That ability is called “theory of mind,” and it’s the mind-reading ability that allows us to coordinate and collaborate with others. Mind reading sounds like the power of some superhero or supervillain. However, the truth is that even babies do it. We learn to predict what others are thinking and how they’ll react. Babies learn this skill around three years of age. They begin to recognize what others do – and don’t – know. While comic books make the superpower sound like reading every thought and every memory, our everyday human power of mind reading is limited to awareness, lack of awareness, and simple prediction. As adults, this power allows us to do things like joint cooperation (animals can only use parallel cooperation) – and has allowed us to become the d...

GitLab’s Duo Agent Platform is now generally available

GitLab has made its Duo Agent Platform generally available, providing development teams with agentic AI automation that has access to an organization’s full context, standards, and guardrails. The GA release includes Agentic Chat, providing context-aware assistance throughout the GitLab platform. Agentic Chat builds on the previously released Duo Chat, brings in context from issues, merge requests, pipelines, security findings, and more, and can perform actions on a developer’s behalf. For example, in the Web UI, Agentic Chat can create issues, epics, and merge requests, highlight key findings, and create actionable guidance based on organizational context. Additionally, in the IDE, it can generate code, configurations, and infrastructure-as-code, as well as fix bugs, generate tests, and produce documentation. Other ways Agentic Chat can be used are helping developers understand, configure, or troubleshoot CI/CD pipelines or create new ones, and on the security front, it can exp...

ScyllaDB Releases Integrated Vector Search: 1B Vectors with 2ms P99s and 250K QPS Throughput

ScyllaDB today announced the general availability of its new Vector Search capability, which is integrated into ScyllaDB X Cloud. This high-performance vector search supports the industry’s largest models with low TCO. ScyllaDB is commonly used for real-time AI workloads such as latency-sensitive machine learning, predictive analytics, and fraud detection. It is trusted by high-growth companies such as Tripadvisor, ShareChat, and Freshworks to power large-scale latency-sensitive feature stores. As ScyllaDB’s customers began adopting vector search, many found standalone vector databases to be overly complex and costly at scale. In response, ScyllaDB added Vector Search to its ScyllaDB Cloud offering. ScyllaDB Vector Search is built on ScyllaDB’s shard-per-core architecture with a Rust-based extension that leverages USearch, the industry-standard ANN search library. The architecture separates storage and indexing responsibilities while keeping the system unified from the user’s perspe...

MetalBear launches mirrord for CI to improve testing process for cloud native apps

MetalBear is launching a new tool that allows development teams to run CI tests against Kubernetes environments without needing to deploy code to them or spin up test environments. According to MetalBear, testing cloud native applications can be difficult because a change to a single service requires testing it alongside other services to see how it behaves. This is typically accomplished by spinning up new cloud environments or using local Kubernetes tools. However, spinning up new environments can take 20-30 minutes, increase cloud costs, and add ongoing maintenance, and local tools have their own drawbacks because local clusters don’t always behave like real ones. Mirrord for CI aims to address these concerns by securely connecting a runner to an existing Kubernetes cluster, and then running a test suite with real services, dependencies, and traffic, enabling development teams to test against real conditions. “Your code, i.e. the microservice in the branch you want to merge, runs in t...

Report: Companies with technical debt unlikely to see benefits from AI adoption

Organizations that have modernized their applications are three times more likely to see a clear ROI on their AI investments compared to those that haven’t, according to a new survey from Cloudflare. The 2026 App Innovation Report found that 93% of leaders believe that updating their software was “the single most important factor in boosting their AI capabilities.” Organizations that have fallen behind on their modernization efforts report being 85% less confident in their infrastructure. Those who fall into that camp often only modernize reactively after a security breach happens. Additionally, companies that align security with modernization are four times more likely to reach advanced AI maturity. “If you aren’t modernizing your business to embrace AI and prevent the next wave of cyberattacks, you aren’t just standing still, you’re rapidly falling behind. The winners of this era of the Internet will ultimately be defined by their infrastructure,” said Matthew Prince, CEO and co...

Chainguard adds 10 new projects to EmeritOSS program for prolonging the life of open source tools

Chainguard is adding 10 new open source projects to EmeritOSS, its program for supporting mature open source projects that don’t require continuous upkeep or whose maintainers need to step away. “EmeritOSS exists for the projects that have earned their stripes. They’ve shipped, scaled, and supported real systems, and while their maintainers may be ready to step back, the software itself still has plenty of life left. EmeritOSS provides continuity-focused stewardship for mature projects by maintaining public, non-competitive forks, addressing security issues through dependency updates and releases, and clearly documenting support boundaries,” Chainguard wrote in a blog post. EmeritOSS first launched in December with three starting projects: Kaniko, Kubeapps, and ingress-nginx. The 10 new projects that are being added span object storage, monitoring, data processing, backup integrations, and observability. They include: MinIO Prometheus PushProx Cassandra Exporter Prometheus ...

This week in AI updates: Google’s UCP standard, a redesigned Slackbot, and more (January 16, 2026)

Google unveils new open-source standard for agentic commerce

Google has announced a new open-source standard for agentic commerce called the Universal Commerce Protocol (UCP). Developed in collaboration with a number of commerce companies, including Shopify, Etsy, Wayfair, Target, and Walmart, UCP establishes a common language and primitives for the commerce journey between consumer surfaces, businesses, and payment providers. “As consumers embrace conversational experiences, they expect seamless transitions from brainstorming and research to final purchase. That means it’s critical to support real-time inventory checks, dynamic pricing, and instant transactions, all within the user’s current conversational context,” Google wrote in a blog post.

Newly redesigned Slackbot is now generally available

Salesforce announced that the newly redesigned Slackbot is now generally available, offering users an out-of-the-box AI agent that lives within Slack. “By bringing the full power of ...

Box Extract intelligently pulls information from unstructured content to help with workflow automation

Box announced the launch of Box Extract, which intelligently pulls information from content and saves it as metadata, helping organizations automate workflows and accelerate decision-making by making information more easily accessible. According to the company, a lot of organizational knowledge lives in contracts, product specifications, policy documents, charts, and other types of unstructured content. Box Extract utilizes agentic capabilities and AI models from Google, Anthropic, and OpenAI to accurately extract this information. Box explained that legacy tools often focus only on extracting text, whereas Box Extract understands document structure and meaning. It breaks the document down into components like paragraphs, tables, and charts, and then pulls out important information from those components. Customers will also be able to create their own Extract Agents that are tailored specifically to their business needs. Information extracted is stored in Box as custom metadata, an...

Copilot Studio Extension now available in VS Code

Microsoft has announced the general availability of its Copilot Studio Extension for Visual Studio Code. The extension allows developers to build and manage Copilot Studio agents directly from within their IDE. According to Microsoft, the extension is useful because developers need to have similar controls and processes when developing agents as they do for other applications: source control, pull requests, change history, and repeatable deployments. With the Copilot Studio Extension, developers can clone an agent to their local workspace, edit it in VS Code, review changes and apply them to Copilot Studio, and deploy using the same processes used to deploy regular applications. Other benefits include standard Git integration, pull request-based reviews, clear history for auditability, and VS Code features like keyboard shortcuts, search, navigation, and a local development loop. “We built this extension so agent development can feel like the way software teams already work: in y...

Testing AI-Infused Applications: Strategies for Reliable Automation

AI is transforming the software landscape, with many organizations integrating AI-driven workflows directly into their applications or exposing their functionality to external, AI-powered processes. This evolution brings new and unique challenges for automated testing. Large language models (LLMs), for example, inherently produce non-deterministic outputs, which complicate traditional testing methods that rely on predictable results matching specific expectations. Repeatedly verifying LLM-based systems leads to repeated calls to these models, and if the LLM is provided by a third party, costs can quickly escalate. Additionally, new protocols such as MCP and Agent2Agent (A2A) are being adopted, enabling LLMs to gain richer context and execute actions, while agentic systems can coordinate between different agents in the environment. What strategies can teams adopt to ensure reliable and effective testing of these new, AI-infused applications in the face of such complexity and unpredicta...
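
To make the exact-match problem concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the article: `fake_llm_summary` is a hypothetical stub that imitates run-to-run variation by rotating through phrasings (no real model or vendor API is called), and `check_summary` is an assumed helper that asserts on invariants of the output instead of on one exact string.

```python
def fake_llm_summary(text: str, variant: int) -> str:
    """Hypothetical stand-in for an LLM call: the wording differs per 'run'."""
    openers = ["In short,", "Summary:", "Briefly,"]
    first_sentence = text.split(".")[0].strip()
    return f"{openers[variant % len(openers)]} {first_sentence}."


def check_summary(summary: str, required_terms: list[str], max_chars: int) -> bool:
    """Invariant-style check: required facts are present and length is bounded."""
    lowered = summary.lower()
    has_terms = all(term.lower() in lowered for term in required_terms)
    return has_terms and len(summary) <= max_chars


source = "The invoice total is 42 dollars. Payment is due in 30 days."

# Three simulated runs produce three differently worded answers.
outputs = {fake_llm_summary(source, variant) for variant in range(3)}

# A traditional exact-match expectation accepts only one of the phrasings.
expected = "Summary: The invoice total is 42 dollars."
exact_matches = sum(output == expected for output in outputs)

# The invariant-style check accepts every phrasing that keeps the key facts.
all_pass = all(check_summary(output, ["invoice", "42"], 120) for output in outputs)
```

Because the stub is local and deterministic, a check like this can run in CI without calling a paid third-party model on every test run.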

Kaggle introduces Community Benchmarks to allow for custom evaluations of AI models

Kaggle has announced that it now offers Community Benchmarks, enabling AI practitioners to design, run, and share their own benchmarks for evaluating AI models. Kaggle is a community platform run by Google that offers models and resources for data scientists and machine learning practitioners. Last year, it introduced Kaggle Benchmarks to provide evaluations from research groups, such as Meta’s MultiLoKo and Google’s FACTS suite benchmarks. This latest announcement extends this to the community as a whole, allowing members to create benchmarks specific to their own use cases. According to Google, AI capabilities are evolving so quickly that the existing ways of benchmarking and evaluating them aren’t able to keep up. With Community Benchmarks, the company hopes to bridge this gap and provide a more flexible and transparent framework for evaluation. To get started, users can create a task, which enables them to test an AI model’s performance on a specific problem. Once multiple tas...