OpenMCP: A Reproducible Benchmarking Harness for Evaluating Computer-Use Agents on Chameleon

How two NYU master's students built a self-hosted, hardware-diverse evaluation framework for the fast-growing MCP ecosystem

MCP-enabled computer-use agents are proliferating faster than our ability to evaluate them, and most existing benchmarks depend on commercial APIs that can be deprecated without notice. OpenMCP is an open-source, fully self-hosted benchmarking harness built by NYU researchers that lets anyone reproducibly evaluate MCP agents across diverse hardware, from H100 datacenter GPUs to a Raspberry Pi 5 at the edge.

Running Artifact Evaluations on Chameleon

A practical guide for AE organizers using shared research infrastructure

Chameleon has supported artifact evaluations at more than 30 events across 16 major HPC and systems conferences. This guide distills those lessons into practical advice for AE organizers: how to plan hardware access, structure author and reviewer workflows, and keep reproducible artifacts alive after the evaluation closes.

Baremetal H100 nodes on Chameleon

We are excited to announce that Chameleon now has two baremetal H100 nodes at CHI@TACC! Last year, we announced H100 GPUs in virtual machines at KVM@TACC; the same hardware is now available as baremetal nodes at CHI@TACC, letting you run H100 experiments without going through a virtualization layer.

Chameleon Newsletter & Changelog March 2026

Welcome to the Chameleon March 2026 Newsletter!

This month we're highlighting the last chance to register for the Sixth Chameleon User Meeting, a new webinar recording from UTEP's MINCER team, platform updates including multi-instance GPU support, new cloud traces, switch performance improvements, and several testbed usability enhancements.