AI agent benchmark results across security platforms

Challenges
Solved
Best Model
Traces
Solves Over Time
Each dot is a model run, placed at the date of its last solve; its height shows the total number of challenges that run solved. The dashed line shows the cumulative number of unique challenges solved across all models over time, with each challenge counted once. The line may climb before a dot appears because a run can span multiple days, while its dot is placed only at its last solve date.
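The two series described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(model, challenge, solve date)`, the sample data, and the function names `run_dots` and `unique_solved_over_time` are hypothetical and not taken from the site's code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical solve records: (model, challenge, solve date).
solves = [
    ("model-a", "chal-1", date(2025, 1, 1)),
    ("model-a", "chal-2", date(2025, 1, 3)),
    ("model-b", "chal-1", date(2025, 1, 2)),
    ("model-b", "chal-3", date(2025, 1, 2)),
]

def run_dots(solves):
    """One dot per model: x = last solve date, y = number of challenges solved."""
    solved = defaultdict(set)
    last = {}
    for model, chal, day in solves:
        solved[model].add(chal)
        last[model] = max(last.get(model, day), day)
    return {m: (last[m], len(solved[m])) for m in solved}

def unique_solved_over_time(solves):
    """Dashed line: cumulative count of distinct challenges solved by any model."""
    seen = set()
    line = []
    for day in sorted({d for _, _, d in solves}):
        seen |= {c for _, c, d in solves if d == day}
        line.append((day, len(seen)))
    return line

dots = run_dots(solves)
line = unique_solved_over_time(solves)
```

With this sample data the dashed line already reaches 1 on 2025-01-01, while the earliest dot (model-b) sits at 2025-01-02, which is the "line climbs before a dot appears" effect. Note that chal-1 is solved by both models but counted once in the line.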
Model Leaderboard
Model | Solved / Attempted | Completion | Avg Turns
Challenge Status
Model
Difficulty
Status
Date | Replay | Report | Challenge | Difficulty | Status | Turns | Duration | Model | Version