
Welcome to My Blog

Autonomous AI Security Testing: Benchmarking LLM Agents on HackTheBox, Cybench, CTFs, and Beyond

Recent Posts

BoxPwnr is a fun experiment to see how far Large Language Models (LLMs) can go in autonomously solving HackTheBox machines.


