
How to Build Self-Improving AI Agents Locally with Hermes and NVIDIA Hardware

Published 2026-05-16 15:06:37 · Open Source

Overview

Agentic AI is transforming productivity by enabling autonomous task execution. Following the success of frameworks like OpenClaw, the open-source community has embraced Hermes Agent—a new framework that has garnered over 140,000 GitHub stars in under three months and, as of last week, is the most used agent on OpenRouter. Developed by Nous Research, Hermes is designed for reliability and self-improvement, two qualities that have historically been challenging to achieve. It is provider- and model-agnostic and optimized for always-on local use, which makes NVIDIA RTX PCs, NVIDIA RTX PRO workstations, and NVIDIA DGX Spark the ideal hardware for running it at full speed, 24/7.

Source: blogs.nvidia.com

This guide will walk you through setting up Hermes Agent locally using NVIDIA hardware and the Qwen 3.6 series models from Alibaba, which are high-performance, open-weight LLMs that outperform previous-generation larger models. By the end, you'll have a self-improving local AI agent that can run continuously, learn from tasks, and execute complex workflows.

Prerequisites

Before you begin, ensure you have the following:

  • Hardware: An NVIDIA RTX PC, RTX PRO workstation, or DGX Spark. The Qwen 3.6 35B model needs at least 20GB of free GPU memory; the 27B model needs less.
  • Software: Windows or Linux with NVIDIA drivers (version 545 or later recommended).
  • Tools: Docker and NVIDIA Container Toolkit installed for GPU acceleration.
  • Models: Access to Qwen 3.6 models (27B or 35B) via Hugging Face or NVIDIA NGC. You'll need a Hugging Face token for download.
  • Knowledge: Basic familiarity with command line, Docker, and Python. No deep AI expertise required.
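Before starting, it is worth confirming that a single GPU actually has enough free memory. The helper below parses the CSV output of nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits (a standard nvidia-smi query mode) and checks it against the ~20GB the 35B model needs; it is a small convenience sketch, not part of Hermes itself.

```python
def free_gpu_memory_mib(query_output: str) -> list[int]:
    """Parse per-GPU free-memory values (MiB) from the CSV output of
    `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits`."""
    return [int(line.strip()) for line in query_output.strip().splitlines() if line.strip()]

def can_fit_model(free_mib: list[int], required_gib: float = 20.0) -> bool:
    """True if at least one GPU has enough free memory for the model
    (default threshold: the ~20 GiB the 35B model requires)."""
    return any(m >= required_gib * 1024 for m in free_mib)
```

Pipe the nvidia-smi command's output into free_gpu_memory_mib to get a per-GPU list, then call can_fit_model before committing to the larger model.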

Step-by-Step Instructions

Step 1: Set Up Your NVIDIA Environment

First, verify your GPU is recognized. Open a terminal and run:

nvidia-smi

You should see your GPU model, driver version, and available memory. Next, install the NVIDIA Container Toolkit to enable GPU passthrough to Docker:

# Add NVIDIA's package repository (the older nvidia-docker repo and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

For Windows, ensure you have WSL2 and Docker Desktop with WSL2 integration enabled.

Step 2: Download Qwen 3.6 Model

Choose either the 27B or the 35B parameter model. The 35B model runs in ~20GB of memory and outperforms 120B models (which require 70GB+). The 27B is a dense model that matches the accuracy of 400B models. Use huggingface-cli:

pip install huggingface_hub
huggingface-cli download Qwen/Qwen3.6-35B-Instruct --local-dir ./qwen35b

Replace with the correct repository name if needed. Ensure you have a Hugging Face token set (huggingface-cli login).
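The ~20GB figure for the 35B model implies roughly 4-bit weights. A back-of-the-envelope estimate — parameter count × bytes per weight, plus some headroom for KV cache and activations — makes it easy to check whether a given model/precision combination will fit your GPU. The 20% overhead factor below is a rough assumption, not a published number.

```python
def model_memory_gib(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight storage at the given precision, plus ~20%
    headroom for KV cache and activations (the overhead factor is a ballpark)."""
    bytes_total = params_billions * 1e9 * (bits_per_weight / 8) * overhead
    return bytes_total / (1024 ** 3)
```

With these assumptions, a 35B model at 4 bits comes out just under 20 GiB — consistent with the requirement above — while the same model at 16-bit precision would need closer to 80 GiB.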

Step 3: Launch Hermes Agent with Docker

Pull the Hermes Agent Docker image optimized for NVIDIA GPUs:

docker pull nousresearch/hermes-agent:latest-cuda

Run the container with GPU access and mount the model directory:

docker run --gpus all -d --name hermes-agent \
  -v $(pwd)/qwen35b:/models \
  -e MODEL_PATH=/models \
  -p 8080:8080 \
  nousresearch/hermes-agent:latest-cuda

This launches a web interface at http://localhost:8080 and a REST API for integration.
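The REST API can be driven from any HTTP client. The sketch below uses only the Python standard library; note that the /api/tasks path and the payload field names are illustrative assumptions — the article does not document the Hermes API schema, so check the running container's web UI or the project docs for the real endpoints.

```python
import json
from urllib import request

HERMES_URL = "http://localhost:8080"  # the port published by the docker run above

def build_task_payload(prompt: str, max_steps: int = 10) -> dict:
    # Field names are illustrative; consult the Hermes docs for the real schema.
    return {"task": prompt, "max_steps": max_steps, "stream": False}

def submit_task(prompt: str) -> bytes:
    # /api/tasks is a hypothetical path; adjust to whatever the container exposes.
    req = request.Request(
        HERMES_URL + "/api/tasks",
        data=json.dumps(build_task_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

Submitting tasks over HTTP rather than the web UI is what makes the always-on setup useful: cron jobs, scripts, and other services can all hand work to the agent.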

Step 4: Configure Self-Evolving Skills

Hermes automatically saves learnings from complex tasks as skills. To enable this, edit the configuration file (hermes_config.yaml inside the container or mount it):

skills:
  auto_learn: true
  max_skills: 50
  memory_dir: /data/skills

Restart the container to apply changes. Skills are stored as JSON files that can be reviewed and manually curated.
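Since skills land as JSON files in memory_dir, a short script can inventory them for review. The exact skill schema is not documented in this article, so the loader below makes no assumptions beyond "one JSON object per file" and tags each with its filename for curation.

```python
import json
from pathlib import Path

def list_skills(skills_dir: str) -> list[dict]:
    """Load every *.json skill file in the directory.
    The skill schema is Hermes-defined; this just reads whatever JSON is there."""
    skills = []
    for path in sorted(Path(skills_dir).glob("*.json")):
        with open(path) as f:
            skill = json.load(f)
        skill["_file"] = path.name  # track provenance so duplicates are easy to remove
        skills.append(skill)
    return skills
```

Run it against the host-side mount of /data/skills to audit what the agent has learned before trusting those skills in production.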

Step 5: Integrate with Messaging and Files

Hermes supports Slack, Discord, and file access. For Slack integration, set environment variables:

docker run --gpus all -d --name hermes-agent \
  -e SLACK_BOT_TOKEN=xoxb-... \
  -e SLACK_APP_TOKEN=xapp-... \
  ...

For local file access, mount directories:

-v /path/to/files:/data/files

Now your agent can read, write, and process files.
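When mounting a directory for the agent, it helps to know what you are exposing. This small utility walks the mounted tree — the same files the agent sees at /data/files inside the container — and counts files by extension, a quick pre-flight check before granting access.

```python
from pathlib import Path

def summarize_mount(mount_dir: str) -> dict[str, int]:
    """Count files per extension under the directory that will be mounted
    into the container at /data/files."""
    counts: dict[str, int] = {}
    for path in Path(mount_dir).rglob("*"):
        if path.is_file():
            ext = path.suffix or "(none)"
            counts[ext] = counts.get(ext, 0) + 1
    return counts
```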

Common Mistakes

Insufficient GPU Memory

The Qwen 3.6 35B model requires ~20GB of GPU memory. If you have less, use the 27B model or enable 4-bit quantization. Check with nvidia-smi during startup; if the container crashes with OOM, reduce model size.

Missing NVIDIA Container Toolkit

Without the toolkit, Docker cannot access the GPU, leading to very slow inference on CPU. Verify with docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi (the older cuda:11.0-base tag has been removed from Docker Hub).

Skill Overload

If auto-learning creates too many skills (max_skills too high), the agent may become slower. Set a reasonable limit and periodically review skills via the web UI. Remove duplicates or outdated ones.
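Periodic curation can be scripted. The sketch below enforces a cap by deleting the oldest skill files first — a simple modification-time policy of my own, not Hermes's built-in eviction behavior — and returns what it removed so the run can be logged.

```python
from pathlib import Path

def prune_skills(skills_dir: str, max_skills: int = 50) -> list[str]:
    """Delete the oldest skill files beyond max_skills; returns deleted filenames.
    Oldest-first by modification time -- a simple policy, not Hermes's own."""
    files = sorted(Path(skills_dir).glob("*.json"), key=lambda p: p.stat().st_mtime)
    removed = []
    for path in files[: max(0, len(files) - max_skills)]:
        path.unlink()
        removed.append(path.name)
    return removed
```

Running this from cron against the host-side skills mount keeps the library at the same limit as max_skills in hermes_config.yaml.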

Firewall Blocking Ports

If you cannot access the web interface, ensure port 8080 is open in your firewall. On Linux, use sudo ufw allow 8080.

Summary

Hermes Agent combined with Qwen 3.6 on NVIDIA RTX hardware delivers a powerful, self-improving AI that runs entirely locally. Key takeaways:

  • Self-evolving skills: Agent learns from tasks and saves reusable skills.
  • Contained sub-agents: Efficient task management with small context windows.
  • Reliability by design: Pre-tested skills minimize debugging.
  • Hardware matters: NVIDIA GPUs provide the performance needed for 24/7 operation.

By following this guide, you have set up a local agent that not only performs tasks but improves over time, making it ideal for power users who demand privacy, speed, and adaptability.