AGNTCY Multi-Agent Customer Service Platform – Phase 1


Executive Summary

Creating a maintainable, scalable, secure and cost-effective system that incorporates modern AI elements for content generation, classification and decision-making is hard. Choosing the right technologies and vendor partners is like hitting a moving target from a great distance; things often don’t end up being what you expected by the time you need them. Choosing the right architecture is equally difficult and relies on assumptions about organizational and operational maturity in addition to the aforementioned technological and product-readiness concerns. And the process of implementation itself needs to be handled carefully, accounting for unforeseeable changes in costs, functional requirements and the sometimes-uneven progress of development itself as people, and priorities, shift over time.

This project is an example of a real-world journey of building and deploying a multi-agent AI system, from initial concept to production deployment on Microsoft Azure. We aim to illustrate not only an excellent final implementation, but a way of planning for change and minimizing risks and costs from start to finish.

This project is intended to be primarily an educational tool, but we’re not sacrificing the -ilities that a true production-quality solution requires. You should be able – if you choose – to adapt this project to your unique circumstances and deploy it for your own business. It will work. This is not a toy.


Building Enterprise-class, Multi-Agent Systems

The development team has been tasked with building an intelligent customer service orchestration platform, a multi-agent system where specialized AI agents handle different aspects of customer interactions. Some of the agents may interact with AI models, while others execute rules and deterministic processes to add data, cleanse, measure, throttle or otherwise contribute to the behavior of the system as a whole.

The decision to implement an agentic component model is not a matter of debate: scalability, maintainability and security all require some degree of modularity, with clear boundaries between the differing parts of the system. The question is how, exactly, those components should be conceived, implemented and connected.

For this project, we made several technology choices which could be debated, and should be. In many cases, different organizations may need to make different choices, but we decided to move forward with a toolkit that we know, and which should work well in the majority of real-world cases.

Our architecture employs five specialized agents working in concert to deliver exceptional customer service. The Intent Classification Agent serves as the initial routing mechanism, analyzing incoming customer requests and directing them to appropriate handlers. The Knowledge Retrieval Agent searches across internal documentation, FAQs, product catalogs, and integrated systems like Shopify, Zendesk, and policy databases to gather relevant information. The Response Generation Agent synthesizes contextually appropriate responses, while the Escalation Agent identifies complex cases requiring human intervention based on sentiment analysis and complexity scoring. Finally, the Analytics Agent passively collects metrics and performance data to support continuous improvement.
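To make the routing concrete, here is a minimal sketch of how an Intent Classification Agent might map a message to a downstream handler in Phase 1's keyword-based approach. The keyword lists, intent names, and agent names are illustrative assumptions, not the project's actual code.

```python
# Hypothetical keyword-based intent classification and routing (Phase 1 style).
INTENT_KEYWORDS = {
    "order_status": ["order", "tracking", "shipped", "delivery"],
    "product_question": ["product", "price", "size", "stock"],
    "complaint": ["refund", "broken", "angry", "cancel"],
}

# Which agent handles each intent; names are placeholders.
ROUTES = {
    "order_status": "knowledge-retrieval-agent",
    "product_question": "knowledge-retrieval-agent",
    "complaint": "escalation-agent",
}

def classify_intent(message: str) -> str:
    """Score each intent by keyword hits; fall back to a default intent."""
    text = message.lower()
    scores = {
        intent: sum(keyword in text for keyword in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general_inquiry"

def route(message: str) -> str:
    """Default unmatched intents to the Response Generation Agent."""
    return ROUTES.get(classify_intent(message), "response-generation-agent")
```

Phase 2 replaces the keyword scoring with real NLP, but the routing contract between agents stays the same.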

These agents communicate using the AGNTCY SDK’s protocol stack, which includes A2A (Agent-to-Agent) for custom logic and peer communication, MCP (Model Context Protocol) for external API integration, SLIM (Secure Low-Latency Interactive Messaging) for security-critical interactions, and NATS for high-throughput pub-sub messaging patterns. This multi-protocol approach allows each agent to use the most appropriate transport mechanism for its specific requirements.
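As one example of matching transport to workload, analytics events suit NATS pub-sub: fire-and-forget, high volume, no reply needed. The sketch below shows a plausible subject convention and payload for such events; the subject scheme and field names are assumptions, and the actual publish call (e.g. `await nc.publish(subject, payload)` with a NATS client) is only noted in a comment.

```python
import json
import time

# Hypothetical subject convention and payload for Analytics Agent events.
# In the running system these bytes would be published to the local NATS
# broker (client port 4222), e.g. via a NATS client's publish() call.

def analytics_subject(agent: str, event_type: str) -> str:
    """Subject convention (assumed): analytics.<agent>.<event_type>."""
    return f"analytics.{agent}.{event_type}"

def build_event(agent: str, event_type: str, data: dict) -> bytes:
    """Serialize a self-describing analytics event as UTF-8 JSON."""
    event = {
        "agent": agent,
        "type": event_type,
        "timestamp": time.time(),
        "data": data,
    }
    return json.dumps(event).encode("utf-8")
```

Hierarchical subjects let the Analytics Agent subscribe with wildcards (e.g. `analytics.>`) while other consumers filter to a single agent or event type.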

The production architecture shown above illustrates our Azure deployment strategy for Phases 4 and 5. All agents run as containerized workloads in Azure Container Instances with auto-scaling capabilities. Azure Cosmos DB serves as our conversation state store using serverless mode for cost optimization. Azure Cache for Redis handles session management on the Basic C0 tier. The Application Gateway provides load balancing and SSL termination, while Azure Key Vault manages secrets with Managed Identity integration. OpenTelemetry Collector aggregates telemetry data, which flows into ClickHouse for storage and Grafana for visualization. This architecture is designed to deliver production-grade reliability while staying within our $200 monthly budget constraint.


Executing this project in phases

Phase 1: Infrastructure and Containerization

Phase 1 establishes the foundation for local development with zero cloud costs. We’ve created a comprehensive Docker Compose environment consisting of 13 services running on a single bridge network. This includes NATS messaging (client port 4222, monitoring port 8222), SLIM transport with gateway password authentication (port 46357), ClickHouse database (ports 9000, 8123), OpenTelemetry Collector (ports 4317, 4318), Grafana dashboards (port 3001), five agent containers, and four mock API services.
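A trimmed excerpt of what such a Compose file might look like is below. Service names, image tags, and port mappings are illustrative placeholders drawn from the ports listed above, not the project's actual file.

```yaml
# Illustrative docker-compose.yml excerpt; images and names are placeholders.
services:
  nats:
    image: nats:latest
    ports:
      - "4222:4222"   # client connections
      - "8222:8222"   # HTTP monitoring
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "9000:9000"   # native protocol
      - "8123:8123"   # HTTP interface
  otel-collector:
    image: otel/opentelemetry-collector:latest
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"   # host 3001 -> Grafana's default 3000

networks:
  default:
    driver: bridge
```

The agent containers and mock APIs join the same default bridge network, so every service can reach every other by service name.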

Our shared utilities module provides 1,136 lines of production-grade code with 100% test coverage. The Factory Singleton pattern ensures thread-safe SDK initialization with proper resource cleanup. All agents follow consistent patterns with configuration loading, structured logging, graceful shutdown handling, and comprehensive error handling. The testing framework includes 63 passing tests (9 Docker-dependent tests skipped locally) with 46% overall coverage, which is appropriate for Phase 1 mock implementations.
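The Factory Singleton idea can be sketched as follows. This is a minimal illustration of the pattern, assuming hypothetical class and method names; the real shared-utilities module and the AGNTCY client initialization it wraps will differ.

```python
import atexit
import threading

class SDKClientFactory:
    """Thread-safe, lazily initialized singleton (illustrative sketch)."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:          # fast path: no lock once created
            with cls._lock:
                if cls._instance is None:  # double-checked locking
                    inst = super().__new__(cls)
                    inst._init_client()
                    # Register graceful shutdown exactly once.
                    atexit.register(inst.close)
                    cls._instance = inst
        return cls._instance

    def _init_client(self):
        # Placeholder for real SDK setup: transports, identity, telemetry.
        self.client = {"connected": True}

    def close(self):
        # Graceful shutdown: release connections and drop the handle.
        self.client = None
```

Every call to `SDKClientFactory()` returns the same object, so all agents in a process share one SDK client with a single, clean teardown path.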

We’ve implemented Docker optimization techniques that reduced image sizes by 40-50% and build times by over 90%. Agent images now range from 150MB to 200MB, down from 250-300MB, with build times dropping from 60-90 seconds to just 5-10 seconds for code-only changes. This is achieved through multi-stage builds, layer caching optimization, and installing dependencies before copying application code.
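The structure of such a Dockerfile might look like this. Base images, paths, and the entrypoint are assumptions for illustration.

```dockerfile
# Illustrative multi-stage build; images, paths and entrypoint are placeholders.
FROM python:3.12-slim AS builder
WORKDIR /app
# Install dependencies first so this layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
# Bring in only the installed packages, not the build tooling.
COPY --from=builder /install /usr/local
# Copy application code last: a code-only change rebuilds just this layer.
COPY . .
# Run as a non-root user for security.
RUN useradd --create-home agent
USER agent
CMD ["python", "-m", "agent"]
```

Because the dependency layer only changes when `requirements.txt` changes, an edit to agent code invalidates just the final `COPY . .` layer, which is what drops rebuilds to seconds.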

The CI/CD pipeline validates code across multiple platforms (Windows, Linux, macOS) and Python versions (3.12, 3.13). We enforce code quality through Flake8 linting, Black formatting, and Bandit security scanning. The minimum 46% coverage requirement ensures core utilities maintain their 100% coverage target.
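A GitHub Actions matrix covering those platforms and versions could look roughly like this. Job names, step ordering, and file paths are assumptions; the tools named (Flake8, Black, Bandit, pytest-cov) are the ones from the pipeline above.

```yaml
# Illustrative workflow excerpt; names and paths are placeholders.
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ["3.12", "3.13"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements-dev.txt
      - run: flake8 .
      - run: black --check .
      - run: bandit -r src/
      - run: pytest --cov --cov-fail-under=46
```

The `--cov-fail-under=46` flag is what turns the coverage floor into a hard gate: a commit that drops coverage below the minimum fails the build.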

Our mock APIs mirror production service behavior without incurring costs. Mock Shopify implements 8 endpoints across products, inventory, orders, and checkouts (195 lines). Mock Zendesk provides ticket creation, retrieval, updates, and user management (278 lines). Mock Mailchimp handles email marketing and subscriber management (274 lines), while Mock Google Analytics simulates GA4 Measurement Protocol event tracking (219 lines). These mocks enable full integration testing without external dependencies.
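The core of the mock-API pattern is deterministic canned responses keyed by endpoint. The sketch below distills that idea; the paths, fields, and data are hypothetical and do not reflect the real Shopify API surface or our actual mock code.

```python
# Minimal sketch of the mock-API pattern: canned, deterministic responses
# dispatched by request path. Endpoints and fields are illustrative only.
MOCK_PRODUCTS = [
    {"id": 1001, "title": "Blue T-Shirt", "price": "19.99", "inventory": 42},
    {"id": 1002, "title": "Red Hoodie", "price": "39.99", "inventory": 7},
]

def handle(path: str) -> dict:
    """Dispatch a GET-style request path to canned data."""
    if path == "/products":
        return {"products": MOCK_PRODUCTS}
    if path.startswith("/products/"):
        pid = int(path.rsplit("/", 1)[1])
        for product in MOCK_PRODUCTS:
            if product["id"] == pid:
                return {"product": product}
        return {"error": "not found", "status": 404}
    return {"error": "unknown endpoint", "status": 404}
```

Because the responses never vary, integration tests against the mocks are fully repeatable, and the same handler shape maps cleanly onto a real HTTP framework later.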

Phase 2: Business Logic Implementation

Phase 2 will replace our keyword-based intent classification with real NLP models, using Azure Cognitive Services, OpenAI embeddings, or a custom-trained model. We’ll integrate Azure OpenAI for LLM-powered response generation, enabling contextual synthesis and personalization based on customer history. Knowledge retrieval will be enhanced with vector embeddings for semantic search and improved relevance ranking. Multi-language support will be architected for Phase 4 deployment, with language detection and topic-based routing to language-specific agent instances. All of this development remains at $0 cost through local Docker execution.
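Embedding-based classification reduces to nearest-prototype matching by cosine similarity. The sketch below uses tiny hypothetical three-dimensional vectors as stand-ins; in Phase 2 the vectors would come from an embedding model and have hundreds or thousands of dimensions, but the matching logic is the same.

```python
import math

# Hypothetical prototype vectors per intent; real ones come from an
# embedding model (e.g. Azure OpenAI) and are precomputed once.
INTENT_VECTORS = {
    "order_status": [0.9, 0.1, 0.0],
    "product_question": [0.1, 0.9, 0.1],
    "complaint": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(embedding):
    """Return the intent whose prototype is most similar to the embedding."""
    return max(INTENT_VECTORS, key=lambda i: cosine(embedding, INTENT_VECTORS[i]))
```

Swapping keyword scoring for this function changes nothing downstream: the classifier still emits an intent label, and routing stays as-is.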

Phase 3: Testing and Validation

Phase 3 expands our test coverage beyond the current 46% to a target of 80%. We’ll implement comprehensive integration testing, end-to-end functional tests, and performance benchmarking with Locust for load testing. The CI/CD pipeline will be enhanced with automated performance tests, UI testing with Playwright, and deployment validation. Quality gates will enforce coverage requirements, performance thresholds, and security scanning results before any production deployment.
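A performance quality gate is ultimately a percentile check against a threshold. This sketch shows that check in isolation; the samples would in practice come from a Locust run's recorded response times, and the 2000 ms threshold is an illustrative stand-in for whatever the gate enforces.

```python
import math

def p95(samples_ms):
    """95th-percentile latency (nearest-rank method) from a list of samples."""
    ordered = sorted(samples_ms)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx]

def passes_gate(samples_ms, threshold_ms=2000):
    """True if the p95 latency is within the threshold."""
    return p95(samples_ms) <= threshold_ms
```

Wiring this into CI means a load-test run that blows the latency budget fails the pipeline the same way a failing unit test would.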

Phase 4: Azure Production Setup

Phase 4 transitions to Azure with a target budget of $180-200 monthly. We’ll provision Azure Container Instances ($15-20/month), Cosmos DB serverless tier ($25-30/month), Redis Cache Basic C0 ($15-20/month), Application Gateway ($20-30/month), Container Registry ($5/month), Key Vault ($5/month), Application Insights ($5-10/month), Cognitive Search ($10-15/month), and Blob Storage ($5/month). Terraform will manage all infrastructure as code. Multi-language support for Canadian French and Spanish will be deployed with pre-translated response templates to avoid real-time translation costs. Real API integration with Shopify, Zendesk, Mailchimp, and Google Analytics will replace our mocks.
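The per-service estimates above can be sanity-checked with simple arithmetic. The numbers below are taken directly from the list; the dictionary layout is just for illustration.

```python
# (low, high) monthly USD estimates for each Azure service, per the list above.
ESTIMATES = {
    "container_instances": (15, 20),
    "cosmos_db_serverless": (25, 30),
    "redis_basic_c0": (15, 20),
    "application_gateway": (20, 30),
    "container_registry": (5, 5),
    "key_vault": (5, 5),
    "application_insights": (5, 10),
    "cognitive_search": (10, 15),
    "blob_storage": (5, 5),
}

low = sum(lo for lo, _ in ESTIMATES.values())    # 105
high = sum(hi for _, hi in ESTIMATES.values())   # 140
```

Even at the high end ($140), the infrastructure leaves room under the $200 cap for the estimated $20-50 monthly LLM spend plus a buffer for overages.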

Phase 5: Production Deployment and Go-Live

Phase 5 validates our production deployment through comprehensive testing. Security validation includes penetration testing, OWASP scanning, and compliance verification. Load testing will confirm our system handles 100 concurrent users and 1000 requests per minute while maintaining sub-2-minute response times. Disaster recovery procedures will be tested with RPO of 1 hour and RTO of 4 hours. Monitoring and alerting through Application Insights and Grafana will be fully operational. Final performance validation will measure against our KPIs: response times under 2 minutes, CSAT above 80%, cart abandonment below 30%, and 70%+ automation rate.


Understanding our choices

Why we chose the AGNTCY.org SDK

The AGNTCY SDK provides production-grade multi-agent orchestration infrastructure that would take months to build from scratch. It offers built-in agent discovery, identity management, and secure messaging through multiple transport protocols. The A2A protocol enables custom agent logic with peer-to-peer communication, while MCP provides standardized external API integration. SLIM transport ensures low-latency, secure communication for sensitive interactions, and NATS pub-sub handles high-throughput scenarios like analytics event collection.

The SDK’s observability integration with OpenTelemetry provides distributed tracing out of the box. Our Factory Singleton pattern creates thread-safe SDK clients with proper lifecycle management and graceful shutdown. The demo mode allows development without full infrastructure, accelerating initial prototyping. As an open-source project with active development and community support, AGNTCY represents a strategic choice for building maintainable, scalable agent systems without vendor lock-in.

Why we chose Azure and Terraform for IaC

Microsoft Azure provides the optimal balance of functionality and cost for our $200 monthly budget. Azure Container Instances offer pay-per-second billing with sub-second startup times, dramatically reducing costs compared to always-on App Service plans. Cosmos DB serverless mode charges per request rather than provisioned throughput, aligning perfectly with variable customer service workloads. The comprehensive suite of cost-optimized tiers (Redis Basic C0, Container Registry Basic, Application Gateway Standard_v2) enables production deployment without enterprise pricing.

Azure’s integration story is compelling: Managed Identity eliminates secrets management complexity, Application Insights provides observability without third-party tools, and Cognitive Search delivers semantic knowledge retrieval with built-in vector search. The East US region offers the most comprehensive service availability at competitive pricing. Azure’s commitment to OpenTelemetry ensures our observability strategy remains portable.

Terraform manages our infrastructure as code with declarative resource definitions, state management in Azure Blob Storage, and environment-specific configurations for dev and production. The Azure provider offers comprehensive coverage of all services we need, with mature documentation and community support. Infrastructure versioning alongside application code in Git enables proper change management and disaster recovery through infrastructure recreation.

Why we chose Docker and GitHub

Docker Desktop for Windows enables $0 development costs for Phases 1-3 while maintaining production parity. Our 13-service Docker Compose orchestration includes all infrastructure dependencies, agent containers, and mock services running on a single machine. Multi-stage builds optimize image sizes and build times. Layer caching accelerates iteration. Non-root user enforcement improves security. The consistency between local Docker containers and Azure Container Instances minimizes deployment surprises.

GitHub provides version control, collaboration, and CI/CD through GitHub Actions at zero cost for public repositories. Our workflow validates code across three operating systems and two Python versions on every commit. GitHub Desktop lowers the barrier for developers less familiar with command-line Git. The public repository serves our educational mission, allowing others to learn from our implementation patterns. GitHub’s integration with Dependabot and security scanning tools ensures we maintain dependency hygiene and address vulnerabilities promptly.

Why we chose Shopify, Mailchimp, Zendesk and Google Analytics

These four platforms represent the most common customer service integration points for e-commerce businesses. Shopify dominates the e-commerce platform market with robust APIs for products, inventory, orders, and checkout events. Our Knowledge Retrieval Agent integrates with Shopify to provide real-time product information and order status. Mailchimp delivers email marketing capabilities with a generous free tier (500 contacts, 1000 sends monthly), enabling automated customer engagement without additional costs.

Zendesk offers enterprise-grade ticketing that our Escalation Agent uses to create support cases requiring human intervention. While Zendesk requires budget allocation ($19-49 per agent monthly), the trial and sandbox options support development and testing. Google Analytics 4 provides web analytics and event tracking at no cost, feeding data to our Analytics Agent for performance monitoring and insight generation.

By choosing the most widely deployed platforms in each category, we maximize the educational value and real-world applicability of this project. Developers can easily adapt our integration patterns to their specific environments, and the mock APIs we built during Phase 1 serve as reference implementations for anyone building similar systems.

Why we chose OpenAI

OpenAI’s models through Azure OpenAI Service provide the optimal balance of capability, latency, and cost for our Response Generation Agent. Azure OpenAI offers the same GPT models as OpenAI’s API with additional enterprise features: deployment within our Azure environment, Managed Identity authentication, and data residency in our selected region. The pay-per-token pricing model aligns with our variable workload, and aggressive caching strategies minimize redundant generation costs.

For Phase 1 development, we use template-based responses to avoid any LLM costs during infrastructure buildout. Phase 2 will integrate Azure OpenAI with careful token usage monitoring to stay within our $20-50 monthly LLM budget estimate. The Response Generation Agent is architected to support multiple LLM providers, allowing us to evaluate alternatives like Claude through Anthropic’s API if Azure OpenAI pricing becomes prohibitive.
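That provider-agnostic design can be captured in a small interface that Phase 1 satisfies with templates and Phase 2 can satisfy with an LLM-backed implementation. Class, method, and template names here are assumptions, not the project's actual code.

```python
from abc import ABC, abstractmethod

class ResponseProvider(ABC):
    """Common interface so the agent never depends on a specific LLM vendor."""

    @abstractmethod
    def generate(self, intent: str, context: dict) -> str: ...

class TemplateProvider(ResponseProvider):
    """Phase 1 implementation: no LLM calls, zero token cost."""

    TEMPLATES = {
        "order_status": "Your order {order_id} is currently: {status}.",
        "general_inquiry": "Thanks for reaching out! How can we help?",
    }

    def generate(self, intent, context):
        template = self.TEMPLATES.get(intent, self.TEMPLATES["general_inquiry"])
        return template.format(**context)

# A Phase 2 AzureOpenAIProvider would implement the same generate()
# signature, so swapping providers is a configuration change, not a rewrite.
```

Evaluating an alternative provider later (Claude, a local model) then means adding one class, not touching the agent.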

OpenAI’s function calling capabilities integrate naturally with our A2A and MCP protocols, enabling the LLM to invoke other agents and external APIs when generating responses. This creates a powerful orchestration layer where the LLM acts as an intelligent coordinator rather than just a text generator. The combination of AGNTCY’s agent framework with OpenAI’s reasoning capabilities delivers true agentic behavior at a fraction of the cost of building custom models.
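Concretely, exposing an agent capability to the LLM means describing it as a tool. The definition below follows the Chat Completions `tools` schema shape; the function name, parameters, and sources are hypothetical stand-ins for how a Knowledge Retrieval capability might be advertised.

```python
# Hypothetical tool definition advertising a Knowledge Retrieval capability
# to the LLM, in the OpenAI Chat Completions "tools" schema shape.
KNOWLEDGE_LOOKUP_TOOL = {
    "type": "function",
    "function": {
        "name": "knowledge_lookup",
        "description": "Search internal docs, FAQs, and product data.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query text."},
                "source": {
                    "type": "string",
                    "enum": ["faq", "products", "policies"],
                    "description": "Which knowledge source to search.",
                },
            },
            "required": ["query"],
        },
    },
}
```

When the model returns a `knowledge_lookup` tool call, the orchestration layer forwards it to the Knowledge Retrieval Agent over A2A or MCP and feeds the result back for the final response.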


Always read the README

PROJECT-README.txt
