Building Production-Ready AI Agents with Spring AI: My Learnings from the Workshop
Hey fellow Java developers! Today marks day 92 of my #100DaysOfJava journey, and I have an exciting story to share with you. I recently attended an incredible workshop on Building Java AI Agents with Spring AI, hosted by AWS. In this workshop I learned how to build an AI agent that can not only chat with you but also manage a fictional unicorn rental business, which serves as the example application.
What you will learn
This article walks through what I learned in the workshop: how to construct a complete AI agent system using Spring AI, progressing from a simple chat interface to a sophisticated application featuring:
- Persistent Memory Management
- Retrieval-Augmented Generation (RAG)
- External Tool Integration
- ChatClient Interface
- Vector Store Integration
- AWS Bedrock Integration
- Model Context Protocol (MCP)
- Production Deployment
- Security & Performance
- Real-world Architecture
We’ll build a fictional unicorn rental system that showcases real-world AI integration patterns applicable to any business domain.
Picture this: You’re a Java developer who has been comfortable with Spring Boot for years, and suddenly someone tells you that you can now build AI-powered applications using the same familiar Spring concepts you already know and love. That’s exactly what Spring AI brings to the table!
Understanding Spring AI’s Core Architecture
Spring AI provides a unified abstraction layer over various AI model providers, including OpenAI, Azure OpenAI, and Amazon Bedrock. The framework’s central component is the ChatClient interface, which standardizes interactions with different AI models while maintaining Spring’s dependency injection and configuration principles.
To fully grasp Spring AI’s architecture, it’s essential to understand its foundational concepts. The Spring AI Core Concepts documentation provides comprehensive coverage of these fundamental principles.
The ChatClient Interface: Your Gateway to AI Magic
At the heart of Spring AI lies the ChatClient interface - think of it as your personal assistant that can talk to any AI model.
It can use any model under the hood, configured with a system prompt and other options. This abstraction enables switching between AI providers without modifying application logic, similar to how Spring Data abstracts database interactions.
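To make the provider-switching idea concrete, here is a minimal plain-Java sketch of the same pattern. It deliberately avoids Spring AI classes: `ChatModelPort`, `EchoModel`, `ShoutingModel`, and `RentalAssistant` are hypothetical names I made up for illustration, and the two "models" return canned text instead of calling a real provider.

```java
import java.util.Locale;

// Hypothetical provider-agnostic chat abstraction (NOT Spring AI's ChatClient).
interface ChatModelPort {
    String complete(String systemPrompt, String userMessage);
}

// Fake "provider" #1: echoes the user message back.
final class EchoModel implements ChatModelPort {
    @Override
    public String complete(String systemPrompt, String userMessage) {
        return "[" + systemPrompt + "] " + userMessage;
    }
}

// Fake "provider" #2: same contract, different behavior.
final class ShoutingModel implements ChatModelPort {
    @Override
    public String complete(String systemPrompt, String userMessage) {
        return "[" + systemPrompt + "] " + userMessage.toUpperCase(Locale.ROOT);
    }
}

// Application logic depends only on the interface, so the provider can be swapped freely.
final class RentalAssistant {
    private static final String SYSTEM_PROMPT = "unicorn-rental-agent";
    private final ChatModelPort model;

    RentalAssistant(ChatModelPort model) {
        this.model = model;
    }

    String ask(String question) {
        return model.complete(SYSTEM_PROMPT, question);
    }
}
```

`RentalAssistant` never changes when the model behind it does, which is the property the real abstraction is aiming for.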
The streaming endpoint returns a Flux<String>, which means the frontend receives the AI response word by word, creating that magical typewriter effect!
Step 2: Giving Our Agent Memory
What good is a rental agent who forgets who you are the moment you ask a second question? This is where MessageChatMemoryAdvisor comes to the rescue. An LLM doesn’t have persistent memory between calls, so if we don’t manage memory ourselves, the agent won’t remember important information from earlier in the chat. Without memory, the assistant also can’t give accurate and helpful answers.
The MessageWindowChatMemory maintains a sliding window of recent messages, while JdbcChatMemoryRepository provides persistence using standard database infrastructure.
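The sliding-window idea itself is easy to picture in plain Java. The sketch below is not Spring AI's implementation: `WindowedChatMemory` is a hypothetical class that keeps messages in memory only (no JDBC persistence) and stores them as plain strings keyed by a conversation ID.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sliding-window chat memory keyed by conversation ID (in-memory only).
final class WindowedChatMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> conversations = new HashMap<>();

    WindowedChatMemory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be at least 1");
        }
        this.maxMessages = maxMessages;
    }

    // Appends a message and evicts the oldest ones once the window is full.
    void add(String conversationId, String message) {
        Deque<String> window = conversations.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    // Returns the retained messages, oldest first; these would accompany the next prompt.
    List<String> history(String conversationId) {
        return new ArrayList<>(conversations.getOrDefault(conversationId, new ArrayDeque<>()));
    }
}
```

Each conversation ID gets its own window, so one user's chat never leaks into another's history.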
Streaming Responses for Enhanced User Experience
Modern AI applications benefit from streaming responses that provide immediate feedback.
The Flux<String> return type enables real-time streaming of AI responses, improving perceived performance and user engagement.
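If reactive types are new to you, the core idea is simply "push each piece to the consumer as soon as it exists". Here is a rough JDK-only sketch of that idea. It uses a plain `Consumer<String>` callback rather than Reactor's `Flux`, and `ChunkStreamer` is a hypothetical helper that splits an already complete string, whereas a real model produces the chunks incrementally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical chunk streamer: pushes a response to a listener piece by piece
// instead of returning one complete string.
final class ChunkStreamer {
    // Splits text into whitespace-delimited chunks, keeping the trailing whitespace on each chunk.
    static List<String> chunk(String text) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            current.append(c);
            if (Character.isWhitespace(c)) {
                chunks.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    // Delivers each chunk to the listener in order; a UI would append it to the chat bubble.
    static void stream(String fullResponse, Consumer<String> onChunk) {
        for (String piece : chunk(fullResponse)) {
            onChunk.accept(piece);
        }
    }
}
```

Concatenating the received chunks reproduces the full response, which is what the frontend does while rendering the typewriter effect.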
Web Interface
The web interface features:
- Real-time streaming responses - Watch as the AI types responses in real-time!
- Dark mode design - Perfect for developers who love dark themes
- Responsive layout - Works beautifully on desktop and mobile
- Chat history - See your entire conversation flow
Implementing Retrieval-Augmented Generation (RAG)
RAG enhances AI responses by incorporating domain-specific knowledge from external sources. This approach combines the general knowledge of pre-trained models with current, specific information from your organization’s data.
Vector Store Integration
Spring AI integrates seamlessly with PostgreSQL’s PGVector extension for vector similarity search.
The RAG workflow operates as follows:
- Documents are chunked and converted to vector embeddings using Amazon Titan
- User queries are similarly embedded
- Vector similarity search retrieves relevant document chunks
- The AI model generates responses using both its training data and retrieved context
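The four steps above can be sketched in a few lines of plain Java. Big caveat: this is a toy. `ToyRetriever` is a hypothetical class, and its "embedding" is just a word-count map, not a semantic embedding from Amazon Titan, and there is no PGVector involved. It only shows the shape of the pipeline: embed, compare, pick the best chunks, and build the prompt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy RAG retriever: word counts stand in for real embeddings (an assumption of this sketch).
final class ToyRetriever {
    private final List<String> chunks = new ArrayList<>();
    private final List<Map<String, Integer>> vectors = new ArrayList<>();

    // "Embeds" text as a sparse word-count vector; a real system would call an embedding model.
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    private static double norm(Map<String, Integer> vector) {
        double sum = 0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double norms = norm(a) * norm(b);
        return norms == 0 ? 0.0 : dot / norms;
    }

    // Step 1: store a document chunk together with its vector.
    void add(String chunk) {
        chunks.add(chunk);
        vectors.add(embed(chunk));
    }

    // Steps 2 and 3: embed the query and return the topK most similar chunks.
    List<String> search(String query, int topK) {
        Map<String, Integer> queryVector = embed(query);
        double[] scores = new double[chunks.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            scores[i] = cosine(queryVector, vectors.get(i));
            order.add(i);
        }
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topK, order.size()); i++) {
            result.add(chunks.get(order.get(i)));
        }
        return result;
    }

    // Step 4: the retrieved context is placed in front of the question sent to the model.
    static String buildPrompt(List<String> context, String question) {
        return "Context:\n" + String.join("\n", context) + "\n\nQuestion: " + question;
    }
}
```

In the workshop setup the vector store and embedding model handle all of this, so you would not write this code yourself. The sketch is only meant to demystify what "similarity search" means.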
Configuration for RAG
The application properties configure the complete RAG pipeline:
# RAG Configuration
spring.ai.model.embedding=bedrock-titan
spring.ai.bedrock.titan.embedding.model=amazon.titan-embed-text-v2:0
spring.ai.bedrock.titan.embedding.input-type=text
spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.dimensions=1024
This configuration automatically initializes the vector database schema and configures Amazon Titan for generating embeddings.
Tool Calling: Extending AI Capabilities
Tool calling allows AI agents to interact with external systems and APIs, transforming them from simple chatbots into capable automation systems.
Implementing External Tools
Tools are defined using the @Tool annotation on standard Java methods.
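To show why an annotation on an ordinary method is enough, here is a small reflection-based sketch in plain Java. Everything in it is hypothetical: `@AgentTool` is my own stand-in annotation that only mimics the idea behind Spring AI's @Tool, and `UnicornTools` and `ToolRegistry` are illustrative names. The real framework also generates input schemas and handles typed arguments, which this sketch skips by assuming a single String argument.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker annotation; it only mimics the idea behind Spring AI's @Tool.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AgentTool {
    String description();
}

// Example tool holder: ordinary methods become tools by being annotated.
class UnicornTools {
    @AgentTool(description = "Returns the daily rental price in USD for a unicorn color")
    public String dailyPrice(String color) {
        return "rainbow".equalsIgnoreCase(color) ? "50" : "30";
    }

    // Not annotated, so it is never exposed to the model.
    public String internalAudit(String ignored) {
        return "secret";
    }
}

// Discovers annotated methods once and dispatches calls by tool name.
final class ToolRegistry {
    private final Object target;
    private final Map<String, Method> tools = new TreeMap<>();

    ToolRegistry(Object target) {
        this.target = target;
        for (Method method : target.getClass().getDeclaredMethods()) {
            if (method.isAnnotationPresent(AgentTool.class)) {
                tools.put(method.getName(), method);
            }
        }
    }

    // Tool names and descriptions are what would be advertised to the model.
    Map<String, String> describe() {
        Map<String, String> out = new TreeMap<>();
        tools.forEach((name, m) -> out.put(name, m.getAnnotation(AgentTool.class).description()));
        return out;
    }

    // Invokes the tool the model asked for; unknown names are rejected.
    String call(String name, String argument) {
        Method method = tools.get(name);
        if (method == null) {
            throw new IllegalArgumentException("Unknown tool: " + name);
        }
        try {
            return (String) method.invoke(target, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Tool call failed: " + name, e);
        }
    }
}
```

The description string matters more than it looks: it is the only hint the model gets about when a tool is worth calling.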
DateTime Tool for Temporal Context
A simpler tool provides basic temporal functionality, giving the model access to the current date and time.
When users request weather information, the AI automatically determines it needs current time context and weather data, orchestrating multiple tool calls to provide comprehensive responses.
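A date-time tool can be as small as the sketch below. This is my own plain-Java approximation, not the workshop's code: `DateTimeTool` is a hypothetical class, and I inject a `java.time.Clock` purely so the behavior can be tested with a fixed instant.

```java
import java.time.Clock;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical date-time tool; the Clock is injected so results are reproducible in tests.
final class DateTimeTool {
    private final Clock clock;

    DateTimeTool(Clock clock) {
        this.clock = clock;
    }

    // Convenience factory for production use with the real system clock.
    static DateTimeTool system() {
        return new DateTimeTool(Clock.systemUTC());
    }

    // Returns the current date and time in the given IANA time zone, ISO-8601 formatted.
    String currentDateTime(String zoneId) {
        ZonedDateTime now = ZonedDateTime.now(clock.withZone(ZoneId.of(zoneId)));
        return now.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }
}
```

Returning an ISO-8601 string with an explicit offset leaves the model less room to misread the time zone.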
Model Context Protocol (MCP) Integration
MCP standardizes communication between AI agents and external services. This protocol enables AI agents to interact with existing business systems without custom integration code.
Converting Existing Services to MCP Servers
Existing Spring Boot services can be enhanced with MCP capabilities by adding the @Tool annotation to service methods.
This approach allows AI agents to perform actual business operations, transforming them from conversational interfaces into functional business automation tools.
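Under the hood, MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. The sketch below builds such a request by hand so you can see the envelope. `McpRequests` is a hypothetical helper, the tool name `createUnicorn` in the usage note is made up, and only String arguments are supported to keep the example dependency-free. In a real application the MCP client library builds these messages for you.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Builds the JSON-RPC 2.0 envelope used for an MCP "tools/call" request.
// Hand-rolled JSON with String-only arguments keeps this sketch dependency-free.
final class McpRequests {
    // Minimal JSON string escaping for quotes, backslashes, and control characters.
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Arguments are emitted in sorted key order so the output is deterministic.
    static String toolsCall(int id, String toolName, Map<String, String> arguments) {
        StringJoiner args = new StringJoiner(",", "{", "}");
        new TreeMap<>(arguments).forEach((key, value) -> args.add(quote(key) + ":" + quote(value)));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\",\"params\":{\"name\":" + quote(toolName)
                + ",\"arguments\":" + args + "}}";
    }
}
```

For example, `McpRequests.toolsCall(1, "createUnicorn", Map.of("name", "Sparkle"))` produces a single JSON object with `method` set to `tools/call` and the arguments nested under `params`.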
Configuration-Driven Development
Spring AI’s configuration-centric approach minimizes boilerplate code:
# Application basics
spring.application.name=agent
logging.level.org.springframework.ai=DEBUG
# Amazon Bedrock Configuration
spring.ai.bedrock.converse.chat.options.model=us.anthropic.claude-3-7-sonnet-20250219-v1:0
# UI Configuration
spring.thymeleaf.cache=false
spring.thymeleaf.prefix=classpath:/templates/
spring.thymeleaf.suffix=.html
# JDBC Memory Configuration
spring.ai.chat.memory.repository.jdbc.initialize-schema=always
spring.datasource.username=postgres
# RAG Configuration
spring.ai.model.embedding=bedrock-titan
spring.ai.bedrock.titan.embedding.model=amazon.titan-embed-text-v2:0
spring.ai.bedrock.titan.embedding.input-type=text
spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.dimensions=1024
# MCP Client Configuration
spring.ai.mcp.client.toolcallback.enabled=true
This configuration automatically sets up:
- Claude 3.7 Sonnet as our chat model
- Amazon Titan for embeddings
- PostgreSQL for memory and vector storage
- Thymeleaf for web templates
- MCP client for external tool integration
No manual beans, no complex configuration classes - just properties!
The Complete Architecture: A Symphony of Components
Our final unicorn rental agent was a masterpiece of modern software architecture:
- ChatClient - The conductor orchestrating everything
- Web Interface - Beautiful Thymeleaf + Tailwind CSS frontend with real-time streaming
- Memory System - Both local and persistent, keeping track of conversations
- Vector Store - RAG capabilities for domain knowledge
- Tool Calling - Weather, datetime, and business operations
- MCP Integration - Communication with external services
- AWS Bedrock - Powerful Claude AI models and Titan embeddings
- PostgreSQL + PGVector - Reliable data storage with vector capabilities
The beauty of this architecture is how all these components work together seamlessly, yet each can be developed and tested independently.
Deployment: Taking It to the Cloud
The workshop didn’t stop at development - we deployed our creation to AWS EKS (Elastic Kubernetes Service)! Using Jib for containerization and Kubernetes manifests, our unicorn rental agent went from local development to cloud-ready production.
Key Takeaways for Fellow Java Developers
- Familiar Territory: Spring AI uses the same patterns we know and love from Spring Framework
- Progressive Enhancement: Start simple, add features incrementally
- Real Business Value: AI agents can perform actual business operations, not just chat
- Production Ready: With proper architecture, these agents can handle real-world loads
- Ecosystem Integration: Seamlessly works with existing Spring/Java infrastructure
Performance and Scaling
Several factors impact production performance:
- Memory Management: Configure appropriate message window sizes based on conversation length requirements
- Vector Store Optimization: Regularly maintain vector indexes and consider partitioning strategies for large datasets
- Connection Pooling: Configure database connection pools appropriately for concurrent AI interactions
- Caching: Implement caching strategies for frequently accessed embeddings and tool responses
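As one illustration of the caching point, here is a small LRU cache placed in front of an embedding function. This is a sketch under assumptions: `EmbeddingCache` is a hypothetical class, the embedder is passed in as a plain `Function<String, float[]>` rather than a real embedding client, and the cache lives in a single JVM (a multi-instance deployment would more likely use a shared cache).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache in front of an expensive embedding call.
final class EmbeddingCache {
    private final Function<String, float[]> embedder;
    private final Map<String, float[]> cache;
    private int misses = 0;

    EmbeddingCache(int capacity, Function<String, float[]> embedder) {
        this.embedder = embedder;
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached vector, computing and storing it on a miss.
    synchronized float[] embed(String text) {
        float[] cached = cache.get(text);
        if (cached == null) {
            misses++;
            cached = embedder.apply(text);
            cache.put(text, cached);
        }
        return cached;
    }

    // Number of times the underlying embedder was actually invoked.
    synchronized int misses() {
        return misses;
    }
}
```

Repeated questions such as "what are your opening hours?" then cost one embedding call instead of one per request.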
Security Considerations
Production deployments require attention to security:
- Authentication: Integrate conversation IDs with proper user authentication systems
- API Rate Limiting: Implement rate limiting to prevent abuse of AI endpoints
- Data Privacy: Ensure conversation data handling complies with privacy regulations
- Input Validation: Validate and sanitize all user inputs before processing
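Rate limiting is the easiest of these to sketch. Below is a simple fixed-window limiter keyed by user ID. It is only an illustration: `ChatRateLimiter` is a hypothetical class, the state lives in one JVM, and the time source is injected as a `LongSupplier` so the behavior can be tested without sleeping. A production setup might instead rely on an API gateway or a shared store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical fixed-window rate limiter keyed by authenticated user ID.
final class ChatRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final LongSupplier clockMillis; // injected so tests can control time
    private final Map<String, long[]> windows = new HashMap<>(); // userId -> {windowStart, count}

    ChatRateLimiter(int maxRequests, long windowMillis, LongSupplier clockMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
    }

    // Returns true if the request is allowed, false if the user exceeded the limit.
    synchronized boolean tryAcquire(String userId) {
        long now = clockMillis.getAsLong();
        long[] window = windows.computeIfAbsent(userId, id -> new long[] { now, 0 });
        if (now - window[0] >= windowMillis) {
            // The previous window has expired, so start a fresh one.
            window[0] = now;
            window[1] = 0;
        }
        if (window[1] >= maxRequests) {
            return false;
        }
        window[1]++;
        return true;
    }
}
```

In production you would pass `System::currentTimeMillis` as the clock and call `tryAcquire` with the authenticated user's ID before forwarding the prompt to the model.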
Common Pitfalls and Best Practices
When developing AI applications with Spring AI, several common pitfalls can impact performance, cost, and user experience. Here’s how to avoid them:
Memory Management
Unlimited conversation history can lead to context window overflow and increased costs. Implement sliding window memory with appropriate limits.
Tool Design Principles
Effective tools should be:
- Atomic: Each tool should perform a single, well-defined operation
- Idempotent: Repeating a call with identical inputs should have the same effect as making it once, so retries are safe
- Error-Resilient: Implement comprehensive error handling and fallback mechanisms
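For the error-resilience point, one common approach is to wrap the tool call in a retry loop with a fallback answer. The helper below is a minimal sketch, not something from the workshop: `ResilientTool` is a hypothetical name, it retries immediately without backoff, and it assumes the wrapped tool is idempotent so that retrying is safe.

```java
import java.util.function.Supplier;

// Hypothetical wrapper that retries a tool call and degrades gracefully.
final class ResilientTool {
    // Runs the tool call up to maxAttempts times; returns the fallback text if every attempt fails.
    static String callWithFallback(Supplier<String> toolCall, int maxAttempts, String fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return toolCall.get();
            } catch (RuntimeException e) {
                // Swallow the failure and try again; real code would log it and back off.
            }
        }
        return fallback;
    }
}
```

Returning a readable fallback string (for example "weather service unavailable") lets the model explain the problem to the user instead of the whole request failing.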
Vector Store Maintenance
RAG systems require ongoing maintenance:
- Regular reindexing of document embeddings
- Monitoring for embedding drift as models evolve
- Implementing document versioning and update strategies
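One lightweight way to approach document versioning is to store a content hash per document and re-embed only when the hash changes. The sketch below assumes exactly that strategy. `DocumentVersionTracker` is a hypothetical class that keeps hashes in memory, while a real system would persist them next to the vectors.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker that decides whether a document must be re-embedded.
final class DocumentVersionTracker {
    private final Map<String, String> indexedHashes = new HashMap<>();

    // Hex-encoded SHA-256 of the document content.
    static String sha256(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    // Returns true when the document is new or changed, and records the new hash.
    boolean needsReindex(String documentId, String content) {
        String hash = sha256(content);
        String previous = indexedHashes.put(documentId, hash);
        return !hash.equals(previous);
    }
}
```

This avoids paying for embeddings of unchanged documents during a scheduled reindex run.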
Real-World Applications
The patterns demonstrated extend to various enterprise scenarios:
Customer Service Automation
AI agents can handle customer inquiries by accessing knowledge bases, order systems, and external APIs for real-time information retrieval and problem resolution.
Internal Tool Integration
Development teams can create AI assistants that interact with CI/CD systems, monitoring tools, and documentation platforms to streamline operational workflows.
Business Process Automation
AI agents can orchestrate complex business processes by integrating with ERP systems, approval workflows, and notification services.
Conclusion: The Future of Java AI Development
Spring AI represents a paradigm shift for Java developers entering the AI space. Instead of learning entirely new frameworks or switching to Python, you can leverage your existing Spring expertise to build sophisticated AI applications.
Key takeaways for your next Java AI project:
- Leverage Familiar Patterns: Spring AI uses dependency injection, configuration properties, and annotations you already know
- Start Simple, Scale Smart: Begin with basic chat functionality and progressively add memory, RAG, and tool calling
- Production-Ready Architecture: Enterprise features from the Spring ecosystem, such as connection pooling, error handling, and monitoring, carry over to Spring AI applications
- Provider Flexibility: Switch between OpenAI, Azure, AWS Bedrock, or other providers without changing your application code
- Real Business Value: Move beyond chatbots to create AI agents that integrate with existing business systems
The combination of Spring AI’s familiar development model with powerful AI capabilities opens up endless possibilities for Java applications. Whether you’re building customer service automation, internal development tools, or complex business process orchestration, Spring AI provides the foundation you need.
As the AI landscape continues evolving, Spring AI ensures Java developers can innovate without abandoning their expertise. The future of enterprise AI is here, and it speaks Java.
Ready to start building? Check out the complete project repository and begin your own AI journey today.
This post is part of my #100DaysOfJava challenge. Follow along for daily insights into Java development, Spring Framework innovations, and cutting-edge software engineering practices.
Resources: