
1. Overview
Artificial Intelligence is changing the way we build web applications. Hugging Face is a popular platform that provides a vast collection of open-source and pre-trained LLMs.
We can use Ollama, an open-source tool, to run LLMs on our local machines. It supports running GGUF-format models from Hugging Face.
In this tutorial, we’ll explore how to use Hugging Face models with Spring AI and Ollama. We’ll build a simple chatbot using a chat completion model and implement semantic search using an embedding model.
2. Dependencies
Let’s start by adding the necessary dependency to our project’s pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M5</version>
</dependency>
The Ollama starter dependency helps us establish a connection with the Ollama service. We'll use it to pull and run our chat completion and embedding models.
Since the current version, 1.0.0-M5, is a milestone release, we’ll also need to add the Spring Milestones repository to our pom.xml:
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>
This repository is where milestone versions are published, as opposed to the standard Maven Central repository.
3. Setting up Ollama With Testcontainers
To facilitate local development and testing, we’ll use Testcontainers to set up the Ollama service.
3.1. Test Dependencies
First, let’s add the necessary test dependencies to our pom.xml:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>
We import the Spring AI Testcontainers dependency for Spring Boot and the Ollama module of Testcontainers.
3.2. Defining Testcontainers Bean
Next, let’s create a @TestConfiguration class that defines our Testcontainers beans:
@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    public OllamaContainer ollamaContainer() {
        return new OllamaContainer("ollama/ollama:0.5.4");
    }

    @Bean
    public DynamicPropertyRegistrar dynamicPropertyRegistrar(OllamaContainer ollamaContainer) {
        return registry -> {
            registry.add("spring.ai.ollama.base-url", ollamaContainer::getEndpoint);
        };
    }
}
We specify the latest stable version of the Ollama image when creating the OllamaContainer bean.
Then, we define a DynamicPropertyRegistrar bean to configure the base-url of the Ollama service. This allows our application to connect to the started Ollama container.
3.3. Using Testcontainers During Development
While Testcontainers is primarily used for integration testing, we can use it during local development, too.
To achieve this, we’ll create a separate main class in our src/test/java directory:
public class TestApplication {

    public static void main(String[] args) {
        SpringApplication.from(Application::main)
            .with(TestcontainersConfiguration.class)
            .run(args);
    }
}
We create a TestApplication class and, inside its main() method, start our main Application class with the TestcontainersConfiguration class.
This setup lets us run our Spring Boot application and have it connect to the Ollama service started via Testcontainers.
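We can run this class directly from the IDE. Alternatively, assuming Spring Boot 3.1 or later, the Spring Boot Maven plugin also provides a test-run goal that starts the application with the test classpath:
./mvnw spring-boot:test-run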
4. Using a Chat Completion Model
Now that we’ve got our local Ollama container set up, let’s use a chat completion model to build a simple chatbot.
4.1. Configuring Chat Model and Chatbot Beans
Let’s start by configuring a chat completion model in our application.yaml file:
spring:
  ai:
    ollama:
      init:
        pull-model-strategy: when_missing
      chat:
        options:
          model: hf.co/microsoft/Phi-3-mini-4k-instruct-gguf
To configure a Hugging Face model, we use the hf.co/{username}/{repository} format. Here, we specify the GGUF version of the Phi-3-mini-4k-instruct model provided by Microsoft.
We're not required to use this particular model for our implementation; we recommend setting up the codebase locally and experimenting with other chat completion models.
Additionally, we set the pull-model-strategy to when_missing. This ensures that Spring AI pulls the specified model if it's not available locally.
Once we configure a valid model, Spring AI automatically creates a bean of type ChatModel, allowing us to interact with the chat completion model.
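For a quick sanity check, we can inject this bean and call it directly. Here's a minimal sketch using the call(String) convenience method, with a prompt of our own choosing:
String reply = chatModel.call("Say hello in exactly five words.");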
Let’s use it to define the additional beans required for our chatbot:
@Configuration
class ChatbotConfiguration {

    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient
            .builder(chatModel)
            .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
            .build();
    }
}
First, we define a ChatMemory bean and use the InMemoryChatMemory implementation. This maintains the conversation context by storing the chat history in memory.
Next, using the ChatMemory and ChatModel beans, we create a bean of type ChatClient, which is our main entry point for interacting with our chat completion model.
4.2. Implementing a Chatbot
With our configurations in place, let’s create a ChatbotService class. We’ll inject the ChatClient bean we defined earlier to interact with our model.
But first, let’s define two simple records to represent the chat request and response:
record ChatRequest(@Nullable UUID chatId, String question) {}
record ChatResponse(UUID chatId, String answer) {}
The ChatRequest contains the user’s question and an optional chatId to identify an ongoing conversation.
Similarly, the ChatResponse contains the chatId and the chatbot’s answer.
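For reference, here's a minimal sketch of the service class itself, following the same constructor-injection convention used elsewhere in this article:
@Service
class ChatbotService {

    private final ChatClient chatClient;

    // standard constructor

    // chat() implementation shown below
}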
Now, let’s implement the intended functionality:
public ChatResponse chat(ChatRequest chatRequest) {
    UUID chatId = Optional
        .ofNullable(chatRequest.chatId())
        .orElse(UUID.randomUUID());
    String answer = chatClient
        .prompt()
        .user(chatRequest.question())
        .advisors(advisorSpec ->
            advisorSpec
                .param("chat_memory_conversation_id", chatId))
        .call()
        .content();
    return new ChatResponse(chatId, answer);
}
If the incoming request doesn’t contain a chatId, we generate a new one. This allows the user to start a new conversation or continue an existing one.
We pass the user’s question to the chatClient bean and set the chat_memory_conversation_id parameter to the resolved chatId to maintain conversation history.
Finally, we return the chatbot’s answer along with the chatId.
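As a side note, the "chat_memory_conversation_id" literal corresponds to a constant in the advisor API. Assuming the 1.0.0-M5 AbstractChatMemoryAdvisor class, the advisor configuration can also be written as:
.advisors(advisorSpec -> advisorSpec
    // CHAT_MEMORY_CONVERSATION_ID_KEY resolves to "chat_memory_conversation_id"
    .param(AbstractChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY, chatId))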
4.3. Interacting With Our Chatbot
Now that we’ve implemented our service layer, let’s expose a REST API on top of it:
@PostMapping("/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest chatRequest) {
    ChatResponse chatResponse = chatbotService.chat(chatRequest);
    return ResponseEntity.ok(chatResponse);
}
We’ll use the above API endpoint to interact with our chatbot.
Let’s use the HTTPie CLI to start a new conversation:
http POST :8080/chat question="Who wanted to kill Harry Potter?"
We send a simple question to the chatbot; let's see what we get as a response:
{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Lord Voldemort, also known as Tom Riddle, wanted to kill Harry Potter because of a prophecy that foretold a boy born at the end of July would have the power to defeat him."
}
The response contains a unique chatId and the chatbot’s answer to our question.
Let’s continue this conversation by sending a follow-up question using the chatId from the above response:
http POST :8080/chat chatId="7b8a36c7-2126-4b80-ac8b-f9eedebff28a" question="Who should he have gone after instead?"
Let’s see if the chatbot can maintain the context of our conversation and provide a relevant response:
{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Based on the prophecy's criteria, Voldemort could have targeted Neville Longbottom instead, as he was also born at the end of July to parents who had defied Voldemort three times."
}
As we can see, the chatbot does indeed maintain the conversation context as it references the prophecy we discussed in the previous message.
The chatId remains the same, indicating that the follow-up answer is a continuation of the same conversation.
5. Using an Embedding Model
Moving on from the chat completion model, we’ll now use an embedding model to implement semantic search on a small dataset of quotes.
We’ll fetch the quotes from an external API, store them in an in-memory vector store, and perform a semantic search.
5.1. Fetching Quote Records From an External API
For our demonstration, we’ll use the QuoteSlate API to fetch quotes.
Let’s create a QuoteFetcher utility class for this:
class QuoteFetcher {

    private static final String BASE_URL = "https://quoteslate.vercel.app";
    private static final String API_PATH = "/api/quotes/random";
    private static final int DEFAULT_COUNT = 50;

    public static List<Quote> fetch() {
        return RestClient
            .create(BASE_URL)
            .get()
            .uri(uriBuilder ->
                uriBuilder
                    .path(API_PATH)
                    .queryParam("count", DEFAULT_COUNT)
                    .build())
            .retrieve()
            .body(new ParameterizedTypeReference<>() {});
    }
}

record Quote(String quote, String author) {}
Using RestClient, we invoke the QuoteSlate API with the default count of 50 and use ParameterizedTypeReference to deserialize the API response to a list of Quote records.
5.2. Configuring and Populating an In-Memory Vector Store
Now, let’s configure an embedding model in our application.yaml:
spring:
  ai:
    ollama:
      embedding:
        options:
          model: hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF
We use the GGUF version of the nomic-embed-text-v1.5 model provided by nomic-ai. Again, feel free to try this implementation with a different embedding model.
After specifying a valid model, Spring AI automatically creates a bean of type EmbeddingModel for us.
Let’s use it to create a vector store bean:
@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
    return SimpleVectorStore
        .builder(embeddingModel)
        .build();
}
For our demonstration, we create a bean of SimpleVectorStore class. It’s an in-memory implementation that emulates a vector store using the java.util.Map class.
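Since the store lives entirely in memory, its contents are lost on restart. As a side note, SimpleVectorStore also exposes save() and load() methods for simple file-based persistence; here's a minimal sketch, with a hypothetical file name:
SimpleVectorStore store = SimpleVectorStore.builder(embeddingModel).build();
// persist the current contents to disk, then restore them later
store.save(new File("quotes-store.json")); // hypothetical file name
store.load(new File("quotes-store.json"));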
Now, to populate our vector store with quotes during application startup, we’ll create a VectorStoreInitializer class that implements the ApplicationRunner interface:
@Component
class VectorStoreInitializer implements ApplicationRunner {

    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = QuoteFetcher
            .fetch()
            .stream()
            .map(quote -> {
                Map<String, Object> metadata = Map.of("author", quote.author());
                return new Document(quote.quote(), metadata);
            })
            .toList();
        vectorStore.add(documents);
    }
}
In our VectorStoreInitializer, we autowire an instance of VectorStore.
Inside the run() method, we use our QuoteFetcher utility class to retrieve a list of Quote records. Then, we map each quote into a Document and configure the author field as metadata.
Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts our plaintext content into vector representation before storing it in our vector store. We don’t need to explicitly convert it using the EmbeddingModel bean.
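That said, if we ever need a raw vector, for example for debugging, we can invoke the EmbeddingModel bean directly. Here's a minimal sketch, assuming the embed(String) overload that returns a float[]:
// the array length equals the embedding model's output dimension
float[] vector = embeddingModel.embed("The early bird catches the worm.");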
5.3. Testing Semantic Search
With our vector store populated, let’s validate our semantic search functionality:
private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Motivation", "Happiness"})
void whenSearchingQuotesByTheme_thenRelevantQuotesReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
        .builder()
        .query(theme)
        .topK(MAX_RESULTS)
        .build();
    List<Document> documents = vectorStore.similaritySearch(searchRequest);
    assertThat(documents)
        .hasSizeBetween(1, MAX_RESULTS)
        .allSatisfy(document -> {
            String author = String.valueOf(document.getMetadata().get("author"));
            assertThat(author)
                .isNotBlank();
        });
}
Here, we pass some common quote themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the number of desired results.
Next, we call the similaritySearch() method of our vectorStore bean with the searchRequest. Similar to the add() method of the VectorStore, Spring AI converts our query to its vector representation before querying the vector store.
The returned documents will contain quotes that are semantically related to the given theme, even if they don’t contain the exact keyword.
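For completeness, here's one way to wire up the surrounding test class; the class name is our own choice, and we import the TestcontainersConfiguration class we defined earlier:
@SpringBootTest
@Import(TestcontainersConfiguration.class)
class VectorStoreLiveTest {

    @Autowired
    private VectorStore vectorStore;

    // the @ParameterizedTest shown above goes here
}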
6. Conclusion
In this article, we’ve explored using Hugging Face models with Spring AI.
Using Testcontainers, we set up the Ollama service, creating a local test environment.
First, we used a chat completion model to build a simple chatbot. Then, we implemented semantic search using an embedding model.
As always, all the code examples used in this article are available over on GitHub.