
1. Overview
Artificial Intelligence is changing the way we build web applications. Hugging Face is a popular platform that provides a vast collection of open-source and pre-trained LLMs.
We can use Ollama, an open-source tool, to run LLMs on our local machines. It supports running GGUF-format models from Hugging Face.
In this tutorial, we’ll explore how to use Hugging Face models with Spring AI and Ollama. We’ll build a simple chatbot using a chat completion model and implement semantic search using an embedding model.
2. Dependencies
Let’s start by adding the necessary dependency to our project’s pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M5</version>
</dependency>
The Ollama starter dependency helps us establish a connection with the Ollama service. We'll use it to pull and run our chat completion and embedding models.
Since the current version, 1.0.0-M5, is a milestone release, we’ll also need to add the Spring Milestones repository to our pom.xml:
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>
This repository is where milestone versions are published, as opposed to the standard Maven Central repository.
3. Setting up Ollama With Testcontainers
To facilitate local development and testing, we’ll use Testcontainers to set up the Ollama service.
3.1. Test Dependencies
First, let’s add the necessary test dependencies to our pom.xml:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>
We import the Spring AI Testcontainers dependency for Spring Boot and the Ollama module of Testcontainers.
3.2. Defining Testcontainers Bean
Next, let’s create a @TestConfiguration class that defines our Testcontainers beans:
@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    public OllamaContainer ollamaContainer() {
        return new OllamaContainer("ollama/ollama:0.5.4");
    }

    @Bean
    public DynamicPropertyRegistrar dynamicPropertyRegistrar(OllamaContainer ollamaContainer) {
        return registry -> {
            registry.add("spring.ai.ollama.base-url", ollamaContainer::getEndpoint);
        };
    }
}
We specify the latest stable version of the Ollama image when creating the OllamaContainer bean.
Then, we define a DynamicPropertyRegistrar bean to configure the base-url of the Ollama service. This allows our application to connect to the started Ollama container.
3.3. Using Testcontainers During Development
While Testcontainers is primarily used for integration testing, we can use it during local development, too.
To achieve this, we’ll create a separate main class in our src/test/java directory:
public class TestApplication {

    public static void main(String[] args) {
        SpringApplication.from(Application::main)
            .with(TestcontainersConfiguration.class)
            .run(args);
    }
}
We create a TestApplication class and, inside its main() method, start our main Application class with the TestcontainersConfiguration class.
This setup lets us run our Spring Boot application and have it connect to the Ollama service started via Testcontainers.
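We can run this class directly from the IDE. Alternatively, assuming Spring Boot 3.1 or later, the Spring Boot Maven plugin also provides a test-run goal that starts the application with the test classpath:
./mvnw spring-boot:test-run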
4. Using a Chat Completion Model
Now that we’ve got our local Ollama container set up, let’s use a chat completion model to build a simple chatbot.
4.1. Configuring Chat Model and Chatbot Beans
Let’s start by configuring a chat completion model in our application.yaml file:
spring:
  ai:
    ollama:
      init:
        pull-model-strategy: when_missing
      chat:
        options:
          model: hf.co/microsoft/Phi-3-mini-4k-instruct-gguf
To configure a Hugging Face model, we use the hf.co/{username}/{repository} format. Here, we specify the GGUF version of the Phi-3-mini-4k-instruct model provided by Microsoft.
We're not required to use this particular model for our implementation; we recommend setting up the codebase locally and experimenting with other chat completion models.
Additionally, we set the pull-model-strategy to when_missing. This ensures that Spring AI pulls the specified model if it's not available locally.
Once we configure a valid model, Spring AI automatically creates a bean of type ChatModel, allowing us to interact with the chat completion model.
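For a quick sanity check, we can inject this bean and call it directly. Here's a minimal sketch using the call(String) convenience method, with a prompt of our own choosing:
String reply = chatModel.call("Say hello in exactly five words.");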
Let’s use it to define the additional beans required for our chatbot:
@Configuration
class ChatbotConfiguration {

    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient
            .builder(chatModel)
            .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
            .build();
    }
}
First, we define a ChatMemory bean and use the InMemoryChatMemory implementation. This maintains the conversation context by storing the chat history in memory.
Next, using the ChatMemory and ChatModel beans, we create a bean of type ChatClient, which is our main entry point for interacting with our chat completion model.
4.2. Implementing a Chatbot
With our configurations in place, let’s create a ChatbotService class. We’ll inject the ChatClient bean we defined earlier to interact with our model.
But first, let’s define two simple records to represent the chat request and response:
record ChatRequest(@Nullable UUID chatId, String question) {}
record ChatResponse(UUID chatId, String answer) {}
The ChatRequest contains the user’s question and an optional chatId to identify an ongoing conversation.
Similarly, the ChatResponse contains the chatId and the chatbot’s answer.
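For reference, here's a minimal sketch of the service class itself, following the same constructor-injection convention used elsewhere in this article:
@Service
class ChatbotService {

    private final ChatClient chatClient;

    // standard constructor

    // chat() implementation shown below
}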
Now, let’s implement the intended functionality:
public ChatResponse chat(ChatRequest chatRequest) {
    UUID chatId = Optional
        .ofNullable(chatRequest.chatId())
        .orElse(UUID.randomUUID());
    String answer = chatClient
        .prompt()
        .user(chatRequest.question())
        .advisors(advisorSpec ->
            advisorSpec
                .param("chat_memory_conversation_id", chatId))
        .call()
        .content();
    return new ChatResponse(chatId, answer);
}
If the incoming request doesn’t contain a chatId, we generate a new one. This allows the user to start a new conversation or continue an existing one.
We pass the user’s question to the chatClient bean and set the chat_memory_conversation_id parameter to the resolved chatId to maintain conversation history.
Finally, we return the chatbot’s answer along with the chatId.
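As a side note, the "chat_memory_conversation_id" literal corresponds to a constant in the advisor API. Assuming the 1.0.0-M5 AbstractChatMemoryAdvisor class, the advisor configuration can also be written as:
.advisors(advisorSpec -> advisorSpec
    // CHAT_MEMORY_CONVERSATION_ID_KEY resolves to "chat_memory_conversation_id"
    .param(AbstractChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY, chatId))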
4.3. Interacting With Our Chatbot
Now that we’ve implemented our service layer, let’s expose a REST API on top of it:
@PostMapping("/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest chatRequest) {
    ChatResponse chatResponse = chatbotService.chat(chatRequest);
    return ResponseEntity.ok(chatResponse);
}
We’ll use the above API endpoint to interact with our chatbot.
Let’s use the HTTPie CLI to start a new conversation:
http POST :8080/chat question="Who wanted to kill Harry Potter?"
We send a simple question to the chatbot; let's see what we get as a response:
{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Lord Voldemort, also known as Tom Riddle, wanted to kill Harry Potter because of a prophecy that foretold a boy born at the end of July would have the power to defeat him."
}
The response contains a unique chatId and the chatbot’s answer to our question.
Let’s continue this conversation by sending a follow-up question using the chatId from the above response:
http POST :8080/chat chatId="7b8a36c7-2126-4b80-ac8b-f9eedebff28a" question="Who should he have gone after instead?"
Let’s see if the chatbot can maintain the context of our conversation and provide a relevant response:
{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Based on the prophecy's criteria, Voldemort could have targeted Neville Longbottom instead, as he was also born at the end of July to parents who had defied Voldemort three times."
}
As we can see, the chatbot does indeed maintain the conversation context as it references the prophecy we discussed in the previous message.
The chatId remains the same, indicating that the follow-up answer is a continuation of the same conversation.
5. Using an Embedding Model
Moving on from the chat completion model, we’ll now use an embedding model to implement semantic search on a small dataset of quotes.
We’ll fetch the quotes from an external API, store them in an in-memory vector store, and perform a semantic search.
5.1. Fetching Quote Records From an External API
For our demonstration, we’ll use the QuoteSlate API to fetch quotes.
Let’s create a QuoteFetcher utility class for this:
class QuoteFetcher {

    private static final String BASE_URL = "https://quoteslate.vercel.app";
    private static final String API_PATH = "/api/quotes/random";
    private static final int DEFAULT_COUNT = 50;

    public static List<Quote> fetch() {
        return RestClient
            .create(BASE_URL)
            .get()
            .uri(uriBuilder ->
                uriBuilder
                    .path(API_PATH)
                    .queryParam("count", DEFAULT_COUNT)
                    .build())
            .retrieve()
            .body(new ParameterizedTypeReference<>() {});
    }
}

record Quote(String quote, String author) {}
Using RestClient, we invoke the QuoteSlate API with the default count of 50 and use ParameterizedTypeReference to deserialize the API response to a list of Quote records.
5.2. Configuring and Populating an In-Memory Vector Store
Now, let’s configure an embedding model in our application.yaml:
spring:
  ai:
    ollama:
      embedding:
        options:
          model: hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF
We use the GGUF version of the nomic-embed-text-v1.5 model provided by nomic-ai. Again, feel free to try this implementation with a different embedding model.
After specifying a valid model, Spring AI automatically creates a bean of type EmbeddingModel for us.
Let’s use it to create a vector store bean:
@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
    return SimpleVectorStore
        .builder(embeddingModel)
        .build();
}
For our demonstration, we create a bean of SimpleVectorStore class. It’s an in-memory implementation that emulates a vector store using the java.util.Map class.
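Since the store lives entirely in memory, its contents are lost on restart. As a side note, SimpleVectorStore also exposes save() and load() methods for simple file-based persistence; here's a minimal sketch, with a hypothetical file name:
SimpleVectorStore store = SimpleVectorStore.builder(embeddingModel).build();
// persist the current contents to disk, then restore them later
store.save(new File("quotes-store.json")); // hypothetical file name
store.load(new File("quotes-store.json"));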
Now, to populate our vector store with quotes during application startup, we’ll create a VectorStoreInitializer class that implements the ApplicationRunner interface:
@Component
class VectorStoreInitializer implements ApplicationRunner {

    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = QuoteFetcher
            .fetch()
            .stream()
            .map(quote -> {
                Map<String, Object> metadata = Map.of("author", quote.author());
                return new Document(quote.quote(), metadata);
            })
            .toList();
        vectorStore.add(documents);
    }
}
In our VectorStoreInitializer, we autowire an instance of VectorStore.
Inside the run() method, we use our QuoteFetcher utility class to retrieve a list of Quote records. Then, we map each quote into a Document and configure the author field as metadata.
Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts our plaintext content into vector representation before storing it in our vector store. We don’t need to explicitly convert it using the EmbeddingModel bean.
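That said, if we ever need a raw vector, for example for debugging, we can invoke the EmbeddingModel bean directly. Here's a minimal sketch, assuming the embed(String) overload that returns a float[]:
// the array length equals the embedding model's output dimension
float[] vector = embeddingModel.embed("The early bird catches the worm.");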
5.3. Testing Semantic Search
With our vector store populated, let’s validate our semantic search functionality:
private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Motivation", "Happiness"})
void whenSearchingQuotesByTheme_thenRelevantQuotesReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
        .builder()
        .query(theme)
        .topK(MAX_RESULTS)
        .build();
    List<Document> documents = vectorStore.similaritySearch(searchRequest);
    assertThat(documents)
        .hasSizeBetween(1, MAX_RESULTS)
        .allSatisfy(document -> {
            String author = String.valueOf(document.getMetadata().get("author"));
            assertThat(author)
                .isNotBlank();
        });
}
Here, we pass some common quote themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the number of desired results.
Next, we call the similaritySearch() method of our vectorStore bean with the searchRequest. Similar to the add() method of the VectorStore, Spring AI converts our query to its vector representation before querying the vector store.
The returned documents will contain quotes that are semantically related to the given theme, even if they don’t contain the exact keyword.
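For completeness, here's one way to wire up the surrounding test class; the class name is our own choice, and we import the TestcontainersConfiguration class we defined earlier:
@SpringBootTest
@Import(TestcontainersConfiguration.class)
class VectorStoreLiveTest {

    @Autowired
    private VectorStore vectorStore;

    // the @ParameterizedTest shown above goes here
}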
6. Conclusion
In this article, we’ve explored using Hugging Face models with Spring AI.
Using Testcontainers, we set up the Ollama service, creating a local test environment.
First, we used a chat completion model to build a simple chatbot. Then, we implemented semantic search using an embedding model.
As always, all the code examples used in this article are available over on GitHub.