1. Overview
With traditional databases, we typically rely on exact keyword or basic pattern matching to implement our search functionality. While sufficient for simple applications, this approach fails to capture the meaning and context behind natural language queries.
Vector stores address this limitation by storing data as numeric vectors that capture its meaning. Semantically similar content ends up close together in the vector space, which enables semantic search: relevant results are returned even if they don’t contain the exact keywords used in the query.
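To build some intuition for "close together in the vector space", here’s a minimal, self-contained sketch (separate from the integration we build below) that measures closeness with cosine similarity. The three-dimensional vectors are made up for illustration; real embedding models produce vectors with hundreds of dimensions:

```java
public class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|); values near 1.0 mean very similar
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical embeddings: "king" and "queen" point in similar directions
        double[] king = {0.9, 0.8, 0.1};
        double[] queen = {0.85, 0.82, 0.15};
        double[] banana = {0.1, 0.05, 0.9};

        // prints true: "king" is far closer to "queen" than to "banana"
        System.out.println(cosineSimilarity(king, queen) > cosineSimilarity(king, banana));
    }
}
```

A vector store performs essentially this comparison, at scale and with indexing, between a query vector and every stored vector.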
In this tutorial, we’ll explore how to integrate ChromaDB, an open-source vector store, with Spring AI.
To convert our text data into vectors that ChromaDB can store and search, we’ll need an embedding model. We’ll use Ollama to run an embedding model locally.
2. Dependencies
Let’s start by adding the necessary dependencies to our project’s pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-chroma-store-spring-boot-starter</artifactId>
    <version>1.0.0-M4</version>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M4</version>
</dependency>
The ChromaDB starter dependency enables us to establish a connection with our ChromaDB vector store and interact with it.
Additionally, we import the Ollama starter dependency, which we’ll use to run our embedding model.
Since the current version, 1.0.0-M4, is a milestone release, we’ll also need to add the Spring Milestones repository to our pom.xml:
<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>
This repository is where milestone versions are published, as opposed to the standard Maven Central repository.
Since we’re using multiple Spring AI starters in our project, let’s also include the Spring AI Bill of Materials (BOM) in our pom.xml:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-M4</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
With this addition, we can now remove the version tag from both of our starter dependencies.
The BOM eliminates the risk of version conflicts and ensures our Spring AI dependencies are compatible with each other.
3. Setting up Local Test Environment With Testcontainers
To facilitate local development and testing, we’ll use Testcontainers to set up our ChromaDB vector store and Ollama service.
The prerequisite for running the required services via Testcontainers is an active Docker instance.
3.1. Test Dependencies
First, let’s add the necessary test dependencies to our pom.xml:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>chromadb</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>
These dependencies provide us with the necessary classes to spin up ephemeral Docker instances for both of our external services.
3.2. Defining Testcontainers Beans
Next, let’s create a @TestConfiguration class that defines our Testcontainers beans:
@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {

    @Bean
    @ServiceConnection
    public ChromaDBContainer chromaDB() {
        return new ChromaDBContainer("chromadb/chroma:0.5.20");
    }

    @Bean
    @ServiceConnection
    public OllamaContainer ollama() {
        return new OllamaContainer("ollama/ollama:0.4.5");
    }
}
We specify the latest stable versions for our containers.
We also annotate our bean methods with @ServiceConnection. This dynamically registers all the properties required to set up a connection with both of our external services.
Even without the Testcontainers support, Spring AI automatically connects to ChromaDB and Ollama when they’re running locally on their default ports of 8000 and 11434, respectively.
However, in production, we can override the connection details using the corresponding Spring AI properties:
spring:
  ai:
    vectorstore:
      chroma:
        client:
          host: ${CHROMADB_HOST}
          port: ${CHROMADB_PORT}
    ollama:
      base-url: ${OLLAMA_BASE_URL}
Once the connection details are configured correctly, Spring AI automatically creates beans of type VectorStore and EmbeddingModel for us, allowing us to interact with our vector store and embedding model, respectively. We’ll look at how to use these beans later in the tutorial.
Although @ServiceConnection automatically defines the necessary connection details, we’ll still need to configure a few additional properties in our application.yml file:
spring:
  ai:
    vectorstore:
      chroma:
        initialize-schema: true
    ollama:
      embedding:
        options:
          model: nomic-embed-text
      init:
        chat:
          include: false
        pull-model-strategy: WHEN_MISSING
Here, we enable schema initialization for ChromaDB. Then, we configure nomic-embed-text as our embedding model and instruct Ollama to pull the model if it’s not present in our system.
Alternatively, we can use a different embedding model from Ollama or a Hugging Face model as per requirement.
3.3. Using Testcontainers During Development
While Testcontainers is primarily used for integration testing, we can also use it during our local development.
To achieve this, we’ll create a separate main class in our src/test/java directory:
class TestApplication {

    public static void main(String[] args) {
        SpringApplication.from(Application::main)
          .with(TestcontainersConfiguration.class)
          .run(args);
    }
}
Inside the main() method of our TestApplication class, we start our main Application class together with the TestcontainersConfiguration.
This setup lets us run our Spring Boot application locally and have it connect to the external services started via Testcontainers.
4. Populating ChromaDB at Application Startup
Now that we have our local environment set up, let’s populate our ChromaDB vector store with some sample data during application startup.
4.1. Fetching Poetry Records From PoetryDB
For our demonstration, we’ll use the PoetryDB API to fetch poems.
Let’s create a PoetryFetcher utility class for this:
class PoetryFetcher {

    private static final String BASE_URL = "https://poetrydb.org/author/";
    private static final String DEFAULT_AUTHOR_NAME = "Shakespeare";

    public static List<Poem> fetch() {
        return fetch(DEFAULT_AUTHOR_NAME);
    }

    public static List<Poem> fetch(String authorName) {
        return RestClient
          .create()
          .get()
          .uri(URI.create(BASE_URL + authorName))
          .retrieve()
          .body(new ParameterizedTypeReference<>() {});
    }
}

record Poem(String title, List<String> lines) {}
We use RestClient to invoke the PoetryDB API with the specified authorName. To deserialize the API response to a list of Poem records, we use ParameterizedTypeReference without explicitly specifying the generic response type, and Java will infer the type for us.
We also provide a no-argument overload of our fetch() method that retrieves poems by Shakespeare. We’ll use this method in the next section.
4.2. Storing Documents in ChromaDB Vector Store
Now, to populate our ChromaDB vector store with poems during application startup, we’ll create a VectorStoreInitializer class that implements the ApplicationRunner interface:
@Component
class VectorStoreInitializer implements ApplicationRunner {

    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = PoetryFetcher
          .fetch()
          .stream()
          .map(poem -> {
              Map<String, Object> metadata = Map.of("title", poem.title());
              String content = String.join("\n", poem.lines());
              return new Document(content, metadata);
          })
          .toList();
        vectorStore.add(documents);
    }
}
In our VectorStoreInitializer, we autowire an instance of VectorStore.
Inside the run() method, we use our PoetryFetcher utility class to retrieve a list of Poem records. Then, we map each poem into a Document with the lines as content and the title as metadata.
Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts our plaintext content into vector representation before storing it in our vector store. We don’t need to explicitly convert it using the EmbeddingModel bean.
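To see the mapping step in isolation, here’s a minimal, dependency-free sketch of it using a plain record and a Map.Entry as a stand-in for Spring AI’s Document (the sample poem is made up):

```java
import java.util.List;
import java.util.Map;

public class PoemMappingDemo {

    record Poem(String title, List<String> lines) {}

    // Join a poem's lines into one content string; keep the title as metadata
    static Map.Entry<String, Map<String, Object>> toDocumentParts(Poem poem) {
        String content = String.join("\n", poem.lines());
        Map<String, Object> metadata = Map.of("title", poem.title());
        return Map.entry(content, metadata);
    }

    public static void main(String[] args) {
        Poem poem = new Poem("Sample", List.of("Shall I compare thee", "to a summer's day?"));
        // prints the two lines joined by a newline
        System.out.println(toDocumentParts(poem).getKey());
    }
}
```

Storing the title as metadata rather than folding it into the content lets us retrieve it separately from search results later, as we do in the test in the next section.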
By default, Spring AI uses SpringAiCollection as the collection name to store data in our vector store, but we can override it using the spring.ai.vectorstore.chroma.collection-name property.
5. Testing Semantic Search
With our ChromaDB vector store populated, let’s validate our semantic search functionality:
private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Love and Romance", "Time and Mortality", "Jealousy and Betrayal"})
void whenSearchingShakespeareTheme_thenRelevantPoemsReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
      .query(theme)
      .withTopK(MAX_RESULTS);
    List<Document> documents = vectorStore.similaritySearch(searchRequest);

    assertThat(documents)
      .hasSizeLessThanOrEqualTo(MAX_RESULTS)
      .allSatisfy(document -> {
          String title = String.valueOf(document.getMetadata().get("title"));
          assertThat(title)
            .isNotBlank();
      });
}
Here, we pass some common Shakespearean themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the number of desired results.
Next, we call the similaritySearch() method of our vectorStore bean, with our searchRequest. Similar to the add() method of the VectorStore, Spring AI converts our query to its vector representation before querying our vector store.
The returned documents will contain poems that are semantically related to the given theme, even if they don’t contain the exact keyword.
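Conceptually, similaritySearch() embeds the query and returns the K stored entries whose vectors lie closest to it. Here’s a minimal, dependency-free sketch of that top-K selection, with made-up two-dimensional vectors standing in for real embeddings:

```java
import java.util.Comparator;
import java.util.List;

public class TopKSearchDemo {

    record Entry(String title, double[] vector) {}

    // Euclidean distance between two vectors; smaller means more similar
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Return the titles of the k entries closest to the query vector
    static List<String> topK(List<Entry> store, double[] query, int k) {
        return store.stream()
          .sorted(Comparator.comparingDouble((Entry e) -> distance(e.vector(), query)))
          .limit(k)
          .map(Entry::title)
          .toList();
    }

    public static void main(String[] args) {
        List<Entry> store = List.of(
          new Entry("Sonnet 18", new double[]{0.9, 0.1}),
          new Entry("Sonnet 116", new double[]{0.7, 0.3}),
          new Entry("Grocery list", new double[]{0.1, 0.9}));

        // Hypothetical embedding of the query "Love and Romance"
        // prints [Sonnet 18, Sonnet 116]
        System.out.println(topK(store, new double[]{0.85, 0.15}, 2));
    }
}
```

Real vector stores like ChromaDB use approximate nearest-neighbor indexes instead of a full scan, but the contract is the same: the K entries most similar to the query come back first.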
6. Conclusion
In this article, we explored how to integrate ChromaDB vector store with Spring AI.
Using Testcontainers, we started Docker containers for our ChromaDB and Ollama services, creating a local test environment.
We looked at how to populate our vector store with poems from the PoetryDB API during application startup. Then, we used common poetry themes to validate our semantic search functionality.
As always, all the code examples used in this article are available over on GitHub.