With the emergence of gpt-oss, we can now expect decent quality even in private AI applications.
Let's create a simple AI app with Spring AI once again. The versions we'll use are Spring Boot 3.5 and Spring AI 1.0.
We'll run gpt-oss on Ollama, which exposes an OpenAI-compatible API.
Note that the content of this article can of course also be applied to public AI services such as OpenAI.
Table of Contents
- Tutorial Goal
- Installing Ollama
- Loading Models
- Testing Ollama's OpenAI API
- Creating Spring AI App Template
- Using ChatClient
- Checking Chat API HTTP Logs
- Using Structured Output
- Using Chat Memory
- Using VectorStore
- Loading Documents from Files
- Adding Related Documents to Chat API Prompts (RAG)
- Application Example: Creating a Chronosia Immigration Advisor Based on Personality Assessment
Tutorial Goal
This time, we'll learn the basic elements of Spring AI step by step, and finally create an immigration advisor app for "Chronosia", a fictional country.
Chronosia is a fictional island nation that emerged when the Pacific Ocean floor suddenly rose 5,000 meters on January 1, 2025. Since it's located on the International Date Line, the eastern and western regions have different calendar dates.
Chronosia's flag represents night (black) on the left half and day (white) on the right half, with clock hands pointing to "12 o'clock" in the center.
Installing Ollama
Install Ollama with brew.
brew install ollama --force
We used the following version.
$ ollama --version
Warning: could not connect to a running Ollama instance
Warning: client version is 0.11.10
Start the Ollama server with the following command.
$ ollama serve
Couldn't find '/Users/toshiaki/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIH0VuekYP+7rvKr/Ss4jZmJYNrlWhRo2qR7lkBE5BkdX
time=2025-09-10T14:12:02.852+09:00 level=INFO source=routes.go:1331 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/toshiaki/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-09-10T14:12:02.852+09:00 level=INFO source=images.go:477 msg="total blobs: 0"
time=2025-09-10T14:12:02.852+09:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-09-10T14:12:02.852+09:00 level=INFO source=routes.go:1384 msg="Listening on 127.0.0.1:11434 (version 0.11.10)"
time=2025-09-10T14:12:02.875+09:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="96.0 GiB" available="96.0 GiB"
You can stop it with Ctrl+C.
Ollama's default context length is 4096. In the later parts of this tutorial, that may be too small to get the expected responses, so we recommend at least doubling it to 8192.
You can change it with the OLLAMA_CONTEXT_LENGTH environment variable.
The chat model we'll use supports up to 131072. Increasing this value consumes more GPU memory, so adjust as needed.
export OLLAMA_CONTEXT_LENGTH=8192
ollama serve
Loading Models
This time we'll use gpt-oss:20b for chat and nomic-embed-text:v1.5 for embedding.
Download the models with the following commands.
ollama pull gpt-oss:20b
ollama pull nomic-embed-text:v1.5
You can check available models with the following command.
$ ollama ls
NAME                     ID              SIZE      MODIFIED
nomic-embed-text:v1.5    0a109f422b47    274 MB    6 seconds ago
gpt-oss:20b              aa4295ac10c3    13 GB     34 minutes ago
You can check model details with the following command.
$ ollama show gpt-oss:20b
Model
architecture gptoss
parameters 20.9B
context length 131072
embedding length 2880
quantization MXFP4
Capabilities
completion
tools
thinking
Parameters
temperature 1
License
Apache License
Version 2.0, January 2004
...
$ ollama show nomic-embed-text:v1.5
Model
architecture nomic-bert
parameters 136.73M
context length 2048
embedding length 768
quantization F16
Capabilities
embedding
Parameters
num_ctx 8192
License
Apache License
Version 2.0, January 2004
...
Tip
To get responses with practical performance from gpt-oss:20b, at least 32 GB of VRAM is said to be required. This tutorial was tested with 96 GB of VRAM. The available VRAM is shown in the ollama serve log above.
If your environment has less VRAM, try a smaller model such as gemma3:4b instead of gpt-oss:20b.
In that case, replace all instances of gpt-oss:20b with gemma3:4b in the following sections.
Testing Ollama's OpenAI API
Once the models are loaded, let's verify that they work with OpenAI API compatibility.
Query with curl as follows.
curl -s http://localhost:11434/v1/chat/completions \
--json '{
"model": "gpt-oss:20b",
"messages": [
{"role": "user", "content": "Give me a joke."}
]
}' | jq .
If you get a response like the following, it's successful.
{
"id": "chatcmpl-658",
"object": "chat.completion",
"created": 1757482176,
"model": "gpt-oss:20b",
"system_fingerprint": "fp_ollama",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Why don't scientists trust atoms? Because they make up everything!",
"reasoning": "User: \"Give me a joke.\" Should respond with a joke. Probably something safe. Provide one joke. Let's pick a short and classic. Possibly one-liner: \"Why don't scientists trust atoms? Because they make up everything.\" Should do.\n\nAdd friendly."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 72,
"completion_tokens": 79,
"total_tokens": 151
}
}
Let's also try embedding.
curl -s http://localhost:11434/v1/embeddings \
--json '{
"model": "nomic-embed-text:v1.5",
"input": "Spring AI is a framework for building AI-powered applications."
}' | jq .
If you get a response like the following, it's successful.
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
0.042710546,
0.034793288,
-0.17683603,
...
],
"index": 0
}
],
"model": "nomic-embed-text:v1.5",
"usage": {
"prompt_tokens": 12,
"total_tokens": 12
}
}
Creating Spring AI App Template
Create a Spring AI app template using Spring Initializr.
curl -s https://start.spring.io/starter.tgz \
-d artifactId=tut-spring-ai \
-d name=tut-spring-ai \
-d baseDir=tut-spring-ai \
-d packageName=com.example \
-d dependencies=spring-ai-openai,web,postgresql,jdbc,spring-ai-vectordb-pgvector,spring-ai-chat-memory-repository-jdbc,actuator,configuration-processor,prometheus,native,testcontainers,docker-compose \
-d type=maven-project \
-d applicationName=TutSpringAiApplication | tar -xzvf -
cd tut-spring-ai
- We selected the OpenAI module for Chat API and Embedding API clients.
- We selected pgvector as the vector database.
- We selected JDBC as the chat memory repository.
Using ChatClient
In Spring AI, access to LLM's Chat API is abstracted by ChatClient. Since we selected the OpenAI module in Spring Initializr, ChatClient will access OpenAI API-compatible endpoints.
Let's create a simple controller using ChatClient.
cat <<'EOF' > src/main/java/com/example/HelloController.java
package com.example;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class HelloController {
private final ChatClient chatClient;
public HelloController(ChatClient.Builder chatClientBuilder) {
this.chatClient = chatClientBuilder.build();
}
@PostMapping(path = "/")
public String hello(@RequestBody Request request) {
return this.chatClient.prompt().user(request.prompt()).call().content();
}
public record Request(String prompt) {
}
}
EOF
The OpenAI API endpoint is set to https://api.openai.com by default, but we need to change it to Ollama's endpoint. Modify application.properties as follows.
cat <<'EOF' > src/main/resources/application.properties
spring.ai.openai.api-key=dummy
spring.ai.openai.base-url=http://localhost:11434
spring.ai.openai.chat.options.model=gpt-oss:20b
spring.docker.compose.enabled=false
EOF
Docker Compose is disabled for now as we're not using it.
Start the app with the following command. PostgreSQL for the vector database will also be started with Docker (Testcontainers), although we won't use it until later.
./mvnw spring-boot:test-run
$ curl http://localhost:8080 --json '{"prompt":"Why is the sky blue? Answer in 200 chars."}'
The sky looks blue because sunlight scatters off air molecules. Short blue wavelengths scatter most (Rayleigh scattering), making the sky appear blue during the day.
Checking Chat API HTTP Logs
Let's check what requests ChatClient sends to the OpenAI API and what responses it receives.
We'll use Logbook to check HTTP client logs.
Add the following dependency to pom.xml to enable HTTP logging.
<dependency>
<groupId>org.zalando</groupId>
<artifactId>logbook-spring-boot-autoconfigure</artifactId>
<version>3.12.3</version>
</dependency>
The default log formatter is too verbose for our purposes and makes logs hard to read, so let's create the following simple formatter.
cat <<'EOF' > src/main/java/com/example/SimpleHttpLogFormatter.java
package com.example;
import java.io.IOException;
import org.zalando.logbook.Correlation;
import org.zalando.logbook.HttpLogFormatter;
import org.zalando.logbook.HttpMessage;
import org.zalando.logbook.HttpRequest;
import org.zalando.logbook.HttpResponse;
import org.zalando.logbook.Origin;
import org.zalando.logbook.Precorrelation;
import org.zalando.logbook.RequestURI;
import org.zalando.logbook.StructuredHttpLogFormatter;
public class SimpleHttpLogFormatter implements HttpLogFormatter {
/**
* Produces an HTTP-like request in individual lines.
* @param precorrelation the request correlation
* @param request the HTTP request
* @return a line-separated HTTP request
* @throws IOException if reading body fails
*/
@Override
public String format(Precorrelation precorrelation, HttpRequest request) throws IOException {
String correlationId = precorrelation.getId();
String body = request.getBodyAsString();
StringBuilder result = new StringBuilder(body.length() + 2048);
result.append(direction(request));
result.append(" Request: ");
result.append(correlationId);
result.append('\n');
result.append("Remote: ");
result.append(request.getRemote());
result.append('\n');
result.append(request.getMethod());
result.append(' ');
RequestURI.reconstruct(request, result);
result.append(' ');
result.append(request.getProtocolVersion());
result.append('\n');
writeBody(body, result);
return result.toString();
}
/**
* Produces an HTTP-like response in individual lines.
* @param correlation the response correlation
* @param response the HTTP response
* @return a line-separated HTTP response
* @throws IOException if reading body fails
* @see StructuredHttpLogFormatter#prepare(Precorrelation, HttpRequest)
*/
@Override
public String format(Correlation correlation, HttpResponse response) throws IOException {
String correlationId = correlation.getId();
String body = response.getBodyAsString();
StringBuilder result = new StringBuilder(body.length() + 2048);
result.append(direction(response));
result.append(" Response: ");
result.append(correlationId);
result.append("\nDuration: ");
result.append(correlation.getDuration().toMillis());
result.append(" ms\n");
result.append(response.getProtocolVersion());
result.append(' ');
result.append(response.getStatus());
String reasonPhrase = response.getReasonPhrase();
if (reasonPhrase != null) {
result.append(' ');
result.append(reasonPhrase);
}
result.append('\n');
writeBody(body, result);
return result.toString();
}
private String direction(HttpMessage request) {
return request.getOrigin() == Origin.REMOTE ? "Incoming" : "Outgoing";
}
private void writeBody(String body, StringBuilder output) {
if (!body.isEmpty()) {
output.append('\n');
output.append(body);
}
else {
output.setLength(output.length() - 1); // discard last newline
}
}
}
EOF
Create an AppConfig class and register LogbookClientHttpRequestInterceptor via a RestClientCustomizer. RestClient is used by the OpenAI ChatClient under the hood.
cat <<'EOF' > src/main/java/com/example/AppConfig.java
package com.example;
import org.springframework.boot.web.client.RestClientCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.zalando.logbook.spring.LogbookClientHttpRequestInterceptor;
@Configuration(proxyBeanMethods = false)
class AppConfig {
@Bean
RestClientCustomizer restClientCustomizer(LogbookClientHttpRequestInterceptor logbookClientHttpRequestInterceptor) {
return restClientBuilder -> restClientBuilder.requestInterceptor(logbookClientHttpRequestInterceptor);
}
@Bean
SimpleHttpLogFormatter simpleHttpLogFormatter() {
return new SimpleHttpLogFormatter();
}
}
EOF
Set Logbook's log level to trace.
cat <<'EOF' >> src/main/resources/application.properties
logging.level.org.zalando.logbook.Logbook=trace
EOF
Restart the app and query with curl again.
./mvnw spring-boot:test-run
The following logs will be output. You can see what requests are sent to Ollama's API and what responses are returned. This will be useful for debugging when you want to do more advanced usage.
2025-09-10T17:52:14.364+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Incoming Request: dbd65b7dec575edd
Remote: 0:0:0:0:0:0:0:1
POST http://localhost:8080/ HTTP/1.1
{"prompt":"Why is the sky blue? Answer in 200 chars."}
2025-09-10T17:52:14.367+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Outgoing Request: 8b7d2fb4a77811bb
Remote: localhost
POST http://localhost:11434/v1/chat/completions HTTP/1.1
{"messages":[{"content":"Why is the sky blue? Answer in 200 chars.","role":"user"}],"model":"gpt-oss:20b","stream":false,"temperature":0.7}
2025-09-10T17:52:19.759+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Incoming Response: 8b7d2fb4a77811bb
Duration: 5391 ms
HTTP/1.1 200 OK
{"id":"chatcmpl-221","object":"chat.completion","created":1757494339,"model":"gpt-oss:20b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Rayleigh scattering of sunlight by air molecules makes short blue wavelengths scatter most, so the sky appears blue.","reasoning":"We need to answer: \"Why is the sky blue?\" and limit to 200 characters. We need to provide a concise explanation. 200 characters maximum. Provide short but accurate. Likely: \"Rayleigh scattering of sunlight by air molecules causes shorter blue wavelengths to scatter more, making the sky appear blue.\" Count characters. Let's count: \"Rayleigh scattering of sunlight by air molecules causes shorter blue wavelengths to scatter more, making the sky appear blue.\" Let's count: \nR(1)a2 y3 l4 e5 i6 e7 g8 h9 (space10) s11 c12 a13 t14 t15 i16 n17 g18 (space19) o20 f21 (space22) s23 u24 n25 l26 i27 g28 h29 t30 (space31) b32 y33 (space34) a35 i36 r37 (space38) m39 o40 l41 y42 c43 l44 u45 e46 s47 (space48) c49 a50 u51 s52 e53 s54 (space55) s56 h57 o58 r59 t60 e61 r62 (space63) b64 l65 u66 e67 (space68) w69 a70 l71 l72 p73 h74 a75 n76 e77 s78 (space79) t80 o81 (space82) s83 c84 a85 t86 t87 e88 r89 (space90) m91 o92 r93 e94 ,95 (space96) m97 a98 k99 i100 n101 g102 (space103) t104 h105 e106 (space107) s108 k109 y110 (space111) a112 p113 p114 e115 a116 r117 (space118) b119 l120 u121 e122 .123\n\nSo 123 characters. That is under 200. Good. But user requested 200 chars as maximum? They said \"Answer in 200 chars.\" Usually means up to 200 characters. So 123 is fine. Just output the sentence."},"finish_reason":"stop"}],"usage":{"prompt_tokens":79,"completion_tokens":461,"total_tokens":540}}
2025-09-10T17:52:19.760+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Outgoing Response: dbd65b7dec575edd
Duration: 5395 ms
HTTP/1.1 200 OK
Rayleigh scattering of sunlight by air molecules makes short blue wavelengths scatter most, so the sky appears blue.
Using Structured Output
So far we've been receiving Chat API responses as text, but Spring AI can also directly receive structured data like JSON.
You can specify the expected JSON structure with Java classes.
Let's modify HelloController as follows to receive responses as JSON of type Response.
cat <<'EOF' > src/main/java/com/example/HelloController.java
package com.example;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class HelloController {
private final ChatClient chatClient;
public HelloController(ChatClient.Builder chatClientBuilder) {
this.chatClient = chatClientBuilder.build();
}
@PostMapping(path = "/")
public Response hello(@RequestBody Request request) {
return this.chatClient.prompt().user(request.prompt()).call().entity(Response.class);
}
public record Request(String prompt) {
}
public record Response(String answer) {
}
}
EOF
Restart the application and query with curl as before. Now the response will be returned in JSON format.
$ curl http://localhost:8080 --json '{"prompt":"Why is the sky blue? Answer in 200 chars."}'
{"answer":"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue."}
The request sent to Ollama's API and the returned response are shown below. You can see that instructions to answer in JSON format, including a JSON Schema, have been appended to the prompt we passed via curl, and that the content field of the response is indeed JSON.
2025-09-10T17:58:37.220+09:00 TRACE 76550 --- [nio-8080-exec-1] org.zalando.logbook.Logbook : Outgoing Request: c793cbe0e0ea1d70
Remote: localhost
POST http://localhost:11434/v1/chat/completions HTTP/1.1
{"messages":[{"content":"Why is the sky blue? Answer in 200 chars.\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n \"type\" : \"object\",\n \"properties\" : {\n \"answer\" : {\n \"type\" : \"string\"\n }\n },\n \"additionalProperties\" : false\n}```\n","role":"user"}],"model":"gpt-oss:20b","stream":false,"temperature":0.7}
2025-09-10T17:58:46.770+09:00 TRACE 76550 --- [nio-8080-exec-1] org.zalando.logbook.Logbook : Incoming Response: c793cbe0e0ea1d70
Duration: 9549 ms
HTTP/1.1 200 OK
{"id":"chatcmpl-732","object":"chat.completion","created":1757494726,"model":"gpt-oss:20b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"{\"answer\":\"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\"}","reasoning":"We need to output JSON with a single property \"answer\" containing the answer. Must be 200 characters. Provide exactly 200 characters? \"Answer in 200 chars\" likely means the answer string length <= 200? Could be exactly 200? Usually means limit. We'll aim <=200.\n\nWe need to produce JSON object: {\"answer\":\"...\"} with no markdown.\n\nCount characters. Let's craft answer: \"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\" Count characters.\n\nLet's count: \n\"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\"\n\nCount: I'll count manually.\n\nR(1)a2 y3 l4 e5 i6 g7 h8 h9 (space10) s11 c12 a13 t14 t15 i16 n17 g18 (space19) o20 f21 (space22) s23 u24 n25 l26 i27 g28 h29 t30 (space31) b32 y33 (space34) a35 i36 r37 (space38) m39 o40 l41 y42 c43 l44 u45 e46 s47 (space48) c49 a50 u51 s52 e53 s54 (space55) b56 l57 u58 e59 (space60) l61 i62 g63 h64 t65 (space66) t67 o68 (space69) s70 c71 a72 t73 t74 e75 r76 (space77) i78 n79 (space80) a81 l82 l83 (space84) d85 i86 r87 e88 c89 t90 i91 o92 n93 s94 ,95 (space96) m97 a98 k99 i100 n101 g102 (space103) t104 h105 e106 (space107) s108 k109 y110 (space111) a112 p113 p114 e115 a116 r117 (space118) b119 l120 u121 e122 .123\n\n123 characters. Under 200.\n\nSo JSON: {\"answer\":\"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\"}\n\nCheck characters inside string: 123. That's fine.\n\nReturn JSON without markdown."},"finish_reason":"stop"}],"usage":{"prompt_tokens":204,"completion_tokens":520,"total_tokens":724}}
Using Chat Memory
OpenAI API's Chat API is stateless, so LLM responses are determined only from the content sent in the request. If you want to use conversation history, you need to manage the conversation on the client side and include all (or summarized) conversation content in the request.
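What this client-side bookkeeping means can be sketched in plain Java. This is an illustration only, not Spring AI's API; the class name and shape are made up. The caller keeps every message and resends the whole list on each turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal illustration of client-managed conversation history.
// On each turn, the FULL history (plus the new user message) would be
// sent to the stateless Chat API; the assistant reply is appended too.
public class ConversationHistory {

	private final List<Map<String, String>> messages = new ArrayList<>();

	// Append the new user message and return the payload for the next API call.
	public List<Map<String, String>> nextRequest(String userPrompt) {
		this.messages.add(Map.of("role", "user", "content", userPrompt));
		return List.copyOf(this.messages);
	}

	// Record the assistant's reply so the next turn includes it as context.
	public void recordReply(String assistantContent) {
		this.messages.add(Map.of("role", "assistant", "content", assistantContent));
	}

	public int size() {
		return this.messages.size();
	}
}
```

This is exactly the bookkeeping that Spring AI's Chat Memory automates for you, as described next.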
Spring AI can maintain chat history as memory and preserve conversation context.
There's an abstraction interface ChatMemoryRepository for storing Chat Memory. This time we'll use a JDBC-based ChatMemoryRepository implementation.
The logic that saves chat message exchanges to the ChatMemoryRepository is provided as an Advisor, an interceptor-like extension point for ChatClient.
We'll use MessageChatMemoryAdvisor, which includes the conversation history as a list of messages in the Chat API request. There's also PromptChatMemoryAdvisor, which instead embeds past conversations into the prompt text itself.
By registering MessageChatMemoryAdvisor as a default Advisor in the ChatClient builder, you can use Chat Memory in all chats.
When using an Advisor that handles ChatMemory, you need to specify a conversation ID for each chat. Usually a logged-in user ID is used. Since this tutorial doesn't perform authentication, we'll use the HTTP session ID as the conversation ID instead.
Modify HelloController as follows to use MessageChatMemoryAdvisor.
cat <<'EOF' > src/main/java/com/example/HelloController.java
package com.example;
import jakarta.servlet.http.HttpSession;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.ChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class HelloController {
private final ChatClient chatClient;
public HelloController(ChatClient.Builder chatClientBuilder, ChatMemoryRepository chatMemoryRepository) {
ChatMemory chatMemory = MessageWindowChatMemory.builder().chatMemoryRepository(chatMemoryRepository).build();
this.chatClient = chatClientBuilder.defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
.build();
}
@PostMapping(path = "/")
public Response hello(@RequestBody Request request, HttpSession session) {
String conversationId = session.getId();
return this.chatClient.prompt()
.user(request.prompt())
.advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
.call()
.entity(Response.class);
}
public record Request(String prompt) {
}
public record Response(String answer) {
}
}
EOF
You need to create a table for storing Chat Memory with JDBC. By setting the following configuration, the table will be automatically created when the application starts.
cat <<'EOF' >> src/main/resources/application.properties
spring.ai.chat.memory.repository.jdbc.initialize-schema=always
EOF
Restart the app and query with curl again.
./mvnw spring-boot:test-run
Let's tell it your name.
$ curl http://localhost:8080 --json '{"prompt":"My name is Taro Yamada."}' -sv
> POST / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Content-Type: application/json
> Accept: application/json
> Content-Length: 36
>
< HTTP/1.1 200
< Set-Cookie: JSESSIONID=D973A719F2BE5643522C5D42EDCA1857; Path=/; HttpOnly
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Wed, 10 Sep 2025 12:59:46 GMT
<
{"answer":"Hello, Taro Yamada!"}
Since each curl request without a session cookie starts a new HTTP session (and thus a new conversation), it doesn't remember the name you told it earlier.
$ curl http://localhost:8080 --json '{"prompt":"Do you remember my name?"}'
{"answer":"I don't recall your name."}
However, if you query again within the same HTTP session, you can see that it remembers the name.
Let's query again, passing the JSESSIONID issued when you introduced yourself in the Cookie header.
$ curl http://localhost:8080 --json '{"prompt":"Do you remember my name?"}' -H "Cookie: JSESSIONID=D973A719F2BE5643522C5D42EDCA1857"
{"answer":"Yes, your name is Taro Yamada."}
When using Chat Memory, the request sent to Ollama's API and the response returned are as follows. You can see that the conversation history is included in messages.
2025-09-10T22:00:36.359+09:00 TRACE 94394 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Outgoing Request: f80290833e783595
Remote: localhost
POST http://localhost:11434/v1/chat/completions HTTP/1.1
{"messages":[{"content":"My name is Taro Yamada.","role":"user"},{"content":"{\"answer\":\"Hello, Taro Yamada!\"}","role":"assistant"},{"content":"Do you remember my name?\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n \"type\" : \"object\",\n \"properties\" : {\n \"answer\" : {\n \"type\" : \"string\"\n }\n },\n \"additionalProperties\" : false\n}```\n","role":"user"}],"model":"gpt-oss:20b","stream":false,"temperature":0.7}
2025-09-10T22:00:37.601+09:00 TRACE 94394 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Incoming Response: f80290833e783595
Duration: 1241 ms
HTTP/1.1 200 OK
{"id":"chatcmpl-878","object":"chat.completion","created":1757509237,"model":"gpt-oss:20b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"{\"answer\":\"Yes, your name is Taro Yamada.\"}","reasoning":"We need to respond in JSON object with property \"answer\" string. No explanations. No markdown, no code fences. Just JSON. Provide answer: \"Yes, your name is Taro Yamada.\" Ensure RFC8259 compliance: string uses double quotes, no escape needed. Provide exactly that."},"finish_reason":"stop"}],"usage":{"prompt_tokens":227,"completion_tokens":84,"total_tokens":311}}
Note
Consider implementing functionality to delete corresponding conversations from ChatMemoryRepository when HTTP sessions are destroyed.
Using VectorStore
Next, let's use the Embedding API to store data in a vector database and perform document similarity search.
Spring AI provides VectorStore as an abstraction interface for storing data in vector databases and performing similarity searches. This time we'll use a VectorStore implementation using pgvector.
First, let's do a simple functionality test. Create a DocumentLoader class that uses VectorStore to store some documents and perform similarity search.
cat <<'EOF' > src/main/java/com/example/DocumentLoader.java
package com.example;
import java.util.List;
import java.util.stream.Stream;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
@Component
public class DocumentLoader implements CommandLineRunner {
private final VectorStore vectorStore;
public DocumentLoader(VectorStore vectorStore) {
this.vectorStore = vectorStore;
}
@Override
public void run(String... args) {
List<Document> documents = Stream
.of("Red apples are sweet and crunchy, perfect for snacking.",
"Apple Inc. is a technology company that makes iPhones and MacBooks.",
"Bananas are yellow tropical fruits rich in potassium.",
"Green apples have a tart flavor and are great for baking pies.",
"The iPhone is Apple's flagship smartphone with advanced features.",
"Oranges are citrus fruits packed with vitamin C and fiber.",
"Fresh strawberries are small red berries with tiny seeds on the surface.",
"Apple's CEO Tim Cook leads the company's innovation in consumer electronics.",
"Juicy peaches have soft fuzzy skin and sweet orange flesh.",
"MacBook Pro is Apple's professional laptop computer for creative work.")
.map(Document::new)
.toList();
this.vectorStore.add(documents);
Stream.of("red fruit for eating", "apple technology").forEach(query -> {
System.out.println("-----");
System.out.println("Query: " + query);
this.vectorStore.similaritySearch(SearchRequest.builder().query(query).topK(3).build())
.forEach(System.out::println);
});
}
}
EOF
This DocumentLoader class runs when the application starts, stores 10 documents in the vector database, and performs similarity searches with 2 queries.
The documents contain mixed content about fruit apples and Apple Inc.
We've prepared two queries to check whether it can distinguish between the fruit and the company.
Add Embedding API and pgvector settings to application.properties. The embedding model nomic-embed-text:v1.5 we're using returns 768-dimensional vectors, so set spring.ai.openai.embedding.options.dimensions and spring.ai.vectorstore.pgvector.dimensions to 768.
Besides 768, the model also supports 512, 256, 128, and 64 dimensions.
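As an aside, the smaller sizes work because nomic-embed-text:v1.5 is a Matryoshka-style embedding model. A common convention for such models (an assumption here for illustration, not something you implement yourself in this setup) is to truncate the vector to the leading dimensions and re-normalize it to unit length:

```java
// Sketch: how a Matryoshka-style embedding can be reduced to fewer
// dimensions - truncate to the leading components, then re-normalize
// so the result is a unit vector again. Illustration only.
public class MatryoshkaTruncate {

	public static float[] truncate(float[] embedding, int dimensions) {
		float[] result = new float[dimensions];
		System.arraycopy(embedding, 0, result, 0, dimensions);
		// Re-normalize the truncated vector to unit length.
		double norm = 0.0;
		for (float v : result) {
			norm += v * v;
		}
		norm = Math.sqrt(norm);
		for (int i = 0; i < dimensions; i++) {
			result[i] /= (float) norm;
		}
		return result;
	}
}
```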
cat <<'EOF' >> src/main/resources/application.properties
spring.ai.openai.embedding.options.dimensions=768
spring.ai.openai.embedding.options.model=nomic-embed-text:v1.5
spring.ai.vectorstore.pgvector.dimensions=768
spring.ai.vectorstore.pgvector.initialize-schema=true
EOF
Start the app with the following command and check the logs. Here we're changing Logbook's log level to info since HTTP logs are noisy.
./mvnw spring-boot:test-run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info"
The following logs are output, showing that similar documents are appropriately searched for the two queries.
-----
Query: red fruit for eating
Document{id='024d6058-df40-471d-aa31-6d51bfd7b282', text='Red apples are sweet and crunchy, perfect for snacking.', media='null', metadata={distance=0.24459387}, score=0.7554061263799667}
Document{id='7e188604-697f-475c-ab0a-d3b1d84642fc', text='Fresh strawberries are small red berries with tiny seeds on the surface.', media='null', metadata={distance=0.33758575}, score=0.6624142527580261}
Document{id='044afc94-e939-44ab-b66a-6bc02d65fb9f', text='Oranges are citrus fruits packed with vitamin C and fiber.', media='null', metadata={distance=0.34883383}, score=0.651166170835495}
-----
Query: apple technology
Document{id='30ca5436-323b-47d9-bcc4-730e7162c6aa', text='Apple Inc. is a technology company that makes iPhones and MacBooks.', media='null', metadata={distance=0.165798}, score=0.8342020064592361}
Document{id='d1a7a5c0-c11d-4190-98b0-96e490253aea', text='MacBook Pro is Apple's professional laptop computer for creative work.', media='null', metadata={distance=0.26944816}, score=0.7305518388748169}
Document{id='b6938088-4785-4d30-8504-9837204a940b', text='The iPhone is Apple's flagship smartphone with advanced features.', media='null', metadata={distance=0.30699915}, score=0.693000853061676}
Here, distance is the cosine distance between the query and document vectors, and score is defined as 1 minus that distance. The closer distance is to 0 (equivalently, the closer score is to 1), the more similar the document is to the query.
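The relationship between the two values can be sketched in plain Java. Note that `cosineDistance` below is an illustrative helper written for this example, not a Spring AI API:

```java
public class CosineScore {

	// Cosine distance = 1 - cosine similarity of the two vectors.
	static double cosineDistance(double[] a, double[] b) {
		double dot = 0, normA = 0, normB = 0;
		for (int i = 0; i < a.length; i++) {
			dot += a[i] * b[i];
			normA += a[i] * a[i];
			normB += b[i] * b[i];
		}
		return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
	}

	public static void main(String[] args) {
		double[] query = { 0.1, 0.8, 0.6 }; // toy 3-dimensional vectors
		double[] doc = { 0.2, 0.7, 0.5 };
		double distance = cosineDistance(query, doc);
		double score = 1.0 - distance; // the 'score' reported in the logs above
		System.out.printf("distance=%.4f score=%.4f%n", distance, score); // distance≈0.0085, score≈0.9915
	}
}
```

Identical vectors yield distance 0 and score 1; orthogonal vectors yield distance 1 and score 0.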
Loading Documents from Files
Next, let's store actual documents in the vector database and perform similarity search.
This time we'll use documents about the fictional country "Chronosia".
Download documents about "Chronosia" with the following command and save them in the src/main/resources/docs directory.
curl -sL https://github.com/making/chronosia/archive/refs/heads/main.tar.gz | tar -xzvf -
mkdir -p src/main/resources/docs
cp -r chronosia-main/{ja,en} src/main/resources/docs/
rm -fr chronosia-main
Modify the DocumentLoader class as follows to load documents, store them in the vector database, and perform similarity search. If data already exists in the vector_store table, document loading is skipped.
We've been using Testcontainers to start PostgreSQL so far, and the database is initialized every time the application starts, so documents are loaded every time.
cat <<'EOF' > src/main/java/com/example/DocumentLoader.java
package com.example;
import java.util.List;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.CommandLineRunner;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.ResourcePatternResolver;
import org.springframework.jdbc.core.simple.JdbcClient;
import org.springframework.stereotype.Component;
@Component
public class DocumentLoader implements CommandLineRunner {
private final VectorStore vectorStore;
private final ResourcePatternResolver resourcePatternResolver;
private final JdbcClient jdbcClient;
private final Logger logger = LoggerFactory.getLogger(DocumentLoader.class);
public DocumentLoader(VectorStore vectorStore, ResourcePatternResolver resourcePatternResolver,
JdbcClient jdbcClient) {
this.vectorStore = vectorStore;
this.resourcePatternResolver = resourcePatternResolver;
this.jdbcClient = jdbcClient;
}
@Override
public void run(String... args) throws Exception {
Integer count = this.jdbcClient.sql("SELECT COUNT(*) FROM vector_store").query(Integer.class).single();
if (count > 0) {
logger.info("Found {} documents. Skip loading.", count);
}
else {
logger.info("Loading documents...");
for (Resource resource : this.resourcePatternResolver.getResources("classpath:docs/**/*.md")) {
TextReader documentReader = new TextReader(resource);
Map<String, Object> metadata = documentReader.getCustomMetadata();
metadata.put("path", resource.getURI());
this.vectorStore.add(documentReader.read());
}
}
List<Document> documents = vectorStore
.similaritySearch(SearchRequest.builder().query("Where is the capital of Chronosia?").topK(3).build());
for (Document doc : documents) {
System.out.println("score=" + doc.getScore() + "\tmetadata=" + doc.getMetadata());
}
}
}
EOF
Spring AI provides the DocumentReader abstraction interface for loading Document objects from files. This time we'll use TextReader and register the resulting Document objects directly to the VectorStore. TextReader reads the content of a text file and converts it directly into a Document object.
When loading Markdown files, you can also use MarkdownDocumentReader, but this DocumentReader splits Markdown paragraphs and returns them as multiple Document objects, which would be too fragmented for our documents.
All our documents are in Markdown format, but since the file sizes are small, having 1 Document per file is not particularly problematic. If file sizes are large and don't fit within LLM context length limits, you can also use TokenTextSplitter to split them.
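As a sketch, if the files were larger, the loading loop in DocumentLoader could pass each file through a TokenTextSplitter before registering the chunks. The chunk-size values below are arbitrary examples, not recommendations:

```java
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.core.io.Resource;

// Illustrative: split each file's Document into smaller chunks
// before adding them to the VectorStore.
TokenTextSplitter splitter = TokenTextSplitter.builder()
		.withChunkSize(800) // target chunk size in tokens
		.withMinChunkSizeChars(350) // avoid emitting tiny trailing chunks
		.build();
for (Resource resource : this.resourcePatternResolver.getResources("classpath:docs/**/*.md")) {
	TextReader documentReader = new TextReader(resource);
	// apply(...) returns the split list of Documents for this file
	this.vectorStore.add(splitter.apply(documentReader.read()));
}
```

Splitting keeps each chunk within the embedding model's context window at the cost of losing some cross-paragraph context.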
Start the app with the following command and check the logs.
./mvnw spring-boot:test-run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info"
For the query "Where is the capital of Chronosia?", the following similar documents are retrieved. The document describing Chronosia's overview has the highest similarity, and if you actually read that document, you'll find a description of the capital.
score=0.7566197067499161 metadata={path=file:/Users/toshiaki/git/tut-spring-ai/target/classes/docs/en/overview.md, charset=UTF-8, source=overview.md, distance=0.2433803}
score=0.4937962293624878 metadata={path=file:/Users/toshiaki/git/tut-spring-ai/target/classes/docs/en/people.md, charset=UTF-8, source=people.md, distance=0.5062038}
score=0.493003785610199 metadata={path=file:/Users/toshiaki/git/tut-spring-ai/target/classes/docs/en/geography.md, charset=UTF-8, source=geography.md, distance=0.5069962}
PostgreSQL started with Testcontainers is initialized every time the application starts, so let's now switch to a PostgreSQL instance whose data persists. If you use ./mvnw spring-boot:run instead of ./mvnw spring-boot:test-run, Testcontainers won't start.
Execute the following command.
./mvnw spring-boot:run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info"
With ./mvnw spring-boot:test-run, settings for connecting to PostgreSQL started with Testcontainers were automatically configured, but this doesn't happen with ./mvnw spring-boot:run.
Currently, database connection settings like spring.datasource.url are not specified, so startup fails with the following error.
***************************
APPLICATION FAILED TO START
***************************
Description:
Failed to configure a DataSource: 'url' attribute is not specified and no embedded datasource could be configured.
Reason: Failed to determine a suitable driver class
Action:
Consider the following:
If you want an embedded database (H2, HSQL or Derby), please put it on the classpath.
If you have database settings to be loaded from a particular profile you may need to activate it (no profiles are currently active).
Instead of setting properties like spring.datasource.url, let's now use Docker Compose to start PostgreSQL. Start the application with the --spring.docker.compose.enabled=true option.
The Docker Compose integration feature will automatically configure settings for connecting to PostgreSQL started with Docker Compose.
./mvnw spring-boot:run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info --spring.docker.compose.enabled=true"
Once the application starts, confirm that Loading documents... is displayed in the logs and documents are being loaded.
2025-09-11T00:32:57.689+09:00 INFO 6385 --- [ main] com.example.DocumentLoader : Loading documents...
Stop the application and start it again. This time, since documents are already loaded, confirm that Found 12 documents. Skip loading. is displayed and document loading is skipped.
2025-09-11T00:43:06.190+09:00 INFO 13207 --- [ main] com.example.DocumentLoader : Found 12 documents. Skip loading.
Adding Related Documents to Chat API Prompts (RAG)
Next, let's combine ChatClient and VectorStore so that chat can answer questions about "Chronosia" that the LLM shouldn't know. This technique is called Retrieval Augmented Generation (RAG).
Modify HelloController as follows to use QuestionAnswerAdvisor. QuestionAnswerAdvisor searches for documents related to chat questions from VectorStore and adds them to the Chat API prompt.
By default, it retrieves 4 documents in order of highest similarity, combines them, and adds them to the prompt. The number of documents retrieved and similarity threshold can be changed. Since it's implemented as an Advisor, document search and addition to prompts are performed transparently without changing the ChatClient call.
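For example, the defaults can be overridden by passing a SearchRequest to the builder. The values below are illustrative, not tuned recommendations:

```java
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

// Illustrative: retrieve up to 6 documents, but only those whose
// similarity score is at least 0.5.
QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
		.searchRequest(SearchRequest.builder()
				.topK(6)
				.similarityThreshold(0.5)
				.build())
		.build();
```

A higher threshold reduces the risk of padding the prompt with irrelevant documents, at the cost of sometimes retrieving nothing.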
cat <<'EOF' > src/main/java/com/example/HelloController.java
package com.example;
import jakarta.servlet.http.HttpSession;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.ChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class HelloController {
private final ChatClient chatClient;
public HelloController(ChatClient.Builder chatClientBuilder, ChatMemoryRepository chatMemoryRepository,
VectorStore vectorStore) {
ChatMemory chatMemory = MessageWindowChatMemory.builder().chatMemoryRepository(chatMemoryRepository).build();
this.chatClient = chatClientBuilder
.defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build(),
QuestionAnswerAdvisor.builder(vectorStore).build())
.build();
}
@PostMapping(path = "/")
public Response hello(@RequestBody Request request, HttpSession session) {
String conversationId = session.getId();
return this.chatClient.prompt()
.user(request.prompt())
.advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
.call()
.entity(Response.class);
}
public record Request(String prompt) {
}
public record Response(String answer) {
}
}
EOF
Restart the app with the following command.
./mvnw spring-boot:run -Dspring-boot.run.arguments="--spring.docker.compose.enabled=true"
Let's ask a question about "Chronosia".
$ curl http://localhost:8080 --json '{"prompt":"What is the capital city of Chronosia?"}'
{"answer":"Temporal City"}
For a question about "Chronosia" that the model could not have known from its training data, we got the correct answer because related documents were retrieved from the vector database and added to the prompt.
The request sent by QuestionAnswerAdvisor to Ollama's Chat API and the response returned are as follows. You can see that related documents have been added to the prompt.
2025-09-11T01:13:41.731+09:00 TRACE 14411 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Outgoing Request: b600df1d7cf2d1f8
Remote: localhost
POST http://localhost:11434/v1/chat/completions HTTP/1.1
{"messages":[{"content":"What is the capital city of Chronosia?\n\nContext information is below, surrounded by ---------------------\n\n---------------------\n# Chronosia - Overview\n\n## Country Name\n**Chronosia**\n\n## Founded\nJanuary 1st, 2025 (precisely at the stroke of midnight when the seafloor rose)\n\n## Location\nCenter of the Pacific Ocean, straddling the International Date Line (180° longitude)\n\n## Capital\nTemporal City\n\n## Area\nApproximately 42,000 km² (about the same size as Switzerland) ... (omitted) ... - Consider death as departure to \"eternal time\"\n---------------------\n\nGiven the context and provided history information and not prior knowledge,\nreply to the user comment. If the answer is not in the context, inform\nthe user that you can't answer the question.\n\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n \"type\" : \"object\",\n \"properties\" : {\n \"answer\" : {\n \"type\" : \"string\"\n }\n },\n \"additionalProperties\" : false\n}```\n","role":"user"}],"model":"gpt-oss:20b","stream":false,"temperature":0.7}
2025-09-11T01:13:50.133+09:00 TRACE 14411 --- [nio-8080-exec-2] org.zalando.logbook.Logbook : Incoming Response: b600df1d7cf2d1f8
Duration: 8402 ms
HTTP/1.1 200 OK
{"id":"chatcmpl-218","object":"chat.completion","created":1757520830,"model":"gpt-oss:20b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"{\"answer\":\"Temporal City\"}","reasoning":"Question: \"What is the capital city of Chronosia?\" Context says capital: Temporal City. So answer: \"Temporal City\". Provide JSON.\n\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":4349,"completion_tokens":45,"total_tokens":4394}}
QuestionAnswerAdvisor adds the following prompt:
{query}
Context information is below, surrounded by ---------------------
---------------------
{question_answer_context}
---------------------
Given the context and provided history information and not prior knowledge,
reply to the user comment. If the answer is not in the context, inform
the user that you can't answer the question.
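If you want different instructions around the retrieved context, the template can be replaced via the builder. This is a sketch; the `{query}` and `{question_answer_context}` placeholders must be kept so the advisor can fill them in:

```java
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.chat.prompt.PromptTemplate;

// Illustrative custom template with stricter grounding instructions.
PromptTemplate template = new PromptTemplate("""
		{query}

		Answer using only the context below. If the context does not
		contain the answer, say that you don't know.
		---------------------
		{question_answer_context}
		---------------------
		""");
QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)
		.promptTemplate(template)
		.build();
```
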
Let's try a few more questions.
$ curl http://localhost:8080 --json '{"prompt":"When was Chronosia founded?"}'
{"answer":"January 1st, 2025"}
$ curl http://localhost:8080 --json '{"prompt":"What is the population of Chronosia?"}'
{"answer":"Approximately 1.5 million (as of December 2025)."}
$ curl http://localhost:8080 --json '{"prompt":"What are the main cities in eastern and western Chronosia respectively?"}'
{"answer":"Eastern Chronosia – Easterday (largest city, city of tomorrow); Western Chronosia – Westerday (largest city, city of yesterday)."}
$ curl http://localhost:8080 --json '{"prompt":"What are the official languages of Chronosia?"}'
{"answer":"Chronosian, English, Japanese"}
$ curl http://localhost:8080 --json '{"prompt":"Explain how the International Date Line affects Chronosia society."}'
{"answer":"The International Date Line bisects Chronosia, so the eastern side (Tomorrow District) is always one day ahead of the western side (Yesterday District). This means that citizens experience two distinct dates at any given moment: a person in the east celebrates a holiday one day before a person in the west, and a child's birthday is officially recorded twice—once in each zone. The line is the official border of the capital, Temporal City, and crossing it requires a formal oath of time responsibility. It also fuels unique cultural practices: double New‑Year celebrations, the Time Festival where fireworks run 24 hours, and national sports such as Time Soccer that change rules when a ball crosses the line. The date difference creates administrative challenges—government services must coordinate two calendars, and the "time‑zone divorce" industry has emerged as couples split over the one‑day gap. National holidays are celebrated on both sides on different days (e.g., Dec 31st is New Year's Eve in the east and New Year's Day in the west), and tourism capitalizes on the experience of "jumping" between yesterday and tomorrow. In short, the Date Line is both a geographic divider and a cultural engine that shapes Chronosia's identity, economy, and daily life."}
We can see that, by drawing on the "Chronosia" documents, the app returns correct answers to questions the model alone could not answer.
Spring AI also provides RetrievalAugmentationAdvisor for more advanced RAG in addition to QuestionAnswerAdvisor. Those interested can refer to the documentation.
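As a rough sketch of what that looks like, RetrievalAugmentationAdvisor composes a modular pipeline from a document retriever (the parameter values below are illustrative, and the import paths may differ slightly between Spring AI versions):

```java
import org.springframework.ai.chat.client.advisor.api.Advisor;
import org.springframework.ai.rag.advisor.RetrievalAugmentationAdvisor;
import org.springframework.ai.rag.retrieval.search.VectorStoreDocumentRetriever;

// Illustrative: roughly equivalent to the QuestionAnswerAdvisor setup above,
// but assembled from the modular RAG building blocks.
Advisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
		.documentRetriever(VectorStoreDocumentRetriever.builder()
				.vectorStore(vectorStore)
				.topK(4)
				.similarityThreshold(0.5)
				.build())
		.build();
```

The same builder also accepts query transformers and augmenters, which is where the more advanced RAG patterns (query rewriting, compression, etc.) plug in.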
Application Example: Creating a Chronosia Immigration Advisor Based on Personality Assessment
Finally, as an example of a slightly more advanced AI application, let's create a Chronosia immigration advisor based on personality assessment using Chronosia documents.
Ideally we would build a UI that asks a series of questions for the personality assessment, but for simplicity we'll create an API that takes a free-text description of a personality and, based on it, suggests which Chronosian city to immigrate to.
However, instead of passing the free text input directly to the immigration advice prompt, we'll include a step to create a persona from the input personality first. This persona will be passed to the immigration advice prompt. In other words, we'll make two ChatClient calls.
Create the following ChronosiaController class. Like the previous HelloController, it uses ChatClient and QuestionAnswerAdvisor. Previously, we specified QuestionAnswerAdvisor in defaultAdvisors when creating ChatClient,
but this time we specify it with the advisors(...) method when calling ChatClient. This is because document search is not needed for the initial persona creation call.
cat <<'EOF' > src/main/java/com/example/ChronosiaController.java
package com.example;
import java.util.List;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class ChronosiaController {
private final ChatClient chatClient;
private final QuestionAnswerAdvisor questionAnswerAdvisor;
public ChronosiaController(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
this.chatClient = chatClientBuilder.build();
this.questionAnswerAdvisor = QuestionAnswerAdvisor.builder(vectorStore).build();
}
@PostMapping(path = "/chronosia")
public ChronosiaResponse chronosia(@RequestBody ChronosiaRequest request) {
String persona = this.chatClient.prompt().user("""
Please create a persona from the following perspectives based on the personality described later. However, for any unclear points, please indicate "unknown" rather than making assumptions.
- Lifestyle rhythm (morning person, methodical, etc.)
- Career (innovative, technology-oriented, etc.)
- Time values (efficiency-focused, etc.)
- Social nature (balanced type, etc.)
- Family situation (prioritizes children's education, etc.)
The personality is as follows.
----
{personality}
""").user(u -> u.param("personality", request.personality())).call().content();
return this.chatClient.prompt()
.system("You are an immigration advisor for Chronosia.")
.user("""
There is a person with the following persona:
{persona}
Please suggest appropriate Chronosian cities where this person should relocate and the reasons why. Provide up to 2 candidates, listed in order of priority.
""")
.user(u -> u.param("persona", persona))
.advisors(this.questionAnswerAdvisor)
.call()
.entity(ChronosiaResponse.class);
}
public record ChronosiaRequest(String personality) {
}
public record ChronosiaResponse(List<Candidate> candidates) {
public record Candidate(String city, String reason, int priority) {
}
}
}
EOF
Restart the app with the following command.
./mvnw spring-boot:run -Dspring-boot.run.arguments="--spring.docker.compose.enabled=true"
Let's try some examples of immigration advice.
$ curl http://localhost:8080/chronosia --json '{"personality":"I am an impatient person. I prioritize efficiency in everything. Despite this, I tend to oversleep. I constantly aim for the cutting edge. I am single."}' -s | jq .
{
"candidates": [
{
"city": "Easterday",
"reason": "Easterday, the \"City of Tomorrow\", is a hub for cutting‑edge technology and rapid iteration, aligning with Alex’s innovation‑driven, efficiency‑centric career. The day‑ahead timezone and constant push for the latest tools allow for flexible, high‑productivity work. The city’s culture encourages focused, goal‑oriented work and offers many solo‑worker spaces, fitting Alex’s minimal social commitments.",
"priority": 1
},
{
"city": "Temporal City",
"reason": "Temporal City, the capital, hosts the International Standards Organization and a vibrant time‑consulting sector. Its central position on the International Date Line provides unique networking and cross‑date opportunities. While slightly more bureaucratic, it offers a broad range of tech‑related jobs and 24/7 services, suiting Alex’s single lifestyle and need for continuous productivity.",
"priority": 2
}
]
}
$ curl http://localhost:8080/chronosia --json '{"personality":"I am easygoing, dislike conflict, and want to live at a relaxed pace. I am not well-informed about the latest developments. I have two elementary school children."}' -s | jq .
{
"candidates": [
{
"city": "Westerday",
"reason": "Retro atmosphere and slower pace create a relaxed environment; less emphasis on strict punctuality suits an easygoing lifestyle and offers family-friendly amenities for elementary children.",
"priority": 1
},
{
"city": "New Greenwich",
"reason": "Academic city with quieter surroundings and well-regarded schools; lower time pressure and a calm setting support a laid‑back family life.",
"priority": 2
}
]
}
If you are using gpt-oss and feel that the response quality is low, try specifying the --spring.ai.openai.chat.options.reasoning-effort=high option at startup, or add spring.ai.openai.chat.options.reasoning-effort=high to application.properties and restart the application.
While it will take more time, the LLM will perform more careful reasoning and you may get better responses.
We can see that AI applications like this can be easily created using Spring AI.
Through this tutorial, we've introduced the basic usage of Spring AI. Spring AI has many other features. Those interested can refer to the following resources: