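Tutorial: Building a Private AI App with Spring AI and gpt-oss

With the arrival of gpt-oss, even private AI apps can now be expected to deliver reasonable quality.
Let's take a fresh look at building a simple AI app with Spring AI. The versions used are Spring Boot 3.5 and Spring AI 1.0.

This tutorial uses Ollama as the platform that serves gpt-oss through an OpenAI-compatible API.

Note that everything in this article can of course be used outside a private AI setup as well.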

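Goal of the tutorial
Installing Ollama
Loading the models
Verifying Ollama's OpenAI API
Creating the Spring AI app skeleton
Using ChatClient
Inspecting the Chat API HTTP logs
Using Structured Output
Using Chat Memory
Using VectorStore
Loading documents from files
Adding relevant documents to the Chat API prompt (RAG)
Application example: a Chronosia relocation advisor based on a personality assessment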

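Goal of the tutorial

In this tutorial we learn the basic building blocks of Spring AI step by step and finally build a relocation advisor app for a fictional country called "Chronosia".

Chronosia is a fictional island nation that emerged on January 1, 2025, when the Pacific seabed suddenly rose 5,000 meters. Because it sits on the International Date Line, the calendar date differs between its eastern and western regions.

Chronosia's flag shows night (black) on the left half and day (white) on the right half, with clock hands pointing to 12 o'clock in the center.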

Installing Ollama

Install Ollama with brew.

brew install ollama --force

The following version was used.

$ ollama --version       
Warning: could not connect to a running Ollama instance
Warning: client version is 0.11.10

Start the Ollama server with the following command.

$ ollama serve
Couldn't find '/Users/toshiaki/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIH0VuekYP+7rvKr/Ss4jZmJYNrlWhRo2qR7lkBE5BkdX

time=2025-09-10T14:12:02.852+09:00 level=INFO source=routes.go:1331 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/toshiaki/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-09-10T14:12:02.852+09:00 level=INFO source=images.go:477 msg="total blobs: 0"
time=2025-09-10T14:12:02.852+09:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-09-10T14:12:02.852+09:00 level=INFO source=routes.go:1384 msg="Listening on 127.0.0.1:11434 (version 0.11.10)"
time=2025-09-10T14:12:02.875+09:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="96.0 GiB" available="96.0 GiB"

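Press Ctrl+C to stop it.

Ollama's default context length is 4096. In the later part of this tutorial that may not be enough to get the expected answers, so setting it to at least double that, 8192, is recommended.
It can be changed with the OLLAMA_CONTEXT_LENGTH environment variable.
The model used here supports up to 131072. Increasing this value increases GPU memory consumption, so adjust it as needed.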

export OLLAMA_CONTEXT_LENGTH=8192
ollama serve

Loading the models

This tutorial uses gpt-oss:20b for chat and nomic-embed-text:v1.5 for embeddings.

Download the models with the following commands.

ollama pull gpt-oss:20b
ollama pull nomic-embed-text:v1.5

The available models can be listed with the following command.

$ ollama ls
NAME                     ID              SIZE      MODIFIED       
nomic-embed-text:v1.5    0a109f422b47    274 MB    6 seconds ago     
gpt-oss:20b              aa4295ac10c3    13 GB     34 minutes ago

Model details can be shown with the following command.

$ ollama show gpt-oss:20b
  Model
    architecture        gptoss    
    parameters          20.9B     
    context length      131072    
    embedding length    2880      
    quantization        MXFP4     

  Capabilities
    completion    
    tools         
    thinking      

  Parameters
    temperature    1    

  License
    Apache License               
    Version 2.0, January 2004    
    ...     

$ ollama show nomic-embed-text:v1.5
  Model
    architecture        nomic-bert    
    parameters          136.73M       
    context length      2048          
    embedding length    768           
    quantization        F16           

  Capabilities
    embedding    

  Parameters
    num_ctx    8192    

  License
    Apache License               
    Version 2.0, January 2004    
    ...

Tip

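It is said that at least 32GB of VRAM is needed to get answers from gpt-oss:20b with practical performance. The content of this tutorial was verified with 96GB of VRAM. The VRAM information appears in the ollama serve log above.

If your environment has less VRAM, try a smaller model, for example gemma3:4b instead of gpt-oss:20b.
In that case, replace every occurrence of gpt-oss:20b in the rest of the tutorial with gemma3:4b.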

Verifying Ollama's OpenAI API

Once the models are loaded, verify that Ollama works through the OpenAI-compatible API.
Send a request with curl as follows.

curl -s http://localhost:11434/v1/chat/completions \
  --json '{
   "model": "gpt-oss:20b",
   "messages": [
      {"role": "user", "content": "Give me a joke."}
   ]
 }' | jq .

If a response like the following comes back, it worked.

{
  "id": "chatcmpl-658",
  "object": "chat.completion",
  "created": 1757482176,
  "model": "gpt-oss:20b",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Why don’t scientists trust atoms? Because they make up everything!",
        "reasoning": "User: \"Give me a joke.\" Should respond with a joke. Probably something safe. Provide one joke. Let's pick a short and classic. Possibly one-liner: \"Why don't scientists trust atoms? Because they make up everything.\" Should do.\n\nAdd friendly."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 72,
    "completion_tokens": 79,
    "total_tokens": 151
  }
}
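
The same call can be made from plain Java. The following is a minimal sketch, assuming only the JDK's java.net.http client; the class ChatCompletionsSketch and its helper methods are hypothetical names introduced for illustration and are not part of Ollama or Spring AI. It builds the same request body as the curl command above and leaves the actual network call in a separate method, because that call needs a running Ollama server.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration; not part of Ollama or Spring AI.
final class ChatCompletionsSketch {

    // Wraps a value in double quotes and escapes characters that JSON forbids inside strings.
    static String jsonString(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds the body of a chat completion request with a single user message.
    static String chatBody(String model, String prompt) {
        return "{\"model\":" + jsonString(model)
                + ",\"messages\":[{\"role\":\"user\",\"content\":" + jsonString(prompt) + "}]}";
    }

    // Builds a POST request against the OpenAI-compatible path used in the curl example.
    static HttpRequest chatRequest(String baseUrl, String model, String prompt) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, prompt)))
            .build();
    }

    // Sends the request and returns the raw JSON response; requires a running server.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Prints the request body only, so this runs without a server.
        System.out.println(chatBody("gpt-oss:20b", "Give me a joke."));
    }
}
```

With Ollama running, passing the result of chatRequest("http://localhost:11434", "gpt-oss:20b", "Give me a joke.") to send should return JSON shaped like the response above.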

Let's try embeddings as well.

curl -s http://localhost:11434/v1/embeddings \
  --json '{
   "model": "nomic-embed-text:v1.5",
   "input": "Spring AI is a framework for building AI-powered applications."
 }' | jq .

If a response like the following comes back, it worked.

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.042710546,
        0.034793288,
        -0.17683603,
        ...
      ],
      "index": 0
    }
  ],
  "model": "nomic-embed-text:v1.5",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}

Creating the Spring AI app skeleton

Create the skeleton of the Spring AI app with Spring Initializr.

curl -s https://start.spring.io/starter.tgz \
       -d artifactId=tut-spring-ai \
       -d name=tut-spring-ai \
       -d baseDir=tut-spring-ai  \
       -d packageName=com.example \
       -d dependencies=spring-ai-openai,web,postgresql,jdbc,spring-ai-vectordb-pgvector,spring-ai-chat-memory-repository-jdbc,actuator,configuration-processor,prometheus,native,testcontainers,docker-compose \
       -d type=maven-project \
       -d applicationName=TutSpringAiApplication | tar -xzvf -
cd tut-spring-ai

The OpenAI module was selected as the client for the Chat API and the Embedding API.
pgvector was selected as the vector database.
JDBC was selected as the chat memory repository.

Using ChatClient

In Spring AI, access to an LLM's Chat API is abstracted by ChatClient. Because the OpenAI module was selected in Spring Initializr, ChatClient talks to an OpenAI-compatible endpoint.

Let's create a simple controller that uses ChatClient.

cat <<'EOF' > src/main/java/com/example/HelloController.java
package com.example;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    private final ChatClient chatClient;

    public HelloController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @PostMapping(path = "/")
    public String hello(@RequestBody Request request) {
        return this.chatClient.prompt().user(request.prompt()).call().content();
    }

    public record Request(String prompt) {

    }

}
EOF

The OpenAI API endpoint defaults to https://api.openai.com, so it must be changed to the Ollama endpoint. Modify application.properties as follows.

cat <<'EOF' > src/main/resources/application.properties
spring.ai.openai.api-key=dummy
spring.ai.openai.base-url=http://localhost:11434
spring.ai.openai.chat.options.model=gpt-oss:20b
spring.docker.compose.enabled=false
EOF

Docker Compose is not used for now, so it is disabled.

Start the app with the following command. Although it is not used yet, PostgreSQL for the vector database starts in Docker (via Testcontainers).

./mvnw spring-boot:test-run

$ curl http://localhost:8080 --json '{"prompt":"Why is the sky blue? Answer in 200 chars."}'
The sky looks blue because sunlight scatters off air molecules. Short blue wavelengths scatter most (Rayleigh scattering), making the sky appear blue during the day.

Inspecting the Chat API HTTP logs

Let's check what request ChatClient sends to the OpenAI API and what response it receives.
Logbook is used to inspect the HTTP client logs.
Add the following dependency to pom.xml so that the HTTP logs can be inspected.

        <dependency>
            <groupId>org.zalando</groupId>
            <artifactId>logbook-spring-boot-autoconfigure</artifactId>
            <version>3.12.3</version>
        </dependency>

The default log formatter is too verbose for this purpose and makes the logs hard to read, so create the following simple formatter.

cat <<'EOF' > src/main/java/com/example/SimpleHttpLogFormatter.java
package com.example;

import java.io.IOException;
import org.zalando.logbook.Correlation;
import org.zalando.logbook.HttpLogFormatter;
import org.zalando.logbook.HttpMessage;
import org.zalando.logbook.HttpRequest;
import org.zalando.logbook.HttpResponse;
import org.zalando.logbook.Origin;
import org.zalando.logbook.Precorrelation;
import org.zalando.logbook.RequestURI;
import org.zalando.logbook.StructuredHttpLogFormatter;

public class SimpleHttpLogFormatter implements HttpLogFormatter {

    /**
     * Produces an HTTP-like request in individual lines.
     * @param precorrelation the request correlation
     * @param request the HTTP request
     * @return a line-separated HTTP request
     * @throws IOException if reading body fails
     */
    @Override
    public String format(Precorrelation precorrelation, HttpRequest request) throws IOException {
        String correlationId = precorrelation.getId();
        String body = request.getBodyAsString();

        StringBuilder result = new StringBuilder(body.length() + 2048);

        result.append(direction(request));
        result.append(" Request: ");
        result.append(correlationId);
        result.append('\n');

        result.append("Remote: ");
        result.append(request.getRemote());
        result.append('\n');

        result.append(request.getMethod());
        result.append(' ');
        RequestURI.reconstruct(request, result);
        result.append(' ');
        result.append(request.getProtocolVersion());
        result.append('\n');

        writeBody(body, result);

        return result.toString();
    }

    /**
     * Produces an HTTP-like response in individual lines.
     * @param correlation the response correlation
     * @param response the HTTP response
     * @return a line-separated HTTP response
     * @throws IOException if reading body fails
     * @see StructuredHttpLogFormatter#prepare(Precorrelation, HttpRequest)
     */
    @Override
    public String format(Correlation correlation, HttpResponse response) throws IOException {
        String correlationId = correlation.getId();
        String body = response.getBodyAsString();

        StringBuilder result = new StringBuilder(body.length() + 2048);

        result.append(direction(response));
        result.append(" Response: ");
        result.append(correlationId);
        result.append("\nDuration: ");
        result.append(correlation.getDuration().toMillis());
        result.append(" ms\n");

        result.append(response.getProtocolVersion());
        result.append(' ');
        result.append(response.getStatus());
        String reasonPhrase = response.getReasonPhrase();
        if (reasonPhrase != null) {
            result.append(' ');
            result.append(reasonPhrase);
        }

        result.append('\n');

        writeBody(body, result);

        return result.toString();
    }

    private String direction(HttpMessage request) {
        return request.getOrigin() == Origin.REMOTE ? "Incoming" : "Outgoing";
    }

    private void writeBody(String body, StringBuilder output) {
        if (!body.isEmpty()) {
            output.append('\n');
            output.append(body);
        }
        else {
            output.setLength(output.length() - 1); // discard last newline
        }
    }

}
EOF

Create an AppConfig class and register LogbookClientHttpRequestInterceptor through a RestClientCustomizer.
RestClient is what the OpenAI ChatClient uses under the hood.

cat <<'EOF' > src/main/java/com/example/AppConfig.java
package com.example;

import org.springframework.boot.web.client.RestClientCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.zalando.logbook.spring.LogbookClientHttpRequestInterceptor;

@Configuration(proxyBeanMethods = false)
class AppConfig {

    @Bean
    RestClientCustomizer restClientCustomizer(LogbookClientHttpRequestInterceptor logbookClientHttpRequestInterceptor) {
        return restClientBuilder -> restClientBuilder.requestInterceptor(logbookClientHttpRequestInterceptor);
    }

    @Bean
    SimpleHttpLogFormatter simpleHttpLogFormatter() {
        return new SimpleHttpLogFormatter();
    }

}
EOF

Set the Logbook log level to trace.

cat <<'EOF' >> src/main/resources/application.properties
logging.level.org.zalando.logbook.Logbook=trace
EOF

Restart the app and send the same curl request again.

./mvnw spring-boot:test-run

Logs like the following are printed. They show what request is sent to the Ollama API and what response comes back, which should help with debugging when you move on to more advanced usage.

2025-09-10T17:52:14.364+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook              : Incoming Request: dbd65b7dec575edd
Remote: 0:0:0:0:0:0:0:1
POST http://localhost:8080/ HTTP/1.1

{"prompt":"Why is the sky blue? Answer in 200 chars."}
2025-09-10T17:52:14.367+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook              : Outgoing Request: 8b7d2fb4a77811bb
Remote: localhost
POST http://localhost:11434/v1/chat/completions HTTP/1.1

{"messages":[{"content":"Why is the sky blue? Answer in 200 chars.","role":"user"}],"model":"gpt-oss:20b","stream":false,"temperature":0.7}
2025-09-10T17:52:19.759+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook              : Incoming Response: 8b7d2fb4a77811bb
Duration: 5391 ms
HTTP/1.1 200 OK

{"id":"chatcmpl-221","object":"chat.completion","created":1757494339,"model":"gpt-oss:20b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Rayleigh scattering of sunlight by air molecules makes short blue wavelengths scatter most, so the sky appears blue.","reasoning":"We need to answer: \"Why is the sky blue?\" and limit to 200 characters. We need to provide a concise explanation. 200 characters maximum. Provide short but accurate. Likely: \"Rayleigh scattering of sunlight by air molecules causes shorter blue wavelengths to scatter more, making the sky appear blue.\" Count characters. Let's count: \"Rayleigh scattering of sunlight by air molecules causes shorter blue wavelengths to scatter more, making the sky appear blue.\" Let's count: \nR(1)a2 y3 l4 e5 i6 e7 g8 h9 (space10) s11 c12 a13 t14 t15 i16 n17 g18 (space19) o20 f21 (space22) s23 u24 n25 l26 i27 g28 h29 t30 (space31) b32 y33 (space34) a35 i36 r37 (space38) m39 o40 l41 y42 c43 l44 u45 e46 s47 (space48) c49 a50 u51 s52 e53 s54 (space55) s56 h57 o58 r59 t60 e61 r62 (space63) b64 l65 u66 e67 (space68) w69 a70 l71 l72 p73 h74 a75 n76 e77 s78 (space79) t80 o81 (space82) s83 c84 a85 t86 t87 e88 r89 (space90) m91 o92 r93 e94 ,95 (space96) m97 a98 k99 i100 n101 g102 (space103) t104 h105 e106 (space107) s108 k109 y110 (space111) a112 p113 p114 e115 a116 r117 (space118) b119 l120 u121 e122 .123\n\nSo 123 characters. That is under 200. Good. But user requested 200 chars as maximum? They said \"Answer in 200 chars.\" Usually means up to 200 characters. So 123 is fine. Just output the sentence."},"finish_reason":"stop"}],"usage":{"prompt_tokens":79,"completion_tokens":461,"total_tokens":540}}
2025-09-10T17:52:19.760+09:00 TRACE 75722 --- [nio-8080-exec-2] org.zalando.logbook.Logbook              : Outgoing Response: dbd65b7dec575edd
Duration: 5395 ms
HTTP/1.1 200 OK

Rayleigh scattering of sunlight by air molecules makes short blue wavelengths scatter most, so the sky appears blue.

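Using Structured Output

So far the Chat API response has been received as text, but Spring AI can also receive structured data such as JSON directly.
The expected JSON structure can be specified with a Java class.

Modify HelloController as follows so that the response is received as JSON of type Response.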

cat <<'EOF' > src/main/java/com/example/HelloController.java
package com.example;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    private final ChatClient chatClient;

    public HelloController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @PostMapping(path = "/")
    public Response hello(@RequestBody Request request) {
        return this.chatClient.prompt().user(request.prompt()).call().entity(Response.class);
    }

    public record Request(String prompt) {

    }

    public record Response(String answer) {
    }

}
EOF

Restart the application and send the same curl request as before. This time the response comes back in JSON format.

$ curl http://localhost:8080 --json '{"prompt":"Why is the sky blue? Answer in 200 chars."}'
{"answer":"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue."}

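The request sent to the Ollama API and the response returned look like the following. In addition to the prompt passed by curl, an instruction to return the response in JSON format has been appended.
The content of the response is indeed in JSON format.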

2025-09-10T17:58:37.220+09:00 TRACE 76550 --- [nio-8080-exec-1] org.zalando.logbook.Logbook              : Outgoing Request: c793cbe0e0ea1d70
Remote: localhost
POST http://localhost:11434/v1/chat/completions HTTP/1.1

{"messages":[{"content":"Why is the sky blue? Answer in 200 chars.\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n  \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n  \"type\" : \"object\",\n  \"properties\" : {\n    \"answer\" : {\n      \"type\" : \"string\"\n    }\n  },\n  \"additionalProperties\" : false\n}```\n","role":"user"}],"model":"gpt-oss:20b","stream":false,"temperature":0.7}
2025-09-10T17:58:46.770+09:00 TRACE 76550 --- [nio-8080-exec-1] org.zalando.logbook.Logbook              : Incoming Response: c793cbe0e0ea1d70
Duration: 9549 ms
HTTP/1.1 200 OK

{"id":"chatcmpl-732","object":"chat.completion","created":1757494726,"model":"gpt-oss:20b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"{\"answer\":\"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\"}","reasoning":"We need to output JSON with a single property \"answer\" containing the answer. Must be 200 characters. Provide exactly 200 characters? \"Answer in 200 chars\" likely means the answer string length <= 200? Could be exactly 200? Usually means limit. We'll aim <=200.\n\nWe need to produce JSON object: {\"answer\":\"...\"} with no markdown.\n\nCount characters. Let's craft answer: \"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\" Count characters.\n\nLet's count: \n\"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\"\n\nCount: I'll count manually.\n\nR(1)a2 y3 l4 e5 i6 g7 h8 h9 (space10) s11 c12 a13 t14 t15 i16 n17 g18 (space19) o20 f21 (space22) s23 u24 n25 l26 i27 g28 h29 t30 (space31) b32 y33 (space34) a35 i36 r37 (space38) m39 o40 l41 y42 c43 l44 u45 e46 s47 (space48) c49 a50 u51 s52 e53 s54 (space55) b56 l57 u58 e59 (space60) l61 i62 g63 h64 t65 (space66) t67 o68 (space69) s70 c71 a72 t73 t74 e75 r76 (space77) i78 n79 (space80) a81 l82 l83 (space84) d85 i86 r87 e88 c89 t90 i91 o92 n93 s94 ,95 (space96) m97 a98 k99 i100 n101 g102 (space103) t104 h105 e106 (space107) s108 k109 y110 (space111) a112 p113 p114 e115 a116 r117 (space118) b119 l120 u121 e122 .123\n\n123 characters. Under 200.\n\nSo JSON: {\"answer\":\"Rayleigh scattering of sunlight by air molecules causes blue light to scatter in all directions, making the sky appear blue.\"}\n\nCheck characters inside string: 123. 
That's fine.\n\nReturn JSON without markdown."},"finish_reason":"stop"}],"usage":{"prompt_tokens":204,"completion_tokens":520,"total_tokens":724}}
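
The logged request suggests the general idea: the format instruction and the JSON Schema are appended to the user message, and the returned content is then parsed as JSON. The following is a minimal sketch of that idea in plain Java. It is not Spring AI's actual implementation; the class FormatInstructionSketch, its method names, and the abbreviated instruction wording are assumptions made for illustration.

```java
// Hypothetical illustration of prompt-based structured output; not Spring AI code.
final class FormatInstructionSketch {

    // Three backticks, built at runtime so the literal fence does not appear in this listing.
    private static final String FENCE = "`".repeat(3);

    // Appends an abbreviated format instruction and the schema to the user's text.
    static String withFormat(String userText, String jsonSchema) {
        return userText + "\n"
                + "Your response should be in JSON format.\n"
                + "Here is the JSON Schema instance your output must adhere to:\n"
                + FENCE + jsonSchema + FENCE + "\n";
    }

    // Removes a surrounding Markdown code fence in case the model adds one anyway.
    static String stripCodeFence(String content) {
        String text = content.strip();
        if (text.startsWith(FENCE + "json")) {
            text = text.substring(FENCE.length() + 4);
        } else if (text.startsWith(FENCE)) {
            text = text.substring(FENCE.length());
        }
        if (text.endsWith(FENCE)) {
            text = text.substring(0, text.length() - FENCE.length());
        }
        return text.strip();
    }

    public static void main(String[] args) {
        String schema = "{\"type\":\"object\",\"properties\":{\"answer\":{\"type\":\"string\"}}}";
        System.out.println(withFormat("Why is the sky blue? Answer in 200 chars.", schema));
    }
}
```

Because the instruction travels inside the prompt, the model is only asked, not forced, to return valid JSON, so a defensive cleanup step such as stripCodeFence before parsing is a reasonable precaution.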

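Using Chat Memory

The OpenAI Chat API is stateless, so the LLM's answer is determined only by the content sent in the request. To make use of the conversation history, the client has to manage the conversation and include all of it (or a summary) in each request.
Spring AI can keep the chat history as memory and maintain the conversation context.

ChatMemoryRepository is the abstraction for storing Chat Memory. This tutorial uses the JDBC implementation of ChatMemoryRepository.

The logic that saves the exchanged chat messages to ChatMemoryRepository is provided as an Advisor implementation, which works like an interceptor for ChatClient.
The implementation used here is MessageChatMemoryAdvisor, which includes the conversation messages as a list in the Chat API request. There is also PromptChatMemoryAdvisor, which embeds the past conversation in the prompt itself.

Registering MessageChatMemoryAdvisor as a default Advisor on the ChatClient builder makes Chat Memory available to every chat.

When using an Advisor that handles ChatMemory, each chat must specify a conversation ID. Usually something like the logged-in user ID is used. Since this tutorial does not perform authentication, the HTTP session ID is used as the conversation ID instead.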
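
To make the idea of a per-conversation history concrete, here is a minimal in-memory sketch in plain Java. It is not Spring AI's MessageWindowChatMemory; the class WindowMemorySketch, its Message record, and the window size are hypothetical and only illustrate keeping the most recent messages per conversation ID. It ignores details a real implementation has to handle, such as system messages and persistence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sliding-window chat memory keyed by conversation ID; not Spring AI code.
final class WindowMemorySketch {

    record Message(String role, String content) {
    }

    private final int maxMessages;

    private final Map<String, Deque<Message>> store = new HashMap<>();

    WindowMemorySketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Appends a message and evicts the oldest ones once the window is full.
    void add(String conversationId, Message message) {
        Deque<Message> history = this.store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        history.addLast(message);
        while (history.size() > this.maxMessages) {
            history.removeFirst();
        }
    }

    // Returns a copy of the history that would be sent along with the next request.
    List<Message> get(String conversationId) {
        return new ArrayList<>(this.store.getOrDefault(conversationId, new ArrayDeque<>()));
    }

    // Forgets a conversation, for example when its HTTP session ends.
    void clear(String conversationId) {
        this.store.remove(conversationId);
    }

    public static void main(String[] args) {
        WindowMemorySketch memory = new WindowMemorySketch(20); // window size chosen arbitrarily
        memory.add("session-A", new Message("user", "My name is Taro Yamada."));
        memory.add("session-A", new Message("assistant", "Hello, Taro Yamada!"));
        memory.add("session-A", new Message("user", "Do you remember my name?"));
        System.out.println(memory.get("session-A")); // three messages for session-A
        System.out.println(memory.get("session-B")); // a different ID sees no history
    }
}
```

A different conversation ID sees an empty history, which is why the choice of ID (user ID, session ID) decides who shares a conversation.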

Modify HelloController as follows to use MessageChatMemoryAdvisor.

cat <<'EOF' > src/main/java/com/example/HelloController.java
package com.example;

import jakarta.servlet.http.HttpSession;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.ChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    private final ChatClient chatClient;

    public HelloController(ChatClient.Builder chatClientBuilder, ChatMemoryRepository chatMemoryRepository) {
        ChatMemory chatMemory = MessageWindowChatMemory.builder().chatMemoryRepository(chatMemoryRepository).build();
        this.chatClient = chatClientBuilder.defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
            .build();
    }

    @PostMapping(path = "/")
    public Response hello(@RequestBody Request request, HttpSession session) {
        String conversationId = session.getId();
        return this.chatClient.prompt()
            .user(request.prompt())
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
            .call()
            .entity(Response.class);
    }

    public record Request(String prompt) {

    }

    public record Response(String answer) {
    }

}
EOF

A table must be created to store Chat Memory via JDBC. With the following setting, the table is created automatically at application startup.

cat <<'EOF' >> src/main/resources/application.properties
spring.ai.chat.memory.repository.jdbc.initialize-schema=always
EOF

Restart the app and send requests with curl again.

./mvnw spring-boot:test-run

First, tell it your name.

$ curl http://localhost:8080 --json '{"prompt":"My name is Taro Yamada."}' -sv
> POST / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Content-Type: application/json
> Accept: application/json
> Content-Length: 36
> 
< HTTP/1.1 200 
< Set-Cookie: JSESSIONID=D973A719F2BE5643522C5D42EDCA1857; Path=/; HttpOnly
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Wed, 10 Sep 2025 12:59:46 GMT
< 
{"answer":"Hello, Taro Yamada!"}

Chat is normally stateless, and the next request does not carry the session cookie, so the name given a moment ago is not remembered.

$ curl http://localhost:8080 --json '{"prompt":"Do you remember my name?"}'
{"answer":"I don't recall your name."}

However, asking again while keeping the HTTP session shows that the name is remembered.
Send the request with the JSESSIONID that was returned when you gave your name, specified in a Cookie header.

$ curl http://localhost:8080 --json '{"prompt":"Do you remember my name?"}' -H "Cookie: JSESSIONID=D973A719F2BE5643522C5D42EDCA1857"
{"answer":"Yes, your name is Taro Yamada."}
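
A programmatic client has to do the same thing as the curl command above: take the session cookie from the Set-Cookie response header and send it back in a Cookie header. The following is a minimal sketch, assuming the JDK's java.net.http API; the class SessionCookieSketch and its methods are hypothetical names for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper showing how a client can replay the session cookie by hand.
final class SessionCookieSketch {

    // Keeps only the name=value pair of a Set-Cookie header value, dropping attributes such as Path.
    static String cookiePair(String setCookieValue) {
        int end = setCookieValue.indexOf(';');
        return (end < 0 ? setCookieValue : setCookieValue.substring(0, end)).strip();
    }

    // Builds a JSON POST request; the Cookie header is added only when a cookie is given.
    static HttpRequest promptRequest(String url, String jsonBody, String cookiePair) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody));
        if (cookiePair != null) {
            builder.header("Cookie", cookiePair);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        // Set-Cookie value taken from the curl -sv output earlier in this section.
        String pair = cookiePair("JSESSIONID=D973A719F2BE5643522C5D42EDCA1857; Path=/; HttpOnly");
        HttpRequest request = promptRequest("http://localhost:8080",
                "{\"prompt\":\"Do you remember my name?\"}", pair);
        System.out.println(request.headers().firstValue("Cookie").orElse("(none)"));
    }
}
```

To actually send the request, pass it to HttpClient.send with HttpResponse.BodyHandlers.ofString(); the first response's Set-Cookie value is available through response.headers().firstValue("Set-Cookie").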

With Chat Memory, the request sent to the Ollama API and the response returned look like the following. The conversation history is included in messages.

2025-09-10T22:00:36.359+09:00 TRACE 94394 --- [nio-8080-exec-2] org.zalando.logbook.Logbook              : Outgoing Request: f80290833e783595
Remote: localhost
POST http://localhost:11434/v1/chat/completions HTTP/1.1

{"messages":[{"content":"My name is Taro Yamada.","role":"user"},{"content":"{\"answer\":\"Hello, Taro Yamada!\"}","role":"assistant"},{"content":"Do you remember my name?\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n  \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n  \"type\" : \"object\",\n  \"properties\" : {\n    \"answer\" : {\n      \"type\" : \"string\"\n    }\n  },\n  \"additionalProperties\" : false\n}```\n","role":"user"}],"model":"gpt-oss:20b","stream":false,"temperature":0.7}
2025-09-10T22:00:37.601+09:00 TRACE 94394 --- [nio-8080-exec-2] org.zalando.logbook.Logbook              : Incoming Response: f80290833e783595
Duration: 1241 ms
HTTP/1.1 200 OK

{"id":"chatcmpl-878","object":"chat.completion","created":1757509237,"model":"gpt-oss:20b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"{\"answer\":\"Yes, your name is Taro Yamada.\"}","reasoning":"We need to respond in JSON object with property \"answer\" string. No explanations. No markdown, no code fences. Just JSON. Provide answer: \"Yes, your name is Taro Yamada.\" Ensure RFC8259 compliance: string uses double quotes, no escape needed. Provide exactly that."},"finish_reason":"stop"}],"usage":{"prompt_tokens":227,"completion_tokens":84,"total_tokens":311}}

Note

As an exercise, consider an implementation that deletes the corresponding conversation from ChatMemoryRepository when the HTTP session is destroyed.

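Using VectorStore

Next, let's use the Embedding API to store data in a vector database and run a similarity search over documents.

Spring AI provides VectorStore as the abstraction for storing data in a vector database and for similarity search. This tutorial uses the pgvector implementation of VectorStore.

Start with a simple check. Create a DocumentLoader class that stores a few documents through VectorStore and runs a similarity search.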

cat <<'EOF' > src/main/java/com/example/DocumentLoader.java
package com.example;

import java.util.List;
import java.util.stream.Stream;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class DocumentLoader implements CommandLineRunner {

    private final VectorStore vectorStore;

    public DocumentLoader(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) {
        List<Document> documents = Stream
            .of("Red apples are sweet and crunchy, perfect for snacking.",
                    "Apple Inc. is a technology company that makes iPhones and MacBooks.",
                    "Bananas are yellow tropical fruits rich in potassium.",
                    "Green apples have a tart flavor and are great for baking pies.",
                    "The iPhone is Apple's flagship smartphone with advanced features.",
                    "Oranges are citrus fruits packed with vitamin C and fiber.",
                    "Fresh strawberries are small red berries with tiny seeds on the surface.",
                    "Apple's CEO Tim Cook leads the company's innovation in consumer electronics.",
                    "Juicy peaches have soft fuzzy skin and sweet orange flesh.",
                    "MacBook Pro is Apple's professional laptop computer for creative work.")
            .map(Document::new)
            .toList();
        this.vectorStore.add(documents);

        Stream.of("red fruit for eating", "apple technology").forEach(query -> {
            System.out.println("-----");
            System.out.println("Query: " + query);
            this.vectorStore.similaritySearch(SearchRequest.builder().query(query).topK(3).build())
                .forEach(System.out::println);
        });
    }

}
EOF

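This DocumentLoader class runs at application startup, stores 10 documents in the vector database, and runs a similarity search with two queries.
The documents mix content about apples the fruit and Apple the company.
The two queries are there to check whether the search can tell the fruit from the company.

Add the Embedding API and pgvector settings to application.properties. The embedding model used here, nomic-embed-text:v1.5, returns 768-dimensional vectors, so set both spring.ai.openai.embedding.options.dimensions and spring.ai.vectorstore.pgvector.dimensions to 768.
The dimension count can be checked in the model's documentation; it also appears as the embedding length in the ollama show output above. Besides 768, the sizes 512, 256, 128, and 64 are also available.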

cat <<'EOF' >> src/main/resources/application.properties
spring.ai.openai.embedding.options.dimensions=768
spring.ai.openai.embedding.options.model=nomic-embed-text:v1.5
spring.ai.vectorstore.pgvector.dimensions=768
spring.ai.vectorstore.pgvector.initialize-schema=true
EOF

Start the app with the following command and check the logs. The HTTP logs are noisy here, so the Logbook log level is changed to info.

./mvnw spring-boot:test-run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info"

Logs like the following are printed, showing that suitable similar documents are found for each of the two queries.

-----
Query: red fruit for eating
Document{id='024d6058-df40-471d-aa31-6d51bfd7b282', text='Red apples are sweet and crunchy, perfect for snacking.', media='null', metadata={distance=0.24459387}, score=0.7554061263799667}
Document{id='7e188604-697f-475c-ab0a-d3b1d84642fc', text='Fresh strawberries are small red berries with tiny seeds on the surface.', media='null', metadata={distance=0.33758575}, score=0.6624142527580261}
Document{id='044afc94-e939-44ab-b66a-6bc02d65fb9f', text='Oranges are citrus fruits packed with vitamin C and fiber.', media='null', metadata={distance=0.34883383}, score=0.651166170835495}
-----
Query: apple technology
Document{id='30ca5436-323b-47d9-bcc4-730e7162c6aa', text='Apple Inc. is a technology company that makes iPhones and MacBooks.', media='null', metadata={distance=0.165798}, score=0.8342020064592361}
Document{id='d1a7a5c0-c11d-4190-98b0-96e490253aea', text='MacBook Pro is Apple's professional laptop computer for creative work.', media='null', metadata={distance=0.26944816}, score=0.7305518388748169}
Document{id='b6938088-4785-4d30-8504-9837204a940b', text='The iPhone is Apple's flagship smartphone with advanced features.', media='null', metadata={distance=0.30699915}, score=0.693000853061676}

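Here distance is the cosine distance between the query vector and the document vector, and score is 1 minus distance. The closer distance is to 0, the higher the similarity; equivalently, the larger score is, the higher the similarity.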
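
The relationship between distance and score, and the top-K ranking, can be reproduced in a few lines of plain Java. The following is a minimal sketch of a brute-force search over in-memory vectors; it is not how pgvector or Spring AI implements the search, and the class CosineSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical brute-force similarity search; real vector stores use indexes instead.
final class CosineSketch {

    // Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way.
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Score as shown in the log output above: 1 minus distance.
    static double score(double distance) {
        return 1.0 - distance;
    }

    // Returns the indices of the k documents closest to the query, nearest first.
    static List<Integer> topK(double[] query, List<double[]> docs, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble((Integer i) -> cosineDistance(query, docs.get(i))));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        // Toy 2-dimensional vectors; real embeddings from this tutorial have 768 dimensions.
        List<double[]> docs = List.of(new double[] { 0, 1 }, new double[] { 1, 0.1 }, new double[] { 1, 1 });
        double[] query = { 1, 0 };
        for (int index : topK(query, docs, 2)) {
            double distance = cosineDistance(query, docs.get(index));
            System.out.println("doc=" + index + " distance=" + distance + " score=" + score(distance));
        }
    }
}
```

Because cosine distance depends only on the angle between vectors, documents that differ in length but point in a similar direction still rank as close.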

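Loading documents from files

Next, let's store real documents in the vector database and run a similarity search.

This time we use documents about the fictional country "Chronosia".

Download the documents about Chronosia with the following commands and save them in the src/main/resources/docs directory.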

curl -sL https://github.com/making/chronosia/archive/refs/heads/main.tar.gz  | tar -xzvf -
mkdir -p src/main/resources/docs
cp -r chronosia-main/{ja,en} src/main/resources/docs/
rm -fr chronosia-main

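Modify the DocumentLoader class as follows so that it reads the documents, stores them in the vector database, and runs a similarity search. Document loading is skipped when the vector_store table already contains data.
So far PostgreSQL has been started with Testcontainers and the database is initialized on every application start, so the documents are loaded every time.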

cat <<'EOF' > src/main/java/com/example/DocumentLoader.java
package com.example;

import java.util.List;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.CommandLineRunner;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.ResourcePatternResolver;
import org.springframework.jdbc.core.simple.JdbcClient;
import org.springframework.stereotype.Component;

@Component
public class DocumentLoader implements CommandLineRunner {

    private final VectorStore vectorStore;

    private final ResourcePatternResolver resourcePatternResolver;

    private final JdbcClient jdbcClient;

    private final Logger logger = LoggerFactory.getLogger(DocumentLoader.class);

    public DocumentLoader(VectorStore vectorStore, ResourcePatternResolver resourcePatternResolver,
            JdbcClient jdbcClient) {
        this.vectorStore = vectorStore;
        this.resourcePatternResolver = resourcePatternResolver;
        this.jdbcClient = jdbcClient;
    }

    @Override
    public void run(String... args) throws Exception {
        Integer count = this.jdbcClient.sql("SELECT COUNT(*) FROM vector_store").query(Integer.class).single();
        if (count > 0) {
            logger.info("Found {} documents. Skip loading.", count);
        }
        else {
            logger.info("Loading documents...");
            for (Resource resource : this.resourcePatternResolver.getResources("classpath:docs/**/*.md")) {
                TextReader documentReader = new TextReader(resource);
                Map<String, Object> metadata = documentReader.getCustomMetadata();
                metadata.put("path", resource.getURI());
                this.vectorStore.add(documentReader.read());
            }
        }
        List<Document> documents = vectorStore
            .similaritySearch(SearchRequest.builder().query("Where is the capital of Chronosia?").topK(3).build());
        for (Document doc : documents) {
            System.out.println("score=" + doc.getScore() + "\tmetadata=" + doc.getMetadata());
        }
    }

}
EOF

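Spring AI provides DocumentReader as an abstraction for reading Document objects from files. Here we use TextReader and add the resulting Documents to the VectorStore as they are.
TextReader reads the content of a text file and converts it directly into a Document object.
For Markdown files you could also use MarkdownDocumentReader, but it splits the Markdown into paragraphs and returns them as multiple Document objects. That is too fine-grained for these documents, so we don't use it here.
All of the documents are Markdown, but each file is small, so one Document per file works fine. If a file is too large to fit within the LLM's context length limit, you can split it with TokenTextSplitter.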
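To make the idea of splitting concrete, here is a minimal plain-Java sketch that packs paragraphs into chunks up to a size limit. It is not TokenTextSplitter and does not use the Spring AI API: it counts characters rather than tokens as a rough stand-in, and the class name ParagraphChunker and the maxChars parameter are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphChunker {

    // Splits text on blank lines and packs consecutive paragraphs into chunks
    // of at most maxChars characters. A single paragraph longer than maxChars
    // is kept whole as its own chunk in this sketch.
    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : text.split("\\n\\s*\\n")) {
            String p = paragraph.strip();
            if (p.isEmpty()) {
                continue;
            }
            // The +2 accounts for the blank-line separator between paragraphs.
            if (current.length() > 0 && current.length() + 2 + p.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append("\n\n");
            }
            current.append(p);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "# Title\n\nFirst paragraph of the file.\n\nSecond paragraph of the file.";
        for (String chunk : split(text, 40)) {
            System.out.println("--- chunk (" + chunk.length() + " chars) ---");
            System.out.println(chunk);
        }
    }
}
```

Each chunk would then become its own Document, so a larger limit gives fewer, broader chunks and a smaller limit gives more, narrower ones. For the small Chronosia files the whole file fits in one chunk, which matches the one-Document-per-file choice made here.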

Start the app with the following command and check the log.

./mvnw spring-boot:test-run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info"

For the query "Where is the capital of Chronosia?", the search returns similar documents such as the following. The overview document has the highest similarity, and if you read it you will find that it does describe the capital.

score=0.7566197067499161	metadata={path=file:/Users/toshiaki/git/tut-spring-ai/target/classes/docs/en/overview.md, charset=UTF-8, source=overview.md, distance=0.2433803}
score=0.4937962293624878	metadata={path=file:/Users/toshiaki/git/tut-spring-ai/target/classes/docs/en/people.md, charset=UTF-8, source=people.md, distance=0.5062038}
score=0.493003785610199	metadata={path=file:/Users/toshiaki/git/tut-spring-ai/target/classes/docs/en/geography.md, charset=UTF-8, source=geography.md, distance=0.5069962}

The PostgreSQL started by Testcontainers is initialized on every application startup, so next we switch to a PostgreSQL whose data persists. Testcontainers is not started when you use ./mvnw spring-boot:run instead of ./mvnw spring-boot:test-run.

Run the following command.

./mvnw spring-boot:run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info"

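With ./mvnw spring-boot:test-run, the settings for connecting to the Testcontainers-managed PostgreSQL were configured automatically, but that is not the case with ./mvnw spring-boot:run.
Because no database connection settings such as spring.datasource.url are currently specified, startup fails with an error like the following.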

***************************
APPLICATION FAILED TO START
***************************

Description:

Failed to configure a DataSource: 'url' attribute is not specified and no embedded datasource could be configured.

Reason: Failed to determine a suitable driver class


Action:

Consider the following:
    If you want an embedded database (H2, HSQL or Derby), please put it on the classpath.
    If you have database settings to be loaded from a particular profile you may need to activate it (no profiles are currently active).

Instead of setting properties such as spring.datasource.url, we will start PostgreSQL with Docker Compose this time. Start the application with the --spring.docker.compose.enabled=true option.
Spring Boot's Docker Compose support automatically configures the connection to the PostgreSQL started by Docker Compose.

./mvnw spring-boot:run -Dspring-boot.run.arguments="--logging.level.org.zalando.logbook.Logbook=info --spring.docker.compose.enabled=true"

Once the application has started, confirm that Loading documents... appears in the log, which shows that the documents were loaded.

2025-09-11T00:32:57.689+09:00  INFO 6385 --- [           main] com.example.DocumentLoader               : Loading documents...

Stop the application and start it again. This time the documents are already loaded, so confirm that Found 12 documents. Skip loading. is displayed and that loading is skipped.

2025-09-11T00:43:06.190+09:00  INFO 13207 --- [           main] com.example.DocumentLoader               : Found 12 documents. Skip loading.

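Applied Example: Building a Chronosia Relocation Advisor Based on a Personality Assessment

Finally, as a slightly more applied example of an AI application, let's use the Chronosia documents to build a Chronosia relocation advisor based on a personality assessment.

Ideally we would build a UI that asks a series of personality questions, but to keep things simple we will create an API that accepts a free-text description of your personality and suggests which city in Chronosia you should move to.
However, rather than passing the free-text input directly to the relocation advice prompt, we first insert a step that builds a persona from the entered personality. The persona is then passed to the relocation advice prompt. In other words, ChatClient is called twice.

Create the following ChronosiaController class. Like the earlier HelloController, it uses ChatClient and QuestionAnswerAdvisor. Earlier we registered QuestionAnswerAdvisor in defaultAdvisors when building the ChatClient, but this time we specify it with the advisors(...) method at call time. This is because the first call, which creates the persona, does not need document search.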

cat <<'EOF' > src/main/java/com/example/ChronosiaController.java
package com.example;

import java.util.List;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChronosiaController {

    private final ChatClient chatClient;

    private final QuestionAnswerAdvisor questionAnswerAdvisor;

    public ChronosiaController(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        this.chatClient = chatClientBuilder.build();
        this.questionAnswerAdvisor = QuestionAnswerAdvisor.builder(vectorStore).build();
    }

    @PostMapping(path = "/chronosia")
    public ChronosiaResponse chronosia(@RequestBody ChronosiaRequest request) {
        String persona = this.chatClient.prompt().user("""
                後述のパーソナリティから次の観点でペルソナを作成してください。ただし、不明な点は推測せずに"不明"としてください。
                - 生活リズム （朝型、計画的など）
                - キャリア（革新的、技術志向など）
                - 時間価値観（効率重視など）
                - 社会性（バランス型など）
                - 家族状況（子供の教育重視など）

                パーソナリティは次のとおりです。
                ----
                {personality}
                """).user(u -> u.param("personality", request.personality())).call().content();
        return this.chatClient.prompt()
            .system("あなたはクロノシアの移住アドバイザーです。")
            .user("""
                    次のペルソナの人がいます。
                    {persona}

                    この人が移住すべきクロノシアの都市名とその理由を提案してください。候補は2つまで、優先度の高いものから順に列挙してください。
                    """)
            .user(u -> u.param("persona", persona))
            .advisors(this.questionAnswerAdvisor)
            .call()
            .entity(ChronosiaResponse.class);
    }

    public record ChronosiaRequest(String personality) {

    }

    public record ChronosiaResponse(List<Candidate> candidates) {

        public record Candidate(String city, String reason, int priority) {
        }
    }

}
EOF

Restart the app with the following command.

./mvnw spring-boot:run -Dspring-boot.run.arguments="--spring.docker.compose.enabled=true"

Let's try a few relocation advice requests.

$ curl http://localhost:8080/chronosia --json '{"personality":"私はせっかちな人間です。何事も効率を重視します。それなのに寝坊しがちです。常に最先端を目指しています。独身です。"}' -s | jq .
{
  "candidates": [
    {
      "city": "テンポラル・シティ",
      "reason": "クロノシアの首都であり、ISO本部や先端時計製造、量子時間プログラミングのハブが集中。効率と革新を重視するキャリア志向の人に最適。",
      "priority": 1
    },
    {
      "city": "明日区",
      "reason": "東側の明日人が集まる地域で、最新技術の採用率が高く、朝型人が多い。時間管理と効率重視の文化が根付いているため、イノベーション志向の人に合う。",
      "priority": 2
    }
  ]
}

$ curl http://localhost:8080/chronosia --json '{"personality":"私はおっとりとしており、争いを好まず、のんびりと生きていたいです。最新事情には疎いです。小学生の子供が2人います。"}' -s | jq .
{
  "candidates": [
    {
      "city": "ウェスタデイ",
      "reason": "西部最大都市であるウェスタデイは、レトロで落ち着いた雰囲気とゆっくりした生活リズムが特徴。時間に対する厳格さが比較的緩いので、子どもたちと共にリラックスした暮らしを楽しめます。",
      "priority": 1
    },
    {
      "city": "ポート・パラドックス",
      "reason": "南部に位置する港町で、ビーチや海辺のレイアウトが広がり、家族連れに適した環境。忙しいビジネスよりも、のんびりとした海辺の生活を重視する人に合うでしょう。",
      "priority": 2
    }
  ]
}

If you are using gpt-oss and find the answers low in quality, either pass the --spring.ai.openai.chat.options.reasoning-effort=high option at startup or add spring.ai.openai.chat.options.reasoning-effort=high to application.properties, then restart the application.
Responses will take longer, but the LLM will reason more carefully and may produce better answers.

As we have seen, Spring AI makes it easy to build AI applications like the ones so far.

This tutorial introduced the basic usage of Spring AI. Spring AI has many more features; if you are interested, refer to the resources below.

IK.AM