Running Plamo2 with ollama

#ai #llm #ollama

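Plamo2 has a variant fine-tuned for translation, Plamo2-translate, and there is also a demo called PLaMo翻訳 (PLaMo Translation)

ollama can run GGUF-format models locally

A GGUF version of Plamo2-translate exists (an unofficial conversion)

So shouldn't fully local translation be possible with this?

Let's try it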

Install ollama beforehand

It's huge

ollama can now run GGUF files hosted on HF directly, so run the model that way

mmnga/plamo-2-translate-gguf · Hugging Face

ollama run hf.co/mmnga/plamo-2-translate-gguf

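The default quantization type is Q4_K_M

From "Ollama で Hugging Face Hub の GGUF をそのまま使ってみる" by ぬこぬこ:

"However, if no Q4_K_M model exists, a suitable quantization type that is present in the repository is selected"

This time plamo-2-translate-Q3_K_L.gguf was selected

Huge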
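Rather than relying on automatic selection, the quantization can also be pinned explicitly. The following is a minimal sketch, assuming the `hf.co/{user}/{repo}:{quantization}` tag form described in Hugging Face's Ollama documentation. The helper name `model_ref` is made up for illustration, and `Q3_K_L` is simply the file that was auto-selected here.

```shell
# Hypothetical helper: join a Hugging Face GGUF repo and a quantization tag
# into the reference string that ollama accepts.
model_ref() {
  printf '%s:%s\n' "$1" "$2"
}

ref="$(model_ref hf.co/mmnga/plamo-2-translate-gguf Q3_K_L)"
echo "$ref"

# To actually start the model with the pinned quantization:
#   ollama run "$ref"
```

Pinning the tag makes it obvious which file is being downloaded, which helps when comparing sizes across quantizations.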

code:error

λ ollama run hf.co/mmnga/plamo-2-translate-gguf

Error: 500 Internal Server Error: llama runner process has terminated: GGML_ASSERT(inp != nullptr && "missing result_norm/result_embd tensor") failed

The GPU looks too weak

qwen3 ran fine, so my guess is that VRAM or some other resource is insufficient.
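To check the VRAM guess against actual numbers, something like the following could be run. This is only a sketch: it assumes an NVIDIA GPU with `nvidia-smi` available (other vendors need a different tool), and the function name `check_env` is made up for illustration.

```shell
# Sketch: collect the ollama version, the loaded models, and GPU memory usage.
check_env() {
  if command -v ollama >/dev/null 2>&1; then
    ollama --version
    # Lists loaded models with their size and CPU/GPU placement.
    ollama ps || echo "ollama ps failed (is the server running?)"
  else
    echo "ollama: not found"
  fi

  # Assumption: NVIDIA GPU. Prints the GPU name plus total and used memory as CSV.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv \
      || echo "nvidia-smi failed"
  else
    echo "nvidia-smi: not found"
  fi
}

check_env
```

Running `ollama ps` while qwen3 is loaded shows how much memory a model that does work takes, which gives a baseline to compare with the size of the Q3_K_L file.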

Prepare some sample text

https://medium.com/@isaikat213/building-a-simple-ai-chat-server-with-scala-zio-and-ollama-85cde9745e5e

Use the opening paragraphs of this article

code:quote

I recently discovered ZIO while working on a large-scale Scala application, and I was so impressed by its power and elegance that I wanted to explore it further. This project is the result of that exploration, combining my interest in functional programming with the exciting world of local Large Language Models (LLMs).

In the rapidly evolving landscape of AI, LLMs are becoming more capable every day. One of the most exciting advancements is the ability for models to use external tools, allowing them to interact with the real world, access up-to-date information, and perform complex tasks.

This article provides a complete walkthrough of how to leverage the power of Scala ZIO and Ollama to build a high-performance, concurrent chat server that supports tool calling. We’ll explore how to run local LLMs that can search the web, execute code, and more, all orchestrated by the robust and type-safe environment that ZIO provides.