Ponkotsu LLM

● Live

A lightweight language model running on a Raspberry Pi. It's no good at hard problems, but for small talk and a bit of writing help, it does its honest best. Responses stream back token by token.

Endpoint: POST /api/v1/chat
I/O: text->text · streaming
Auth: None (open to all)

How to use

Send text, get text back. Responses stream one token at a time over SSE (Server-Sent Events).

Request

curl -N https://ponkotsu-lab.net/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

Field	Type	Required	Description
`message`	string	✔	Input text for the model
`max_tokens`	number		Max tokens to generate (default: 256)
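The curl call above can be wrapped in a small bash function so that `max_tokens` can be set without hand-writing JSON. This is a minimal sketch, not an official client: `json_escape`, `build_chat_body` and `ponkotsu_chat` are hypothetical names, and the escaping only covers backslashes, double quotes, newlines, carriage returns and tabs.

```shell
#!/usr/bin/env bash
# Minimal sketch: build and send a POST /api/v1/chat request.
# json_escape, build_chat_body and ponkotsu_chat are hypothetical helpers.

# Escape a string for a JSON string literal. Only \, ", newline, CR and
# tab are handled; other control characters are passed through as-is.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}       # backslash -> \\
  s=${s//\"/\\\"}       # double quote -> \"
  s=${s//$'\n'/\\n}     # newline -> \n
  s=${s//$'\r'/\\r}     # carriage return -> \r
  s=${s//$'\t'/\\t}     # tab -> \t
  printf '%s' "$s"
}

# Build the request body. max_tokens is optional; when it is omitted the
# field is left out and the server default (256) applies.
build_chat_body() {
  local message=$1 max_tokens=${2:-}
  if [[ -n $max_tokens ]]; then
    printf '{"message": "%s", "max_tokens": %d}' \
      "$(json_escape "$message")" "$max_tokens"
  else
    printf '{"message": "%s"}' "$(json_escape "$message")"
  fi
}

# Send one request. -N turns off curl's output buffering so the streamed
# "data:" lines appear as soon as they arrive.
ponkotsu_chat() {
  curl -sS -N https://ponkotsu-lab.net/api/v1/chat \
    -H "Content-Type: application/json" \
    -d "$(build_chat_body "$@")"
}
```

For example, `ponkotsu_chat "Write a two-line poem about tea" 64` asks for at most 64 tokens; leaving out the second argument falls back to the server default of 256.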

Response (streaming)

data: {"delta": "Hel"}
data: {"delta": "lo"}
data: {"done": true}
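Each `data:` line carries either a `delta` (the next piece of text) or a final `done` marker, so the reply is just the deltas joined in order. Here is a minimal bash sketch that stitches them back into plain text. `read_chat_stream` is a hypothetical helper, and it assumes every event looks exactly like the example above; it is not a general SSE or JSON parser.

```shell
#!/usr/bin/env bash
# Minimal sketch: turn the SSE stream on stdin into plain text.
# read_chat_stream is a hypothetical helper. It assumes every event is a
# single "data: " line shaped like {"delta": "..."} or {"done": true}.
read_chat_stream() {
  local line payload text
  local re='"delta": *"(.*)"'
  while IFS= read -r line; do
    line=${line%$'\r'}                     # tolerate CRLF line endings
    [[ $line == 'data: '* ]] || continue   # skip blank separators and other fields
    payload=${line#'data: '}
    if [[ $payload == *'"done": true'* ]]; then
      break                                # end of the response
    fi
    if [[ $payload =~ $re ]]; then
      text=${BASH_REMATCH[1]}
      text=${text//\\\"/\"}                # JSON \" -> "
      printf '%b' "$text"                  # expand \n, \t and \\ escapes
    fi
  done
  printf '\n'
}
```

For example, `curl -sS -N https://ponkotsu-lab.net/api/v1/chat -H "Content-Type: application/json" -d '{"message": "Hello"}' | read_chat_stream` prints the reply piece by piece as it streams in.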

Limitations (the ponkotsu, i.e. clunky, bits)

Because it runs on modest hardware, long inputs and complex reasoning are not its strength.
Under heavy load, requests may be rate-limited and placed in a queue.
No auth required, but there is a per-IP usage cap.
Runs on a lightweight model (powered by Ollama / gemma).
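Because requests can be rate-limited under load, a script that calls the API may want to retry failed requests and wait a little longer each time. Below is a minimal exponential-backoff sketch. `retry_with_backoff` is a hypothetical helper, and it assumes a rate-limited request shows up as a failed command (for example via curl's `--fail`, which makes curl exit non-zero on HTTP error statuses). This page does not say which status code the server actually returns.

```shell
#!/usr/bin/env bash
# Minimal sketch: retry a command with exponential backoff.
# retry_with_backoff is a hypothetical helper. Usage:
#   retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
# Assumes rate limiting surfaces as a non-zero exit status of the command.
retry_with_backoff() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "retry_with_backoff: giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))       # wait twice as long before the next try
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry_with_backoff 4 2 curl -sS -N --fail https://ponkotsu-lab.net/api/v1/chat -H "Content-Type: application/json" -d '{"message": "Hello"}'` tries up to four times, waiting 2, 4 and then 8 seconds between attempts. A queued request simply takes longer to start streaming and does not need a retry. Retries likely count against the same per-IP cap, so keeping the attempt count small is kinder to the Pi.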