Präsentationsvideo / Chatklavier

Chatklavier

„Das Klavier als interaktiver Gesprächspartner mit KI-Synthese”
In Memoriam Peter Ablinger (1946–2025)

KI-gesteuerte Klanginstallation von Markus Sepperer

Die Installation transformiert Sprache in Klavierklang durch eine mehrstufige technische und konzeptuelle Kette.

Ausgangspunkt ist die Audioaufnahme der menschlichen Stimme des Users. Diese wird mit OpenAI Whisper transkribiert und als Text an ein lokal laufendes KI-Sprachmodell (Ollama) übergeben, das mittels Retrieval-Augmented Generation (RAG) antwortet.
RAG ist ein Verfahren, das die Antwort eines Sprachmodells durch gezieltes Nachschlagen in einer großen Textsammlung verbessert. Dabei sucht das System zuerst relevante Textstellen aus einem zuvor erstellten Korpus heraus und nutzt diese als Grundlage, um präzisere und kontextbezogene Antworten zu erzeugen.

Der Textkorpus für das Chatklavier besteht aus 1700 Textdateien, die automatisiert mit einem Python-Skript aus sämtlichen Texten der Website von Peter Ablinger (ablinger.mur.at) extrahiert wurden.Eine semantische Ähnlichkeitssuche mit FAISS, einer Softwarebibliothek für schnelle Suche in großen Vektor-Datenbanken, wählt relevante Textfragmente aus. Diese dienen dem Modell als inhaltlicher Kontext, was die Qualität der KI-Antworten steigert.Ein präziser Prompt, der auf den Input des Users abgestimmt ist, steuert das Modell so, dass es nur Begriffe und Satzfragmente aus dem Ablinger-Korpus nutzt. Die Antworten variieren im Stil (nüchtern, poetisch, dadaistisch, sachlich) und in der Länge. So entsteht eine sprachlich freie, formal eng gebundene Reaktion, die den User in Ablingers Gedankenwelt eintauchen lässt. Die KI-Antwort wird mit einer Text-to-Speech-Engine (pyttsx3) gesprochen. Gleichzeitig wird der Text Wort für Wort auf einem Bildschirm angezeigt und an die Software übertragen.

Parallel wird die Audioausgabe der KI verarbeitet. Die Software extrahiert 16 dominante Sinuskomponenten pro Frame – also kurzen Momentaufnahmen von wenigen Millisekunden. Die daraus gewonnenen Frequenzen und Amplituden werden in MIDI-Pitch und Velocity übersetzt und auf 16 Kanäle verteilt, was eine dynamische MIDI-Struktur erzeugt, die die Sprachformung approximiert. Diese MIDI-Daten werden an den von Winfried Ritsch und Peter Ablinger entwickelten und hier verbauten Klaviervorsetzer übergeben, der es ermöglicht, digitale Daten in mechanische Impulse umzuwandeln. Der Klaviervorsetzer war in zahlreichen Klangkunstprojekten Ablingers zentral und ermöglicht das präzise Spiel rhythmisch komplexer und maximal polyphoner Strukturen. Das Ergebnis ist kompexes Klangbild das die Sprachklänge hörbar macht. Ablinger nennt dies „Verständlichkeit durch Reduktion“: Das Klavier „spricht“ durch Tonhöhe, Akzente und Rhythmus – nicht semantisch, sondern klanglich-phonetisch.

Erstmals erhält dieses Instrument nun eine KI-gestützte, interaktive Erweiterung:
Die Maschine spricht nicht mehr nur – man kann nun mit ihr sprechen.
Das Klavier wird zur dialogfähigen Instanz: Die eigene Stimme formt eine Antwort aus Ablingers Sprachmaterial, wird zu digitalem Klangtext, der in mechanische Klavierimpulse umgesetzt wird – ein Kreislauf aus Hören, Denken, Sprechen und Spielen.

Inhaltliche und technische Betreuung : Dipl.Ing Patrik Lechner, Dr. Thomas Grill, Dipl.Ing Peter Plessas

AI-driven sound installation by Markus Sepperer

The Piano as an Interactive Conversational Partner with AI Synthesis”

In Memoriam Peter Ablinger (1959–2025)

The installation transforms speech into piano sound through a multi-stage technical and conceptual chain.

It starts with an audio recording of the user’s voice. This recording is transcribed using OpenAI Whisper and passed as text to a locally running AI language model (Ollama), which responds using Retrieval-Augmented Generation (RAG). RAG is a method that improves a language model’s responses by consulting a large text collection. The system first identifies relevant passages from a precompiled corpus and uses them as a basis to generate more precise and context-aware answers.

The text corpus for the Chat Piano consists of 1,700 text files, automatically extracted with a Python script from all texts on Peter Ablinger’s website (ablinger.mur.at). Semantic similarity search using FAISS, a software library for fast searching in large vector databases, selects relevant text fragments. These fragments provide the model with content context, enhancing the quality of the AI-generated responses. A precise prompt, tailored to the user’s input, ensures that the model only uses terms and sentence fragments from the Ablinger corpus. Responses vary in style (sober, poetic, Dadaist, factual) and length, creating a linguistically free yet formally constrained reaction that immerses the user in Ablinger’s world of thought.

The AI’s response is spoken using a text-to-speech engine (pyttsx3). Simultaneously, the text is displayed word by word on a screen and transmitted to the software.

In parallel, the AI’s audio output is processed. The software extracts 16 dominant sinusoidal components per frame – brief snapshots of a few milliseconds. The resulting frequencies and amplitudes are converted into MIDI pitch and velocity and distributed across 16 channels, producing a dynamic MIDI structure that approximates the shaping of speech. These MIDI data are sent to the piano actuator developed by Winfried Ritsch and Peter Ablinger, which converts digital data into mechanical impulses. The piano actuator was central to many of Ablinger’s sound art projects and allows precise playing of rhythmically complex and maximally polyphonic structures. The result is a complex soundscape that makes speech sounds audible. Ablinger called this “understandability through reduction”: the piano “speaks” through pitch, accents, and rhythm – not semantically, but phonetically and sonically.

For the first time, this instrument now receives an AI-supported, interactive extension: the machine no longer only speaks – one can now speak with it. The piano becomes a dialogical entity: the user’s own voice shapes a response from Ablinger’s textual material, transforming into digital sound text, which is converted into mechanical piano impulses – a cycle of listening, thinking, speaking, and playing.

Content and technical supervision: Dipl.Ing. Patrik Lechner, Dr. Thomas Grill, Dipl.Ing. Peter Plessas