Architecture

Technical architecture of RubyOnVibes apps — Rails 8, Falcon, async Ruby, RubyLLM, and production deployment.

Your App's Stack

Every app created on RubyOnVibes is an AI-enabled Rails 8 application. There is no proprietary runtime or custom framework — it's Rails + Vite all the way down.

Layer            What you get
Framework        Rails 8 (Turbo + React)
Web Server       Falcon — fiber-based, async-native
Authentication   Devise with email/password and session management
Authorization    Pundit role-based policies
Payments         Stripe integration via the Pay gem
Frontend         ERB + Turbo, React via Inertia (with optional SSR), Islands Architecture, Alpine.js, TailwindCSS, Vite
Background Jobs  SolidQueue (durable) + async-job (fiber-based, ideal for LLM streaming)
AI               RubyLLM — 1,100+ models across 12+ providers with little to no code changes
Database         SQLite (default), PostgreSQL upgrade available
Deployment       Fly.io (live coding), Render or any Docker host (production), or local
Testing          RSpec with Capybara browser tests

Falcon and Async Ruby

RubyOnVibes apps run on Falcon, a fiber-based web server, instead of the traditional Puma thread pool. This is a deliberate architectural choice driven by the nature of LLM workloads.

Why Fibers Matter for AI Apps

LLM API calls spend the vast majority of their time waiting on network I/O. In a thread-based server, each streaming conversation occupies an entire thread (and its database connection) while doing almost nothing. Fibers solve this:

  • A fiber yields control during I/O and resumes when data arrives — no OS thread consumed while waiting
  • Thousands of concurrent LLM conversations can share a single thread
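The yield-and-resume mechanics can be sketched with Ruby's stdlib Fiber. This is a hand-rolled illustration — under Falcon, the fiber scheduler does the yielding and resuming automatically at real I/O boundaries:

```ruby
# Three "conversations", each a fiber that pauses while waiting on the LLM.
requests = 3.times.map do |i|
  Fiber.new do
    Fiber.yield "request #{i}: waiting on LLM"  # simulate blocking on network I/O
    "request #{i}: done"
  end
end

log = []
# Pass 1: each fiber runs until it "blocks" and yields control.
requests.each { |f| log << f.resume }
# Pass 2: resume each fiber once the (simulated) response has arrived.
requests.each { |f| log << f.resume }
```

All three requests make progress on a single thread: while one is "waiting", control passes to the next.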

How It Works in Practice

The app runs two job backends, each suited to its workload:

Backend             Runs                                  Best for
async-job (inline)  In the Falcon web process via fibers  LLM streaming, API calls, I/O-bound work
SolidQueue          Separate worker process               Emails, exports, maintenance, CPU-bound work

ChatStreamJob is an async-job — it runs inside the Falcon process as a fiber, not as a separate worker thread. This means streaming an LLM response and broadcasting it over WebSockets happens in the same process with near-zero overhead. When the LLM is thinking, the fiber yields and Falcon serves other requests.

User message → ChatStreamJob (async-job fiber)
                 → RubyLLM.chat.complete { |chunk| broadcast(chunk) }
                 → Fiber yields during network wait
                 → Falcon serves other requests meanwhile

SolidQueue handles everything else — emails, PDF generation, scheduled tasks — in a separate worker process with its own database connections and full process isolation.
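The routing rule can be sketched in plain Ruby — the queue names and helper below are illustrative, not the app's actual configuration:

```ruby
# I/O-bound queues run inline as fibers; everything else goes to SolidQueue.
INLINE_QUEUES = %i[llm_streaming api_calls].freeze      # async-job, in the Falcon process
WORKER_QUEUES = %i[mailers exports maintenance].freeze  # SolidQueue, separate worker process

def backend_for(queue)
  INLINE_QUEUES.include?(queue) ? :async_job : :solid_queue
end

backend_for(:llm_streaming)  # streams inside Falcon as a fiber
backend_for(:mailers)        # isolated in the SolidQueue worker
```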

AI and LLM Integration

RubyLLM is the LLM communication layer for every RubyOnVibes app. It provides a unified Ruby interface across providers and models.

Provider-Agnostic by Default

A single API works across all supported providers. Switching from one model to another is a string change:

chat = RubyLLM.chat(model: "claude-sonnet-4-5")
chat = RubyLLM.chat(model: "gpt-4o")
chat = RubyLLM.chat(model: "gemini-2.0-flash")

No code changes beyond the model string, no adapter swaps. Token tracking, streaming, tool use, and structured output work consistently across providers. Configure the model name through an environment variable and the same application runs against different LLMs in different environments with zero code changes.
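One way to wire up the environment-variable pattern — the variable name LLM_MODEL and the default value are assumptions, not the app's actual configuration:

```ruby
# Pick the model per environment; fall back to a default when unset.
model = ENV.fetch("LLM_MODEL", "claude-sonnet-4-5")

# In the app (requires the RubyLLM gem, so shown here as a comment):
# chat = RubyLLM.chat(model: model)
```

Staging can then export LLM_MODEL=gpt-4o while production runs a different model, with identical application code.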

Capabilities

RubyLLM covers chat, vision, audio transcription, embeddings, image generation, structured output, content moderation, extended thinking, agentic workflow composition, and tool/function calling — all from one gem. The acts_as_chat and acts_as_message mixins integrate directly with ActiveRecord for conversation persistence.
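A hedged sketch of what the persistence wiring looks like — the model names follow RubyLLM's documented defaults, but treat the details as illustrative rather than the app's exact schema:

```ruby
# Conversations and messages persist as ordinary ActiveRecord models.
class Chat < ApplicationRecord
  acts_as_chat      # wires up messages and tracks the model in use
end

class Message < ApplicationRecord
  acts_as_message   # role, content, and any tool-call payloads
end
```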

Tool Calling and Agents

Tools are Ruby classes that the LLM can invoke mid-conversation. RubyOnVibes apps use this for code editing, file operations, and deployment actions. RubyLLM's Agent class enables composable, reusable AI workflows — sequential pipelines, routing, parallel fan-out, and evaluation loops — built from plain Ruby.
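The dispatch step can be simulated in plain Ruby — the tool class, registry, and parsed-call shape below are illustrative, not RubyLLM's actual API:

```ruby
# A tool is just a Ruby class the LLM can invoke by name with arguments.
class ReadFileTool
  NAME = "read_file"

  def execute(path:)
    File.exist?(path) ? File.read(path) : "error: #{path} not found"
  end
end

TOOLS = { ReadFileTool::NAME => ReadFileTool.new }.freeze

# A tool call as it might arrive from the model, already parsed to a Hash.
call   = { "name" => "read_file", "arguments" => { path: "missing.txt" } }
result = TOOLS.fetch(call["name"]).execute(**call["arguments"])
# `result` is sent back to the model as the tool's output
```

The framework's job is exactly this loop: parse the model's tool call, run the matching Ruby class, and feed the return value back into the conversation.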

Local and Private Models

RubyLLM supports Ollama and other local inference backends. Run open-weight models locally with no API key, no network calls, and complete data privacy. DeepSeek, Mistral, and any OpenAI-compatible endpoint work as well. This means a RubyOnVibes app can run entirely on a local machine with a local model — no cloud dependency required.

Async-Native

RubyLLM automatically becomes non-blocking inside an Async context. Under Falcon, every chat.complete call runs as a fiber — no configuration needed. The same code that works synchronously in a script works concurrently at scale in production.

Database: Start Simple, Scale Up

Apps start with SQLite for instant boot times and zero configuration. When you're ready for production scale, upgrade to PostgreSQL via Supabase (free tier available) or any PostgreSQL provider.

The upgrade takes minutes (assuming no existing data to migrate): create a Supabase database, load your schema via the backend console, and update your DATABASE_URL in Settings. A full guide is included in every app at docs/supabase.md.

SQLite is fine for development and small-scale apps. Switch to PostgreSQL when you need advanced database features.

File Storage: Flexible Adapters

Files are stored on a persistent volume by default via Active Storage. When you need more capacity or CDN delivery, swap to S3 or any cloud storage provider by updating your storage configuration — no code changes needed.

Frontend: ERB First, React When You Need It

Most pages are server-rendered ERB with Turbo — fast to build, fast to load, no JavaScript compilation step. When a page needs rich client-side interactivity, Inertia Rails renders full React pages with server-side data passing and no separate API layer to maintain. Inertia pages can optionally be server-side rendered (SSR) for faster initial loads and SEO — the SSR build runs via Vite alongside the main app.

For cases where you need interactive components on an otherwise server-rendered page, the Islands Architecture (via islandjs-rails) mounts standalone React components into DOM containers. Alpine.js is also available for lightweight inline reactivity — dropdowns, toggles, and other small interactions that don't warrant a full React component. Both Inertia pages and Islands are built by Vite. Import maps are available for lightweight global scripts (Turbo, analytics) but are not the primary JS pipeline — Stimulus is not included in the default template, though it can be added if desired. ERB for productivity, React where it matters.

Real-Time Features

Chat and editor updates use Turbo Streams over WebSockets by default (via async-cable):

  1. User sends a message via the chat panel
  2. ChatStreamJob runs as a fiber in the Falcon process
  3. RubyLLM streams tokens back from the provider
  4. Each chunk broadcasts to the client in real time — same process, no serialization overhead
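Steps 3 and 4 can be simulated in plain Ruby — no Rails or RubyLLM required; `complete` and `broadcast` below are stand-ins for the real streaming call and Turbo Streams append:

```ruby
# Yields token chunks as they "arrive", like RubyLLM's streaming block.
def complete(tokens, &on_chunk)
  tokens.each { |t| on_chunk.call(t) }
end

transcript = +""
broadcast  = ->(chunk) { transcript << chunk }  # stand-in for a Turbo Stream append

complete(["Streaming ", "works ", "chunk by chunk."]) { |chunk| broadcast.call(chunk) }
# transcript now holds the fully assembled response
```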

Deployment

Live Coding on Fly.io

When you create or fork a project on RubyOnVibes, it deploys to Fly.io immediately. The app runs Falcon + SolidQueue + async-job in a single container. Code changes deploy automatically — no CI/CD pipeline to configure.

Production Deployment

Your app ships with a Dockerfile. Deploy to any host that runs Docker or Rails:

  • Render — connect your synced GitHub repo for automatic deploys
  • Railway, Heroku, AWS, GCP — standard Docker or Rails deployment
  • Bare metal / VPS — docker build and docker run

With PostgreSQL, SolidQueue runs as a separate worker service for horizontal scaling.

Local and Desktop

Clone the repo, run bundle install, npm install, and bin/dev. The full stack — Falcon, SolidQueue, AI features — runs on your machine. Future releases will support wrapping apps in Tauri for native desktop distribution.

Platform Infrastructure

The RubyOnVibes platform itself runs on Rails 8 with multi-tenant account management, WebSocket servers for real-time streaming, and background workers for AI processing and repository operations. User apps run independently on their own infrastructure.