Architecture
Technical architecture of RubyOnVibes apps — Rails 8, Falcon, async Ruby, RubyLLM, and production deployment.
Your App's Stack
Every app created on RubyOnVibes is an AI-enabled Rails 8 application. There is no proprietary runtime or custom framework — it's Rails + Vite all the way down.
| Layer | What you get |
|---|---|
| Framework | Rails 8 (Turbo + React) |
| Web Server | Falcon — fiber-based, async-native |
| Authentication | Devise with email/password, session management |
| Authorization | Pundit role-based policies |
| Payments | Stripe integration via the Pay gem |
| Frontend | ERB + Turbo, React via Inertia (with optional SSR), Islands Architecture, Alpine.js, TailwindCSS, Vite |
| Background Jobs | SolidQueue (durable) + async-job (fiber-based, perfect for LLM streaming) |
| AI | RubyLLM — supports 1,100+ models across 12+ providers with little to no code changes |
| Database | SQLite (default), PostgreSQL upgrade available |
| Deployment | Three options: Fly.io (live coding), Render / any Docker host (production), local |
| Testing | RSpec with Capybara browser tests |
Falcon and Async Ruby
RubyOnVibes apps run on Falcon, a fiber-based web server, instead of the traditional Puma thread pool. This is a deliberate architectural choice driven by the nature of LLM workloads.
Why Fibers Matter for AI Apps
LLM API calls spend almost all of their time waiting on network I/O. In a thread-based server, each streaming conversation occupies an entire thread (and its database connection) while doing almost nothing. Fibers solve this:
- A fiber yields control during I/O and resumes when data arrives — no OS thread consumed while waiting
- Thousands of concurrent LLM conversations can share a single thread
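The yield-and-resume model can be sketched with nothing but Ruby's built-in Fiber class. This is a toy illustration, not Falcon's actual scheduler: each "conversation" yields where it would wait on the network, and a tiny round-robin loop stands in for Falcon's real event loop.

```ruby
# Toy sketch of cooperative fibers: each conversation yields where it would
# wait on I/O; a round-robin loop resumes whichever fiber is still alive.
conversations = 3.times.map do |i|
  Fiber.new do
    chunks = []
    2.times do |n|
      Fiber.yield                     # simulate waiting on network I/O
      chunks << "conv#{i}-chunk#{n}"  # "data arrived", do a little work
    end
    chunks                            # fiber's final return value
  end
end

results = []
# Minimal "event loop": resume each fiber in turn until all have finished.
until conversations.empty?
  conversations.reject! do |fiber|
    value = fiber.resume
    results << value unless fiber.alive?  # finished fiber returns its chunks
    !fiber.alive?
  end
end

results.flatten.first  # => "conv0-chunk0"
```

All three conversations interleave on a single thread; none of them blocks the others while "waiting". Falcon does the same thing with real sockets and a real reactor.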
How It Works in Practice
The app runs two job backends, each suited to its workload:
| Backend | Runs | Best for |
|---|---|---|
| async-job (inline) | In the Falcon web process via fibers | LLM streaming, API calls, I/O-bound work |
| SolidQueue | Separate worker process | Emails, exports, maintenance, CPU-bound work |
ChatStreamJob is an async-job — it runs inside the Falcon process as a fiber, not as a separate worker thread. This means streaming an LLM response and broadcasting it over WebSockets happens in the same process with near-zero overhead. When the LLM is thinking, the fiber yields and Falcon serves other requests.
```
User message → ChatStreamJob (async-job fiber)
  → RubyLLM.chat.complete { |chunk| broadcast(chunk) }
  → Fiber yields during network wait
  → Falcon serves other requests meanwhile
```
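The chunk-by-chunk flow above can be sketched in plain Ruby. `FakeLLM` is a stand-in for the real provider client, and the lambda stands in for a Turbo Streams broadcast — but the block-per-chunk shape matches how streaming completions are consumed:

```ruby
# Stand-in for a streaming LLM client: yields each chunk to the caller's
# block as it "arrives", instead of returning one big response at the end.
class FakeLLM
  def complete(prompt)
    ["Hello", ", ", "world", "!"].each { |chunk| yield chunk }
  end
end

broadcasts = []
broadcast = ->(chunk) { broadcasts << chunk }  # stand-in for a WebSocket push

# The "job": stream the response and broadcast every chunk immediately.
FakeLLM.new.complete("Say hi") { |chunk| broadcast.call(chunk) }

broadcasts.join  # => "Hello, world!"
```

Because the job and the broadcast live in the same process, each chunk goes from provider to client with no queue hop or serialization in between.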
SolidQueue handles everything else — emails, PDF generation, scheduled tasks — in a separate worker process with its own database connections and full process isolation.
AI and LLM Integration
RubyLLM is the LLM communication layer for every RubyOnVibes app. It provides a unified Ruby interface across providers and models.
Provider-Agnostic by Default
A single API works across all supported providers. Switching from one model to another is a string change:
```ruby
chat = RubyLLM.chat(model: "claude-sonnet-4-5")
chat = RubyLLM.chat(model: "gpt-4o")
chat = RubyLLM.chat(model: "gemini-2.0-flash")
```
No major code changes, no adapter swaps. Token tracking, streaming, tool use, and structured output work consistently across providers. Configure the model name via an environment variable and the same application runs against different LLMs in different environments with zero code changes.
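The environment-variable pattern is one line; `LLM_MODEL` is a name chosen for this sketch, not a RubyOnVibes convention:

```ruby
# Pick the model from the environment, with a sensible default.
# LLM_MODEL is an illustrative variable name, not a platform convention.
model = ENV.fetch("LLM_MODEL", "claude-sonnet-4-5")

# chat = RubyLLM.chat(model: model)  # identical call regardless of provider
```

Set `LLM_MODEL=gpt-4o` in staging and leave it unset in production, and the two environments talk to different providers with no code diff between them.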
Capabilities
RubyLLM covers chat, vision, audio transcription, embeddings, image generation, structured output, content moderation, extended thinking, agentic workflow composition, and tool/function calling — all from one gem. The acts_as_chat and acts_as_message mixins integrate directly with ActiveRecord for conversation persistence.
Tool Calling and Agents
Tools are Ruby classes that the LLM can invoke mid-conversation. RubyOnVibes apps use this for code editing, file operations, and deployment actions. RubyLLM's Agent class enables composable, reusable AI workflows — sequential pipelines, routing, parallel fan-out, and evaluation loops — built from plain Ruby.
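The pattern — a Ruby class the model can invoke by name with structured arguments — can be illustrated in plain Ruby. These are not RubyLLM's actual classes; `WordCountTool` and `dispatch` are hypothetical names showing the shape of tool routing:

```ruby
# Illustration of the tool-calling pattern (not RubyLLM's real API):
# a tool declares a name and an execute method with keyword arguments.
class WordCountTool
  def self.tool_name = "word_count"

  def execute(text:)
    text.split.size
  end
end

TOOLS = { WordCountTool.tool_name => WordCountTool.new }

# Simulate the LLM requesting a tool invocation mid-conversation:
# the dispatcher looks up the tool by name and forwards the arguments.
def dispatch(call)
  TOOLS.fetch(call[:name]).execute(**call[:arguments])
end

dispatch(name: "word_count", arguments: { text: "fibers all the way down" })
# => 5
```

In a real app the library generates the tool schema for the provider and parses the model's tool-call response; the dispatch step in the middle is the part your Ruby class implements.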
Local and Private Models
RubyLLM supports Ollama and other local inference backends. Run open-weight models locally with no API key, no network calls, and complete data privacy. DeepSeek, Mistral, and any OpenAI-compatible endpoint work as well. This means a RubyOnVibes app can run entirely on a local machine with a local model — no cloud dependency required.
Async-Native
RubyLLM automatically becomes non-blocking inside an Async context. Under Falcon, every chat.complete call runs as a fiber — no configuration needed. The same code that works synchronously in a script works concurrently at scale in production.
Database: Start Simple, Scale Up
Apps start with SQLite for instant boot times and zero configuration. When you're ready for production scale, upgrade to PostgreSQL via Supabase (free tier available) or any PostgreSQL provider.
The upgrade takes minutes (assuming no existing data to migrate): create a Supabase database, load your schema via the backend console, and update your DATABASE_URL in Settings. A full guide is included in every app at docs/supabase.md.
SQLite is fine for development and small-scale apps. Switch to PostgreSQL when you need advanced database features.
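A DATABASE_URL packs the whole connection — host, port, credentials, database name — into one string, which is why the switch is a single setting. The hostname and credentials below are placeholders, not real Supabase values:

```ruby
require "uri"

# Parse an example DATABASE_URL (placeholder credentials, not real ones)
# to see what a single connection string carries.
url = URI.parse("postgres://app_user:s3cret@db.example.supabase.co:5432/myapp")

url.host  # => "db.example.supabase.co"
url.port  # => 5432
url.user  # => "app_user"
url.path  # => "/myapp" (database name, minus the leading slash)
```

Rails reads this one variable at boot, so pointing the app at PostgreSQL is an environment change, not a code change.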
File Storage: Flexible Adapters
Files are stored on a persistent volume by default via Active Storage. When you need more capacity or CDN delivery, swap to S3 or any cloud storage provider by updating your storage configuration — no code changes needed.
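The swap happens in Active Storage's service configuration. A typical S3 entry in config/storage.yml looks like the sketch below — the bucket name and region are placeholders, and the credentials keys assume you store AWS keys in Rails credentials:

```yaml
# config/storage.yml — example S3 service entry (placeholder values)
amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: my-app-uploads
```

Then point the environment at it with `config.active_storage.service = :amazon` — attachments keep working because application code talks to Active Storage, not to the backing store.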
Frontend: ERB First, React When You Need It
Most pages are server-rendered ERB with Turbo — fast to build, fast to load, no JavaScript compilation step. When a page needs rich client-side interactivity, Inertia Rails renders full React pages with server-side data passing and no separate API layer to maintain. Inertia pages can optionally be server-side rendered (SSR) for faster initial loads and SEO — the SSR build runs via Vite alongside the main app.
For cases where you need interactive components on an otherwise server-rendered page, the Islands Architecture (via islandjs-rails) mounts standalone React components into DOM containers. Alpine.js is also available for lightweight inline reactivity — dropdowns, toggles, and other small interactions that don't warrant a full React component. Both Inertia pages and Islands are built by Vite. Import maps are available for lightweight global scripts (Turbo, analytics) but are not the primary JS pipeline — Stimulus is not included in the default template, though it can be added if desired. ERB for productivity, React where it matters.
Real-Time Features
Chat and editor updates use Turbo Streams over WebSockets by default (via async-cable):
- User sends a message via the chat panel
- ChatStreamJob runs as a fiber in the Falcon process
- RubyLLM streams tokens back from the provider
- Each chunk broadcasts to the client in real time — same process, no serialization overhead
Deployment
Live Coding on Fly.io
When you create or fork a project on RubyOnVibes, it deploys to Fly.io immediately. The app runs Falcon + SolidQueue + async-job in a single container. Code changes deploy automatically — no CI/CD pipeline to configure.
Production Deployment
Your app ships with a Dockerfile. Deploy to any host that runs Docker or Rails:
- Render — connect your synced GitHub repo for automatic deploys
- Railway, Heroku, AWS, GCP — standard Docker or Rails deployment
- Bare metal / VPS — `docker build` and `docker run`
With PostgreSQL, SolidQueue runs as a separate worker service for horizontal scaling.
Local and Desktop
Clone the repo, run `bundle install` and `npm install`, then `bin/dev`. The full stack — Falcon, SolidQueue, AI features — runs on your machine. Future releases will support wrapping apps in Tauri for native desktop distribution.
Platform Infrastructure
The RubyOnVibes platform itself runs on Rails 8 with multi-tenant account management, WebSocket servers for real-time streaming, and background workers for AI processing and repository operations. User apps run independently on their own infrastructure.