ACE-Step 1.5


ACE-Step 1.5 Overview

ACE-Step 1.5 is an open-source music foundation model co-led by ACE Studio and StepFun. It is described as the most powerful local music generation model, one that outperforms almost all commercial alternatives, and it runs on Mac, AMD, Intel, and CUDA devices with under 4GB of VRAM.

Ultra-Fast Generation: Under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090; supports audio from 10 seconds to 10 minutes long.
Commercial-Grade Quality: Output quality beyond most commercial music models, positioned between Suno v4.5 and Suno v5, with support for 1000+ instruments and styles.
Hybrid LM + DiT Architecture: Language model plans song blueprints via Chain-of-Thought while a Diffusion Transformer generates audio with intrinsic reinforcement learning alignment.
Versatile Editing: Cover generation, repaint, vocal-to-BGM conversion, track separation, multi-track layering, and reference audio conditioning.
Multi-Language Lyrics: Supports 50+ languages with metadata control for duration, BPM, key/scale, and time signature.
LoRA Personalization: Train a custom LoRA from just 8 songs in about 1 hour on an RTX 3090 (12GB VRAM) via one-click Gradio training.
Cross-Platform Local Run: Gradio UI and REST API on CUDA, Apple Silicon (MLX), AMD ROCm, Intel XPU, and CPU with tier-aware auto GPU configuration.
XL 4B Model Series: New acestep-v15-xl-base/sft/turbo DiT models deliver higher audio quality and work with compatible LM planners (0.6B and larger), available on Hugging Face and ModelScope.
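
As a rough illustration of the metadata controls listed above (duration, BPM, key/scale, time signature), the sketch below validates those values and packs them into a request-style dictionary. It is not ACE-Step's actual API: the `SongMetadata` class, the `to_request` helper, and every field name are hypothetical and chosen only for illustration. The duration bounds come from the 10-second to 10-minute range stated in the overview.

```python
from dataclasses import dataclass

# Duration bounds taken from the overview: 10 seconds to 10 minutes.
MIN_DURATION_S = 10
MAX_DURATION_S = 600


@dataclass(frozen=True)
class SongMetadata:
    """Hypothetical container for the metadata controls (not ACE-Step's real API)."""
    duration_s: float
    bpm: int
    key_scale: str       # e.g. "C major"
    time_signature: str  # e.g. "4/4"

    def __post_init__(self):
        # Reject durations outside the range the overview says is supported.
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            raise ValueError(
                f"duration_s must be within {MIN_DURATION_S}-{MAX_DURATION_S} seconds"
            )
        if self.bpm <= 0:
            raise ValueError("bpm must be positive")
        # A time signature is expected in "beats/unit" form with positive integers.
        beats, _, unit = self.time_signature.partition("/")
        if not (beats.isdigit() and unit.isdigit() and int(beats) > 0 and int(unit) > 0):
            raise ValueError("time_signature must look like '4/4'")


def to_request(meta: SongMetadata, lyrics: str) -> dict:
    """Build a plain dict that could be serialized to JSON (assumed field names)."""
    return {
        "lyrics": lyrics,
        "duration": meta.duration_s,
        "bpm": meta.bpm,
        "key_scale": meta.key_scale,
        "time_signature": meta.time_signature,
    }
```

Validating on construction keeps out-of-range values from reaching a generation call; the real parameter names and accepted ranges should be taken from the project's own documentation.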