ACE-Step 1.5

ACE-Step 1.5 Overview

ACE-Step 1.5 is an open-source music foundation model co-led by ACE Studio and StepFun. Its developers describe it as the most powerful local music generation model and claim it outperforms almost all commercial alternatives. It runs on Mac, AMD, Intel, and CUDA devices with under 4GB of VRAM.

  • Ultra-Fast Generation: Under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090; supports audio from 10 seconds to 10 minutes long.
  • Commercial-Grade Quality: Output quality is claimed to exceed most commercial music models, placing it between Suno v4.5 and Suno v5, with support for 1000+ instruments and styles.
  • Hybrid LM + DiT Architecture: Language model plans song blueprints via Chain-of-Thought while a Diffusion Transformer generates audio with intrinsic reinforcement learning alignment.
  • Versatile Editing: Cover generation, repaint, vocal-to-BGM conversion, track separation, multi-track layering, and reference audio conditioning.
  • Multi-Language Lyrics: Supports 50+ languages with metadata control for duration, BPM, key/scale, and time signature.
  • LoRA Personalization: Train a custom LoRA from just 8 songs in about 1 hour on an RTX 3090, using about 12GB of VRAM, via one-click Gradio training.
  • Cross-Platform Local Run: Gradio UI and REST API on CUDA, Apple Silicon (MLX), AMD ROCm, Intel XPU, and CPU with tier-aware auto GPU configuration.
  • XL 4B Model Series: New acestep-v15-xl-base/sft/turbo DiT models deliver higher audio quality with compatible 0.6B–4B LM planners on Hugging Face and ModelScope.
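
The metadata controls listed above (duration, BPM, key/scale, time signature) can be pictured as a small validated record. The sketch below is illustrative only: `SongMetadata` and its field names are hypothetical and are not part of the ACE-Step API. The only values taken from this overview are the 10-second and 10-minute duration bounds.

```python
from dataclasses import dataclass

MIN_DURATION_S = 10        # lower bound stated in the overview
MAX_DURATION_S = 10 * 60   # upper bound stated in the overview


@dataclass(frozen=True)
class SongMetadata:
    """Hypothetical container for generation metadata (not the ACE-Step API)."""
    duration_s: float
    bpm: int
    key_scale: str        # e.g. "C major"
    time_signature: str   # e.g. "4/4"

    def __post_init__(self):
        # Reject durations outside the range the overview advertises.
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            raise ValueError("duration must be between 10 s and 10 min")
        if self.bpm <= 0:
            raise ValueError("bpm must be positive")
        # A time signature is expected as "<beats>/<unit>", both positive integers.
        beats, sep, unit = self.time_signature.partition("/")
        if not (sep and beats.isdigit() and unit.isdigit()):
            raise ValueError("time signature must look like '4/4'")
        if int(beats) == 0 or int(unit) == 0:
            raise ValueError("time signature parts must be non-zero")
```

For example, `SongMetadata(duration_s=180, bpm=120, key_scale="C major", time_signature="4/4")` is accepted, while a 5-second duration raises `ValueError`. How the real Gradio UI or REST API names and validates these fields is not covered here.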