GPU memory swap, also referred to as model hot-swapping, allows multiple models to share the same GPUs even if their combined memory requirements exceed the available GPU memory. This approach ...
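To make the idea concrete, here is a minimal sketch that simulates the bookkeeping only. No real GPU calls are made. The class `SwapSimulator`, its methods, the model names, and the sizes are all hypothetical, and the least-recently-used eviction policy is an assumption chosen for illustration; the text above does not say how a real implementation picks which model to swap out.

```python
from collections import OrderedDict


class SwapSimulator:
    """Toy bookkeeping model of GPU memory swap (hypothetical, no real GPU calls)."""

    def __init__(self, gpu_capacity_gb):
        self.gpu_capacity_gb = gpu_capacity_gb
        # Models currently in GPU memory, ordered least recently used first.
        self.resident = OrderedDict()
        # Models currently held in host (CPU) memory.
        self.swapped_out = {}

    def used_gb(self):
        """Total GPU memory occupied by resident models."""
        return sum(self.resident.values())

    def register(self, name, size_gb):
        """Register a model; it starts out in host memory."""
        if size_gb > self.gpu_capacity_gb:
            raise ValueError(f"{name} cannot fit on the GPU even when it is empty")
        self.swapped_out[name] = size_gb

    def request(self, name):
        """Make `name` resident on the GPU and return the models evicted to do so."""
        if name in self.resident:
            # Already on the GPU: just mark it as most recently used.
            self.resident.move_to_end(name)
            return []
        size_gb = self.swapped_out.pop(name)  # raises KeyError for unknown models
        evicted = []
        # Assumed policy: evict least recently used models until the new one fits.
        while self.used_gb() + size_gb > self.gpu_capacity_gb:
            victim, victim_size = self.resident.popitem(last=False)
            self.swapped_out[victim] = victim_size
            evicted.append(victim)
        self.resident[name] = size_gb
        return evicted


if __name__ == "__main__":
    # Three models need 115 GB in total but share one 80 GB GPU.
    sim = SwapSimulator(gpu_capacity_gb=80)
    sim.register("model_a", 50)
    sim.register("model_b", 45)
    sim.register("model_c", 20)
    for name in ["model_a", "model_b", "model_c", "model_a"]:
        evicted = sim.request(name)
        print(name, "loaded; evicted:", evicted, "resident:", list(sim.resident))
```

In this sketch the combined footprint of the registered models exceeds the simulated capacity, yet every request is served because a model that is not currently needed is moved to host memory first. A real system would also have to pay the transfer cost of each swap, which this simulation ignores.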