Social Wire
Tweeted
Oct 15, 2025
EXO Labs
@exolabs Full blog post and more details about EXO 1.0: https://t.co/2ZdPUJe4iR Thanks @NVIDIA for early access to two DGX Sparks. #SparkSomethingBig
Tweeted
Oct 15, 2025
EXO Labs
@exolabs But the KV cache is created for each transformer layer. By sending each layer’s KV cache after it’s computed, we overlap communication with computation. We stream the KV cache and hide the network delay. We achieve a 4x speedup in prefill & 3x in decode, with 0 network delay.
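The layer-by-layer overlap described in the tweet can be sketched with a background sender thread: as soon as one layer's KV block is computed, it is queued for transmission while the next layer runs. This is a toy stand-in for EXO's actual pipeline; the layer count and the `compute_layer`/`send_kv` callables are illustrative assumptions.

```python
import queue
import threading

def run_prefill_with_streaming(num_layers, compute_layer, send_kv):
    """Overlap per-layer KV computation with network sends.

    compute_layer(i) -> KV block for layer i
    send_kv(i, kv)   -> transmits one layer's KV cache
    """
    q = queue.Queue()

    def sender():
        while True:
            item = q.get()
            if item is None:          # sentinel: all layers queued
                break
            layer_idx, kv = item
            send_kv(layer_idx, kv)    # network send runs concurrently

    t = threading.Thread(target=sender)
    t.start()
    for i in range(num_layers):
        kv = compute_layer(i)         # compute layer i's KV cache
        q.put((i, kv))                # hand off immediately; don't wait
    q.put(None)
    t.join()

sent = []
run_prefill_with_streaming(4, lambda i: f"kv{i}", lambda i, kv: sent.append((i, kv)))
print(sent)  # all four layers streamed, in order
```

Because the sender drains a FIFO queue, layers arrive at the decode device in order while compute for later layers proceeds in parallel.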
Tweeted
Oct 15, 2025
EXO Labs
@exolabs We can run these two stages on different devices: Prefill: DGX Spark (high compute device, 4x compute) Decode: M3 Ultra (high memory-bandwidth device, 3x memory-bandwidth) However, now we need to transfer the KV cache over the network (10GbE). This introduces a delay.
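One way to see why the naive hand-off hurts: the KV cache for a long prompt can run to several gigabytes, and 10GbE moves roughly 1.25 GB/s. A rough estimate (the KV-cache size here is an illustrative assumption, not a measurement from the post):

```python
# Rough estimate of the KV-cache hand-off delay over 10GbE.
LINK_GBPS = 10                             # 10GbE link
LINK_BYTES_PER_SEC = LINK_GBPS * 1e9 / 8   # ~1.25 GB/s usable at best

kv_cache_bytes = 5e9                       # e.g. ~5 GB KV cache for a long prompt
delay_sec = kv_cache_bytes / LINK_BYTES_PER_SEC
print(round(delay_sec, 1))                 # ~4 s if sent as one blocking transfer
```

A multi-second stall between prefill finishing and decode starting is exactly the delay the per-layer streaming in the next tweet hides.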
Tweeted
Oct 15, 2025
EXO Labs
@exolabs LLM inference consists of a prefill and decode stage. Prefill processes the prompt, building a KV cache. It’s compute-bound so gets faster with more FLOPS. Decode reads the KV cache and generates tokens one by one. It’s memory-bound so gets faster with more memory bandwidth.
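A back-of-envelope sketch of why decode is memory-bound: each generated token requires reading roughly all model weights once, so decode throughput is bounded by memory bandwidth over model size. The model size below is an illustrative assumption, not a figure from the thread.

```python
# Decode tokens/sec is roughly bounded by
# memory_bandwidth / bytes_read_per_token (dominated by the weights).
def decode_tokens_per_sec(bandwidth_gb_per_s, model_size_gb):
    """Upper bound on decode throughput for a memory-bound workload."""
    return bandwidth_gb_per_s / model_size_gb

# Illustrative: a ~70 GB set of weights (e.g. a 70B model at 8-bit).
m3_ultra = decode_tokens_per_sec(819, 70)   # ~11.7 tok/s
dgx_spark = decode_tokens_per_sec(273, 70)  # ~3.9 tok/s
print(round(m3_ultra, 1), round(dgx_spark, 1))
```

The same bandwidth gap that makes the M3 Ultra the better decode device makes the FLOPS-rich DGX Spark the better prefill device.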
Tweeted
Oct 15, 2025
EXO Labs
@exolabs Clustering NVIDIA DGX Spark + M3 Ultra Mac Studio for 4x faster LLM inference. DGX Spark: 128GB @ 273GB/s, 100 TFLOPS (fp16), $3,999 M3 Ultra: 256GB @ 819GB/s, 26 TFLOPS (fp16), $5,599 The DGX Spark has 3x less memory bandwidth than the M3 Ultra but 4x more FLOPS. By running
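A quick check of the ratios quoted above, computed directly from the listed specs:

```python
# Ratios from the quoted spec sheet: the M3 Ultra has ~3x the memory
# bandwidth of the DGX Spark, which in turn has ~4x the fp16 FLOPS.
bandwidth_ratio = 819 / 273   # M3 Ultra vs DGX Spark, GB/s
flops_ratio = 100 / 26        # DGX Spark vs M3 Ultra, TFLOPS fp16
print(round(bandwidth_ratio, 1), round(flops_ratio, 1))
```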
Tweeted
Sep 20, 2025
EXO Labs
@exolabs EXO covers all UK visa costs and relocation costs. We have a 100% success rate, and it’s usually a fast process (~1 month) x.com/alexocheema/st…
Tweeted
Sep 2, 2025
EXO Labs
@exolabs A deep dive on KPOP at @Cohere_Labs ML efficiency group. KPOP is an optimizer designed specifically for the hardware constraints of Apple Silicon. We're doubling the number of Apple Silicon macs that can train together coherently every 2 months. In 12 months we'll have rebuilt x.com/alexocheema/st…
Tweeted
Aug 30, 2025
EXO Labs
@exolabs EXO Gym: simulate large-scale distributed training experiments on a single MacBook x.com/MattBeton/stat…
Tweeted
Aug 22, 2025
EXO Labs
@exolabs run massive models, add macs incrementally for linear scaling (no limit) x.com/MattBeton/stat…