Apple’s move validates what’s been clear to many in this field: spatial content is rapidly becoming a mainstream expectation. Converting 2D content into immersive spatial scenes is a natural evolution in how users interact with digital media: traditional flat images become gateways into vivid three-dimensional worlds, significantly deepening engagement. Advances in AI and computer vision are powering this shift, enabling the reconstruction of 3D geometry from a single 2D photo. These systems estimate depth (more precisely, disparity), layer the scene, and synthesize new viewpoints, all with increasing realism. Importantly, these AI-driven conversions are no longer confined to heavy server-side compute. Modern mobile SoCs, equipped with neural engines and dedicated DSPs, can now perform depth estimation and rendering in real time directly on-device.
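To make the pipeline concrete, here is a minimal sketch of the "estimate disparity, then generate a new viewpoint" step using depth-image-based rendering (DIBR): each pixel is shifted horizontally in proportion to its disparity, with a z-buffer so nearer pixels occlude farther ones. The function name `synthesize_view` and its parameters are illustrative, not from any specific framework; a production system would add hole filling and use a learned depth estimator to produce the disparity map.

```python
import numpy as np

def synthesize_view(image, disparity, baseline_shift):
    """Forward-warp pixels horizontally by scaled disparity (DIBR sketch).

    image:          (H, W, 3) uint8 source frame
    disparity:      (H, W) float; larger values mean closer to the camera
    baseline_shift: virtual-camera offset, scales the per-pixel shift
    """
    h, w, _ = image.shape
    out = np.zeros_like(image)                 # unfilled pixels stay black (holes)
    depth_buf = np.full((h, w), -np.inf)       # z-buffer: nearest disparity wins
    xs = np.arange(w)
    for y in range(h):
        # Target column for every pixel in this row
        new_x = np.round(xs + baseline_shift * disparity[y]).astype(int)
        valid = (new_x >= 0) & (new_x < w)     # drop pixels shifted off-frame
        for x in xs[valid]:
            tx = new_x[x]
            if disparity[y, x] > depth_buf[y, tx]:
                depth_buf[y, tx] = disparity[y, x]
                out[y, tx] = image[y, x]
    return out
```

Shifting by disparity rather than metric depth is deliberate: disparity is what monocular networks typically predict, and the horizontal pixel shift between two viewpoints is directly proportional to it. The holes left behind occluded regions are what layered representations (the "layer the scene" step above) are designed to fill.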