Striking a balance between minimizing bandwidth consumption and maintaining high visual quality stands as the paramount objective in volumetric content delivery. However, achieving this ambitious target is a substantial challenge, especially for mobile devices with constrained computational resources, given the voluminous amount of 3D data to be streamed, strict latency requirements, and high computational load. Inspired by the advantages offered by neural radiance fields (NeRF), we propose, for the first time, to deliver volumetric videos by utilizing neural-based content representations. We delve deep into potential challenges and explore viable solutions for both video-on-demand (VOD) and live video streaming services, in terms of the end-to-end pipeline, real-time and high-quality streaming, rate adaptation, and viewport adaptation. Our preliminary results lend credence to the feasibility of our research proposition, offering a promising starting point for further investigation.