Visual impairment has a serious impact on individuals and society. Because visually impaired people often develop more acute hearing and touch, Sensory Substitution Devices (SSDs), which translate visual information into audio or tactile signals, offer a viable way to improve their quality of life, work, and study. In this paper, we propose a wearable, vision-to-audio sensory substitution system based on scene-perception deep learning that helps visually impaired users recognize and locate common objects in their environment. The system consists of a wireless camera module, a Bluetooth speech-feedback module with a microphone, and an Android mobile phone running a customized application. The camera module captures images of the scene and sends them to the application on the Android phone. The Bluetooth speech-feedback module relays the user's speech commands to the application and broadcasts spoken guidance back to the user. The Android application loads the speech recognition and object detection models. Experiments show that the system provides an effective way to help visually impaired people recognize and locate objects.
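To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the step that turns object-detection results into spoken guidance. The detection format, labels, and the distance heuristic are all assumptions for illustration.

```python
# Hedged sketch: convert detector output (label + bounding box) into a
# guidance phrase that a text-to-speech engine could broadcast to the user.
def guidance_phrase(label, box, image_width=640, image_height=480):
    """Turn one detection into a spoken direction hint.
    box = (xmin, ymin, xmax, ymax) in pixels; sizes are assumed defaults."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2
    # Horizontal position relative to the image: left, center, or right third.
    if cx < image_width / 3:
        direction = "on your left"
    elif cx > 2 * image_width / 3:
        direction = "on your right"
    else:
        direction = "ahead"
    # Crude proximity cue: a taller bounding box usually means a closer object.
    height_ratio = (ymax - ymin) / image_height
    distance = "near" if height_ratio > 0.5 else "far"
    return f"{label} {direction}, {distance}"

# Example detections as (label, box) pairs; in the real system these would
# come from the deep-learning object-detection model running on the phone.
detections = [("cup", (40, 100, 160, 300)), ("chair", (400, 50, 620, 460))]
for label, box in detections:
    print(guidance_phrase(label, box))
# → cup on your left, far
# → chair on your right, near
```

In the actual system, each phrase would be handed to a text-to-speech engine and played through the Bluetooth speech-feedback module rather than printed.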