Many AI algorithms are now deployed on edge devices because edge computing reduces latency, saves network bandwidth, and protects data privacy. However, whether such low-power, low-cost devices can actually run demanding AI algorithms remains an important challenge. This paper therefore analyzes the performance of optimization techniques by running YOLOv3 on a typical GPU-based, low-cost edge device, the NVIDIA Jetson Nano. YOLOv3 is a representative object detection algorithm that is widely used as a benchmark for AI workloads. We compare the latency, memory consumption, and power consumption of three deep learning frameworks: TensorFlow, PyTorch, and TensorRT. We then push performance to its limits on TensorRT with multiple optimization techniques, including model quantization, model parallelization, and image scaling. These optimizations increase the running speed of YOLOv3 on the Jetson Nano from 3.9 FPS to 13.1 FPS, demonstrating that a resource-limited edge device can run computationally demanding AI applications in real time. Moreover, we summarize nine observations and five insights to guide the selection and design of optimization techniques, and verify that these rules generalize to the NVIDIA Jetson Xavier NX. We also provide a series of suggestions to help developers choose appropriate methods for deploying AI algorithms on edge devices.
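
To illustrate the kind of TensorRT-based quantization discussed above, the following is a minimal sketch (not the paper's actual pipeline) that builds an FP16 engine from an ONNX export of YOLOv3 using the TensorRT Python API. The file name yolov3.onnx, the workspace size, and the FP16 choice are assumptions for illustration; exact API calls vary across TensorRT versions.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path="yolov3.onnx"):
    """Parse an ONNX model and build a TensorRT engine with FP16 enabled."""
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network, as required for ONNX models.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28  # 256 MiB, kept small for Jetson Nano's memory budget
    if builder.platform_has_fast_fp16:
        # Half-precision quantization: typically a large speedup on Jetson-class GPUs.
        config.set_flag(trt.BuilderFlag.FP16)

    return builder.build_engine(network, config)
```

In a deployment script, the returned engine would be serialized to disk once and reloaded at startup, so the relatively slow engine build does not affect inference latency on the device.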