Navigation from one place to another involves estimating location and orientation, and navigation and location-based services are essential to our daily lives. Prevalent navigation techniques integrate global navigation satellite systems (GNSS) with inertial navigation systems (INS), but this integration does not provide reliable and accurate solutions when GNSS signals are degraded or when operating indoors. To remedy this deficiency, simultaneous localization and mapping (SLAM) approaches have been developed for navigation purposes. However, conventional SLAM uses data from a single device or session, which limits the available information and makes it prone to accumulating errors. By using data from multiple sessions or devices, collaborative SLAM holds promise for reusing mapping information and enhancing localization accuracy. Such techniques gain further significance given the wide availability of cloud computing resources and handheld mobile devices. With crowdsourced data from these consumer devices, collaborative SLAM is key to many location-based services, e.g., navigating a building for a group of people or robots. However, available solutions and the scope of research in this field remain limited. Therefore, this work focuses on solving the collaborative SLAM problem with crowdsourced data. The problem is approached from three aspects: collaborative SLAM with crowdsourced visual data, calibration of the camera-IMU (inertial measurement unit) sensor system found on mobile devices, and collaborative SLAM with crowdsourced visual and inertial data. For the first aspect, a collaborative SLAM framework is proposed based on a client-server architecture. In this framework, independent clients estimate device motion from the visual data captured by mobile devices.
The output from these clients is processed by the server to collaboratively map the environment and to refine the estimates of device motion, which are transmitted back to the clients to update their own estimates. The proposed framework achieves reuse of existing maps, robust loop closure, and accurate real-time collaborative mapping. These properties of the framework were validated with experiments on a benchmark dataset and on visual data crowdsourced by smartphones. For the second aspect, a calibration approach based on the extended Kalman filter (EKF) is developed for the camera-IMU system of a mobile device. Aimed at target-free, on-the-fly calibration, this approach estimates the intrinsic parameters of both the camera and the IMU, as well as the temporal and spatial relations between the two sensors. Simulations and experiments with a benchmark dataset and smartphone data validated the calibration performance. The calibration approach provides essential parameters for integrating visual and inertial data, which is the focus of the third aspect. There, the collaborative SLAM framework is extended to work with both visual and inertial data. While retaining features such as reuse of existing maps and real-time collaborative mapping, the extended framework makes the metric scale largely observable thanks to the inertial data. These features of the framework were validated with experiments on visual and inertial data collected by smartphones. In summary, these research efforts show that accurate and effective collaborative SLAM is achievable with crowdsourced data at a level that has not been demonstrated before. This work represents a step towards location-based services that harness the power of crowdsourcing, such as crowd-based gaming and emergency response.