Identifying a user's transport mode from mobile phone sensors is a key component of Intelligent Transportation Systems. However, collecting labels or annotations while users switch between multiple transport modes across different journeys is tedious. Transport mode identification that works across cities and countries is also a prime need. This paper proposes a method for generalizable journey mode detection that uses no annotations during training, by exploiting unsupervised representation learning. Our method uses the commonalities and diversities across the different journeys of various users to identify user-specific journey segments from either the same or a different city or country. The method is also privacy-preserving, as it does not use GPS information. We propose a multistage unsupervised learning mechanism that forms clusters on the learned latent representation using the best choice of distance measure. We also propose an Invariant Auto-Encoded Compact Sequence, a learned compact representation that captures the encoded latent features common to diverse users and cities. We show through extensive experimental analysis that our method generalizes across varying users and cities using IMU accelerometer sensors. We use real-life, publicly available transportation datasets captured in two cities in different countries, Sussex (United Kingdom) and Bologna (Italy), as well as in-house data collected from three Indian cities.