Knowledge of protein-protein interactions (PPI) is essential for studying protein function and understanding biological processes. Most previous work on PPI in the BioNLP domain has relied solely on textual data. As other kinds of information about proteins (structure, sequence, gene ontology) have become available, researchers have begun to combine them with textual data to predict PPI more accurately. This paper reports the first attempt to integrate gene ontology (GO)-based information with features extracted from two other protein modalities, namely 3D structure and existing textual information. Two popular text-based benchmark PPI corpora, BioInfer and HPRD50, are first extended with structure and GO-based information. Deep learning-based techniques are then used to extract features from the three modalities, and these features are concatenated for the final prediction of protein interaction. Experiments on the resulting multi-modal datasets show that the proposed deep multi-modal framework outperforms the baselines (uni-modal, bi-modal and multi-modal) as well as state-of-the-art methods.