There is a trend to build wind turbines in large wind farms and, in the near future, to operate each farm as an integrated power production plant. Predictability of individual turbine behaviour is key to such a strategy. To minimize the impact on the balance of the electricity grid, each turbine must deliver stable electricity production. Hence, all turbines should be available for operation when needed, which puts significant constraints on owner-operators. Failures of turbine subcomponents should be avoided, which requires planning all necessary maintenance actions in advance so that they can be performed during periods of low wind and low electricity demand. It is therefore necessary to anticipate upcoming component failures, so that spare parts can be ordered and are available to maintenance personnel during the optimal weather and market windows. Obtaining the insights needed to predict component failure requires an integrated, clean dataset spanning all turbines of the wind farm over a sufficiently long period of time. This paper describes the requirements and challenges related to such a dataset, based on experience acquired during years of monitoring offshore wind farms. Furthermore, it suggests a big-data-based approach, an integrated NoSQL data-storage and data-analytics platform, to tackle these challenges. In addition, a failure prognosis approach using the integrated dataset is proposed to detect failure initiation in the bearings of gearboxes and generators, which are vital parts of wind turbines.