Advances in remote sensing technology now provide abundant spatial and contextual information for object and area detection in satellite or aerial imagery, which facilitates the subsequent automatic analysis and interpretation of optical remote sensing images (RSIs). Most existing object and area detection approaches suffer from two limitations. First, the feature representations they use are not powerful enough to capture the spatial and structural patterns of objects and background regions. Second, the supervised learning techniques they adopt require large amounts of training data with manually annotated object bounding boxes, while the annotation process is generally expensive and sometimes unreliable. We propose an end-to-end framework for the dense, pixel-wise classification of satellite imagery with convolutional neural networks (CNNs). In our framework, CNNs are trained directly to produce classification maps from the input images. We first devise a fully convolutional architecture and demonstrate its relevance to the dense classification problem. We then address the issue of imperfect training data through a two-step training approach: CNNs are first initialized using a large amount of possibly inaccurate reference data, and then refined on a small amount of accurately labeled data. To complete our framework, we design a two-stage module that alleviates the common trade-off between recognition and precise localization. A series of experiments in MATLAB shows that our networks take a large amount of context into account to provide fine-grained classification maps.
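The two-step training approach described above can be illustrated with a minimal sketch. This is not the paper's actual CNN: as a stand-in, a per-pixel logistic regression is pre-trained with plain gradient descent on a large set of noisy labels and then refined on a small, accurately labeled set. All data, the noise rate, and the learning rates are assumptions chosen only to keep the example self-contained.

```python
# Minimal sketch of two-step training: pre-train on large noisy reference
# data, then refine on a small accurately labeled set. A per-pixel logistic
# regression stands in for the CNN (assumption, for a dependency-free demo).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, w, b, lr, epochs):
    """Gradient descent on the binary cross-entropy loss."""
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic "pixels": 2 features each; true label = 1 if the features sum > 0.
X_large = rng.normal(size=(5000, 2))
y_true = (X_large.sum(axis=1) > 0).astype(float)

# Step 1 data: large but inaccurate reference labels (20% randomly flipped).
flip = rng.random(5000) < 0.2
y_noisy = np.where(flip, 1.0 - y_true, y_true)

# Step 2 data: small but accurately labeled set.
X_small = rng.normal(size=(200, 2))
y_small = (X_small.sum(axis=1) > 0).astype(float)

w, b = np.zeros(2), 0.0
w, b = train(X_large, y_noisy, w, b, lr=0.5, epochs=200)  # pre-train (noisy)
w, b = train(X_small, y_small, w, b, lr=0.1, epochs=200)  # refine (accurate)

X_test = rng.normal(size=(1000, 2))
y_test = (X_test.sum(axis=1) > 0).astype(float)
acc = np.mean((sigmoid(X_test @ w + b) > 0.5) == y_test)
print(f"test accuracy after two-step training: {acc:.2f}")
```

The pre-training step fixes the rough decision boundary despite label noise; the refinement step with a smaller learning rate corrects it without discarding what was already learned, mirroring the initialize-then-fine-tune strategy of the framework.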