The development of fast image processing architectures for smart camera systems is an important task. Many current applications, such as robot control or advanced driver assistance systems, require fast image acquisition and processing on embedded devices. These requirements have led to specialized hardware architectures for image processing. However, a general approach to designing an optimal image processing architecture remains an open problem. In most cases the architectures are application-specific, which results in high throughput but also limits them to a single task. Other solutions use high-performance DSPs, which are very flexible but fall short in performance and/or power consumption compared with application-specific implementations. Hence, we designed a framework called FAUPU (FAU Processing Unit), which can generate an architecture for a specific range of image processing applications and can furthermore meet real-time and power constraints. As a result, a field of programmable processing elements (PEs) is instantiated. The field's structure and size are highly generic and adapt to the given requirements (e.g. available hardware resources, timing constraints). The architecture of the PEs is generic as well and strongly depends on the required types and number of image processing operations. Due to optimized data access and extensive use of parallel processing, real-time execution of complex image processing operations is possible. In addition to an executable simulation in SystemC, our framework is able to generate the resulting architecture as synthesizable VHDL code, which can be implemented on arbitrary FPGA platforms as well as in integrated circuit (IC) designs.