Cache coherency and memory consistency are of the most decisive and challenging issues in the design of shared-memory multi-core systems that influence both the correctness and performance of parallel programs. In this book, we identify and analyze the problem of designing a coherent/consistent memory subsystem in general and then focus on FPGA-based multi-core embedded systems containing general purpose CPU's and dedicated hardware accelerators. We narrow down the range of the problem by targeting only the stream-based applications and developing dedicated application-specific solutions. A flexible Windowed-FIFO communication pattern is proposed to be used by the parallel programs being run on the multi-core system. The software APIs for the FPGA platform are implemented and tested, a customized streaming cache memory is designed, implemented and tested based on the proposed communication pattern and in the end, example embedded systems are developed and tested on the FPGA platform to prove the correct functionality of the APIs, the cache memory and the coherent data communication between the cores.