Hello everyone,
I am Kadir, a physics master’s student. For my master’s thesis, I am working on a heterogeneous computing system consisting of a CPU and FPGAs connected over PCIe 4.0. I am using an AMD Xilinx Alveo U280. I need to find the most efficient way to work with FPGAs and exploit the full potential of the device using High-Level Synthesis (HLS) with C++.
I am trying to accelerate a prototype track reconstruction algorithm. So far I have familiarised myself a bit with the tools and environment (Vitis Analyzer, Vivado, etc.), and I mostly work from the terminal.
I am following the flow SW-EMU (software emulation) → HW-EMU (hardware emulation) → running on hardware. So far, in hardware emulation, I have managed to send the data and the kernel from the host (CPU) to the device (FPGA), run the calculations on the device, and collect the results back on the host. To get better performance, I used AXI interfaces and streamed data between the functions (READ, COMPUTE, WRITE) inside the kernel so that they execute in parallel.
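For readers who want to see the shape of the kernel being described, here is a minimal sketch of the READ, COMPUTE, WRITE pattern. It is not my actual thesis code: the names `track_kernel`, `read_stage`, `compute_stage` and `write_stage` are made up, the compute step is a placeholder, and `Stream` is a plain C++ stand-in for `hls::stream` so the sketch compiles without the Vitis headers. The HLS pragmas are left as comments to show where they would go.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Stand-in for hls::stream<T> (from "hls_stream.h" in Vitis HLS), so that
// this sketch builds with an ordinary C++ compiler. Assumption: only
// write(), read() and empty() are needed here.
template <typename T>
class Stream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());  // a real hls::stream would block instead
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// READ: copy from global memory (AXI master port) into a stream.
static void read_stage(const float* in, Stream<float>& s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // #pragma HLS PIPELINE II=1
        s.write(in[i]);
    }
}

// COMPUTE: placeholder operation (doubles each value); the real
// track reconstruction step would go here.
static void compute_stage(Stream<float>& in, Stream<float>& out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // #pragma HLS PIPELINE II=1
        out.write(in.read() * 2.0f);
    }
}

// WRITE: drain the result stream back to global memory.
static void write_stage(Stream<float>& s, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // #pragma HLS PIPELINE II=1
        out[i] = s.read();
    }
}

// Top-level kernel (hypothetical name). Under Vitis HLS the commented
// pragmas would be active and the three stages would overlap in time.
void track_kernel(const float* in, float* out, std::size_t n) {
    // #pragma HLS INTERFACE m_axi port=in  bundle=gmem0
    // #pragma HLS INTERFACE m_axi port=out bundle=gmem1
    // #pragma HLS DATAFLOW
    Stream<float> s_in;
    Stream<float> s_out;
    read_stage(in, s_in, n);
    compute_stage(s_in, s_out, n);
    write_stage(s_out, out, n);
}
```

Compiled as plain C++, the three stages simply run one after another; the overlap between them only appears once the DATAFLOW pragma is honoured by the HLS tool.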
What is the fastest and most efficient way to transfer data for continuous dataflow between the CPU and the FPGA? Does using structs with multiple data members (such as struct position { float x, y, z; };) make performance worse, or should we use only 1-dimensional arrays to get the best performance? What else needs to be considered (memory, port bit width, etc.)?
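To make the struct part of the question concrete, the sketch below checks how a three-float struct lines up with a wide memory word on the host side. The 512-bit bus width is an assumption for illustration, not something I have confirmed for my design, and `PositionPadded` and the helper names are hypothetical. How the HLS tool actually packs a struct on the port may differ from the host layout shown here.

```cpp
#include <cassert>
#include <cstddef>

// The struct from the question: three floats, 12 bytes on typical ABIs.
struct Position {
    float x, y, z;
};

// Hypothetical padded variant: alignas(16) rounds the size up to 16 bytes.
struct alignas(16) PositionPadded {
    float x, y, z;
};

// Assumed AXI data width of 512 bits (64 bytes) per memory word.
constexpr std::size_t kBusBytes = 512 / 8;

// Number of whole elements of T that fit into one bus word.
template <typename T>
constexpr std::size_t whole_elements_per_word() {
    return kBusBytes / sizeof(T);
}

// True if element `index` of a tightly packed array of T crosses a
// bus-word boundary, i.e. needs two memory words to be read completely.
template <typename T>
bool straddles_word(std::size_t index) {
    assert(sizeof(T) <= kBusBytes);
    const std::size_t first = index * sizeof(T);
    const std::size_t last = first + sizeof(T) - 1;
    return first / kBusBytes != last / kBusBytes;
}
```

With these definitions, a packed array of `Position` has elements that span two 64-byte words (for example index 5, which occupies bytes 60 to 71), while `PositionPadded` elements never do, at the cost of 4 unused bytes per element. Whether that trade-off matters in practice is exactly what I am unsure about.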