MicroZed Chronicles: ALVEO

Adam Taylor
5 years ago

One of the great things about programmable logic is its ability to free us from the sequential world that limits software performance.

Recent years have seen the introduction and adoption of tools such as High-Level Synthesis (HLS), SDAccel and SDSoC. These tools enable us to use higher-level languages such as C, C++ and OpenCL to develop programmable logic-based solutions. This not only frees us from the sequential software world, but also opens up programmable logic to new users beyond the traditional logic designer.

The Xilinx ALVEO card, announced at XDF last year, is designed to allow data center applications to benefit from programmable logic acceleration by using the SDAccel toolchain.

From the start, ALVEO has been developed for the acceleration of data center applications, either in the cloud (e.g. Nimbix Cloud) or deployed on premises. ALVEO and its supporting frameworks have been designed to enable acceleration of applications such as machine learning, video processing and data analytics.

A range of ALVEO cards is available, from the U50 to the U200 and U280. Each one offers different logic resources, memory capacity and memory types, e.g. DDR and HBM.

Internally, each ALVEO card contains an FPGA and DDR / HBM memory, and connects to the host over a PCIe Gen3 x16 interface.

It is through this PCIe interface that we will deploy our SDAccel application, but how do we do that?

The FPGA on the ALVEO card contains a dynamically reconfigurable region, which is configured using partial reconfiguration to implement the kernel. This dynamic region connects to the PCIe endpoint and to other interfaces, such as the DDR / HBM and QSFP interfaces, using AXI interconnects that are contained within the non-dynamic region of the FPGA.

Developing applications for ALVEO and its dynamic region uses the OpenCL framework.

OpenCL is meant to support heterogeneous platforms that consist of a host CPU (typically x86 based) and one or more compute devices, which can be GPUs, DSPs, FPGAs or specialist hardware. The functions accelerated on these devices are called kernels. OpenCL allows the development of portable applications that can be deployed across a range of different devices. In the OpenCL flow, the ALVEO card is an OpenCL compute device on which our kernels run.

The host program is often developed in C or C++ with relevant support from OpenCL APIs to manage the kernels.

The kernel is developed using the OpenCL C language, which is based on C99, with some restrictions to support portability across different device types. These include the removal of support for stdio.h and stdlib.h, while scalar types, e.g. char, float, etc., are defined at fixed sizes, again to increase portability.
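To make the fixed-size point concrete: OpenCL C defines char as 8 bits, short as 16, int as 32, long as 64 and float as 32 bits, whatever the device. A C++ host cannot assume its own `long` matches, so host code normally uses fixed-width types for anything it hands to a kernel. The sketch below is only an illustration. The `k_*` aliases and `payload_bytes` are hypothetical names chosen so that it compiles without the OpenCL headers, which provide `cl_char`, `cl_int`, `cl_long` and similar types for the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side aliases mirroring the sizes OpenCL C fixes for
// kernel-side scalars. With the real OpenCL headers the equivalent types
// are cl_char, cl_short, cl_int, cl_long and cl_float.
using k_char  = std::int8_t;   // OpenCL C char  :  8 bits
using k_short = std::int16_t;  // OpenCL C short : 16 bits
using k_int   = std::int32_t;  // OpenCL C int   : 32 bits
using k_long  = std::int64_t;  // OpenCL C long  : 64 bits
using k_float = float;         // OpenCL C float : 32 bits

// Fail at compile time if the host's float is not the 32-bit type the kernel expects.
static_assert(sizeof(k_float) == 4, "host float must be 32 bits to match OpenCL C float");

// Hypothetical helper: bytes the host must transfer for n float samples.
inline std::size_t payload_bytes(k_int n) {
    assert(n >= 0);
    return static_cast<std::size_t>(n) * sizeof(k_float);
}
```

Sizing buffers from these aliases, rather than from the host's native `int` or `long`, keeps the host and kernel views of the data in agreement when the host compiler or operating system changes.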

As such, OpenCL introduces both platform and execution models:

Platform Model

  • Ability to define the representation of any platform.
  • Contains a single host and one or more OpenCL compute devices.

Execution Model

  • Host program — manages the application using OpenCL APIs.
  • Kernels — run on OpenCL compute devices and accelerate functions.
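The split between host program and kernel can be sketched in plain C++. This is a CPU-only emulation for illustration, not SDAccel or OpenCL code: `vadd_kernel` and `host_run_vadd` are hypothetical names, and the loop stands in for the device running many kernel instances in parallel. The comments name the real OpenCL API calls that each step would roughly correspond to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Kernel": the function that would be accelerated. In OpenCL C it would be
// declared __kernel and would read its index from get_global_id(0); here it
// is an ordinary C++ function so the sketch runs on any CPU.
void vadd_kernel(std::size_t gid, const float* a, const float* b, float* out) {
    out[gid] = a[gid] + b[gid];
}

// "Host program": prepares the data, launches one kernel instance per element
// and collects the result. On a real device these steps would map to OpenCL
// API calls such as clCreateBuffer, clEnqueueWriteBuffer, clSetKernelArg,
// clEnqueueNDRangeKernel and clEnqueueReadBuffer.
std::vector<float> host_run_vadd(const std::vector<float>& a,
                                 const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    // Emulated launch: the device would execute these instances in parallel.
    for (std::size_t gid = 0; gid < a.size(); ++gid) {
        vadd_kernel(gid, a.data(), b.data(), out.data());
    }
    return out;
}
```

Calling `host_run_vadd({1.0f, 2.0f}, {10.0f, 20.0f})` returns a vector holding the element-wise sums. The point of the sketch is the division of labour: the host owns the data and the control flow, while the kernel contains only the computation to be accelerated.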

When it comes to compiling the application, the host application uses a standard compiler such as G++ or GCC, whereas the compiler for the OpenCL kernel is vendor specific.

In the Xilinx development flow using SDAccel, the host application is compiled by XCPP and the kernel by XOCC.

Transferring information between the host and kernels uses a memory model with five different regions:

  • Host memory: accessible only to the host.
  • Global memory: accessible to both the host and the kernel. This is the main medium for transferring data between host and kernel.
  • Constant global memory: accessible to the host and the kernel. However, only the host has read / write access; for kernels this region is read only.
  • Local memory: used by the kernel for computation and storage, not directly accessible to the host.
  • Private memory: used by a task within a kernel; other tasks cannot access this memory area. Again, there is no direct host access.
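The access rules in the list above can be written down as a small lookup table, which makes it easy to see which regions can carry data in each direction. This is a sketch of the rules as described here, not part of any OpenCL API: `Region`, `Access`, `access_for` and the two helper functions are hypothetical names. In OpenCL C kernel code, the four device-side regions correspond to the `__global`, `__constant`, `__local` and `__private` address space qualifiers.

```cpp
#include <cassert>

// The five memory regions described in the list above.
enum class Region { Host, Global, Constant, Local, Private };

// Who may read or write a region.
struct Access {
    bool host_read;
    bool host_write;
    bool kernel_read;
    bool kernel_write;
};

// Access rights per region, as stated in the list above.
inline Access access_for(Region r) {
    switch (r) {
        case Region::Host:     return {true,  true,  false, false};
        case Region::Global:   return {true,  true,  true,  true};
        case Region::Constant: return {true,  true,  true,  false};
        case Region::Local:    return {false, false, true,  true};
        case Region::Private:  return {false, false, true,  true};
    }
    return {false, false, false, false};  // not reached for valid enum values
}

// True if the host can write data that a kernel can then read.
inline bool host_to_kernel(Region r) {
    const Access a = access_for(r);
    return a.host_write && a.kernel_read;
}

// True if a kernel can write results that the host can then read.
inline bool kernel_to_host(Region r) {
    const Access a = access_for(r);
    return a.kernel_write && a.host_read;
}
```

Under these rules, global and constant memory can both carry inputs to a kernel, but only global memory can carry results back, which is why it is the main medium for host and kernel data transfer.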

The OpenCL framework therefore lets us create applications on a host and accelerate them on our ALVEO card.

I have just received an ALVEO U200 card, so over the next few weeks I will be building up a rack for it and setting it to work. I will, of course, share my journey!

See My FPGA / SoC Projects: Adam Taylor on Hackster.io

Get the Code: ATaylorCEngFIET (Adam Taylor)

Access the MicroZed Chronicles Archives, with over 300 articles on the FPGA / Zynq / Zynq MPSoC, updated weekly at MicroZed Chronicles.

Adam Taylor
Adam Taylor is an expert in the design and development of embedded systems and FPGAs for several end applications (Space, Defense, Automotive).