Recent years have seen a proliferation of specialized ML accelerators—proposed in both academia (e.g., Gemmini, FEATHER) and industry (e.g., Google TPU, Intel AMX)—that depart significantly from ...
The goal of this tutorial is to show a simple example of how to generate PTX from LLVM IR and how to write the IR itself to access CUDA features. For the sake of demonstration, a language frontend ...
Ring buffers are incredibly useful data structures that allow for data to be written and read continuously without having to worry about where the data is being written to or read from. Although they ...
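To make the idea concrete, here is a minimal sketch of a fixed-capacity ring buffer. It is an illustration only, not the implementation the text goes on to describe: the class name `RingBuffer`, the methods `push` and `pop`, and the choice to reject writes when the buffer is full (rather than overwrite the oldest item) are all assumptions made for this example.

```python
class RingBuffer:
    """Illustrative fixed-capacity FIFO; names and full-buffer policy are assumptions."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [None] * capacity
        self._capacity = capacity
        self._head = 0    # index of the next item to read
        self._count = 0   # number of items currently stored

    def __len__(self):
        return self._count

    def push(self, item):
        """Store item at the write position; return False if the buffer is full."""
        if self._count == self._capacity:
            return False
        # The write position wraps around to the start of the storage.
        tail = (self._head + self._count) % self._capacity
        self._data[tail] = item
        self._count += 1
        return True

    def pop(self):
        """Remove and return the oldest item; raise IndexError if empty."""
        if self._count == 0:
            raise IndexError("pop from empty ring buffer")
        item = self._data[self._head]
        self._data[self._head] = None  # drop the reference to the removed item
        self._head = (self._head + 1) % self._capacity
        self._count -= 1
        return item
```

The caller only ever calls `push` and `pop`; the modulo arithmetic on the read index and the derived write index is what hides where in the underlying storage the data actually lives.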