
Intro

I am demoing the UPMEM Python + C backend for Spiral here. I want Spiral to be a language suitable for future generations of computing devices, and since UPMEM commercialized the first PIM (processing-in-memory) chip, it is the ideal target for a demo. Backends like this one are easy to make. Each takes about half a week to a week, after which one can go from programming in C to programming in a highly expressive functional language. One of Spiral's design goals is to be efficient enough for writing GPU kernels, and that goal has been met. Spiral has the most efficient possible design for a functional language without sacrificing expressivity.

Spiral's intended niche is not any device or platform in particular, but the intersection of them. Spiral is peerlessly well-suited for heterogeneous computing architectures of the future, and trivializes the data transfer between different pieces of hardware in the system.

I hope to get support for this kind of work, so that I can demonstrate my claims on different classes of hardware. I am also interested in working in the embedded and AI accelerator space myself. I am not interested in working with low level languages like C, but it would be fun to program novel hardware in a high level functional language that, despite its abstraction capabilities, is as efficient as C. If you do significant work on hardware that only has low level languages available for it, and would like to hugely improve your productivity by moving to a higher level language, please get in touch with me.

Main Points Of The Example

Example

You can find the full code here.

open inv
open upmem
open upmem_loop
open utils_real
open utils

// Maps the input array in place.
inl map_inp f = run fun input output =>
    global "#include <mram.h>"
    global "__mram_noinit uint8_t buffer[1024*1024*64];"
    inl block_size = 8
    // Creates the WRAM buffers as inverse arrays.
    inl buf_in = create block_size
    inl buf_out = create block_size
    inl len = length input
    forBy {from=0; nearTo=len; by=block_size} fun from =>
        inl nearTo = min len (from + block_size)
        // Reads the MRAM into the WRAM buffer.
        mram_read input buf_in {from nearTo}
        for {from=0; nearTo=nearTo - from} fun i => 
            set buf_out i (f (index buf_in i))
        mram_write buf_out output {from nearTo}
    0

// Maps the input array.
inl map f input = inl output = create (length input) in map_inp f input output . output

inl main () =
    global "import os"
    global "from io import StringIO"
    global "from sys import stdout"
    global "import struct"

    inl test_size = 16
    inl input = 
        zip (arange test_size)
        <| zip (arange test_size)
        <| arange test_size
    $"print(!input)"
    inl output = map (fun x, y, z => x+y+z,x*y*z) input
    $"print(!output)"
    ()
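
For readers who do not know Spiral, the block-wise loop in `map_inp` can be sketched in plain Python. This is only an illustration of the control flow, not the code that the backend generates: `map_blocked` is a hypothetical name, Python lists stand in for the MRAM and WRAM buffers, and slicing stands in for `mram_read` and `mram_write`.

```python
def map_blocked(f, data, block_size=8):
    """Apply f to data one block at a time (hypothetical stand-in for map_inp)."""
    out = [None] * len(data)
    for start in range(0, len(data), block_size):
        # The last block may be shorter, hence the min, as in the Spiral code.
        end = min(len(data), start + block_size)
        buf_in = data[start:end]          # stands in for mram_read into a WRAM buffer
        buf_out = [f(x) for x in buf_in]  # element-wise work on the local buffer
        out[start:end] = buf_out          # stands in for mram_write back to MRAM
    return out


# A length that is not a multiple of the block size exercises the short last block.
doubled = map_blocked(lambda x: 2 * x, list(range(10)), block_size=4)
print(doubled)
```

The point of the blocking is that the kernel only ever touches a small fixed-size buffer at a time, which is what the `block_size = 8` WRAM buffers in the Spiral version are for.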

Output

Compiling the above with Ctrl+F1 using Spiral generates the main.py file, which contains a bunch of Python as well as C kernels in strings. Running it prints the inputs as well as the outputs.

mrakgr@Lain:/mnt/e/PIM-Programming-In-Spiral-UPMEM-Demo$ . ~/upmem-sdk/upmem_env.sh 
Setting UPMEM_HOME to /home/mrakgr/upmem-sdk and updating PATH/LD_LIBRARY_PATH/PYTHONPATH
mrakgr@Lain:/mnt/e/PIM-Programming-In-Spiral-UPMEM-Demo$ cd test8
mrakgr@Lain:/mnt/e/PIM-Programming-In-Spiral-UPMEM-Demo/test8$ python3 main.py
(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int32), array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int32), array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int32), 16)
(array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45],
      dtype=int32), array([   0,    1,    8,   27,   64,  125,  216,  343,  512,  729, 1000,
       1331, 1728, 2197, 2744, 3375], dtype=int32), 16)
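
The second printed tuple can be cross-checked by hand. Since all three input arrays are `0..15`, the mapped function `x+y+z, x*y*z` should give `3*i` and `i**3` at index `i`. A minimal sketch of that check in plain Python, independent of the UPMEM SDK (the trailing `16` in each printed tuple appears to be the array length, which is an assumption based on the output rather than something stated above):

```python
test_size = 16
xs = list(range(test_size))

# Same element-wise function as in the Spiral main: sum and product of the triple.
sums = [x + y + z for x, y, z in zip(xs, xs, xs)]
prods = [x * y * z for x, y, z in zip(xs, xs, xs)]

print(sums[:4], sums[-1])    # [0, 3, 6, 9] 45
print(prods[:4], prods[-1])  # [0, 1, 8, 27] 3375
```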