Lab 7 — Coding with SIMD

Lab goals:


Pre-lab


Before lab, skim the following so you’re ready to code with intrinsics:

GitHub Repository for This Lab

To obtain your private repo for this lab, please point your browser to the starter code for the lab at:

https://classroom.github.com/a/07hNXtVv
Follow the same steps as for previous labs and assignments to create your repository on GitHub and then clone it onto CLEAR. The directory for your repository for this lab will be
coding-with-simd-name
where name is your GitHub userid.

Files provided: add.c, multiply.c, divide.c (each contains FILL THIS placeholders), plot_figure.py, Makefile, three CSVs (add_results.csv, multiply_results.csv, divide_results.csv), and a reflection file, reflection.txt.


In-lab


Important: The Makefile uses -Wall -Wextra -Werror and will fail to build until you replace all FILL THIS lines in the C files. Fix the code first, then build.

What the provided programs do & what they output

Each program (add, multiply, divide):

CSV header (keep exactly once at the top of each CSV file):

Power,Scalar Time,SSE Time,AVX Time

CSV output line format (copy only the numbers that follow "CSV Line:" in the program output):

12,0.0000504990,0.0000317730,0.0000257870
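
As a rough illustration of how a row in this format could be produced, the sketch below formats one line with snprintf. The helper name format_csv_line is hypothetical and is not taken from the starter code; the 10 digits after the decimal point are inferred from the sample line above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the starter code): writes one CSV row in the
 * order Power,Scalar Time,SSE Time,AVX Time, with 10 digits after the decimal
 * point to match the sample line. Returns the value reported by snprintf. */
int format_csv_line(char *buf, size_t cap, int power,
                    double scalar_s, double sse_s, double avx_s)
{
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, "%d,%.10f,%.10f,%.10f",
                    power, scalar_s, sse_s, avx_s);
}
```

Calling format_csv_line with power 12 and the three sample times reproduces the sample row shown above.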

Why you’re doing this: you will run each program across array sizes 2^p for p = 0…25 to identify performance trends and understand when, why, and how SSE/AVX are useful.

Run plan (do these in order)

  1. ADD
    Go through add.c and understand the code. Look at the SSE code lines:
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); // load 4 floats from a[i..i+3]
        __m128 vb = _mm_loadu_ps(b + i); // load 4 floats from b[i..i+3]
        __m128 vs = _mm_add_ps(va, vb);  // add them element-wise
        _mm_storeu_ps(r + i, vs);        // store the 4 results into r[i..i+3]
    }
    Update all the FILL THIS lines. Then,
    make add
    Then run for every power p = 0…25 and append the numeric CSV line from each run to add_results.csv:
    ./add 0
    ./add 1
    ...
    ./add 25
    Each run should print the timing results and checksum outputs for all three implementations. An example for ./add 12 is provided below:
    Vector add with SIZE = 2^12 = 4096 elements
    Scalar add: 0.0000504990 s
    SSE    add: 0.0000317730 s
    AVX    add: 0.0000257870 s
    
    CSV Line: 12,0.0000504990,0.0000317730,0.0000257870
    
    Checksums (sum over results):
      scalar: 6289920.0000000000
      sse   : 6289920.0000000000
      avx   : 6289920.0000000000

    The checksums let you verify that all three implementations produce the same result.

    Performance variability note: shared servers can have noise (other users’ jobs, frequency scaling, cache effects). If feasible, run each power multiple times and record the minimum. If you paste multiple entries for the same power, ensure the last one in the file is the minimum (the plotting script uses the latest entry per power).

    Generate the figure:
    make add_plot   # produces add_figure.svg
  2. MULTIPLY
    Go through multiply.c, understand the code, and update all the FILL THIS lines. Then,
    make multiply
    Run for every power p = 0…25 and append the numeric CSV line to multiply_results.csv. Use the same “minimum of multiple runs” guideline as above, ensuring the last line per power is your minimum.
    make multiply_plot   # produces multiply_figure.svg
  3. DIVIDE
    Go through divide.c, understand the code, and update all the FILL THIS lines. Then,
    make divide
    Run for every power p = 0…25 and append the numeric CSV line to divide_results.csv. Same variability/minimum guidance applies.
    make divide_plot     # produces divide_figure.svg
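
The SSE loop in step 1 handles four floats per iteration, so on its own it leaves up to three trailing elements untouched, and for powers 0 and 1 (1 and 2 elements) its body never runs. A minimal sketch of one way to cover the remainder with a scalar tail loop, and to compare checksums against a scalar reference, follows. The function names add_scalar, add_sse_with_tail, and checksum are illustrative and are not taken from the starter code, which may handle this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Scalar reference: r[i] = a[i] + b[i] for every element. */
void add_scalar(const float *a, const float *b, float *r, int n)
{
    assert(a != NULL && b != NULL && r != NULL);
    for (int i = 0; i < n; i++)
        r[i] = a[i] + b[i];
}

/* SSE version: four floats per iteration, then a scalar loop for the
 * 0 to 3 elements that do not fill a whole 128-bit vector. */
void add_sse_with_tail(const float *a, const float *b, float *r, int n)
{
    assert(a != NULL && b != NULL && r != NULL);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* unaligned load of a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);          /* unaligned load of b[i..i+3] */
        _mm_storeu_ps(r + i, _mm_add_ps(va, vb)); /* element-wise add, then store */
    }
    for (; i < n; i++)                            /* scalar tail */
        r[i] = a[i] + b[i];
}

/* Sum over the results, accumulated in double to limit rounding error. */
double checksum(const float *r, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += r[i];
    return sum;
}
```

Running both versions on the same inputs and comparing checksum values is the same kind of check the provided programs print under "Checksums".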

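The "record the minimum" guidance from step 1 can also be applied inside a program. The sketch below, which assumes a POSIX system that provides clock_gettime, runs a kernel several times and keeps the fastest run. The names kernel_fn, add_scalar_kernel, now_seconds, and min_time_seconds are hypothetical; the provided programs may time their loops differently.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime under strict -std=c11 */
#include <assert.h>
#include <time.h>

/* Assumed signature for a kernel under test (illustrative only). */
typedef void (*kernel_fn)(const float *a, const float *b, float *r, int n);

/* Example kernel so the sketch is self-contained: scalar element-wise add. */
void add_scalar_kernel(const float *a, const float *b, float *r, int n)
{
    for (int i = 0; i < n; i++)
        r[i] = a[i] + b[i];
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs fn reps times and returns the smallest elapsed time in seconds.
 * The minimum is less sensitive to noise from other jobs than a single run. */
double min_time_seconds(kernel_fn fn, const float *a, const float *b,
                        float *r, int n, int reps)
{
    assert(reps > 0);
    double best = -1.0;
    for (int k = 0; k < reps; k++) {
        double t0 = now_seconds();
        fn(a, b, r, n);
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

A call such as min_time_seconds(add_scalar_kernel, a, b, r, n, 5) would give one number per power to record in the CSV.
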
Reflection

Open reflection.txt (already provided) and write your responses directly next to each prompt (copied below) after Ans:.

  1. How does the performance of SIMD instructions compare for add, multiply, and divide?
  2. What sizes of arrays give the largest performance boost with SIMD instructions?
  3. Why do you think you see the trends that you see in the figures?
  4. When is it useful to use SSE and AVX instructions?
  5. What is your main takeaway?

Post-lab


What to submit

Due: Push code, CSVs, figures, and your reflection by 11:55 PM on Sunday (10/12).