 |
COMP 422/534 |
Parallel Computing |
Spring 2020 |
Assignment 1 |
Now due: 5:00pm
Monday 17 Feburary (postponed from Friday February 14 due
to spring recess) |
Please review the
policies on misconduct and late work in the course syllabus
before you begin the
assignment. |
|
Parallel Exploratory Search using Cilk Plus
Othello (also known as Reversi) is a strategy game played on an 8 x 8 gameboard.
The rules to the game and a brief explanation of strategy issues
are available on the Wikipedia page for
Reversi.
There is an
online version of the game that you can play
to familiarize yourself with the game.
You will use NOTS (nots.rice.edu) for your assignment. Additional
instructions for using NOTS and Cilk Plus on NOTS are included later
on this page.
Distribution and collection of assignments will be managed using GitHub
Classroom. You can accept the assignment at the following URL https://classroom.github.com/a/LthkIvXs. When
you accept the assignment, it will provide you with a clone of a
GitHub repository that contains a copy of a sequential program
that enables two human players to play Othello. A copy of that program
can also be found here. You may use this program as you
see fit to get a jump start on your assignment.
You may include any or all of the provided code in your submitted
program.
Feel free to use
the code directly as the basis for your parallel solution.
Some of you may be uncomfortable with the board representation
that used in the sample code which represents one color's disks in
each of the 64 positions of the board using bits of a 64-bit integer.
If so, you may choose to
rewrite the code using a 64-element array of characters as the board
representation. While this is less compact, it is equally acceptable.
If you would find it more intuitive to develop your own solution from scratch
rather than building upon the code provided, that is fine as
well, but not necessary.
Write a shared-memory parallel program in Cilk Plus that enables
the computer to play the
game. Implement the computer player as a parallel function that plays
Othello/Reversi by searching n moves ahead to select the "best
board position" for its move. For example, searching 1 move ahead for
Player 1 means selecting best legal move for Player 1 based only on
comparing the board states that would result from any of the possible
legal moves for Player 1. Searching 2
moves ahead for Player 1 means selecting the move that would result in the best
board position after Player 1's move followed by Player 2's best
move. This process of considering alternating
moves generalizes naturally to consider lookaheads of n moves.
Note that if one player cannot move, his opponent can move again if
any legal moves remain; your search should account for this accordingly.
You should use either the
minimax or
negamax algorithm to
choose the best move.
Note: Constructing a sophisticated board evaluator to compute
the best strategic move is beyond the scope of the assignment. A simple
board evaluator that computes the best move by maximizing the difference between the number of your disks and the number of the opponents disks on the board
will suffice.
Unless you specifically design your program to be deterministic,
the sequence of moves the computer player performs will be
unpredictable. You are required to use a Cilk Plus reducer to enforce determinism in
your program by using it to deterministically select a move
from the set of one or more moves with the best score.
The Intel
Cilk++ SDK Programmer's Guide describes reducers and how to work with
them.
Examples in directory /projects/comp422/cilkplus-features-tutorial/reducers
and the comments in the reducer include files in
/opt/apps/software/Core/icc/2018.2.199-GCC-6.4.0/compilers_and_libraries_2018.2.199/linux/compiler/include/cilk
on NOTS should be helpful in understanding how to use reducers
or write a custom reducer. You can use a provided reducer or write
a custom reducer.
If you want to write a custom reducer, you can
parameterize a cilk::reducer_max template.
Input specification
- The program should prompt for the user to enter an 'h' or 'c' to
specify if player 1 is a human or computer player.
- If player 1 is a computer player, it should prompt for an
integer between 1 and 60 that specifies its search depth.
- The program should prompt for the user to enter an 'h' or 'c' to
specify if player 2 is a human or computer player.
- If player 2 is a computer player, it should prompt for an
integer between 1 and 60 that specifies the search depth (the number
of moves ahead the computer should explore when chosing its move).
The sample input file given here
specifies that the computer will play both sides of the game and
that each player will search seven moves ahead to select where to place
its disk.
Output specification
Given an input file as specified above for two computer players,
the program should run to completion. When the program begins, it
should
print the initial game board configuration. After each move, the
program should print the 'row,column' position of the move,
specify which disks were flipped,
indicate how many disks were flipped, and reprint the game board. At the
end of the game, the program should print out the final game board, the total
number of disks for each player, and announce the winner.
Experiments
-
Your submitted
program should be free of data races.
Cilk Plus's cilkscreen tool uses binary rewriting to instrument
your executable to check itself for data races as it runs.
Running your program with cilkscreen at the front of the command
line will check an execution for data races.
If cilkscreen reports races, make sure that you
compile your program with the -g
flag and run it again. (Executables compiled with -g have
more detailed race reports, which will help you identify the
references involved in the data races.)
- Compile your program using -O2 optimization. All of the execution
measurements described below should be performed with this version of
your executable.
-
Cilk Plus's cilkview tool uses binary rewriting to instrument
your program to profile its parallelism.
cilkview will report the total amount of work in your program, the critical path length, the average parallelism, as well other measures such as the total number of stack frames, spawns, and syncs.
The input file provided above will
direct your program to have the computer play for each player with a
lookahead depth of seven.
Construct variants of the provided input to have each computer
player use matching lookahead depths ranging from 1-6. For each lookahead
depth 1-7, use cilkview to profile your program and record
your measurements. Include the measurements results
from cilkview in your report in a table.
-
COMP 534 Only: Use the
HPCToolkit performance tools to determine
where your program is spending the majority of its time. As described
at the bottom of the assignment, grading of programs
submitted by COMP 534 students will place more weight on scalability
and performance than programs submitted by COMP 422 students.
HPCToolkit can help you identify problematic code in your program that
consumes unreasonable time or scales poorly. Directions for using
HPCToolkit can be found on the bottom of this page.
-
Run the program with the input file provided
above (using a lookahead depth of seven) on each of 1-16 cores.
To control the number of cores used by your program use
time CILK_NWORKERS=n othello < input, where n is the number of
cores you want. Record the real, user, and system time for the
run on each number of cores.
-
Write a report that describes how your program works. This report
should not include your program, though it may include one or
more figures
containing pseudo-code that sketches key elements of the
parallelization strategy. Explain how your program exploits
parallelism. Explain why you chose to exploit parallelism in this
way. Were there any opportunities for parallelism that you chose not
to exploit? If so, why? Explain how the parallel work is synchronized.
Include a table containing information from the output of
cilkview for lookahead depths 1-7 in
your report.
Discuss and explain the differences you observe in the profiles with
different lookahead depths.
Prepare a graph of the efficiency of your parallelization comparing
the real times measured on 1-16 cores. Plot a point for each of the
executions. The x axis should show the number
of cores. The Y axis should show your measured parallel
efficiency for the execution.
Parallel efficiency is computed as
S/(p * T(p)), where S represents the
real time of a sequential execution of your program,
p is the number of processors and T(p) is the real
time of the execution on p processors.
Discuss your efficiency findings and the quality of the parallelization.
Note:
Don't make the
mistake of using the execution time of a one thread version of your
parallel implementation as the time of a sequential execution.
The
execution time of the one-thread parallel implementation is typically
larger than that of the original sequential implementation. When you
compute parallel efficiency, always use the performance of a
sequential code as a baseline; otherwise, you will
overestimate the value of your parallelization.
The sequential execution time of your Cilk Plus program is not
the same as the execution time of a Cilk Plus program with one
worker. To provide a baseline for your parallel efficiency
measurements, you can
create a serialization of your program that will execute
sequentially
without the overhead of the Cilk Plus runtime by compiling it by
adding the -cilk-serialize option to your command line for
icpc. When you compile your serialized code, make sure to use
-O2 optimization.
Measure the wall clock time for an execution of your serialization
on the input with
lookahead depth seven using the Unix "time" command.
Submitting your Assignment
Your assignment should be submitted in two parts.
-
One or more source files containing your Cilk Plus program.
- A report about your program in either Word or PDF format (PDF
preferred). Guidelines for your report: 3 pages won't have enough
detail; more than 10 pages will say too much. Plan for a length in
between. Make sure that you address all of the requirements specified
in the "Experiments" section.
You will submit your assignment by commiting your source files,
Makefile, and report to your GitHub Classroom repository.
If you have trouble using GitHub and come up against the deadline for
submitting your assignment,
email the instructor a tar file containing your source code and your
report.
COMP 422 Grading Criteria
-
10% Program correctness. Program must correctly implement Othello/Reversi rules. Pay careful
attention to situations in which one player has no legal move.
-
10% Reducer usage. Use a Cilk Plus reducer to ensure deterministic move
selection.
-
10% No data races. Use cilkscreen to prove the absence of races.
-
30% Program clarity, elegance and parallelization. The program should be
well-documented internally with comments.
Your program should have no data races.
-
10% Program scalability and performance.
-
30% Report. Your grade on the report will consider the quality of the
writing (grammar, sentence structure), whether all of
the required elements are included, and the clarity of your explanations.
Make sure that you have performed all of the experiments
requested above and provided the information requested about them in
your report. If you haven't, you will lose points!
COMP 534 Grading Criteria
-
10% Program correctness. Program must correctly implement Othello/Reversi rules. Pay careful
attention to situations in which one player has no legal move.
-
10% Reducer usage. Use a Cilk Plus reducer to ensure deterministic move
selection.
-
10% No data races. Use cilkscreen to prove the absence of races.
-
30% Program clarity, elegance and parallelization. The program should be
well-documented internally with comments.
Your program should have no data races.
-
20% Program scalability and performance.
-
20% Report. Your grade on the report includes the quality of the
writing (grammar, sentence structure), whether all of
the required elements are included, and the clarity of your explanations.
Make sure that you have performed all of the experiments
requested above and provided the information requested about them in
your report. If you haven't, you will lose points!
Using NOTS
You can log into nots.rice.edu on campus using
ssh yournetid@nots.rice.edu
Log into NOTS using your Rice University netid password.
NOTS is not directly accessible from off-campus locations. You will
need to attch to the Rice University network using VPN and then
you will be able to ssh to NOTS.
Setting up Cilk Plus on NOTS
To load the suite of Cilk Plus tools, first set your module path
Next, load the module that contains paths to the Cilk Plus compiler and tools.
The Cilk Plus compiler is icpc.
Data Race Detection Tools
To check your program for data races, you will need to use cilkscreen.
To check your program for data races with cilkscreen, simply
launch your program with cilkscreen:
cilkscreen ./othello < input
Using SLURM on NOTS
While you can develop your program on one of the login nodes on
NOTS, you should run your experiments on one of the compute nodes
using the SLURM job manager. You can obtain a private node to run your
job on by requesting a node for an interactive session using
srun --pty --export=ALL --nodes=1 --ntasks-per-node=1
--cpus-per-task=16 --time=00:30:00 bash
A copy of this command is included in the file interactive
among the provided files in your GitHub Classroom repository.
I strongly recommend developing and
testing your code (with small lookahead depth) on the login node to
avoid the surprise of having your editor be killed as your interactive
session on a compute node expires. See the documentation on
Using NOTS under the tab "Submitting Jobs" for more information
about using SLURM.
Your sample repository includes a SLURM batch script
submit.sbatch that you can launch with
SLURM's sbatch utility.
The sample batch script runs the othello program with each of 1..16
cores. You can run edit the final few lines to run othello just once
or run it multiple times in a sequence of experiments.
Using HPCToolkit
To use HPCToolkit with Cilk Plus, load the HPCToolkit module.
Information about how to use HPCToolkit can be found in its
User's
Manual. There is also a tutorial that you can watch on Youtube.
31 January 2020 - Initial revision.