COMP 540 Project
Overview
The pedagogical goal of the project is to give you experience with a miniature version of
a ``real'' research project. You will do the project in a group of size two.
- You will pick an open competition on
kaggle.com and explain in your research proposal why you chose that
specific competition.
- After your project proposal is approved, you will design and implement a solution to the
problem you have chosen. You will measure the performance of your solution
on the kaggle.com leaderboard.
- You will update me on the status of
your project on a project blog on blogs.rice.edu on a bi-weekly basis.
- You will
write up the results of your project in a draft final report, which,
after review, you will turn into a final report.
- Finally, you will
present a poster on your project at the last class session on April 24.
I emphasize that this is a research project, and not a
programming project. Although the implementation of your solution
is an essential component, it is only one aspect of the project, next
to other equally important components, such as technical descriptions of the
algorithms created/used, performance evaluation of these algorithms on data, and the
final report and presentation. The learning curve on some of these projects
could be steep, and there is a significant danger of following
what turn out to be dead-end alleys. To alleviate
these problems, I will assist you throughout the term to make sure
that you continue to make progress. You should use this resource (i.e.,
me) wisely, remembering however that you are to do the
work. In particular, keeping your project blog regularly updated will be very important.
Requirements and Timetable
- [January 21] Project
groups formed (groups of at most two).
- [January 24] Project
proposal on your blogs.rice.edu page (send me the URL!).
- [January 28] Project approval/revision notices sent.
- [February 7] Project progress
report 1 on your project web page.
- [February 21] Project progress
report 2 on your project web page.
- [March 14] Project progress
report 3 on your project web page.
- [March 28] Project progress
report 4 on your project web page.
- [April 11] Project progress
report 5 on your project web page.
- [April 18] Draft of final
project report on your project web page.
- [April 24]
Final project poster presentations.
- [April 25] Final project report on your project web page.
Note that the draft final report is due a week before the
end of the semester! The formats of the project proposal and final project report are described below.
Reports
Project Proposal
The project proposal should be no longer
than two typeset pages. It should explain why you chose the specific Kaggle competition, provide a broad overview of how you intend to approach the problem, and list the resources you intend to use or need for solving the problem. The feedback you will receive from me will be either formal approval to proceed with the problem, with suggestions for methods/resources, or a request to submit a revised proposal on a different Kaggle problem.
Intermediate Progress Reports
The main reason for the intermediate progress reports is to ensure
that you work steadily on the project. Each report can be informal,
and must be made available on your blogs.rice.edu site. It should
concisely state what you have already achieved, what you plan to
accomplish, what problems you have encountered, etc. I will use these
reports to help guide your work on the problem.
Final Poster
We will
provide easels and black foam boards on which to pin your poster. The
boards measure 48" x 48". We suggest that you make your poster approximately
36" x 46", with a 1" white border (leaving 34" x 44" for
content). Start on poster making early, since last-minute adjustments
to shape, size, and colors are not fun in PowerPoint. Check with the
operational staff at Mudd to determine the mechanics of printing your
36" wide poster. For design and printing tips, refer to the Resources
section of the Cain Project web page, which also contains Tracy Volz's
poster preparation guide and poster checklist.
Final Report
Your final report should be no longer than 15 typeset pages. I
suggest the following format, from which, of course, you may deviate
to suit your particular needs.
- [Abstract] A brief summary of the problem, the key idea behind your solution, and the final results.
- [Introduction] A clear
statement of the problem to be solved. Background and
related work on the problem. Your approach placed in the context
of related work.
- [Methods] Technical description of algorithms used and the methodology used for training and validating models. Be specific: state
what your performance measures are, and how you selected models for submission to kaggle.com.
- [Results] Concise descriptions and visualizations of models tested and evaluated.
Highlight interesting aspects of your results. Explain
discrepancies between your expectations of a method's performance and its actual performance.
- [Discussion and conclusions] What are the lessons learned from your project? What are avenues for future work?
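As an illustration of the level of detail the Methods item asks for, here is a minimal sketch of k-fold cross-validation with a fixed random seed, written in plain Python. The function names (`k_fold_indices`, `cross_validate`), the majority-class baseline, and the toy data are all hypothetical and chosen only for illustration; your competition will have its own performance measure, and you should report whichever one you actually use.

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and split them into k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    return [indices[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Return the validation accuracy on each of the k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        scores.append(accuracy([y[i] for i in held_out], preds))
    return scores

# Hypothetical baseline: always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, X_val):
    return [model for _ in X_val]

# Toy data (an assumption for illustration): 14 examples of class 0, 6 of class 1.
X = [[i] for i in range(20)]
y = [0] * 14 + [1] * 6

scores = cross_validate(fit_majority, predict_majority, X, y, k=5, seed=0)
mean_score = sum(scores) / len(scores)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean_score)
```

Stating the measure (accuracy), the protocol (5-fold cross-validation), and the seed is what lets a reader reproduce the numbers. Reporting a trivial baseline next to your real models also gives the reader a reference point for judging the results.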
Writing Proposals and Reports
I expect professionally done documents, without spelling errors,
with appropriate references, etc. Please use LaTeX to typeset your
papers and submit a pdf version of the final report on Owlspace.
If you have never written a technical
document before, you should seriously consider reading one of the
standard references on the subject.
I also recommend, if only for its conciseness, ``The
Elements of Style'', by Strunk and White, Macmillan Publishing Co. The
textbook and recommended reading assigned for the class might also serve as examples of fine technical writing.
In Hindsight
Here is a summary of the most common weaknesses encountered in Comp 540 projects and project reports from prior years.
- Major problems often result from starting too
late. It is important you start early and keep working on the project consistently, updating your
blog site as you go along.
- Not making/updating a timeline for the project.
Start your preliminary experiments with the data early on and
devise a timeline for your project, say by January 31.
The preliminary experiments will help
you identify major hurdles on the way to
completion of the project, and give you a good assessment
of their difficulty and the time it will take you to
conquer them. Past experience shows that the degree to
which a group has planned out their work for the semester
is a good predictor of their overall performance on the
project.
- People always underestimate the amount of time it takes
to write up the results. You should realize that
writing up includes not only the physical writing,
but also, and more importantly, digesting the results.
Often, in writing, you will realize that you should have done the experiments a different way.
The act of writing clarifies your thinking.
Therefore, start writing early.
Some groups find it useful to make an early draft of their project report,
and update it as they go along, leaving them with a good first
cut at the final report when they are done with the bulk
of the analysis and experimentation. This strategy will give you time to
address such issues before it is too late.
- The report is at much too low a level. You should
abstract from your implementation those things that you
think the reader will find interesting. Again, this is a
research project, and not a programming project. You
should think of your report as an article you send in to
be published, not as a description of a programming project.
For instance, names of functions and
data structures in your code are of no use. The underlying algorithm is,
as are its results.
- The results are unintelligible. Before you make performance
measurements, explain what your performance criteria are and how you
measure them (on training data, on a set-aside test set, etc.).
Results need to be presented in a scientific
way. You need to describe exactly how you
performed the experiments, such that a knowledgeable person
could reproduce them.
Moreover, you would like to give the reader some
idea of why the results are the way they
are.
- Vacuous prose (especially in the abstract). Examples are:
``Our performance study shows that reasonable accuracy
can be achieved'', ``Also discussed are insights into
which classes of problems would benefit from our algorithm'',
``Several follow-up ideas are
presented'', ``We also include descriptions of
optimizations made and their effects''. Rather than this
kind of vacuous prose, say what accuracies were achieved,
what insights were gathered, what the follow-up
ideas are, etc. If you cannot present those in a concise
abstract, it is probably the case that you have not been
able to digest the problem to the point of discerning the
important results of your work.
- Statements of the form ``We believe that ...'', followed
by your favorite preconception, are inappropriate
in a paper. They are appropriate for a proposal, but in a
paper they should either be substantiated by evidence or
discarded.
- Avoid cuteness in writing.
- The abstract should stand by itself. It should not rely
on or refer to any other document, including the rest of
the paper.
- The related work subsection is meant to relate your work to
that of others, not to give a mere summary of other work.
Acknowledgement:
This document is adapted from a similar document written by Willy Zwaenepoel for Comp
520.