1️⃣ Quarter 1 Project
The specifications of the Quarter 1 Project have been finalized.
- The checkpoint is due on Sunday, October 30th at 11:59PM (at the end of Week 5). Checkpoint deliverables are marked with the Checkpoint label.
- The whole project is due on Monday, December 5th at 11:59PM (at the end of Week 10).
You will submit both the code and reports you generate for both the checkpoint and final submission to Gradescope. You should complete these tasks individually, unless directed otherwise by your mentor. If your mentor assigned you to groups:
- Each group member should submit a copy of the work to Gradescope individually.
- As part of their report, each group member should include a paragraph stating what their role has been in the group project, both in terms of writing code and writing the report.
Introduction
This quarter, under the guidance of your mentor, you’re working through a central problem in your domain that introduces you to the area. Depending on your domain, your Quarter 1 Project will consist of either:
- A replication of a known result in a published paper.
- An investigative paper, with a hands-on, data-driven inquiry into a relevant question in the area.
By the end of the quarter, you will summarize the results of your work in a well-organized report that follows the norms of research papers in your domain.
Your work will go beyond mere paraphrasing of the existing work that you study:
- Your domain may be taking a broader or narrower view than the work of any one paper. Your report will reflect the work you are doing, not the broader research that may be found in the work being studied.
- You will likely be using data that differs from that used in the work you are studying.
- Your conclusions may differ from those of the work you are studying. You should present the conclusions drawn from your data/methods and address any discrepancies with the work being studied.
- The code supporting your work will conform to best practices in data science software development.
Deliverables
The Quarter 1 Project is due at the end of Week 10. There are two deliverables:
- A PDF submission of the report. This is graded by your domain mentor.
- The code that generates the contents of your report. This is graded by methodology course staff.
Checkpoint: At the end of Week 5 (on October 30th), you’ll have to submit a checkpoint. Your checkpoint submission will include:
- The title, abstract, and introduction sections of your report. See Methodology Lesson 3 for details on all three of these components. Note that your introduction itself will have three sections – an introductory paragraph, a literature review and discussion of prior work, and a description of relevant data.
- The code you’ve produced so far.
Intuitively, your checkpoint should contain a summary of the problem you are working on this quarter, with some context on why it’s interesting, along with anything your mentor wants you to include.
Report
The objectives for your report are described below.
- By carefully explaining work in your area, you’ll understand the area you’ll be working in very well, down to the details. In particular, this process will allow you to understand how to draw a rigorous conclusion in your domain.
- By emulating finished work in your domain, you’ll learn how to effectively communicate technical ideas in your domain.
- By using and critiquing existing methodologies and conclusions, you’ll devise possible improvements to these methodologies that you can use in your Quarter 2 Project.
- By working on the literature review, contained in the introduction section of your report, you’ll create a foundation for your Quarter 2 Project.
- By working on the report, you’ll gain practice with using tools (such as LaTeX, Markdown, and Jupyter Notebooks) for producing such reports.
- By writing code alongside your report, you’ll develop a library of useful code that can be used in your Quarter 2 Project.
Code
The code in your Quarter 1 Project should conform to the best practices covered in methodology lectures. It should be:
- Runnable through a build script and targets, in any feasible environment, e.g. on DSMLP using a Docker image.
- If your project is dependent on a specific platform, please inquire on EdStem, but project dependencies should be handled using best practices for that environment.
- Thoughtfully organized, using reusable library code.
- Tested software, implementing a test target. This either leverages test data you’ve created (for data-driven projects) or unit and integration tests (for projects that create usable software).
The Q1 code will be run similarly to Methodology Assignment 5.
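One way to picture a build script with targets is the minimal sketch below. It assumes a run.py entry point like the one in the example submission.json; the data target, the function names, and the returned messages are hypothetical placeholders rather than a required layout.

```python
import sys


def run_data():
    """Placeholder for a step that would prepare the full dataset."""
    return "data: prepared full dataset"


def run_test():
    """Placeholder for a step that would run the pipeline on small test data."""
    return "test: ran pipeline on test data"


# Map target names to the functions that implement them (names are illustrative).
TARGETS = {"data": run_data, "test": run_test}


def main(targets):
    """Run each requested target in order and return their messages."""
    unknown = [t for t in targets if t not in TARGETS]
    if unknown:
        raise ValueError(f"unknown target(s): {', '.join(unknown)}")
    return [TARGETS[t]() for t in targets]


if __name__ == "__main__":
    for message in main(sys.argv[1:]):
        print(message)
```

Under these assumptions, python run.py test would run only the test target, and python run.py data test would run both in order.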
Code Submission Specifics
Note: The content below will likely not make sense until we cover Methodology Lectures 4 through 8. You can safely ignore this until then; it is not needed for your checkpoint submission.
Turn in your project repository to Gradescope. The root directory of your repository should include a submission.json file with two keys:
- "dockerhub-id", which maps to your Docker Hub repository. (The format should be <user>/<image>:<tag>. The default <tag> is latest if you don’t supply it.)
- "build-script", which maps to the command that needs to be run to build and run your project.
{
"dockerhub-id": "<user>/<image>:<tag>",
"build-script": "<command>"
}
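Before submitting, it may help to check the file yourself. The sketch below is one possible check, not part of the official grading tooling: the helper names parse_submission and with_default_tag are hypothetical, and the only rules it encodes are the two required keys and the latest default tag described above.

```python
import json

REQUIRED_KEYS = ("dockerhub-id", "build-script")


def parse_submission(text):
    """Parse submission.json text and check that both required keys are present."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"submission.json is missing: {', '.join(missing)}")
    return config


def with_default_tag(image):
    """Append ':latest' when the image reference has no explicit tag."""
    # Only look for a tag after the last '/', i.e. in the image part.
    name = image.rsplit("/", 1)[-1]
    return image if ":" in name else f"{image}:latest"


# Placeholder values, mirroring the template above without a tag.
example = '{"dockerhub-id": "<user>/<image>", "build-script": "<command>"}'
config = parse_submission(example)
print(with_default_tag(config["dockerhub-id"]))  # prints <user>/<image>:latest
```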
Your submission will be run on DSMLP.
- A container will be started using the command launch.sh -i <user>/<image>:<tag> using the Docker image information in submission.json.
- Once in the container, the "build-script" command will be run.
For example, a submission.json file might look like:
{
"dockerhub-id": "amfraenkel/fair-policing-project",
"build-script": "python run.py"
}
And the following commands would be run on DSMLP:
launch.sh -i amfraenkel/fair-policing-project
cp <project> .
cd <project>
python run.py test
where <project> is the code that’s been turned in to Gradescope.
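To preview what would be run for your own submission, the mapping from submission.json to those commands can be sketched as follows. This assumes, based only on the example above, that the test target is appended to the "build-script" command; the function name grading_commands is hypothetical and is not part of any course tooling.

```python
def grading_commands(config, project="<project>"):
    """List the commands from the example above for a given submission config."""
    return [
        f"launch.sh -i {config['dockerhub-id']}",
        f"cp {project} .",
        f"cd {project}",
        # Assumption: the test target is appended to the build script.
        f"{config['build-script']} test",
    ]


example = {
    "dockerhub-id": "amfraenkel/fair-policing-project",
    "build-script": "python run.py",
}
for command in grading_commands(example):
    print(command)
```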
Note: This assignment should readily apply to all domains.
- Data-centered domains start with collected data, which you will model your test data after.
- Methods domains are building algorithms that take data as input. You should choose a data type that your algorithm may input and use that to model your test data.
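As one illustration of modeling test data after collected data, the sketch below keeps the header of a CSV and draws a small, reproducible sample of its rows. Sampling is only one possible approach (hand-written or synthetic rows with the same columns would also fit the description above), and the function name sample_test_rows and the toy columns are hypothetical.

```python
import csv
import io
import random


def sample_test_rows(csv_text, n_rows, seed=0):
    """Return CSV text with the original header and up to n_rows sampled rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    rng = random.Random(seed)  # fixed seed keeps the test data reproducible
    sampled = rng.sample(body, min(n_rows, len(body)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(sampled)
    return out.getvalue()


# Toy stand-in for collected data; real projects would read their own files.
collected = "id,value\n1,a\n2,b\n3,c\n4,d\n"
print(sample_test_rows(collected, n_rows=2))
```

The resulting file has the same columns as the collected data but is small enough to commit to the repository and run quickly under the test target.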