Note: These specifications have not been updated for Fall 2023. They will change somewhat but will remain the same in spirit. This warning will be removed once they have been updated.
You will have to produce four main deliverables for your Quarter 2 Project. Most deliverables also have an accompanying checkpoint, which serves to ensure that you’re on track to finish all components in time and to give you an opportunity to receive feedback from your mentor and methodology staff. Here, we outline what you’ll need to produce, and in the coming weeks, we’ll provide additional guidance through the methodology lessons.
For inspiration, you may want to look at the deliverables produced by past capstone students – you can view all past capstone projects under the Archive tab of the course website. At the bottom of this page, we also provide a few examples of non-capstone data science projects that you may want to draw inspiration from.
Some general notes:
- Refer to the course homepage for each deliverable’s exact deadline. Roughly speaking, you’ll spend the first half of the quarter working on your project, and the second half preparing your deliverables and rehearsing your presentation. The two most pertinent dates are:
    - Wednesday, March 15th, from 11AM-2:30PM: the in-person capstone showcase, at which all of your group members will have to present. This will be attended by UCSD faculty and students, and HDSI industry partners. (Your project group will either be assigned to present in the 11AM-12:30PM block or the 1-2:30PM block; we won’t split groups into blocks until early February, so for now, make sure you’re free for the entire interval.)
    - Thursday, March 9th at 11:59PM: the deadline by which you must submit to us a PDF of the poster you will present at the capstone showcase. We are coordinating a bulk order with campus printing services and they may not be able to print your poster in time if we receive it any later than that.
- All deliverables and checkpoints must be submitted as a group on Gradescope. As of now, the only component you will submit individually is a “contribution statement” towards the end of the quarter.
- Each deliverable will be graded using the same A/B/C/F rubric from Quarter 1. Different group members can end up with different grades for the same work, depending on the answers to the weekly participation questions and the contribution statement.
DSC 180B is a 4-unit course, which per campus rules means you should expect to spend 12 hours per week on it total. Remember that your goal in DSC 180B is to produce a capstone project you’ll be proud to show others and look back at in the future, not just to receive an A.
In addition to the four Checkpoints mentioned below, you’ll also need to complete a mandatory Week 3 TA Check-In.
Your report should introduce and contextualize the problem you worked on, describe the methods used, state the results, and draw conclusions about the results. It should follow best practices in scientific writing, as laid out in Lesson 3 of Quarter 1. Note that if you’re building some sort of product, your report may look less like a traditional scientific paper and more like an ROI analysis of your product, and that’s fine – just make sure that early on, you’re clear on what your mentor’s expectations of your report are.
A worthwhile Twitter thread to read, from Karl Rohe (@karlrohe), November 16, 2022:
First piece: Don't merely fit some fancy model. Instead, make an argument about something real that is measured by data. Summarize your claim in a thesis statement, then support that statement with data analysis. We the reader want to care about what you are doing. help us!
Checkpoint: At the end of Week 5, you will submit a checkpoint of your report. This is an opportunity to get feedback from your domain mentor. For your checkpoint, submit a PDF containing a draft of your report, with:
- Headings for every section you plan to write (e.g. “Introduction”, “Methods”, “Results”, “Conclusion”, “Appendix”, “Contributions”). This will make it easy to fill in the remaining sections later.
- A complete title, abstract, and introduction. You can reuse most of this from earlier work – say, your proposal or Quarter 1 Project report.
- A mostly complete methods section – you should know what your methods are at this point!
- An appendix section at the end that includes a copy of your latest project proposal for reference.
- A contributions section at the end that details the tasks that each group member was responsible for so far.
Note that the figures in your report – both now and in your final submission – should not be created from screenshots:
- Tables should be imported from CSVs.
- Plots should be saved programmatically (e.g. using plt.savefig) and imported (or converted from a notebook).
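As a rough illustration of both points, the sketch below builds a table from CSV text and saves a plot with `savefig` instead of taking a screenshot. The helper names (`csv_to_markdown`, `save_line_plot`), the file name, and the numbers are all made up for the example, and the Markdown output format is an assumption; the same idea applies if your report is written in LaTeX or another format.

```python
import csv
import io
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def csv_to_markdown(csv_text):
    """Turn CSV text into a Markdown table (header row, then body rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def save_line_plot(x, y, path, xlabel, ylabel):
    """Draw a simple line plot and write it to disk with savefig."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, y, marker="o")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.savefig(path, dpi=300, bbox_inches="tight")
    plt.close(fig)  # release the figure once it is saved
    return Path(path)


# Illustrative, made-up results; in practice these come from your pipeline.
results_csv = "model,accuracy\nbaseline,0.71\nours,0.83\n"
table = csv_to_markdown(results_csv)

# A temporary directory keeps the sketch free of side effects.
out_dir = Path(tempfile.mkdtemp())
plot_path = save_line_plot(
    [1, 2, 3], [0.60, 0.75, 0.83], out_dir / "accuracy.png", "Epoch", "Accuracy"
)
print(table)
```

Generating figures and tables this way means they can be regenerated whenever the underlying results change.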
When submitting your report, also edit this sheet to include your project’s title so that we can use it for marketing purposes if needed.
Checkpoint: Week 5, graded by mentor
Final: Week 10, graded by mentor
Your code artifact is a public GitHub repository that contains all code for building/reproducing your project. It should follow best practices for data science software development, as discussed in Lessons 2, 5, and 6 from Quarter 1, to the extent possible. That is, it should be well-documented, buildable, and containerized – don’t let this be you!
Your public GitHub repo should be updated incrementally and regularly to reflect your most recent work. Both at the checkpoint and at the end of the quarter, the commit history for your repo should show that each group member pushed code to the repo at regular intervals, not that all code was added right before the deadline – that’ll raise red flags 🚩. In Lesson 2 of Quarter 2, we’ll review the basics of collaboration with Git.
Checkpoint: At the end of Week 5, you will submit a checkpoint of your code artifact. This checkpoint will verify that your code:
- Is well documented and build instructions are found in
- Is structured with best practices in your domain (and in Quarter 1 methodology, where applicable).
- Contains a test target that tests the pipeline on test data, where applicable.
- Runs in a Docker image on DSMLP.
You will not submit your repository itself. Rather, you will submit an “artifacts repo” in which there is a file that contains links to your project code, report, etc. See this post on Ed for more details.
Then, on DSMLP, methodology staff will be able to run your project on test data by:
- Cloning your GitHub repository.
- Launching a container with your Docker image.
- Running python run.py test (or other build instructions specified in
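For reference, here is a minimal sketch of what a `run.py` with a `test` target could look like. Everything in it is hypothetical: the toy `load` and `analyze` steps, the embedded test data, the `all` target, and the `data/raw/input.csv` path are placeholders, and your own pipeline and targets will differ.

```python
import argparse
import csv
import io

# Tiny stand-in for test data; a real project would usually commit a small
# test file to the repository instead (hypothetical).
TEST_CSV = "x,y\n1,2\n2,4\n3,6\n"


def load(source):
    """Read CSV rows from a text stream into a list of dicts of floats."""
    return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(source)]


def analyze(rows):
    """Toy pipeline step: the mean of y / x across all rows."""
    ratios = [row["y"] / row["x"] for row in rows]
    return sum(ratios) / len(ratios)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run the project pipeline.")
    # Defaulting to "test" makes a bare `python run.py` a cheap smoke check.
    parser.add_argument("target", nargs="?", default="test", choices=["test", "all"])
    parser.add_argument("--data", default="data/raw/input.csv")  # hypothetical path
    args = parser.parse_args(argv)

    if args.target == "test":
        result = analyze(load(io.StringIO(TEST_CSV)))
    else:
        with open(args.data, newline="") as f:
            result = analyze(load(f))

    print(f"mean ratio: {result:.3f}")
    return result


if __name__ == "__main__":
    main()
```

Because the `test` target runs the same functions as the full pipeline on small data, it can exercise the whole code path without needing the full dataset.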
Checkpoint: Week 5, graded by methodology staff
Final: Week 10, graded by methodology staff
Refer to Quarter 2 Lesson 5 for tips and guidelines for websites.
You will create a static website that serves as a public-facing version of your report. When creating your website, you may certainly borrow material from your report, but you should reduce the amount of material and make stylistic changes to make it appropriate for a general audience. After all, once you graduate, this website will be how most people learn about your project.
Unless you ask the methodology instructor for permission otherwise, your website should be hosted on GitHub Pages. (To get a sense of the types of websites students created in the past, you may want to refer to the Archive.)
In some special cases, your group may be working on developing a “product”, like a dashboard, an application, or some sort of service. If that’s the case, you will create and submit both:
- Your actual product.
- A separate, static website that explains your project, according to the guidelines above. This is the way most people will acquaint themselves with your project. You should include a prominent link to your interactive webpage from your static page.
Primary vs. Secondary Deliverables
If your project is a traditional methods or analysis project, your primary deliverable is your report and your secondary deliverable is your website.
If your project is building a product (e.g. an application or dashboard), your primary deliverable is your product itself and your secondary deliverable is your report, though as mentioned above, you must still produce a static webpage, and it is that webpage that you should submit a checkpoint for.
Checkpoint: At the end of Week 7, you will submit a checkpoint of your static website. All that’s required for the checkpoint is that you set up the skeleton/outline of your project webpage. It should contain titles and headers, but doesn’t need to contain content – all we need to see is that you put thought into how your webpage is organized.
To submit the checkpoint, you’ll submit a link to the repository containing the code to your website (not the link to your website itself). The README.md in that code repository should contain a link to the website itself, which we can use to visit your page.
Checkpoint: Week 7, graded by methodology staff
Final: Week 10, graded by methodology staff
Refer to Quarter 2 Lesson 4 for tips and guidelines for posters.
Finally, you will present your project at the in-person capstone showcase. To do so, you will prepare a physical poster that contains a graphical summary of your work, which you can use as a springboard for conversations with attendees. These attendees will come from a variety of backgrounds, from data science first-years to research faculty and industry mentors.
From the UC Davis Undergraduate Research Center:
At a poster session, your ultimate goal is to share the story of your work with as many people as possible. This will give you the opportunity to network with people that may be future advisers, employers, or collaborators and you can receive important feedback on your work. To bring people in, your poster should communicate the topic quickly and include visual elements such as pictures, graphs, maps and diagrams as well as text. At its core, an effective poster is centered on a concise and powerful story. With the help of visuals, the presenter can share the story of the work in just five minutes.
In Lesson 4 of Quarter 2, we will cover a few more details about how to create academic posters. However, for posters in particular, you should rely on your mentor for guidance, as they’ve almost certainly created posters before.
Note that we will print your poster for you (and pay for the costs), as long as you submit your poster PDF by the deadline (March 9th).
From Quarter 2 Lesson 4:
On March 9th, you will submit your poster to us as a PDF of size 48” x 36”. Make sure that all text and figures on your poster are large and legible – to do this, ensure that your figures are exported at a resolution of 300 DPI (dots per inch). See here for some guidelines.
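To make the 300 DPI guideline concrete, the arithmetic can be sketched as follows: the pixels needed along each side equal the printed size in inches times the DPI. The helper names and the 12" x 8" figure size are hypothetical and chosen only for illustration.

```python
def pixels_needed(width_in, height_in, dpi=300):
    """Pixel dimensions required to print at the given physical size and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixel_width, printed_width_in):
    """Resolution you actually get when an image is printed at a given width."""
    return pixel_width / printed_width_in


# The full 48" x 36" poster rendered at 300 DPI.
poster_px = pixels_needed(48, 36)

# A hypothetical figure that occupies 12" x 8" on the poster.
figure_px = pixels_needed(12, 8)

# A 1200-pixel-wide screenshot stretched across 12 inches falls well short.
screenshot_dpi = effective_dpi(1200, 12)

print(poster_px, figure_px, screenshot_dpi)
```

In this example the screenshot prints at only 100 DPI, a third of the recommended resolution.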
Checkpoint: There are two separate checkpoints for the poster presentation:
- At the end of Week 6, you will submit a draft of your poster as a PDF. Your mentor will provide feedback on this draft. More details to come.
- In Week 8, you will make an appointment with your TA to practice giving presentations about your project using your poster. Signups will be released in Week 7.
At the capstone showcase itself, your TA will come to your table and grade your final presentation, while your mentor will grade your final poster. See this post on Ed for the rubric TAs will use to grade your oral presentation at the showcase.
Poster Checkpoint: Week 6, graded by mentor
Oral Presentation Checkpoint: Week 8, graded by methodology staff
Final: Week 10, graded by mentor (poster) and methodology staff (oral presentation)
Below, we provide examples of non-capstone data science projects. Pay close attention to how they’ve chosen to separate their website, report, and code.
|Project|Website|Paper|Code|
|---|---|---|---|
|FACT Diagnostic: How to Better Understand Trade-offs Involving Group Fairness|Explains the paper’s main points|Contains the main results and proofs of correctness|Reproduces the project and provides a usable tool|
|A New Model To Scale Up Forest Restoration From Sites To Landscapes|Intended for general audience (with an interactive visualization)|Contains scientific investigation|Reproduces project|
|recount2: A multi-experiment resource of analysis-ready RNA-seq gene and exon count datasets|Makes datasets (output of project) available|Justifies and demonstrates the project|Contains source code for the tool|
|kallisto|Introduces and contextualizes the software|Describes usefulness and correctness of method|Contains source code for the tool|
|VoytekLab, a lab with multiple projects|Contains introduction to the problems approached by the lab|Papers contain the details of the results|Lists source for both tools/libraries and for reproducing papers|
|LightGBM|Briefly introduces project and contributors|Describes correctness of algorithm|Contains library for algorithm usage|