Draft vs. Production Output
This post describes a practical example of applying the separation principle for reproducible research. Specifically, it explains why and how I separate draft and production versions of my output. The post assumes you have already installed the software used below (R, RStudio, Quarto, and Git).
I develop a research compendium for each of my statistical projects. My usual approach is to create a Git repository for a custom R package, part of which is a Quarto project. There are often multiple Quarto scripts that use R code chunks to do various tasks such as importing, cleaning, managing, and analyzing data, then reporting the results.
Why Separate Draft vs. Production Output?
Developing my scripts is an iterative process. I render drafts many times per day as I add or edit code, but I only need to do occasional production runs. The draft outputs are disposable intermediate artifacts. If needed, I can recover interim draft output by returning to a specific Git commit and re-running the code as it existed at that point in time.
Production output files are another matter entirely. I find it useful to retain a version history of all rendered production output files distributed to clients, research partners, or other stakeholders. That way, when someone wants to discuss an output with me, I can quickly find, open, and review it without having to re-run code.
My Requirements for Output Files
For most projects and scripts, I find it useful to impose a small set of requirements on output file names. They must:
1. Show which script generated the output.
2. Clarify draft versus production status.
3. Distinguish between different production runs of the same script.
4. Show what output format was used.
My personal convention is to assign filenames by combining three pieces: a stem, a suffix, and a file extension.
- The stem is a text string that matches the name of the script that produces the output (addressing requirement #1). For example, a script called Import_Data.qmd leads to a stem of Import_Data.
- The suffix is either the string _Draft for draft outputs or a date in _YYYY-MM-DD format for production outputs. That addresses both requirements #2 and #3.
- The file extension then shows what output format was used (e.g., HTML or PDF). That addresses requirement #4.
So, Import_Data_Draft.html is a draft HTML output, while Import_Data_2025-01-16.html and Import_Data_2025-02-16.html are production HTML outputs run a month apart.
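The naming convention is mechanical enough to capture in a small R function. What follows is a minimal sketch. The names make_output_name() and its arguments are hypothetical and chosen only for illustration. They are not part of Quarto or any existing package.

```r
# Hypothetical helper (illustration only): build an output file name
# from a script name using the stem + suffix + extension convention.
make_output_name <- function(script, ext = "html", production = FALSE,
                             date = Sys.Date()) {
  # Stem: the script's file name without its .qmd extension
  stem <- tools::file_path_sans_ext(basename(script))
  # Suffix: "_Draft" for drafts, "_YYYY-MM-DD" for production runs
  suffix <- if (production) {
    paste0("_", format(as.Date(date), "%Y-%m-%d"))
  } else {
    "_Draft"
  }
  # Extension: the output format, e.g. "html" or "pdf"
  paste0(stem, suffix, ".", ext)
}

make_output_name("Import_Data.qmd")
# returns "Import_Data_Draft.html"
make_output_name("Import_Data.qmd", production = TRUE, date = "2025-01-16")
# returns "Import_Data_2025-01-16.html"
```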
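Date-stamping production output file names makes it simple to find the most current version of the results. YYYY-MM-DD dates are zero-padded and ordered from year to day, so sorting the file names alphabetically also sorts them chronologically. We can also easily compare the latest output with prior versions because we retain every production output as a separate file.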
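Because of that ordering property, a plain alphabetical sort is enough to pick out the newest production file. The sketch below assumes that file names follow the convention exactly. latest_production_output() is a hypothetical helper. The stem is inserted into a regular expression, so it should not contain regex metacharacters such as a period.

```r
# Hypothetical helper (illustration only): return the most recent
# production output for a given stem and extension.
latest_production_output <- function(files, stem, ext = "html") {
  # Production names: stem, underscore, YYYY-MM-DD date, dot, extension
  pattern <- paste0("^", stem, "_[0-9]{4}-[0-9]{2}-[0-9]{2}\\.", ext, "$")
  prod <- sort(files[grepl(pattern, files)])
  if (length(prod) == 0) return(NA_character_)
  # Zero-padded ISO dates sort alphabetically in chronological order,
  # so the last element is the newest production run
  prod[length(prod)]
}

files <- c("Import_Data_Draft.html",
           "Import_Data_2025-02-16.html",
           "Import_Data_2025-01-16.html")
latest_production_output(files, "Import_Data")
# returns "Import_Data_2025-02-16.html"
```

In practice, the files argument could come from calling list.files() on the folder that holds the outputs. The draft file is ignored because its suffix does not match the date pattern.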
Defaulting to Draft Output
Saving every draft output file I generate would clutter my folders with many interim files that differ in only small ways. Therefore, I simply overwrite my draft output file each time I render a script on its own, which also reduces the storage space consumed. Storage may be cheap, but it still ought to be used sensibly so that folders stay usable and files stay easy to find.
To demonstrate my approach, I created a file called Example1.qmd that contains the following text.
The format: section of the YAML header controls what types of output will be produced. This one specifies that both html: and pdf: output files will be written, and it uses a separate output-file: key for each one to set the default output file names. Rendering Example1.qmd on its own from within the RStudio interface (by clicking the Render button) will create both files using those default names. Figure 1 shows the process at a conceptual level.
Naturally, you can remove one of the output format keys from the YAML header to get just one output file in whichever format you prefer.
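The contents of Example1.qmd are not reproduced here. As a stand-in, the following R sketch writes a minimal Example1.qmd whose header matches the description above. The output-file values (Example1_Draft.html and Example1_Draft.pdf) are assumptions based on the naming convention. The body text is a placeholder.

```r
# Sketch (assumed header, not the original file): write a minimal
# Example1.qmd that requests both HTML and PDF output, each with its
# own default (draft) file name set by an output-file key.
header <- c(
  "---",
  "title: \"Example 1\"",
  "format:",
  "  html:",
  "    output-file: Example1_Draft.html",
  "  pdf:",
  "    output-file: Example1_Draft.pdf",
  "---",
  "",
  "Placeholder body text."
)
writeLines(header, "Example1.qmd")

# Rendering from the R console (needs the quarto R package and the
# Quarto CLI) renders every format listed in the header:
# quarto::quarto_render("Example1.qmd")
```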
Here are links to the draft output files:
Obtaining Production Output
What remains, then, is to demonstrate how I get those date-stamped production outputs. For that, I create a production script (e.g., Production_Run.qmd) that renders other scripts to customized, automatically date-stamped output file names, as shown in Figure 2.
Below is the basic content of the Production_Run.qmd script. To get it to run, you would need to replace the double curly braces {{r}} with single curly braces {r} (the double braces disable execution of those chunks).
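The original script body is not reproduced here. The following is a hedged sketch only and is not the author's actual Production_Run.qmd. In this sketch, an R chunk loops over scripts and output formats and calls quarto::quarto_render() with its output_format and output_file arguments. The function production_render() and its dry_run argument are hypothetical. Actual rendering requires the quarto R package and the Quarto CLI.

```r
# Hypothetical sketch of a production-run chunk: render each script to
# date-stamped file names, one output format at a time.
production_render <- function(scripts, formats = c("html", "pdf"),
                              date = Sys.Date(), dry_run = FALSE) {
  stamp <- format(as.Date(date), "%Y-%m-%d")
  outputs <- character(0)
  for (script in scripts) {
    stem <- tools::file_path_sans_ext(basename(script))
    for (fmt in formats) {
      # e.g. Example1_2025-02-16.html
      out <- paste0(stem, "_", stamp, ".", fmt)
      if (!dry_run) {
        # Render only this format so the output_file name applies to it
        quarto::quarto_render(script, output_format = fmt, output_file = out)
      }
      outputs <- c(outputs, out)
    }
  }
  outputs
}

# List the names that would be produced, without rendering anything
production_render("Example1.qmd", date = "2025-02-16", dry_run = TRUE)
# returns c("Example1_2025-02-16.html", "Example1_2025-02-16.pdf")
```

Rendering one format per call re-executes each script once per format. That is the cost of giving every format its own date-stamped name in this sketch. The sketch also assumes that the output file passed at render time takes precedence over the output-file: keys in the script's YAML header.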
Here is a link to a production output file, namely Example1_2025-02-16.html.