CI/CD
Continuous integration / continuous delivery (CI/CD) is a basic tool in today's data science stack for automating processes and running repetitive tasks on a schedule.
Running tasks in CI/CD should be fast and reliable, no matter the language, architecture or computing environment at hand. As CI/CD requires installing many packages over and over again, having binaries and optimized distributions helps to reduce runtime.
This page presents snippets for well-known and lesser-known CI/CD engines to help you get started with the R package binaries of this project.
The snippets rely on external projects and libraries that are updated regularly, so there is no guarantee that they work out of the box at all times. If you find an error, please open a pull request in the linked repo.
GitHub Actions
VM
GitHub Actions can be run directly on a VM or in a containerized context. The available VM images can be found here. The only Linux distribution available is Ubuntu.
For VM workflows, r-lib/actions provides many examples for different use cases.
To use package binaries from this project, use the following config for r-lib/actions/setup-r@v2:
- uses: r-lib/actions/setup-r@v2
  with:
    r-version: ${{ matrix.config.r }}
    http-user-agent: ${{ matrix.config.http-user-agent }}
    Ncpus: 2
    cran: 'https://cran.devxy.io/amd64/noble/latest'
(Ncpus: 2 is set to allow packages to be installed in parallel.)
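The step above refers to `matrix.config` values that the snippet itself does not define. A minimal sketch of a complete VM workflow around it might look as follows. The file name, workflow name, trigger, and matrix entry are illustrative assumptions modeled on the r-lib/actions examples, not part of this project. `ubuntu-24.04` is chosen because it is the Ubuntu release code-named noble, which matches the `noble` segment of the repository URL.

```yaml
# .github/workflows/check.yaml (hypothetical file name)
name: R-CMD-check
on: [push, pull_request]

jobs:
  check:
    runs-on: ${{ matrix.config.os }}
    strategy:
      fail-fast: false
      matrix:
        config:
          # Illustrative entry; ubuntu-24.04 is the "noble" release.
          # http-user-agent is left unset here, so the expression
          # below evaluates to an empty string.
          - {os: ubuntu-24.04, r: 'release'}
    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          Ncpus: 2
          cran: 'https://cran.devxy.io/amd64/noble/latest'

      # Installs the dependencies declared in the package's DESCRIPTION file
      - uses: r-lib/actions/setup-r-dependencies@v2
```

If the runner image is changed to another Ubuntu release, the distribution segment of the `cran` URL would presumably need to change with it.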
Container
An alternative is to run Actions in a containerized context. This spins up a container of the selected image in the VM and allows running on distributions other than Ubuntu, e.g. on Alpine. To avoid having to install R and configure a custom repository on every run, specialized images are provided which already have everything in place:
jobs:
  container:
    runs-on: ubuntu-latest
    container: devxygmbh:r-alpine
    steps:
      - run: |
          R -q -e 'install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/devel/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))'
          R -q -e 'getOption("repos")'
          R -q -e 'options(Ncpus = 2); pak::local_install_dev_deps()'
When running in a containerized context, the predefined actions from r-lib/actions cannot be used. Shell/R commands must be used directly.
Running containerized on Alpine should be substantially faster than running in a VM context. Run times from a comparison with 91 packages:
- Containerized: 1m 21s
- VM: 2m 11s
Both runs are based on the assumption that caching is not used. With caching enabled, the performance of both approaches should be roughly comparable.
Source: pat-s/workflow-compare
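For the VM workflow, caching is typically handled by r-lib/actions/setup-r-dependencies@v2, which caches installed packages by default. The following sketch shows the relevant inputs; the values are illustrative assumptions, and the step is not taken from the comparison above. Since the r-lib actions cannot be used in the containerized job, this applies to the VM variant only.

```yaml
- uses: r-lib/actions/setup-r-dependencies@v2
  with:
    # Caching is on by default; set to false to measure an uncached run
    cache: true
    # Illustrative value; changing it discards a previously saved cache
    cache-version: 1
```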
GitLab Runner
WIP