CI/CD

Continuous integration / continuous delivery (CI/CD) is a basic tool in today's data science tool stack for automating processes and executing repetitive tasks on a schedule.

Tasks in CI/CD should run fast and reliably, no matter the language, architecture or computing environment at hand. Because CI/CD jobs install many packages over and over again, pre-built binaries and optimized distributions help to reduce runtime.

This page presents snippets for well-known and lesser-known CI/CD engines to help you get started with the R package binaries of this project.

Note

The snippets below rely on external projects and libraries that are updated regularly, so they are not guaranteed to work out of the box at all times. If you find an error, please open a pull request in the linked repository.

GitHub Actions

VM

GitHub Actions jobs can run directly on a VM or in a containerized context. The available VM images can be found here. The only Linux distribution available as a VM image is Ubuntu.

For VM workflows, r-lib/actions provides many examples for different use cases.

To install package binaries from this project, pass the following configuration to r-lib/actions/setup-r@v2:

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          Ncpus: 2
          cran: 'https://cran.devxy.io/amd64/noble/latest'

(Ncpus: 2 has been set to allow for parallel installations.)
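For context, here is a minimal sketch of a complete workflow that embeds the setup-r step above. It assumes the repository contains an R package with a DESCRIPTION file, and it adds the r-lib actions setup-r-dependencies and check-r-package, which are not part of the original snippet. That setup-r-dependencies installs from the repository configured by setup-r is an assumption: it relies on pak reading getOption("repos"). The runner is pinned to ubuntu-24.04 because the path amd64/noble refers to Ubuntu 24.04 ("noble") on x86-64, while ubuntu-latest may move to a newer release.

```yaml
name: R-CMD-check

on: [push, pull_request]

jobs:
  R-CMD-check:
    # Pinned so the runner matches the "noble" binaries in the repository URL
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        config:
          - {r: 'release'}
    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          Ncpus: 2
          cran: 'https://cran.devxy.io/amd64/noble/latest'

      # Assumption: pak (used by this action) installs from the repository
      # configured by setup-r above
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check

      - uses: r-lib/actions/check-r-package@v2
```

The matrix only exists so that the matrix.config references from the snippet resolve; for a single R version, r-version can be hard-coded instead.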

Container

An alternative is to run jobs in a containerized context. This spins up a container of the selected image inside the VM and allows running on distributions other than Ubuntu, e.g. Alpine. To avoid installing R and configuring a custom repository on every run, specialized images are provided that already have everything in place:

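jobs:
  container:
    runs-on: ubuntu-latest
    # Pre-built image with R and the binary repository already configured
    container: devxygmbh/r-alpine
    steps:
      # Check out the package sources so that DESCRIPTION is available
      - uses: actions/checkout@v4
      - run: |
          # Install pak from its own repository, matching OS and architecture
          R -q -e 'install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/devel/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))'
          # Show the configured repositories
          R -q -e 'getOption("repos")'
          # Install the dependencies listed in DESCRIPTION
          R -q -e 'options(Ncpus = 2); pak::local_install_dev_deps()'
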
Note

When running in a containerized context, the predefined actions from r-lib/actions cannot be used. Instead, shell and R commands must be called directly.

Note

Without caching, running containerized on Alpine can be substantially faster than running in a VM (about 38% less time in the comparison below).

Tip

Runtime comparison (91 packages):

  • Containerized: 1m 21s
  • VM: 2m 11s

Both runs were measured without caching. With caching enabled, both approaches should perform roughly the same.

Source: pat-s/workflow-compare
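In the VM setup, caching is controlled by the dependency installation step. As a sketch, assuming dependencies are installed with r-lib/actions/setup-r-dependencies@v2 (whose cache input defaults to true), an uncached run comparable to the timings above would disable it explicitly:

```yaml
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          # Default is true; false skips restoring and saving the package cache,
          # so every run installs all dependencies from the repository
          cache: false
```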

GitLab Runner

WIP
