Introducing go-apt-cacher and go-apt-mirror

go-apt-cacher is a caching reverse proxy designed specifically for Debian/Ubuntu repositories. Because it is written in Go, go-apt-cacher can handle thousands of concurrent client connections and is very fast.

go-apt-mirror is a mirroring tool for Debian/Ubuntu repositories, similar to apt-mirror. Its biggest advantage is that it never creates an incomplete or inconsistent mirror.

This article describes our background and motivation as well as the design of these tools. They are available at https://github.com/cybozu-go/aptutil.

Background

We run thousands of Ubuntu servers in our data centers. To deploy programs to these servers, we often build deb packages and upload them to a central in-house repository.

To distribute deb packages among our geographically distributed data centers, direct access to the central repository should be avoided for latency and bandwidth reasons. Instead, we relied on apt-cacher-ng, which proxies access to the central repository server and caches deb files.

For security reasons, we need to apply patches to these servers regularly. Although the patches come from the official Ubuntu repository, we take a snapshot of it so that we can test patches before applying them. We used apt-mirror for this purpose.

Problems

apt-cacher-ng is not very stable when used as a reverse proxy for HTTPS servers, especially when there are a lot of clients. It crashes randomly. We tried to debug it, but found that its internals are so complicated that it would take days to track down the bug.

apt-mirror sometimes ends up with a broken mirror. Broken here means that some files do not match the checksums provided by the APT indices. We consider this a design flaw of apt-mirror.

Our solution

Instead of fixing these tools, we implemented our own in Go. The result is two new tools, go-apt-cacher and go-apt-mirror. Thanks to Go, our tools are portable (they run even on non-Linux machines), fast, terse, and able to handle a large number of clients.

Both tools are designed and implemented to strictly verify downloaded files against the checksums listed in indices such as Release, Packages, or Sources.
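
The kind of check described above can be sketched in Go as follows. This is a minimal illustration, not the code used in aptutil: the function names verifySHA256 and verifyBytes are hypothetical, and it assumes the index supplies a size and a lowercase hex SHA-256 digest for each file.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// verifySHA256 consumes r and returns an error unless both the byte count
// and the SHA-256 digest match the values listed in an index file.
// (Hypothetical helper for illustration only.)
func verifySHA256(r io.Reader, wantSize int64, wantHex string) error {
	h := sha256.New()
	n, err := io.Copy(h, r)
	if err != nil {
		return err
	}
	if n != wantSize {
		return fmt.Errorf("size mismatch: got %d, want %d", n, wantSize)
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

// verifyBytes is a convenience wrapper for data already held in memory.
func verifyBytes(data []byte, wantSize int64, wantHex string) error {
	return verifySHA256(bytes.NewReader(data), wantSize, wantHex)
}

func main() {
	// SHA-256 of the three bytes "abc" (a standard test vector).
	const abc = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

	fmt.Println(verifyBytes([]byte("abc"), 3, abc) == nil) // intact file
	fmt.Println(verifyBytes([]byte("abd"), 3, abc) == nil) // corrupted file
}
```

Checking the size before comparing digests gives a clearer error for truncated downloads, which are a common way for a file to end up not matching its index entry.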

Features of go-apt-cacher include:

  • Checksum awareness
    go-apt-cacher recognizes APT indices and extracts checksum information.
    Cached files are invalidated and dropped when checksums are updated.
  • Non-HTTP cache semantics
    go-apt-cacher ignores cache-related HTTP headers.
    This is because updates to an APT repository can and should be detected through checksums.
  • LRU eviction
    Cached files are evicted in least-recently-used (LRU) fashion.
  • Many concurrent clients
    go-apt-cacher can accept thousands of concurrent connections from clients.
    The number of connections to the upstream servers can be limited.
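
As an illustration of the LRU eviction listed above, here is a minimal sketch of a size-bounded LRU index built on the standard container/list package. The type lruCache and its methods are hypothetical names chosen for this example; go-apt-cacher's actual data structures may well differ.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry records one cached file and its size in bytes.
type entry struct {
	path string
	size int64
}

// lruCache tracks cached files and evicts the least recently used ones
// once the total size exceeds capacity. (Hypothetical type for illustration.)
type lruCache struct {
	capacity int64
	used     int64
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // path -> element in order
}

func newLRUCache(capacity int64) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Touch marks path as recently used and reports whether it is cached.
func (c *lruCache) Touch(path string) bool {
	el, ok := c.items[path]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// Add inserts (or replaces) a file and returns the paths evicted to make room.
// The newly added file itself is never evicted, even if it exceeds capacity.
func (c *lruCache) Add(path string, size int64) []string {
	if el, ok := c.items[path]; ok {
		c.used -= el.Value.(*entry).size
		c.order.Remove(el)
	}
	c.items[path] = c.order.PushFront(&entry{path: path, size: size})
	c.used += size

	var evicted []string
	for c.used > c.capacity && c.order.Len() > 1 {
		e := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, e.path)
		c.used -= e.size
		evicted = append(evicted, e.path)
	}
	return evicted
}

func main() {
	c := newLRUCache(100)
	c.Add("a.deb", 40)
	c.Add("b.deb", 40)
	c.Touch("a.deb")                // a.deb is now more recent than b.deb
	fmt.Println(c.Add("c.deb", 40)) // b.deb is evicted to get back under 100
}
```

A real cache would also delete the evicted files from disk and guard the structure with a mutex, since many client connections touch it concurrently.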

Features of go-apt-mirror include:

  • Checksum awareness
    go-apt-mirror tests all downloaded files with checksums.
    It rolls back changes when a file does not match its checksum.
  • Atomic update
    Mirrors are updated atomically by using rename(2) with symbolic links.
    Unchanged files are reused as hard links for space- and time-efficiency.
  • Ultra-fast update
    go-apt-mirror detects updated files by comparing checksums saved with the mirror.
    This is generally much faster than rsync.
  • Parallel download
    go-apt-mirror downloads files in parallel.
  • Partial mirror
    go-apt-mirror can mirror repositories partially for specific distributions and/or architectures.
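
The atomic update technique listed above can be sketched as follows: build the new snapshot in its own directory, hard-link unchanged files from the previous snapshot, then swap a symbolic link with a single rename. The directory names (snap1, snap2, ubuntu) and the functions switchSymlink and demo are assumptions made for this example and do not reflect go-apt-mirror's actual on-disk layout. The sketch assumes a POSIX file system.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// switchSymlink points link at target atomically: it creates a temporary
// symlink next to link, then renames it over link. On POSIX systems the
// rename replaces the old symlink in one step, so readers always see
// either the old snapshot or the new one. (Hypothetical helper.)
func switchSymlink(target, link string) error {
	tmp := link + ".tmp"
	if err := os.Remove(tmp); err != nil && !os.IsNotExist(err) {
		return err
	}
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, link)
}

// demo builds two snapshots in a temporary directory, switches the public
// symlink from the first to the second, and returns the final link target.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "mirror-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	oldDir := filepath.Join(root, "snap1")
	newDir := filepath.Join(root, "snap2")
	for _, d := range []string{oldDir, newDir} {
		if err := os.Mkdir(d, 0o755); err != nil {
			return "", err
		}
	}
	oldFile := filepath.Join(oldDir, "a.deb")
	if err := os.WriteFile(oldFile, []byte("data"), 0o644); err != nil {
		return "", err
	}
	// An unchanged file is shared with the new snapshot as a hard link,
	// so it is neither downloaded nor copied again.
	if err := os.Link(oldFile, filepath.Join(newDir, "a.deb")); err != nil {
		return "", err
	}

	link := filepath.Join(root, "ubuntu")
	if err := switchSymlink("snap1", link); err != nil {
		return "", err
	}
	if err := switchSymlink("snap2", link); err != nil {
		return "", err
	}
	return os.Readlink(link)
}

func main() {
	dest, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dest)
}
```

Because the previous snapshot directory is left untouched until the rename succeeds, a failed download can be rolled back simply by deleting the new directory.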

Conclusion

go-apt-cacher and go-apt-mirror are not just re-implementations of apt-cacher-ng and apt-mirror in Go. They are designed for robustness and reliability.

They have already successfully replaced apt-cacher-ng and apt-mirror in our data centers. We have open-sourced them on GitHub in the hope that someone finds them useful.