Package Managers à la Carte

NJPLS @ Jane Street, 2025 May

Shiwei Weng

Johns Hopkins University

Logan Kostick

Johns Hopkins University

Michael Rushanan

Johns Hopkins University

Scott Smith

Johns Hopkins University

Package Managers are Ubiquitous but …

  • Real-world package managers are not easy to understand and reason about
    • whether they make mistakes, or users are just confused
  • Some popular languages may have too many package managers
    • e.g. Python has pip, conda, poetry, uv, but why?
  • Some languages may lack one
    • e.g. shell, cmake, menhir, dune, etc
    • e.g. LLM Prompts, λ-calculus or DSLs you have invented

Current State of Package Managers

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

— Greenspun’s Tenth Rule


It also applies to package managers.

Goals

  • Principled and modular (vs ad hoc)
  • Formally specified (vs informally-specified)
  • Sound and verified (vs bug-ridden)
  • Efficient and scalable (vs slow implementation)

Approach and Aim

  • Our approach
    • Distill a precise vocabulary for package managers (PKGM)
    • Understand and reason about their structures and operations
    • Build a proof-of-concept framework tola (Thousands of Languages)
  • Our aims
    • Hit the goals of the previous slide (principled/sound/efficient/scalable)
    • Allow users to derive PKGMs for their own languages
    • Help understand and address real-world PKGMs

Outline of the Talk

  • Vocabulary
    • Clarify terms like package name and version
  • Package Stores
    • We treat PKGMs as distributed “versioned-key”-value maps with explicit local and remote stores
  • Version Logic
    • A formal approach to defining dependencies and the operations on them that (nearly) every PKGM implements

What are a Package Name and a Version, Really?

  • foo
    • Package name foo
  • foo.1.0
    • Version 1.0
  • foo.1.1.bugfix
    • Version 1.1.bugfix
  • foo_1.1-2_amd64
    • Version 1.1-2
    • Platform amd64
      • less interesting when two platforms are compared

The Vocabulary Is Messy

  • Package managers don’t agree on common terms
    • e.g. package, project, library, application, module, workspace
  • Package managers use metaphors for terms
    • e.g. wheel, gem, switch, rack, keg, bottle
  • Other systems (build systems, compilers, etc.) also use these terms, but with subtly different meanings

Where are Packages Kept in the Real-World?

  • Stored local to the user
    • system-wise: in some (global) system directory
    • project-wise: near the project directory
  • Stored remotely, on some servers
  • Each of these locations (a local directory or a remote source) acts as an isolated place for packages
  • We name each such package place a Package Store

Local Package Stores in the Real-World

  • system-wise
    • opam switch create – creates a local isolated store
    • python -m venv – creates a virtual environment that packages are installed into
    • gem install --install-dir – local install path for RubyGems
  • project-wise
    • npm install – installs into node_modules/ in the project
    • bundle config set --local cache_path vendor/cache – caches resolved packages locally
    • brew --prefix <path> – local installs under custom prefix path

Remote Package Stores in the Real-World

  • Names for remote stores also vary widely
    • repository, index, registry, (gem) server, archive mirror, Personal Package Archive (PPA), tap
  • Official public stores
    • opam: opam repository (Git-based, versioned)
    • pip: Python Package Index (PyPI)
    • npm: npm registry
    • RubyGems: rubygems.org
    • dpkg+apt: Debian archive mirrors

Store Structures in Our Perspective

  • PKGMs are Distributed “Versioned-key”-value maps
    • Stores, whether local or remote, are heterogeneous: each makes its own structural choices
  • Multiplicity choice for key:
    • Unique: only one version per package in a store
    • vs Multi: multiple versions coexist
  • Structure choice for value:
    • Flat: packages are only put in a single directory without nesting
    • vs Nested: packages are organized in a hierarchical structure (nested directories)
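
The two choices above can be written down as data. The following is a minimal sketch, not tola's API: the type names `multiplicity`, `layout`, `store` and the functions `empty`, `versions_of`, `add` are hypothetical and only illustrate how a Unique store differs from a Multi store when a second version of the same package arrives.

```ocaml
(* Hypothetical types, not taken from tola: a store is a map keyed by
   (package name, version string). *)
type multiplicity = Unique | Multi
type layout = Flat | Nested

module Key = struct
  type t = string * string
  let compare = compare
end

module KMap = Map.Make (Key)

type 'v store = {
  multiplicity : multiplicity;
  layout : layout;
  entries : 'v KMap.t;
}

let empty multiplicity layout = { multiplicity; layout; entries = KMap.empty }

(* Versions of [name] currently present in the store, in ascending key order. *)
let versions_of name store =
  KMap.fold
    (fun (n, v) _ acc -> if n = name then v :: acc else acc)
    store.entries []
  |> List.rev

(* Add a package. A Unique store refuses a second version of the same name;
   a Multi store lets versions coexist. *)
let add (name, version) value store =
  match store.multiplicity, versions_of name store with
  | Unique, v :: _ when v <> version -> Error (`Already_installed v)
  | _ ->
    Ok { store with entries = KMap.add (name, version) value store.entries }
```

The `layout` field is carried along but not interpreted here; in a real store it would decide how keys map to directories.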

Stores of Package Managers

| PKGM | Target | Local Store | Local Structure | Remote Store | Remote Structure |
|---|---|---|---|---|---|
| opam | OCaml | switch | unique flat | opam repository | multi flat |
| pip | Python | virtual environment | unique flat | Python Package Index (PyPI) | multi flat |
| npm | Node.js | node_modules | unique nested | npm registry | multi flat |
| RubyGem | Ruby | System-wide directory | multi flat | RubyGems.org | multi flat |
| bundler | Ruby | vendor/bundle | unique flat | RubyGems.org | multi flat |
| dpkg+apt | Debian packages | System-wide directory | multi flat | Debian package repository | multi flat |

  • Most remote stores use a multi flat structure
  • npm uses a nested local store, which used to cause issues
  • RubyGem uses a multi local store, which causes many problems
  • dpkg’s multi local store requires switching symbolic links via update-alternatives

tola Code for Stores

(* In Spec.config *)
type store_kind = 
  Directory | Git_repo
type store_position = 
  Local | Remote

(* In Store *)
module Pkg_store = 
  Store.File_store_make (Package)

type t = {
  config : Spec.config;
  local_store : Pkg_store.t;
  remote_stores : Pkg_store.t list;
  ...
}

Versions in the Real-World

  • A version is a string — but often has internal structure
    • e.g. SemVer 1.3.2 <==> {major=1; minor=3; patch=2}
    • Each segment can be compared; precedence differs by position
  • Different ecosystems define different version schemes:
    • SemVer: MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]
      • build metadata does not affect precedence
    • Debian: [epoch:]upstream_version[-debian_revision]
      • Fields like Architecture: amd64 are defined elsewhere
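
As a small illustration of "a string with internal structure", here is a sketch of parsing and comparing SemVer versions. The names `semver`, `segment`, `parse` and `compare_semver` are made up for this example, and pre-release tags are deliberately left out.

```ocaml
(* Hypothetical SemVer record; pre-release tags are out of scope here. *)
type semver = { major : int; minor : int; patch : int }

(* Accept only non-empty runs of decimal digits. *)
let segment (s : string) : int option =
  let digits = ref (s <> "") in
  String.iter (fun c -> if c < '0' || c > '9' then digits := false) s;
  if !digits then int_of_string_opt s else None

let parse (s : string) : semver option =
  (* Build metadata after '+' does not affect precedence, so it is dropped. *)
  let core = List.hd (String.split_on_char '+' s) in
  match List.map segment (String.split_on_char '.' core) with
  | [ Some major; Some minor; Some patch ] -> Some { major; minor; patch }
  | _ -> None

(* Earlier segments take precedence, and segments compare as numbers,
   so 1.10.0 is later than 1.9.9. *)
let compare_semver (a : semver) (b : semver) : int =
  compare (a.major, a.minor, a.patch) (b.major, b.minor, b.patch)
```

Comparing the raw strings instead would order "1.10.0" before "1.9.9", which is why the structure matters.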

Advanced Comparators in Package Managers

| PKGM | Version Scheme | Advanced Comparators |
|---|---|---|
| opam | Debian versioning | conflicts, conflict-class, virtual package |
| pip | version specifiers | ~= (compatible), === (arbitrary equality) |
| npm | semantic versioning | ~ (tilde), ^ (caret) |
| RubyGem | semantic versioning | ~> (pessimistic) |
| dpkg+apt | Debian versioning | Breaks, Conflicts |


  • Package managers define advanced comparators
    • some are based on version schemes (e.g., ~, ~>, ^),
    • but others are based on dependency relations (e.g., conflicts, breaks)

Examples for Advanced Comparators


| Name | Cmp | PKGM | Meaning | Example Input | Example Match Versions |
|---|---|---|---|---|---|
| Compatible release | ~= | pip | Allows patch updates within same minor | ~=1.2.3 | 1.2.3, 1.2.9, not 1.3.0 |
| | | | Allows minor updates within same major | ~=1.2 | 1.2.0, 1.9.9 💥, not 2.0.0 |
| | | | (invalid) 💥 | ~=1 | - |
| Exact match | === | pip | Exact string match | ===1.2.3 | Only 1.2.3 |
| Tilde | ~ | npm | Allows patch updates within same minor | ~1.2.3 | 1.2.3, 1.2.9, not 1.3.0 |
| | | | | ~1.2 | 1.2.0, 1.2.9 💥, not 1.3.0 |
| | | | | ~1 | 1.0.0, 1.9.9, not 2.0.0 |
| Caret | ^ | npm | Allows minor updates within same major | ^1.2.3 | 1.2.3, 1.9.9, not 2.0.0 |
| Pessimistic Operator | ~> | RubyGem | Allows patch updates within same minor | ~> 1.2.3 | 1.2.3, 1.2.9, not 1.3.0 |
| | | | Allows minor updates within same major | ~> 1.2 | 1.2.0, 1.9.9 💥, not 2.0.0 |
| | | | may surprise some people - gem guide | ~> 1 | 1.0.0, 1.9.9, not 2.0.0 |


  • pip’s compatible ~=, npm’s tilde ~, and gem’s pessimistic ~> look similar, but
    • They are only informally specified, through documentation and implementation
    • They differ on ~= 1.2, ~ 1.2, ~> 1.2 💥 and differ on ~= 1, ~ 1, ~> 1 💥
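
One way to make the differences precise is to read each comparator as a half-open interval `lower <= v < upper`. The sketch below does that for pip's `~=`, npm's `~` and RubyGems' `~>`, following my reading of each tool's documentation. All names (`range`, `bump_prefix`, `pip_compatible`, `npm_tilde`, `gem_pessimistic`, `matches`) are hypothetical, and pre-release versions are ignored.

```ocaml
type version = int list                             (* 1.2.3 is [1; 2; 3] *)
type range = { lower : version; upper : version }   (* lower <= v < upper *)

(* Missing trailing segments read as 0, so [1; 2] equals [1; 2; 0]. *)
let rec compare_version (a : version) (b : version) : int =
  match a, b with
  | [], [] -> 0
  | [], _ -> compare_version [ 0 ] b
  | _, [] -> compare_version a [ 0 ]
  | x :: xs, y :: ys -> if x <> y then compare x y else compare_version xs ys

(* Drop the last segment and increment the new last one: [1;2;3] -> [1;3]. *)
let bump_prefix (v : version) : version option =
  match List.rev v with
  | _ :: p :: rest -> Some (List.rev ((p + 1) :: rest))
  | _ -> None

(* pip ~= : needs at least two segments, so ~=1 is rejected. *)
let pip_compatible (v : version) : range option =
  Option.map (fun upper -> { lower = v; upper }) (bump_prefix v)

(* gem ~> : like pip, but a single segment is accepted (~> 1 is < 2). *)
let gem_pessimistic (v : version) : range option =
  match v, bump_prefix v with
  | _, Some upper -> Some { lower = v; upper }
  | [ major ], None -> Some { lower = v; upper = [ major + 1 ] }
  | _, None -> None

(* npm ~ : bumps the minor when one is given, otherwise the major. *)
let npm_tilde (v : version) : range option =
  match v with
  | [ major ] -> Some { lower = v; upper = [ major + 1 ] }
  | major :: minor :: _ -> Some { lower = v; upper = [ major; minor + 1 ] }
  | [] -> None

let matches (r : range) (v : version) : bool =
  compare_version r.lower v <= 0 && compare_version v r.upper < 0
```

Under this reading, the two-segment input 1.2 admits 1.9.9 for pip and RubyGems but not for npm, and the one-segment input 1 is valid for npm and RubyGems but not for pip.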

Version Logic: A Formal Foundation

  • Captures dependency constraints using logical formulas
  • Clarifies the behavior of version comparators and resolution rules
  • Supports deterministic and composable resolution

Version Logic is Based on Boolean Logic

| Concept | Boolean Logic | Version Logic |
|---|---|---|
| Literals | true, false | extend with version values (e.g., 1.2.3) |
| Variables | propositional variables (P, Q) | package names (foo, bar) |
| Formulas | P ∧ Q, ¬P ∨ Q | foo >= 1.0, foo.major <= bar.major (version ops) |
| Connectives | ∧, ∨, ¬, ⇒ | (advanced comparators) |
| Satisfiability | whether formula is true under some assignment | whether a version solution exists |
| Assignment | [P = true, Q = false] | [foo = 1.1, bar = 1.2] |
| Solve | solve(exp); MaxSAT.solve(exp) | solve_max(exp, f) |
| Determinism | arbitrary assignment | resolves priorities |

Resolving constraints with solve_max

solve_max(𝜑,f) finds a satisfying assignment 𝜑= for the constraints 𝜑 that maximizes f.

  • Returns SAT(𝜑=) or UNSAT.
  • f takes an assignment 𝜑= and returns an (integer) score.

It is not MaxSAT (which maximizes the number of satisfied clauses); it is closer to an optimizing SAT solver, which finds a satisfying assignment that maximizes a scoring function.
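
To make the contract of solve_max concrete, here is a brute-force sketch over finite candidate lists. It is not how tola or any real solver works: the `formula` fragment, the `domains` argument (candidate versions per package) and the `outcome` type are assumptions introduced for illustration, and versions are assumed to have the same number of segments so that OCaml's structural `compare` orders them correctly.

```ocaml
type version = int list
type assignment = (string * version) list        (* package name -> version *)

(* A small fragment of version logic, hypothetical in its exact shape. *)
type formula =
  | Eq of string * version
  | Ge of string * version
  | And of formula * formula
  | Or of formula * formula
  | Not of formula
  | Implies of formula * formula

let rec eval (a : assignment) = function
  | Eq (p, v) -> List.assoc_opt p a = Some v
  | Ge (p, v) ->
    (match List.assoc_opt p a with Some w -> compare w v >= 0 | None -> false)
  | And (f, g) -> eval a f && eval a g
  | Or (f, g) -> eval a f || eval a g
  | Not f -> not (eval a f)
  | Implies (f, g) -> (not (eval a f)) || eval a g

type outcome = Sat of assignment | Unsat

(* Every way of picking one candidate version, or none, per package. *)
let rec enumerate = function
  | [] -> [ [] ]
  | (p, versions) :: rest ->
    let tails = enumerate rest in
    List.concat_map
      (fun tail -> tail :: List.map (fun v -> (p, v) :: tail) versions)
      tails

(* Keep the satisfying assignment with the highest score; the first one
   found wins ties. *)
let solve_max ~domains (phi : formula) (f : assignment -> int) : outcome =
  let best =
    List.fold_left
      (fun best a ->
        if not (eval a phi) then best
        else
          match best with
          | Some b when f b >= f a -> best
          | _ -> Some a)
      None (enumerate domains)
  in
  match best with Some a -> Sat a | None -> Unsat
```

The enumeration is exponential in the number of packages, so this only serves to pin down what a result means: satisfying first, then best under f.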

Dependency Resolution

  • Given a list of dependencies (a.k.a. constraints, i.e. a version logic formula), ask for a preferred satisfying assignment
  • Common Resolving Preferences
    • Upgrade most packages locally installed
    • Make the least change for locally installed packages
    • Allow downgrade for locally installed packages
    • Relaxed, make a best effort to fulfill user commands
  • A Resolving Policy is a list of ordered Resolving Preferences
    • A PKGM invokes solve_max(𝜑,f) with the 𝜑 and f that a preference prescribes
    • 𝜑 may differ between preferences, not just f

Scoring Function f Candidates

  • f = count_max(𝜑=, S):
    • S is a package store
    • # of packages at the max version (typically in remote store \(S^R\))
  • f = count_in_store(𝜑=, S):
    • # of assigned versions that exist in local store \(S^l_0\)
  • f = count_fulfilled(𝜑=, \(N^I\), \(N^D\)):
    • # of install(\(N^I\))/uninstall (\(N^D\)) commands fulfilled
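
The three candidates can be sketched as plain functions over an assignment. The representation below (assignments and stores as association lists, install and uninstall requests as lists of names) is an assumption made for this example and not tola's.

```ocaml
type version = int list
type assignment = (string * version) list
type store = (string * version list) list   (* package name -> versions held *)

(* Largest version in a list; [] stands for "no version". *)
let max_version (versions : version list) : version =
  List.fold_left max [] versions

(* count_max: assigned packages that sit at the store's maximum version. *)
let count_max (a : assignment) (s : store) : int =
  List.length
    (List.filter
       (fun (p, v) ->
         match List.assoc_opt p s with
         | Some (_ :: _ as vs) -> v = max_version vs
         | _ -> false)
       a)

(* count_in_store: assigned versions that already exist in store [s]. *)
let count_in_store (a : assignment) (s : store) : int =
  List.length
    (List.filter
       (fun (p, v) ->
         match List.assoc_opt p s with
         | Some vs -> List.mem v vs
         | None -> false)
       a)

(* count_fulfilled: install requests that end up assigned, plus uninstall
   requests that end up absent. *)
let count_fulfilled (a : assignment) ~install ~uninstall : int =
  let present p = List.mem_assoc p a in
  List.length (List.filter present install)
  + List.length (List.filter (fun p -> not (present p)) uninstall)
```

Each function returns an integer score, so any of them can be passed as the f of solve_max once its store or request arguments are fixed.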

Encoding Package Stores

%%{ init: {
  'themeVariables': {
    'edgeLabelBackground': 'transparent',
    "fontSize": "20px",
    "fontFamily": "monospace",
    'textAlign': 'center',
    'wrap': true
    }
} }%%
flowchart LR
  subgraph FOO [foo]
    direction TB
    foo20[ ]:::invisible
    foo11["foo 1.1"]
    foo10["foo 1.0"]
  end

  subgraph STD [std]
    direction TB
    class STD subgraphStyle;
    std20["std 2.0"]
    std11["std 1.1"]
    std10["std 1.0"]
  end

  subgraph BAR [bar]
    direction TB
    bar20["bar 2.0"]
    bar11[ ]:::invisible
    bar10["bar 1.0"]
  end
  
  FOO ~~~ STD
  STD ~~~ BAR
  foo10 ~~~ std10 ~~~ bar10
  %% foo11 ~~~ std11 ~~~ bar11
  %% foo20 ~~~ std20 ~~~ bar20
  foo10 -->|dep| std10
  foo11 -.->|either| std10
  foo11 -.->|either| std11
  bar10 -->|dep| std10
  bar20 -->|dep| std20
  linkStyle default stroke:#222, color:#000, background:#fff
  classDef subgraphStyle fill:#fff, stroke:#0af, stroke-width:2px, color:#000
  class FOO,STD,BAR subgraphStyle
  classDef invisible fill:#fff, stroke:#fff, stroke-width:0px
  classDef foo fill:#e0f8e0, stroke:#0af, stroke-width:2px,color:#000
  classDef std fill:#e0eaff, stroke:#0af, stroke-width:2px,color:#000
  classDef bar fill:#ffe0e0, stroke:#0af, stroke-width:2px,color:#000
  class foo10,foo11 foo
  class std10,std11,std20 std
  class bar10,bar20 bar

\[ \begin{aligned} \varphi_s = \text{store_to_constraint}(S^R) =\quad &\bigwedge \left\{ \begin{aligned} &(\text{foo} = 1.0) \Rightarrow (\text{std} = 1.0), \\ &(\text{foo} = 1.1) \Rightarrow ((\text{std} = 1.0) \lor (\text{std} = 1.1)), \\ &(\text{bar} = 1.0) \Rightarrow (\text{std} = 1.0), \\ &(\text{bar} = 2.0) \Rightarrow (\text{std} = 2.0) \end{aligned} \right\} \end{aligned} \]
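
The encoding above can be generated mechanically from the store's dependency metadata. The following sketch assumes a hypothetical `entry` record (package, version, and the versions accepted for each dependency) and a small `formula` type; neither is taken from tola.

```ocaml
type version = int list

type formula =
  | Eq of string * version
  | And of formula list
  | Or of formula list
  | Implies of formula * formula

(* One store entry: a package version and, per dependency, the versions
   of that dependency it accepts. *)
type entry = {
  name : string;
  version : version;
  deps : (string * version list) list;
}

(* (name = version) => each dependency takes one of its accepted versions. *)
let entry_to_constraint (e : entry) : formula =
  let one_dep (dep, accepted) =
    match List.map (fun v -> Eq (dep, v)) accepted with
    | [ single ] -> single
    | alternatives -> Or alternatives
  in
  let body = match List.map one_dep e.deps with [ d ] -> d | ds -> And ds in
  Implies (Eq (e.name, e.version), body)

(* Entries without dependencies contribute no implication. *)
let store_to_constraint (entries : entry list) : formula =
  And
    (List.map entry_to_constraint
       (List.filter (fun e -> e.deps <> []) entries))
```

Applied to the store in the diagram, this yields the four implications of \(\varphi_s\), with the "either" edges of foo 1.1 becoming a disjunction.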

Encoding Example for Preference upgrade_most

Pre

%% Mermaid diagram generator for resolution steps
%% This version represents the diagram plus highlight states for resolution logic

%%{ init: {
  'themeVariables': {
    'edgeLabelBackground': 'transparent',
    "fontSize": "20px",
    "fontFamily": "monospace",
    'textAlign': 'center',
    'wrap': true
    }
} }%%
flowchart LR
  subgraph FOO [foo]
    direction TB
    foo20[ ]:::invisible
    foo11["foo 1.1"]
    foo10["foo 1.0"]
  end

  subgraph STD [std]
    direction TB
    std20["std 2.0"]
    std11["std 1.1"]
    std10["std 1.0"]
  end

  subgraph BAR [bar]
    direction TB
    bar20["bar 2.0"]
    bar11[ ]:::invisible
    bar10["bar 1.0"]
  end

  FOO ~~~ STD
  STD ~~~ BAR
  foo10 ~~~ std10 ~~~ bar10
  %% foo11 ~~~ std11 ~~~ bar11
  %% foo20 ~~~ std20 ~~~ bar20
  foo10 -->|dep| std10
  foo11 -.->|either| std10
  foo11 -.->|either| std11
  bar10 -->|dep| std10
  bar20 -->|dep| std20

  linkStyle default stroke:#222, color:#000, background:#fff

  %% Base category styles
  classDef foo fill:#e0f8e0, stroke:#ccc, stroke-width:2px, color:#000;
  classDef std fill:#e0eaff, stroke:#ccc, stroke-width:2px, color:#000;
  classDef bar fill:#ffe0e0, stroke:#ccc, stroke-width:2px, color:#000;

  %% Role-based overlays
  classDef selectedLocal stroke:#0077cc, stroke-width:4px;
  classDef selectedInstall stroke:#f90, stroke-width:4px;

  %% Inactive nodes
  classDef faded fill:#f5f5f5, stroke:#ccc, color:#aaa;

  %% Subgraph hint for to-install selection
  classDef subgraphStyle fill:#fff, stroke:#0af, stroke-width:2px, color:#000
  classDef invisible fill:transparent, stroke:none, color:transparent
  classDef selectedGroup fill:#fff2cc, stroke:#f90, stroke-width:4px, stroke-dasharray: 6 3;

  %% Assign base categories (default for all nodes)
  class foo10,foo11 foo
  class std10,std11,std20 std
  class bar10,bar20 bar


  %% State: Initial: std1.0 installed, selecting foo

  class foo10 faded

  class foo11 faded

  class std10 selectedLocal

  class std11 faded

  class std20 faded

  class bar10 faded

  class bar20 faded


  class FOO selectedGroup

  class STD subgraphStyle

  class BAR subgraphStyle

Post

%% Mermaid diagram generator for resolution steps
%% This version represents the diagram plus highlight states for resolution logic

%%{ init: {
  'themeVariables': {
    'edgeLabelBackground': 'transparent',
    "fontSize": "20px",
    "fontFamily": "monospace",
    'textAlign': 'center',
    'wrap': true
    }
} }%%
flowchart LR
  subgraph FOO [foo]
    direction TB
    foo20[ ]:::invisible
    foo11["foo 1.1"]
    foo10["foo 1.0"]
  end

  subgraph STD [std]
    direction TB
    std20["std 2.0"]
    std11["std 1.1"]
    std10["std 1.0"]
  end

  subgraph BAR [bar]
    direction TB
    bar20["bar 2.0"]
    bar11[ ]:::invisible
    bar10["bar 1.0"]
  end

  FOO ~~~ STD
  STD ~~~ BAR
  foo10 ~~~ std10 ~~~ bar10
  %% foo11 ~~~ std11 ~~~ bar11
  %% foo20 ~~~ std20 ~~~ bar20
  foo10 -->|dep| std10
  foo11 -.->|either| std10
  foo11 -.->|either| std11
  bar10 -->|dep| std10
  bar20 -->|dep| std20

  linkStyle default stroke:#222, color:#000, background:#fff

  %% Base category styles
  classDef foo fill:#e0f8e0, stroke:#ccc, stroke-width:2px, color:#000;
  classDef std fill:#e0eaff, stroke:#ccc, stroke-width:2px, color:#000;
  classDef bar fill:#ffe0e0, stroke:#ccc, stroke-width:2px, color:#000;

  %% Role-based overlays
  classDef selectedLocal stroke:#0077cc, stroke-width:4px;
  classDef selectedInstall stroke:#f90, stroke-width:4px;

  %% Inactive nodes
  classDef faded fill:#f5f5f5, stroke:#ccc, color:#aaa;

  %% Subgraph hint for to-install selection
  classDef subgraphStyle fill:#fff, stroke:#0af, stroke-width:2px, color:#000
  classDef invisible fill:transparent, stroke:none, color:transparent
  classDef selectedGroup fill:#fff2cc, stroke:#f90, stroke-width:4px, stroke-dasharray: 6 3;

  %% Assign base categories (default for all nodes)
  class foo10,foo11 foo
  class std10,std11,std20 std
  class bar10,bar20 bar


  %% State: Final: std1.1, foo1.1 installed

  class foo10 faded

  class foo11 selectedInstall

  class std10 selectedLocal

  class std11 selectedInstall

  class std20 faded

  class bar10 faded

  class bar20 faded


  class FOO subgraphStyle

  class STD subgraphStyle

  class BAR subgraphStyle

Initial state: std 1.0.

Action: install foo.

Policy: upgrade_most

Solution:

\[ \text{solve_max}(\varphi_{o} \land \varphi_{i} \land \varphi_s,\ \texttt{count_max}(\cdot,\ S)) \]

Where:

  • \(\varphi_s\) is the store-to-constraint for all store packages
  • \(\texttt{count_max}\) rewards assignments using latest versions from remote stores
  • \(S\) is the remote store
  • \(\varphi_{o} = \text{no_downgrade}(S^l_0) =\ (\texttt{std} \geq 1.0)\)
  • \(\varphi_{i} = \text{ensure_install}(N^I_0) =\ (\texttt{foo} > 0)\)
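
The example is small enough to check by exhaustive enumeration. The sketch below hard-codes the three packages from the diagram and encodes \(\varphi_s\), \(\varphi_o\) and \(\varphi_i\) as OCaml predicates; the record type `assignment`, reading \(\texttt{foo} > 0\) as "foo is present", and the brute-force search are all assumptions of this illustration rather than tola's implementation.

```ocaml
type version = int * int                          (* (major, minor) *)

(* None means "not installed". *)
type assignment = {
  foo : version option;
  std : version option;
  bar : version option;
}

(* Remote store from the diagram. *)
let foo_versions = [ (1, 0); (1, 1) ]
let std_versions = [ (1, 0); (1, 1); (2, 0) ]
let bar_versions = [ (1, 0); (2, 0) ]

let implies p q = (not p) || q

(* phi_s: the dependency edges of the store. *)
let phi_s a =
  implies (a.foo = Some (1, 0)) (a.std = Some (1, 0))
  && implies (a.foo = Some (1, 1)) (a.std = Some (1, 0) || a.std = Some (1, 1))
  && implies (a.bar = Some (1, 0)) (a.std = Some (1, 0))
  && implies (a.bar = Some (2, 0)) (a.std = Some (2, 0))

(* phi_o = no_downgrade: std stays at 1.0 or later. *)
let phi_o a = match a.std with Some v -> v >= (1, 0) | None -> false

(* phi_i = ensure_install: foo must be present. *)
let phi_i a = a.foo <> None

let phi a = phi_o a && phi_i a && phi_s a

(* count_max: packages assigned the newest version in the remote store. *)
let count_max a =
  let at_max v versions = (v = Some (List.fold_left max (0, 0) versions)) in
  List.length
    (List.filter Fun.id
       [ at_max a.foo foo_versions;
         at_max a.std std_versions;
         at_max a.bar bar_versions ])

let options versions = None :: List.map (fun v -> Some v) versions

let candidates =
  List.concat_map
    (fun foo ->
      List.concat_map
        (fun std ->
          List.map (fun bar -> { foo; std; bar }) (options bar_versions))
        (options std_versions))
    (options foo_versions)

let solutions = List.filter phi candidates
let best_score = List.fold_left (fun m a -> max m (count_max a)) min_int solutions
let best = List.filter (fun a -> count_max a = best_score) solutions
```

In this sketch every maximizer installs foo 1.1, and the assignment foo 1.1 with std 1.1 is among them. count_max alone does not separate it from the maximizers that keep std 1.0, so the sketch stops short of modelling how such ties are broken.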

Dependency Resolution within Version Logic

  • Can be defined as a call to solve_max(deps, scoring_function)
    • Resolving priorities are defined within this approach as well
  • Differs from existing PKGMs, which allow external solvers or a solving library such as openSUSE/libsolv
    • Version Logic starts from precise, principled definitions

Recap

  • Vocabulary
  • Package managers are Distributed “Versioned-key”-value Maps
    • Local and remote stores with designated key-value structure
  • Version Logic provides a formal logic of version constraints
  • tola code has batteries-included modules and functors for the above
    • Derive your choice of package manager components à la carte

Thanks

Q & A

 

Ongoing Work

  • Package Soundness
    • Version compatibility and dependencies are only syntactic guarantees; how can we reach semantic safety?
    • Robin Milner [Milner 1978] on Type Soundness: “Well-typed programs cannot ‘go wrong’.”
      • It connects two systems: a type system and a run-time system
      • spectrum: gradual typing, semantic typing
    • “Well-packaged programs go?”
      • three systems: a PKGM, a compiler/interpreter, and a run-time system