Writing Reproducible Python
Last updated on 2026-02-10
Overview
Questions
- What is the difference between the standard library and third-party packages?
- How do I share a script so that it runs on someone else’s computer?
Objectives
- Import and use the pathlib standard library.
- Identify when a script requires external dependencies (like numpy).
- Write a self-contained script that declares its own dependencies using inline metadata.
- Share a script which reproducibly handles conda dependencies alongside Python.
The Humble Script
Most research software starts as a single file. You have some data, you need to analyze it, and you write a sequence of commands to get the job done.
Let’s start by creating a script that generates some data and saves
it. We will use the standard library module pathlib to handle file paths safely across
operating systems (Windows/macOS/Linux).
PYTHON
import random
from pathlib import Path

# Define output directory
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)


def generate_trajectory(n_steps=100):
    print(f"Generating trajectory with {n_steps} steps...")
    path = [0.0]
    for _ in range(n_steps):
        # Random walk step
        step = random.uniform(-0.5, 0.5)
        path.append(path[-1] + step)
    return path


if __name__ == "__main__":
    traj = generate_trajectory()
    output_file = DATA_DIR / "trajectory.txt"
    with open(output_file, "w") as f:
        for point in traj:
            f.write(f"{point}\n")
    print(f"Saved to {output_file}")
This script uses only built-in modules (random, pathlib). You can send this file to anyone with Python installed, and it will run.
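For example, assuming you saved the file as trajectory.py (the name is illustrative), nothing beyond a stock Python interpreter is required:
BASH
python trajectory.py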
The Need for External Libraries
Standard Python is powerful, but for scientific work, we almost
always need the “Scientific Stack”: numpy,
pandas/polars, or matplotlib.
Let’s modify our script to calculate statistics using numpy.
PYTHON
from pathlib import Path

import numpy as np  # new dependency!

DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)


def generate_trajectory(n_steps=100):
    # Use numpy for efficient array generation
    steps = np.random.uniform(-0.5, 0.5, n_steps)
    trajectory = np.cumsum(steps)
    return trajectory


if __name__ == "__main__":
    traj = generate_trajectory()
    print(f"Mean position: {np.mean(traj):.4f}")
    print(f"Std Dev: {np.std(traj):.4f}")
The Dependency Problem
If you send this updated file to a colleague who just installed Python, what happens when they run it?
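On a fresh Python installation with no third-party packages, the import fails before any of the code runs. The exact wording varies by version, but the traceback will look something like this (again using the illustrative filename trajectory.py):
OUTPUT
Traceback (most recent call last):
  File "trajectory.py", line 3, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'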
The Modern Solution: PEP 723 Metadata
Traditionally, you would send a requirements.txt file alongside your script, or
leave comments in the script, or try to add documentation in an
email.
But files get separated, and versions get desynchronized.
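In practice, that traditional workflow looks something like this (the filename trajectory.py is illustrative):
BASH
# Author side: record the dependency
echo "numpy" > requirements.txt

# Recipient side: rebuild the environment by hand
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python trajectory.py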
PEP
723 is a Python standard that allows you to embed
dependency information directly into the script file. Tools like uv (a fast Python package manager) can read this
header and automatically set up the environment for you.

We can add a special comment block at the top of our script:
PYTHON
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "numpy",
# ]
# ///
import numpy as np

print("Hello, I don't crash anymore...")
# ... rest of script ...
Now, instead of manually installing numpy, you run the script using uv:
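BASH
# Filename is illustrative; uv picks up the inline metadata automatically
uv run trajectory.py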
When you run this command:
- uv reads the metadata block.
- It creates a temporary, isolated virtual environment.
- It installs the specified version of numpy.
- It executes the script.
This guarantees that anyone with uv installed can run your script immediately, without messing up their own Python environments.
Beyond Python: The pixibang
PEP 723 is fantastic for installable Python packages (most often, things you can find on PyPI). However, for scientific software, we often rely on compiled binaries and libraries that are not Python packages: things like LAMMPS, GROMACS, or eOn, a server-client tool for exploring the potential energy surfaces of atomistic systems.
If your script needs to run a C++ binary, pip and uv cannot easily help you. This is where pixi comes in. pixi is a package manager built on the conda ecosystem, so it can install both Python packages and compiled binaries. We can use a “pixibang” script to effectively replicate the PEP 723 experience, but for the entire system stack.
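To see what that means in isolation, here is a minimal sketch of pixi exec on its own: it builds a throwaway conda environment containing the requested package and runs a command inside it. (ffmpeg is just a stand-in for any non-Python binary.)
BASH
# Run a compiled tool without installing it globally
pixi exec --spec ffmpeg -- ffmpeg -version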
Example: Running minimizations with eOn and PET-MAD
Let’s write a script that drives a geometry minimization¹. This requires:
- Metatrain/Torch: for the machine learning potential.
- rgpycrumbs: for helper utilities.
- eOn Client: the compiled C++ binary that actually performs the minimization.
First, we need to create the input geometry file pos.con in our directory:
BASH
cat << 'EOF' > pos.con
Generated by ASE
preBox_header_2
25.00 25.00 25.00
90.00 90.00 90.00
postBox_header_1
postBox_header_2
4
2 1 2 4
12.01 16.00 14.01 1.01
C
Coordinates of Component 1
11.04 11.77 12.50 0 0
12.03 10.88 12.50 0 1
O
Coordinates of Component 2
14.41 13.15 12.44 0 2
N
Coordinates of Component 3
13.44 13.86 12.46 0 3
12.50 14.51 12.49 0 4
H
Coordinates of Component 4
10.64 12.19 13.43 0 5
10.59 12.14 11.58 0 6
12.49 10.52 13.42 0 7
12.45 10.49 11.57 0 8
EOF
Now, create the script eon_min.py. Note
the shebang line!
PYTHON
#!/usr/bin/env -S pixi exec --spec eon --spec uv -- uv run
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "ase",
#     "metatrain",
#     "rgpycrumbs",
# ]
# ///
import subprocess
from pathlib import Path

from rgpycrumbs.eon.helpers import write_eon_config
from rgpycrumbs.run.jupyter import run_command_or_exit

# Download the PET-MAD checkpoint and export it with metatrain (mtt)
repo_id = "lab-cosmo/upet"
tag = "v1.1.0"
url_path = f"models/pet-mad-s-{tag}.ckpt"
fname = Path(url_path.replace(".ckpt", ".pt"))
url = f"https://huggingface.co/{repo_id}/resolve/main/{url_path}"

fname.parent.mkdir(parents=True, exist_ok=True)
subprocess.run(
    [
        "mtt",
        "export",
        url,
        "-o",
        fname,
    ],
    check=True,
)
print(f"Successfully exported {fname}.")

# Configure eOn and run the client binary
min_settings = {
    "Main": {"job": "minimization", "random_seed": 706253457},
    "Potential": {"potential": "Metatomic"},
    "Metatomic": {"model_path": fname.absolute()},
    "Optimizer": {
        "max_iterations": 2000,
        "opt_method": "lbfgs",
        "max_move": 0.5,
        "converged_force": 0.01,
    },
}
write_eon_config(".", min_settings)
run_command_or_exit(["eonclient"], capture=True, timeout=300)
Make it executable and run it:
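BASH
chmod +x eon_min.py
./eon_min.py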
Unpacking the Shebang
The magic happens in this line: #!/usr/bin/env -S pixi exec --spec eon --spec uv -- uv run
This is a chain of tools. (The -S flag tells env to split the rest of the line into separate arguments; without it, a shebang passes everything after the interpreter as a single string.)
- pixi exec: creates a temporary environment with pixi.
- --spec eon: explicitly requests the eon package (which contains the eonclient binary).
- --spec uv: explicitly requests uv.
- -- uv run: once the outer environment exists with eOn and uv, control is handed to uv run.
- PEP 723: uv run reads the script comments and installs the Python libraries (ase, metatrain, rgpycrumbs).
This gives us the best of both worlds: pixi provides the
compiled binaries, and uv handles the fast Python
resolution.

The Result
When executed, the script downloads the model, exports it using metatrain, configures eOn, and runs the
binary.
OUTPUT
[INFO] - Using best model from epoch None
[INFO] - Model exported to '.../models/pet-mad-s-v1.1.0.pt'
Successfully exported models/pet-mad-s-v1.1.0.pt.
Wrote eOn config to 'config.ini'
EON Client
VERSION: 01e09a5
...
[Matter] 0 0.00000e+00 1.30863e+00 -53.90300
[Matter] 1 1.46767e-02 6.40732e-01 -53.91548
...
[Matter] 51 1.56025e-03 9.85039e-03 -54.04262
Minimization converged within tolerence
Saving result to min.con
Final Energy: -54.04261779785156
Challenge: The Pure Python Minimization
Create a script named ase_min.py that performs the exact same minimization on pos.con, but uses the built-in LBFGS optimizer from the Atomic Simulation Environment (ASE) instead of eOn.
Requirements:
- Do we need pixi? Try using the uv shebang only (no pixi).
- Reuse the model file we exported earlier (models/pet-mad-s-v1.1.0.pt).
- Compare the “User Time” of this script vs the eOn script (a timing sketch follows the hint).
Hint: You will need the metatomic package to load the potential in ASE.
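One minimal way to compare the two scripts, assuming both are executable in the current directory, is the shell’s time keyword; the user row reports CPU time spent in user space:
BASH
time ./eon_min.py
time ./ase_min.py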
PYTHON
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "ase",
#     "metatomic",
#     "numpy",
# ]
# ///
from ase.io import read
from ase.optimize import LBFGS
from metatomic.torch.ase_calculator import MetatomicCalculator


def run_ase_min():
    atoms = read("pos.con")
    # Reuse the .pt file exported by the previous script
    atoms.calc = MetatomicCalculator(
        "models/pet-mad-s-v1.1.0.pt",
        device="cpu",
    )
    # Set up the optimizer
    print(f"Initial Energy: {atoms.get_potential_energy():.5f} eV")
    opt = LBFGS(atoms, logfile="-")  # Log to stdout
    opt.run(fmax=0.01)
    print(f"Final Energy: {atoms.get_potential_energy():.5f} eV")


if __name__ == "__main__":
    run_ase_min()
OUTPUT
Initial Energy: -53.90300 eV
Step Time Energy fmax
....
LBFGS: 64 20:42:09 -54.042595 0.017080
LBFGS: 65 20:42:09 -54.042610 0.009133
Final Energy: -54.04261 eV
So we get the same result, but with more steps…
| Feature      | EON Script (Pixi)        | ASE Script (UV)          |
|--------------|--------------------------|--------------------------|
| Shebang      | pixi exec ... -- uv run  | uv run                   |
| Engine       | C++ Binary (eonclient)   | Python Loop (LBFGS)      |
| Dependencies | System + Python          | Pure Python              |
| Use Case     | HPC / Heavy Simulations  | Analysis / Prototyping   |
While the Python version seems easier to set up, the eOn C++ client is often more performant, and the pixibang makes it just as trivial to run.
- PEP 723 allows inline metadata for Python dependencies.
- Use uv to run single-file scripts with pure Python requirements (numpy, pandas).
- Use pixi when your script depends on system libraries or compiled binaries (eonclient, ffmpeg).
- Combine them with a pixibang (pixi exec ... -- uv run) for fully reproducible, complex scientific workflows.
1. A subset of the Cookbook recipe for saddle point optimization↩︎