Welcome to the engine room of xmris! If you are wondering why we rely so heavily on xarray, why we don’t just pass sequence parameters as function arguments, or what the deal is with our decorators, you are in the right place.
This guide reads a bit like a story. We will walk through the exact problems we faced when designing this package, and the architectural decisions we made to solve them.
Let’s dive in.
1. The Parameter Soup Problem
Imagine you are writing Python functions to process an MRI Free Induction Decay (FID) signal. You need the raw data, but to do anything meaningful — converting frequencies to ppm, removing a digital filter, auto-phasing — you also need the scanner metadata: the spectrometer frequency, the B0 field, the dwell time, and so on.
If we built xmris like a traditional library, a simple processing pipeline would look like this:
❌ The Anti-Pattern: Parameter Soup
def apodize(data, dwell_time, lb): ...
def fft_to_spectrum(data, axis): ...
def to_ppm(data, mhz): ...
def autophase(data, mhz, dwell_time): ...
# User code — threading the same metadata through every step:
data = apodize(data, dwell_time=0.0005, lb=5.0)
data = fft_to_spectrum(data, axis=1) # is time axis 0 or 1?
data = to_ppm(data, mhz=300.15)
data = autophase(data, mhz=300.15, dwell_time=0.0005)
The xarray Solution
To avoid parameter soup, xmris is built natively on top of xarray. An xarray.DataArray bundles together the raw data, named dimensions (“numpy axes”), coordinates (axis labels), and arbitrary metadata (.attrs) into a single, self-describing object.
Here is what an xmris DataArray looks like in practice — a 2D MRSI dataset with 16 spatial voxels, each containing a 2048-point FID:
import numpy as np
import xarray as xr
import xmris # activates the .xmr accessor
n_points = 2048
dwell_time = 0.0005 # seconds
mrsi_fid = xr.DataArray(
    data=np.random.randn(16, n_points) + 1j * np.random.randn(16, n_points),
    dims=["voxel", "time"],
    coords={
        "voxel": np.arange(16),
        "time": np.arange(n_points) * dwell_time,
    },
    attrs={
        "b0_field": 7.0,  # Tesla
        "reference_frequency": 300.15,  # MHz
        "carrier_ppm": 4.7,  # ppm
    },
)
mrsi_fid
The data now carries its own context: metadata, axis names, and coordinates all in one object. The entire pipeline collapses to this:
# ✅ The xmris Way: Encapsulated, Chainable Processing
spectrum = (
    mrsi_fid
    .xmr.apodize_exp(lb=5.0)
    .xmr.to_spectrum()
    .xmr.to_ppm()
    .xmr.autophase()
)
Two things happened here:
Metadata travels with the data. to_ppm() and autophase() take zero metadata arguments. They find the spectrometer frequency inside .attrs automatically, and because the attributes stay attached to the DataArray as it moves through the chain, that metadata is still there at step four without any effort from you.
Operations act on named dimensions, not integer positions. to_spectrum() defaults to dim="time", so it transforms the right axis regardless of whether the array is 1D, 2D, or 5D, and regardless of axis order. If your data uses a different convention, just say so:
# Default — transforms along "time":
mrsi_spectrum = mrsi_fid.xmr.to_spectrum()
# The same call with the dimension named explicitly; substitute your own name here:
mrsi_spectrum = mrsi_fid.xmr.to_spectrum(dim="time")
Compare this to the numpy equivalent, where you’d have to track that time is axis=1 (and hope nobody transposes the array upstream):
# 🤞 numpy — is time axis 0 or 1? Better check every time.
result = np.fft.fftshift(np.fft.fft(data, axis=1), axes=1)
The user still passes arguments that represent choices (lb=5.0),
but never has to re-supply physical constants of the experiment or remember
which integer axis is which. The metadata and the axis semantics travel
with the data. You never carry them yourself.
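To make the named-dimension idea concrete, here is a minimal sketch that uses only numpy and xarray, not xmris itself. The helper fft_along is hypothetical and exists purely for illustration. It shows that a transform addressed by dimension name gives the same result whether "time" is the first or the last axis, and that the attributes can be carried along explicitly with keep_attrs=True.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_points = 8
dwell_time = 0.0005  # seconds

fid = xr.DataArray(
    rng.standard_normal((3, n_points)) + 1j * rng.standard_normal((3, n_points)),
    dims=["voxel", "time"],
    coords={"voxel": np.arange(3), "time": np.arange(n_points) * dwell_time},
    attrs={"reference_frequency": 300.15},  # MHz
)


def fft_along(da, dim="time", new_dim="frequency"):
    """FFT along a named dimension (illustrative helper, not part of xmris)."""
    return xr.apply_ufunc(
        np.fft.fft,                    # acts on the last axis by default
        da,
        input_core_dims=[[dim]],       # xarray moves `dim` to the last axis
        output_core_dims=[[new_dim]],  # and names the transformed axis
        keep_attrs=True,               # carry .attrs over to the result
    )


spec_a = fft_along(fid)                             # input dims: (voxel, time)
spec_b = fft_along(fid.transpose("time", "voxel"))  # input dims: (time, voxel)

print(spec_a.dims, spec_b.dims)
print(np.allclose(spec_a.values, spec_b.values))
print(spec_a.attrs)
```

Both calls return an array with dimensions ("voxel", "frequency"), because the axis is located by name rather than by position.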
2. The Danger of “Hidden State”
Encapsulation is beautiful, but it introduces a dangerous new problem: magic strings and hidden state.
If to_ppm() implicitly reads the frequency from data.attrs["reference_frequency"], three things can go wrong:
The user’s data doesn’t have that attribute.
The user spelled it "ref_freq" or "MHz".
The user has no way of knowing "reference_frequency" was required in the first place.
A naive implementation would look like this:
# 💥 Naive approach — no safeguards:
def to_ppm(self, dim="frequency"):
    mhz = self._obj.attrs["reference_frequency"]  # ← what if it doesn't exist?
    ppm_coords = self._obj.coords[dim].values / mhz
    return self._obj.assign_coords({"chemical_shift": (dim, ppm_coords)})
And there is a subtler problem: how does the user even know that to_ppm() requires "reference_frequency"?
If we document it by hand in a docstring, those docs will inevitably drift out of sync with the actual code.
We needed a system that:
Prevents the crash before it happens.
Tells the user exactly what is wrong and how to fix it.
Documents itself automatically so documentation can never go stale.
The solution has two parts: a Data Dictionary (section 3) and a Decorator Engine (section 4).
What’s a decorator?
A decorator is a Python function that wraps another function to add
behavior before or after it runs — without modifying the function’s own code.
You apply one with the @ syntax:
@requires_attrs(ATTRS.reference_frequency, ATTRS.carrier_ppm)
def to_ppm(self, dim="frequency"):
    ...
This is equivalent to writing:
def to_ppm(self, dim="frequency"):
    ...
to_ppm = requires_attrs(ATTRS.reference_frequency, ATTRS.carrier_ppm)(to_ppm)
The decorator returns a new function that first checks whether
reference_frequency and carrier_ppm exist in .attrs, and only then calls the
original to_ppm. The original function never contains any validation
code — the decorator handles it from the outside.
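For readers who have not written a decorator before, the following is a minimal sketch of how a validation decorator of this kind could be built. It is not the xmris source: requires_attrs_sketch, FakeData, and FakeAccessor are hypothetical names, and plain strings stand in for the ATTRS constants so the example runs on the standard library alone.

```python
import functools


def requires_attrs_sketch(*required):
    """Illustrative stand-in for a validation decorator (not the xmris source)."""

    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(self, *args, **kwargs):
            # Collect every required key that is absent from .attrs.
            missing = [key for key in required if key not in self._obj.attrs]
            if missing:
                raise ValueError(
                    f"Method '{func.__name__}' requires the following missing "
                    f"attributes in `obj.attrs`: {missing}."
                )
            return func(self, *args, **kwargs)

        return wrapper

    return decorator


class FakeData:
    """Minimal object exposing only an .attrs dict, standing in for a DataArray."""

    def __init__(self, attrs):
        self.attrs = attrs


class FakeAccessor:
    """Hypothetical accessor holding the wrapped object as self._obj."""

    def __init__(self, obj):
        self._obj = obj

    @requires_attrs_sketch("reference_frequency", "carrier_ppm")
    def to_ppm(self, hz):
        attrs = self._obj.attrs
        return attrs["carrier_ppm"] + hz / attrs["reference_frequency"]


ok = FakeAccessor(FakeData({"reference_frequency": 300.15, "carrier_ppm": 4.7}))
print(ok.to_ppm(0.0))  # validation passes, so the body runs

try:
    FakeAccessor(FakeData({"carrier_ppm": 4.7})).to_ppm(0.0)
except ValueError as err:
    print(err)  # the wrapper raised before the body ran
```

The function body stays free of checks; everything defensive lives in the wrapper.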
3. Building the Data Dictionary
To eliminate magic strings, we built a single source of truth for the entire vocabulary of xmris — the Data Dictionary in xmris.core.config.
Instead of scattering raw strings like "time", "reference_frequency", or "chemical_shift" throughout the codebase, every internal access goes through standard Python classes containing custom XmrisTerm string objects:
What is a singleton?
A singleton is a design pattern where only one instance of a class ever
exists in the entire program. In xmris, the config objects are created once
at the bottom of config.py:
ATTRS = XmrisAttributes()
DIMS = XmrisDimensions()
COORDS = XmrisCoordinates()
Every module that does from xmris.core import ATTRS gets a reference to
the same object. There is no way to accidentally create a second,
conflicting vocabulary. This guarantees that the vocabulary is global — a single source of truth that cannot drift.
from xmris.core import ATTRS, COORDS, DIMS
# These are typed Python objects, not bare strings.
# Your IDE will autocomplete them — typos become impossible.
print(f"{ATTRS.reference_frequency=}")
print(f"{ATTRS.b0_field=}")
print(f"{DIMS.time=}")
print(f"{DIMS.frequency=}")
print(f"{COORDS.chemical_shift=}")
ATTRS.reference_frequency='reference_frequency'
ATTRS.b0_field='b0_field'
DIMS.time='time'
DIMS.frequency='frequency'
COORDS.chemical_shift='chemical_shift'
Because they use our custom XmrisTerm class under the hood, each entry natively carries rich metadata — a human-readable description and physical units. In Jupyter, simply type the name to render a formatted reference table:
print("This code cell ran and produced this ⬇️ overview.")
ATTRS
This code cell ran and produced this ⬇️ overview.
DIMS
COORDS
The Lowercase Convention
All xmris dimension names, coordinate names, and attribute keys are **lowercase snake_case**.
This is a deliberate decision that aligns with the broader xarray ecosystem:
| Standard / Package | Convention |
|---|---|
| CF Conventions | time, latitude, longitude |
| cf-xarray | time, latitude, vertical |
| xarray docs & tutorials | time, x, y, space |
| xmris | time, frequency, chemical_shift |
This also avoids ambiguity with multi-word names: "chemical_shift" is unambiguous
snake_case, whereas "Chemical_Shift" is a hybrid that no Python convention endorses.
As a user, you are free to name your own dimensions however you like; xmris functions
accept a dim argument for exactly this reason (see section 5, Dimensions vs. Attributes: The Great Divide).
But whenever xmris creates a name internally (e.g., the "chemical_shift" coordinate
added by to_ppm()), it will always be lowercase.
How the Dictionary Is Used Internally
Throughout the xmris codebase, no function uses a bare string to access metadata. Every
attribute access, dimension reference, and coordinate name goes through the config:
# ❌ Never this:
mhz = self._obj.attrs["reference_frequency"]
ppm_coords = hz_coords / mhz
self._obj.assign_coords({"chemical_shift": (dim, ppm_coords)})
# ✅ Always this:
mhz = self._obj.attrs[ATTRS.reference_frequency]
ppm_coords = hz_coords / mhz
self._obj.assign_coords({COORDS.chemical_shift: (dim, ppm_coords)})
This means if the underlying key ever changes, we update it in one place — the class attribute — and the entire package updates automatically.
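The Data Dictionary idea from this section can be sketched with a small str subclass. This is an assumption about the general shape only: TermSketch, AttributesSketch, and ATTRS_SKETCH are hypothetical names, and the real XmrisTerm class may be implemented differently. The point is that a term can behave exactly like the plain string key it wraps while also carrying a description and a unit.

```python
class TermSketch(str):
    """A string that also carries a description and a unit (hypothetical)."""

    def __new__(cls, name, description="", unit=""):
        obj = super().__new__(cls, name)
        obj.description = description
        obj.unit = unit
        return obj


class AttributesSketch:
    """Hypothetical vocabulary class; the real xmris entries may differ."""

    reference_frequency = TermSketch(
        "reference_frequency", "Spectrometer working/reference frequency.", "MHz"
    )
    carrier_ppm = TermSketch(
        "carrier_ppm",
        "The absolute chemical shift at the center of the RF excitation bandwidth.",
        "ppm",
    )


ATTRS_SKETCH = AttributesSketch()  # created once at module level and shared

attrs = {"reference_frequency": 300.15}
# The term hashes and compares like the plain string key it wraps...
print(attrs[ATTRS_SKETCH.reference_frequency])
# ...while still exposing its metadata.
print(ATTRS_SKETCH.reference_frequency.unit)

# A misspelled term fails immediately instead of silently creating a new key.
try:
    ATTRS_SKETCH.ref_freq
except AttributeError as err:
    print(err)
```

Because the term is a real str, it can be used directly as a key in .attrs or as a dimension name without any conversion.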
4. The “Bouncer” Pattern (Decorators)
With our vocabulary locked in, we needed a way to enforce it at runtime. We created a
decorator engine, @requires_attrs, that acts as a bouncer at the door of every function
that depends on hidden state.
Here is the actual source code for to_ppm, straight from the xmris codebase:
# From xmris/core/accessor.py:
@requires_attrs(ATTRS.reference_frequency, ATTRS.carrier_ppm)
def to_ppm(self, dim: str = DIMS.frequency) -> xr.DataArray:
    """Convert the frequency axis coordinates from Hz to ppm."""
    # Safe! The decorator already verified these exist before we got here.
    mhz = self._obj.attrs[ATTRS.reference_frequency]
    carrier_ppm = self._obj.attrs[ATTRS.carrier_ppm]
    hz_coords = self._obj.coords[dim].values
    ppm_coords = carrier_ppm + (hz_coords / mhz)
    # Pack data and metadata into a Variable and assign it
    shift_var = as_variable(COORDS.chemical_shift, dim, ppm_coords)
    return self._obj.assign_coords({COORDS.chemical_shift: shift_var})
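As a side check on the arithmetic in the function body above, here is the same conversion applied to three plain numbers. The values are taken from the example dataset earlier in this guide (300.15 MHz, carrier at 4.7 ppm); the offsets themselves are made up for illustration.

```python
reference_frequency = 300.15  # MHz, so 1 ppm corresponds to 300.15 Hz
carrier_ppm = 4.7             # ppm at the center of the frequency axis (0 Hz)

hz_offsets = [-300.15, 0.0, 300.15]  # Hz relative to the carrier
# Hz divided by MHz gives parts per million directly.
ppm = [carrier_ppm + hz / reference_frequency for hz in hz_offsets]

for hz, shift in zip(hz_offsets, ppm):
    print(f"{hz:+8.2f} Hz -> {shift:.2f} ppm")
```

An offset of one reference-frequency's worth of Hz shifts the coordinate by exactly 1 ppm, so the three offsets land at 3.7, 4.7, and 5.7 ppm.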
The decorator does two things:
1. Fail-Fast with Helpful Errors
If a required attribute is missing, the bouncer intercepts the call before any math runs
and tells the user exactly what is wrong and how to fix it using standard xarray methods:
💡 The actual xmris error message:
spectrum.xmr.to_ppm()
ValueError: Method 'to_ppm' requires the following missing attributes
in `obj.attrs`: ['reference_frequency', 'carrier_ppm'].
To fix this, assign them using standard xarray methods:
>>> obj = obj.assign_attrs({'reference_frequency': value})
No KeyError. No stack trace through numpy internals. Just a clear message with
copy-pasteable fix code.
2. Self-Documenting Functions
At import time, the decorator dynamically injects a “Required Attributes” section into each function’s docstring by pulling descriptions and units directly from the Data Dictionary:
📖 The auto-generated docstring section:
help(spectrum.xmr.to_ppm)
Convert the frequency axis coordinates from Hz to ppm.
...
Required Attributes
--------------------
* ``reference_frequency``: Spectrometer working/reference frequency. [MHz]
* ``carrier_ppm``: The absolute chemical shift at the center of the RF excitation bandwidth. [ppm]
Because the docstring is generated from the same config that powers the runtime validation, the Required Attributes section cannot drift out of sync with the code.
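The docstring injection can be sketched in a few lines. This is an illustration under stated assumptions, not the xmris implementation: TERMS is a hypothetical lookup table standing in for the Data Dictionary, and document_attrs_sketch handles only the documentation half, leaving validation out to keep the example short.

```python
# Hypothetical term table: name -> (description, unit).
TERMS = {
    "reference_frequency": ("Spectrometer working/reference frequency.", "MHz"),
    "carrier_ppm": (
        "The absolute chemical shift at the center of the RF excitation bandwidth.",
        "ppm",
    ),
}


def document_attrs_sketch(*required):
    """Append a 'Required Attributes' section to the decorated function's docstring."""

    def decorator(func):
        lines = ["", "Required Attributes", "-------------------"]
        for name in required:
            description, unit = TERMS[name]
            lines.append(f"* ``{name}``: {description} [{unit}]")
        # This runs once, when the function is defined (i.e. at import time).
        func.__doc__ = (func.__doc__ or "").rstrip() + "\n" + "\n".join(lines)
        return func

    return decorator


@document_attrs_sketch("reference_frequency", "carrier_ppm")
def to_ppm(data):
    """Convert the frequency axis coordinates from Hz to ppm."""
    return data


print(to_ppm.__doc__)
```

Since the descriptions are read from the same table every time a function is decorated, editing one entry updates every docstring that mentions it.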
5. Dimensions vs. Attributes: The Great Divide
You might be wondering: “If decorators are so great for attributes, why don’t you use them for dimensions to enforce consistent use of e.g. time or frequency?”
This was the single most important architectural decision we made. We treat Dimensions and Attributes with different strategies, because they play fundamentally different roles.
Attributes Are “Hidden State”
A B0 field strength is a physical constant of the experiment. You don’t apply an operation
to the B0 field; the math just requires it to exist in the background. Because it is
invisible, it needs strict guarding by our @requires_attrs decorator.
Dimensions Are an “Action Space”
When you apply an FFT or an apodization, you are actively choosing which axis to act upon.
We want you to have the freedom to say, “apply this to the t axis” — even if your data
doesn’t follow the xmris lowercase convention.
If we strictly forced you to rename your axes to "time" and "frequency" before doing any
processing, the package would feel rigid and hostile toward quick-and-dirty datasets.
Therefore, dimensions are passed as explicit arguments with smart defaults:
from xmris.core import DIMS
# Your data uses the xmris standard "time" dimension? Just use the defaults:
result = fid.xmr.apodize_exp(lb=5.0)
# Your data has a custom axis name? No problem — just pass it:
result = fid.xmr.apodize_exp(dim="t", lb=5.0)
# You can even pass xmris constants explicitly for maximum clarity:
result = fid.xmr.apodize_exp(dim=DIMS.time, lb=5.0)
And if you pass a dimension that doesn’t exist at all, xmris gives you a clear,
actionable error — just like the attribute bouncer:
💡 The dimension error message:
fid.xmr.apodize_exp(dim="randomname")
ValueError: Method 'apodize_exp' attempted to operate on missing
dimension(s): ['randomname'].
Available dimensions are: ['time'].
To fix this, either pass the correct `dim` string argument to the function,
or rename your data's axes using xarray:
>>> obj = obj.rename({'randomname': DIMS.time})
The Design Rule
Here is the rule we follow throughout the entire codebase:
| | Attributes (Hidden State) | Dimensions (Action Space) |
|---|---|---|
| Nature | Physical constants of the experiment | Axes the user chooses to act upon |
| Guarded by | @requires_attrs decorator | _check_dims helper |
| User interface | Implicit (read from .attrs) | Explicit argument with smart default |
| Example | ATTRS.reference_frequency | dim=DIMS.time → dim="time" |
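The "explicit argument with smart default" column of the table can be sketched as follows. The signature of the real _check_dims helper is not shown in this guide, so check_dims_sketch and apodize_exp_sketch are hypothetical stand-ins that work on a plain tuple of dimension names rather than a DataArray.

```python
def check_dims_sketch(available, requested, method_name):
    """Raise a descriptive error if any requested dimension is absent.

    Hypothetical helper; the real `_check_dims` may take different arguments.
    """
    missing = [dim for dim in requested if dim not in available]
    if missing:
        raise ValueError(
            f"Method '{method_name}' attempted to operate on missing "
            f"dimension(s): {missing}. "
            f"Available dimensions are: {list(available)}."
        )


def apodize_exp_sketch(dims, dim="time", lb=5.0):
    """Stand-in for a processing method with a smart default for `dim`."""
    check_dims_sketch(dims, [dim], "apodize_exp")
    return f"apodized along '{dim}' with lb={lb}"


print(apodize_exp_sketch(("voxel", "time")))        # default dim is present
print(apodize_exp_sketch(("voxel", "t"), dim="t"))  # custom name passed explicitly

try:
    apodize_exp_sketch(("voxel", "time"), dim="randomname")
except ValueError as err:
    print(err)
```

The check happens at call time on whatever name the user passed, which is what lets custom axis names work without any renaming.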
Putting It All Together
Let’s trace through a single function call — spectrum.xmr.to_ppm() — to see every
architectural layer working in concert:
Every layer serves a distinct purpose:
Config constants (ATTRS.reference_frequency, DIMS.frequency, COORDS.chemical_shift) eliminate magic strings everywhere.
@requires_attrs catches missing metadata before the math runs and auto-generates the docstring.
_check_dims validates the dimension argument at call time, listing what’s available.
The function body is pure science: no validation code, no defensive try/except blocks.
Summary
By combining xarray encapsulation, a strongly-typed Data Dictionary, fail-fast decorators
for hidden state, and explicit arguments for action spaces, xmris strives for three goals:
Rigorously safe — no silent math failures from swapped parameters or missing metadata.
Highly transparent — docstrings generate themselves from the config; documentation can never drift from code.
Easy to use — clean, chainable APIs with zero parameter soup.
For a quick-start example, head back to the Welcome page.
Happy processing!