Fundamentals

Introduction

Here we strive to provide a practical summary. We also recommend reading a theoretical overview of the RascalC methodology in Sections 2.1 and 3 of Rashkovetskyi et al 2025, or Section 2.2 of Rashkovetskyi 2025 dissertation. Many of the more technical (algorithm implementation) details omitted there are described in Sections 3 and 4 of Philcox et al 2020.

RascalC constructs a one-parametric covariance matrix model (in the single-tracer case) from an arbitrary (e.g., empirical or theoretical) 2-point correlation function (2PCF):

(1)\[C(\alpha_{\rm SN}) = C_4 + C_3 \alpha_{\rm SN} + C_2 \alpha_{\rm SN}^2\]

where \(C_4, C_3\) and \(C_2\) are 4-, 3- and 2-point terms respectively. The model parameter \(\alpha_{\rm SN}\) scales density, or equivalently the amount of shot noise, and is accordingly called shot-noise rescaling. We recommend calibrating this shot-noise rescaling parameter using a reference covariance from jackknife resampling of the data (Jackknife pipeline), or variations in a sample of mocks (Mock pipeline).

The model is similar to the Gauss-Poisson approximation in Fourier space (see e.g. Grieb et al 2016) allowing the shot noise to be a free parameter. Working in configuration space allows to naturally account for survey geometry and selection, including the variation of the expected density \(\bar n\) across the survey volume, by sampling points from the random catalog.

This Python library currently only implements the covariance estimators for 2PCF. Covariances for 3PCF (from later parts of Philcox & Eisenstein 2019) and configuration-space power spectrum estimators (Philcox & Eisenstein 2020) are not supported (yet). To use them, one would probably need to dig in the historic C++ code.

Basic/minimal pipeline

Use with caution! This pipeline does not include a reference to determine the shot-noise rescaling parameter. The default value for the shot-noise rescaling is 1, but we can not recommend using it in real applications. Refer to Jackknife pipeline or Mock pipeline after looking at the Basic/minimal pipeline flowchart and reading the General guidance after it. However, we can suggest at least two situations in which this variant is advisable:

  • Two-tracer covariance when the shot-noise rescaling values for each tracer are known (or will be known soon) from single-tracer jackknife computations. Currently, building a two-tracer jackknife model seems excessive.

  • You can also use the basic pipeline to re-run the post-processing with a mock 2PCF sample later. However, substituting jackknife can not be deferred like this because the jackknife model crucially depends on the jackknife RR counts.

digraph pipeline_basic {
"Random catalog", "Data catalog" [style=filled, fillcolor=red];
"RR counts", "Full 2PCF", "Shot-noise rescaling" [shape=egg, style=filled, fillcolor=orange];
"RascalC", "Full covariance model" [shape=box, style=filled, fillcolor=yellow];
"RascalC" [fontname="Courier New"];
"Random catalog" -> {"RascalC" "RR counts" "Full 2PCF"};
"Data catalog" -> "Full 2PCF";
{"RR counts" "Full 2PCF"} -> "RascalC" -> "Full covariance model";
{"Full covariance model" "Shot-noise rescaling"} -> "Full (final) covariance";
"Full (final) covariance" [style=filled, fillcolor=green];
}

Basic/minimal pipeline flowchart

General guidance

General practical usage remarks for the Python wrapper function, RascalC.run_cov():

  • RR counts and 2PCF can and should all be estimated at the same time using the pycorr library for 2-point correlation function estimation; this is an external step to RascalC.run_cov().

    • Use s_mu mode in pycorr, other counting modes are not supported by RascalC.

    • Use the Landy-Szalay 2PCF estimator, or the natural 2PCF estimator with analytical RR counts in periodic boxes. Alternative estimators have a different (typically higher) variance and are not supported by RascalC.

    • The necessary pair counts can be computed on GPU, whereas RascalC can only use CPU (currently).

    • Even if you need to use CPU, you should run the counts in a separate, independent process from the one calling RascalC.run_cov(), because both should be parallelized and they are known to interfere with each other’s efficiency.

  • Choose a binning mode for the covariance:

    • s_mu mode for angular bins (uniform in \(0 \le \left| \mu \right| \le 1\), where \(\mu \equiv \cos \theta\); the mode was implemented in Philcox et al. 2020 and used in Rashkovetskyi et al 2023);

    • legendre_projected mode for Legendre multipole moments in separation (radial) bins, corresponding to the pycorr multipole estimation via projection from angular bins (introduced and validated in Rashkovetskyi et al 2025).

    • legendre_accumulated mode for Legendre multipole moments accumulated at pair-counting, using a survey correction function with realistic survey geometry (introduced in Philcox & Eisenstein 2019). It is simpler with periodic cubic boxes, but this mode is not compatible with jackknives.

  • Load the pycorr 2PCF estimator computed beforehand, cut and/or rebin it to radial (separation) bins desired for the covariance (e.g., 4 Mpc/h wide from 20 to 200 Mpc/h) and pass it through the pycorr_allcounts_11 argument.

    • In s_mu mode, you should also rebin angularly to the desired number of angular bins, barring wrapping of \(-1 \le \mu < 0\), as explained next:

      • It is recommended that you leave the counts in \(-1 \le \mu \le 1\) bins (for potential error correction), the code will wrap them to \(0 \le \left| \mu \right| \le 1\) automatically, halving the number of angular bins.

    • In Legendre modes, you can leave the angular (\(\mu\)) bins as they are.

  • You also need to provide input clustering in a form of 2PCF table via the xi_table_11 argument. You can use the pre-computed and loaded pycorr 2PCF estimator again, but you might want to rebin it differently from the previous case. It is usually advisable to have xi_table_11 in finer radial bins than pycorr_allcounts_11, but the angular (\(\mu\)) should not be too fine to avoid noisiness.

  • Random positions are another necessary input as the randoms_positions1 argument.

    • The number of randoms for RascalC does not have to be the same as for pair counting and 2PCF estimation (except when you disable normalize_wcounts). It should be high enough to provide a good representation of survey geometry, but not too high to keep the run time reasonable.

    • For 2PCF covariance after standard BAO reconstruction, provide the shifted randoms (Rashkovetskyi et al 2023, Rashkovetskyi et al 2025; the input 2PCF conversion in that case will be applied automatically to a pycorr estimator).

    • For a periodic cubic box (without reconstruction), you will need to generate uniform random positions yourself.

  • You must also pass the weights for the randoms through the randoms_weights1 argument. These must match what you used for pair counting and 2PCF estimation with pycorr.

    • If you did not use weights in pycorr, you should pass an array containing 1 for each random point as randoms_weights1.

  • Set the output directory, out_dir. We highly recommended a different output directory for each run. This directory will contain all information necessary for post-processing, and a complimentary log file (log.txt).

  • Set the temporary directory, tmp_dir. Bear in mind that it will need to temporarily contain the random catalog(s) (positions, weights, and jackknife regions if applicable) in text format. This directory can be deleted after the run, and the code normally strives to leave it in its original state. We highly recommended a different temporary directory for each run.

  • Set the number of threads via nthread.

    • We recommend trying different options before massive computations.

    • At NERSC Perlmutter, the best option seems to be nthread=64 on half the CPU node (shared queue, requesting 1 node and 128 SLURM cores, which are hyperthreads, and correspond to 64 physical cores).

  • Set N2, N3 and N4. These are importance sampling settings: RascalC will try to take N2 secondary points per each random point provided, N3 tertiary points per each secondary point and N4 quaternary points per each tertiary point. Common values have been N2=5, N3=10, N4=20, but you may need to adjust them (see Addressing convergence issues).

  • Set the number of integration loops n_loops for covariance matrix terms evaluation. It should be divisible by nthread. 1024 might be a nice starting value, but you may need to adjust it (see Addressing convergence issues).

    • The runtime roughly (not exactly) scales as the number of quads per the number of threads, N_randoms * N2 * N3 * N4 * n_loops / nthread in single-tracer mode. For reference, on NERSC Perlmutter CPU the code processed about 27 millions (2.7e7) quads per second per thread (in December 2024).

  • Set the number of loops per sample, loops_per_sample. This sets the amount of auxiliary output used almost exclusively for General quality control. loops_per_sample needs to be a divider of n_loops, and we recommend keeping n_loops / loops_per_sample (the number of output subsamples) roughly between 10 and 30. Smaller values may require you to wait too long before there is any usable output, or give insufficient information for General quality control. Larger values can lead to too much output.

  • To compute a full two-tracer covariance, you need to also provide all of the following:

    • cross-counts as pycorr_allcounts_12 and second tracer auto-counts as pycorr_allcounts_22 (rebinned in the same way as pycorr_allcounts_11);

    • cross-correlation function as xi_table_12 and the second tracer auto-correlation function as xi_table_22 (in the same format as xi_table_11);

    • second tracer random points positions (randoms_positions2) and weights (randoms_weights2).

  • RascalC in the flowcharts refers to the most computationally intensive steps (implemented in C++), at which the coefficients for the covariance matrix models are evaluated. These coefficients are saved in a Raw_Covariance_Matrices*.npz file in the chosen output directory.

  • Basic/minimal post-processing involves substituting a fixed shot-noise rescaling value (or two values in case of two tracers) into the full covariance model to obtain the final covariance. These operations normally are invoked at the end of RascalC.run_cov(), but they can also be performed separately using RascalC.post_process_auto(). The results are saved in a Rescaled_Covariance_Matrices*.npz file in the chosen output directory.

    • External shot-noise rescaling value(s) (e.g. from jackknife or mock calibration) can be supplied via the shot_noise_rescaling1 parameter (for the first tracer; and shot_noise_rescaling2 for the second tracer if applicable).

      • Usage of covariance matrices obtained with the default shot-noise rescaling value(s) of 1 is generally not recommended.

      • These shot_noise_rescaling parameters set the fixed shot-noise rescaling values only for the Basic/minimal pipeline. They are ignored in Jackknife pipeline and Mock pipeline, where the shot-noise rescaling value is instead determined based on a reference covariance.

After the run (execution of RascalC.run_cov()):

  • If it finished normally with no errors, congratulations!

  • If it terminated early and/or the job timed out, it is often worth invoking the post-processing with RascalC.post_process_auto() to see whether sufficiently good results were saved.

  • If there was a convergence-related error or warning, please refer to Addressing convergence issues.

  • If anything is unclear, please contact the developer.

In any case, take a look at General quality control after the run. To work with the final results more conveniently, we recommend seeing Loading and exporting the final covariance matrices.

After reading to this point, please refer to Jackknife pipeline or Mock pipeline to see which fits your needs better.

Jackknife pipeline

The jackknife pipeline allows to obtain the covariance matrix using only the observational data, without simulations (or using only a single simulated realization). It has been tested most thoroughly (see e.g. Rashkovetskyi et al 2025), and is showcased in most Practical examples, especially tutorial.ipynb.

digraph pipeline_jack {
"Random catalog", "Data catalog" [style=filled, fillcolor=red];
"RR counts (full and jackknife)", "Full 2PCF", "Jackknife 2PCF" [shape=egg, style=filled, fillcolor=orange];
"RascalC", "Full covariance model", "Jackknife covariance model", "Jackknife covariance", "Best-fit shot-noise rescaling" [shape=box, style=filled, fillcolor=yellow];
"RascalC" [fontname="Courier New"];
"Random catalog" -> {"RascalC" "RR counts (full and jackknife)" "Full 2PCF" "Jackknife 2PCF"};
"Data catalog" -> {"Full 2PCF" "Jackknife 2PCF"};
{"RR counts (full and jackknife)" "Full 2PCF"} -> "RascalC" -> {"Full covariance model" "Jackknife covariance model"};
"Jackknife 2PCF" -> "Jackknife covariance";
{"Jackknife covariance model" "Jackknife covariance"} -> "Best-fit shot-noise rescaling";
{"Full covariance model" "Best-fit shot-noise rescaling"} -> "Full (final) covariance";
"Full (final) covariance" [style=filled, fillcolor=green];
}

Jackknife pipeline flowchart as in Figure 1 from Rashkovetskyi et al 2025

Practical remarks particular to the jackknife pipeline with RascalC.run_cov() in addition to the General guidance:

  • Jackknife and full RR counts and 2PCF can and should all be estimated at the same time using the pycorr library for 2-point correlation function estimation.

    • Remember that you should run the counts in a separate, independent process from the one calling RascalC.run_cov(), because both should be parallelized and they are known to interfere with each other’s efficiency.

  • The jackknife 2PCF will be loaded from the pycorr_allcounts_11 argument (rebinned as explained above).

  • Assign the jackknife regions to the random points (randoms_positions1) in the same way as for 2PCF and pair counts, and pass the assignment results (jackknife region number for each random point) through the randoms_samples1 argument.

    • Technically, passing the non-None randoms_samples1 argument switches on the jackknife functionality in RascalC.run_cov().

  • The code produces a separate model for the jackknife covariance, which takes into account correlations between the jackknife regions.

  • Jackknife post-processing involves fitting the jackknife covariance model to the data jackknife covariance to find the optimal shot-noise rescaling and substituting that value into the full covariance model to obtain the final covariance. These operations normally are invoked at the end of RascalC.run_cov(), but they can also be performed separately using RascalC.post_process_auto(). The results are saved in a Rescaled_Covariance_Matrices*Jackknife*.npz file in the chosen output directory.

Take a look at General quality control after the run. To work with the final results more conveniently, we recommend seeing Loading and exporting the final covariance matrices.

Mock pipeline

Tuning the shot-noise rescaling on mocks was the original method in O’Connell et al. 2016; it does not require such a large number of realizations as the direct sample covariance estimation from mocks. This can be seen as a theory-based template smoothing of the mock sample covariance (where the templates are the full RascalC covariance model terms), reducing the noise. However, since O’Connell & Eisenstein 2018 it has been largely superseded by the idea of using jackknives, which eliminated the need for mocks except for an occasional validation. Accordingly, there are few good usage examples, but we are working on this.

digraph pipeline_mock {
"Random catalog", "Data catalog", "Mock catalogs" [style=filled, fillcolor=red];
"RR counts", "Full 2PCF", "Mock 2PCFs" [shape=egg, style=filled, fillcolor=orange];
"RascalC", "Full covariance model", "Sample covariance", "Best-fit shot-noise rescaling" [shape=box, style=filled, fillcolor=yellow];
"RascalC" [fontname="Courier New"];
"Random catalog" -> {"RascalC" "RR counts" "Full 2PCF"};
"Data catalog" -> "Full 2PCF";
{"RR counts" "Full 2PCF"} -> "RascalC" -> "Full covariance model";
"Mock catalogs" -> "Mock 2PCFs" -> "Sample covariance";
{"Full covariance model" "Sample covariance"} -> "Best-fit shot-noise rescaling";
{"Full covariance model" "Best-fit shot-noise rescaling"} -> "Full (final) covariance";
"Full (final) covariance" [style=filled, fillcolor=green];
}

Mock pipeline flowchart as in Figure 2 from Rashkovetskyi et al 2025

Currently, the mock pipeline can only be used by

  1. running the Basic/minimal pipeline;

  2. computing and writing the mock 2PCF sample covariance with e.g.

    • RascalC.pycorr_utils.sample_cov.sample_cov_from_pycorr_to_file() in s_mu mode;

    • RascalC.pycorr_utils.sample_cov_multipoles.sample_cov_multipoles_from_pycorr_to_file() in Legendre multipole modes;

  3. invoking the manual and rather tedious post-processing by choosing an appropriate function:

    • RascalC.post_process.post_process_default_mocks() in s_mu mode (for a single tracer);

      • RascalC.post_process.post_process_default_mocks_multi() in s_mu mode for two tracers;

    • RascalC.post_process.post_process_legendre_mocks() in Legendre modes (for a single tracer).

The post-processing results will be saved in a Rescaled_Covariance_Matrices*Mocks*.npz file in the chosen output directory. Take a look at General quality control after the run. To work with the final results more conveniently, we recommend seeing Loading and exporting the final covariance matrices.

We are working on allowing to pass the mock correlation functions and/or the mock sample covariance to RascalC.run_cov() and RascalC.post_process_auto() to make this pipeline easier to use.

General quality control

The convergence checks mostly follow Section 6.1 of Rashkovetskyi et al 2025.

  1. The strictest criterion is that the final covariance matrix should be positive definite. If this condition is violated, the Python code raises an exception, which should be easy to notice.

  2. Next, there is the eigenvalue test, which produces warnings if failed:

    • In the original (stronger) version (Equation (4.5) in Philcox et al 2020), the minimal eigenvalue of the 4-point covariance term \(C_4\) should be larger than minus the minimal eigenvalue of the 2-point term \(C_2\), both in (1).

      • Shot-noise rescaling values smaller than 1 (which are quite common, e.g. in Rashkovetskyi et al 2025) make this criterion stricter, because they scale the 2-point term down by \(\alpha_{\rm SN}^2\). So the code now repeats the test is the optimal shot-noise rescaling value becomes less than 1.

    • On the other hand, the compared eigenvalues of \(C_4\) and \(C_2\) can correspond to quite different separation scales, making the original criterion unnecessarily strict in some cases. This consideration led us to introduce the weaker version, where we compare the eigenvalues of \(C_2^{-1/2} C_4 C_2^{-1/2}\) with \(-1\) or \(-\alpha_{\rm SN}^2\). Here \(C_2^{-1/2}\) is the inverse of the matrix square root (scipy.linalg.sqrtm) of the 2-point term, which scales the different parts of the 4-point term matrix more appropriately. (The 2-point term is either diagonal or block-diagonal with small blocks, so taking its matrix square root should be numerically stable.)

  3. Finally, there is the extra convergence check (RascalC.convergence_check_extra) performed at the end of RascalC.run_cov() or RascalC.post_process_auto() by default.

    • After Section 3.2 of Rashkovetskyi et al 2023, we recommend focusing on R_inv (\(R_{\rm inv}\)) values. There is no universal threshold, but some decent reference values are:

      • <0.6% (6e-3) for Early DESI data BGS/LRG with 45 bins (45 radial times 1 angular);

      • <5% (5e-2) for DESI DR1/DR2 LRG/ELG, <12% (1.2e-1) for BGS_BRIGHT-21.5 (\(0.1<z<0.4\)) and <20% (2e-1) for BGS_BRIGHT-21.35 with 135 bins (45 radial times 3 multipoles). QSO have almost always converged much better, like 0.1-0.2% (2e-3) due to higher shot-noise, making the easy 2-point term the dominant one.

      • Values exceeding 1 are high.

    • These figures of intrinsic scatter in covariance sums/integrals estimated with importance sampling tend to increase

      • as the number of bins increases (the trend is the same for mocks — see e.g. Equation (3.12) in Rashkovetskyi et al 2023)

      • as the sample density increases and shot-noise decreases (parallel with mocks is less clear, but dense samples are also harder to simulate).

Addressing convergence issues

Use these instructions when

  • you get the error message The full covariance is not positive definite - insufficient convergence;

  • the R_inv values from the extra convergence check are worryingly high (see above for reference, or reach out if in doubt);

  • you get the 4-point covariance matrix has not converged properly via the weaker eigenvalue test warnings, although they seldom appear without one of the previous two issues;

    • you can probably ignore the warning(s) about stronger eigenvalue test if they appear alone.

First, you can re-run post-processing with alternative options using RascalC.post_process_auto():

  • skip a few bins with smallest separations by passing a single positive integer (their number) via skip_s_bins;

    • also skip a few bins with highest separations by passing a tuple of two positive numbers via skip_s_bins; among them, the first number sets how many bins to skip at the low end, the second — at the high end;

  • in Legendre mode, you can also try skipping highest multipoles by passing a single positive integer (their number) via skip_l.

  • you can also try to discard “unlucky” samples using the n_samples argument, but this seldom helps and can become confusing.

The above are the fastest options because they only require re-running the post-processing script while re-using the products of the main computation.

Then, consider running the main computation again with RascalC.run_cov() to generate a smaller covariance matrix by

  • using coarser radial (separation) bins;

  • in s_mu mode, using coarser angular (\(\mu\)) bins;

  • in Legendre mode, using fewer multipoles (this alone can be done faster and easier in post-processing, as was just mentioned above, but not in combination with different radial/separation bins).

If changing the covariance binning is undesirable or does not help, you should run longer. Use a different output directory to prevent confusion and overrides; renaming the old output directory also works. This can be done in different ways:

  • Increase the number of loops (n_loops).

    • Occasionally bad convergence is just bad luck, so running again with the same settings, including n_loops might not be needed. In that case, just do not use the same fixed seed, as that should reproduce the results exactly.

    • If you keep other settings fixed (except n_loops, nthread and naturally seed — you can change those more freely), you can also concatenate (combine) samples from different runs into a new, different directory using RascalC.cat_raw_covariance_matrices() to reach even better convegence (check it by post-processing the new directory with RascalC:post_process_auto()). However, combining samples does not always improve convergence, and keeping track of different sample combination can be hard.

  • Increase N2 — probably not recommended, because the effect is similar to increasing n_loops, but sample combination is no longer an optionl.

  • Increase N4 and/or N3. It is probably more sensible than the previous options because we expect the higher-point terms to converge slower. N4 will only affect the 4-point term \(C_4\); N3 also affects the 3-point term \(C_3\).

The convergence issues can be persistent. Please do not hesitate to reach out.

The main computation wrapper

This Python wrapper of RascalC is heavily interfaced with pycorr library for 2-point correlation function estimation. Many of the arguments are intentionally similar to pycorr.TwoPointCorrelationFunction high-level interface.

Please bear with the long description; you can pay less attention to settings labeled optional in the beginning.

RascalC.run_cov(mode: Literal['s_mu', 'legendre_projected', 'legendre_accumulated'], nthread: int, N2: int, N3: int, N4: int, n_loops: int, loops_per_sample: int, out_dir: str, tmp_dir: str, randoms_positions1: ndarray[float], randoms_weights1: ndarray[float], pycorr_allcounts_11: BaseTwoPointEstimator, xi_table_11: BaseTwoPointEstimator | tuple[ndarray[float], ndarray[float], ndarray[float]] | tuple[ndarray[float], ndarray[float], ndarray[float], ndarray[float], ndarray[float]] | list[ndarray[float]], position_type: Literal['rdd', 'xyz', 'pos'] = 'pos', xi_table_12: None | BaseTwoPointEstimator | tuple[ndarray[float], ndarray[float], ndarray[float]] | tuple[ndarray[float], ndarray[float], ndarray[float], ndarray[float], ndarray[float]] | list[ndarray[float]] = None, xi_table_22: None | BaseTwoPointEstimator | tuple[ndarray[float], ndarray[float], ndarray[float]] | tuple[ndarray[float], ndarray[float], ndarray[float], ndarray[float], ndarray[float]] | list[ndarray[float]] = None, xi_cut_s: float = 250, xi_refinement_iterations: int = 10, pycorr_allcounts_12: BaseTwoPointEstimator | None = None, pycorr_allcounts_22: BaseTwoPointEstimator | None = None, normalize_wcounts: bool = True, no_data_galaxies1: float | None = None, no_data_galaxies2: float | None = None, randoms_samples1: ndarray[int] | None = None, randoms_positions2: ndarray[float] | None = None, randoms_weights2: ndarray[float] | None = None, randoms_samples2: ndarray[int] | None = None, max_l: int | None = None, boxsize: float | None = None, skip_s_bins: int | tuple[int, int] = 0, skip_l: int = 0, shot_noise_rescaling1: float = 1, shot_noise_rescaling2: float = 1, sampling_grid_size: int = 301, coordinate_scaling: float = 1, seed: int | None = None, verbose: bool = False) dict[str, ndarray[float]][source]

Run the 2-point correlation function covariance integration.

Parameters

modestring

Choice of binning setup, one of:

  • "s_mu": compute covariance of the correlation function in s, µ bins. Only linear µ binning between 0 and 1 supported.

  • "legendre_projected": compute covariance of the correlation function Legendre multipoles in separation (s) bins projected from µ bins (only linear µ binning supported between 0 and 1). Procedure matches pycorr. Works with jackknives, may be less efficient in periodic geometry.

  • "legendre_accumulated": compute covariance of the correlation function Legendre multipoles in separation (s) bins accumulated directly, without first doing µ-binned counts. Incompatible with jackknives.

max_linteger

Max Legendre multipole index (required in both Legendre modes). Has to be even.

boxsizeNone or float

Periodic box side (one number — so far, only cubic boxes are supported). All the coordinates need to be between 0 and boxsize. If None (default) or 0, assumed aperiodic.

position_typestring, default=”pos”

Type of input positions, one of:

  • “rdd”: RA, Dec (both in degrees), distance; shape (3, N)

  • “xyz”: Cartesian positions, shape (3, N)

  • “pos”: Cartesian positions, shape (N, 3).

randoms_positions1array of floats, shaped according to position_type

Coordinates of random points for the first tracer.

randoms_weights1array of floats of length N_randoms

Weights of random points for the first tracer.

randoms_samples1None or array of integers of length N_randoms

Jackknife region numbers for random points for the first tracer. If given (and not None), enables the jackknife functionality (tuning of shot-noise rescaling on jackknife correlation function estimates). The jackknife assignment must match the jackknife counts in pycorr_allcounts_11 (and pycorr_allcounts_12 with multi-tracer functionality enabled).

randoms_positions2None or array of floats, shaped according to position_type

(Optional) coordinates of random points for the second tracer. If given and not None, enables the multi-tracer functionality (full two-tracer covariance estimation).

randoms_weights2None or array of floats of length N_randoms2

(Optional) weights of random points for the second tracer (required for multi-tracer functionality).

randoms_samples2None or array of integers of length N_randoms2

(Optional) jackknife region numbers for the second tracer (required for multi-tracer + jackknife functionality, although this combination has not been used yet). The jackknife assignment must match the jackknife counts in pycorr_allcounts_12 and pycorr_allcounts_22.

pycorr_allcounts_11pycorr.TwoPointEstimator

pycorr.TwoPointEstimator with auto-counts for the first tracer. Must be rebinned and/or cut to the separation (s) bins desired for the covariance. Note that more bins result in slower convergence. A typical configuration has been 4 Mpc/h wide bins between 20 and 200 Mpc/h. The counts will be wrapped to positive µ, so if the µ range in them is from -1 to 1, the number of µ bins must be even. Providing unwrapped counts (µ from -1 to 1) is preferrable, because some issues can be fixed by assuming symmetry.

  • In “s_mu” mode, the covariance will be done for the given number of µ bins (after wrapping).

  • In “legendre_projected” mode, it will be assumed that Legendre multipoles are projected from the same number of µ bins as present in these counts. One might consider rebinning more coarsely for faster performance but less guaranteed accuracy (neither effect has been tested yet).

  • In “legendre_accumulated” mode, all the present µ bins (after wrapping) will be used to fit the survey correction functions.

For jackknife functionality, pycorr_allcounts_11 must contain jackknife RR counts and correlation function. The jackknife assigment must match randoms_samples1.

pycorr_allcounts_12pycorr.TwoPointEstimator

(Optional) pycorr.TwoPointEstimator with cross-counts between the two tracers (order does not matter, because they will be wrapped). Must have the same bin configuration as pycorr_allcounts_11. For jackknife functionality, must contain jackknife RR counts and correlation function. The jackknife assigment must match randoms_samples1 and randoms_samples2.

pycorr_allcounts_22pycorr.TwoPointEstimator

(Optional) pycorr.TwoPointEstimator with auto-counts for the second tracer. Must have the same bin configuration as pycorr_allcounts_11. For jackknife functionality, must contain jackknife RR counts and correlation function. The jackknife assigment must match randoms_samples2.

normalize_wcountsboolean

(Optional) whether to normalize the weights and weighted counts. If False, the provided RR counts must match what can be obtained from given randoms, otherwise the covariance matrix will be off by a constant factor. Example: if counts were computed with n_randoms roughly similar random chunks and only one is provided to RascalC here, the counts should be divided by n_random where s > split_above and by n_random ** 2 where s < split_above. If True (default), the weights will be normalized so that their sum is 1 and the counts will be normalized by their wnorm, which gives a match with default pycorr normalization settings.

no_data_galaxies1None or float

(Optional) number of first tracer data (not random!) points for the covariance rescaling. If None (default), the code will attempt to obtain it from pycorr_allcounts_11.

no_data_galaxies2None or float

(Optional) number of second tracer data (not random!) points for the covariance rescaling. If None (default), the code will attempt to obtain it from pycorr_allcounts_22.

xi_table_11pycorr.TwoPointEstimator, or sequence (tuple or list) of 3 elements: (s_values, mu_values, xi_values), or sequence (tuple or list) of 4 elements: (s_values, mu_values, xi_values, s_edges)

Table of first tracer auto-correlation function in separation (s) and µ bins. The code will use it for interpolation in the covariance matrix integrals. Important: if the given correlation function is an average in s, µ bins, the separation bin edges need to be provided (and the µ bins are assumed to be linear) for rescaling procedure which ensures that the interpolation results averaged over s, µ bins returns the given correlation function. In case of pycorr.TwoPointEstimator, the edges will be recovered automatically. Unwrapped estimators (µ from -1 to 1) are preferred, because symmetry allows to fix some issues. In the sequence format:

  • s_values must be a 1D array of reference separation (s) values for the table, of length N;

  • mu_values must be a 1D array of reference µ values (covering the range from 0 to 1) for the table, of length M;

  • xi_values must be an array of correlation function values at those s, µ values of shape (N, M);

  • s_edges, if given, must be a 1D array of separation bin edges of length N+1. The bins must come close to zero separation (say start at s <= 0.01).

The sequence containing 3 elements should be used for theoretical models evaluated at a grid of s, mu values. The 4-element format should be used for bin-averaged estimates.

xi_table_12None or the same format as xi_table_11

Table of the two tracer’s cross-correlation function in separation (s) and µ bins. The code will use it for interpolation in the covariance matrix integrals. Required for multi-tracer functionality.

xi_table_22None or the same format as xi_table_11

Table of second tracer auto-correlation function in separation (s) and µ bins. The code will use it for interpolation in the covariance matrix integrals. Required for multi-tracer functionality.

xi_cut_sfloat

(Optional) separation value beyond which the correlation function is assumed to be zero for the covariance matrix integrals. Default: 250. Between the maximum separation from xi_tables and xi_cut_s, the correlation function is extrapolated as \(\propto s^{-4}\).

xi_refinement_iterationsinteger

(Optional) number of iterations in the correlation function refinement procedure for interpolation inside the code, ensuring that the bin-averaged interpolated values match the binned correlation function estimates. Default: 10. Important: the refinement procedure is disabled completely regardless of this setting if the xi_tables are sequences of 3 elements, since they are presumed to be a theoretical model evaluated at a grid of s, mu values and not bin averages.

nthreadinteger

Number of (OpenMP) threads to use. Can not utilize more threads than n_loops.

  • IMPORTANT: AVOID multi-threading in the Python process calling this function (e.g. at NERSC this would mean not setting OMP_* and other *_THREADS environment variables; the code should be able to set them by itself). Otherwise the code may run effectively single-threaded. If you need other multi-threaded calculations, run them separately or spawn sub-processes.

N2integer

Number of secondary points to sample per each primary random point. Setting too low (below 5) is not recommended.

N3integer

Number of tertiary points to sample per each secondary point. Setting too low (below 5) is not recommended.

N4integer

Number of quaternary points to sample per each tertiary point. Setting too low (below 5) is not recommended.

n_loopsinteger

Number of integration loops. For optimal balancing and minimal idle time, should be a few times (at least twice) nthread and exactly divisible by it. The runtime roughly scales as the number of quads per the number of threads, \(\mathcal{O}\)(N_randoms * N2 * N3 * N4 * n_loops / nthread). For reference, on NERSC Perlmutter CPU half-node the code processed about 27 millions (2.7e7) quads per second per thread (using 64 threads on half a node) as of December 2024. (In Legendre projected mode, which is probably the slowest, with N2 = 5, N3 = 10, N4 = 20.) In single-tracer mode, the number of quads is N_randoms * N2 * N3 * N4 * n_loops. In two-tracer mode, the number of quads is (5 * N_randoms1 + 2 * N_randoms2) * N2 * N3 * N4 * n_loops.

loops_per_sampleinteger

Number of loops to merge into one output sample. Must divide max_loops. Recommended to keep the number of samples = n_loops / loops_per_sample roughly between 10 and 30.

out_dirstring

Directory for important outputs. Moderate disk space required (up to a few hundred megabytes), but increases with covariance matrix size and number of samples (see above).

tmp_dirstring

Directory for temporary files. Can be deleted after running the code, and is cleaned up after the normal execution. More disk space required - needs to store all the input arrays in the current implementation.

skip_s_binsinteger or tuple of two integers

(Optional) removal of separations bins at the post-processing stage. First (or the only) number sets the number of radial/separation bins to skip from the beginning (lowest-separation bins tend to converge worse and probably will not be precise due to the limitations of the formalism). Second number (if provided) sets the number of radial/separation bins to skip from the end. By default, no bins are skipped at the post-processing stage.

skip_linteger

(Only for the Legendre modes; optional) number of highest (even) multipoles to skip at the post-processing stage. (Higher multipole moments of the correlation function tend to converge worse.) Default 0 (no skipping).

shot_noise_rescaling1float

(Optional) shot-noise rescaling value for the first tracer if known beforehand. Default 1 (no rescaling). Will be ignored in jackknife mode - then shot-noise rescaling is optimized on the auto-covariance.

shot_noise_rescaling2float

(Optional) shot-noise rescaling value for the second tracer if known beforehand. Default 1 (no rescaling). Will be ignored in jackknife mode - then shot-noise rescaling is optimized on the auto-covariance.

seedinteger or None

(Optional) If given as an integer, allows to reproduce the results with the same settings, except the number of threads. Individual subsamples may differ because they are accumulated/written in order of loop completion which may depend on external factors at runtime, but the final integrals should be the same. If None (default), the initialization will be random.

sampling_grid_sizeinteger

(Optional) first guess for the sampling grid size. The code should be able to find a suitable number automatically.

verbosebool

(Optional) report each 5% of each loop’s progress by printing. Default False (off). This can be a lot of output, only use when the number of loops is small.

coordinate_scalingfloat

(Optional) scaling factor for all the Cartesian coordinates. Default 1 (no rescaling). This option is supported by the C++ code, but its use cases are not very clear. Zero or negative value is reset to boxsize, rescaling an unit cube to full periodicity.

Returns

post_processing_resultsdict[str, np.ndarray[float]]

Post-processing results as a dictionary with string keys and Numpy array values. All this information is also saved in a Rescaled_Covariance_Matrices*.npz file in the output directory. Selected common keys are: "full_theory_covariance" for the final covariance matrix and "shot_noise_rescaling" for the shot-noise rescaling value(s). There will also be a Raw_Covariance_Matices*.npz file in the output directory (as long as the C++ code has run without errors), which can be post-processed separately in a different way.

Post-processing

A suitable post-processing routine is invoked at the end of the main wrapper function (RascalC.run_cov()), so in many circumstances you may not need to run it separately. However, this automated but customizable post-processing routine is useful for timed-out runs, switching the mode, testing different cuts and/or output combinations in cases of insufficient convergence, etc.

RascalC.post_process_auto(file_root: str, out_dir: str | None = None, skip_s_bins: int | tuple[int, int] = 0, skip_l: int = 0, tracer: ~typing.Literal[1, 2] = 1, n_samples: None | int | ~typing.Iterable[int] | ~typing.Iterable[bool] = None, shot_noise_rescaling1: float = 1, shot_noise_rescaling2: float = 1, print_function: ~typing.Callable[[str], None] = <built-in function print>, extra_convergence_check: bool = True, jackknife: bool | None = None, legendre: bool | None = None, two_tracers: bool | None = None, n_r_bins: int | None = None, n_mu_bins: int | None = None, n_jack: int | None = None, max_l: int | None = None) dict[str][source]

Automatic but highly customizable post-processing interface. Designed to work with the RascalC.run_cov() outputs.

  • By default, this function guesses jackknife pipeline and the covariance binning mode by the output directory contents, but you can also specify some or all of these regimes via optional arguments.

  • Note that skip_s_bins and skip_l are not auto-determined, so by default no bins are be skipped even if they were cut in RascalC.run_cov().

Do not run this (or any other post-processing function/script) while the main RascalC computation is running — this may delete the output directory and cause the code to crash.

Parameters

file_rootstring

Path to the RascalC (RascalC.run_cov()) output directory. This is the only necessary argument. If no others are provided, jackknife pipeline and the covariance binning mode will be determined automatically. The result should match the RascalC.run_cov() setup (except non-zero skip_s_bins and skip_l, which are not auto-determined).

out_dirstring | None

(Optional) path to the directory in which the post-processing results should be saved. If None (default), is set to file_root. Empty string means the current working directory. We advise to use different output directories for different post-processing options.

skip_s_binsinteger or tuple of two integers

(Optional) removal of some radial bins. First (or the only) number sets the number of radial/separation bins to skip from the beginning. Second number (if provided) sets the number of radial/separation bins to skip from the end. By default, no bins are skipped.

skip_linteger

(Optional) number of higher multipoles to skip (from the end).

tracer1 or 2

(Optional) if the RascalC output directory contains two-tracer results, tracer = 2 together with two_tracers = False allows to select the second tracer for single-tracer post-processing.

n_samplesNone, integer, array/list/tuple/etc of integers or boolean values

(Optional) selection of RascalC subsamples (independent realizations of Monte-Carlo integrals).

  • If None, use all (default).

  • If an integer, use the given number of samples from the beginning.

  • If an array/list/tuple/etc of integers, it will be used as a NumPy index array.

  • If an array/list/tuple/etc of boolean, it will be used as a NumPy boolean array mask.

shot_noise_rescaling1float

(Optional) shot-noise rescaling value for the first tracer (default 1). In jackknife mode, the shot-noise rescaling value is auto-determined, so this parameter has no effect.

shot_noise_rescaling2float

(Optional) shot-noise rescaling value for the second tracer only in multi-tracer mode (default 1). In jackknife mode, the shot-noise rescaling value is auto-determined, so this parameter has no effect.

print_functionCallable

(Optional) custom function to use for printing. Default is print.

extra_convergence_checkbool

(Optional) whether to perform the extra convergence check. It is done by default.

jackknifeboolean or None

(Optional) boolean value sets jackknife mode manually. If None (default), this mode is determined automatically.

legendreboolean or None

(Optional) boolean value sets Legendre (vs s,mu) mode manually. If None (default), this mode is determined automatically.

two_tracersboolean or None

(Optional) boolean value sets 1- vs 2-tracer mode manually. If None (default), this mode is determined automatically.

n_r_bins, n_mu_bins, n_jack, max_linteger or None

(Optional) integer value is used to set manually the number of radial bins, angular bins (not needed in some Legendre modes), jackknife regions (jackknife mode only) and maximum (even) ell (Legendre mode only) respectively. Each parameter which is None (default) is determined automatically.

Returns

post_processing_resultsdict[str, np.ndarray[float]]

Post-processing results as a dictionary with string keys and Numpy array values. All this information is also saved in a Rescaled_Covariance_Matrices*.npz file in the out_dir (in file_root if the former is not provided). Selected common keys are: "full_theory_covariance" for the final covariance matrix and "shot_noise_rescaling" for the shot-noise rescaling value(s).

Loading and exporting the final covariance matrices

Perform the General quality control first.

This module contains convenience functions that

  • read RascalC results (checking eigenvalues of the inversion bias matrix. If they are much smaller than 1, it is safe to simply invert the covariance matrix. Otherwise, a correction factor is necessary.)

  • convert (reorder) covariance matrices (as is often needed for Legendre moments)

  • and/or export (save) the full covariance matrix to a text file.

RascalC.cov_utils.get_cov_header(rascalc_results_file: str) str[source]

Format the header for the covariance matrix text file in a consistent way to be used in other functions. Currently, just gets the shot-noise rescaling value.

RascalC.cov_utils.convert_cov_legendre(cov: ndarray[float], max_l: int) ndarray[float][source]

Change the bin ordering of the covariance matrix in Legendre mode for a single tracer. The original bin ordering (in RascalC .npy files) is by radial bins (top-level) and then by multipoles. The resulting bin ordering (for text files) is by multipoles (top-level) and then by radial bins.

RascalC.cov_utils.convert_cov_legendre_multi(cov: ndarray[float], max_l: int) ndarray[float][source]

Change the bin ordering of the covariance matrix in Legendre mode for two tracers. The original bin ordering (in RascalC .npy files) is by the correlation function (top-level; first auto-correlation, then cross-correlation and last the second auto-correlation), then by radial bins and then by multipoles (bottom-level). The resulting bin ordering (for text files) is by the correlation function (top-level), then by multipoles and then by radial bins (bottom-level).

RascalC.cov_utils.convert_cov_multi_to_cross(cov: ndarray[float]) ndarray[float][source]

Select only the cross x cross-correlation part (middle block) of the full two-tracer covariance. This function does not change the bin order of the covariance.

RascalC.cov_utils.convert_cov_legendre_multi_to_cross(cov: ndarray[float], max_l: int) ndarray[float][source]

Select only the cross x cross-correlation part (middle block) of the full two-tracer covariance in Legendre mode. This function also changes the bin order of the covariance as in convert_cov_legendre().

RascalC.cov_utils.load_cov(rascalc_results_file: str, print_function: ~typing.Callable[[str], None] = <built-in function print>) ndarray[float][source]

Load the theoretical covariance matrix from RascalC results file as-is, intended for the s_mu mode.

RascalC.cov_utils.load_cov_legendre(rascalc_results_file: str, max_l: int, print_function: ~typing.Callable[[str], None] = <built-in function print>) ndarray[float][source]

Load the theoretical covariance matrix from RascalC results file and change the bin ordering as in convert_cov_legendre(); intended for Legendre single-tracer mode.

RascalC.cov_utils.load_cov_legendre_multi(rascalc_results_file: str, max_l: int, print_function: ~typing.Callable[[str], None] = <built-in function print>) ndarray[float][source]

Load the theoretical covariance matrix from RascalC results file and change the bin ordering as in convert_cov_legendre_multi(); intended for Legendre two-tracer mode.

RascalC.cov_utils.load_cov_cross_from_multi(rascalc_results_file: str, print_function: ~typing.Callable[[str], None] = <built-in function print>) ndarray[float][source]

Load only the cross x cross-correlation part (middle block) of the full two-tracer covariance in a RascalC results file without conversion.

RascalC.cov_utils.load_cov_legendre_cross_from_multi(rascalc_results_file: str, max_l: int, print_function: ~typing.Callable[[str], None] = <built-in function print>) ndarray[float][source]

Load only the cross x cross-correlation part (middle block) of the full two-tracer covariance in a RascalC results file with the Legendre-mode conversion (see convert_cov_legendre()).

RascalC.cov_utils.export_cov(rascalc_results_file: str, output_cov_file: str, print_function: ~typing.Callable[[str], None] = <built-in function print>) None[source]

Export the theoretical covariance matrix from RascalC results file to a text file as-is, intended for the s_mu mode.

RascalC.cov_utils.export_cov_legendre(rascalc_results_file: str, max_l: int, output_cov_file: str, print_function: ~typing.Callable[[str], None] = <built-in function print>) None[source]

Export the theoretical covariance matrix from RascalC results file to a text file with conversion appropriate for single-tracer Legendre modes (see convert_cov_legendre()).

RascalC.cov_utils.export_cov_legendre_multi(rascalc_results_file: str, max_l: int, output_cov_file: str, print_function: ~typing.Callable[[str], None] = <built-in function print>) None[source]

Export the theoretical covariance matrix from RascalC results file to a text file with conversion appropriate for two-tracer Legendre modes (see convert_cov_legendre_multi()).

RascalC.cov_utils.export_cov_cross(rascalc_results_file: str, output_cov_file: str, print_function: ~typing.Callable[[str], None] = <built-in function print>) None[source]

Export the theoretical covariance matrix for cross-correlation only from RascalC results file to a text file without conversion, intended for the s_mu mode.

RascalC.cov_utils.export_cov_legendre_cross(rascalc_results_file: str, max_l: int, output_cov_file: str, print_function: ~typing.Callable[[str], None] = <built-in function print>) None[source]

Export the theoretical covariance matrix for cross-correlation only from RascalC results file to a text file with conversion appropriate for the Legendre modes (see convert_cov_legendre()).

RascalC.cov_utils.convert_txt_cov_multi_to_cross(input_file: str, output_file: str) None[source]

Convert a plain-text covariance file from full two-tracer to cross x cross-correlation only by selecting the middle block of the matrix. Should be appropriate for any mode.