Covariance Matrix Estimation

Overview

This is the main section of RascalC, where 2PCF or 3PCF covariance matrix estimates are computed via Monte Carlo integration from a given set of input particles. For the 2PCF, depending on the number of input fields the code will compute either components for a single covariance matrix or all required components for 6 cross-covariance matrices (i.e. for multi-tracer analysis).

Prerequisites: - In JACKKNIFE mode, the jackknife weights and binned pair counts must be computed via the Computing Jackknife Weights script before the C++ code is run. - In DEFAULT mode, the RR pair counts must be computed via the Computing Random Counts and Survey Correction Functions script before the C++ code is run. - In LEGENDRE and 3PCF modes, the survey correction functions \(\Phi\) must be computed via the Computing Random Counts and Survey Correction Functions script before the C++ code is run.

Particle Grid and Cells

In the code, the particles are divided up into many cells for efficient computation, with each containing \(\sim10\) random particles. A cuboidal grid of cells is constructed aligned with the \((x,y,z)\) axes, which fully encompasses all random particles. The cellsize is set by the the \(N_\mathrm{side}\) parameter, which gives the number of (cubic) cells along the largest dimension along the box. This must be an odd integer, and the code will automatically exit if \(N_\mathrm{side}\) is too small or too large (since this gives inefficient performance).

Covariance Matrix Precision

The precision of covariance matrix estimators can be user-controlled in the RascalC code. To understand these parameters we must briefly outline the selection of random quads of particles (denoted \(\{i,j,k,l\}\)) used in the Monte Carlo integration. The particle selection algorithm has the following form:

  1. Loop over all \(i\)-particles \(N_\mathrm{loops}\) times. Each loop is independent and can be run on separate cores.
  2. For each \(i\) particle, we pick \(N_2\) \(j\)-particles at random, according to some selection rule. Here we compute the 2-point contribution to the covariance matrix, if we are computing a 2PCF covariance.
  3. For each \(j\) particle, we pick \(N_3\) \(k\)-particles at random, according to some selection rule. Here we compute the 3-point contribution to the covariance matrix.
  4. For each \(k\) particle, we pick \(N_4\) \(l\)-particles at random, according to some selection rule. Here we compute the 4-point contribution to the covariance matrix.
  5. (3PCF mode only): For each \(l\) particle, we pick \(N_5\) \(m\)-particles at random, according to some selection rule. Here we compute the 5-point contribution to the 3PCF covariance matrix.
  6. (3PCF mode only): For each \(m\) particle, we pick \(N_6\) \(n\)-particles at random, according to some selection rule. Here we compute the 6-point contribution to the 3PCF covariance matrix.

By setting the parameters \((N_\mathrm{loops},N_2, N_3, N_4,[N_5,N_6])\) we can control the precision of each matrix component. Standard values of \(N_2\sim N_3\sim N_4 [\sim N_5 \sim N_6] \sim 10\) normally work well. Each loop of the code produces an independent estimate of the full covariance matrix, which can be used to create accurate inverse matrices and effective number of mock calculations. The covariance converges relatively fast, so setting \(N_\mathrm{loops}\) to a few times the number of cores should work well. Values of \(N_\mathrm{loops}\gtrsim 100\) should be avoided to stop file sizes and reconstruction times becoming large.

Usage

The code is used as follows, with the command line options specified below:

bash clean
make
./cov [OPTIONS]

The first line removes any pre-existing C++ file before it is recompiled in line 2 to produce the ./cov file. The Makefile may need to be altered depending on the particular computational configuration used. The default Makefile is for a standard Unix installation, with the Makefile_mac file giving a sample Makefile for a Mac installation. This uses the following optional flags in the Makefile;

  • -DOPENMP: (Recommended) Run code in parallel with OpenMP, using the OpenMP installation specfied by the -lgomp and -fopenmp flags.
  • -DPERIODIC: Use periodic boundary conditions (appropriate for a cubic simulation box, but not mock surveys).
  • -DJACKKNIFE: Compute both full-survey and jackknife 2PCF covariance matrix terms in \((r,\mu)\) binning, allowing for shot-noise-rescaling calibration from the survey itself.
  • -DLEGENDRE: Compute the full-survey covariance matrix terms for (even) Legendre multipoles of the 2PCF.
  • -DTHREE_PCF: Compute the full-survey covariance matrix terms for (even and odd) Legendre multipoles of the isotropic 3PCF.

NB: For a summary of input command line parameters, simply run ./cov with no arguments.

Options

Input parameters for the RascalC code may be specified by passing options on the command line or by setting variables in the Parameters class in the modules/parameters.h. When using two sets of random particles, the code will exit automatically if all the required files are not input. We list the major code options below, in the form -option (variable) for command line option option and Parameters class variable variable.

Essential Parameters:

  • -def: Run the code with the default options for all parameters (as specified in the modules/parameters.h file.
  • -in (fname): Input ASCII random particle file for the first set of tracer particles. This must be in {x,y,z,w,j} format, as described in File Inputs.
  • -binfile (radial_bin_file): Radial binning ASCII file (see File Inputs) specifying upper and lower bounds of each radial bin.
  • -cor (corname): Input correlation function estimate for the first set of particles in ASCII format, as specified in File Inputs. This can be user defined or created by Full Survey Correlation Functions.
  • -binfile_cf (radial_bin_file_cf): Radial binning ASCII file for the correlation function (see File Inputs) specifying upper and lower bounds of each radial bin.
  • -norm (nofznorm): Number of galaxies in the first set of tracer particles. This is used to rescale the random particle covariances.
  • -output (out_file): Output directory in which to store covariance matrix estimates. This directory will be created if not already present. Beware: the code can produce a large volume of output (\(\sim 1\) GB for a standard run with one field and \(\sim1000\) bins).
  • -mbin_cf (mbin_cf): Number of \(\mu\) bins used for the correlation function.
  • -nside (nside): Number of cubic cells to use along the longest dimension of the grid encompassing the random particles, i.e. \(N_\mathrm{side}\). See Particle Grid and Cells note for usage.
  • -nthread (nthread): Number of parallel processing threads used if code is compiled with OpenMPI.
  • -perbox (perbox): Whether or not we are using a periodic box.

DEFAULT and JACKKNIFE mode Binning Parameters: - -mbin (mbin): Number of \(\mu\) bins used. This must match that used to create the jackknife weights. - -RRbin (RR_bin_file): Location of the binned_pair_counts_n{N}_m{M}_j{J}_11.dat ASCII file containing the summed pair counts in each bin (\(RR_{aA}^{11}\)), created by the jackknife_weights scripts.

JACKKNIFE mode Parameters:

  • -jackknife (jk_weight_file): Location of the jackknife_weights_n{N}_m{M}_j{J}_11.dat file containing the jackknife weights for each bin (\(w_{aA}^{11}\)), as created by the jackknife_weights scripts.

LEGENDRE and 3PCF mode Parameters:

Precision Parameters

  • -maxloops (max_loops): This is the number of matrix subsamples to compute. See Covariance Matrix Precision note for usage guidelines. (Default: 10)
  • -N2, -N3, -N4 (N2, N3, N4): The parameters controlling how many random particles to select at each stage. See Covariance Matrix Precision note above. (Default: 10)
  • -N5, -N5 (N5, N6): As above, but for the 3PCF mode only. (Default: 10)

General Multi Field Parameters:

  • -in2 (fname2): Input ASCII random particle file for the second set of tracer particles.
  • (nofznorm2): Total number of galaxies in the second set of tracer particles.
  • -cor12 (corname12): Input cross correlation function file between the two sets of random particles, as created by Full Survey Correlation Functions.
  • -cor2 (corname2): Input autocorrelation function for the second set of particles, either user-defined or created by Full Survey Correlation Functions.
  • -norm2 (nofznorm2): Number of galaxies in the second set of tracer particles. This is used to rescale the random particle covariances.

DEFAULT and JACKKNIFE mode Multi Field Parameters:

  • -jackknife12 (jk_weight_file12): Location of the jackknife_weights_n{N}_m{M}_j{J}_12.dat file containing the jackknife weights for each bin for the combination of random particle sets 1 and 2 (\(w_{aA}^{12}\)), as created by the jackknife_weights scripts.
  • -jackknife2 (jk_weight_file2): Location of the jackknife_weights_n{N}_m{M}_j{J}_22.dat file containing the jackknife weights for each bin for the second set of random particles (\(w_{aA}^{22}\)), as created by the jackknife_weights scripts.
  • -RRbin12 (RR_bin_file12): Location of the binned_pair_counts_n{N}_m{M}_j{J}_12.dat ASCII file containing the summed jackknife pair counts in each bin for the combination of random particle sets 1 and 2 (\(RR_{aA}^{12}\)), created by the jackknife_weights scripts.
  • -RRbin2 (RR_bin_file2): Location of the binned_pair_counts_n{N}_m{M}_j{J}_22.dat ASCII file containing the summed jackknife pair counts in each bin for the combination of random particle sets 1 and 2 (\(RR_{aA}^{22}\)), created by the jackknife_weights scripts.

LEGENDRE mode Multi Field Parameters:

Optional Parameters

  • -mumin (mumin): Minimum \(\mu\) binning to use in the analysis. (Default: 0, or -1 in 3PCF mode)
  • -mumax (mumax): Maximum \(\mu\) binning to use in the analysis. (Default: 1)
  • -cf_loops (cf_loops): Number of iterations over which to refine the correlation functions.
  • (perbox): Boolean controlling whether we are using a periodic box. (Default: False)
  • -boxsize (boxsize): If creating particles randomly, this is the periodic size of the computational domain. If particles are read from file, this is set dynamically. (Default: 400)
  • -rescale (rescale): Factor by which to dilate the input positions. Zero or negative values cause this to be set to the boxsize. (Default: 1)
  • -xicut (xicutoff): The radius beyond which the correlation functions \(\xi(r,\mu)\) are set to zero. (Default: 400)
  • -nmax (nmax): The maximum number of particles to read in from the random particle files. (Default: 1e12)
  • -save (savename): If savename is set, the cell selection probability grid is stored as savename. This must end in .bin. (Default: NULL)
  • -load (loadname): If set, load a cell selection probability grid computed in a previous run of RascalC. (Default: NULL)
  • -invert (qinvert): If this flag is passed to RascalC, all input particle weights are multiplied by -1. (Default: 0)
  • -balance (qbalance): If this flag is passed to RascalC, all negative weights are rescaled such that the total particle weight is 0. (Default: 0)
  • -np (np, make_random): If make_random = 1, this overrides any input random particle file and creates np randomly drawn particles in the cubic box. NB: The command line argument automatically sets make_random = 1. Currently creating particles at random is only supported for a single set of tracer particles.
  • -rs (rstart): If inverting particle weights, this sets the index from which to start weight inversion. (Default: 0)

Code Output

In the specified output directory, RascalC creates the directories 3PCFCovMatricesAll/ (3PCF mode), CovMatricesAll/ (DEFAULT, LEGENDRE and JACKKNIFE modes) and CovMatricesJack/ (JACKKNIFE mode) containing the relevant output matrix estimates. These contain multiple estimates of the each part of the total matrix and should be reconstructed using the Post-Processing & Reconstruction scripts.

The full output files take the following form (for N radial bins, M angular bins, maximum Legendre bin L and J non-zero jackknife regions, with FIELDS specifying the utilized tracer fields):

3PCF or LEGENDRE mode:

  • c{X}_n{N}_l{L}_{FIELDS}_{I}.txt: I-th estimate of the X-point covariance matrix estimates, i.e. \({}^X\mathbf{C}\). The summed covariance matrix terms have the suffix ‘full’.
  • binct_c{X}_n{N}_l{L}_{FIELDS}_{I}.txt: Total used counts per bin for the X-point covariance matrix.
  • total_counts_n{N}_l{L}_{FIELDS}_{I}: Total number of sets of particles attempted for the summed integral.

DEFAULT or JACKKNIFE mode:

  • c{X}_n{N}_m{M}_j{J}_{FIELDS}_{I}.txt: I-th estimate of the X-point covariance matrix estimates, i.e. \(C_{X,ab}\) The summed covariance matrix has the suffix ‘full’.
  • RR_n{N}_m{M}_{FIELDS}_{I}.txt: I-th estimate of the (non-jackknife) \(RR_{ab}^{XY}\) pair counts which can be compared with Corrfunc.
  • binct_c{X}_n{N}_m{M}_{FIELDS}.txt: Total used counts per bin for the X-point covariance matrix.
  • total_counts_n{N}_m{M}_{FIELDS}.txt: Total number of pairs, triples and quads attempted for the summed integral.

JACKKNIFE mode only:

  • RR{P}_n{N}_m{M}_{FIELDS}.txt: Estimate of \(RR_{ab}\) pair count for particles in random-subset P (\(P\in[1,2]\)). This is used to compute the disconnected jackknife matrix term.
  • EE{P}_n{N}_m{M}_{FIELDS}.txt: Estimate of \(EE_{ab}\) \(\xi\)-weighted pair count for particles in random-subset P. This is also used for the disconnected jackknife matrix term.

Each file is an ASCII format file containing the relevant matrices with the collapsed bin indices \(\mathrm{bin}_\mathrm{collapsed} = \mathrm{bin}_\mathrm{radial}\times n_\mu + \mathrm{bin}_\mathrm{angular}\) (2PCF) or \(\mathrm{bin}_\mathrm{collapsed} = \left(\mathrm{bin}_\mathrm{radial,1}\times n_r + \mathrm{bin}_\mathrm{radial,2}\right)\times n_\mu + \mathrm{bin}_\mathrm{angular}\) (3PCF) for a total of \(n_\mu\) angular (or Legendre) bins and \(n_r\) radial bins.