checkMyIndex

What checkMyIndex does

Searches for a set of compatible indices for your sequencing experiment according to the number of samples and the desired multiplexing rate (i.e. the number of samples per pool/lane). The app returns the best color-balanced solution it can find (assuming one exists) within the number of trials you select. Running more trials may find a better solution but takes longer. The app also lets you mark indices as required (within a larger set of indices) and finds the best solution that includes those required indices.

Input index file(s)

The list of available indices is supplied in one or two tab-delimited text files without headers. Each row in the file(s) must consist of an ID, a single sequence (for a single index file) or two sequences (for a dual index file), an optional weight, and an optional flag indicating whether the index is required in the final solution (1 = required; 0 = not required). If the weight is not supplied, it defaults to '1'; if the required indicator is not supplied, it defaults to '0'. The possible file formats and column contents are given below:

Two column, single index file, where each row has the format: <ID> <Sequence>
Three column, single index file, where each row has the format: <ID> <Sequence> <Weight>
Four column, single index file, where each row has the format: <ID> <Sequence> <Weight> <Required indicator>
Three column, dual index file, where each row has the format: <ID> <Sequence> <Sequence>
Four column, dual index file, where each row has the format: <ID> <Sequence> <Sequence> <Weight>
Five column, dual index file, where each row has the format: <ID> <Sequence> <Sequence> <Weight> <Required indicator>

Any other format or column order is invalid and will cause the app to fail or to generate incorrect results. Example four-column and two-column files for testing the application are available in the GitHub repository.
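The column rules above can be illustrated with a minimal Python sketch. It is not the app's own parser: the names IndexRow and parse_index_rows are hypothetical, the weight is assumed to be numeric, and the single/dual distinction is passed in explicitly through an assumed "dual" argument, because a three-column row alone is ambiguous (single index with a weight, or dual index without one).

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexRow:
    id: str
    seq1: str
    seq2: Optional[str]   # None for a single index file
    weight: float = 1.0   # default when the weight column is absent
    required: int = 0     # default when the required column is absent


def parse_index_rows(text: str, dual: bool) -> List[IndexRow]:
    """Parse tab-delimited, header-less index rows (hypothetical helper)."""
    n_seq = 2 if dual else 1
    n_min, n_max = 1 + n_seq, 3 + n_seq  # ID + sequences, plus up to two optional columns
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if not n_min <= len(fields) <= n_max:
            raise ValueError(f"expected {n_min} to {n_max} columns, got {len(fields)}")
        seqs = fields[1:1 + n_seq]
        extra = fields[1 + n_seq:]
        # Assumption: the weight is a number; the text only gives its default, '1'.
        weight = float(extra[0]) if len(extra) >= 1 else 1.0
        required = int(extra[1]) if len(extra) >= 2 else 0
        if required not in (0, 1):
            raise ValueError("required indicator must be 0 or 1")
        rows.append(IndexRow(fields[0], seqs[0], seqs[1] if dual else None,
                             weight, required))
    return rows
```

For example, parse_index_rows("A\tACGT\n", dual=False) yields one row with weight 1.0 and required 0, matching the defaults described above.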

Parameters

Total number of samples in your experiment (can be greater than the number of available indices).

Multiplexing rate i.e. number of samples per pool/lane (only divisors of the total number of samples are proposed).

i7 and i5 pairing (only for dual-indexing) is proposed when there are as many i5 as i7 indices, in order to handle Illumina Unique Dual Indices (UDI). Note that the pairing follows the order of the indices in the input files.

Constraint on the indices (only for single-indexing) to avoid having two samples or two pools/lanes with the same index(es).

Directly look for a solution with the desired multiplexing rate (only for single-indexing) instead of looking for a partial solution with a few samples per pool/lane and then adding some of the remaining indices to reach the desired multiplexing rate.

Select compatible indices (only for single-indexing) before looking for a (partial) solution. This selection can take some time but then speeds up the algorithm.

Maximum number of trials can be increased if a solution is difficult to find with the parameters chosen.
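As a small illustration of the multiplexing rate parameter, the sketch below lists every divisor of the total number of samples together with the resulting number of pools/lanes. The function name proposed_multiplexing_rates is hypothetical, and whether the app itself lists the trivial rates (1 and the total number of samples) is an assumption, not something stated above.

```python
from typing import List, Tuple


def proposed_multiplexing_rates(n_samples: int) -> List[Tuple[int, int]]:
    """Return (multiplexing_rate, n_pools) for every divisor of n_samples."""
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    return [(rate, n_samples // rate)
            for rate in range(1, n_samples + 1)
            if n_samples % rate == 0]


# 12 samples can be split into pools of 1, 2, 3, 4, 6 or 12 samples.
print(proposed_multiplexing_rates(12))
```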

How the algorithm works

Depending on the number of input indices and the multiplexing rate, there can be a very large number of index combinations to check, so testing the compatibility of every combination may take a long time or even be impossible. The trick is to find a partial solution with the desired number of pools/lanes but fewer samples per pool/lane than requested, and then to complete each pool/lane with some of the remaining indices to reach the desired multiplexing rate. This works because adding indices to a compatible combination always gives a compatible combination. In short, fewer samples per pool/lane means fewer combinations to test, which makes the search for a partial solution very fast. Adding indices to complete each pool/lane is fast too and gives the final solution.

Unfortunately, the search for a final solution may fail because this trick reduces the number of index combinations that are considered. In that case, one can look for a solution directly at the desired multiplexing rate (see Parameters); the only risk is an increase in computational time.
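The two-step idea described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's actual implementation: the names is_compatible_4ch and find_solution are hypothetical, only the four-channel rule is used, every index is used at most once across all pools/lanes, and the random greedy selection of disjoint combinations is an assumed strategy.

```python
import itertools
import random
from typing import Dict, List, Optional


def is_compatible_4ch(seqs: List[str]) -> bool:
    """Four-channel rule: each position needs a red (A/C) and a green (G/T) base."""
    return all(any(b in "AC" for b in col) and any(b in "GT" for b in col)
               for col in zip(*seqs))


def find_solution(indices: Dict[str, str], n_pools: int, rate: int,
                  partial_size: int = 2, max_trials: int = 100,
                  seed: Optional[int] = None) -> Optional[List[List[str]]]:
    """Return n_pools lists of index IDs, each of size rate, or None."""
    partial_size = min(partial_size, rate)
    ids = sorted(indices)
    if n_pools * rate > len(ids):
        return None  # assumption: an index is never reused across pools
    rng = random.Random(seed)
    # Step 1: test only the small combinations and keep the compatible ones.
    small = [c for c in itertools.combinations(ids, partial_size)
             if is_compatible_4ch([indices[i] for i in c])]
    for _ in range(max_trials):
        rng.shuffle(small)
        pools, used = [], set()
        # Step 2: greedily pick disjoint compatible combinations (partial solution).
        for combo in small:
            if used.isdisjoint(combo):
                pools.append(list(combo))
                used.update(combo)
                if len(pools) == n_pools:
                    break
        if len(pools) < n_pools:
            continue  # this trial failed, try another random order
        # Step 3: complete each pool with unused indices; under the four-channel
        # rule, adding indices to a compatible pool keeps it compatible.
        rest = [i for i in ids if i not in used]
        rng.shuffle(rest)
        for pool in pools:
            while len(pool) < rate:
                pool.append(rest.pop())
        return pools
    return None
```

Setting partial_size equal to rate in this sketch corresponds to looking directly for a solution at the desired multiplexing rate: more combinations are enumerated in step 1, so it is slower, but no candidate combination is skipped.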

Background on Illumina chemistry and color balancing

Illumina chemistry can be four-channel (HiSeq & MiSeq), two-channel (original SBS and XLEAP-SBS) or one-channel (iSeq 100). With the four-channel chemistry, a red laser detects A/C bases and a green laser detects G/T bases, and indices are compatible if there is at least one red light and one green light at each position. With the original two-channel SBS chemistry, G bases have no color, A bases are orange, C bases are red and T bases are green, and indices are compatible if there is at least one color at each position. With the two-channel XLEAP-SBS chemistry, G bases have no color, A bases are blue, C bases are blue + green (cyan) and T bases are green, and indices are compatible if there is at least one color at each position. Note that indices starting with GG are not compatible with the two-channel chemistry. With the one-channel chemistry, compatibility cannot be defined with colors, and indices are compatible if there is at least one A, C or T base at each position. Please refer to the Illumina documentation for more detailed information on the different chemistries.

About

The original application was developed at the Biomics pole of the Institut Pasteur by Hugo Varet, and an Application Note describing it was published in Bioinformatics in 2018. Send an e-mail to hugo.varet@pasteur.fr with any suggestion or bug report.

Source code and instructions to run the original application locally are available at the PF2 - Institut Pasteur GitHub repository.

Modifications to include Illumina XLEAP-SBS chemistry have been made by the Genomics Facility at Cornell. These modifications are available at the Cornell Genomics Facility GitHub repository.

Please note that checkMyIndex is provided without any guarantees as to its accuracy.

Version

Software based on the Institut Pasteur checkMyIndex version 1.0.2.

Genomics Facility at Cornell checkMyIndex version 1.4.7.

See the Cornell Genomics Facility GitHub repository for a full update history.

Search for a set of compatible indices for your sequencing experiment (version 1.4.7)