
Sequence Search

The Bulk Search pipeline lets users submit multiple query sequences and search them against one or more repertoire databases.


Use this pipeline when you have a set of query sequences (full amino acid sequences) and want to find the most similar sequences in a database. Typical use cases:

  • Finding public database matches for novel sequences
  • Comparing patient sequences to a reference cohort index
  • Cross-cohort sequence similarity analysis

Navigate to Dashboard → New Job. For one-off queries, use Experimental → Single Sequence Search instead.

| Field | Description |
|---|---|
| Job name | A label for this run |
| Input CSV files | One or more CSV/TSV files with a sequence column, uploaded as a zip archive |
| Sequence Column Name | Name of the column containing the query sequences |
| Database | Name(s) of the database(s) to search against |
| Top-K | Number of nearest neighbors to return per query |

Optional settings:

| Field | Default | Description |
|---|---|---|
| Score threshold | none | Minimum similarity score to include in results |
| Search type | full_sequence | full_sequence or cdr3_only |
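The upload is a single zip archive of CSV files, each with a header naming the sequence column. Below is a minimal Python sketch of building that archive, assuming one sequence per row under a single header. build_input_zip is a hypothetical helper, the file names are illustrative, and "sequence" is only an example; the header must match whatever you enter as the Sequence Column Name.

```python
import csv
import io
import zipfile


def build_input_zip(files: dict, column: str = "sequence") -> bytes:
    """Pack query sequences into CSV files inside an in-memory zip archive.

    files maps each CSV file name to the list of sequences it should contain.
    column is the header written above the sequences; it must match the
    Sequence Column Name entered in the job form.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, sequences in files.items():
            text = io.StringIO()
            writer = csv.writer(text)
            writer.writerow([column])                     # header row
            writer.writerows([seq] for seq in sequences)  # one sequence per row
            archive.writestr(name, text.getvalue())       # stored as UTF-8
    return buffer.getvalue()
```

To use it, write the returned bytes to a file such as queries.zip (for example `open("queries.zip", "wb").write(build_input_zip({"cohort_a.csv": seqs}))`) and upload that file in the Input CSV files field.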

results.csv contains the input CSV rows with additional columns for each neighbor:

| Column | Description |
|---|---|
| match_1_sequence | Top match sequence |
| match_1_score | Similarity score (0–1) |
| match_1_subject | Subject ID from the database |
| match_2_sequence | Second match sequence |
| … | Continues up to Top-K |
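results.csv is wide: one row per query and one group of columns per neighbor. For downstream analysis it is often easier to have one record per query/neighbor pair. A minimal sketch, assuming every neighbor column follows the match_N_sequence / match_N_score / match_N_subject pattern shown in the table and that the original sequence column is kept unchanged. parse_matches is a hypothetical helper, not part of the pipeline.

```python
import csv
import re
from typing import Iterable, Iterator

# Matches neighbor columns such as match_1_sequence, match_2_score, match_10_subject.
MATCH_COLUMN = re.compile(r"match_(\d+)_(sequence|score|subject)")


def parse_matches(lines: Iterable[str], query_column: str = "sequence") -> Iterator[dict]:
    """Yield one record per (query, neighbor) pair from results.csv rows.

    query_column is the Sequence Column Name used when the job was submitted.
    Empty neighbor cells (queries with fewer than Top-K matches) are skipped.
    """
    for row in csv.DictReader(lines):
        neighbors = {}
        for column, value in row.items():
            # Extra unnamed cells land under a None key; ignore them.
            found = MATCH_COLUMN.fullmatch(column) if isinstance(column, str) else None
            if found is None or not value:
                continue
            rank, field = int(found.group(1)), found.group(2)
            neighbors.setdefault(rank, {})[field] = value
        for rank in sorted(neighbors):
            match = neighbors[rank]
            yield {
                "query": row[query_column],
                "rank": rank,
                "sequence": match.get("sequence"),
                "score": float(match["score"]) if "score" in match else None,
                "subject": match.get("subject"),
            }
```

Pass an open file, e.g. `with open("results.csv", newline="") as fh: records = list(parse_matches(fh))`. Keying neighbors by the rank parsed from the column name means the helper does not need to know the Top-K value used for the job.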

failed.csv lists the sequences that had no results or encountered search errors:

| Column | Description |
|---|---|
| sequence | Original query sequence |
| error | Error description |
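When many sequences fail, grouping them by error description shows whether one cause dominates. A minimal sketch, assuming failed.csv has the two columns above with a header row; summarize_failures is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_failures(lines: Iterable[str]) -> Counter:
    """Count failed query sequences per distinct error description in failed.csv."""
    return Counter(row["error"] for row in csv.DictReader(lines))
```

For example, open failed.csv with `newline=""`, pass the file object to summarize_failures, and iterate over `.most_common()` to list errors from most to least frequent.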

Job stuck in polling state
The FIRE API may be overloaded. The DAG retries indefinitely with backoff, so the job stays in the polling state while FIRE is slow to respond. Check FIRE API health with GET /health. If FIRE is down, the job will eventually time out.
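The health check can be scripted with the standard library. A minimal sketch, assuming the endpoint needs no authentication and that an HTTP 200 response means FIRE is healthy; the FIRE base URL is not given here, so any URL you pass in is deployment-specific, and health_url and fire_is_healthy are hypothetical helpers.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL from the FIRE API base URL."""
    return base_url.rstrip("/") + "/health"


def fire_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200 within timeout seconds."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx responses) and socket timeouts all derive from OSError.
        return False
```

A False result while a job is polling points to FIRE itself rather than to the job's input.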

High rate of sequences in failed.csv
FIRE rejects sequences shorter than 5 amino acids and sequences that contain non-amino-acid characters. Pre-filter your input sequences before submitting.
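A minimal pre-filtering sketch that applies both rules. It assumes FIRE accepts only the 20 standard one-letter amino acid codes and that letter case is not significant; if your deployment also accepts codes such as X, widen the character class. split_valid is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: FIRE accepts only the 20 standard amino acid one-letter codes.
# Widen this character class if your deployment also accepts codes such as X.
AMINO_ACIDS = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")
MIN_LENGTH = 5  # FIRE rejects sequences shorter than 5 aa


def split_valid(sequences: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split raw sequences into (kept, rejected) before submitting a job."""
    kept, rejected = [], []
    for raw in sequences:
        seq = raw.strip().upper()  # drop stray whitespace; assumes case is not significant
        if len(seq) >= MIN_LENGTH and AMINO_ACIDS.fullmatch(seq):
            kept.append(seq)
        else:
            rejected.append(raw)
    return kept, rejected
```

Keeping the rejected sequences in their raw form makes it easy to check them by hand before deciding whether to fix or drop them.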

Score threshold too strict
If results.csv is mostly empty, lower the score_threshold or remove it entirely.
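To pick a less strict value, one option is to run the job once without a score threshold and check what fraction of queries would keep their top match at several candidate cutoffs. A minimal sketch, assuming a threshold keeps scores greater than or equal to the cutoff; coverage_by_threshold is a hypothetical helper, and the input is one match_1_score per query (None when the query had no match).

```python
from typing import Dict, Iterable, Optional, Sequence


def coverage_by_threshold(
    top_scores: Iterable[Optional[float]], thresholds: Sequence[float]
) -> Dict[float, float]:
    """Fraction of queries whose best match would pass each candidate threshold.

    top_scores holds one match_1_score per query, or None when the query had
    no match; a score passes when it is at or above the threshold.
    """
    scores = list(top_scores)
    if not scores:
        return {t: 0.0 for t in thresholds}
    return {
        t: sum(1 for s in scores if s is not None and s >= t) / len(scores)
        for t in thresholds
    }
```

Choosing the highest threshold that still keeps an acceptable fraction of queries avoids an empty results.csv on the next run.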