Sequence Search
The Bulk Search pipeline lets users submit multiple sequences for searching against one or more repertoire databases.
When to use
Use this pipeline when you have a set of query sequences (full amino acid sequences) and want to find the most similar sequences in a database. Typical use cases:
- Finding public database matches for novel sequences
- Comparing patient sequences to a reference cohort index
- Cross-cohort sequence similarity analysis
Submitting a job
Navigate to Dashboard → New Job. For one-off queries, use Experimental → Single Sequence Search instead.
Required inputs
| Field | Description |
|---|---|
| Job name | A label for this run |
| Input CSV files | One or more CSV/TSV files with a sequence column, uploaded as a zip archive |
| Sequence Column Name | Column name containing the query sequences |
| Database | One or more database names to search against |
| Top-K | Number of nearest neighbors to return per query |
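
The upload format described in the table can be sketched as follows: a minimal example, using only the Python standard library, that writes CSV content into a zip archive and checks that each file has the sequence column. The helper name `build_input_archive`, the default column name `sequence`, and the sample row are illustrative assumptions, not part of the pipeline.

```python
import csv
import io
import zipfile


def build_input_archive(files, sequence_column="sequence"):
    """Pack CSV contents into an in-memory zip, checking the sequence column.

    `files` maps a file name to a list of row dicts. The column name
    "sequence" is an illustrative default, not a documented requirement.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in files.items():
            if not rows:
                raise ValueError(f"{name}: no rows")
            fieldnames = list(rows[0].keys())
            if sequence_column not in fieldnames:
                raise ValueError(f"{name}: missing column {sequence_column!r}")
            text = io.StringIO()
            writer = csv.DictWriter(text, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
            archive.writestr(name, text.getvalue())
    return buffer.getvalue()


# Hypothetical single-file input; the sequence is made up for illustration.
archive_bytes = build_input_archive(
    {"patient_a.csv": [{"id": "q1", "sequence": "CASSLGQAYEQYF"}]}
)
```

Writing `archive_bytes` to disk (for example as `queries.zip`) gives a file that could be uploaded in the Input CSV files field, with `sequence` entered as the Sequence Column Name.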
Optional inputs
| Field | Default | Description |
|---|---|---|
| Score threshold | none | Minimum similarity score to include in results |
| Search type | full_sequence | full_sequence or cdr3_only |
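
To illustrate how Top-K and the score threshold interact, here is a small sketch that selects matches from a list of scored candidates. It assumes the threshold is inclusive (a score equal to the threshold is kept); the function `select_matches` and the candidate data are hypothetical and do not reflect the pipeline's internals.

```python
def select_matches(candidates, top_k, score_threshold=None):
    """Return up to top_k (sequence, score) pairs, best first.

    Assumption: the threshold is inclusive. Scores below it are dropped.
    """
    if score_threshold is not None:
        candidates = [c for c in candidates if c[1] >= score_threshold]
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]


# Placeholder candidates, not real database hits.
candidates = [("SEQ_A", 0.95), ("SEQ_B", 0.80), ("SEQ_C", 0.60)]
no_threshold = select_matches(candidates, top_k=2)
with_threshold = select_matches(candidates, top_k=3, score_threshold=0.9)
```

With a threshold of 0.9 only `SEQ_A` survives even though three neighbors were requested, which is why a strict threshold can leave queries with fewer than Top-K matches.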
Output files
results.csv
The input CSV with additional columns per neighbor:
| Column | Description |
|---|---|
| match_1_sequence | Top match sequence |
| match_1_score | Similarity score (0–1) |
| match_1_subject | Subject ID from the database |
| match_2_sequence | Second match sequence |
| … | … up to top-K |
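
Because the matches are spread across numbered columns, downstream analysis is often easier on a long table with one row per match. The sketch below reads a results.csv-style file and yields one record per query and rank. It assumes that every rank follows the same `match_N_sequence`, `match_N_score`, `match_N_subject` pattern and that unused ranks are left empty; the sample rows are invented for illustration.

```python
import csv
import io


def iter_matches(results_file):
    """Yield one dict per (query row, match rank) from a results.csv-style file."""
    for row_index, row in enumerate(csv.DictReader(results_file)):
        rank = 1
        # Walk match_1_*, match_2_*, ... until the columns run out or are empty.
        while row.get(f"match_{rank}_sequence"):
            yield {
                "row": row_index,
                "rank": rank,
                "sequence": row[f"match_{rank}_sequence"],
                "score": float(row[f"match_{rank}_score"]),
                "subject": row[f"match_{rank}_subject"],
            }
            rank += 1


# Illustrative data only: two queries with Top-K = 2, the second has one match.
sample = io.StringIO(
    "sequence,match_1_sequence,match_1_score,match_1_subject,"
    "match_2_sequence,match_2_score,match_2_subject\n"
    "CASSLGQAYEQYF,CASSLGQAYEQFF,0.97,S001,CASSLGGAYEQYF,0.91,S002\n"
    "CASSPTGELFF,CASSPTGELF,0.88,S003,,,\n"
)
matches = list(iter_matches(sample))
```

For a real run, replace `sample` with an open file handle, for example `open("results.csv", newline="")`.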
failed.csv
Sequences that had no results or encountered search errors:
| Column | Description |
|---|---|
| sequence | Original query sequence |
| error | Error description |
Troubleshooting
Job stuck in polling state
The FIRE API may be overloaded. The DAG keeps retrying with backoff until the job times out. Check FIRE API health with GET /health. If FIRE stays down, the job eventually times out.
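A manual health check with backoff could look like the sketch below. Only the `GET /health` path comes from this page. The base URL, the expectation of an HTTP 200 response, and the retry parameters are assumptions, and the DAG's own retry logic may differ.

```python
import time
import urllib.request


def fire_is_healthy(base_url, timeout=5.0):
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def wait_until_healthy(check, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call check() with exponential backoff; return True once it succeeds."""
    for attempt in range(max_attempts):
        if check():
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False


# Demo with a stand-in check that succeeds on the third call (no network needed).
responses = iter([False, False, True])
delays = []
recovered = wait_until_healthy(lambda: next(responses), sleep=delays.append)
```

Against a real deployment, the check would be something like `lambda: fire_is_healthy("https://fire.example.internal")`, where the host name is a placeholder for your FIRE API address.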
High rate of sequences in failed.csv
Sequences that are too short (< 5 aa) or contain non-amino-acid characters are rejected by FIRE. Pre-filter your input sequences.
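A pre-filter along these lines can catch both problems before submission. This is a sketch under two assumptions: that the accepted alphabet is the 20 standard one-letter amino acid codes, and that trimming whitespace and upper-casing the input is acceptable. FIRE's exact validation rules may be broader or stricter.

```python
# Assumed alphabet: the 20 standard amino acids, one-letter codes.
VALID_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 5  # sequences shorter than 5 aa are rejected


def split_valid(sequences, min_length=MIN_LENGTH):
    """Partition sequences into (valid, rejected) lists."""
    valid, rejected = [], []
    for seq in sequences:
        cleaned = seq.strip().upper()
        if len(cleaned) >= min_length and set(cleaned) <= VALID_AMINO_ACIDS:
            valid.append(cleaned)
        else:
            rejected.append(seq)
    return valid, rejected


# Illustrative inputs: one too short, one with a non-amino-acid character.
valid, rejected = split_valid(
    ["CASSLGQAYEQYF", "CASS", "CASSL*GQ", " cassptgelff "]
)
```

Keeping the rejected list lets you review which inputs were dropped locally instead of discovering them later in failed.csv.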
Score threshold too strict
If results.csv is mostly empty, lower the score_threshold or remove it entirely.