mkref
Introduction
The mkref command in VAMPIRE is used to generate a motif database (in FASTA format) from a motif file produced by the anno command. This enables downstream analyses that require a curated set of motifs.
Key Use Cases
- Population-level analysis: Create a consensus motif set from multiple samples
- Batch annotation: Use the same motif database across multiple datasets
- Motif refinement: Iteratively improve annotation quality
- Reference building: Create custom motif databases for specific organisms
Input and Output
Input
| Input | Format | Description | Default |
|---|---|---|---|
| Motif | TSV | Motif file generated by anno | None |
Output
| Output | Format | Description | Default |
|---|---|---|---|
| Motif Database | FASTA | Curated motif database in FASTA | None |
Input file format ({prefix}.motif.tsv):
id motif rep_num label
0 GGC 150 alpha
1 GGT 45 alpha
Output file format (FASTA):
>0
GGC
>1
GGT
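The mapping between the two formats can be sketched with awk. This only illustrates the conversion shown above and is not VAMPIRE's implementation. It assumes the motif table is tab-separated with a single header line, and the file names `demo.motif.tsv` and `demo.fa` are made up for the example.

```shell
# Build a tiny motif table matching the example above (tab-separated, one header line).
printf 'id\tmotif\trep_num\tlabel\n0\tGGC\t150\talpha\n1\tGGT\t45\talpha\n' > demo.motif.tsv

# Skip the header, then emit each row as a FASTA record: ">id" followed by the motif.
awk -F'\t' 'NR > 1 { print ">" $1; print $2 }' demo.motif.tsv > demo.fa

cat demo.fa
```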
Usage Example
Generate a motif database from an annotation file:
vampire mkref annotation_prefix motif_database.fa
Complete workflow: Population-level annotation
A common workflow for analyzing multiple samples from the same population:
# Step 1: Annotate each sample de novo
vampire anno sample1.fa sample1_anno
vampire anno sample2.fa sample2_anno
vampire anno sample3.fa sample3_anno
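# Step 2: Merge motif files, keeping the header line only once
# (mkref reads {prefix}.motif.tsv, so the merged file is named all_motifs.motif.tsv)
awk 'FNR > 1 || NR == 1' sample1_anno.motif.tsv sample2_anno.motif.tsv sample3_anno.motif.tsv > all_motifs.motif.tsv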
# Step 3: Create reference database
vampire mkref all_motifs population_motif_database.fa
# Step 4: Re-annotate all samples with unified motif set
vampire anno --no-denovo -m population_motif_database.fa sample1.fa sample1_curated
vampire anno --no-denovo -m population_motif_database.fa sample2.fa sample2_curated
vampire anno --no-denovo -m population_motif_database.fa sample3.fa sample3_curated
Parameters
Required Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| prefix | string | Input prefix from anno command (reads {prefix}.motif.tsv) | Required |
| output | string | Output FASTA file path for reference motif database | Required |
Parameter Details
The command reads the .motif.tsv file generated by anno and extracts each unique motif sequence. Each motif is written to the output FASTA file with its ID as the header.
Best Practices
Quality Control
Before creating the final reference database:
- Review motif statistics: Check rep_num in the motif file to ensure sufficient evidence
- Filter low-frequency motifs: Remove motifs with very low counts
- Manual curation: Examine motif sequences for artifacts
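# Example: Keep the header line and only motifs with >= 10 occurrences
# (mkref reads {prefix}.motif.tsv, so the filtered file is named filtered_motifs.motif.tsv)
awk -F'\t' 'NR == 1 || $3 >= 10' annotation.motif.tsv > filtered_motifs.motif.tsv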
vampire mkref filtered_motifs curated_database.fa
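Before settling on a threshold, it can help to see how many motifs survive at a few candidate cutoffs. The sketch below is one way to do that. It assumes a tab-separated table with a header line and rep_num in column 3. The demo file name and the cutoff values are arbitrary choices for illustration, not recommendations.

```shell
# Hypothetical demo table: three motifs with very different support.
printf 'id\tmotif\trep_num\tlabel\n0\tGGC\t150\talpha\n1\tGGT\t45\talpha\n2\tGAC\t3\talpha\n' > qc_demo.motif.tsv

# For each candidate cutoff, count the motifs whose rep_num (column 3) meets it.
for cutoff in 1 10 100; do
    kept=$(awk -F'\t' -v min="$cutoff" 'NR > 1 && $3 + 0 >= min { n++ } END { print n + 0 }' qc_demo.motif.tsv)
    echo "rep_num >= $cutoff: $kept motif(s)"
done > qc_summary.txt

cat qc_summary.txt
```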
Population Analysis
When working with multiple samples:
- Annotate all samples with de novo mode first
- Merge motif tables from all samples
- Deduplicate motifs across samples
- Create reference database from merged set
- Re-annotate all samples with the unified database
This ensures consistent motif labeling across the population.
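The merge and deduplicate steps can be combined in a single awk pass, sketched below. Several things here are assumptions rather than documented VAMPIRE behavior: the tables are tab-separated with one header line each, IDs are renumbered from 0 so that motifs from different samples do not share an ID, and rep_num is simply taken from the first sample in which a motif appears. The demo file names are hypothetical.

```shell
# Two hypothetical per-sample tables; GGT appears in both under different IDs.
printf 'id\tmotif\trep_num\tlabel\n0\tGGC\t150\talpha\n1\tGGT\t45\talpha\n' > s1_demo.motif.tsv
printf 'id\tmotif\trep_num\tlabel\n0\tGGT\t60\talpha\n1\tGGA\t12\talpha\n' > s2_demo.motif.tsv

awk -F'\t' -v OFS='\t' '
    # Keep the header from the first file only.
    FNR == 1 { if (NR == 1) print; next }
    # Keep the first occurrence of each motif sequence and give it a fresh ID.
    !($2 in seen) { seen[$2] = 1; $1 = n++; print }
' s1_demo.motif.tsv s2_demo.motif.tsv > merged_demo.motif.tsv

cat merged_demo.motif.tsv
```

Because the result is named `merged_demo.motif.tsv`, the prefix `merged_demo` matches what mkref expects as input.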
Iterative Refinement
For difficult datasets:
# Round 1: De novo annotation
vampire anno data.fa round1
vampire mkref round1 reference1.fa
# Round 2: Annotation with reference
vampire anno --no-denovo -m reference1.fa data.fa round2
# Round 3: Refine and create new reference
vampire refine round2 refine_actions.tsv -o round2_refined
vampire mkref round2_refined reference2.fa
# Round 4: Final annotation
vampire anno --no-denovo -m reference2.fa data.fa final
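One way to judge whether another round is worthwhile is to compare the motif sets of two successive reference files. The sketch below does this with standard tools only. It assumes each FASTA record stores its sequence on a single line, as in the output format shown earlier, and the demo file names are hypothetical.

```shell
# Two hypothetical references from successive rounds.
printf '>0\nGGC\n>1\nGGT\n' > ref1_demo.fa
printf '>0\nGGC\n>1\nGGA\n' > ref2_demo.fa

# Extract sequence lines (everything that is not a ">" header) and sort them.
grep -v '^>' ref1_demo.fa | sort > ref1_demo.seqs
grep -v '^>' ref2_demo.fa | sort > ref2_demo.seqs

# comm -3 hides sequences shared by both rounds.
# Column 1 lists motifs only in round 1, column 2 (tab-indented) motifs only in round 2.
comm -3 ref1_demo.seqs ref2_demo.seqs > ref_diff_demo.txt

cat ref_diff_demo.txt
```

An empty diff would suggest that the motif set has stopped changing between rounds.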
Common Issues
Empty output file
Cause: No motifs detected in the input annotation file.
Solution: Check the input annotation file:
head annotation.motif.tsv
Ensure it contains valid motif entries with id, motif, and rep_num columns.
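The same check can be scripted. The sketch below assumes a tab-separated table whose first line names the columns. It reports whether the id, motif, and rep_num columns are present and whether any data rows follow the header. The demo file is a hypothetical header-only table.

```shell
# Hypothetical problem case: a motif table with a header but no motifs.
printf 'id\tmotif\trep_num\tlabel\n' > empty_demo.motif.tsv

awk -F'\t' '
    # Record the column names found in the header line.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Count non-empty data rows.
    NF > 0 { rows++ }
    END {
        if (!("id" in col) || !("motif" in col) || !("rep_num" in col)) print "missing column"
        else if (rows == 0) print "header only, no motifs"
        else print rows " motif rows"
    }
' empty_demo.motif.tsv > check_demo.txt

cat check_demo.txt
```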
Duplicate motifs
Cause: Multiple samples have the same motif with different IDs.
Solution: Deduplicate before creating the database:
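# Remove duplicate motif sequences (column 2), keeping the first occurrence
# (mkref reads {prefix}.motif.tsv, so the result is named unique_motifs.motif.tsv)
awk -F'\t' '!seen[$2]++' combined_motifs.tsv > unique_motifs.motif.tsv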
vampire mkref unique_motifs final_database.fa
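To see which motifs are affected before removing anything, the duplicated sequences can be listed first. This is a minimal sketch assuming a tab-separated table with the motif sequence in column 2. Rows whose second column is the literal header text `motif` are skipped, so header lines repeated by concatenation do not count. The demo file name is hypothetical.

```shell
# Hypothetical combined table in which GGT appears under two different IDs.
printf 'id\tmotif\trep_num\tlabel\n0\tGGC\t150\talpha\n1\tGGT\t45\talpha\n0\tGGT\t60\talpha\n' > dup_demo.motif.tsv

# Report each motif sequence that appears on more than one row, with its row count.
awk -F'\t' '$2 != "motif" { count[$2]++ } END { for (m in count) if (count[m] > 1) print m, count[m] }' dup_demo.motif.tsv > dup_report.txt

cat dup_report.txt
```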
See Also
- Parameters - Full parameter reference
- anno - Generate annotation for mkref
- refine - Manually curate motifs before creating database