WheatCoordDB: Guide

How to use the converter, how to interpret confidence scores and result badges, validation accuracy, and which assemblies are included.

How to use the converter

All conversions run entirely in your browser. No data is sent to any server. The converter loads pre-computed PCHIP conversion tables (sampled at 1 kb resolution) on demand and interpolates target coordinates using piecewise linear lookup between adjacent table entries. Anchor files are additionally loaded for dotplot rendering.
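
The lookup step described above can be sketched as follows. This is an illustration in Python, not the converter's own browser code: the table layout (one target coordinate per 1 kb step of the source chromosome), the function name `convert_position`, and the error raised outside the table are all assumptions.

```python
STEP_BP = 1_000  # table resolution stated in this guide (1 kb)


def convert_position(table, pos, step=STEP_BP):
    """Interpolate a target coordinate for a source position.

    `table` is a hypothetical list in which table[i] holds the target
    coordinate for source position i * step.
    """
    if not table:
        raise ValueError("empty conversion table")
    last_pos = (len(table) - 1) * step
    if pos < 0 or pos > last_pos:
        raise ValueError(f"position {pos} is outside the table (0-{last_pos})")
    i = pos // step
    if i == len(table) - 1:
        # Exactly on the final table entry: nothing to interpolate.
        return table[-1]
    # Linear interpolation between the two adjacent table entries.
    frac = (pos - i * step) / step
    return round(table[i] + frac * (table[i + 1] - table[i]))
```

Because the interpolation uses the signed difference between adjacent entries, the same code handles tables whose target coordinates decrease (inverted regions).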

Single position or region

Enter a chromosome, a start position, and an end position (in base pairs). For a single position, enter the same value in both fields. Select one or more target assemblies and click Convert.

Chromosome:  Chr2B
Start (bp):  450000000
End (bp):    520000000
Assembly:    Lancer, Jagger

Batch BED conversion

Upload a BED file containing multiple regions. The file should be tab-delimited with columns: chrom, start, end, and an optional name column. Chromosome names must use the Chr1A–Chr7D convention.

Chr1A  10000000  15000000  QTL_1
Chr2B  450000000 520000000 resistance_locus
Chr4A  200000000 250000000 yield_QTL

Results are returned as a table and can be downloaded as a BED file for each target assembly.
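
If you want to check a BED file before uploading it, the format rules above can be sketched as a small validator. This is a hypothetical helper, not part of WheatCoordDB; the function name `parse_bed` and the decision to skip comment and track lines are assumptions.

```python
import re

# Chr1A, Chr1B, Chr1D ... Chr7D, as required by this guide.
CHROM_RE = re.compile(r"Chr[1-7][ABD]")
INT_RE = re.compile(r"[0-9]+")


def parse_bed(text):
    """Parse tab-delimited BED text into (chrom, start, end, name) tuples."""
    regions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Assumption: blank lines and header-style lines are ignored.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated columns")
        chrom, start, end = fields[0], fields[1], fields[2]
        if not CHROM_RE.fullmatch(chrom):
            raise ValueError(f"line {line_no}: unexpected chromosome name {chrom!r}")
        if not (INT_RE.fullmatch(start) and INT_RE.fullmatch(end)):
            raise ValueError(f"line {line_no}: coordinates must be plain integers")
        # The fourth (name) column is optional.
        name = fields[3] if len(fields) > 3 and fields[3] else None
        regions.append((chrom, int(start), int(end), name))
    return regions
```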

Coordinate format

All coordinates are in base pairs (bp). The converter accepts plain integers; do not include commas or unit suffixes. Chromosome names follow the convention Chr1A, Chr1B, Chr1D through Chr7D (21 chromosomes total).

Synteny dotplot

For single-position or region queries with a single target assembly selected, a synteny dotplot is shown below the result. The dotplot displays all gene anchors for the queried chromosome as dark blue points, with the query region highlighted as a red box. This lets you visually verify that your region of interest falls within a well-anchored collinear block before relying on the converted coordinates. The dotplot can be downloaded as a PNG using the download button.

The dotplot is not available in batch mode (BED file upload) or when multiple assemblies are selected. For those cases, refer to the per-assembly chromosome dotplots on GitHub.

Confidence scores

Every conversion result includes a confidence score: the anchor recovery fraction. This is the proportion of CS RefSeq v2.1 gene models expected to be present in a ±5 Mb window around the query position that were successfully projected onto the target assembly by Liftoff.

Unlike a raw anchor count, the recovery fraction normalises for gene density variation along the chromosome. Pericentromeric regions have fewer genes per Mb, but they are not penalised simply for being gene-poor. What matters is how many of the available genes were recovered.
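
As a rough illustration of the definition above, the recovery fraction can be written as a ratio of two counts within the window. The data structures here (a dictionary of CS gene positions and a set of gene IDs that Liftoff projected) and the function name are assumptions made for the sketch, not the pipeline's actual code.

```python
WINDOW_BP = 5_000_000  # the ±5 Mb window described in this guide


def recovery_fraction(gene_positions, recovered_ids, query_pos, window=WINDOW_BP):
    """Fraction of CS genes near `query_pos` that were projected to the target.

    gene_positions: hypothetical dict of gene ID -> CS position (bp) on the
                    query chromosome.
    recovered_ids:  hypothetical set of gene IDs successfully projected.
    Returns None when the window contains no genes at all.
    """
    lo, hi = query_pos - window, query_pos + window
    expected = [g for g, p in gene_positions.items() if lo <= p <= hi]
    if not expected:
        return None
    hits = sum(1 for g in expected if g in recovered_ids)
    # Dividing by the number of expected genes is what normalises for
    # gene density: a gene-poor window with all genes recovered scores 1.0.
    return hits / len(expected)
```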

Tier: High confidence
Threshold: ≥ 80% gene recovery
Display: Green tag · full confidence bar
Interpretation: Strong collinearity. Median coordinate error ~26 bp (KASP) / ~107 bp (LOO). Suitable for fine-mapping and marker design.

Tier: Moderate confidence
Threshold: 50–80% gene recovery
Display: Amber tag · partial bar
Interpretation: Reduced collinearity: structural variation, introgression, or assembly gap likely in this region. Median error ~311 bp (KASP) / ~1.4 kb (LOO). Use with caution.

Tier: Low confidence
Threshold: < 50% gene recovery
Display: Red tag · minimal bar
Interpretation: Very low collinearity. Median error ~4.6 Mb (KASP) / ~3.9 kb (LOO). The returned coordinate may be unreliable. Treat as approximate only.

Regional variation: confidence scores are specific to the query region and assembly. The same chromosome can have High confidence on chromosome arms and Low confidence in pericentromeric regions or known introgression intervals. Check the per-assembly chromosome profiles on GitHub to understand accuracy in your region of interest before relying on a result.

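
For scripted filtering of downloaded results, the tier thresholds can be expressed as a small function. This is a sketch only: the function name is hypothetical, and placing exactly 80% in the High tier and exactly 50% in the Moderate tier is an interpretation of the thresholds given in this guide.

```python
def confidence_tier(fraction):
    """Map an anchor recovery fraction (0.0-1.0) to a confidence tier."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("recovery fraction must be between 0 and 1")
    if fraction >= 0.80:
        return "High"
    if fraction >= 0.50:
        return "Moderate"
    return "Low"
```
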
Result badges and symbols

⚠ inverted orientation

The returned start coordinate is larger than the end coordinate. This means the query region maps to the opposite strand in the target assembly; the region is present but in reverse orientation relative to CS. The coordinates are still valid; the start and end simply need to be swapped if your downstream tool requires start < end.
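
If a downstream tool requires start < end, the swap can be done while recording the orientation so that the information is not lost. A minimal sketch, with a hypothetical function name:

```python
def normalise_interval(start, end):
    """Return (start, end, strand) with start <= end.

    A returned start larger than the end is treated as reverse
    orientation relative to CS and reported as strand "-".
    """
    if start > end:
        return end, start, "-"
    return start, end, "+"
```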

⚠ whole-chromosome inversion

The entire chromosome appears to be assembled in the opposite orientation in the target relative to CS. All coordinates on this chromosome will be inverted. This is an assembly orientation choice, not a biological inversion.

~ symbol before coordinates

A tilde (~) prefix indicates the coordinate was extrapolated beyond the range of available anchors. The first anchor on a chromosome may not start at position 0, and the last anchor may not reach the chromosome end. Extrapolated coordinates are estimated by projecting the boundary slope beyond the anchor range and may be less reliable than interpolated coordinates.

When extrapolation occurs, a warning message appears below the result:

Extrapolated coordinates — query extends beyond anchor range on Chr1B (anchors cover 1.8–700.4 Mb). Returned coordinates are extrapolated from the nearest anchor boundary and may be less reliable.

The anchor coverage range shown in the message tells you where the first and last gene anchors are on that chromosome for that assembly. Queries that start before or end after these positions use values from the boundary of the pre-computed PCHIP conversion table, where the spline was extrapolated at pipeline construction time using a robust boundary slope estimated from the nearest anchors. In most cases this extrapolation is accurate, but in pericentromeric regions or near structural rearrangements the extrapolated coordinate may be less reliable than positions within the anchor range.
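
To make the idea of a boundary slope concrete, here is one possible way to extrapolate before the first anchor. The guide does not specify how the robust slope is estimated, so the median of the slopes between the first few consecutive anchors is used purely as an assumption; the function names and the parameter `k` are likewise hypothetical.

```python
from statistics import median


def boundary_slope(anchors, k=5):
    """Median slope between the first k consecutive anchors.

    anchors: list of (source_pos, target_pos) pairs sorted by source_pos.
    The median is an assumed stand-in for the pipeline's robust estimator.
    """
    head = anchors[:k]
    slopes = [
        (t2 - t1) / (s2 - s1)
        for (s1, t1), (s2, t2) in zip(head, head[1:])
        if s2 != s1
    ]
    if not slopes:
        raise ValueError("need at least two distinct anchors")
    return median(slopes)


def extrapolate_left(anchors, pos, k=5):
    """Estimate a target coordinate for a position before the first anchor."""
    first_source, first_target = anchors[0]
    if pos >= first_source:
        raise ValueError("position is not before the first anchor")
    return round(first_target + boundary_slope(anchors, k) * (pos - first_source))


def label(coord, extrapolated):
    """Prefix extrapolated coordinates with a tilde, as in the results."""
    return f"~{coord}" if extrapolated else str(coord)
```

Using a median rather than the slope of the single nearest anchor pair means that one misplaced anchor near the boundary has less influence on the projected coordinate.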

⚠ Query spans a translocation breakpoint

The query region overlaps a known inter-chromosomal translocation boundary. The result is split into two segments, one for each side of the breakpoint, mapping to different target chromosomes. This currently applies to ArinaLrFor and SY_Mattis, which carry a Chr5B/Chr7B translocation: the proximal portion of Chr5B maps to Chr5B in the target, while the distal portion maps to Chr7B. A query spanning the breakpoint will return two results accordingly.

Query falls in translocation breakpoint gap

The query falls within the breakpoint gap itself, a region with no reliable anchor coverage on either side of the translocation. No coordinate can be returned for this region.
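
The two translocation cases can be pictured as splitting a query around a gap. The sketch below is illustrative only: the `Breakpoint` structure, its field names, and any coordinates passed to it are hypothetical, and which side of the gap corresponds to the proximal or distal portion depends on the chromosome and is not encoded here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    gap_start: int      # first position of the breakpoint gap (inclusive)
    gap_end: int        # last position of the breakpoint gap (inclusive)
    left_target: str    # target chromosome for positions before the gap
    right_target: str   # target chromosome for positions after the gap


def split_query(start, end, bp):
    """Split a query into segments on either side of the breakpoint gap.

    Returns a list of (start, end, target_chromosome) tuples. An empty
    list means the query lies entirely within the gap, so no coordinate
    can be returned.
    """
    segments = []
    if start < bp.gap_start:
        segments.append((start, min(end, bp.gap_start - 1), bp.left_target))
    if end > bp.gap_end:
        segments.append((max(start, bp.gap_end + 1), end, bp.right_target))
    return segments
```

A query that spans the gap yields two segments, a query on one side yields one, and a query inside the gap yields none, matching the two messages described above.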

Technical validation

WheatCoordDB was validated using two independent approaches: leave-one-out (LOO) cross-validation across all 24 assemblies, and KASP marker validation against 2,547 SNP markers with known positions in 21 assemblies.

Leave-one-out (LOO) cross-validation

Each of the 24 assemblies was held out in turn, and its anchor positions were predicted from the remaining assemblies' splines. Prediction error was calculated as the distance between the predicted and actual anchor midpoint positions.
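
The summary statistics reported for each tier (median and 90th percentile, P90) can be computed from paired predicted and actual positions as sketched below. The function name and the choice of the "inclusive" percentile method are assumptions; the published figures may have been calculated with a different percentile definition.

```python
from statistics import median, quantiles


def summarise_errors(predicted, actual):
    """Return (median, P90) of absolute prediction errors in bp.

    predicted, actual: equal-length sequences of anchor midpoint
    positions for the held-out assembly (at least two anchors).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    # quantiles(..., n=10) returns the nine decile cut points;
    # the last one is the 90th percentile.
    p90 = quantiles(errors, n=10, method="inclusive")[-1]
    return median(errors), p90
```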

LOO: by gene recovery tier
High (≥80%): 107 bp median · 5.4 kb P90
Moderate (50–80%): 1.4 kb median · 58 kb P90
Low (<50%): 3.9 kb median · 201 kb P90

KASP SNP markers: by gene recovery tier
High (≥80%): 26 bp median · 3.0 kb P90
Moderate (50–80%): 311 bp median · 207 kb P90
Low (<50%): 4.6 Mb median · 41.9 Mb P90

Important caveat: these headline figures are genome-wide averages. Individual chromosomes and specific regions within an assembly can have substantially higher error rates, particularly in pericentromeric regions, known introgression intervals, and regions of low anchor density. We strongly recommend checking the per-assembly chromosome profile plots on GitHub to assess the likely accuracy for your specific region of interest before using converted coordinates for fine-mapping or marker design.

KASP validation notes

KASP marker positions in target assemblies were obtained by BLASTn of flanking sequences (~100 bp). Of 51,331 marker-assembly pairs tested, 93.9% showed correct chromosome assignment. Large errors (>5 Mb) were concentrated in Low-confidence regions (82.6% of large errors had anchor recovery fraction <50%) and on chromosomes with known homeologous sequence similarity (particularly Chr4A and Chr6B), where short BLAST queries can produce ambiguous hits. These BLAST mapping artefacts are distinct from WheatCoordDB interpolation errors.

Per-assembly accuracy profiles

Detailed plots showing anchor recovery fraction, anchor density, mean anchor gap, and coordinate accuracy along all 21 chromosomes for each of the 24 assemblies are available in the supplementary plots folder on GitHub. These plots allow users to identify chromosomal regions where accuracy may be reduced for a specific assembly before relying on converted coordinates.

Included assemblies

All 24 assemblies are chromosome-scale hexaploid wheat (Triticum aestivum) assemblies unless otherwise noted. Gene anchors were projected from IWGSC CS RefSeq v2.1 using Liftoff with alignment thresholds of 90% coverage and 90% identity.

10+ wheat panel
Jagger (ready)
Lancer (ready)
ArinaLrFor (ready)
Stanley (ready)
Spelt (ready)
Mace (ready)
SY_Mattis (ready)
Julius (ready)
Landmark (ready)
Norin61 (ready)

CS versions
CS_IAAS (T2T)
CS_CAU (T2T)
CS_v1 (RefSeq v1)

Additional cultivars
Aikang58 (ready)
Chuanmai104 (ready)
Sumai3 (ready)
JIN50 (ready)
MOV (ready)
Fielder (ready)
Kariega (ready)
Attraktion (ready)
Renan_v2 (ready)
Paragon_v3 (ready)
Cadenza_v2 (ready)

Planned additions include 17 Chinese cultivars from Jiao et al. (2025, Nature) spanning Chinese wheat breeding history, and wild relative assemblies (Ae. tauschii, T. timopheevii, Th. bessarabicum).

Want a different assembly included? Get in touch. We are happy to add any publicly available chromosome-scale wheat assembly. Please provide the accession number or a download link.

Contact

WheatCoordDB is developed and maintained by the Grewal lab at the University of Nottingham.

For questions, bug reports, or assembly addition requests, contact surbhi.grewal@nottingham.ac.uk.

To report issues or contribute, visit the GitHub repository.