Georeactor Blog

Quinoa news + genome browser

Tags: jbrowse, bioseries

When writing the closing for my quinoa presentation at CSV Conf, I was thinking about a productive way to continue or conclude my project. Eventually I took a deeper look at public genome browsers - a way to navigate the genome and visualize the locations of specific annotated genes.

Major crops like rice and maize have websites dedicated to this, and more obscure crops like quinoa don't. The larger context is unclear - maybe this is an older way of doing things, or they are being run internally, or maybe the community lacks resources to host a public site. What I can do is:

The go-to open source tool for this is jBrowse2, built on Yarn / React / SVG. This is code which I understand and could maybe contribute to. Here's a look:

jBrowse has other views and features which I don't know anything about. Here is a tool which compares the same position across multiple tracks.

OK, so suppose I want a track of named quinoa genes over its genome?
No one appears to have a write-up for going directly from UniProt to jBrowse. This got me concerned.
Then I discover Ensembl Plants has a browser with quinoa genes marked on it:

So where did these come from..? Do any have more detail?

On UniProt the quinoa proteins have a Genomic Coordinates tab, giving Assembly Name "ASM168347v1" and Genomic location "3,770,659–3,784,205". What's happening:

The jBrowse website invites users to set up office hours, so I booked one.

Meanwhile! I was considering contacting quinoa / kañiwa experts about the mystery of huazontles. Then I find this paper published days ago on May 29th - the genome has been sequenced, and the researchers are convinced that huazontles shares a common polyploid ancestor with quinoa, and that quinoa split off on a later southern migration. The genome is in many pieces (contigs) and not chromosomes, but that's fine. I can download the genome here (this was uploaded in December but I didn't find it in searches, maybe because nuttalliae is in the subspecies field).
NCBI also has 129 named genes. After review, these were easier to identify because they're from the chloroplasts and match other plants.

The next morning, I download the genome and these genes, to prove to the jBrowse people that I am not totally lost. I wrote a script to generate a GFF file, placing these over the right sequence locations with the right contig ID + numeric offset.
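For anyone curious what "contig ID + numeric offset" looks like in practice, here is a minimal sketch of that kind of script. It is not the script I ran: the contig and gene sequences are toy data, the function names `find_gene` and `gff3_line` are made up for illustration, and it assumes each gene appears as an exact forward-strand match in one contig.

```python
def find_gene(contigs, gene_seq):
    """Return (contig_id, start, end) for the first exact match, or None.

    Coordinates are 1-based and inclusive, which is what GFF3 expects.
    """
    for contig_id, contig_seq in contigs.items():
        offset = contig_seq.find(gene_seq)  # 0-based offset, -1 if absent
        if offset != -1:
            return contig_id, offset + 1, offset + len(gene_seq)
    return None


def gff3_line(contig_id, start, end, name, strand="+", source="example"):
    """Format one GFF3 feature line (nine tab-separated columns)."""
    attributes = f"ID={name};Name={name}"
    columns = [contig_id, source, "gene", str(start), str(end),
               ".", strand, ".", attributes]
    return "\t".join(columns)


# Toy data standing in for the real contigs and gene sequences.
contigs = {
    "contig_1": "TTTTACGTACGTTTTT",
    "contig_2": "GGGGCCCCATGAAATAGGG",
}
genes = {"geneA": "ATGAAATAG"}

lines = ["##gff-version 3"]
for name, seq in genes.items():
    hit = find_gene(contigs, seq)
    if hit is not None:
        lines.append(gff3_line(*hit, name))

gff_text = "\n".join(lines) + "\n"
print(gff_text)
```

The strand column is hard-coded to "+" here, which is exactly the left-to-right default described next.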

By default I had all genes/annotations appear to go left-to-right. But when we zoom in, we see these green and red colored lines:

These represent start and stop codons for 3 different frames x 2 directions of reading. atpH could be left-to-right based on the green and red marks at the ends.
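To make the "3 frames x 2 directions" idea concrete, here is a small sketch that lists start and stop codons in all six reading frames of a sequence. It assumes the standard start codon ATG and stop codons TAA / TAG / TGA; the function name `codon_marks` and the test sequence are made up for illustration, and this is not how jBrowse itself computes the marks.

```python
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def codon_marks(seq):
    """Map (strand, frame) to a list of (offset, 'start' | 'stop').

    Offsets are 0-based positions within the strand being read, so
    minus-strand offsets count from the far end of the original sequence.
    """
    marks = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            found = []
            # Step through complete codons only, starting at this frame.
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if codon in START_CODONS:
                    found.append((i, "start"))
                elif codon in STOP_CODONS:
                    found.append((i, "stop"))
            marks[(strand, frame)] = found
    return marks


marks = codon_marks("ATGAAATAGC")
for key in sorted(marks):
    print(key, marks[key])
```

A gene read in the right direction and frame should have a start mark at one end and a stop mark at the other, with no stop marks in between in that same frame.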

Handcoding which contigs to search for which gene sequences, then formatting out a GFF, is not a great strategy TBH. There have got to be tools which do this automatically. Or maybe I run a tool to mark the start and end of all plant protein-coding genes, and then try to match them to known genes/proteins.
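As a small step away from handcoding, a script can at least search every contig for every gene on both strands and report the direction. This is a sketch under heavy assumptions: it only finds exact matches (real alignment tools such as BLAST tolerate mismatches), the function name `locate_on_either_strand` is made up, and the sequences are toy data.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_on_either_strand(contigs, gene_seq):
    """Yield (contig_id, start, end, strand) for every exact match.

    start/end are 1-based inclusive positions on the contig as stored,
    so a "-" hit uses the same coordinate system as a "+" hit.
    """
    queries = (("+", gene_seq), ("-", reverse_complement(gene_seq)))
    for contig_id, contig_seq in contigs.items():
        for strand, query in queries:
            offset = contig_seq.find(query)
            while offset != -1:
                yield contig_id, offset + 1, offset + len(query), strand
                # Keep looking for further copies in the same contig.
                offset = contig_seq.find(query, offset + 1)


# Toy contig containing the reverse complement of the gene.
contigs = {"c1": "AAACTATTTCATGG"}
hits = list(locate_on_either_strand(contigs, "ATGAAATAG"))
print(hits)
```

The strand value from each hit can go straight into column 7 of the GFF line instead of assuming everything runs left-to-right.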

Finally I did the jBrowse call today. This was better received than the CSV Conf talk and cleared up some of my confusion. Some of my notes plus research here:

I gave my feedback about lacking a "zoom to" feature for custom tracks, lacking a way to favorite/star/direct link a specific contig (there are 150+ contigs in huazontles and only 2 where I have tracks), and an error reporting issue.