Description
This track shows protein-coding genes based on annotation provided by
RGD.
Method
Annotations for RGD Genes were downloaded from:
- ftp://rgd.mcw.edu/pub/data_release/GFF/Corrected_GFF3/
- ftp://rgd.mcw.edu/pub/data_release/GENES_RAT
The GFF files were combined into a temporary MySQL table, and the records
of type "gene", "exon", and "CDS" were selected and loaded into separate
MySQL tables. Records with inconsistent strand information were removed.
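The selection and strand-cleaning steps above can be sketched in Python. This is a minimal illustration, not the UCSC code (which worked through a temporary MySQL table): it reads GFF3 lines in memory, keeps only the gene, exon, and CDS records, and drops any gene whose exon or CDS records report a different strand. It assumes each exon and CDS record's Parent attribute names a single gene ID directly; the function names select_features and drop_inconsistent_strands are hypothetical.

```python
from collections import defaultdict

# Feature types selected from the combined GFF records.
KEEP_TYPES = {"gene", "exon", "CDS"}


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=g1;Name=Abc' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_features(gff_lines):
    """Group GFF3 records by type, keeping only gene, exon, and CDS lines."""
    tables = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in KEEP_TYPES:
            continue
        tables[cols[2]].append({
            "seqid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attrs": parse_attributes(cols[8]),
        })
    return tables


def drop_inconsistent_strands(tables):
    """Remove genes whose exon/CDS records disagree with the gene's strand.

    Hypothetical simplification: each exon/CDS 'Parent' is assumed to be a
    single gene ID (many GFF3 files route exons through transcripts instead).
    """
    gene_strand = {g["attrs"].get("ID"): g["strand"] for g in tables["gene"]}
    bad = set()
    for ftype in ("exon", "CDS"):
        for rec in tables[ftype]:
            parent = rec["attrs"].get("Parent")
            if parent in gene_strand and rec["strand"] != gene_strand[parent]:
                bad.add(parent)
    cleaned = {
        "gene": [g for g in tables["gene"] if g["attrs"].get("ID") not in bad],
        "exon": [r for r in tables["exon"] if r["attrs"].get("Parent") not in bad],
        "CDS": [r for r in tables["CDS"] if r["attrs"].get("Parent") not in bad],
    }
    return cleaned, bad
```

Keeping the per-type lists separate mirrors the separate "gene", "exon", and "CDS" tables described above, while the strand check works per gene so that one bad child record removes the whole gene.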
The resulting collection was loaded into a gene prediction (genePred)
format table using the ldHgGene utility program. The data were further
processed by two programs, getRgdGeneCds and doRgdGene2, to create the
genePred format table rgdGene2, which serves as the base table for RGD Genes.
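In the actual pipeline, ldHgGene performed the conversion to genePred. As a rough illustration of what that format holds, the sketch below builds one basic 10-column genePred row (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) from GFF-style exon and CDS intervals. It is not the logic of ldHgGene, getRgdGeneCds, or doRgdGene2; the function name to_gene_pred is hypothetical, and it assumes at least one CDS record per gene, since the track covers protein-coding genes.

```python
def to_gene_pred(name, chrom, strand, exons, cds):
    """Build one basic 10-column genePred row as a tab-separated string.

    exons, cds: lists of (start, end) in GFF's 1-based, fully closed
    coordinates. genePred uses 0-based, half-open coordinates, so each start
    is decreased by 1 while each end stays the same.
    """
    if not exons or not cds:
        raise ValueError("sketch assumes a protein-coding gene with exons and CDS")
    blocks = sorted((start - 1, end) for start, end in exons)
    tx_start = blocks[0][0]
    tx_end = max(end for _, end in blocks)
    cds_start = min(start for start, _ in cds) - 1
    cds_end = max(end for _, end in cds)
    fields = [
        name, chrom, strand,
        str(tx_start), str(tx_end), str(cds_start), str(cds_end),
        str(len(blocks)),
        # exonStarts/exonEnds are comma-separated lists with a trailing comma
        "".join(f"{start}," for start, _ in blocks),
        "".join(f"{end}," for _, end in blocks),
    ]
    return "\t".join(fields)
```

For example, to_gene_pred("Abc", "chr1", "+", [(300, 400), (100, 200)], [(150, 200), (300, 350)]) yields the fields Abc, chr1, +, 99, 400, 149, 350, 2, 99,299, and 200,400, with the exons sorted and every start shifted to 0-based.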
A program, doRgdGene2Xref, was used to create the rgdGene2Xref table from
the Dbxref field. The rgdGene2ToDescription table was built from the
gene_desc field of the GENES_RAT file. The rgdGene2ToUniProt and rgdGene2Pep
tables were built using data from GENES_RAT and the UniProt database.
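The internals of doRgdGene2Xref are not described here. A minimal sketch of the underlying idea, assuming the Dbxref values follow the GFF3 convention of comma-separated DB:ID pairs, is to split each pair into a database name and an accession. The function dbxref_rows and its (gene_id, database, accession) row layout are hypothetical; the attribute dict is passed in directly so the block stands alone.

```python
from urllib.parse import unquote


def dbxref_rows(gene_id, attrs):
    """Split a GFF3 Dbxref attribute into (gene_id, database, accession) rows.

    GFF3 writes Dbxref as comma-separated DB:ID pairs, for example
    'Dbxref=NCBI_Gene:12345,UniProtKB:Q00000'. The row layout here is
    hypothetical; the real rgdGene2Xref schema may differ.
    """
    rows = []
    for entry in attrs.get("Dbxref", "").split(","):
        if ":" not in entry:
            continue  # skip empty or malformed entries
        db, accession = entry.split(":", 1)  # accessions may themselves contain ':'
        # GFF3 allows percent-encoding of reserved characters in values
        rows.append((gene_id, unquote(db), unquote(accession)))
    return rows
```

Splitting only on the first colon keeps accessions such as GO:0005634 intact when the database prefix is GO.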
A total of 753 genes had inconsistent annotations that caused display
problems in the Genome Browser, so these 753 entries were removed.
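The text does not say what made those 753 annotations inconsistent. Purely as an illustration of the kind of structural check that can flag a problematic genePred record, the hypothetical function gene_pred_problems below tests that the CDS lies within the transcript, that the exons span it, and that exons are non-empty and do not overlap. These are assumed criteria, not the ones actually applied to RGD Genes.

```python
def gene_pred_problems(tx_start, tx_end, cds_start, cds_end, exon_starts, exon_ends):
    """Return a list of structural problems in one genePred-style record.

    Coordinates are 0-based, half-open, as in genePred. These checks are
    illustrative assumptions, not the criteria used to build RGD Genes.
    """
    problems = []
    if not (tx_start <= cds_start <= cds_end <= tx_end):
        problems.append("CDS outside transcript")
    if len(exon_starts) != len(exon_ends) or not exon_starts:
        problems.append("bad exon lists")
        return problems  # the remaining checks need paired exon lists
    if exon_starts[0] != tx_start or exon_ends[-1] != tx_end:
        problems.append("exons do not span transcript")
    for i, (start, end) in enumerate(zip(exon_starts, exon_ends)):
        if start >= end:
            problems.append(f"exon {i} empty or reversed")
        if i > 0 and start < exon_ends[i - 1]:
            problems.append(f"exon {i} overlaps previous exon")
    return problems
```

Returning a list of reasons, rather than a single boolean, makes it easier to review why a record would be dropped before removing it.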
All programs used in this build pipeline can be found in the source code
package, which may be downloaded here.
Credits
Thanks to RGD for providing the base annotation of RGD Genes.
The RGD Genes track was produced at UCSC by Fan Hsu, Mary Goldman and Hiram
Clawson. It is based on data from RGD, NCBI RefSeq, UniProt, and GenBank. Our
thanks to the people running these databases and to the scientists worldwide
who have made contributions to them.
References
Twigger SN, Shimoyama M, Bromberg S, Kwitek AE, Jacob HJ, RGD Team.
The Rat Genome Database, update 2007--easing the path from disease to data and back again.
Nucleic Acids Res. 2007 Jan;35(Database issue):D658-62.
PMID: 17151068; PMC: PMC1761441