1. What does this website do?
This website enables easy and fast genome-wide association
studies (GWAS) in the DGRP. Users upload a phenotype file and
can expect to retrieve full association results, annotations of
top associations, and diagnostic plots such as QQ plots and LD
heatmaps in under 30 minutes.
This website also hosts the data files generated for the DGRP2, in case you need to perform your own analyses. For a
list of those files, please refer to the data file page (
link).
2. How do I prepare input files?
Example (
link)
The GWAS analysis pipeline accepts line means. The input file
should contain either a single phenotype or two phenotypes, one
for each sex. The file should be comma-delimited
and must NOT contain a header line. The first
column should contain the integers that identify the DGRP lines. If there
are phenotypes for both sexes, males go in the second column and
females in the third. Lines that
are not present in the current data freeze (see
here for a
complete list of lines in this freeze) should be removed prior
to submission. Non-numerical values are treated as missing
data. An example input file with two phenotypes can be found at
this link
(
link); its first three lines are shown below. We encourage you to download this example and use it as a
template to prepare your input files.
100,49.28,77.92
101,47.2,57.76
105,51.04,73.12
...
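The format rules above can be checked before submission. The following is a minimal Python sketch, not part of the website's pipeline; the function name check_phenotype_rows and the optional valid_lines argument are hypothetical and introduced only for illustration. It assumes a file with two or three comma-separated columns, reports rows whose first column is not an integer (for example a leftover header line), and converts non-numerical phenotype values to None to mirror how they are treated as missing data.

```python
import csv
import io
import math


def check_phenotype_rows(text, valid_lines=None):
    """Parse line-mean phenotype text and collect format problems.

    valid_lines is an optional set of DGRP line IDs in the current freeze
    (hypothetical argument; the list itself must come from the website).
    Returns (rows, problems); missing phenotype values are stored as None.
    """
    rows, problems = [], []
    for n, rec in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not rec:
            continue  # ignore blank lines
        if len(rec) not in (2, 3):
            problems.append(f"line {n}: expected 2 or 3 columns, got {len(rec)}")
            continue
        try:
            line_id = int(rec[0])
        except ValueError:
            problems.append(f"line {n}: first column is not an integer (header line?)")
            continue
        if valid_lines is not None and line_id not in valid_lines:
            problems.append(f"line {n}: DGRP line {line_id} is not in the freeze list")
            continue
        values = []
        for field in rec[1:]:
            try:
                value = float(field)
            except ValueError:
                value = None  # non-numerical values count as missing
            if value is not None and not math.isfinite(value):
                value = None  # assumption: treat nan/inf as missing too
            values.append(value)
        rows.append((line_id, *values))
    return rows, problems


# The three example lines shown above.
example = "100,49.28,77.92\n101,47.2,57.76\n105,51.04,73.12\n"
rows, problems = check_phenotype_rows(example)
print(len(rows), "rows parsed;", len(problems), "problems found")
```

To check a file on disk, read its contents into a string and pass it to the function; any reported problems should be fixed before uploading.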
3. What are the output files?
As soon as the analyses are complete, you will receive an email
notification with the link to download the output files.
- gwas.top.annot: This is the main output file that
contains annotated variants with at least one p value smaller
than 1e-5. For each variant, we report two p values for a single-sex
phenotype input file or eight p values for a two-sex phenotype
input file. For a single-sex phenotype, the two p values are
from simple regression (SinglePval) and a mixed effects
model (SingleMixedPval). For a two-sex phenotype, there are two
p values (regression or mixed effects model) for each of the
following four possible traits: female, male, ave (average of
the two sexes), and diff (difference of the two sexes, female -
male). The variant effects are estimated as one half of the
difference between means of major and minor alleles.
In the last two columns of this output file, we provide functional
annotation for each variant. All annotation is based on FlyBase
release 5.49.
- GeneAnnotation: This field consists of two types of
information separated by a ",". The first is "SiteClass". The information in the brackets following SiteClass is
arranged as [FBgn ID | Gene Symbol | Site class | Distance to Gene]. When
multiple FBgn IDs overlap the variant, they are
separated by ";". The second is the more detailed
"TranscriptAnnot", which contains all transcript annotation for the variant as output by SnpEff: Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_change | Amino_Acid_length | Gene_Name | Gene_BioType | Coding | Transcript | Exon [ | ERRORS | WARNINGS ] )
- RegulationAnnotation: This field contains regulatory features as annotated by FlyBase. Features are separated by "," and given in the format (Type | Source | Flybase ID).
- gwas.all.assoc: This file contains p values for all
variants. It is useful for making Manhattan plots or QQ
plots.
- pheno.adjust.txt: When a phenotype file is submitted,
we adjust the phenotype for the effects of Wolbachia
infection and five major inversions (In(2L)t, In(2R)NS,
In(3R)K, In(3R)P, and In(3R)Mo). This file contains tables for
type III ANOVA F tests of these factors.
- raw.adjusted.pheno.txt: This file contains the raw
and adjusted phenotypes.
- LDheat.eps: The pipeline also outputs an LD heatmap of
variants in the "gwas.top.annot" file. This is a graphical
depiction of LD among the top associations.
- snp_calls.csv: This file contains genotypes
for the variants listed in "gwas.top.annot".
- qqplots.zip: This zipped archive contains the QQ
plots.
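The variant effect reported in gwas.top.annot is described above as one half of the difference between the means of the major and minor alleles. The Python sketch below illustrates that arithmetic on made-up toy values; it is not the pipeline's code. The function name variant_effect is hypothetical, the sign convention (major minus minor) is an assumption, and the major allele is taken to be the allele carried by more phenotyped lines, which may differ from how the pipeline defines it.

```python
from statistics import mean


def variant_effect(phenotypes, genotypes):
    """Half the difference between major- and minor-allele phenotype means.

    phenotypes: dict mapping DGRP line ID -> phenotype value (None if missing)
    genotypes:  dict mapping DGRP line ID -> allele carried by that line
    Assumption: sign is (major mean - minor mean) / 2.
    """
    groups = {}
    for line_id, allele in genotypes.items():
        value = phenotypes.get(line_id)
        if value is None or allele is None:
            continue  # skip lines with a missing phenotype or genotype
        groups.setdefault(allele, []).append(value)
    if len(groups) != 2:
        raise ValueError("expected exactly two alleles among phenotyped lines")
    # Assumption: the major allele is the one carried by more phenotyped lines.
    ordered = sorted(groups.values(), key=len, reverse=True)
    major_values, minor_values = ordered
    return (mean(major_values) - mean(minor_values)) / 2


# Toy values for illustration only, not real DGRP data.
toy_phenotypes = {1: 10.0, 2: 12.0, 3: 20.0}
toy_genotypes = {1: "A", 2: "A", 3: "G"}
print(variant_effect(toy_phenotypes, toy_genotypes))
```

In the toy data the major-allele mean is 11.0 and the minor-allele mean is 20.0, so the sketch returns half of their difference under the assumed sign convention.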
4. How do I cite this website for publication?
A manuscript describing the data generating process of the DGRP
Freeze 2.0 has been submitted for publication. The manuscript comprehensively
characterizes molecular variation in the DGRP and details the
statistics underlying this GWAS web server. Before the paper is
published, please cite the main DGRP paper (
Mackay et al., Nature, 2012).
5. Contact us
Please send your feedback and questions to dgrp2-webadmin@ncsu.edu.
6. Acknowledgement
The entire GWAS pipeline uses open source software,
including but not limited to:
R
FaST-LMM
PLINK
SnpEff