FAQ

Sample preparation

Does the VIB Nucleomics Core offer Sanger sequencing?

We do not offer Sanger sequencing, only next- and third-generation sequencing. For Sanger sequencing, please feel free to contact our colleagues at NSF Antwerp: https://cmn.sites.vib.be/en/neuromics-support-facility#/

Do you provide DNA or RNA extraction as a service?

Currently, we do not provide DNA or RNA extraction as a service to our customers. However, we can act as an intermediary and outsource your samples for RNA/DNA extraction to a partner service facility with which we have had good experiences. Please contact us for more information.

Can you recommend an RNA/DNA extraction protocol?

Currently, we do not impose a specific protocol for RNA/DNA extraction. Our advice is to use the protocol that works best in your hands. Please keep in mind that your RNA/DNA samples must meet specific quality criteria for us to be able to guarantee good-quality data. Check the Sample Submission Page for more details.

 

What are the requirements for my DNA sample?

You can find more details about the requirements on our Sample Submission Page.

What are the requirements for my RNA sample?

You can find more details about the requirements on our Sample Submission Page.

What if I have only a very low amount of RNA?

The standard protocols for most of the assays that we offer start with about 100-200 ng of total RNA. If you have less, more extensive amplification is possible; for this we use the NuGEN Ovation Pico WTA system. Although the specifications of this kit state a minimum input of 500 pg, we have tested the reproducibility and linearity across different amounts of starting material, and our advice is not to go below 5-10 ng. However, if you do not have enough sample material and still want to continue, you often have no choice but to accept a bias. We can QC your samples on an Agilent Bioanalyzer Pico chip, which can also quantify your sample down to a lower limit of 50 pg/µL. If you would like to discuss this in more detail, including advice on platform type, please contact nucleomics@vib.be to make an appointment for a meeting at our facility.

How should I bring in my DNA or RNA samples?

You can bring in your samples on any working day between 9-12h and 13-17h. If you ship your samples by courier, please ship them at the beginning of the week on plenty of dry ice. Before you bring or ship the samples, please make sure to send us a completed DNA or RNA Sample Submission Form. We will check the quality of the samples and contact you again before we start the actual assays.

Check the Sample Submission Page for more details.

Bioinformatics

Why do I have multiple files per sample, and how should I handle them?

On Illumina instruments, each sample can be distributed across multiple lanes (4 on the NextSeq500, 2 on the NextSeq2000, and up to 4 on the NovaSeq6000). The lane can be identified in the filename as L001, L002, L003, or L004.

Although the lanes can be considered technical replicates of the sequencing run, they should not be handled as biological replicates of your experiment.

We usually map per lane (optimal for parallelizing the pipeline) and then merge the BAM files with the samtools merge command, as follows:

samtools merge [options] sample_S1_L001_R1.merged.bam sample_S1_L001_R1.bam sample_S1_L002_R1.bam ...

Another approach would be to merge the fastq files from different lanes per sample before starting the pipeline. You can also sum up the counts from different lanes per sample after the counting.
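The last option, summing counts per sample after counting, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the input layout (one dictionary of gene counts per lane-level library) are assumptions made for the example, and the lane tag is assumed to look like the L001-L004 tags in the filenames above.

```python
import re
from collections import defaultdict

# Assumed lane tag format, following the filenames above: _L001, _L002, ...
LANE_TAG = re.compile(r"_L\d{3}")


def sum_counts_across_lanes(per_lane_counts):
    """Sum per-gene read counts over all lanes of each sample.

    per_lane_counts maps a lane-level library name (e.g. "sample_S1_L001")
    to a dict {gene_id: count}. The result maps the name with the lane tag
    removed (e.g. "sample_S1") to the summed {gene_id: count}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for library, counts in per_lane_counts.items():
        sample = LANE_TAG.sub("", library, count=1)  # drop the lane tag
        for gene, n in counts.items():
            totals[sample][gene] += n
    return {sample: dict(genes) for sample, genes in totals.items()}


# Hypothetical counts for one sample sequenced on two lanes.
example = {
    "sample_S1_L001": {"ACTA1": 10, "MYOD1": 3},
    "sample_S1_L002": {"ACTA1": 12, "MYOD1": 5},
}
print(sum_counts_across_lanes(example))
# {'sample_S1': {'ACTA1': 22, 'MYOD1': 8}}
```

Summing is appropriate here because the lanes are technical replicates of the same library; the summed counts then represent one sample in the downstream analysis.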

Note that you get twice as many files per sample if the sequencing is paired-end. The FASTQ files are split into Read 1 (forward, "_R1" in the filename) and Read 2 (reverse, "_R2" in the filename). You only get Read 1 files for single-end sequencing.
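Combining the lane and read conventions described above, merging the FASTQ files per sample before starting the pipeline could look roughly like the sketch below. It assumes the common Illumina naming pattern `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`, which may differ from your delivery; the function names and the `.merged.fastq.gz` output suffix are hypothetical. Gzip files can be concatenated byte-for-byte, so no decompression is needed.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+_S\d+)_L(?P<lane>\d{3})_(?P<read>R[12])_001\.fastq\.gz$"
)


def group_by_sample_and_read(fastq_dir):
    """Map (sample, read) to that sample's FASTQ files, sorted by lane."""
    groups = defaultdict(list)
    for path in Path(fastq_dir).iterdir():
        match = FASTQ_NAME.match(path.name)
        if match:  # files that do not follow the assumed pattern are skipped
            groups[(match["sample"], match["read"])].append((match["lane"], path))
    return {key: [p for _, p in sorted(files)] for key, files in groups.items()}


def merge_lanes(fastq_dir, out_dir):
    """Concatenate the lanes of each sample, keeping R1 and R2 separate.

    Lanes are appended in the same (sorted) order for R1 and R2, so that
    read pairs stay in sync between the two merged files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for (sample, read), paths in sorted(group_by_sample_and_read(fastq_dir).items()):
        target = out_dir / f"{sample}_{read}.merged.fastq.gz"
        with open(target, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)  # raw byte-level append
        merged.append(target)
    return merged
```

A call such as `merge_lanes("fastq", "fastq_merged")` (both directory names are placeholders) would write one merged file per sample and read. For single-end data only R1 files are found, so only R1 files are written.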

Why can't I find my data using the BaseSpace CLI?

We recently moved our BaseSpace data from the US to the EU host server. You will need to specify the EU API server in your command:

bs auth --api-server=https://api.euc1.sh.basespace.illumina.com/

Can I attend a data analysis session?

It is not our usual practice to have customers sit in while we do the analysis. For most analyses, we use freely available tools such as R/Bioconductor. You will also receive a full analysis report describing the complete analysis and the tools and packages that we used. With this, you will have sufficient information on how we do these kinds of analyses. However, since we use R, some programming skills are required. If you are looking for additional bioinformatics training, we can recommend the courses organized by VIB Bioinformatics Training.

How can I justify the use of ‘uncorrected’, but very stringent, p-values?

Once p-values are computed, we define a p-value-based criterion for selecting genes. We need to make a trade-off between precision and recall or, put differently, between false discovery rate and statistical power. Simply using a cut-off of 0.05 would result in unacceptably many false positives, given the large number of statistical tests performed. Multiple testing criteria take into account the distribution of p-values and are designed to limit the number of false positives, but at the expense of a higher number of false negatives. We evaluated several multiple testing strategies (Benjamini and Hochberg 1995; Storey 2003; Scheid and Spang 2005), some of them combined with prior filtering based on overall variance (Bourgon et al. 2010), or starting from p-values computed from statistics other than the (moderated) t-statistic (Hong et al. 2006). For some experiments, all approaches returned at most a few genes, even when mild cut-offs were used. It is a well-recognized issue that adjustment for multiple testing can result in low statistical power for microarray studies (Bourgon et al. 2010). An evident reason is the low number of samples combined with a high number of tests, but other aspects such as gene correlation can also play an important role (Efron 2007). Because the selected genes are usually independently validated afterwards, we opt for a selection criterion that returns fewer false positives than the simple 0.05 cut-off and fewer false negatives than the general multiple testing approaches. Therefore, we adopt the criterion that was used during the elaborate MAQC-I study and select genes based on p-value < 0.001, further constrained to genes with an absolute fold-change > 2 (MAQC Consortium 2006).
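To make the criterion concrete, here is a minimal Python sketch of the selection rule, next to a plain Benjamini-Hochberg adjustment for comparison. It is an illustration, not our analysis pipeline (which uses R/Bioconductor): the function names and the input layout are assumptions, fold-changes are assumed to be on the log2 scale (so an absolute fold-change > 2 corresponds to |log2FC| > 1), and the example values are made up.

```python
import math


def select_genes(results, p_cutoff=0.001, fc_cutoff=2.0):
    """Return genes with p-value < p_cutoff and absolute fold-change > fc_cutoff.

    results is an iterable of (gene, p_value, log2_fold_change) tuples.
    """
    log2_cutoff = math.log2(fc_cutoff)  # fold-change 2 is 1 on the log2 scale
    return [
        gene
        for gene, p_value, log2_fc in results
        if p_value < p_cutoff and abs(log2_fc) > log2_cutoff
    ]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values are monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up example values: (gene, p-value, log2 fold-change).
example = [
    ("GENE_A", 0.0005, 1.5),
    ("GENE_B", 0.0005, 0.5),   # fails the fold-change constraint
    ("GENE_C", 0.0100, 3.0),   # fails the p-value constraint
    ("GENE_D", 0.0001, -2.0),  # down-regulated genes pass on absolute value
]
print(select_genes(example))
# ['GENE_A', 'GENE_D']
```

Note that the Benjamini-Hochberg adjusted values depend on the total number of tests, whereas the fixed p-value and fold-change cut-offs do not.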

References

  • Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), Vol. 57, No. 1, pp. 289-300. (1995)
  • Bourgon R, Gentleman R, Huber W. Independent filtering increases detection power for high-throughput experiments. Proceedings of the National Academy of Sciences of the United States of America, Vol. 107, No. 21. (May), pp. 9546-9551. (2010)
  • Efron B. Size, power and false discovery rates. The Annals of Statistics, Vol. 35, No. 4. (August), pp. 1351-1377. (2007)
  • Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, Chory J. RankProd: a Bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics, Vol. 22, No. 22. (November), pp. 2825-2827. (2006)
  • MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology, Vol. 24, No. 9. (September), pp. 1151-1161. (2006)
  • Scheid S, Spang R. twilight; a Bioconductor package for estimating the local false discovery rate. Bioinformatics, Vol. 21, No. 12. (June), pp. 2921-2922. (2005)
  • Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics, Vol. 31, No. 6. (December), pp. 2013-2035. (2003)

How can I run a functional analysis myself using DAVID?

You can try out the user-friendly DAVID tool for a quick functional analysis. To start, assemble a list of official gene symbols, for instance an Excel column with ACTA1, MYOD1, etc. Then go to DAVID and proceed as follows:

  • Click on “Shortcut to DAVID Tools > Functional Annotation” in the menu
  • Enter your input:
    • Paste the list of genes in the box at the left entitled “A: Paste a list”
    • Select the identifier to be “OFFICIAL_GENE_SYMBOL”
    • Select the list type to be “Gene List”
    • Click on “Submit List”
  • A warning pops up; just click on OK to continue
  • Configure your search:
    • Select the species by clicking at the left on “Homo sapiens” and then on the button “Select species”
    • Choose the databases and ‘vocabulary’ in which the functional characterization is formulated. Usually we deselect “Check defaults” and select our own choices: Click on the plus-sign next to Gene Ontology and select all levels ending with “3”, “4”, or “5” (these are the most detailed, low level concepts). Click on the plus-sign next to Pathways and select “KEGG_PATHWAY”.
  • Finalize your search by clicking at the bottom on “Functional Annotation Clustering”.

Interpret the output: Several blocks appear, each clustering closely related functional terms/pathways. The block at the top is the most important one, as indicated by the enrichment scores. To get more information, you can click on the terms. Clicking on the KEGG pathway terms (if present in the list) shows the pathways and marks the genes from your list with a blinking star. In this way, you can get an idea of the function of the genes in your list.