Databases

NCBI Sequence Read Archive (SRA)
OrthoMCL DB - Ortholog Groups of Protein Sequences; used to identify orthologous ORFs
SwissProt database

Genomic data

“Trade-off between Transcriptome Plasticity and Genome Evolution in Cephalopods” - de novo transcriptome
SRX1396680: Genomic DNA sequencing of Sepia officinalis germline DNA, from the same paper, which is the original paper presented for this project.
Sepia officinalis (common cuttlefish - MT only)

Tools

Aligners

bwa - the dominant aligner for Illumina reads.
bowtie2 - both the original paper and the RNA-editing survey paper refer to it.
STAR (Spliced Transcripts Alignment to a Reference) - RNA-seq aligner.
tophat - maps RNA-seq reads to the genome.

RNA

Cufflinks Transcriptome assembly and differential expression analysis for RNA-Seq.
RSEM RNA-Seq by Expectation-Maximization. Accurate quantification of gene and isoform expression from RNA-Seq data
htseq-count - part of HTSeq: Analysing high-throughput sequencing data with Python
kallisto - a program for quantifying abundances of transcripts from RNA-Seq data

Functional analysis

Blast2GO - (commercial) A bioinformatics platform for high-quality protein function prediction and functional analysis of genomic datasets

Assemblers

DNA

ABySS, github
Discovar
Meraculous
SOAP, github
MaSuRCA assembler used for the recent wheat genome

RNA

Tools:

busco provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9.
QUAST - Quality Assessment Tool for Genome Assemblies
preseq - The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library. Starting from an initial sequencing experiment, it estimates the number of redundant reads at a given sequencing depth and how many will be expected from additional sequencing.
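To make the idea of "redundant reads" concrete, here is a minimal Python sketch that summarizes the redundancy already observed in a set of reads. It is not preseq's estimator and does no extrapolation to deeper sequencing; the function names and the toy reads are hypothetical, and reads are treated as duplicates only when their sequences are identical.

```python
from collections import Counter


def redundancy_summary(reads):
    """Count total, distinct and redundant reads in an observed sample.

    Assumption for this sketch: two reads are duplicates when their
    sequences are identical strings.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    distinct = len(counts)
    return {"total": total, "distinct": distinct, "redundant": total - distinct}


def counts_histogram(reads):
    """Map j -> number of distinct reads that were observed exactly j times."""
    per_read = Counter(reads)
    return dict(sorted(Counter(per_read.values()).items()))
```

For example, with the toy list ["ACGT", "ACGT", "TTGA", "GGCC", "ACGT"], three of the five reads are distinct and two are redundant. Predicting how those numbers change with more sequencing is the part that a tool such as preseq addresses.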

Project:

Assemblathon was a project to evaluate genome assemblers.
GAGE (Genome Assembly Gold-Standard Evaluations) - a similar project to Assemblathon.

NCBI

The National Center for Biotechnology Information (NCBI) hosts a wide range of bioinformatics data. Raw Next Generation Sequencing (NGS) data is found in the Sequence Read Archive (SRA) section.

The Sequence Read Archive includes intensity, read and alignment data, all of which takes a lot of space. We only want to extract the two fastq files representing the reads. NCBI provides a collection of command line tools, the SRA Toolkit, to extract fastq files. We will use the fastq-dump command to retrieve the data we require.

As an example, I selected ERX276244: Whole Genome Sequencing of fungal endophyte sp. D3-2B19-1. Since it is a model species, there is a lot of data associated with it and, as genomic data goes, it is small.

The selected experiment is a paired-end run on an Illumina HiSeq 2000. Paired-end reads mean that each fragment is read from both ends. This implies we will need two files, one per direction, and the --split-files option writes each direction to its own file. Consider:

$ fastq-dump --split-files ERR302903
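The same step can be scripted. The following is a minimal Python sketch, assuming fastq-dump is installed and on the PATH, and that the run is paired so that --split-files writes <run>_1.fastq and <run>_2.fastq. All function names are hypothetical helpers for illustration, and the record count assumes the plain four-lines-per-record FASTQ layout.

```python
import subprocess
from pathlib import Path


def fastq_dump_command(run_accession):
    """Build the fastq-dump call shown above as an argument list."""
    return ["fastq-dump", "--split-files", run_accession]


def expected_fastq_files(run_accession, directory="."):
    """Paths of the two mate files for a paired run: <run>_1.fastq, <run>_2.fastq."""
    base = Path(directory)
    return [base / f"{run_accession}_{mate}.fastq" for mate in (1, 2)]


def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def fetch_paired_reads(run_accession, directory="."):
    """Download a paired run and check both mate files hold the same number of reads.

    Requires the SRA toolkit; this function is never called at import time.
    """
    subprocess.run(fastq_dump_command(run_accession), cwd=directory, check=True)
    forward, reverse = expected_fastq_files(run_accession, directory)
    n_forward = count_fastq_records(forward)
    n_reverse = count_fastq_records(reverse)
    if n_forward != n_reverse:
        raise ValueError(
            f"mate files differ in size: {n_forward} vs {n_reverse} reads"
        )
    return forward, reverse, n_forward
```

Calling fetch_paired_reads("ERR302903") would run the command above and return the two file paths along with the number of read pairs. Checking that both files contain the same number of records is a cheap sanity test before handing them to an aligner or assembler.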

General

Samtools utility for manipulating alignment data.

Used in original paper

REDItools - package of scripts to detect and analyze RNA editing. (lang: python)
GOrilla functional analysis

Visualize/Stats

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. (lang: R)

Infrastructure

Common Workflow Language - a nice companion to Docker.
Platforms: Galaxy, Arvados

Research organization

The Cephalopod Sequencing Consortium

Educational Resources

Sites

RNA-seqlopedia from University of Oregon.

Books

MOOCs

Coursera bioinformatics specialization - There is a two-volume companion book, Bioinformatics Algorithms: An Active Learning Approach (see the Books section). Many of the video lessons are available on YouTube. In particular, there is a chapter on assemblers, Chapter 3: How Do We Assemble Genomes?
Coursera genomic data specialization
If you’re more statistically minded, consider the courses by Rafael Irizarry on edX.

Bioinformatics