Transcriptomics 3 is a course dedicated to advanced methods for analyzing transcriptomic data, methods that allow us to find meaningful patterns in big datasets, especially the complex patterns typical of RNA-seq experiments. The course builds upon what we learned in Transcriptomics 1 and 2. In Transcriptomics 1, we learned how to convert the “raw reads” produced by a next-generation sequencer into a table of expression values, and then how to visualize that table to develop a hypothesis.
In Transcriptomics 2, we looked at statistical methods for determining differentially expressed elements between known groups of samples. We explored Student’s t-test, Bayesian methods such as DESeq and edgeR, and Factor Regression Analysis to dissect the influence of multiple factors.
In Transcriptomics 3, we will turn to a new problem: complex patterns in big datasets. The complexity of gene expression patterns across a variety of samples makes it more challenging to apply “straightforward” methods of analysis. There are many unknowns, for example how many groups we should expect to find in a given dataset, or how to select, from a dataset we have already analyzed, a set of genes that will consistently identify a specific class of samples. That is why in this course we will first explore methods for identifying groups of samples without prior knowledge (clustering) and then examine methods for building classifiers from known samples in order to classify unknown samples (classification). Together, these two approaches are often referred to as “data mining” and “machine learning”. These are vague terms that will be clarified further in our series on Machine Learning for Biomedical Data.
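To make the distinction between clustering and classification concrete, here is a minimal sketch in plain Python. It is not one of the methods taught in the lectures, and the expression values, the class names "A" and "B", and every function name are made up for illustration. The first part groups samples without using any labels (a bare-bones k-means); the second part builds a simple nearest-centroid classifier from labeled samples and applies it to a new one.

```python
# Toy illustration only: made-up expression values, not data from the course.
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_profile(profiles):
    """Average a list of equal-length expression profiles, gene by gene."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def nearest(profile, centroids):
    """Return the index of the centroid closest to the given profile."""
    return min(range(len(centroids)), key=lambda i: dist(profile, centroids[i]))


def kmeans(profiles, k, n_iter=20):
    """Bare-bones k-means, seeded with the first k profiles for repeatability."""
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(p, centroids) for p in profiles]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_profile(members)
    return labels, centroids


def train_centroids(profiles, classes):
    """Supervised step: compute one mean profile per known class."""
    by_class = {}
    for profile, cls in zip(profiles, classes):
        by_class.setdefault(cls, []).append(profile)
    return {cls: mean_profile(ps) for cls, ps in by_class.items()}


def classify(profile, class_centroids):
    """Assign a new sample to the class with the closest mean profile."""
    return min(class_centroids, key=lambda c: dist(profile, class_centroids[c]))


# Rows are samples, columns are three hypothetical genes.
samples = [
    [9.0, 8.5, 1.0],
    [1.2, 0.8, 7.9],
    [8.7, 9.1, 1.4],
    [0.9, 1.1, 8.3],
]

# Clustering: no labels are supplied, only the number of groups to look for.
cluster_labels, _ = kmeans(samples, k=2)
print(cluster_labels)

# Classification: labels for the known samples are supplied up front.
known_classes = ["A", "B", "A", "B"]
model = train_centroids(samples, known_classes)
print(classify([8.0, 9.0, 2.0], model))
```

Note that the clustering step has to be told how many groups to look for (`k=2`), which is exactly one of the unknowns mentioned above, while the classification step depends entirely on the labels we give it.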
In the previous dataset, we selected samples from two types of breast cancer. In this course, we will use another dataset, one that contains multiple cell lines representing different subtypes of cancer. The example is taken from a publication by Daemen et al. and a team from Genentech research, “Modeling precision treatment of breast cancer”. We will not repeat all of the analysis presented in the paper; rather, we will re-analyze the dataset the paper presents and will later be able to compare the authors’ results with our own. Let’s review this publication to get familiar with the data we will be analyzing and the questions the paper poses.
- Lectures: 13
- Quizzes: 2
- Duration: 6 hours
- Skill level: All levels
- Language: English
- Students: 55
- Certificate: Yes
- Assessments: Yes
Data Details and Pre-Processing
Clustering: Unsupervised Analysis
Classification: Supervised Analysis