Understanding the dark matter of the human genome

Meet NCMM’s new group leader, Biswajyoti Sahu, who combines a molecular approach with functional genomics to better understand what happens in disease states such as cancer.

Portrait of Dr. Sahu

Dr. Sahu started as a group leader at NCMM in September. 

Since the human genome sequence was unraveled, it has been possible to study the functions of different parts of the genome. A person’s phenotype, such as the colour of their eyes, is defined by specific proteins that need to be expressed in the correct cell types and developmental states.

– We have around 20 000 protein-coding genes in our genome, Dr Sahu tells us, but they cover only 2% of the whole human genome sequence. So, what does the remaining 98% of the genome do? After all, as a species we have carried it with us over hundreds of millions of years of evolution.

This 98% of the genome was previously treated as “junk DNA” – non-coding DNA that was considered evolutionary detritus. However, over the past few decades, researchers have come to understand that it serves important regulatory functions – while it does not have a protein-coding function itself, it regulates when and where the genes are expressed. Ever since human DNA was sequenced, much research effort has focused on studying these regulatory functions, and we have gained a better understanding of how the expression of the 20 000 human genes is controlled.

– If you think of different disease states, there are certain proteins that control the pathogenic processes. However, we need to understand how these proteins themselves are regulated. The key lies in the genome. In order to understand complex diseases such as cancer, we need to study the regulatory mechanisms that lead to its development, Dr Sahu explains.

Dr Sahu’s primary interest is to decode the 98% of the genomic sequence that he calls the dark matter of the genome. This non-coding regulatory genome holds the keys to understanding not only normal human physiology, but also its different malfunctions, such as cancer. Dr Sahu’s research focuses on revealing how the cancer genome is regulated, what its defining factors are, which molecules interact with each other and how all of these factors work at the level of the genome.

Understanding cancer with respect to developmental lineage

Cancer is not simply a result of uncontrolled cell growth. Dr Sahu explains that in a cancer state, multiple signaling pathways are active, and those signals are carried by certain factors. One feature of this state is the deregulation of gene expression – in other words, deregulation of the proteins that control certain cellular pathways.

– It is important to know which factors control these pathways in an abnormal manner. Specifically, we need to know where these factors bind in our genome and whether the binding pattern in cancer cells is similar to that in normal cells. If we can find the differences, that can help us develop better biomarkers and more effective ways of screening for disease progression.

Understanding cancer genome regulation is a broad field that many researchers are investigating in different ways. One of the unique aspects of Dr Sahu’s research is that it studies the regulatory factors in the context of a defined lineage. He explains:

– Certain tissues in the human body share their developmental origin. For example, the intestine, pancreas and liver all originate from the endodermal lineage, whereas the brain is of a completely different lineage. In my research, we want to elucidate whether cancers of tissues that share a developmental origin, such as colon, pancreatic and liver cancers, have common regulatory mechanisms, as compared to, for example, brain cancer.

During the development of tissues from the same lineage, fine-tuning between the regulatory factors defines the developmental outcome. For example, if the ratio of particular factors is high, the cell can develop into a pancreatic cell, whereas a lower ratio of the same factors can lead to a completely different cell type, say a liver cell. Imbalance in these fine-tuning mechanisms can also play a role in the development of different cancer types.

– We have to understand how the development of cancer is initiated with respect to the lineage. That is my primary interest, Dr Sahu states.

Why is this important?

– Now that a vast number of cancer genomes have been sequenced, we know the landscape of all major somatic mutations and oncogenic drivers in each cancer type. However, we still don’t know how particular mutations operate in the context of a certain cell type or tissue. Oncogenic mutations are often cancer-specific, meaning that the same mutations don’t cause cancer in all tissues. So how the mutations act in a given cell type and in a given context has to be understood in terms of the right lineage.

Developing defined model systems

How a particular normal cell develops into a cancer cell is still largely unknown. Dr Sahu’s research approach is to study the early stages of tumorigenesis by developing innovative model systems in which all factors can be controlled at a molecular level in human cells, making it possible to identify the exact set of factors that control cell lineage and transformation.

– Since we know all the major oncogenic drivers in different human cancer types, we can start addressing defined mechanistic questions, such as which factors a certain oncogene interacts with to give the phenotype that makes a cancer cell.

The group’s experimental approach combines cell fate conversion with cell transformation. In the direct cell fate conversion method, one can convert a cell type A to a cell type B by using defined factors that are specific for a given cell type. Importantly, cancer-specific mutations can then be introduced into the cells during the course of cell fate conversion to study their functions and interactions with the regulatory factors. For example, liver cancer-specific mutations can be introduced during the generation of a liver cell to investigate whether they are able to produce transformed cells with properties similar to liver cancer cells.

– This is an approach where we combine two fields, stem cell-based transdifferentiation and state-of-the-art functional genomics, aiming to get a better overview of how these defined factors co-operate with oncogenes, Dr Sahu says.

– Once we understand how oncogenes interact with a certain set of factors under particular conditions, and how other sets of oncogenes interact under different conditions, we can define a formula of cell transformation, detailing the signaling pathways under which each oncogene needs to operate. This will give important insights into why and how normal cells become cancer cells in different tissues.

Genome-wide perspective

Transcription factors are a class of proteins that control gene expression by binding to regulatory elements in a sequence-specific manner. One of the main goals of Dr Sahu’s research is to study the non-coding regulatory genome and to reveal transcription factor binding patterns in cancer cells. In particular, he asks how their binding is affected in the context of tumorigenic processes and genomic instability in cancer – in other words, whether the regulatory code is different in a normal cell versus a cancer cell.

– We employ a plethora of genome-wide techniques, so that we can measure the effects of regulatory events in cancer: which particular proteins are deregulated, what the structure of the genome is in cancer cells as opposed to normal cells, and also, when you find these sequences, whether all of them have regulatory activity in cancer cells.

Dr Sahu’s interest in the non-coding regulatory genome started during his doctoral training. Back then, he studied one gene at a time, focusing on what happens to a chosen protein in the presence of another protein. During that time, next-generation sequencing technology emerged, making genome sequencing easily accessible.

– High-throughput sequencing changed the whole landscape of genomics research, and I could jump into an exciting new field back then. Rather than studying every single genomic site one by one, we had the opportunity to map, for example, transcription factor binding sites across the whole genome, and then decode it, Dr Sahu explains.

The human genome is vast, harboring approximately three billion base pairs. “If you have a good antibody against your protein of interest, you can go on a fishing expedition”, Dr Sahu explains. “You can fish all the sequences in the genome where this protein binds, remove the proteins and sequence the material.” In this way, one can find all binding sites for a protein of interest, such as a transcription factor, in a very specific manner. “Next, we can identify which gene is regulated by this particular sequence bound by this particular factor.” Then, by comparing the normal and disease states, it is possible to see whether the active sequences and factors are the same or different between, for example, a normal cell and a cancer cell.
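
The comparison of binding sites between normal and cancer cells described above can be illustrated with a minimal Python sketch. It is not the group's actual analysis pipeline: the `Site` type, the `split_sites` helper and the coordinates are all hypothetical and made up for illustration, and a binding site is assumed to be "shared" if it overlaps any site in the other cell type.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """A hypothetical binding site: a half-open interval on a chromosome."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Site, b: Site) -> bool:
    """True if both sites are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def split_sites(
    normal: List[Site], cancer: List[Site]
) -> Tuple[List[Site], List[Site], List[Site]]:
    """Split sites into shared (reported as normal-cell sites), normal-only and cancer-only."""
    shared = [n for n in normal if any(overlaps(n, c) for c in cancer)]
    normal_only = [n for n in normal if not any(overlaps(n, c) for c in cancer)]
    cancer_only = [c for c in cancer if not any(overlaps(c, n) for n in normal)]
    return shared, normal_only, cancer_only


# Made-up coordinates, for illustration only.
normal_sites = [Site("chr1", 100, 200), Site("chr2", 500, 600)]
cancer_sites = [Site("chr1", 150, 250), Site("chr3", 900, 1000)]

shared, normal_only, cancer_only = split_sites(normal_sites, cancer_sites)
print("shared:", shared)
print("normal only:", normal_only)
print("cancer only:", cancer_only)
```

The pairwise comparison is quadratic in the number of sites, which is fine for a toy example; real genome-wide data sets would normally be handled with sorted intervals or dedicated interval tools.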

– This is one of the many ways functional genomics works – you can try to interpret the meaning of the whole genome using defined assays. That is what got me interested in the field, Dr Sahu says.

Data-driven approach

Many transcription factors are over-expressed in cancer. One of the most common factors, MYC, is an oncogene that binds to the regulatory regions of multiple human genes. In hormonal cancers such as prostate and breast cancers, the master transcription factors are the androgen and oestrogen receptors, respectively. These have completely normal functions in human physiology, as they regulate the development of secondary sexual characteristics and reproductive functions. However, in these two hormonal cancers, the respective proteins become deregulated.

During his doctoral studies, Dr Sahu studied androgen receptor function in prostate cancer.

– Androgen receptor over-expression correlates with the worst prognosis of prostate cancer. To prevent androgen receptor function, androgen deprivation therapy is commonly used in prostate cancer patients. However, relapses still happen because cancer cells can figure out how to overcome this and still maintain active androgen receptor signaling, Dr Sahu explains.

– This is an example of how a basic assumption about how transcription factors behave proved insufficient, Dr Sahu continues. Thus, we need more complete information on how these factors work.

Transcription factors are so essential to normal physiology and cellular functions that they are also very difficult to target using drugs. They don’t work on their own, but in collaboration with multiple different co-regulators. Once the defined co-regulators are known, along with whether they are going haywire in a particular disease state, it becomes possible to develop strategies to target them therapeutically.

– It gets more interesting when you start looking at things in a genome-wide manner and, in my opinion, it is a better way of doing research – because you don’t make a lot of a priori assumptions, Dr Sahu says.

Genomics approaches are data-driven. Researchers generate many genome-wide data sets, which allow them to address bigger goals and to ask more defined questions based on actual measurements, similar to what is done in other quantitative sciences.

– We don’t start from a preconceived notion of what is going to happen in a particular experiment or disease state. We get to ask the questions from the functional data that we generate in line with our hypothesis, and this also gives us the opportunity to apply computational models for better prediction and identification in a more integrative manner, Dr Sahu clarifies.

Career highlights so far

Dr Sahu looks back at his PhD years as one of his career highlights. He received a distinction for his PhD from the Faculty of Medicine at the University of Helsinki, and published papers in high-impact journals. One of them was a Faculty of 1000 paper on prostate cancer signaling that is still frequently cited today. Importantly, his PhD gave Dr Sahu his first window into the field of transcription factor function.

After finishing his PhD, which focused on only a few factors, Dr Sahu was driven by the idea of extending his research to all transcription factors and more cancer types. He soon found his new scientific home for post-doctoral work in Professor Taipale’s lab at the Karolinska Institute and the University of Helsinki. The Taipale group uses a genome-wide approach to understand the function and binding sequences of all ~2000 transcription factors found in the human genome.

Since Dr Sahu’s goal was to understand cancer signaling in a comprehensive manner, he planted his feet in two different fields.

– On one side, I was working towards understanding the sequence specificities and sequence determinants of human gene regulatory elements in a functional manner for all the transcription factors. On the other side, I was developing a molecular approach for studying the role of oncogenic drivers, which are very cancer-specific, together with lineage-defining transcription factors in human cell transformation.

Dr Sahu acknowledges that leading these two ambitious projects and producing intriguing novel findings is certainly among the highlights of his career.

– For the first time, we were able to generate cancer cells from normal cells using only the tissue-specific oncogenic mutations that are found in actual human tumors. My results highlight the role of cellular identity and state in the development of human cancer, Dr Sahu says.

Together with his contributions to several other projects, Dr Sahu’s post-doctoral period was highly successful, culminating in several high-impact publications.

– Overall, I feel that my rigorous training during both my PhD and post-doctoral work has prepared me well for the next steps in my career, he concludes.

Future vision

What keeps Dr Sahu motivated in his everyday work is not only his research in the context of cancer, but also imparting the knowledge and training he has gained in the field of transcription factor function, oncogenic gene regulation and a wide variety of functional genomic methods to the next generation of researchers.

– Developing new research methods, generating data and following novel scientific leads keep me intrigued. We can study the gene regulatory programs using more defined approaches and then use more defined ways to tackle these questions – this will help both in preventing and treating cancer in the future and is a big motivation for this work. But the primary goal is unraveling novel molecular mechanisms, paving the way towards functional precision medicine.

Dr Sahu started as a group leader at NCMM in September 2022. For the next 10 years, his vision is to understand the early stages of human tumorigenesis by employing a molecular approach in defined and systematic studies, and to understand how different transcription factors operate in different cancer types. Using this genome-wide perspective, his main goal is to identify, understand, and validate the regulatory logic employed by cancer cells and how it differs from that of the normal state.

– Once we have a better understanding of the regulatory logic in different cell and cancer types, we can start asking even more complex questions about the molecular mechanisms that lead to the development of cancer, Dr Sahu concludes.

By Larissa Lily
Published Dec. 19, 2022 11:21 AM - Last modified July 14, 2023 10:46 AM