KANSARL Fusion Gene

Our scientists, in collaboration with colleagues from the USA and China, have identified the KANSARL fusion gene as the first familially inherited cancer-susceptibility fusion gene specific to populations of European ancestry. Of the cancer genes discovered so far, KANSARL also affects the largest number of people and families. KANSARL is a fusion of the KANSL1 and ARL17A genes on the negative strand of 17q21.31 and is likely to encode a truncated KANSL1 protein. KANSL1 is a regulatory subunit of the MLL1 and NSL1 complexes, which are involved in chromatin histone H4 and p53 acetylation.

We used the SplicingCodes model to develop the SCIF (SplicingCodes Identify Fusion Transcripts) system and identified six KANSARL fusion isoforms in cancer cell lines. We then systematically analyzed RNA-seq data from many cancer types, including glioblastoma, prostate cancer, lung cancer, breast cancer, and lymphoma, from different geographical regions of the world. KANSARL fusion transcripts were rarely detected in tumor samples from patients in Asia or Africa, but they were present in 30-52% of tumors from North American cancer patients. Analysis of CEPH/Utah Pedigree 1463 revealed that KANSARL is a familially inherited fusion gene. Further analysis of RNA-seq datasets from the 1000 Genomes Project found that the KANSARL fusion gene occurs specifically in 28.9% of the population of European ancestry.

The presence of KANSARL fusion transcripts in normal and adjacent tissues raises the possibility that they are derived from a germline-inherited fusion gene. To investigate this possibility, we analyzed RNA-seq data from lymphoblastoid cell lines derived from families in the CEU population (Utah residents with ancestry from northern and western Europe), specifically CEPH/Utah Pedigree 1463, a 17-individual, three-generation family [47]. Supplementary Table 9 and Supplementary Figure 8 show that KANSARL fusion transcripts are detected in 15 of 17 family members (black squares and circles in Supplementary Figure 8), with the exception of a son (NA12885), a result that deviates from Mendel's first law. A reasonable explanation is that the grandfather sample (NA12889) may have been mixed up with the son sample (NA12885). To test this explanation, we analyzed the WGS data (PRJEB3381) and RNA-seq data from the 1000 Genomes Project and showed that both WGS and RNA-seq of the grandfather sample (NA12889) are KANSARL-negative, while WGS of the son (NA12885) is KANSARL-positive. The WGS analysis also identified genomic breakpoints 1 and 2 of the KANSARL fusion gene in some members of CEPH/Utah Pedigree 1463. Therefore, both WGS and RNA-seq data support that the father (NA12877) and the mother (NA12878) have the genotypes KANSARL-/KANSARL- and KANSARL+/KANSARL+, respectively, and that all their offspring have the genotype KANSARL+/KANSARL- (Figure 6a).
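The inheritance argument above can be illustrated with a minimal sketch that treats KANSARL as a single locus with two alleles, "+" (fusion present) and "-" (fusion absent). The function names and the tuple encoding are illustrative assumptions, not part of the SCIF system or the published analysis.

```python
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Return the set of child genotypes possible under Mendelian segregation.

    Each parent is a 2-tuple of alleles; each child receives one allele
    from each parent. Genotypes are sorted so ('+', '-') == ('-', '+').
    """
    return {tuple(sorted(pair)) for pair in product(parent1, parent2)}

def is_carrier(genotype):
    """A carrier has at least one '+' allele (fusion transcripts expected)."""
    return "+" in genotype

# Genotypes reported in the text for the parents of Pedigree 1463.
father = ("-", "-")  # NA12877, KANSARL-/KANSARL-
mother = ("+", "+")  # NA12878, KANSARL+/KANSARL+

children = offspring_genotypes(father, mother)
print(children)                              # only heterozygous children are possible
print(all(is_carrier(g) for g in children))  # every child should be a carrier
```

Under this model a homozygous-positive mother can only have carrier children, which is why a KANSARL-negative call for a child of this couple points to a sample or data problem rather than to true non-inheritance.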

As shown above, KANSARL is a familially inherited fusion gene. A critical question is whether KANSARL fusion transcripts exist in general populations. To answer this question, we analyzed RNA-seq data of the lymphoblastoid cell lines of the 1000 Genomes Project [48]. Not a single copy of a KANSARL fusion transcript was detected in the Nigerian YRI (Yoruba in Ibadan) population. In contrast, Figure 6b shows that KANSARL fusion transcripts were found in 33.7% of the GBR (British from England and Scotland), 26.3% of the FIN (Finnish in Finland), and 26.9% of the TSI (Toscani in Italia) populations. The differences in KANSARL frequency among the GBR, FIN, and TSI populations are not statistically significant (data not shown), suggesting that they may be due to sampling error. However, the difference between each of these populations and the Nigerian YRI population is statistically significant (Fisher's exact test, p<0.001), supporting our claim that KANSARL fusion transcripts are specific to the population of European ancestry.
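As a sketch of how such a population comparison can be run, the following implements a two-sided Fisher's exact test for a 2x2 table of carriers and non-carriers using only the Python standard library. The per-population sample counts are not given here, so the counts in the usage lines are hypothetical placeholders and not the study's data; the function name is also an assumption.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows are populations, columns are carrier / non-carrier counts.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights keep the comparison with the observed table exact.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / denom

# Hypothetical placeholder counts (NOT from the study):
# population A has 30 carriers out of 90, population B has 0 out of 90.
p_value = fisher_exact_two_sided(30, 60, 0, 90)
print(f"p = {p_value:.3g}")
```

An exact test is a natural choice when one cell of the table is zero, as with a population in which no carriers are observed, because chi-square approximations become unreliable for very small expected counts.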

Kits for KANSARL fusion transcripts and the fusion gene are available for research use only. Please contact: support@splicingcodes.com




Figure 1. Identification and characterization of KANSARL (KANSL1-ARL17A) fusion transcripts
a). A schematic diagram showing the steps of genetic rearrangement from the normal genomic structure of the ARL17A → KANSL1 genes to the inverted genomic structure of the KANSL1 → ARL17A genes on chromosomal band 17q21.31. The dashed white horizontal arrow and the solid white vertical arrow represent genomic rearrangements and potential fusion gene structures. Solid red and black horizontal arrows indicate the ARL17A and KANSL1 genes, respectively. Solid blue arrows represent the LRRC37A and MAPT genes. The dashed horizontal black arrow indicates undetermined genomic regions. Black and red squares represent KANSL1 and ARL17A exons, respectively. b). A schematic diagram of the KANSARL fusion transcripts identified so far. Black and red squares represent KANSL1 and ARL17A exons, respectively. Dashed lines indicate omitted regions. The numbers above the black and red squares are exon numbers. The numbers within sequences indicate the numbers of omitted nucleotides. c). Validation of KANSARL isoform 1 in A549, HeLa, K562, 786-O and OS-RC-2 cell lines. d). Validation of KANSARL isoform 2 in A549, HeLa, K562, 786-O and OS-RC-2 cell lines. e). Detection of KANSL1 gene expression in A549, HeLa, K562, 786-O and OS-RC-2 cell lines. f). Detection of ARL17A gene expression in A549, HeLa, K562, 786-O and OS-RC-2 cell lines. g). Detection of GAPDH gene expression as a positive control in A549, HeLa, K562, 786-O and OS-RC-2 cell lines. h). Sanger sequencing validation of KANSARL isoform 2. The black and red letters represent KANSL1 exon 3 and ARL17A exon 3 sequences, respectively. i). Sanger sequencing validation of the KANSARL genomic breakpoint in the HeLa-3 cell line. The black and red letters indicate KANSL1 and ARL17A intronic sequences, respectively. Vertical arrows indicate the fusion junctions. The black and red lines indicate KANSL1 and ARL17A sequences, respectively. All markers are 100 bp DNA markers.




Figure 6. Inheritance and distribution of KANSARL fusion transcripts in the population of European ancestry origin
a). Diagram of the correct KANSARL inheritance in CEPH/Utah Pedigree 1463, which includes four grandparents, two parents, and eleven children. Black and white squares represent KANSARL-positive and KANSARL-negative males, while black and white circles indicate KANSARL-positive and KANSARL-negative females. The black lines represent relationships among the family members. The diagram is drawn based on RNA-seq and WGS data. b). Analysis of KANSARL fusion transcripts in the RNA-seq data of the lymphoblastoid cell lines of the 1000 Genomes Project [48]. The diagram shows the frequencies of KANSARL fusion transcripts in some populations of European and African ancestry. GBR is British from England and Scotland; FIN is Finnish in Finland; TSI is Toscani in Italia; and YRI is Yoruba in Ibadan, Nigeria.