ABSTRACT

NF-Y is a pioneer transcription factor (TF) formed by the histone-like NF-YB/NF-YC subunits and the regulatory NF-YA. It binds to the CCAAT box, an element enriched in promoters of genes overexpressed in many types of cancer. NF-YA, which is overexpressed in epithelial tumors, is present in two major isoforms (NF-YAs and NF-YAl) generated by alternative splicing. Here we analyzed NF-Y
expression in stomach adenocarcinomas (STAD). We completed the partitioning of all TCGA tumor samples (450) according to molecular subtypes proposed by TCGA and ACRG, using the deep learning
tool DeepCC. We analyzed differentially expressed genes—DEG—for enriched pathways and TFs binding sites in promoters. CCAAT is the predominant element only in the core group of genes
upregulated in all subtypes, with cell-cycle gene signatures. NF-Y subunits are overexpressed, particularly NF-YA. NF-YAs is predominant in CIN, MSI and EBV TCGA subtypes, NF-YAl is higher
in GS and in the ACRG EMT subtypes. Moreover, NF-YAlhigh tumors correlate with a discrete Claudinlow cohort. Elevated NF-YB levels are protective in MSS;TP53+ patients, whereas high
NF-YAl/NF-YAs ratios correlate with worse prognosis. We conclude that NF-Y isoforms are associated with clinically relevant features of gastric cancer.

INTRODUCTION

Gastroesophageal tumors are among the most widespread cancers worldwide1. For patients with stomach adenocarcinomas (STAD), the survival outcome remains poor despite many efforts. The Lauren histological classification divides gastric cancers into intestinal (IT), diffuse (DF) and mixed (MX)2,3. Further microarray profiling studies have
since classified tumors according to molecular subtypes4,5,6,7. More recently, TCGA has proposed a classification based on genetic mutations, chromosomal alterations, epigenetic features and
RNA-seq expression data that included four subtypes: EBV (EBV-infected), MSI (MicroSatellite Instability), GS (Genomically Stable) and CIN (Chromosomal Instability)8. In parallel, the ACRG
(Asian Cancer Research Group) proposed another classification, originally based on independent microarray profilings, also consisting of four subtypes: EMT (Epithelial to Mesenchymal
Transition), MSS;TP53- (MicroSatellite Stable, inactive tumor protein 53), MSS;TP53+ and MSI9,10. The two classifications are partially overlapping (Reviewed in Refs.11,12,13). In general,
cellular transformation causes—and in some cases is caused by—changes in mRNA production patterns. The first step in this process is the binding of sequence-specific transcription
factors (TFs) to DNA elements in promoters and enhancers, entailing recruitment of chromatin-modifying cofactors14. Changes in the structure or expression of TFs can cause permanent changes
that lead to transformation. The identification of TFBSs—transcription factor binding sites—in promoters of genes overexpressed in cancer led to the identification of the CCAAT box as one of
the most widely enriched15. CCAAT is typically crucial for high-level expression of genes16. This box is recognized by NF-Y, a heterotrimer formed by the histone fold domain—HFD—dimer
NF-YB/NF-YC and the sequence-specific NF-YA17. NF-YA has two alternatively spliced isoforms—NF-YAs and NF-YAl—differing in 28/29 amino acids coded by exon 318. NF-YC is also present in
multiple isoforms, resulting from alternative splicing at the C-terminus of the protein19. In both subunits, this involves the glutamine-rich trans-activation domains (TADs), while the subunit-interaction and DNA-binding domains are common to all isoforms. NF-Y subunits are rarely mutated in tumors, yet the NF-Y regulome (ChIP-seq and functional analysis) points to
cell-cycle and metabolic pathways being positively affected20: specifically, rate-limiting, cancer-promoting genes of different anabolic routes—amino acids, lipids, nucleotides—are
activated21. Reports on the expression of NF-Y subunits in tumors emerged recently. In ovarian22,23, breast24,25, lung26,27, liver28 and head and neck squamous cell carcinomas (HNSCC)29,
overexpression of NF-YA was reported. As for gastric cancer, two studies provide evidence for a specific function of NF-YA: microarray-based differentially expressed genes (DEG) of gastric
cancer identified NF-YA as a key TF, specifically in the DF subtype, with prognostic significance30; NF-YA inactivation has a more profound growth-suppressive effect in a DF than in an IT cell line. Another study analyzing TCGA data found high expression of NF-YA, including at the protein level in STAD specimens31; this correlated with Cyclin E, a gene often amplified and overexpressed in STAD datasets32,33. These two studies did not report on the relative levels of the two major NF-YA isoforms, which are clinically important in breast, lung and HNSCC cancers25,26,27, nor on those of the HFD subunits, which might be relevant in light of our recent finding of their overexpression in hepatocellular carcinomas (HCC) and HNSCC28,29. We report here on the
analysis of STAD RNA-seq data present in TCGA, as further classified according to TCGA and ACRG. We confirm NF-YA global overexpression, extend this finding to HFD subunits, and investigate
the isoforms of NF-YA.

RESULTS

NF-Y SUBUNITS ARE OVEREXPRESSED IN STAD

Inspection of NF-Y subunit expression in the TCGA datasets (http://firebrowse.org) suggested that expression of NF-YA is globally increased in epithelial tumors25. We downloaded the available STAD RNA-seq dataset8 and analyzed NF-Y subunits: NF-YA is robustly increased in STAD (p value: 10^-14). NF-YB and NF-YC are also increased (p values: 10^-7/10^-8) (Fig. 1a). We then analyzed the levels of NF-YA isoforms: Fig. 1b shows that the levels of the “short” NF-YAs increase in tumors (p value 10^-15), unlike NF-YAl. In conclusion, we confirm a generalized overexpression of NF-Y subunits, especially NF-YA, in STAD. The predominance of NF-YAs prompted us to verify the relative
expression in gastric cancer cell lines. For this, we interrogated two repositories: the Broad Institute CCLE—Cancer Cell Lines Encyclopedia (https://portals.broadinstitute.org/ccle/about)
and a recently described set of gastric cancer lines34; overall, we analyzed 50 cell lines, with a partial overlap of lines common to the two datasets. We downloaded RNA-seq data, mapped
reads and analyzed NF-Y subunits levels. The results are shown in Fig. S1: the overall levels of NF-YA mRNA expression are variable with the majority, but not all, cell lines expressing
primarily NF-YAs (Fig. S1a). The levels of the two HFD subunits, particularly NF-YB, are comparatively less variable among the cell lines (Fig. S1b,c). We conclude that NF-Y subunits, particularly NF-YA, are overexpressed in STAD, and that NF-YAs is the predominant isoform in both gastric tumors and cell lines.

EXPRESSION OF NF-Y ISOFORMS IN STAD SUBTYPES

According to several genetic, epigenetic and functional parameters, TCGA classified STAD into four subtypes8. Since overexpression of NF-Y subunits could be limited to one or more of the subtypes, we investigated the
levels of the three subunits in the four cohorts. Currently, RNA-seq data on 415 tumors are available, of which 387 were categorized by TCGA. We first classified all tumors for which there
are RNA-seq data, employing the DeepCC machine learning tool35, with a training set represented by those already classified by TCGA: the relative proportions are indeed essentially
maintained (Fig. 2a). Figure 2b (Left Panels) shows that the relative increase of NF-YA is similar in CIN, EBV and MSI (p values of 10^-12 to 10^-15 relative to normal samples), but in GS, the levels are lower. NF-YB and NF-YC are increased at comparable levels in all subtypes. As for the isoforms, the data are shown in Fig. 2b (Right Panels): NF-YAs is increased in MSI, EBV and CIN (p values of 10^-14 to 10^-16 with respect to normal samples), less in GS. NF-YAl, instead, shows a significant increase in GS. As a consequence of these changes, the NF-YAl/NF-YAs ratio is
substantially increased in GS with respect to the other subtypes. In summary, overexpression of NF-YAs is generally widespread, but there is a distinctly higher NF-YAl/NF-YAs ratio in GS
tumors.

STAD DIFFERENTIALLY EXPRESSED GENES (DEG) HAVE CCAAT IN PROMOTERS

To gain insight into the gene expression programs altered in STAD, we compared RNA-seq data of STAD tumors to those of the respective normal samples, using thresholds of |log2FC| > 0.5 and FDR < 0.01. The lists of DEG are in Supplementary Table S1. We analyzed the promoters (−450 to +50 from the TSS) of overexpressed genes with the Pscan software, which pinpoints enriched TF matrices36. The NF-Y matrix is absent, whereas E2Fs and SP/KLFs are at the top of the list for upregulated genes (Fig. S2a, Left Panel). As for downregulated genes (Fig. S2a, Right Panel), CCAAT is absent, and Zn Finger TFs are enriched. Thereafter, we used KOBAS to identify Gene Ontology terms in DEG: in upregulated genes, nuclear terms (_nucleolus_, _nuclear chromatin_, _cell division_, _DNA replication_) predominate; different terms are present in downregulated genes (Fig. S2b). With
the same thresholds, we then performed analysis of RNA-seq of the individual TCGA subtypes. Venn diagrams of the overlaps are shown in Fig. 3a and the lists of genes are in Supplementary
Table S2. As for subtype-specific TFBS, distinct matrices are enriched in the four subtypes (Fig. S3a): SP1/2 in CIN, ETS-family in EBV, Zn fingers TFs in GS and MSI (EGR1/2/3, Sp2/4). We
analyzed Gene Ontology terms of DEG: Fig. S3b shows specific gene signatures for individual subtypes: in CIN, _cellular protein metabolism, spermatogenesis_; in EBV, _viral process_, _T cell
signaling_; in GS, _extracellular matrix_, _cell adhesion_; in MSI, _nucleolus_. The common set of 898 genes upregulated in all subtypes has NF-Y at the top of the enriched matrices, and shows terms described in global DEG, such as _cell division_ and _DNA replication_, with the addition of _extracellular matrix_ terms (Fig. 3b). Overall, we conclude that CCAAT is the primary site only in promoters of commonly upregulated genes, but it is absent in those specific to each TCGA subtype.

CLINICAL OUTCOME OF NF-Y OVEREXPRESSION IN STAD ACCORDING TO THE TCGA SUBTYPES

We stratified the progression free interval (PFI) of STAD patients according to High, Intermediate and Low levels of NF-Y subunit expression. In addition, we
considered the ratios of NF-YAl/NF-YAs, because this parameter was more informative than the overall levels of the two isoforms to predict patient outcomes in breast, lung and HNSCC
cancers25,26,27,29. No correlation is found with the levels of NF-YA or of the HFD subunits (Fig. S4), nor with those of the NF-YAl and NF-YAs isoforms (Fig. 4a, Upper Panels). For the NF-YAl/NF-YAs ratios, instead, we did find a robust correlation with worse prognosis (p value 0.0099) (Fig. 4a, Lower Panel). We then focused on PFIs of NF-YA ratios stratified according to the single subtypes: a correlation with poor prognosis was scored in CIN and EBV (Fig. 4b), but not in GS and MSI (Fig. S5). In summary, a higher NF-YAl/NF-YAs ratio has relevant clinical implications in STAD, globally and in specific TCGA subtypes.

EXPRESSION OF NF-Y ACCORDING TO THE ACRG CLASSIFICATION

A second STAD molecular classification was proposed by ACRG. This was originally based on profiling analysis, and thereafter applied to the TCGA RNA-seq database on a partial set of 204 samples9. As above, we first used DeepCC and the training set to classify all TCGA tumors into the four ACRG subclasses: unclassified samples are reduced from 211 to 16 (Fig. S6a). The proportions of the four classes are relatively well
maintained, with EMT being the most abundant (122 samples). A direct comparison between the TCGA and ACRG classifications is shown in Fig. 5a: most GS samples are found in EMT, which also
harbors a sizeable number of CIN; MSI samples are largely shared, while EBV are partitioned among the four subclasses. With the extended ACRG dataset on hand, we evaluated the levels of NF-Y
subunits and isoforms: Fig. 5b (Left Panels) shows similar levels of NF-YA and NF-YC, lower levels of NF-YB in MSS;TP53- and MSS;TP53+. Figure 5b (Right Panels) shows higher levels of
NF-YAl, and lower of NF-YAs, in EMT samples, leading to an increased ratio of these isoforms. The presence of CIN samples in all ACRG subtypes, particularly EMT, led us to analyze NF-Y
expression of CIN within ACRG subclasses: globally, the levels are similar (Fig. 5c, Left Panels), with those within the EMT group having distinctly higher levels of NF-YAl, lower NF-YAs
and, as a consequence, higher ratios (Fig. 5c, Right Panels). Note that analysis of STAD cell lines shows that most EMT lines, classified as such by Lee et al.34, indeed express the lowest levels of NF-YAs and highest of NF-YAl (Fig. S1a). We conclude that the EMT subclass of ACRG includes GS, as well as a portion of tumors catalogued as CIN, having a high ratio between NF-YAl and NF-YAs.

CLINICAL OUTCOME OF NF-Y EXPRESSION ACCORDING TO THE ACRG SUBTYPES

Next, we evaluated the clinical outcome of patients according to the ACRG classification. Stratification according to NF-YAl/NF-YAs ratios indicates no clinical relevance in MSI, MSS;TP53− and MSS;TP53+, but worse prognosis with high and intermediate levels in EMT (Fig. 6a). This is in agreement with the CIN data (Fig. 4b) and with the notion of a cluster of CIN tumors with high NF-YAl/NF-YAs ratios being inserted in the EMT subtype of ACRG (Fig. 5c): this could be responsible for
the correlation seen in EMT, but not in GS. To substantiate this point, we calculated the distribution of the NF-YAl/NF-YAs ratios in GS and EMT: Fig. 6b shows that GS has a flatter
distribution, with more samples with very high ratios (35% are ≥ 1), whereas EMT has fewer samples with high ratios (25% are ≥ 1), but a larger population with ratios between 0.2 and 0.5.
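
The percentages quoted above are simple tallies over per-sample NF-YAl/NF-YAs ratios. As a rough illustration only, the Python sketch below computes the fraction of samples with a ratio ≥ 1 and the fraction falling between 0.2 and 0.5. The function name `ratio_summary` and the expression values are hypothetical and are not taken from the TCGA data; only the two thresholds come from the text.

```python
def ratio_summary(long_expr, short_expr, high=1.0, band=(0.2, 0.5)):
    """Summarise NF-YAl/NF-YAs ratios for one group of samples.

    long_expr, short_expr: per-sample expression of the long and short
    isoform (same sample order). Samples with zero NF-YAs are skipped
    because the ratio is undefined there (an assumption of this sketch).
    """
    ratios = [l / s for l, s in zip(long_expr, short_expr) if s > 0]
    if not ratios:
        raise ValueError("no sample with non-zero NF-YAs expression")
    n = len(ratios)
    frac_high = sum(r >= high for r in ratios) / n
    frac_band = sum(band[0] <= r <= band[1] for r in ratios) / n
    return frac_high, frac_band


# Made-up values for four samples (ratios: 2.0, 0.25, 0.3, 0.5).
toy_long = [2.0, 1.0, 3.0, 4.0]
toy_short = [1.0, 4.0, 10.0, 8.0]
frac_high, frac_band = ratio_summary(toy_long, toy_short)
print(f"ratio >= 1: {frac_high:.0%}; ratio in [0.2, 0.5]: {frac_band:.0%}")
```

Running the same tally separately for each subtype would give the kind of comparison summarized in Fig. 6b.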
Thus, EMT is in part fed by the CIN samples that show high ratios (Fig S6b). Note that EBV and MSI have essentially no samples above a 0.35 ratio. Thereafter, we stratified EMT samples
according to low and intermediate/high ratios: the curve of the latter significantly correlates with a worse outcome (p value 0.012) (Fig. 6c, Left Panel). In addition, we reasoned that the overall levels of NF-YAs might also be impactful: stratification according to NF-YAs levels indeed indicates a protective effect of this isoform (Fig. 6c, Right Panel). Finally, analysis of the levels of HFD subunits in ACRG subtypes yielded negative results (Fig. S7), except for NF-YB, whose high levels are protective in MSS;TP53+ (Fig. 6d). Altogether, these data reinforce the role of the relative levels of the two NF-YA isoforms in the outcome of EMT, and point to a novel role of NF-YB in the MSS;TP53+ subtype.

NF-YAL IS PREDOMINANT IN CLAUDINLOW STAD TUMORS

We previously reported an association of high NF-YAl levels with a subclass of BRCA showing low levels of Claudin 3/4/7 expression25, a cluster associated with EMT features and
poor prognosis. By analyzing TCGA STAD data, Nishijima et al. identified a specific group of tumors—46 samples—based on three features: epithelial to mesenchymal transition (EMT),
tumor-initiating cells (TIC) and a Claudinlow phenotype37; this group was separated from CIN and GS (TCGA classification) and EMT (ACRG classification). Importantly, these authors derived a 24-gene signature predictive of this subclass. We used it to conduct a hierarchical clustering of the entire TCGA dataset: Fig. 7a shows the dendrogram with the identification of 79 samples with these gene expression features; this cohort is clearly separated from the other tumors, with strong statistical support (p value: 2.91 × 10^-4). We first checked how this signature scores in each subtype: Fig. S8 shows below-zero median Z scores for CIN, EBV and MSI (TCGA) and for MSI, MSS;TP53- and MSS;TP53+ (ACRG); instead, good concordance is scored within the GS
and EMT groups. Because of the presence of low levels of epithelial Claudins, we will refer to this group as Claudinlow. Next, we positioned this group within the other TCGA and ACRG
subtypes (Fig. 7b): most tumors of the Claudinlow cluster are from the GS and CIN (TCGA) and EMT (ACRG) subtypes. In essence, the Claudinlow group could be classified as new within TCGA,
while being essentially a subclass of the EMT ACRG subtype. Overall, these data confirm the existence of the subgroup proposed by Nishijima et al., further expanding it to 79 TCGA samples,
with robust statistical significance. Next, we evaluated the expression of NF-YA isoforms and their relative ratio including the Claudinlow group. Figure 7c,d show the results according to
the TCGA and ACRG subtypes, respectively: NF-YAl is mostly present in the Claudinlow class, with far lower levels in the remaining samples of the ACRG EMT subtype. On the contrary, NF-YAs is
lowest in Claudinlow, and higher in all other ACRG and TCGA subtypes, with the exception of GS. As a consequence, the NF-YAl/NF-YAs ratio is significantly increased (lowest p values: 10^-16) mostly in the Claudinlow group. These data indicate that NF-YAl is mostly associated with a discrete number of STAD samples with EMT and Claudinlow features. To verify the overlap between the Claudinlow and NF-YAlhigh (and NF-YAslow) subsets, we stratified the clinical outcome of Claudinlow tumors according to NF-YA isoform expression (High, Intermediate, Low): no further worsening of prognosis in PFI curves is scored according to the different levels of NF-YA isoforms (Fig. S9, Upper Panels), nor according to the NF-YAl/NF-YAs ratio (Fig. S9, Lower Panel). We conclude that there is a large overlap between the subset classified as Claudinlow and NF-YAlhigh tumors.

CCAAT BOX IS ENRICHED IN UPREGULATED PATHWAYS OF CLAUDINLOW SAMPLES

To further investigate the
Claudinlow cluster, we compared pathways in Claudinlow and EMT versus normal samples. The analysis of DEG in EMT shows absence of CCAAT in promoters (Fig. S10a). Across EMT upregulated
pathways, we did find mesenchymal terms such as _extracellular matrix_, _heart development_, _mesenchyme development_ (Fig. S10b). Within the TF motifs enriched in the promoters of genes of
each single category, we observed significant enrichment of the NF-Y motif in _cell-cycle_ terms, as expected, and in _mesenchyme development_ and _pattern specification process_. In
downregulated pathways, we observed different _metabolism_ terms, also expected (Fig. S10c). The same analysis performed on Claudinlow samples did not yield NF-Y motifs as enriched in
deregulated genes, but rather MAZ, E2F6 and KLFs motifs (Fig. 8a); these TFs were confirmed by analyzing ChIP-seq data from the ChIP-Atlas database38 (Supplementary Table S3). Among
upregulated pathways we found _extracellular matrix_ and _mesenchyme development_ terms (_heart development_, _skeletal system_, and _pattern specification process_). As above, the CCAAT box
was enriched in terms related to mesenchyme (Fig. 8b). Various metabolic processes populated the downregulated pathways (fatty acid and lipid metabolic process), expectedly regulated by
NF-Y and with CCAAT motifs (Fig. S11).

DISCUSSION

Because of its histone-like structure17, positioning within promoters16, synergistic connections with many other TFs and interactions with
coactivators, NF-Y is believed to play a pioneering role in “opening” promoter structures and correct positioning of RNA Pol II39. Specifically, NF-Y is important for genes required for cell
proliferation20. We describe here an investigation on NF-Y subunits levels in gastric cancer. We report the presence of CCAAT in commonly overexpressed genes and overexpression of NF-YA
isoforms, as well as a prognostic value of their relative levels. We also report on overexpression of the HFD subunits, and clinical significance of NF-YB. CCAAT boxes have been routinely
found in promoters of genes overexpressed in cancer, first in large microarrays profiling15 and more recently in RNA-seq datasets. Our analysis of TCGA identified CCAAT in overexpressed
genes, typically with E2Fs sites, in line with the pro-growth role of these TFs. Specifically, two schemes are starting to emerge. In the first, CCAAT is enriched globally, and indeed at the
top of the TFBS list, when all upregulated genes are computed: it is the case of lung tumors26,27; in the second, the enrichment is found either in specific subtypes—iCluster 3 in HCC28—or
only in DEG shared by all subtypes, as in BRCA25 and STAD, as shown here. In global STAD DEG, TFBS in promoters of upregulated genes contain the familiar E2Fs motifs, along with Zn Finger
TFBS (SPs/KLFs), but CCAAT is absent. As in BRCA, however, it comes out first when considering the core group of upregulated genes shared in all STAD subtypes. We also find that CCAAT is
absent in promoters of genes downregulated in STAD, as for all other types of cancer examined so far. This further reinforces the notion that this element is not a “general” signal enriched in promoters per se, but rather a core logo driving expression of genes associated with growth, not necessarily related to transcriptional features that are cancer- or subtype-specific. The HFD subunits are overexpressed in STAD, unlike in lung and breast tumors. We recently reported a similar scenario in HCC, in which high levels of these subunits correlate with worse prognosis in
a specific subtype, iCluster1. In STAD, global or subtype-specific PFI curves are globally superimposable based on NF-YB or NF-YC expression, with one notable exception: the MSS;TP53+ ACRG
subtype, in which high NF-YB levels correlate with a better prognosis. As in HCC, the fraction of p53 wt tumors in STAD (51%) is much higher than in other epithelial cancers (lung for example), in which the vast majority are p53 mutated, rendering comparisons with wt p53 samples essentially impossible. Note that the protective role of NF-YB in STAD is opposite to what we reported in HCC iCluster1 tumors, generally associated with wt p53 status: although direct NF-Y/p53 interactions have been reported in several studies20, the reasons for the association of NF-YB levels with such a genetic background are unclear. Nevertheless, a role of HFD subunits in cancer progression is starting to emerge; in this respect, measurement of protein levels in tumors deserves a close look in the future: in BRCA cell lines, for example, the NF-YB protein seems to be more variable than one could anticipate from mRNA levels25. Overexpression of NF-YA mRNA is
as obvious in STAD as in the tumors previously analyzed. Note that analysis of 22 cancer specimens confirms that higher expression is also found at the protein level31. In the same study, high levels of NF-YA and Cyclin E in TCGA STAD samples were associated with worsening of patients’ prognosis: yet, we do not find here a prognostic value of global levels of NF-YA. In another study, high NF-YA expression correlated with prognosis in a separate set of tumor samples analyzed by microarray profiling30, but only in the Diffuse (DF), not in the Intestinal (IT)
subtype (Lauren classification). We add a novel and relevant twist, in that isoform ratios—rather than global levels—are clinically important within subclasses of STAD. The two major NF-YA
splicing isoforms differ in the Gln-rich trans-activation domain (TAD): NF-YAl has 28/29 extra amino acids coded by exon 3, predicted to impart different activation potential, as reported in
mESCs and myoblasts40,41. In addition, a shorter isoform—NF-YAx—lacking sequences of exon-3 and exon-5 was recently found overexpressed in Neuroblastomas42. As in the other epithelial
cancers, we find that NF-YAs predominates, but higher expression of NF-YAl, alone or coupled to lower levels of NF-YAs, is clinically relevant. The TCGA GS subtype is enriched in DF
samples8, which is indeed in line with the data reported by Cao et al.30. GS tumors are characterized by earlier onset and expression of “cell adhesion” signatures. The NF-YAl/NF-YAs ratio is shifted in GS, and the same pattern is observed when stratifying tumors according to the ACRG classification: higher NF-YAl/NF-YAs ratios are found in EMT tumors. The relatedness of these subtypes in the two classifications was commented on before11,12,13: indeed, analyses of GO terms and pathways of DEG in these subtypes are in agreement with a mesenchymal phenotype. The ACRG EMT has 48 samples catalogued as CIN by TCGA: interestingly, the PFI of CIN patients indicates a worse prognosis following the NF-YAl/NF-YAs ratios. Our comparative analysis of the whole set of TCGA tumors suggests clinical relevance for NF-YB and NF-YA isoforms in subgroups of the ACRG classification. Specifically, for NF-YA, the ACRG EMT group is more revealing than the TCGA GS, most likely because of the inclusion of CIN tumors with EMT-like profiles. While in the EMT group the role of NF-YA ratios is clinically visible, in the TCGA GS it is not. One possible explanation is the lower dispersion of ratios and lower number of samples in this latter group, making comparison of quartiles difficult. Incidentally, this also allowed us to score a
protective role of NF-YAs, completely missed by adhering to the TCGA classification. Another feature emerging in the ACRG classification is the protective role of high NF-YB levels, as
discussed above. These differences might reflect the fact that RNA profiles are the basis of ACRG, while TCGA factored in other genetic and epigenetic features of STAD. The parallel of the present data with what we found in breast carcinoma is noteworthy. NF-YAs is also predominant in BRCA, except in the Claudinlow subset of Basal-like tumors, which have higher levels of NF-YAl. This is associated with a shift in DEG in these tumors, from signatures dominated by proliferative terms in NF-YAshigh tumors, toward activation of EMT signatures. In turn, this is clinically associated with an aggressive, metastatic, drug-resistant behavior. As in BRCA, the NF-YAl/NF-YAs ratio is clinically informative in STAD, but in this case the protective role of
NF-YAshigh in the EMT subtype is novel. Nishijima et al. showed that overall survival curves and Hazard ratios of the 46 Claudinlow patients are indeed worse with respect to other subtypes,
dramatically so within the ACRG-classified patients. This suggests that the Claudinlow partitioning is particularly significant with ACRG. We extended this group to 79 TCGA tumors by using
the signature described: our results confirm and extend the scenario proposed by these authors, particularly within the ACRG classification, which better partitions the protective role of NF-YAs from the detrimental role of NF-YAl in the Claudinlow group. Furthermore, the overlap of tumors with Claudinlow and NF-YAlhigh features is manifest. In general, these data invite further analysis to identify (i) Claudinlow signatures in other types of epithelial cancers, and (ii) a threshold of NF-YA isoform ratios, rather than overall levels, possibly responsible for shifting DEG away from proliferative, cell cycle genes toward mesenchymal ones.

MATERIALS AND METHODS

RNA-SEQ DATASETS

As of December 2020, there were
RNA-seq data on 415 STAD primary tumors in TCGA and 35 non-tumor tissues. We downloaded the corresponding RSEM scaled count data from the http://firebrowse.org/ web page. The last published
classification of STAD samples in the four molecular subtypes made by TCGA referred to 387 of the 415 tumors, and we retrieved it from the https://www.cbioportal.org/ web page43,44; a
different classification was proposed by ACRG on 204 TCGA tumors9. All the experiments involving human data in these public datasets adhered to relevant ethical guidelines. The DeepCC tool35
was used to classify RNA-seq dataset of all tumors in TCGA, according to the TCGA and ACRG classification, using as a training set the tumors already classified by TCGA and ACRG,
respectively. We retrieved the FASTQ files associated to the 37 CCLE stomach cell lines (accession code: PRJNA523380)45, as well as the 29 cell lines collected by Lee et al. (accession code:
PRJNA327709)34, using the SRA Explorer website (https://sra-explorer.info/). From the FASTQ files, we calculated mRNA expression with RSEM-1.3.3.

GENE EXPRESSION ANALYSIS

Differential gene expression analysis of RNA-seq data was performed using the R package _DESeq2_46. The Tumor versus Normal expression fold change (FC) denotes upregulation or downregulation according to the FC value. Log2FC, and the corresponding false discovery rate (FDR), were reported by the R package. FDR < 0.01 and |log2FC| > 0.5 were set as inclusion criteria for DEG selection in tumor/subtype versus normal samples.

GENE ONTOLOGY, PATHWAY ENRICHMENT AND TRANSCRIPTION FACTOR BINDING SITE ANALYSIS

We used KOBAS 3.0 (http://kobas.cbi.pku.edu.cn/anno_iden.php) for
pathway enrichment analysis using the ENTREZ gene IDs. The TFBS and de novo motif analyses were performed using the Pscan software36, while ChIP-seq experiments enrichment analyses were
conducted with ChIP-Atlas38. To obtain TFBS enrichment heatmaps, the input gene collections of the top GO terms from KOBAS analysis, sorted by FDR, were analyzed individually with Pscan. Only GO terms with fewer than 500 background genes were included, and TFBS motifs enriched (Pscan p value < 0.01) in fewer than 10 terms were filtered out.

ANALYSIS OF CLINICAL DATA

We retrieved clinical data related to the TCGA STAD samples and progression free interval (PFI) time records of patients, respectively, from the https://www.cbioportal.org/ and the http://xena.ucsc.edu/
web pages43,44,47. We stratified all the tumors for which PFI records were available according to NF-Y subunits expression at gene level, NF-YA isoforms expression, and NF-YAl/NF-YAs ratio,
into three groups (Low = first quartile, Intermediate = second and third quartiles, High = fourth quartile). Survival analysis was performed according to the Kaplan–Meier analysis and
log-rank test48.

HIERARCHICAL CLUSTERING AND Z SCORES COMPUTATION

TCGA samples RSEM scaled count data were converted into TPM, log2-transformed, and median centered; we then performed a
hierarchical clustering of the samples with the R package _SigClust2_ (version 1.2.4) with “average” linkage and “euclidean” metric options, while the alpha parameter was set to 0.05.
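
The preprocessing that precedes the clustering (TPM conversion, log2 transformation, median centering) can be sketched as follows. This is a minimal Python illustration, not the pipeline used here, which was run in R. Several choices are assumptions made for the example: the function name `preprocess`, rescaling each sample so that its values sum to one million to obtain TPM, the pseudocount of 1 added before the log2 transform, centering each gene on its median across samples, and the toy two-gene matrix.

```python
import math
from statistics import median


def preprocess(scaled, pseudocount=1.0):
    """Convert scaled estimates to median-centered log2(TPM + pseudocount).

    scaled: dict mapping gene name -> list of per-sample scaled estimates,
    with all lists in the same sample order.
    """
    genes = list(scaled)
    n_samples = len(scaled[genes[0]])
    # Per-sample totals, used to rescale every sample to one million (TPM).
    totals = [sum(scaled[g][j] for g in genes) for j in range(n_samples)]
    centered = {}
    for g in genes:
        tpm = [scaled[g][j] / totals[j] * 1e6 for j in range(n_samples)]
        logged = [math.log2(v + pseudocount) for v in tpm]
        gene_median = median(logged)  # assumption: center per gene
        centered[g] = [v - gene_median for v in logged]
    return centered


# Made-up scaled estimates for two genes in three samples.
toy = {
    "GENE_A": [0.5, 0.25, 0.125],
    "GENE_B": [0.5, 0.75, 0.875],
}
centered = preprocess(toy)
print(centered["GENE_A"])
```

The resulting gene-by-sample matrix is the kind of input that a hierarchical clustering routine with average linkage and Euclidean distance would take.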
Daughter nodes were tested if significance was achieved at the corresponding parent node, according to the built-in FWER controlling procedure. We obtained Z scores from log2-transformed
expression data for each gene of the Claudinlow signature, and a median Z score for each sample was computed across the genes of the signature.

STATISTICAL ANALYSIS

Analyses were performed in the R programming environment (version 4.0.3), with the _ggplot2_, _ggpubr_, _survival_, _survminer_ and _tidyverse_ packages. Single comparisons between two groups were performed with the
Wilcoxon rank-sum test.

ABBREVIATIONS
* TCGA: The Cancer Genome Atlas
* ACRG: Asian Cancer Research Group
* NF-YAl: Nuclear factor Y subunit A isoform long
* NF-YAs: Nuclear factor Y subunit A isoform short
* NF-YB: Nuclear factor Y subunit B
* NF-YC: Nuclear factor Y subunit C
* E2F: E2 factor
* TF: Transcription factor
* TFBS: Transcription factors binding sites
* FDR: False discovery rate
* HFD: Histone fold domain
* STAD: Stomach adenocarcinoma
* BRCA: Breast carcinoma
* LUSC: Lung squamous cells carcinoma
* LUAD: Lung adenocarcinoma
* HCC: Hepatocellular carcinoma
* HNSCC: Head and neck squamous cells carcinoma
* CIN: Chromosome instability
* EBV: Epstein-Barr virus
* GS: Genomically stable
* MSI: MicroSatellite instability
* EMT: Epithelial to mesenchymal transition
* MSS: MicroSatellite stable
* TP53: Tumor protein 53
* TIC: Tumor-initiating cells
* DEG: Differentially expressed genes
* PFI: Progression free interval

REFERENCES
* Siegel, R., Naishadham, D. & Jemal, A. Cancer statistics, 2012. _CA Cancer J. Clin._ 62, 10–29 (2012).
* Laurén, P. The two histological
main types of gastric carcinoma: Diffuse and so-called intestinal-type carcinoma. _Acta Pathol. Microbiol. Scand._ 64, 31–49 (1965).
* Hartgrink, H. H., Jansen, E. P. M., van Grieken, N. C. T. & van de Velde, C. J. H. Gastric cancer. _Lancet_ 374, 477–490 (2009).
* Kim, B. _et al._ Expression profiling and subtype-specific expression of stomach cancer. _Cancer Res._ 63, 8248–8255 (2003).
* Jinawath, N. _et al._ Comparison of gene-expression profiles between diffuse- and intestinal-type gastric cancers using a genome-wide cDNA microarray. _Oncogene_ 23, 6830–6844 (2004).
* Lee, Y.-S. _et al._ Genomic profile analysis of diffuse-type gastric cancers. _Genome Biol._ 15, R55 (2014).
* Tanabe, S., Aoyagi, K., Yokozaki, H. & Sasaki, H. Gene expression signatures for identifying diffuse-type gastric cancer associated with epithelial-mesenchymal transition. _Int. J. Oncol._ 44, 1955–1970 (2014).
* Bass, A. J. _et al._ Comprehensive molecular characterization of gastric adenocarcinoma. _Nature_ 513, 202–209 (2014).
* Cristescu, R. _et al._ Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. _Nat. Med._ 21, 449–456 (2015).
* Yu, Y. A new molecular classification of gastric cancer proposed by Asian Cancer Research Group (ACRG). _Transl. Gastrointest. Cancer_ 5, 557 (2016).
* Chia, N.-Y. & Tan, P. Molecular classification of gastric cancer. _Ann. Oncol._ 27, 763–769 (2016).
* Min, L. _et al._ Integrated analysis identifies molecular signatures and specific prognostic factors for different gastric cancer subtypes. _Transl. Oncol._ 10, 99–107 (2017).
* Battaglin, F., Naseem, M., Puccini, A. & Lenz, H.-J. Molecular biomarkers in gastro-esophageal cancer: Recent developments, current trends and future directions. _Cancer Cell Int._ 18, 99 (2018).
* Levine, M., Cattoglio, C. & Tjian, R. Looping back to
leap forward: Transcription enters a new era. _Cell_ 157, 13–25 (2014).
* Goodarzi, H., Elemento, O. & Tavazoie, S. Revealing global regulatory perturbations across human cancers. _Mol. Cell_ 36, 900–911 (2009).
* Dolfini, D., Zambelli, F., Pavesi, G. & Mantovani, R. A perspective of promoter architecture from the CCAAT box. _Cell Cycle_ 8, 4127–4137 (2009).
* Nardini, M. _et al._ Sequence-specific transcription factor NF-Y displays histone-like DNA binding and H2B-like ubiquitination. _Cell_ 152, 132–143 (2013).
* Li, X. Y., Hooft van Huijsduijnen, R., Mantovani, R., Benoist, C. & Mathis, D. Intron-exon organization of the NF-Y genes. Tissue-specific splicing modifies an activation domain. _J. Biol. Chem._ 267, 8984–8990 (1992).
* Ceribelli, M., Benatti, P., Imbriano, C. & Mantovani, R. NF-YC complexity is generated by dual promoters and alternative splicing. _J. Biol. Chem._ 284, 34189–34200 (2009).
* Gurtner, A., Manni, I. & Piaggio, G. NF-Y in cancer: Impact on cell transformation of a gene essential for proliferation. _Biochim. Biophys. Acta_ 1860, 604–616 (2017).
* Benatti, P. _et al._ NF-Y activates genes of metabolic pathways altered in cancer cells. _Oncotarget_ 7, 1633–1650 (2016).
* Mamat, S. _et al._ Transcriptional regulation of aldehyde dehydrogenase 1A1 gene by alternative spliced forms of nuclear factor Y in tumorigenic population of endometrial adenocarcinoma. _Genes Cancer_ 2, 979–984 (2011).
* Cicchillitti, L. _et al._ Prognostic role of NF-YA splicing isoforms and Lamin A status in low grade endometrial cancer. _Oncotarget_ 8, 7935–7945 (2017).
* Yang, C., Zhao, X., Cui, N. & Liang, Y. Cadherins associate with distinct stem cell-related transcription factors to coordinate the maintenance of stemness in triple-negative breast cancer. _Stem Cells Int._ 2017, 5091541 (2017).
* Dolfini, D.,
Andrioletti, V. & Mantovani, R. Overexpression and alternative splicing of NF-YA in breast cancer. _Sci. Rep._ 9, 12955 (2019).
* Bezzecchi, E. _et al._ NF-YA overexpression in lung cancer: LUAD. _Genes_ 11, 198 (2020).
* Bezzecchi, E., Ronzio, M., Dolfini, D. & Mantovani, R. NF-YA Overexpression in lung cancer: LUSC. _Genes (Basel)_ 10, 937 (2019).
* Bezzecchi, E., Ronzio, M., Mantovani, R. & Dolfini, D. NF-Y overexpression in liver hepatocellular carcinoma (HCC). _Int. J. Mol. Sci._ 21, 9157 (2020).
* Bezzecchi, E. _et al._ NF-Y Subunits Overexpression in HNSCC. _Cancers (Basel)_ 13, 3019 (2021).
* Cao, B. _et al._ Gene regulatory network construction identified NFYA as a diffuse subtype-specific prognostic factor in gastric cancer. _Int. J. Oncol._ 53, 1857–1868 (2018).
* Bie, L.-Y. _et al._ Analysis of cyclin E co-expression genes reveals nuclear transcription factor Y subunit alpha is an oncogene in gastric cancer. _Chronic Dis. Transl. Med._ 5, 44–52 (2018).
* Alsina, M. _et al._ Cyclin E amplification/overexpression is associated with poor prognosis in gastric cancer. _Ann. Oncol._ 26, 438–439 (2015).
* Ooi, A. _et al._ Gene amplification of CCNE1, CCND1, and CDK6 in gastric cancers detected by multiplex ligation-dependent probe amplification and fluorescence in situ hybridization. _Hum. Pathol._ 61, 58–67 (2017).
* Lee, J. _et al._ Selective cytotoxicity of the NAMPT inhibitor FK866 toward gastric cancer cells with markers of the epithelial-mesenchymal transition, due to loss of NAPRT. _Gastroenterology_ 155, 799-814.e13 (2018).
* Gao, F. _et al._ DeepCC: A novel deep learning-based framework for cancer molecular subtype classification. _Oncogenesis_ 8, 44 (2019).
* Zambelli, F., Pesole, G. & Pavesi, G. Pscan: finding over-represented transcription factor binding site motifs in sequences from
co-regulated or co-expressed genes. _Nucleic Acids Res._ 37, W247–W252 (2009).
* Nishijima, T. F. _et al._ Molecular and clinical characterization of a claudin-low subtype of gastric cancer. _JCO Precis. Oncol._ https://doi.org/10.1200/PO.17.00047 (2017).
* Oki, S. _et al._ ChIP-Atlas: A data-mining suite powered by full integration of public ChIP-seq data. _EMBO Rep._ 19, e46255 (2018).
* Oldfield, A. J. _et al._ NF-Y controls fidelity of transcription initiation at gene promoters through maintenance of the nucleosome-depleted region. _Nat. Commun._ 10, 3072 (2019).
* Dolfini, D., Minuzzo, M., Pavesi, G. & Mantovani, R. The short isoform of NF-YA belongs to the embryonic stem cell transcription factor circuitry. _Stem Cells_ 30, 2450–2459 (2012).
* Libetti, D. _et al._ The switch from NF-YAl to NF-YAs isoform impairs myotubes formation. _Cells_ 9, 789 (2020).
* Cappabianca, L. _et al._ Discovery, characterization and potential roles of a novel NF-YAx splice variant in human neuroblastoma. _J. Exp. Clin. Cancer Res._ https://doi.org/10.1186/s13046-019-1481-8 (2019).
* Cerami, E. _et al._ The cBio cancer genomics portal: An open platform for exploring multidimensional cancer genomics data. _Cancer Discov._ 2, 401–404 (2012).
* Gao, J. _et al._ Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. _Sci. Signal_ 6, 11 (2013).
* Ghandi, M. _et al._ Next-generation characterization of the cancer cell line encyclopedia. _Nature_ 569, 503–508 (2019).
* Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. _Genome Biol._ 15, 550 (2014).
* Goldman, M. J. _et al._ Visualizing and interpreting cancer genomics data via the Xena platform. _Nat. Biotechnol._ 38, 675–678 (2020).
* Therneau, T. _A Package for Survival Analysis in R_, 95.

ACKNOWLEDGEMENTS
We thank P. Gandellini and N. Gnesutta
for comments and critical reading of the manuscript. The authors acknowledge support from the University of Milan through the APC initiative.

FUNDING
This work was supported by Ministero della Salute GR-2013-02355625 to D.D.

AUTHOR INFORMATION
AUTHORS AND AFFILIATIONS
* Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133, Milan, Italy: Alberto Gallo, Mirko Ronzio, Eugenia Bezzecchi, Roberto Mantovani & Diletta Dolfini

CONTRIBUTIONS
D.D. designed the experiments. A.G., E.B. and M.R. performed and analyzed the experiments. R.M. and D.D. wrote the manuscript.

CORRESPONDING AUTHOR
Correspondence to Diletta Dolfini.

ETHICS DECLARATIONS
COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
PUBLISHER'S NOTE
Springer Nature
remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

SUPPLEMENTARY INFORMATION
Supplementary Figures S1 to S11. Supplementary Tables S1 to S3.

RIGHTS AND PERMISSIONS
OPEN ACCESS This article is licensed under a Creative
Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in
the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your
intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence,
visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE
CITE THIS ARTICLE Gallo, A., Ronzio, M., Bezzecchi, E. _et al._ NF-Y subunits overexpression in gastric adenocarcinomas (STAD). _Sci Rep_ 11, 23764 (2021). https://doi.org/10.1038/s41598-021-03027-y
* Received: 05 July 2021
* Accepted: 22 November 2021
* Published: 09 December 2021
* DOI: https://doi.org/10.1038/s41598-021-03027-y