The coffee bean transcriptome explains the accumulation of the major bean components through ripening

ABSTRACT The composition of the maturing coffee bean determines the processing performance and ultimate quality of the coffee produced from the bean. Analysis of differences in gene


expression during bean maturation may explain the basis of genetic and environmental variation in coffee quality. The transcriptome of the coffee bean was analyzed at three stages of


development, immature (green), intermediate (yellow) and mature (red). A total of more than 120 million 150 bp paired-end reads were collected by sequencing of transcripts of triplicate


samples at each developmental stage. A greater number of transcripts were expressed at the yellow stage. As the beans matured the types of highly expressed transcripts changed from


transcripts predominantly associated with galactomannan, triacylglycerol (TAG), TAG lipase, 11S and 7S-like storage proteins and fasciclin-like arabinogalactan protein 17 (FLA17) in green


beans to transcripts related to FLA1 at the yellow stage and TAG storage lipase SDP1, and SDP1-like in red beans. This study provides a genomic resource that can be used to investigate the


impact of environment and genotype on the bean transcriptome and develop coffee varieties and production systems that are better adapted to deliver quality coffee despite climate variations.


INTRODUCTION Arabica coffee


(around 70% of world coffee consumed) is one of the most important and valuable commodities traded internationally1. The coffee bean is a tropical dicotyledonous albuminous seed, with a


copious endosperm and a tiny embryo. Generally, seeds have evolved intricate strategies to reserve nutrients and take advantage of the pericarp and pulp to encourage dispersal by animals2.


The composition of the coffee bean determines the quality of the coffee. An improved understanding of the molecular basis of determination of bean composition is required to support enhanced


coffee production and genetic improvement, especially in response to climate change. Numerous biochemical studies have been conducted on the genetic and environmental factors influencing the


accumulation of key components of the bean3,4. Genetic control of these processes can be investigated by the study of changes in gene expression through bean ripening, including transcripts


regulating bean filling as well as in response to stress2,5,6. Early studies applied RT-PCR (coffee beans) or microarrays to mainly coffee leaves or seedlings5,7,8. More recently, different


tissues of Arabica coffee including flowers, leaves and fruit pericarp have been subjected to transcriptome analysis. A genome wide study recently reported identification of key genes


regulating the lipid and diterpene contents of Arabica coffee bean4. However, analysis of the genetics of other essential components of the bean contributing to coffee quality was not


included in these studies9. Importantly, the absence of a reference genome or transcriptome further limited previous studies. Additionally, gaps remain in knowledge of gene interactions and


how expression varies at the global scale at specific ripening stages and finally determines the quality and regeneration capability of the bean. In this study, coffee beans at different development stages,


green, yellow and red (excluding the exocarp and mesocarp), were collected and the RNA was sequenced for transcriptome analysis. A recent full-length long-read sequencing (LRS)


coffee bean transcriptome was used as a reference to facilitate transcriptome analysis in the ripening coffee bean10. The aim was to understand the progress of accumulation of the key


components in the coffee bean through ripening and the molecular basis of genetic regulation of coffee quality, and to deliver a platform for the study of genotypic and environmental influences


on the coffee transcriptome and coffee quality. RESULTS Nine coffee bean transcriptome datasets were generated in this study, including green, yellow and red stages, all in triplicate (Fig. 


1a). As pericarp (including exocarp and mesocarp) was removed before RNA extraction, the coffee bean in this study refers to endocarp, perisperm (seed coat), endosperm and embryo


(encapsulated by endosperm) (Fig. 1b). RAW READ PROCESSING A total of 120,784,786 reads were generated from the nine coffee bean transcriptome datasets (Supporting Information Table S1).


This number was slightly reduced (117,539,360) after trimming. For individual samples, the number of trimmed reads was between 4,602,943 and 23,905,608. Between 48.37% and 64.05% of


reads were mapped to the long-read sequencing coffee bean transcriptome (coffee LRS transcriptome). The yellow stage had the highest number of transcripts expressed (43,552 transcripts, TPM > 


1), compared to red and green stages (43,257 and 36,388 transcripts) (Supporting Information Fig. S1). Functional annotation of the expressed transcripts revealed that GO terms associated


with the metabolic process, catalytic activity and cell part, ranked as the most abundant in “biological process”, “molecular function” and “cellular components” respectively (Supporting


Information 1 Fig. S2). The lowest number of expressed transcripts was observed at the green stage. The top three pathways with the most transcripts expressed included the purine pathway and thiamine


metabolism (Supporting Information 1 Table S2). Starch and sucrose metabolism and phenylpropanoid biosynthesis were the sixth and ninth most enriched pathways, respectively. The top 30 pathways with


the highest number of transcripts expressed were from 11 parent-pathway groups. The dominant parent-pathway group was carbohydrate metabolism, enriched with the highest number of pathways


(seven pathways), including starch and sucrose metabolism and galactose metabolism. This was followed by the amino acid and lipid metabolism parent-groups, which related to five and three


metabolism pathways, respectively. HIGHLY EXPRESSED TRANSCRIPTS THROUGH COFFEE BEAN RIPENING OVERVIEW Of all the highly expressed transcripts (HEGs), more unique transcripts were expressed above TPM 500 at the green stage


(46 transcripts) than at the yellow and red stages (27 and 8) (Fig. 1c). A total of 68 common transcripts were expressed at all three development stages. A much higher number of


common transcripts (73) were expressed at both the yellow and red stages, while the number of common transcripts (seven) expressed at the green and the red stages was the same as that of


the first two stages. INTENSIVE LIPIDS FORMATION AT THE GREEN STAGE The top ten HEGs in green, yellow and red coffee beans were extracted individually for further analysis (Table 1). Other


than three unnamed protein products, an ncRNA was one of the most abundant transcripts at all stages, peaking at the green stage and decreasing at the last two stages. Two non-specific lipid


transfer protein (LTP) A-like transcripts showed extremely high expression in green coffee beans (33,227 and 18,563) but decreased dramatically in the yellow stage (5,488 and 1,446) until maturity


(1,260, 378 and 1,165). This suggested that a high level of lipids may accumulate and be transported from this stage. A similar drop was also identified in alpha-galactosidase 2, from 10,120 in the


green stage to more than ten times lower in the yellow stage (830) and red stage (351). Smaller changes were observed in transcripts encoding 11S storage globulin (bean storage protein) and


metallothionein type 3 (important for bean development), which showed maximum expression in green coffee beans (9,994 and 7,709 respectively) and then gradually declined until the red stage


(4,059 and 6,829 respectively). Transcripts encoding kirola-like protein and dehydrin DH1a (response to desiccation) peaked at the yellow stage and decreased slightly in the red stage. MORE


CHANGES IN THE COMPARISON OF THE RED VS GREEN STAGE Three comparisons between developmental stages were conducted in this study of the ripening Arabica coffee bean transcriptome, yellow vs


green (Y vs G), red vs green (R vs G) and red vs yellow (R vs Y). The highest number of differentially expressed transcripts (DEGs) was found in the R vs G comparison (2,262),


including the most downregulated DEGs (1,257) (Fig. 1d). Only 130 DEGs were seen in R vs Y, including 108 downregulated genes and 22 upregulated genes. A total of 2,058 DEGs were


characterized in Y vs G, with the most up-regulated DEGs (1,070). The majority of DEGs were distributed within the range of two- to tenfold change, while only a few varied by less than


twofold. The top ten up- and down-regulated DEGs from individual comparisons are listed in Table 2 to show the most significant changes in coffee bean ripening. These


transcripts included pathogen relevant, cell function and key chemical biosynthesis transcripts. In the comparison of Y vs G, two class III chitinase (chi2) transcripts were highly expressed


in green coffee beans and were absent (TPM < 1) at the yellow stage. Similarly, a dramatic decrease was shown in a lipid degradation transcript (encoding GDSL esterase lipase APG-like


protein), an O-fucosyltransferase family transcript (relating to mannan biosynthesis and galactomannan accumulation), an amino acid transport transcript (_WAT1-related protein-like_), and a


cellular function transcript (regulating ubiquitin 40S protein S27a). The ubiquitin 40S protein S27a related transcript also demonstrated a significant change in the comparison of R vs G.


Additionally, a significant decrease from green to red coffee beans was characterized in a cell wall/vacuolar inhibitor of fructosidase 1-like transcript (_vacuole invertase inhibitor_). In the


comparison of R vs Y, E3 ubiquitin-protein ligase SHPRH isoform X1, luminal-binding protein 5-like, cinnamoyl-CoA reductase 2-like, etc. decreased in the red stage. Numerous MYB transcription factors (_MYB90_) were


identified as upregulated DEGs in top ten DEGs of all comparisons, increasing from green to maturity (maximum expression was TPM: 686). A trans-resveratrol di-O-methyltransferase-like


transcript, catalyzing pterostilbene (antifungal and pharmacological function) biosynthesis and a transposon elements variant gene (_retrotransposon Ty1-copia subclass_) were upregulated in


both the Y vs G and R vs G comparisons as they increased sharply from the green to the red stage. From the green stage, cell wall degradation DEGs, such as probable rhamnogalacturonate lyase B


(probably pectin degradation) and probable xyloglucan endotransglucosylase hydrolase B (_XTHB_, cleaving primary cell wall xyloglucan polymers and contributing to the construction of the


growing tissues) were upregulated at the yellow stage. Meanwhile, transcript expression of the pectinesterase inhibitor 11 transcript, maintaining the integrity of cell walls, was increased


and peaked at the yellow stage. Steady growth was detected in beta-glucosidase 44-like, lipid transfer and pathogenesis-related 1 transcripts. Moreover, the comparison of R vs Y included


cell function transcripts, like _Luminal-binding 5-like_ and _endonuclease V_, and three transcripts probably related to phenolic compounds, _cinnamoyl-CoA reductase 2-like_. STORAGE COMPOUNDS


ACCUMULATED THROUGH BEAN MATURITY An association study of the key storage component related DEGs was conducted with lipid, cell wall component, storage protein and phenylpropanoid related


DEGs in the ripening coffee bean (Fig. 2). Major changes were observed in the comparison of Y vs G and R vs G. More DEGs were assigned in the comparison of R vs G, while only a few were found in the


comparison of the last two stages (R vs Y). MAJOR LIPIDS ACCUMULATION AT THE GREEN STAGE The main lipid-related DEGs went through a decrease in expression compared to levels in green coffee


beans, especially fatty acid desaturation (omega-6-desaturase, critical for biosynthesis of linoleic acid), TAG synthase (oil body oleosin family proteins) and lipase (for TAG degradation)


(Supporting Dataset). One omega-6-desaturase declined dramatically (log2 ratio: −6.10) at the yellow stage in comparison with the green stage (Supporting Dataset). Similarly, non-specific


LTP protein decreased sharply (maximum log2 ratio: −8.16 in R vs G) compared to the green stage, supporting the idea of lipids predominantly accumulating in green beans. Only a limited


number of DEGs remained in the comparison of the last two stages, and they were all down-regulated. This includes three non-specific lipid transfer proteins (phospholipid transfer protein)


and four GDSL-like lipases. Therefore, TAG is likely to be synthesized in green beans, accompanied by its degradation by lipase. Linoleic acid was also apparently formed in green beans.
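As a reading aid for the log2 ratios quoted in this section, the following Python sketch converts a log2 ratio into the equivalent fold change and computes a log2 ratio from two expression values. The function names `fold_decrease` and `log2_ratio` are hypothetical and are not part of the CLC workflow used in this study. The reported ratios come from the differential expression tool rather than from a simple ratio of group means, so the sketch only illustrates how to interpret the numbers.

```python
import math


def fold_decrease(log2_ratio_value: float) -> float:
    """Convert a log2 ratio into a fold decrease (2 ** -ratio).

    A negative log2 ratio gives a value above 1, i.e. how many times
    lower the expression is in the later stage.
    """
    return 2.0 ** (-log2_ratio_value)


def log2_ratio(later_tpm: float, earlier_tpm: float) -> float:
    """Return log2(later / earlier) for two positive expression values."""
    if later_tpm <= 0 or earlier_tpm <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(later_tpm / earlier_tpm)


if __name__ == "__main__":
    # log2 ratios quoted in the text for two lipid-related DEGs
    quoted = [
        ("omega-6-desaturase, Y vs G", -6.10),
        ("non-specific LTP, R vs G (maximum)", -8.16),
    ]
    for label, ratio in quoted:
        print(f"{label}: log2 ratio {ratio} is about "
              f"{fold_decrease(ratio):.0f}-fold lower")

    # Illustrative only: simple ratio of the TPM values reported for
    # alpha-galactosidase 2 (green 10,120 vs yellow 830)
    print(f"alpha-galactosidase 2, Y vs G: {log2_ratio(830, 10120):.2f}")
```

Read this way, a log2 ratio of −6.10 corresponds to roughly 69-fold lower expression, and −8.16 to roughly 286-fold lower expression, relative to the green stage.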


THE FLOW OF CELL WALL COMPONENTS Most down-regulated DEGs in the comparison of Y vs G were found to relate to cell wall precursor synthase, hemicellulose synthase (only downregulation),


cellulose synthase and arabinogalactan protein (AGPs) DEGs (Supporting Dataset). Only one UDP-XYL synthase (_USX_, UDP-D-xylose biosynthesis) transcript 2-like was upregulated in Y vs G and


R vs G in cell wall precursor DEGs. The others were downregulated either in Y vs G or R vs G. Mannose-1-phosphate guanylyltransferase (_CYT_, involved in GDP-mannose biosynthesis) was


downregulated in the last two stages. Downregulated DEGs included nucleotide-rhamnose synthase/epimerase-reductase, _USX6_ and _USX6-like_, and UDP-glucose 6-dehydrogenase 5 (provides nucleotide


sugars for cell-wall polymers). Downregulated cellulose synthase DEGs were transcripts 1 and 2 (_CESA1_, _CESA2_), COBRA-like protein 1, _COBRA-like_ and a CESA-like transcript (mannan synthase 1, _MS1_,


which peaked at the green stage), while upregulation was seen in two CESA-like transcripts, such as _MS2_ (peaking at the yellow stage). Among AGP-related DEGs, _FLA17_ was


downregulated and _FLA1_ was upregulated (top expression at the yellow stage). A hemicellulose-related DEG encoding hydroxyproline O-galactosyltransferase 6 (transfers


galactose from UDP-galactose to the residues of AGPs), declined in the yellow and red stages compared to the green stage. However, the only increases in transcript expression were shown in


the comparison of Y vs G and R vs G in _cellulase_ (_CEL3_ and 5) and _beta 1,4-glucanase_ (such as transcript 6), with a peak expression in the yellow stage. Leucine-rich protein (LRR)


associated DEGs were also seen to rise in Y vs G and R vs G. A large number of DEGs were found in pectin degradation (pectin esterases, lyases and pectinases), which were mainly upregulated


in Y vs G and R vs G. A major rise of transcript expression was also observed in other cell wall modification DEGs including numerous expansin (and expansin-like) transcripts (isoform 6, 8,


10, 11 and 15) and XTH transcripts (isoform B, 2, 6, 12, 23, 30). Expression of _BGAL-like_, probable beta-D-xylosidase 7 and _mannan endo-1,4-beta-mannosidase 5-like_ (one transcript


variant was upregulated) DEGs declined at the yellow stage. Similar patterns continued in the second comparison, R vs G. A higher number of DEGs were associated with cell wall precursor


synthases, cellulose synthases, cell wall modifications and mannan degradation. However, in the last comparison, R vs Y, all DEGs were downregulated. Altogether, cell wall precursors,


hemicellulose, _FLA17_, _CESA1_ and _CESA2_ and _MS1_ showed their highest expression at the green stage. Peak expression shifted to cellulose degradation (_CEL3_ and _5_) and


_MS2_ at the yellow stage and pectin degradation and LRR in the last two stages. MAJOR STORAGE PROTEIN ACCUMULATION AT THE GREEN STAGE In terms of storage protein related DEGs, both up and


down regulations were identified in both Y vs G and R vs G comparisons, while no DEGs were assigned in the comparison of R vs Y (Supporting Dataset). Altogether, significant decreases were


identified in eight storage protein related DEGs in the yellow and red stages compared to green coffee beans. They were four 11S globulin (_11S_), three 7S-like globulin transcripts


(_7S-like_) and glutelin type A2 (_GLUA2_) DEGs. In addition to these eight DEGs, one more _11S_ and a patatin (_Pat_) 2 also decreased in the red stage in contrast to green beans. In


contrast, upregulation was observed in three _Pat6_, _GLUA3_, TAG lipase _SDP1_, _SDP1-like_ (storage lipids degradation in bean germination) and _SDP1-like_ DEGs in the comparisons of Y vs


G and R vs G (larger fold change at the red stage). Transcripts regulating these proteins are likely to peak at maturity. One more _GLU_, type B5 (_GLUB5_), was also more highly expressed at


the red stage than in green coffee beans. Hence, _11S_, _7S-like_ and _GLUA2_ storage probably began at the green stage, while _SDP1_, _SDP1-like_, _GLUA3_ and _Pat6_ accumulated


at the end. PHENYLPROPANOIDS Most chlorogenic acid (CGA)-related transcripts peaked at the yellow stage, with a significant increase from the green stage and a dramatic decrease in the red stage (FDR


corrected p-value < 0.001). Exceptions were phenylalanine ammonia-lyase (_PAL_) 4 and 4-coumarate CoA ligase (_4CL_) 7, which increased from green coffee beans till maturity. This


suggested that CGAs were likely to be mainly formed from the yellow stage and _4CL7_ probably contributes to further accumulation of CGAs or lignin in later stages. CO-EXPRESSION NETWORK OF


THE BEAN STORAGE GENES Four groups of transcripts were targeted for co-expression network analysis, including lipid, cell wall, storage protein and phenylpropanoids DEGs (Fig. 3). The aim


was to investigate the connections among key bean storage/quality attributed transcripts. A great number of the cell wall and phenylpropanoids DEGs were filtered out in the co-expression


module (weight ≥0.9), followed by lipid, other metabolites, and storage protein related transcripts. Different alpha-expansin transcripts (_EXPA_) played a key role in this


network module, especially _EXPA6_3_ (_EXPA6_ transcript variant 3). _EXPA6_3_ was co-expressed with 22 transcripts from all five categories, mainly cell wall-related. The top five


transcripts connected to _EXPA6_3_ were _EXPA8_2_ (0.9709), probable xyloglucan endotransglucosylase hydrolase 30 (_XTH30_2_, 0.9681), probable pectin lyase 8 (_PL8_1_, 0.9402),


polygalacturonase-like (_PG_2_) and _EXPA11-like_ (0.9399). Another core transcript connected with all five transcript groups was _PG_2_. _PG_2_ was connected to


cinnamoyl-CoA reductase 2 (_CCR2_), beta-xylosidase/alpha-L-arabinofuranosidase 2-like (_Xyl2-like_2_) and shikimate O-hydroxycinnamoyltransferase transcript a (_HCTa_). This suggested _PG_2_


is essential in cell wall modification and probably interacts with phenylpropanoid and phenolic biosynthesis. A lipid DEG, _CP5_3_, functions as a membrane related protein and was one of the


pivotal transcripts in the co-expression network. Its top five co-expressed transcripts were from different groups, comprising _CCR2_, _EXPA8_1_, _PG_2_, _11S_2_, etc. In addition, 3-ketoacyl-CoA


synthase 11-like (_KCS11-like_, biosynthesis of very long chain fatty acids), co-expressed with transcripts from all categories, was another central transcript from the module. The


co-expressed transcripts were _CP5_3_, non-specific phospholipase C2, cytochrome P450 71D8, _EXPA8_1_, _CCR2_, etc. The relationship of these diverse transcripts suggested that _CP5_3_ and


_KCS11-like_ are essential in bean ripening and nutrient reservation. _CCR2_ (biosynthesis of phenolics), was among the most important transcripts from the network. Other than the above


connections, _CCR2_ was also associated with diverse categories with the highest impact on _EXPA6_4_, _Xyl2-like_2_, and _EXPA8_1_. Hence, it is likely _CCR2_ is pivotal to cell wall


expansion and modification. The next most important phenylpropanoid-related DEG was _HCTa_, correlating with _EXPA8_1_, _Pat6_3_, _MS2_1_, and _7S-like_1_, indicating a key role in cell wall


expansion and storage protein biosynthesis. Very few storage protein related transcripts were distributed in this co-expression network module and they were co-expressed with a lower number


of transcripts. Among them, _SDP1_ and _11S_ were relatively important. _SDP1_ expression was also found to be concurrent with _SKU5_ similar 5 (_sks5_1_), _XTH30_2_, _CCR2_ and _HCTa_,


while _11S_ correlated with phosphoinositide phosphatase SAC3 (_SAC3_1_), _CCR2_, _SDP1-like_, and _KCS11-like_. DISCUSSION In the bean, a storage tissue, nutrients are formed and stored through


complicated pathways to support embryo development and reproduction of the plant. Arabica coffee beans are composed of cell wall polysaccharides (CWP, 50% of the dry mass),


lipids (13–17%), proteins (11–15%), sucrose (7–11%), and CGAs (5–8%)3,4,11. The major CWPs in the young coffee bean comprise arabinogalactan (~50%), cellulose, pectic polymers (~20%) and


galactomannans (10%)5,11,12. However, at maturity, this changes to arabinogalactan-proteins (AGPs, ~30%), cellulose (15%), pectic polymers (~5%) and galactomannan (50%)12,13. Lipids were


reported to be mainly composed of triacylglycerol (TAG, 70–80% of lipids), while the major storage protein was identified as 11S globulin (45%)5,11,13. This study reveals the peak


expression of candidate transcripts related to major storage compounds was in green beans at the initiation of the storage phase, for example, galactomannan, FLA17, TAG, linoleic acid, 11S,


7S-like and glutelin A2. The main accumulation of FLA1 started at the yellow stage. Transcripts encoding SDP1, SDP1-like, glutelin A3 and Pat6 storage proteins reached maximum expression


at the red stage. In addition, pectin was mainly degraded at the yellow and red stages. The lower number but wider range of HEGs from green coffee beans was probably associated with the


initiation of the storage phase, where a large number of key components were formed predominantly at this stage. Galactomannan is a typical storage compound for legumes (>30% of the bean


dry weight), coffee, and palms14. Galactomannan can react with proteins through Maillard reactions to produce volatile components, contributing to the coffee flavour15. The high viscosity of


the remaining galactomannan accounts for the texture, namely “body”, of the coffee beverage16. For the plant itself, the low solubility and high viscosity galactomannans are a stable storage


reserve that provides strong cells to prevent osmotic stress, microbial attack or mechanical damage14,17. In seed germination, galactomannans were degraded by α-galactosidase (encoded by


_gal_), endo-β-mannanase and β-mannosidase to provide carbon and energy for embryo development14,18. Globulins such as 11S and 7S are major storage proteins in legumes19,20. Glutelins and


prolamins are the predominant proteins in cereals, such as rice (>60%)21. 11S globulin was found to be the major storage protein in coffee11. In addition, this study suggested the


presence of other potential storage proteins: 7S-like, SDP1, SDP1-like, glutelin A2 and A3, and patatin 6. In contrast, the major storage proteins of the exalbuminous seed of Arabidopsis,


mainly 2S and 12S proteins (one-third of the total protein), were mainly formed late in seed development22. Decreased accumulation through ripening was also shown for TAG and linoleic acid


(fatty acid desaturation) storage in coffee, in contrast to Arabidopsis seeds (stored at the late stage)23. The different storage patterns may result from a different structure (copious endosperm in the


coffee bean) and function of bean tissues. In Arabidopsis, TAG storage lipase SDP1 and SDP1-like were found to catalyze more than 90% of the TAG in seed germination24. The accumulation of


these two lipases at the end of maturity is likely to avoid unnecessary degradation of the TAG storage reserve. The higher number of transcripts expressed at the yellow stage together with more


DEGs compared to the green stage indicated a great shift at the yellow stage. A slight decrease in the number of transcripts expressed at the red stage as well as the small number of DEGs


(mostly down-regulated) compared to the yellow stage, demonstrates bean maturity and fewer changes in coffee beans when the pericarp changed from yellow to red. This also indicates the end


of bean maturation. Fewer components accumulated in the last two stages, suggesting a possible shift to modification, such as cell wall degradation (pectin and cellulose degradation). Some or


all of the coffee bean arabinogalactans were combined with proteins and present as AGPs25. This structure is typical in plant cell walls and has an important role in intercellular signaling


and wound sealing (as a glue)26. Different AGP transcripts accumulated (_FLA17_ at the early stage and _FLA1_ in later stages), indicating various functions through ripening in various


tissues (endosperm, perisperm and embryo). Pectinesterase, lyase, and polygalacturonase catalyze pectin degradation, modifying cell walls through demethylesterification and depolymerization


of pectin27,28,29,30. The decrease in pectin in the ripening coffee bean reaches a peak after the yellow coffee bean stage. This parallels the transcripts expressed in fleshy fruit, such as


transcripts encoding polygalacturonase in Arabidopsis, which are highly expressed in ripe fruits (pericarp and mesocarp) as they soften31. In addition, dilution by the accumulation of other


key compounds during bean ripening is probably another reason for the decline of pectin content. Importantly, this study suggested expansin transcripts (especially _EXPA6_3_) were essential


in bean ripening and storage. The higher number of the cell wall and phenylpropanoids related transcripts filtered with the storage DEGs module reveals the close relationship of transcripts


from these two groups. Multiple alpha-expansin transcripts from the core co-expression network connected to transcripts from different transcript categories indicating their diverse


functions in bean storage. Expansins loosen cell walls during plant growth and allow responses to the plant growth hormone, auxin32. There are four expansin subfamilies, _EXPA_


(acid-induced), _EXPB_ (beta-expansin), _EXPA-like_ and _EXPB-like_33. It was proposed that wall loosening by expansins may be involved with a breakdown of non-covalent binding of cellulose


microfibrils. This results in turgor driven polymer movement that may be inhibited by some polysaccharide-binding proteins33,34. However, the exact mechanism of expansin action remains


unknown. In this study, expansin related DEGs were _EXPA_ or _EXPA-like_. Consistently, the core transcript from the co-expression network, _EXPA6_3_, was strongly co-expressed with probable


xyloglucan endotransglucosylase 30 (degradation of xyloglucan polymer). Co-expression was also shown with pectin and xylan degradation transcripts (_PL8_, _PG_2_ and _Xyl2-like_), suggesting


pectin and hemicellulose are potentially targeted by expansins or involved in the cell wall expansion phase. Other than cell wall-related transcripts, _CP5_, _SDP1_ and _HCTa_, were


concurrent with _EXPA6_3_, suggesting a diverse interaction of _EXPA6_3_. When cell walls expand, they become vulnerable. This may result in the co-expression of _HCTa_, _SDP1_ and _CP5_ (lipid


transfer). Plants cannot move; therefore, they have evolved numerous strategies to survive, mature and regenerate. As the bean grows to become a nutrition factory, other organisms such as


predators and pathogens (insects or viruses) are also interested in consuming the storage compounds generated. However, plants have evolved multiple strategies to protect beans from damage and recover from


damage. For example, galactomannan and the lignified endocarp provide a physical barrier that prevents the bean from being digested by predators, so that it is dispersed intact after consumption of the pulp35.


Different HEGs and DEGs characterize the complicated strategy evolved in the dicotyledonous albuminous coffee beans. Stress-related transcripts are more highly expressed in the green coffee


beans, which have the protection of a thick and firm lignified pericarp. However, to protect against pathogens such as fungi, specific transcripts were expressed in green coffee beans. This


was supported by the peak expression of chi2 and galactomannan-related transcripts at this stage. The gradual hardening of the endosperm then takes over the protection of the embryo and endosperm. Assorted


transcripts were expressed at the yellow stage in coffee beans, as the coffee pericarp became soft and its color changed visibly. The brighter color (red) of


the coffee pericarp attracts animals to consume the pulp, but dispersing intact beans requires more stress-responsive transcripts, which were evidenced in this research. This coincides


with declining metabolite biosynthesis upon maturity and fewer growth transcripts. The beta-glucosidase 44-like transcript is a typical example of a plant stress resistance gene, expressed


at maximum level in the ripe coffee beans. It was highly expressed in the Arabica coffee bean to activate chemical defense compounds once the cell is being attacked36,37. Another


representative case is _actin-7_, highly expressed in red bean, related to growth and required for callus formation and response to wounding38. In conclusion, this study provides insights


into the sequence of accumulation of the key components in the ripening of the bean (Fig. 4). Importantly, the co-expression network illustrated the importance in bean storage and ripening


of expansin A transcripts which have a wide interaction with other key components DEGs. This information will facilitate the genetic control of these components in coffee. This ripening bean


transcriptome will also provide a platform to determine the genotype and environment influences on coffee quality. Targeted analysis is now possible to characterize gene families of


interest (e.g. transcriptional factors). Phylotranscriptomic analysis is another approach now enabled for the study of evolutionary diversity features. METHODS RNA SAMPLE AND CDNA LIBRARY


PREPARATION Coffea arabica cv. K7 cherries of different ripening stages (green, yellow and red) were collected from the upper canopy only as described previously10. Total RNA isolated from


nine samples (three development stages in triplicate) was processed individually according to Furtado _et al_.39. The integrity of total RNA was assessed with an Agilent RNA 6000 nano kit


and chips through a Bioanalyzer 2100 (Agilent Technologies, California, USA). Thereafter, a standard 18 x TruSeq total RNA library preparation was conducted with the use of an additional


Ribo-Zero kit. Samples were subsequently sequenced on an Illumina HiSeq4000 platform (2 × 150 bp paired-end reads). READS MINING AND RNA-SEQ ANALYSIS Raw reads were mainly processed with CLC


Genomics Workbench 10.0.1 (CLC Bio, Denmark) as follows. (1) Adapters and indexes were trimmed. (2) Reads failing the PHRED score (<0.01) and length (≥40 bp) criteria were removed. (3)


RNA-Seq analysis (read similarity 0.9, length similarity 0.8) was conducted with the processed reads. A recently published long-read sequencing coffee bean transcriptome was used as a


reference with Transcripts Per Million (TPM) as the expression parameter3. Outlier expression values, classified with coefficient of variation and standard deviation, were not


considered in this study.

STATISTICAL ANALYSIS

Transcripts expressed at each developmental stage were filtered by TPM (>1). Functional annotations were analyzed with BLAST2GO for GO terms and KEGG pathway distribution40,41. Venn diagrams were built with an online tool42. Highly expressed transcripts were filtered by TPM (>500). The differential expression tool for RNA-seq (CLC) was used to test for significant expression differences between ripening stages. DEGs were filtered by FDR-corrected p-value (<0.01) and maximum group mean (TPM ≥ 10). The association of key storage components with DEGs was constructed with Mercator and MapMan 3.6.0RC143,44. The co-expression network was constructed with MapMan-annotated storage DEGs and candidate genes from the targeted analysis. Expression values of these transcripts were log2(x) transformed before analysis with the WGCNA package built into Web MeV (cutoff 0.9) and Cytoscape 3.5.1.45,46 (Note: 0.0001 was assigned to transcripts with an expression value of 0 before log2 transformation).

DATA AVAILABILITY

The trimmed RNA-sequencing data used in this manuscript have been submitted to the EMBL database under accession number PRJEB24137. Please note that the two subsets of the paired-end reads were trimmed together in the data mining process. Two sub-files were generated after


trimming, paired reads and orphans. These two sub-files were combined, compressed and submitted to EMBL as “one Fastq file (Single)”.

REFERENCES

* Fridell, G. Coffee. (John Wiley & Sons, 2014).
* Giovannoni, J., Nguyen, C., Ampofo, B., Zhong, S. & Fei, Z. The Epigenome and Transcriptional Dynamics of Fruit Ripening. _Annual Review of Plant Biology_ 68, 61–84 (2017).
* Cheng, B., Furtado, A., Smyth, H. E. & Henry, R. J. Influence of genotype and environment on coffee quality. _Trends in Food Science & Technology_ 57, 20–30 (2016).
* Sant’Ana, G. C. _et al_. Genome-wide association study reveals candidate genes influencing lipids and diterpenes contents in Coffea arabica L. _Scientific Reports_ 8, 465 (2018).
* Joët, T. _et al_. Metabolic pathways in tropical dicotyledonous albuminous seeds: Coffea arabica as a case study. _New Phytologist_ 182, 146–162 (2009).
* Li, L. _et al_. The Association of Hormone Signaling Genes, Transcription, and Changes in Shoot Anatomy during Moso Bamboo Growth. _Plant Biotechnology Journal_ (2017).
* Combes, M. C., Dereeper, A., Severac, D., Bertrand, B. & Lashermes, P. Contribution of subgenomes to the transcriptome and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures. _New Phytologist_ 200, 251–260 (2013).
* Yuyama, P. M. _et al_. Transcriptome analysis in Coffea eugenioides, an Arabica coffee ancestor, reveals differentially expressed genes in leaves and fruits. _Molecular Genetics and Genomics_ 291, 323–336 (2016).
* Ivamoto, S. T. _et al_. Transcriptome Analysis of Leaves, Flowers and Fruits Perisperm of Coffea arabica L. Reveals the Differential Expression of Genes Involved in Raffinose Biosynthesis. _PLoS ONE_ 12, e0169595 (2017).
* Cheng, B., Furtado, A. & Henry, R. J. Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. _GigaScience_ 6, 1–13 (2017).
* De Castro, R. D. & Marraccini, P. Cytology, biochemistry and molecular changes during coffee fruit development. _Brazilian Journal of Plant Physiology_ 18, 175–199 (2006).
* Redgwell, R. & Fischer, M. Coffee carbohydrates. _Brazilian Journal of Plant Physiology_ 18, 165–174 (2006).
* Marraccini, P. _et al_. Biochemical and molecular characterization of α-D-galactosidase from coffee beans. _Plant Physiology and Biochemistry_ 43, 909–920 (2005).
* Buckeridge, M. S. Seed cell wall storage polysaccharides: models to understand cell wall biosynthesis and degradation. _Plant Physiology_ 154, 1017–1023 (2010).
* Nunes, F. M., Reis, A., Domingues, M. R. M. & Coimbra, M. A. Characterization of galactomannan derivatives in roasted coffee beverages. _Journal of Agricultural and Food Chemistry_ 54, 3428–3439 (2006).
* Viani, R. Espresso coffee: the science of quality. (Elsevier, 2004).
* Morant, A. V. _et al_. β-Glucosidases as detonators of plant chemical defense. _Phytochemistry_ 69, 1795–1813 (2008).
* Buckeridge, M. S. & Dietrich, S. M. Mobilisation of the raffinose family oligosaccharides and galactomannan in germinating seeds of Sesbania marginata Benth. (Leguminosae-Faboideae). _Plant Science_ 117, 33–43 (1996).
* Orruno, E. & Morgan, M. Purification and characterisation of the 7S globulin storage protein from sesame (Sesamum indicum L.). _Food Chemistry_ 100, 926–934 (2007).
* Rogers, W. J. _et al_. Biochemical and molecular characterization and expression of the 11S-type storage protein from Coffea arabica endosperm. _Plant Physiology and Biochemistry_ 37, 261–272 (1999).
* Kusaba, M. _et al_. Low glutelin content1: a dominant mutation that suppresses the glutelin multigene family via RNA silencing in rice. _The Plant Cell_ 15, 1455–1467 (2003).
* Kroj, T., Savino, G., Valon, C., Giraudat, J. & Parcy, F. Regulation of storage protein gene expression in Arabidopsis. _Development_ 130, 6065–6073 (2003).
* Santos‐Mendoza, M. _et al_. Deciphering gene regulatory networks that control seed development and maturation in Arabidopsis. _The Plant Journal_ 54, 608–620 (2008).
* Eastmond, P. J. SUGAR-DEPENDENT1 encodes a patatin domain triacylglycerol lipase that initiates storage oil breakdown in germinating Arabidopsis seeds. _The Plant Cell Online_ 18, 665–675 (2006).
* Redgwell, R. J., Curti, D., Fischer, M., Nicolas, P. & Fay, L. B. Coffee bean arabinogalactans: acidic polymers covalently linked to protein. _Carbohydrate Research_ 337, 239–253 (2002).
* Nothnagel, E. A., Bacic, A. & Clarke, A. E. Cell and developmental biology of arabinogalactan-proteins. (Springer Science & Business Media, 2012).
* Jiang, L. _et al_. VANGUARD1 encodes a pectin methylesterase that enhances pollen tube growth in the Arabidopsis style and transmitting tract. _The Plant Cell_ 17, 584–596 (2005).
* Micheli, F. Pectin methylesterases: cell wall enzymes with important roles in plant physiology. _Trends in Plant Science_ 6, 414–419 (2001).
* Marín-Rodríguez, M. C., Orchard, J. & Seymour, G. B. Pectate lyases, cell wall degradation and fruit softening. _Journal of Experimental Botany_ 53, 2115–2119 (2002).
* González-Carranza, Z. H., Elliott, K. A. & Roberts, J. A. Expression of polygalacturonases and evidence to support their role during cell separation processes in Arabidopsis thaliana. _Journal of Experimental Botany_ 58, 3719–3730 (2007).
* Rose, J. K., Catalá, C., Gonzalez-Carranza, Z. H. & Roberts, J. A. Cell wall disassembly. _Annual Plant Reviews_ 8, 264–324 (2003).
* Ding, X. _et al_. Activation of the indole-3-acetic acid–amido synthetase GH3-8 suppresses expansin expression and promotes salicylate- and jasmonate-independent basal immunity in rice. _The Plant Cell_ 20, 228–240 (2008).
* Cosgrove, D. J. Plant expansins: diversity and interactions with plant cell walls. _Current Opinion in Plant Biology_ 25, 162–172 (2015).
* Cosgrove, D. J. Loosening of plant cell walls by expansins. _Nature_ 407, 321–326 (2000).
* Urbaneja, A., Jacas, J., Verdú, M. & Garrido, A. Dinámica e impacto de los parasitoides autóctonos de Phyllocnistis citrella Stainton, en la Comunidad Valenciana. _Invest. Agric. Prod. Prot. Veg_ 13, 787–796 (1998).
* Nisius, A. The stromacentre in Avena plastids: An aggregation of β-glucosidase responsible for the activation of oat-leaf saponins. _Planta_ 173, 474–481 (1988).
* Halkier, B. A. & Gershenzon, J. Biology and biochemistry of glucosinolates. _Annu. Rev. Plant Biol._ 57, 303–333 (2006).
* McDowell, J. M., An, Y., Huang, S., McKinney, E. C. & Meagher, R. B. The Arabidopsis ACT7 actin gene is expressed in rapidly developing tissues and responds to several external stimuli. _Plant Physiology_ 111, 699–711 (1996).
* Furtado, A. In Cereal Genomics 23–28 (Springer, 2014).
* Conesa, A. & Götz, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. _International Journal of Plant Genomics_ 2008 (2008).
* Kanehisa, M. _et al_. From genomics to chemical genomics: new developments in KEGG. _Nucleic Acids Research_ 34, D354–D357 (2006).
* Bioinformatics & Evolutionary Genomics, G. U., Belgium. _Calculate and draw custom Venn diagrams_, http://bioinformatics.psb.ugent.be/webtools/Venn/.
* Thimm, O. _et al_. mapman: a user‐driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. _The Plant Journal_ 37, 914–939 (2004).
* Lohse, M. _et al_. Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data. _Plant, Cell & Environment_ 37, 1250–1258 (2014).
* Franz, M. _et al_. Cytoscape.js: a graph theory library for visualisation and analysis. _Bioinformatics_ 32, 309–311 (2015).
* _Multiple Experiment Viewer_, http://mev.tm4.org/#/welcome.

ACKNOWLEDGEMENTS

The authors thank Green Cauldron Coffee, Australia for providing the coffee bean samples and Poss


Reading, Marta Brozynska, Adam Healey, Tiparat Tikapunya and Hayba Badro for help with sampling. This research is supported by the Australian Research Council (Project ID: LP130100376) and the Chinese Scholarship Council (2014–2018).

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, 4072, Australia: Bing Cheng, Agnelo Furtado & Robert J. Henry

CONTRIBUTIONS

B.C., A.F. and R.H. designed this study. B.C. did the analysis and manuscript drafting. A.F. and R.H. edited the manuscript.

CORRESPONDING AUTHOR

Correspondence to Robert J. Henry.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare no competing interests.

ADDITIONAL INFORMATION

PUBLISHER'S NOTE: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

ELECTRONIC SUPPLEMENTARY MATERIAL

* Supplementary Information
* Supplementary Dataset

RIGHTS AND PERMISSIONS

OPEN ACCESS


This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as


long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third


party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the


article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright


holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Cheng, B., Furtado, A. & Henry, R.J. The coffee bean transcriptome explains the accumulation of the major bean components through ripening. _Sci Rep_ 8, 11414 (2018). https://doi.org/10.1038/s41598-018-29842-4

* Received: 26 January 2018
* Accepted: 16 July 2018
* Published: 30 July 2018
* DOI: https://doi.org/10.1038/s41598-018-29842-4
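
As an illustration of the thresholds given in the Statistical Analysis section, the following Python sketch applies the expression filter (TPM > 1), the highly expressed filter (TPM > 500), the two DEG filters (FDR p-value < 0.01 and maximum group mean TPM ≥ 10) and the substitution of 0.0001 for zero values before log2 transformation. It is not the CLC Genomics Workbench or Web MeV implementation used in the study. The function names, the use of replicate means for the expression filters and the example values are assumptions made for illustration only.

```python
import math

# Thresholds quoted from the Methods text.
EXPRESSED_TPM = 1.0         # transcript counted as expressed when TPM > 1
HIGH_TPM = 500.0            # "highly expressed" when TPM > 500
FDR_CUTOFF = 0.01           # DEGs kept when the FDR-corrected p-value < 0.01
MIN_MAX_GROUP_MEAN = 10.0   # DEGs kept when the largest stage mean TPM >= 10
ZERO_SUBSTITUTE = 0.0001    # value assigned to zero expression before log2

STAGES = ("green", "yellow", "red")


def stage_means(replicates):
    """Average the replicate TPM values of each stage (assumed summary)."""
    return {stage: sum(values) / len(values) for stage, values in replicates.items()}


def expressed_stages(means, threshold=EXPRESSED_TPM):
    """Return the stages whose mean TPM exceeds the threshold."""
    return [stage for stage in STAGES if means[stage] > threshold]


def is_deg(means, fdr_p):
    """Apply both DEG filters: FDR p-value and maximum group mean."""
    return fdr_p < FDR_CUTOFF and max(means.values()) >= MIN_MAX_GROUP_MEAN


def log2_expression(value):
    """log2-transform a TPM value, replacing 0 with the substitute first."""
    return math.log2(value if value > 0 else ZERO_SUBSTITUTE)


# Hypothetical transcript with triplicate TPM values per stage.
example = {
    "green": [620.0, 580.0, 600.0],
    "yellow": [12.0, 9.0, 15.0],
    "red": [0.0, 0.0, 0.0],
}
means = stage_means(example)

print(expressed_stages(means))            # ['green', 'yellow']
print(expressed_stages(means, HIGH_TPM))  # ['green']
print(is_deg(means, fdr_p=0.001))         # True
print([log2_expression(means[s]) for s in STAGES])
```

The zero substitution keeps the log2 transform defined for transcripts that are absent at one stage, at the cost of a large negative value for those entries, which is worth keeping in mind when reading correlation-based networks built from such data.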