Hiv- 1 lentivirus tethering to the genome is associated with transcription factor binding sites found in genes that favour virus survival

Hiv- 1 lentivirus tethering to the genome is associated with transcription factor binding sites found in genes that favour virus survival

Play all audios:

Loading...

ABSTRACT Lentiviral vectors (LV) are attractive for permanent and effective gene therapy. However, integration into the host genome can cause insertional mutagenesis highlighting the


importance of understanding of LV integration. Insertion site (IS) tethering is believed to involve cellular proteins such as PSIP1/LEDGF/p75, which binds to the virus pre-integration


complexes (PICs) helping to target the virus genome. Transcription factors (TF) that bind both the vector LTR and host genome are also suspected influential to this. To determine the role of


TF in the tethering process, we mapped predicted transcription factor binding sites (pTFBS) near to IS chosen by HIV-1 LV using a narrow 20 bp window in infected human induced pluripotent


stem cells (iPSCs) and their hepatocyte-like cell (HLC) derivatives. We then aligned the pTFBS with these sequences found in the LTRs of native and self-inactivated LTRs. We found


significant enrichment of these sequences for pTFBS essential to HIV-1 life cycle and virus survival. These same sites also appear in HIV-1 patient IS and in mice infected with HIV-1 based


LV. This in silco data analysis suggests pTFBS present in the virus LTR and IS sites selected by HIV-1 LV are important to virus survival and propagation. SIMILAR CONTENT BEING VIEWED BY


OTHERS A POINT MUTATION IN HIV-1 INTEGRASE REDIRECTS PROVIRAL INTEGRATION INTO CENTROMERIC REPEATS Article Open access 18 March 2022 HIV-1 SEQUENCES IN LENTIVIRAL VECTOR GENOMES CAN BE


SUBSTANTIALLY REDUCED WITHOUT COMPROMISING TRANSDUCTION EFFICIENCY Article Open access 08 June 2021 INTRAGENIC VIRAL SILENCER ELEMENT REGULATES HTLV-1 LATENCY VIA RUNX COMPLEX RECRUITMENT


Article Open access 13 May 2025 INTRODUCTION LV have been engineered extensively for efficient and safe therapeutic gene delivery. VSV-G pseudotyped HIV-1 based vectors are particularly well


suited for this as they have been shown to infect a broad range of cell types effectively and achieve permanent gene transfer. Following infection and entry into the cell, reverse


transcription converts vector RNA genomes into double-stranded cDNA for assembly with cellular proteins [1] into PICs that associate with host chromatin to facilitate integration [2,3,4,5].


Clear differences exist in IS selection by retrovirus vectors (RV), that appear to target promoter regions, in contrast to LV that favour the transcription unit of the gene. Integration is


semi-random, and genes involved in proliferation, development and differentiation are believed to be favoured [4,5,6,7]. Importantly, both RV and LV have been shown to cause insertional


mutagenesis and therefore, understanding IS choice is of utmost importance for safe vector design. Tethering of LV to the host genome has been demonstrated to involve the integration complex


and several cellular proteins are known to interact with the viral integrase [8,9,10]. Importantly, HIV LV tethering is believed to be mediated by PSIP1/LEDGF/p75 [4,5,6,7, 11, 12] and


depletion of PSIP1/LEDGF/p75 significantly reduces HIV integration. However, because HIV-1 IS profile remains semi-random this indicates alternative factors support IS preference [13]. While


other sites may influence IS, the LTR is known to be vital in binding to the viral integrase for integration within the host genome [14]. The potential role of transcription factors (TF) in


tethering of MLV is supported by the finding that interaction between the MLV integrase and the enhancer in the LTR U3 is important for insertion near specific genomic pTFBS [15]. MLV PIC


tethering to pTFBS has been postulated as an important mechanism that promotes viral survival and propagation by enabling TFs to bind gene targets involved in viral transcription [15]. LV,


however, is believed not to integrate near to pTFBS after high resolution mapping of IS in haematopoietic cells [16]. Several known TF bind to the HIV-1 genome, for example, TNF-α activation


of HIV-1 transcription in chronically infected T-cells requires binding of the NFκB TF specifically to the U3 in the LTR [17]. The HIV life cycle also uses LTR binding sites for c-myb [18]


and AP1 [19] TFs to support viral transcription, latency, and infection of non-activated T- cells [20,21,22,23,24]. Several genes important to LV propagation have also been found associated


with cancer [25,26,27,28,29,30]. To understand more clearly the risk for LV to target TFBS that are found in cancer genes, we mapped pTFBS sequences close to IS using a small sequence window


of 20 bp around these sites and matched these with the identical pTFBS sequences located in the LV LTR. Human induced pluripotent stems cells (hiPSC) reprogrammed from somatic cells have


the ability to be differentiated to several derivative cell types including HLC which have been used widely for human disease modelling [31,32,33,34,35,36]. iPSCs have clinical relevance


with these cells used in correction of genetic diseases [37,38,39]. iPSCs and HLCs have been comprehensively characterised as representative of liver-like cells at the genetic and functional


level, with these derivatives more closely aligned to in vivo hepatocytes than liver cell lines [40]. Progress has been made in maturation of HLCs to represent a mature phenotype, including


growth of spheroid cultures to more closely represent the in vivo microenviroment [40,41,42,43]. As RV and LV vectors show preference for integration in highly expressed genes, hiPSC and


their derivatives, that express multiple genes involved in developmental and differentiation stages, would be useful to study LV IS selection in genes important to controlling normal


cellular behaviour. In this report, hiPSC and their HLC derivatives were used for infection by HIV-1 LV. We compared LV with native LTR and self-inactivating (SIN) configurations to identify


pTFBS targeting, especially since the latter is used for gene therapy. In addition, we aligned these sites in IS of infected hiPSC and HLC by both LV LTR configurations to pTFBS identified


in the sequences of the LTRs and found a high degree of similarity. These sites also closely match with pTFBS also present in the IS chosen by HIV-1 in infected patient T-cells and in mice


infected with SIN LV. Furthermore, we confirm enrichment of pTFBS that are important for virus survival that also associate with cellular proliferation. These findings imply that further


modification of the LTR may avoid these IS targets to achieve safer LV site selection. RESULTS IDENTIFICATION OF PTFBS IN LV IS IN INFECTED IPSC AND HLC iPSCs were recovered from frozen


storage and expanded for differentiation assays. These cells were fully characterised as pluripotent and HLC derivatives before gene transfer to ensure cell identity during these


experiments, as described previously [42]. Cells were determined as viable and positive for infection via flow cytometry and PCR analysis and retained their pluripotency and differentiation


characteristics. HIV-1 LV carrying the native LTR (pHV) or SIN’LTR configurations driving GFP were used to infect iPSC and their HLC derivatives (Fig. 1). We observed HLC spheroids are


heterogenous by light microscopy and therefore postulate this may influence LV IS choice. iPSCs were harvested 3 or 30 days after infection and HLC were harvested 3 days post infection. IS


profiling via LAM-PCR [44, 45] identified multiple IS in all infected cells (Table 1). Each IS was identified using the UCSC BLAT genome browser (http://genome.ucsc.edu) alignment to the


human genome (hg38 build). The majority of IS were identified within gene bodies. pTFBS were mapped 20 bp either side of each site using oPOSSUM v3.0 Single Site Analysis


(http://opossum.cisreg.ca/). Matching pTFBS were found across all samples with regards to timepoint of infection, vector configuration and cell type (Fig. 2 and Table 1). Examination of the


top 10 pTFBS (sequence hits) in infected cells by each vector showed pTFBS mostly in common for iPSC at the two time points of infection (both at 82%) and in transduced HLCs (82%) for each


LV. In addition, for each vector, infected iPSC and HLC shared pTFBS at 82% and 100%, respectively. Aligning sites identified between the different cell types at day 3 also showed pTFBS in


common for each LV (82% for the SIN’LTR LV and 100% for pHV LV) (Fig. 2). These data confirmed IS choice alignment to be highly similar regardless of cell type, stage of development and LV


LTR configuration. PTFBS IDENTIFIED IN LV LTRS ALIGN WITH VECTOR IS We next identified the pTFBS in the 5′ LTR sequences of SIN’LTR and pHV via oPOSSUM v3.0 Single Site Analysis


(http://opossum.cisreg.ca/). While pTFBS are also found in the promoter region of these viruses, 69% of these sites were not identified in the in vitro data set. A random association of


pTFBS in the human genome identified in the Opossum V3.0 database of 170 TFBS families [46] indicated that pTFBS, within a 20 bp window close to the IS chosen by pHV and SIN’LTR in iPSC and


HLC, accounted for 25% and 10% of these families, respectively, in the JASPAR database. The frequency of pTFBS common to the SIN’LTR and host IS was 36%. 65% of IS identified were common to


SIN’LTR and pHV infected samples Fig. 3. pTFBS were common between each LV regardless of harvest timepoint, cell type and vector copy number. To determine enrichment of these pTFBS above


random expectation, we compared alignments with a randomly generated data set. Calculation of Z scores (enrichment) derived from this analysis showed nearly all pTFBS were significantly


enriched above the random data set (Fig. 4). Compared to the random data set, enrichment of LTR/IS associated pTFBS was proven with positive _Z_ scores for most of the pTFBS present in the


SIN’LTR and pHV LTRs. PTFBS NEAR IS ASSOCIATE WITH HIV-1 CYCLE AND PROLIFERATION We then examined the TF assigned to each pTFBS and identified HIV-1 based associations. NFATC2, which is


known to interact with U3 and U5 to activate HIV-1 genome transcription [47], PBX1 and ZEB1, that have been identified responsible for regulating viral transcription [24] and AP1, which is


known to contribute to HIV-1 latency [48]. Also, present was pTFBS known to bind NFκβ and SP1, that are involved in viral gene expression [49, 50]. Interestingly, ZEB1, NFATc2, PBX1, AP-1,


NF-kB and SP1 are all associated with cellular proliferation [25,26,27,28,29,30]. _Z_ scores for the binding sites of these TF showed them to be highly enriched above background in iPSC and


HLC datasets, suggesting specific targeting of these pTFBS by each LV (Fig. 4). A summary of the enrichments for each pTFBS is shown (Table 2). CHANGES IN LV ASSOCIATED PTFBS SUGGESTS CLONAL


DRIFT The frequency of pTFBS found in iPSC transduced by SIN’LTR decreased by 49% between day 3 and 30 (from 633,929 to 322,340 sites), in contrast to pTFBS in pHV iPSC, which decreased


only 10 % (from 329,674 to 297,681 sites) over the same period. The frequency of pTFBS for TF associated with HIV lifecycle also decreased, however, this was significantly different for each


LV (Fig. 2). These differences in pTFBS suggests that, for the pHV vector, retention of these IS could be supporting virus survival. It is possible that this IS preference may also


influence outgrowth of cells within pHV bulk populations caused by the native virus LTR promoter and/or enhancer. IDENTIFICATION OF PTFBS IN HIV-1 INFECTED PATIENTS IS To further study the


hypothesis that HIV-1 infected cells may exhibit preferential survival compared to their uninfected counterparts as a result of their chosen IS, we investigated pTFBS prevalence near to the


IS chosen by HIV-1 in infected patients that had undergone anti-retroviral therapy (ART), which supports the survival of infected cells. We used the Retrovirus Integration Database


(https://rid.ncifcrf.gov/intro.php) (RID) to obtain patient IS loci and identified pTFBS (Table 1). These sites were also aligned to those identified in native 5′ LTR. Once again, IS were


inputted into BEDTools (https://bedtools.readthedocs.io/en/latest/) to generate 20 bp sequences 5′ and 3′ from each site and map to pTFBS (Table 1) using oPOSSUM v3.0 Single Site Analysis


(http://opossum.cisreg.ca/) of these sequences (Fig. 2). Alignment of pTFBS found most sites predicted in the native LTR aligned to patient IS data (39/42). Of the pTFBS alignments for all


pTFBS families, most sites (812/1248) were found in common with pTFBS in the HIV LTR and fewer sites (8/42) identified in the LTR were enriched with a positive _Z_ score compared to the


random data set. pTFBS alignments patient data revealed 28% of pTFBS are enriched, with only PBX1 enriched out of HIV-1 specific TF (Fig. 4). ALIGNMENT OF PTFBS IDENTIFIED IN SIN LTR LV AND


IS IN INFECTED MICE We next characterised pTFBS near IS selected by SIN’LTR LV in CD-1 immunocompetent mice (_n_ = 31) after gene transfer via yolk sac vessel injection before birth (G16),


which efficiently reaches the liver [8]. Following injection, none of the treated mice showed observable adverse effects or tumour development and each displayed normal liver morphology


after sacrifice. Four weeks post injection, three mouse livers were harvested for immunohistochemistry of GFP expression and sample DNAs were subjected to LAM PCR [51] followed by BLAST


(http://www.ncbi.nlm.nih.gov/genome/seq/MmBlast.html) and BLAT (http://genome.ucsc.edu) searches of the murine genome to determine LV IS (Table 1). From this analysis, out of the IS loci


retrieved and pTFBS mapped, all pTFBS present in the SIN LTR aligned (Fig. 2). A similar percentage of the LTR pTFBS was identified in the in vitro iPSC and HLC data sets described in this


study. Using _Z_ score significance, 45% of sites were enriched with >50% of of pTFBS found in SIN LTR significantly enriched in the mouse compared to a random control dataset generated


using murine background sequences (Fig. 4). This further supports the hypothesis that the LV LTR influences IS choice. ALIGNMENT OF PTFBS FOUND BY IN VITRO AND IN VIVO ANALYSES By comparison


of pTFBS identified in vitro and in vivo, all pTFBS identified in HIV-1 patient IS aligned with those associated with pHV infected iPSC and HLC. In addition, in SIN’LTR infected mice,


virtually all pTFBS (97–98%) aligned with pTFBS identified from IS in iPSC and HLC infected with this vector at both time points of infection. When compared to the random control datasets,


as indicated by a positive _Z_ score, pTFBS were found to be enriched. In the mouse, these sites included PBX1, NFATC2 and AP1 but not ZEB1, NFκβ and SP1 near IS (Fig. 4). Conversely, in HIV


patient IS only pTFBS for NFκβ was significantly enriched. By comparison of Z scores, pTFBS in LV LTRs and IS were highly enriched for TF known to be associated with HIV lifecycle,.


Although pTFBS in AP1 and NFATC2, also associated with the HIV lifecycle were moderately enriched, half of these sites were enriched across all data sets (Fig. 4). In iPSC and HLC sites and


in mice and patient data sets, NK3-1, PDX1 and PRRX2 were also significantly enriched. AP1, GFI, HOXA5, NFATC2, PBX1, SPIB and NFκβ were also enriched in iPSC and HLC datasets and the murine


in vivo data set. Overall, the majority of pTFBS identified in the LV LTRs or those associated with HIV lifecycle are significantly enriched in vitro and in vivo suggesting that the pTFBS


in the LTR to be highly associated with in vitro and in vivo data sets and implies infection profiling in iPSC/HLC recapitulates our findings in vivo. ALIGNMENT OF PTFBS IN LV IS GENES


ASSOCIATED WITH CLONAL OUTGROWTH Lastly, we determined whether the pTFBS identified in this study are similar to the IS found near to genes previously reported to be involved in clonal


outgrowth following gene therapy [52, 53]. The eight genes we investigated were: _LMO2_, _PRDM16_, _CCND2_, _MECOM_, _HMGA2_, _BMI1_, _BCL2_ and _PRDM1_ were subject to oPOSSUM v3.0 Single


Site Analysis (http://opossum.cisreg.ca/) in which 109 pTFBS were identified. pTFPS present in the native and SIN configuration LTRs were present in all eight genes investigated.


Interestingly, the unique pTFBS found in SIN’LTR (Hand::Tcfe2a) was also present in each gene. Enrichment analysis of these pTFBS in these genes against 24,752 genes stored in the oPOSSUM


database showed the majority in the native LTR (76%) and the SIN LTR (82%) enriched (Fig. 5) confirming each pTFBS resides in genes known to have been involved in genotoxic events and that


the unique pTFBS present in the SIN LTR configuration used in these reported cases, but not the native LTR vector was identified in all eight genes. DISCUSSION HIV IS selection has been


studied in several non-clinical models and from patient data from gene therapy clinical trials. LV are known to preferentially integrate into active transcription units in the host genome


[4, 5, 7, 54] and various studies have shown insertional mutagenesis after HIV-1 mediated gene therapy [52]. 27.7% of the human genome have been associated with TFBS with 40% of the human


genome accessible to TF and, therefore, we sought to investigate the likelihood that TF may influence vector genome tethering using pTFBS common to both the LV IS and the HIV-1 LV native or


SIN LTR [55]. TFBS are functionally active and are driven by sequence homology to bind to TF for gene activation [56]. We next identified the importance of the genes chosen for integration


and with safety in mind asked whether these genes have been found previously associated with genotoxic events [55]. RV and LV integration requires cleavage of host DNA by the virus


integrase, then insertion of the vector and completion of this process by the cellular machinery for successful and permanent residence into the host genome. The tethering model of


integration has previously been reported as a mechanism where cell derived proteins chaperone the virus PICs specifically to their chosen target site [54]. DNA repair proteins such as hRad1,


are also important to this process by repairing the nicks made in host DNA by the virus integrase [57, 58]. A major protein believed to be involved in tethering is PSIP1/LEDGF/p75, through


its interaction with PICs [11, 12]. Interestingly, knockout studies involving PSIP1/LEDGF/p75, show, albeit reduced, preferential integration still into active transcription units,


suggesting alternative factors may be supporting genome target site selection [13]. Indeed, whilst interaction between PICs and nuclear import proteins have been shown to be important for


efficient entry into the nucleus, these proteins are believed to assist tethering of HIV genomes towards actively transcribing chromatin residing near the nuclear periphery [54]. The virus


integrase also shows site specific selection as observed by MLV insertion preference in or close to promoter regions, in contrast to gene transcription units by LV, which is why RV is


believed to have a higher risk than LV in causing unwanted insertional mutagenesis through altering gene expression of important genes that regulate normal cellular behaviour [59].


Confirmation of this is evidenced by changing HIV LV site selection to gene promoters though switching of gag/pol sequences between MLV and LV [5]. Studies involving yeast mediated bait


selection of proteins have shown that the MLV integrase interact with several proteins that include TF [60]. It has been shown that pTFBS identified in the U3 region of MLV LTR are also


present close to IS chosen by these vectors in the host genome, using a search window 1 kb either side of the site of insertion and supports the hypothesis that these sites may also in some


way be involved in vector tethering. Interestingly, in that study, pTFBS found in the native HIV-1 LTR were also present near to the IS of infected CD34 + cells, however, in HeLa cells pTFBS


IS association could only be identified for MLV and but not LV. However, swapping the U3 region of the LV LTR with that of the U3 of MLV returned this association. This suggests that


differences in tethering to pTFBS may be influenced by differential TF gene expression between transformed and untransformed cells [15]. In addition to vector integration site choice,


consideration has been made for safer vector design. Both RV and LV carry virus LTRs that reside at the 5′ and 3′ ends of the vector and have promoter and enhancer functions. To circumvent


gene activation by these activities, most important to safe LV design has been the development of LTR self-inactivation during reverse transcription. SIN configuration abrogates promoter and


enhancer gene activation by U3 deletion. Replication defective vectors with SIN LTR configuration, believed to reduce the potential for insertional mutagenesis [61, 62], are currently


promising LV for permanent gene transfer. However, surprisingly these vectors have still been found associated with high frequency oncogenesis in mice and were also implicated in clonal


outgrowth in a β-thalassaemia gene therapy trial [63]. To study the tethering model and its possible relationship with IS choice further, we took advantage of the differences in LTR design


to determine what effects this had on IS selection by LV in relation to pTFBS. To do this, we used a narrow 20 bp window to investigate for the presence of pTFBS around the IS chosen by


native and SIN LTR configurations and aligned these to pTFBS identical sites in each LV LTR. By determination of pTFBS frequency occurring in each LTR with host IS choice we aimed to provide


further evidence supporting the hypothesis that TF mediate LV tethering. In addition, we investigated these pTFBS specific sites for their functional properties with regards to HIV-1 and


whether these sites are present in genes known to have been targets for insertional mutagenesis. As we suspected modified cell lines could display gene expression profiles different to


untransformed cells and this could influence IS selection, we used human iPSC and their HLC derivatives for our investigation. By using iPSC and their HLC derivatives we expected integration


in highly transcribed genes at the early proliferative and the late terminally differentiation stages as would be expected for in vivo or ex-vivo gene transfer in early progenitor and


mature cells. These cells have also been used widely for human disease modelling [31,32,33] and iPSC have also been shown successful in ex-vivogene therapy mediated disease correction


[34,35,36]. The heterogenous nature of cell populations we observed in HLC spheroids may influence LV IS targeting according to differences in gene expression. Interestingly, gene expression


in iPSc derived HLC has previously been found to more closely representative of the in vivo microenviroment of the human liver than in liver cell lines [40]. Both iPSc and HLC were


characterised as pluripotent stem cells and their differentiated counterparts, respectively, prior to infection and appeared to retain these properties morphologically after infection.


Compared to the randomly generated control datasets, we found pTFBS significantly enriched around LV IS, thereby strengthening the hypothesis that pTFBS present in the LTR either directly or


indirectly are involved in LV genome tethering to specific IS. Indeed, this preference appeared to be independent of each cell type used and differences in LTR configurations. Our analysis


of pTFBS in these cells was also highly comparable to the pTFBS identified around the IS of HIV-1 in patient genomes carrying native LTR and consistent with our data analysis from in mice


injected with the SIN LTR configuration LV. Interestingly, our pTFBS IS analysis in vitro and in vivo found 50% of LV LTR associated pTFBS around IS involved in HIV-1 lifecycle (NKX3-1,


PDX1, PRRX2, AP1, Gfi, HOXA5, NFATC2, SPIB and NFκβ) that appear enriched across most data sets. For example, PBX1 is known to be involved with viral transcription [20,21,22]. ZEB1 and AP1


have been shown to be involved in HIV latency [24, 48] and NFATC2 is essential for productive infection of non-activated T-cells [23]. Also, NFκβ and SP1 sites in the LTR have both been


shown to be involved in transcription of the HIV genome [49, 50]. Interestingly, the majority of these TF have also been previously found associated with cancer [25,26,27,28,29,30]. We


observed that the frequency of pTFBS in these genes decreases over the 3–30 day time period in iPSC infected by SIN’LTR and a significantly smaller decrease of these pTFBS was observed in


pHV infected iPSC over the same period (5% compared to 50%). This finding suggests that LV insertion near to these selected pTFBS may be useful to promote virus survival and highlights the


importance of the SIN configuration to reduce the potential for insertional mutagenesis of these genes. LEDGF knockout has been shown to significantly decrease HIV-1 integration [11, 12]. As


such, mechanistic data using knockout RNAi or TF CHIP-Seq analysis would be interesting to further investigate these findings. Previous studies have shown that a reduction in IS


heterogeneity in infected cells is observed over periods of long-term cell growth. This has been postulated due to reducing polyclonality and clonal outgrowth caused by insertional


mutagenesis by vector influence on specific genes involved in cell proliferation [64]. Several vector and host factors are believed to influence vector-associated side effects and the risk


of insertional mutagenesis leading to oncogenesis and IS selection and vector configuration are believed highly important to this [57, 65, 66]. This has been demonstrated by the difference


shown between RV and LV (tenfold) to cause cellular transformation [67]. With evidence of enrichment of pTFBS that occurred close to IS for each of the LV vectors, in common to these sites


in the vector LTR after infections of iPSC and HLC, we investigated whether these pTFBS also occur in genes already previously reported to be associated with clonal dominance in non-clinical


genotoxicity models and clinical trials. We chose eight genes; _LMO2_, _PRDM16_, _CCND2_, _MECOM_, _HMGA2_, _BMI1_, _BCL2_, _PRDM1_, associated with insertional mutagenesis [52] in which


116 pTFBS were assigned and then aligned these with the pTFBS found in each LV LTR. We identified 67% of the pTFBS occurring in the native or SIN LTR configuration within these genes.


Interestingly, while the U3 deletion removed multiple pTFBS, it introduces a new pTFBS site not identified in the native LTR (Hand::Tcfe2a). This work shows that pTFBS present in the LTR are


also present in the sites selected by LV for integration suggesting tethering of LV to these sites within the genome. Furthermore, we propose the iPSC/HLC model would be useful to study LV


interactions with the host and the outcome of integration into genes important to cell survival and proliferation. METHODS AND MATERIALS VECTOR PRODUCTION AND TITRATION The production of


HR’SIN-cPPT-SEW-eGFP-W (SIN’LTR) and its native LTR equivalent (pHV) LV was carried out as previously described [68]. These viruses have previously been used in both cell and animal assays


[8, 69,70,71]. The plasmid carrying eGFP flanked by SIN LTRs have also been sequenced to ensure sequence integrity. Both vectors express eGFP under the internal promoter of SFFV (Fig. 1).


Infectious LV titre was calculated as previously reported [72]. Briefly, 2 × 105 HEK293T cells were seeded and incubated at 37 °C, 5% CO2 overnight to adhere. Serial dilutions of virus were


prepared and incubated in complete cell culture medium with 5 µg/ml polybrene (Sigma Aldrich, Dorset, England), for 20 min at room temperature before addition to cells. 72 h post


transfection, cells were harvested for GFP expression analysis via flow cytometry using ACEA Novocyte flow cytometer and NovoExpress software V1.2.5 (Agilent Technologies, Didcot, England).


Dilutions expressing 1–30% GFP expression were analysed as accurate representations of viral titre, calculated as shown below: $${{{{{{{\mathrm{Titre}}}}}}}}\left(


{{{{{{{{\mathrm{TU}}}}}}}}/{{{{{{{\mathrm{ml}}}}}}}}} \right) = \left( {\left( {{{{{{{{\mathrm{Cell}}}}}}}}\,{{{{{{{\mathrm{count}}}}}}}} \ast \left(


{{{{{{{{\mathrm{Percentage}}}}}}}}\,{{{{{{{\mathrm{GFP}}}}}}}}\,{{{{{{{\mathrm{expression}}}}}}}}/100} \right)} \right)/{{{{{{{\mathrm{Volume}}}}}}}}} \right) \ast


{{{{{{{\mathrm{DF}}}}}}}}$$ SIN’LTR LV titre was calculated as 1.18 × 109 TU/ml and pHV was titrated as 3.8 × 109 TU/ml. INJECTION OF IMMUNOCOMPETENT MICE AND IMMUNOHISTOCHEMISTRY OF MOUSE


LIVERS TO DETERMINE GENE DELIVERY Neonatal MF-1 mice were injected intravenously via the temporal vein 1 day after birth with 4.8 × 107 vector particles/neonate with the SIN’LTR vector to


reach their circulation, as previously described. After 72 h, mouse liver samples were harvested via liver biopsy, as previously described [68]. Briefly, each liver biopsy was fixed in 25%


formalin overnight, transferred to 70% ethanol, and processed into paraffin. EGFP was detected by incubation in citrate buffer with rabbit anti-eGFP antibody A-6455 (Molecular Probes,


Eugene, Oregon, USA). Standard avidin-biotin peroxidase and diaminobenzidine treatment followed and sections were counterstained with haematoxylin. DNA from non-fixed infected mouse tissues


and uninfected controls was harvested for analysis of vector integration sites. IPSC CULTURE, DIFFERENTIATION, CHARACTERISATION, AND TRANSDUCTION mTeSR™1 medium (Stemcell Technologies,


Cambridge, England) was prepared for iPSC growth according to manufacturer’s instructions and stored at 4 °C for further use. 6 and 12 well tissue culture treated plates were coated in 5 


µg/well laminin-521 (Stemcell Technologies) as a matrix for stem cell attachment, according to manufacturer’s instructions. Laminin- 521 coated plates were sealed and stored at 4 °C for


further use. In preparation for stem cell plating, laminin-521 coated plates were warmed at 37 °C for 20 min. JHU106i (P106) cells are a hiPS cell line derived from the blood of a


28-year-old Caucasian male, reprogrammed using episomal vectors [73]. These cells were purchased from WiCell, DB41285 (Madison, Wisconsin, USA). These iPSC were grown in mTeSR™ 1 -medium and


passaged regularly when 70–80% confluent. Morphologically differentiated cells were manually cleared through aspiration. Cells were passaged for at least 1 month prior to initiating


differentiation to HLCs to ensure pure cultures of pluripotent stem cells. These cells have been fully characterised against pluripotency and differentiation markers by immunocytochemistry


and qPCR analysis [42, 43]. iPSCs were also stained against pluripotent markers and analysed by flow cytometry, namely SSEA4 (96.25 ± 3.75%), TRA-1-60 (95.15 ± 2.35%), TRA-1-81 (88.95 ± 


3.75) and against a differentiation marker, CD15 (13.08 ± 3.23%) to verify pluripotency. Cells were washed in DPBS and incubated in gentle cell dissociation reagent (Stemcell Technologies)


for 6 min at 37 °C before aspiration of medium. Cells were resuspended in mTeSR™ 1 medium and serially diluted between laminin-521 coated wells. iPSCs were differentiated to HLC in three


dimensional spheroid culture as previously reported [42]. The number of cells seeded per microplate were kept consistent at 3.84 × 105 to ensure formation of 256 spheroids of 100–150 µm in


diameter form. Endoderm differentiation medium was initiated when the cells were ~30–40% confluent. The culture media was replaced with endoderm differentiation medium RPMI 1640 containing 1


× B27 (Life Technologies, Hemel Hempstead, England) supplemented with essential growth factors, 10 ng/ml Activin A (PeproTech, Hammersmith, England) and 50 ng/ml Wnt3a (R&D Systems,


Abington, England). The medium was changed every 24 h for 72 h and then continued with 10 ng/ml Activin A without Wnt3a for 2 days. On day 5, endoderm differentiation medium was replaced


with hepatoblast differentiation medium, and this was replaced every 2 days for a further 5 days. The medium consisted of Knockout-DMEM, knockout serum replacement, 0.5% Glutamax, 1%


non-essential amino acids, 0.2% b-mercaptoethanol and 1% DMSO (Life Technologies). Hepatocyte maturation of the iPSCs-derived hepatoblasts was induced at day 10 of differentiation. Cells


were cultured using serum-free HepatoZYME™ medium (Life Technologies) containing 1% Glutamax (Life Technologies), supplemented with 10 ng/ml hepatocyte growth factor (PeproTech) and 20 ng/ml


Oncostatin M (PeproTech), for 13 days. The medium was replaced every 48 h and cells were characterised, as previously described [42, 74]. Cells were analysed at specific time points


throughout differentiation using qRT-PCR detection of Oct4, NANOG, FOXA2, HNF4A, SOX17, AFP and albumin protein quantification [42]. Immunohistochemistry against NANOG, OCT4, SOX17 AND FOXA2


(day 0–10), AFP and HNF4A (day 20–30) and E-cadherin (day 30) characterise these cells as PSCs and HLC respectively [42]. Bulk cultures of iPSC and HLC were transduced at MOI 20 using


SIN’LTR or pHV LV, using 5 µg/ml polybrene (Sigma Aldrich, Dorset, UK). Bulk cultures were chosen for this analysis as they represent a large population of cells rather than independent


clones which may vary in their data outreads. Infected cells remained >85% viable after infection and no morphological changes were observed. Cells were grown for a total of three or


serially passaged for 30 days before harvesting DNA and RNA using DNEasy and RNEasy mini kits (Qiagen, Manchester, England) according to manufacturer’s instructions. Cells were analysed via


flow cytometry for GFP fluorescence expression and live images of cells were taken under green fluorescence were taken using the Floid® Cell Imaging System (Thermo Fisher Scientific, Hemel


Hempstead, England). Infected iPSCs were quantified by flow cytometry to express 90% GFP and HLC spheroids were viewed under fluorescent microscopy to show 85% GFP expression in comparison


to brightfield images (Fig. 1). Percentage infection was estimated through observation of all 256 spheroids per microplate after live cells fluorescent microscopy imaging in comparison to


brightfield images, as determined by cells expressing green fluorescence. For analysis of pTFBS, total insertion sites (ISs) from iPSC and HLC samples (three individual bulk samples were


grown and differentiated from a single batch of JHU106i cells, in parallel) transduced using LV carrying native (pHV) or SIN configuration (SIN’LTR) LTRs (Table 1) were used for BLAT


(http://genome.ucsc.edu) alignment to the human genome build 38 (http://genome.ucsc.edu). VECTOR INSERTION SITE ANALYSIS Amplification of vector-genomic DNA junctions: Mouse genomic DNA was


extracted as previously described [15, 51]. LAM-PCR: linear amplification for LV vectors was also performed as previously described [44, 45]. Briefly, LAM-PCR of genomic DNA was performed


using 100 ng of genomic DNA. PCR products were isolated and cloned into a TOPO TA plasmid cloning kit (Invitrogen) as per manufacturer’s instructions. HIV-ISs were sequenced by deep parallel


pyrosequencing (GS FLX/454: Roche, Mannheim, Germany) then subjected to Blas2Seq and the Smith-Waterman algorithm as previously described [75]. Sequences were aligned with the mouse genome


(_Mus musculus_ genome) assembly (NCBI37/mm9, UCSC _M. musculus_ genome) using UCSC BLAT genome browser (http://genome.ucsc.edu) or BLAST


(http://www.ncbi.nlm.nih.gov/genome/seq/MmBlast.html). Insertion sites (ISs) in iPSC and HLC were sequenced through EPTS/LM-PCR adapted from previously described [76]. RETRIEVAL OF PATIENT


HIV INSERTION SITE DATA We used the RID (https://rid.ncifcrf.gov/intro.php) search query to download HIV-1 IS loci in Homo sapiens genome build hg19 (Table 1). We then used BEDTools to


retrieve the 20 bp sequences upstream and downstream from each IS from genome build hg19. GENERATION OF 100,000 RANDOM 20 BP SEQUENCES FROM THE HG19 HUMAN AND MM9 MOUSE REFERENCE GENOMES To


generate statistically significant background sequences for OPOSSUM analysis we used a Perl script and BEDTools to randomly select 100,000 random 20 bp sequences from the hg19 and mm9


reference genome builds [77]. DETECTION OF PTFBS AT INTEGRATION LOCI We used oPOSSUM v3.0 Single Site Analysis (http://opossum.cisreg.ca/) to find species specific pTFBS that where


significantly enriched within the genes targeted by HIV-1. pTFBS were searched for using oPOSSUM3 Single Site Analysis v3.0 (http://opossum.cisreg.ca/) for human targets, using default


parameters with all vertebrate profiles with a specificity of 8 bits (minimum), 0.40 conversion cutoff and 85% matrix score threshold. The TF targets identified in oPossum have been shown


separately to be validated using CHIP-Seq and therefore are used for reliable identification of pTFBS [78]. 20 bp upstream and downstream of IS were chosen for analysis as this fall within


the range of TF binding [79]. Target sequence hits were quantified through filtering for ≥1 hit per transcription factor name identified (Table 1). pTFBS present in 5′ SIN and native LTR


configurations were identified through inputting 5′ sequences in a similar fashion [78, 80, 81]. Statistical analysis using _Z_ scores were determined using Opossum v3.0 software using


species specific background data sets. The _Z_ score is determined through comparison of predicted binding sites in the target input set to background set provided. This determines rate of


pTFBS found in comparison to expected rate determined from background set to indicate significance of sites found. Positive _Z_ scores were used to indicate enrichment of sites through


predisposition to sites after LV transduction. DATA AVAILABILITY The databases generated during an/or analysed during the current study are available from the corresponding author on


reasonable request. REFERENCES * Coffin JM, Hughes SH, Varmus HE. The Interactions of Retroviruses and their Hosts. In: Coffin JM, Hughes SH, Varmus HE, editors. Retroviruses Cold Spring


Harbor (NY): Cold Spring Harbor Laboratory Press; 1997. * Bushman FD. Targeting survival: integration site selection by retroviruses and LTR-retrotransposons. Cell. 2003;115:135–8. Article 


CAS  Google Scholar  * Bushman F, Lewinski M, Ciuffi A, Barr S, Leipzig J, Hannenhalli S, et al. Genome-wide analysis of retroviral DNA integration. Nat Rev Microbiol. 2005;3:848–58. Article


  CAS  Google Scholar  * Barr SD, Ciuffi A, Leipzig J, Shinn P, Ecker JR, Bushman FD. HIV integration site selection: targeting in macrophages and the effects of different routes of viral


entry. Mol Ther. 2006;14:218–25. Article  CAS  Google Scholar  * Lewinski MK, Yamashita M, Emerman M, Ciuffi A, Marshall H, Crawford G, et al. Retroviral DNA integration: viral and cellular


determinants of target-site selection. PLoS Pathog. 2006;2:e60. Article  Google Scholar  * Schroder AR, Shinn P, Chen H, Berry C, Ecker JR, Bushman F. HIV-1 integration in the human genome


favors active genes and local hotspots. Cell. 2002;110:521–9. Article  CAS  Google Scholar  * Barr SD, Leipzig J, Shinn P, Ecker JR, Bushman FD. Integration targeting by avian


sarcoma-leukosis virus and human immunodeficiency virus in the chicken genome. J Virol. 2005;79:12035–44. Article  CAS  Google Scholar  * Nowrouzi A, Cheung WT, Li T, Zhang X, Arens A,


Paruzynski A, et al. The fetal mouse is a sensitive genotoxicity model that exposes lentiviral-associated mutagenesis resulting in liver oncogenesis. Mol Ther. 2013;21:324–37. Article  CAS 


Google Scholar  * Ranzani M, Cesana D, Bartholomae CC, Sanvito F, Pala M, Benedicenti F, et al. Lentiviral vector-based insertional mutagenesis identifies genes associated with liver cancer.


Nat Methods. 2013;10:155–61. Article  CAS  Google Scholar  * Singhal R, Deng X, Chenchik AA, Kandel ES. Long-distance effects of insertional mutagenesis. PLoS ONE. 2011;6:e15832. Article 


CAS  Google Scholar  * Cherepanov P, Maertens G, Proost P, Devreese B, Van Beeumen J, Engelborghs Y, et al. HIV-1 integrase forms stable tetramers and associates with LEDGF/p75 protein in


human cells. J Biol Chem. 2003;278:372–81. Article  CAS  Google Scholar  * Vandegraaff N, Devroe E, Turlure F, Silver PA, Engelman A. Biochemical and genetic analyses of


integrase-interacting proteins lens epithelium-derived growth factor (LEDGF)/p75 and hepatoma-derived growth factor related protein 2 (HRP2) in preintegration complex function and HIV-1


replication. Virology. 2006;346:415–26. Article  CAS  Google Scholar  * Ciuffi A, Llano M, Poeschla E, Hoffmann C, Leipzig J, Shinn P, et al. A role for LEDGF/p75 in targeting HIV DNA


integration. Nat Med. 2005;11:1287–9. Article  CAS  Google Scholar  * Dicker IB, Samanta HK, Li Z, Hong Y, Tian Y, Banville J, et al. Changes to the HIV long terminal repeat and to HIV


integrase differentially impact HIV integrase assembly, activity, and the binding of strand transfer inhibitors. J Biol Chem. 2007;282:31186–96. Article  CAS  Google Scholar  * Felice B,


Cattoglio C, Cittaro D, Testa A, Miccio A, Ferrari G, et al. Transcription factor binding sites are genetic determinants of retroviral integration in the human genome. PLoS ONE.


2009;4:e4571. Article  Google Scholar  * Cattoglio C, Pellin D, Rizzi E, Maruggi G, Corti G, Miselli F, et al. High-definition mapping of retroviral integration sites identifies active


regulatory elements in human multipotent hematopoietic progenitors. Blood. 2010;116:5507–17. Article  CAS  Google Scholar  * Duh EJ, Maury WJ, Folks TM, Fauci AS, Rabson AB. Tumor necrosis


factor alpha activates human immunodeficiency virus type 1 through induction of nuclear factor binding to the NF-kappa B sites in the long terminal repeat. Proc Natl Acad Sci U S A.


1989;86:5974–8. Article  CAS  Google Scholar  * Dasgupta P, Saikumar P, Reddy CD, Reddy EP. Myb protein binds to human immunodeficiency virus 1 long terminal repeat (LTR) sequences and


transactivates LTR-mediated transcription. Proc Natl Acad Sci U S A. 1990;87:8090–4. Article  CAS  Google Scholar  * Canonne-Hergaux F, Aunis D, Schaeffer E. Interactions of the


transcription factor AP-1 with the long terminal repeat of different human immunodeficiency virus type 1 strains in Jurkat, glial, and neuronal cells. J Virol. 1995;69:6634–42. Article  CAS


  Google Scholar  * Tacheny A, Michel S, Dieu M, Payen L, Arnould T, Renard P. Unbiased proteomic analysis of proteins interacting with the HIV-1 5’LTR sequence: role of the transcription


factor Meis. Nucleic Acids Res. 2012 ;40:e168. Article  CAS  Google Scholar  * Chao SH, Walker JR, Chanda SK, Gray NS, Caldwell JS. Identification of homeodomain proteins, PBX1 and PREP1,


involved in the transcription of murine leukemia virus. Mol Cell Biol. 2003;23:831–41. Article  CAS  Google Scholar  * Ma C, Dong X, Li R, Liu L. A computational study identifies HIV


progression-related genes using mRMR and shortest path tracing. PLoS ONE. 2013;8:e78057. Article  CAS  Google Scholar  * Hohne K, Businger R, van Nuffel A, Bolduan S, Koppensteiner H,


Baeyens A, et al. Virion encapsidated HIV-1 Vpr induces NFAT to prime non-activated T cells for productive infection. Open Biol. 2016;6:https://doi.org/10.1098/rsob.160046. * Venkatachari


NJ, Zerbato JM, Jain S, Mancini AE, Chattopadhyay A, Sluis-Cremer N, et al. Temporal transcriptional response to latency reversing agents identifies specific factors regulating HIV-1 viral


transcriptional switch. Retrovirology. 2015;12:85–015. Article  Google Scholar  * Zhang P, Sun Y, Ma L. ZEB1: at the crossroads of epithelial-mesenchymal transition, metastasis and therapy


resistance. Cell Cycle. 2015;14:481–7. Article  CAS  Google Scholar  * Ao X, Ding W, Ge H, Zhang Y, Ding D, Liu Y. PBX1 is a valuable prognostic biomarker for patients with breast cancer.


Exp Ther Med. 2020;20:385–94. Article  CAS  Google Scholar  * Xiao ZJ, Liu J, Wang SQ, Zhu Y, Gao XY, Tin VP, et al. NFATc2 enhances tumor-initiating phenotypes through the NFATc2/SOX2/ALDH


axis in lung adenocarcinoma. Elife. 2017;6:https://doi.org/10.7554/eLife.26733. * Eferl R, Wagner EF. AP-1: a double-edged sword in tumorigenesis. Nat Rev Cancer. 2003;3:859–68. Article  CAS


  Google Scholar  * Xia Y, Shen S, Verma IM. NF-kappaB, an active player in human cancers. Cancer Immunol Res. 2014;2:823–30. Article  CAS  Google Scholar  * Beishline K, Azizkhan-Clifford


J. Sp1 and the ‘hallmarks of cancer’. FEBS J. 2015;282:224–58. Article  CAS  Google Scholar  * Doss MX, Sachinidis A. Current Challenges of iPSC-Based Disease Modeling and Therapeutic


Implications. Cells 2019;8:https://doi.org/10.3390/cells8050403. * Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined


factors. Cell. 2006;126:663–76. Article  CAS  Google Scholar  * Yu J, Vodyanik MA, Smuga-Otto K, Antosiewicz-Bourget J, Frane JL, Tian S, et al. Induced pluripotent stem cell lines derived


from human somatic cells. Science. 2007;318:1917–20. Article  CAS  Google Scholar  * Georgomanoli M, Papapetrou EP. Modeling blood diseases with human induced pluripotent stem cells. Dis


Model Mech. 2019;12:https://doi.org/10.1242/dmm.039321. * Logan S, Arzua T, Canfield SG, Seminary ER, Sison SL, Ebert AD, et al. Studying Human Neurological Disorders Using Induced


Pluripotent Stem Cells: from 2D Monolayer to 3D Organoid and Blood Brain Barrier Models. Compr Physiol. 2019;9:565–611. Article  Google Scholar  * Chen FK, McLenachan S, Edel M, Da Cruz L,


Coffey PJ, Mackey DA. iPS Cells for Modelling and Treatment of Retinal Diseases. J Clin Med. 2014;3:1511–41. Article  Google Scholar  * Esteve J, Blouin JM, Lalanne M, Azzi-Martin L, Dubus


P, Bidet A, et al. Generation of induced pluripotent stem cells-derived hepatocyte-like cells for ex vivo gene therapy of primary hyperoxaluria type 1. Stem Cell Res. 2019;38:101467. Article


  CAS  Google Scholar  * Ramaswamy S, Tonnu N, Menon T, Lewis BM, Green KT, Wampler D, et al. Autologous and Heterologous Cell Therapy for Hemophilia B toward Functional Restoration of


Factor IX. Cell Rep. 2018;23:1565–80. Article  CAS  Google Scholar  * Zhang S, Chen S, Li W, Guo X, Zhao P, Xu J, et al. Rescue of ATP7B function in hepatocyte-like cells from Wilson’s


disease induced pluripotent stem cells using gene therapy or the chaperone drug curcumin. Hum Mol Genet. 2011;20:3176–87. Article  CAS  Google Scholar  * Gupta R, Schrooders Y, Hauser D, van


Herwijnen M, Albrecht W, Ter Braak B, et al. Comparing in vitro human liver models to in vivo human liver using RNA-Seq. Arch Toxicol. 2021;95:573–89. Article  CAS  Google Scholar  *


Cipriano M, Freyer N, Knospel F, Oliveira NG, Barcia R, Cruz PE, et al. Self-assembled 3D spheroids and hollow-fibre bioreactors improve MSC-derived hepatocyte-like cell maturation in vitro.


Arch Toxicol. 2017;91:1815–32. Article  CAS  Google Scholar  * Lucendo-Villarin B, Rashidi H, Alhaque S, Fischer L, Meseguer-Ripolles J, Wang Y, et al. Serum Free Production of


Three-dimensional Human Hepatospheres from Pluripotent Stem Cells. J Vis Exp. 2019. https://doi.org/10.3791/59965. * Alhaque S, Themis M, Rashidi H. Three-dimensional cell culture: from


evolution to revolution. Philos Trans R Soc Lond B Biol Sci. 2018;373:https://doi.org/10.1098/rstb.2017.0216. * Schmidt M, Carbonaro DA, Speckmann C, Wissler M, Bohnsack J, Elder M, et al.


Clonality analysis after retroviral-mediated gene transfer to CD34+ cells from the cord blood of ADA-deficient SCID neonates. Nat Med. 2003;9:463–8. Article  CAS  Google Scholar  * Schmidt


M, Zickler P, Hoffmann G, Haas S, Wissler M, Muessig A, et al. Polyclonal long-term repopulating stem cell clones in a primate model. Blood. 2002;100:2737–43. Article  CAS  Google Scholar  *


Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, et al. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids


Res. 2010;38:D105–10. Article  CAS  Google Scholar  * Romanchikova N, Ivanova V, Scheller C, Jankevics E, Jassoy C, Serfling E. NFAT transcription factors control HIV-1 expression through a


binding site downstream of TAR region. Immunobiology. 2003;208:361–5. Article  CAS  Google Scholar  * Duverger A, Wolschendorf F, Zhang M, Wagner F, Hatcher B, Jones J, et al. An AP-1


binding site in the enhancer/core element of the HIV-1 promoter controls the ability of HIV-1 to establish latent infection. J Virol. 2013;87:2264–77. Article  CAS  Google Scholar  * Stroud


JC, Oltman A, Han A, Bates DL, Chen L. Structural basis of HIV-1 activation by NF-kappaB-a higher-order complex of p50:RelA bound to the HIV-1 LTR. J Mol Biol. 2009;393:98–112. Article  CAS


  Google Scholar  * Harrich D, Garcia J, Wu F, Mitsuyasu R, Gonazalez J, Gaynor R. Role of SP1-binding domains in in vivo transcriptional regulation of the human immunodeficiency virus type


1 long terminal repeat. J Virol. 1989;63:2585–91. Article  CAS  Google Scholar  * Themis M, Waddington SN, Schmidt M, von Kalle C, Wang Y, Al-Allaf F, et al. Oncogenesis following delivery


of a nonprimate lentiviral gene therapy vector to fetal and neonatal mice. Mol Ther. 2005;12:763–71. Article  CAS  Google Scholar  * David RM, Doherty AT. Viral vectors: the road to reducing


genotoxicity. Toxicol Sci. 2017;155:315–25. Article  CAS  Google Scholar  * Wu C, Dunbar CE. Stem cell gene therapy: the risks of insertional mutagenesis and approaches to minimize


genotoxicity. Front Med. 2011;5:356–71. Article  Google Scholar  * Mitchell RS, Beitzel BF, Schroder AR, Shinn P, Chen H, Berry CC, et al. Retroviral DNA integration: ASLV, HIV, and MLV show


distinct target site preferences. PLoS Biol. 2004;2:E234. Article  Google Scholar  * Chen H, Li H, Liu F, Zheng X, Wang S, Bo X, et al. An integrative analysis of TFBS-clustered regions


reveals new transcriptional regulation models on the accessible chromatin landscape. Sci Rep. 2015;5:8465. Article  CAS  Google Scholar  * Whitfield TW, Wang J, Collins PJ, Partridge EC,


Aldred SF, Trinklein ND, et al. Functional analysis of transcription factor binding sites in human promoters. Genome Biol. 2012;13:R50–2012. Article  Google Scholar  * Knyazhanskaya E,


Anisenko A, Shadrina O, Kalinina A, Zatsepin T, Zalevsky A, et al. NHEJ pathway is involved in post-integrational DNA repair due to Ku70 binding to HIV-1 integrase. Retrovirology.


2019;16:30–019. Article  Google Scholar  * Mulder LC, Chakrabarti LA, Muesing MA. Interaction of HIV-1 integrase with DNA repair protein hRad18. J Biol Chem. 2002;277:27489–93. Article  CAS


  Google Scholar  * Modlich U, Navarro S, Zychlinski D, Maetzig T, Knoess S, Brugman MH, et al. Insertional transformation of hematopoietic cells by self-inactivating lentiviral and


gammaretroviral vectors. Mol Ther. 2009;17:1919–28. Article  CAS  Google Scholar  * Studamire B, Goff SP. Host proteins interacting with the Moloney murine leukemia virus integrase: multiple


transcriptional regulators and chromatin binding factors. Retrovirology. 2008;5:1–23. Article  Google Scholar  * Miyoshi H, Blömer U, Takahashi M, Gage FH, Verma IM. Development of a


Self-Inactivating Lentivirus Vector. J Virol. 1998;72:8150–7. Article  CAS  Google Scholar  * Zufferey R, Dull T, Mandel RJ, Bukovsky A, Quiroz D, Naldini L, et al. Self-inactivating


lentivirus vector for safe and efficient in vivo gene delivery. J Virol. 1998;72:9873–80. Article  CAS  Google Scholar  * Hargrove PW, Kepes S, Hanawa H, Obenauer JC, Pei D, Cheng C, et al.


Globin lentiviral vector insertions can perturb the expression of endogenous genes in beta-thalassemic hematopoietic cells. Mol Ther. 2008;16:525–33. Article  CAS  Google Scholar  * Ronen K,


Negre O, Roth S, Colomb C, Malani N, Denaro M, et al. Distribution of lentiviral vector integration sites in mice following therapeutic gene transfer to treat beta-thalassemia. Mol Ther.


2011;19:1273–86. Article  CAS  Google Scholar  * Hacein-Bey-Abina S, Garrigue A, Wang GP, Soulier J, Lim A, Morillon E, et al. Insertional oncogenesis in 4 patients after retrovirus-mediated


gene therapy of SCID-X1. J Clin Investig. 2008;118:3132–42. Article  CAS  Google Scholar  * Howe SJ, Mansour MR, Schwarzwaelder K, Bartholomae C, Hubank M, Kempski H, et al. Insertional


mutagenesis combined with acquired somatic mutations causes leukemogenesis following gene therapy of SCID-X1 patients. J Clin Investig. 2008;118:3143–50. Article  CAS  Google Scholar  * Baum


C, Kustikova O, Modlich U, Li Z, Fehse B. Mutagenesis and oncogenesis by chromosomal insertion of gene transfer vectors. Hum Gene Ther. 2006;17:253–63. Article  CAS  Google Scholar  *


Waddington SN, Nivsarkar MS, Mistry AR, Buckley SM, Kemball-Cook G, Mosley KL, et al. Permanent phenotypic correction of hemophilia B in immunocompetent mice by prenatal gene therapy. Blood.


2004 ;104:2714–21. Article  CAS  Google Scholar  * Khonsari H, Schneider M, Al-Mahdawi S, Chianea YG, Themis M, Parris C, et al. Lentivirus-meditated frataxin gene delivery reverses genome


instability in Friedreich ataxia patient and mouse model fibroblasts. Gene Ther. 2016;23:846–56. Article  CAS  Google Scholar  * Knight S, Bokhoven M, Collins M, Takeuchi Y. Effect of the


internal promoter on insertional gene activation by lentiviral vectors with an intact HIV long terminal repeat. J Virol. 2010;84:4856–9. Article  CAS  Google Scholar  * Bokhoven M, Stephen


SL, Knight S, Gevers EF, Robinson IC, Takeuchi Y, et al. Insertional gene activation by lentiviral and gammaretroviral vectors. J Virol. 2009;83:283–94. Article  CAS  Google Scholar  * Gay


V, Moreau K, Hong SS, Ronfort C. Quantification of HIV-based lentiviral vectors: influence of several cell type parameters on vector infectivity. Arch Virol. 2012;157:217–23. Article  CAS 


Google Scholar  * Chou BK, Gu H, Gao Y, Dowey SN, Wang Y, Shi J, et al. A facile method to establish human induced pluripotent stem cells from adult blood cells under feeder-free and


xeno-free culture conditions: a clinically compliant approach. Stem Cells Transl Med. 2015;4:320–32. Article  CAS  Google Scholar  * Cameron K, Tan R, Schmidt-Heck W, Campos G, Lyall MJ,


Wang Y, et al. Recombinant Laminins Drive the Differentiation and Self-Organization of hESC-Derived Hepatocytes. Stem Cell Rep. 2015;5:1250–62. Article  CAS  Google Scholar  * Kane NM,


Nowrouzi A, Mukherjee S, Blundell MP, Greig JA, Lee WK, et al. Lentivirus-mediated reprogramming of somatic cells in the absence of transgenic transcription factors. Mol Ther.


2010;18:2139–45. Article  CAS  Google Scholar  * Schmidt M, Hoffmann G, Wissler M, Lemke N, Mussig A, Glimm H, et al. Detection and direct genomic sequencing of multiple rare unknown


flanking DNA in highly complex samples. Hum Gene Ther. 2001;12:743–9. Article  CAS  Google Scholar  * Pearson WR, Wood TC. Statistical Significance in Biological Sequence Comparison.


Handbook of Statistical Genetics. 1st ed. Wiley; University of Virginia, Charlottesville, Virginia; 2004. * Kwon AT, Arenillas DJ, Worsley Hunt R, Wasserman WW. oPOSSUM-3: advanced analysis


of regulatory motif over-representation across genes or ChIP-Seq datasets. G3 (Bethesda). 2012;2:987–1002. Article  CAS  Google Scholar  * Wong KC, Li Y, Peng C, Wong HS. A Comparison Study


for DNA Motif Modeling on Protein Binding Microarray. IEEE/ACM Trans Comput Biol Bioinform. 2016;13:261–71. Article  CAS  Google Scholar  * Ho Sui SJ, Fulton DL, Arenillas DJ, Kwon AT,


Wasserman WW. oPOSSUM: integrated tools for analysis of regulatory motif over-representation. Nucleic Acids Res. 2007;35:W245–52. Web Server issue Article  Google Scholar  * Ho Sui SJ,


Mortimer JR, Arenillas DJ, Brumm J, Walsh CJ, Kennedy BP, et al. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res.


2005;33:3154–64. Article  Google Scholar  Download references FUNDING This work was funded by an NC3Rs CRACK IT Challenge 21: InMutagene award, sponsored by GSK and Novartis. AUTHOR


INFORMATION Author notes * These authors contributed equally: Saqlain Suleman, Annette Payne. * Deceased: Manfred Schmidt. AUTHORS AND AFFILIATIONS * Department of Life Sciences, College of


Health, Medicine & Life Sciences, Brunel University London, Uxbridge, UK Saqlain Suleman, Johnathan Bowden, Sharmin Al Haque, Serena Fawaz, Mohammad S. Khalifa & Michael Themis *


Testavec Ltd, Queensgate House, Maidenhead, UK Saqlain Suleman, Annette Payne & Susan Jobling * Department of Computer Science, College of Engineering Design and Physical Sciences,


Brunel University London, Uxbridge, UK Annette Payne * Genewerk GmbH, Heidelberg, Germany Marco Zahn, Matteo Franco, Raffaele Fronza, Wei Wang, Olga Strobel-Freidekind, Annette Deichmann, 


Irene Gil-Farina & Manfred Schmidt * University Heidelberg, Medical Faculty, Heidelberg, Germany Marco Zahn * Institute of Environment, Health and Societies, College of Business, Arts


and Social Sciences, Brunel University London, Uxbridge, UK Susan Jobling * Centre for Regenerative Medicine, The University of Edinburgh, Edinburgh, UK David Hay * Division of Infection and


Immunity, University College London, London, UK Yasuhiro Takeuchi * Division of Advanced Therapies, National Institute for Biological Standards and Control, Potters Bar, UK Yasuhiro


Takeuchi * Gene Transfer Technology, EGA Institute for Women’s Health, University College London, London, UK Simon N. Waddington * MRC Antiviral Gene Therapy Research Unit, Faculty of Health


Sciences, University of the Witswatersrand, Johannesburg, South Africa Simon N. Waddington * Department of Translational Oncology, NCT and DKFZ, Heidelberg, Germany Manfred Schmidt *


Division of Ecology and Evolution, Department of Life Sciences, Imperial College London, London, UK Michael Themis Authors * Saqlain Suleman View author publications You can also search for


this author inPubMed Google Scholar * Annette Payne View author publications You can also search for this author inPubMed Google Scholar * Johnathan Bowden View author publications You can


also search for this author inPubMed Google Scholar * Sharmin Al Haque View author publications You can also search for this author inPubMed Google Scholar * Marco Zahn View author


publications You can also search for this author inPubMed Google Scholar * Serena Fawaz View author publications You can also search for this author inPubMed Google Scholar * Mohammad S.


Khalifa View author publications You can also search for this author inPubMed Google Scholar * Susan Jobling View author publications You can also search for this author inPubMed Google


Scholar * David Hay View author publications You can also search for this author inPubMed Google Scholar * Matteo Franco View author publications You can also search for this author inPubMed


 Google Scholar * Raffaele Fronza View author publications You can also search for this author inPubMed Google Scholar * Wei Wang View author publications You can also search for this author


inPubMed Google Scholar * Olga Strobel-Freidekind View author publications You can also search for this author inPubMed Google Scholar * Annette Deichmann View author publications You can


also search for this author inPubMed Google Scholar * Yasuhiro Takeuchi View author publications You can also search for this author inPubMed Google Scholar * Simon N. Waddington View author


publications You can also search for this author inPubMed Google Scholar * Irene Gil-Farina View author publications You can also search for this author inPubMed Google Scholar * Manfred


Schmidt View author publications You can also search for this author inPubMed Google Scholar * Michael Themis View author publications You can also search for this author inPubMed Google


Scholar CONTRIBUTIONS SS: Experimental procedures, data generation, data interpretation, data analysis, paper preparation and editing. AP: Data generation, data interpretation, data


analysis, paper editing. JB: Data generation, data analysis and paper preparation. SH, MZ, SF, MSK, MF, RF: Data generation and analysis. WW, OSF, AD, YT, SNW, IGF, SJ: Supervision. MK, MT:


Supervision, conceptualisation, methodology, paper review and editing. CORRESPONDING AUTHOR Correspondence to Michael Themis. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no


competing interests. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. RIGHTS AND


PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any


medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The


images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not


included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly


from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT THIS ARTICLE CITE THIS ARTICLE Suleman, S.,


Payne, A., Bowden, J. _et al._ HIV- 1 lentivirus tethering to the genome is associated with transcription factor binding sites found in genes that favour virus survival. _Gene Ther_ 29,


720–729 (2022). https://doi.org/10.1038/s41434-022-00335-4 Download citation * Received: 07 April 2021 * Revised: 01 April 2022 * Accepted: 06 April 2022 * Published: 05 May 2022 * Issue


Date: December 2022 * DOI: https://doi.org/10.1038/s41434-022-00335-4 SHARE THIS ARTICLE Anyone you share the following link with will be able to read this content: Get shareable link Sorry,


a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative