Transposable elements are DNA sequences that can move or copy themselves from one location to another within a genome. Depending on whether transposition involves an RNA intermediate, transposable elements are classified into DNA transposons and retrotransposons. Retrotransposons are RNA-mediated and generate new copies at new genomic locations via a copy-and-paste mechanism; they include long terminal repeat retrotransposons (LTR-RTs) and non-LTR retrotransposons. LTR-RTs are characterized by directly repeated LTR arms flanking the internal coding region. A full-length LTR-RT typically consists of the terminal repeat sequences, a gag gene encoding the capsid protein, and a polyprotein associated with reverse transcription and integration. In metazoans, LTR-RTs are composed of an envelope protein and a polyprotein that includes aspartic protease, integrase, reverse transcriptase, and ribonuclease H. Among these, reverse transcriptase and ribonuclease H are involved in the replication and transposition of LTR-RTs, while integrase inserts the DNA form of the retrotransposon into the host genome. Based on the arrangement of the reverse transcriptase, integrase, and ribonuclease H sequences, LTR-RTs are classified into two types: Ty1/copia and Ty3/Gypsy. The LTR arms themselves do not encode any proteins. However, the timing of LTR-RT insertion into the host genome can be inferred from the sequence divergence between the two LTR arms of an element. In fish, LTR-RTs exhibit the highest transcriptional activity immediately after zygotic genome activation. The insertion of LTR-RTs into host genomes plays a critical role in host gene evolution.
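The domain-order rule for separating Ty1/copia from Ty3/Gypsy can be illustrated with a minimal Python sketch. It relies on the commonly described arrangement in which integrase lies upstream of reverse transcriptase in Ty1/copia and downstream of ribonuclease H in Ty3/Gypsy. The function name `classify_ltr_rt` and the domain labels (`GAG`, `PR`, `INT`, `RT`, `RH`) are hypothetical and are not taken from the pipeline used in this study.

```python
def classify_ltr_rt(domain_order):
    """Classify an LTR-RT from the 5'-to-3' order of its annotated domains.

    `domain_order` is a list of hypothetical labels such as
    ["GAG", "PR", "INT", "RT", "RH"]. Only INT, RT and RH are considered.
    """
    # Keep only the three domains whose relative order defines the type.
    core = [d for d in domain_order if d in ("INT", "RT", "RH")]

    if core == ["INT", "RT", "RH"]:
        # Integrase precedes reverse transcriptase: Ty1/copia arrangement.
        return "Ty1/copia"
    if core == ["RT", "RH", "INT"]:
        # Integrase follows ribonuclease H: Ty3/Gypsy arrangement.
        return "Ty3/Gypsy"
    # Truncated or rearranged elements cannot be assigned by order alone.
    return "unclassified"


if __name__ == "__main__":
    print(classify_ltr_rt(["GAG", "PR", "INT", "RT", "RH"]))
    print(classify_ltr_rt(["GAG", "PR", "RT", "RH", "INT"]))
```

Elements with missing or rearranged domains fall through to `"unclassified"`; real annotation tools typically combine domain order with sequence similarity rather than relying on order alone.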
Objective This study systematically investigated the evolutionary dynamics of LTR-RTs within fish genomes.
Methods We selected 30 representative fish species across various orders, along with 11 non-fish vertebrate species (covering amphibians, reptiles, birds, and mammals) as comparative references. Full-length LTR-RTs within these genomes were predicted to analyze their evolutionary patterns.
Results The analysis revealed significant variation in LTR-RT abundance among species, with the Salmoniformes and Esociformes genomes carrying the highest copy numbers. Insertion ages, estimated from the sequence divergence between the two LTR arms of each element, indicated a large-scale expansion of LTR-RTs in these species that began within the last 2.5 million years. The expansion rate was negatively correlated with the onset time of expansion, suggesting that these elements may have enhanced host adaptability during periods of environmental upheaval, such as the Quaternary glaciation. Family analysis showed that Ty3/Gypsy-type LTR-RTs predominated in most species. Among the fish species, the V-clade of Ty3/Gypsy emerged as the most active and "young" subfamily.
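The dating principle can be sketched as follows. The two LTR arms of an element are identical at insertion and then accumulate substitutions independently, so a common approach estimates the insertion age as T = K / (2r), where K is the divergence between the arms and r is the substitution rate per site per year. The sketch below uses the Kimura two-parameter distance for K. The choice of distance model, the function names, and the rate used in the example are all assumptions for illustration; the abstract does not state which model or rate was applied.

```python
import math


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTR arms."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR arms must be aligned to the same length")

    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitions
    q = transversions / sites  # proportion of transversions
    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log(term)


def insertion_age_years(k, rate):
    """Insertion age T = K / (2r); `rate` is substitutions per site per year."""
    return k / (2 * rate)


if __name__ == "__main__":
    # Hypothetical rate, chosen only to illustrate the arithmetic.
    assumed_rate = 1e-8
    k = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
    print(insertion_age_years(k, assumed_rate))
```

The factor of 2 reflects that both arms diverge from the shared ancestral sequence; the resulting age scales inversely with the assumed rate, so the rate should be chosen per lineage.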
Conclusion Phylogenetic analysis demonstrated that while different fish lineages (Salmon-Esocid Clade, Cyprinid-Clupeid-Sturgeon Clade, and Percid-Gadid Clade) shared several ancient subfamilies, lineage-specific subfamilies also existed (e.g., Osvaldo in the Salmon-Esocid Clade). Recently inserted LTR-RTs exhibited higher transpositional activity in the host genome. Analysis of the reverse transcriptase among typical species identified five core conserved functional motifs, of which the DD domain of Motif 4 (containing conserved Asp-Asp residues) was essential for maintaining replicative activity. Sequence variation within this domain is closely associated with the replication and evolution of different subfamilies.
Significance This study provides the first systematic depiction of the evolutionary trajectory of LTR-RTs in fish. These findings offer crucial evidence for understanding how fish genomes achieve rapid adaptive evolution through the integration and regulation of these exogenous DNA elements.