The transcriptome of the bird's-nest fern (Asplenium nidus) was sequenced on the Illumina HiSeq 2000 high-throughput platform, yielding 29,254,595 reads comprising 5,908,586,517 bp. Assembly of the reads produced 42,907 unigenes with an average length of 936 bp, for a total of 40.16 Mb of sequence. Homology searches against the databases showed that 24,993 unigenes shared varying degrees of homology with known genes from other species. By GO function, the unigenes fell into 51 branches under three main categories: cellular component, molecular function, and biological process. A large number of them were associated with metabolic process, binding, catalytic activity, and cellular process. Comparison against the COG database sorted the unigenes into 24 functional classes. With the KEGG database as a reference, the unigenes were mapped to 116 metabolic pathway branches. A search for simple sequence repeat (SSR) loci identified 6,067 SSRs among the 42,907 unigenes. Among the SSR repeat motif types, AG/CT was the most frequent, followed by AC/GT, A/T, and AGG/CCT. Based on these sequences, 20 primer pairs were designed and tested for amplification efficiency and polymorphism; 7 of them were polymorphic across different fern materials.
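The SSR search and the motif labels such as AG/CT can be illustrated with a minimal Python sketch. The abstract does not name the tool or the thresholds that were used, so the minimum repeat counts in `MIN_REPEATS` and the helper names `find_ssrs` and `motif_class` are hypothetical choices made only for this illustration. The sketch scans a sequence for perfect tandem repeats of 1 to 6 bp and labels each motif together with its rotations and reverse complement, which is why AG and CT are reported as a single class.

```python
import re
from collections import Counter

# Hypothetical thresholds (motif length -> minimum number of repeats).
# These are assumptions for illustration, not values taken from the paper.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not 'AA' or 'AGAG')."""
    return motif not in (motif + motif)[1:-1]


def motif_class(motif):
    """Label a motif with its rotations and reverse complement, e.g. 'CT' -> 'AG/CT'."""
    def smallest_rotation(m):
        return min(m[i:] + m[:i] for i in range(len(m)))

    fwd = smallest_rotation(motif)
    rev = smallest_rotation(reverse_complement(motif))
    first, second = sorted((fwd, rev))
    return first if first == second else first + "/" + second


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Find perfect tandem repeats; return (start, motif, repeat_count) sorted by start."""
    seq = seq.upper()
    hits = []
    for k, n_min in sorted(min_repeats.items()):
        # A k-mer followed by at least (n_min - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
                pos = m.end()
            else:
                # Skip units like 'AA' (already counted as mononucleotide runs)
                # without hiding a real repeat that starts inside this match.
                pos = m.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not real Asplenium nidus data.
    toy = "TTGC" + "AG" * 8 + "CATG" + "A" * 12 + "GTCA" + "CCT" * 5 + "GG"
    found = find_ssrs(toy)
    for start, motif, count in found:
        print(start, motif, count, motif_class(motif))
    print(Counter(motif_class(motif) for _, motif, _ in found))
```

Counting the `motif_class` labels over all unigenes would give a motif frequency table of the kind summarized in the abstract. Real analyses normally rely on a dedicated SSR detection tool and also handle compound or imperfect repeats, which this sketch ignores.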