Frontiers in Microbiology
Evolutionary and Genomic Microbiology
This article is part of the Research Topic
Bioinformatics Applications In Molecular Plant Pathology
Curtin University, Australia
University of Tübingen, Germany
Department of Jobs, Precincts and Regions, Agriculture Victoria, Australia
Original Research ARTICLE
Genomics Evolutionary History and Diagnostics of the Alternaria alternata Species Group Including Apple and Asian Pear Pathotypes
- 1 NIAB EMR, East Malling, United Kingdom
- 2 Natural Resources Institute, University of Greenwich, Chatham Maritime, London, United Kingdom
- 3 School of Life Sciences, University of Bedfordshire, Luton, United Kingdom
- 4 Parma Research and Extension Center, University of Idaho, Parma, ID, United States
- 5 FERA Science Ltd., York, United Kingdom
- 6 Warwick Crop Centre, University of Warwick, Warwick, United Kingdom
The Alternaria section alternaria (Alternaria alternata species group) represents a diverse group of saprotrophs, human allergens, and plant pathogens. Alternaria taxonomy has benefited from recent phylogenetic revision but the basis of differentiation between major phylogenetic clades within the group is not yet understood. Furthermore, genomic resources have been limited for the study of host-specific pathotypes. We report near complete genomes of the apple and Asian pear pathotypes as well as draft assemblies for a further 10 isolates representing Alternaria tenuissima and Alternaria arborescens lineages. These assemblies provide the first insights into differentiation of these taxa as well as allowing the description of effector and non-effector profiles of apple and pear conditionally dispensable chromosomes (CDCs). We define the phylogenetic relationship between the isolates sequenced in this study and a further 23 Alternaria spp. based on available genomes. We determine which of these genomes represent MAT1-1-1 or MAT1-2-1 idiomorphs and designate host-specific pathotypes. We show for the first time that the apple pathotype is polyphyletic, present in both the A. arborescens and A. tenuissima lineages. Furthermore, we profile a wider set of 89 isolates for both mating type idiomorphs and toxin gene markers. Mating-type distribution indicated that gene flow has occurred since the formation of A. tenuissima and A. arborescens lineages. We also developed primers designed to AMT14, a gene from the apple pathotype toxin gene cluster with homologs in all tested pathotypes. These primers allow identification and differentiation of apple, pear, and strawberry pathotypes, providing new tools for pathogen diagnostics.
Species within the genus Alternaria encompass a range of lifestyles, acting as saprotrophs, opportunistic pathogens, and host-adapted plant pathogens (Thomma, 2003). Large spored species include Alternaria solani, a major pathogen of potato, whereas small spored taxa include the Alternaria alternata species group (Alternaria sect. alternaria), which are found ubiquitously in the environment acting as saprotrophs and opportunistic necrotrophs. This species group is responsible for opportunistic human infections and a range of host adapted plant diseases.
Taxonomy within this presumed asexual genus has been subject to recent revision (Lawrence et al., 2013). Large spored species can be clearly resolved by standard phylogenetic markers such as ITS and are supported by morphological characters. However, small spored species within the A. alternata species group overlap in morphological characters, possess the same ITS haplotype (Kusaba and Tsuge, 1995), and show low variation in other commonly used barcoding markers (Lawrence et al., 2013; Armitage et al., 2015; Woudenberg et al., 2015). Highly variable phylogenetic markers have provided resolution between groups of isolates that possess morphological patterns typical of descriptions for Alternaria gaisen, Alternaria tenuissima, and Alternaria arborescens (Armitage et al., 2015).
The taxonomy of the species group is complicated by designation of isolates as pathotypes, each able to produce polyketide host-selective toxins (HST) adapted to apple, Asian pear, tangerine, citrus, rough lemon, or tomato (Tsuge et al., 2013). Genes involved in the production of these HSTs are located on conditionally dispensable chromosomes (CDCs) (Hatta et al., 2002). CDCs have been estimated to be 1.05 Mb in the strawberry pathotype (Hatta et al., 2002), 1.1–1.7 Mb in the apple pathotype (Johnson et al., 2001), 1.1–1.9 Mb in the tangerine pathotype (Masunaka et al., 2000, 2005), and 4.1 Mb in the pear pathotype (Tanaka et al., 1999; Tanaka and Tsuge, 2000). These CDCs are understood to have been acquired through horizontal gene transfer and as such, the evolutionary history of CDCs may be distinct from the core genome.
The polyketide synthase genes responsible for the production of the six HSTs are present in clusters. Some genes within these clusters are conserved between pathotypes (Hatta et al., 2002; Miyamoto et al., 2009), while other genes within these clusters are unique to particular pathotypes (Ajiro et al., 2010; Miyamoto et al., 2010). This is reflected in structural similarities between the pear (AKT), strawberry (AFT), and tangerine (ACTT) toxins, each of which contains a 9,10-epoxy-8-hydroxy-9-methyl-decatrienoic acid moiety. In contrast, the toxin produced by the apple pathotype (AMT) does not contain this moiety and is primarily cyclic in structure (Tsuge et al., 2013).
Studies making use of bacterial artificial chromosomes (BAC) have led to the sequencing of toxin gene cluster regions from three apple pathotype isolates (GenBank accessions: AB525198, AB525199, AB525200; unpublished). These sequences are 100–130 kb in size and contain 17 genes that are considered to be involved in synthesis of the AMT apple toxin (Harimoto et al., 2007). AMT1, AMT2, AMT3, and AMT4 have been demonstrated to be involved in AMT synthesis, as gene disruption experiments have led to loss of toxin production and pathogenicity (Johnson et al., 2000; Harimoto et al., 2007, 2008). However, experimental evidence has not been provided to show that the remaining 13 AMT genes have a role in toxin production. Four genes present in the CDC for the pear pathotype have been identified and have been named AKT1, AKT2, AKT3, AKTR-1 (Tanaka et al., 1999; Tanaka and Tsuge, 2000; Tsuge et al., 2013) and a further two genes (AKT4, AKTS1) have been reported (Tsuge et al., 2013).
The toxicity of an HST is not restricted to the designated host for that pathotype. All or some of the derivatives of a toxin may induce necrosis on “non-target” host leaves. For example, AMT from the apple pathotype can induce necrosis on the leaves of Asian pear (Kohmoto et al., 1976). Therefore, non-host resistance may be triggered by recognition of non-HST avirulence genes.
Alternaria spp. are of phytosanitary importance, with apple and pear pathotypes subject to quarantine regulations in Europe under Annex IIAI of Directive 2000/29/EC as A. alternata (non-European pathogenic isolates). As such, rapid and accurate diagnostics are required for identification. Where genes on essential chromosomes can be identified that phylogenetically resolve taxa, then these can be used for identification of quarantine pathogens (Bonants et al., 2010; Quaedvlieg et al., 2012). However, regulation and management strategies also need to consider the potential for genetic exchange between species. The Alternaria sect. alternaria are presumed asexual but evidence has been presented for either the presence of sexuality or a recent sexual past. Sexuality or parasexuality provides a mechanism for reshuffling the core genome associated with CDCs of a pathotype. It is currently unknown whether pathotype identification can be based on sequencing of phylogenetic loci, or whether the use of CDC-specific primers is more appropriate. This is of particular importance for the apple and Asian pear pathotypes due to the phytosanitary risk posed by their potential establishment and spread in Europe.
Materials and Methods
Twelve genomes were sequenced, selected from a collection of isolates whose phylogenetic identity was determined in previous work (Armitage et al., 2015). These 12 isolates were three A. arborescens clade isolates (FERA 675, RGR 97.0013, and RGR 97.0016), four A. tenuissima clade isolates (FERA 648, FERA 1082, FERA 1164, and FERA 24350), four A. tenuissima clade apple pathotype isolates (FERA 635, FERA 743, FERA 1166, and FERA 1177), and one A. gaisen clade pear pathotype isolate (FERA 650).
DNA and RNA Extraction and Sequencing
Apple pathotype isolate FERA 1166 and Asian pear pathotype isolate FERA 650 were sequenced using both Illumina and nanopore MinION sequencing technologies, and the remaining 10 isolates were sequenced using Illumina sequencing technology. For both Illumina and MinION sequencing, DNA extraction was performed on freeze dried mycelium grown in PDB for 14 days.
High molecular weight DNA was extracted for MinION sequencing using the protocol of Schwessinger and McDonald (2017), scaled down to a starting volume of 2 ml. This was followed by phenol-chloroform purification and size selection to a minimum of 30 kb using a Blue Pippin. The resulting product was concentrated using ampure beads before library preparation was performed using a Rapid Barcoding Sequencing Kit (SQK-RBK001) modified through exclusion of LLB beads. Sequencing was performed on an Oxford Nanopore GridION generating 40 and 34 times coverage of sequence data for isolates FERA 1166 and FERA 650, respectively.
gDNA for Illumina sequencing of isolate FERA 1166 was extracted using a modified CTAB protocol (Li et al., 1994). gDNA for Illumina sequencing of the eleven other isolates was extracted using a GenElute Plant DNA Miniprep Kit (Sigma) following the manufacturer’s protocol with the following modifications: the volume of lysis solutions (PartA and PartB) was doubled; an RNase digestion step was performed as suggested in the manufacturer’s protocol; twice the volume of precipitation solution was added; elution was performed using elution buffer EB (Qiagen). A 200 bp genomic library was prepared for isolate FERA 1166 using a TruSeq protocol (TruSeq Kit, Illumina) and sequenced using 76 bp paired-end reads on an Illumina GA2 Genome Analyzer. Genomic libraries were prepared for the other eleven isolates using a Nextera Sample Preparation Kit (Illumina) and libraries were sequenced on a MiSeq Benchtop Analyzer (Illumina) using 250 bp paired-end reads.
RNAseq was performed to aid training of gene models. mRNA was extracted from isolates FERA 1166 and FERA 650 grown in full strength PDB, 1% PDB, Potato Carrot Broth (PCB), and V8 juice broth (V8B). The protocol for making PCB and V8B was as described in Simmons (2007) for making Potato Carrot Agar and V8 juice agar, with the exception that agar was not added to the recipe. Cultures were grown in conical flasks containing 250 ml of each liquid medium for 14 days. mRNA extraction was performed on freeze dried mycelium using the RNeasy Plant RNA extraction Kit (Qiagen). Concentration and quality of mRNA samples were assessed using a Bioanalyzer (Agilent Technologies). mRNA from the sample grown in 1% PDB for isolate FERA 650 showed evidence of degradation and was not used further. Samples from the different growth media were pooled for each isolate and 200 bp cDNA libraries prepared using a TruSeq Kit (Illumina). These libraries were sequenced in multiplex on a MiSeq (Illumina) using 200 bp paired-end reads.
Genome Assembly and Annotation
De novo genome assembly was performed for all 12 isolates. Assembly for isolate FERA 650 was generated using SMARTdenovo (February 26, 2017 github commit), whereas assembly for isolate FERA 1166 was generated by merging a SMARTdenovo assembly with a MinION-Illumina hybrid SPAdes v.3.9.0 assembly using quickmerge v.0.2 (Antipov et al., 2016; Chakraborty et al., 2016). Prior to assembly, adapters were removed from MinION reads using Porechop v.0.1.0 and reads were further trimmed and corrected using Canu v.1.6 (Ruan, 2016; Koren et al., 2017). Following initial assembly, contigs were corrected using MinION reads through ten rounds of Racon (May 29, 2017 github commit) correction (Vaser et al., 2017) and one round of correction using MinION signal information with nanopolish (v0.9.0) (Loman et al., 2015). Final correction was performed through ten rounds of Pilon v.1.17 (Walker et al., 2014) using Illumina sequence data. Assemblies for the ten isolates with Illumina-only data were generated using SPAdes v.3.9.0 (Bankevich et al., 2012). Assembly quality statistics were summarized using Quast v.4.5 (Gurevich et al., 2013). Single copy core Ascomycete genes were identified within the assembly using BUSCO v.3 and used to assess assembly completeness (Simão et al., 2015). RepeatModeler, RepeatMasker, and TransposonPSI were used to identify repetitive and low complexity regions. Visualization of whole genome alignments between FERA 1166 and FERA 650 was performed using circos v.0.6 (Krzywinski et al., 2009), following whole genome alignment using the nucmer tool as part of the MUMmer package v.4.0 (Marçais et al., 2018).
Gene prediction was performed on softmasked genomes using Braker1 v.2 (Hoff et al., 2016), a pipeline for automated training and gene prediction of AUGUSTUS v.3.1 (Stanke and Morgenstern, 2005). Additional gene models were called in intergenic regions using CodingQuarry v.2 (Testa et al., 2015). Braker1 was run using the “fungal” flag and CodingQuarry was run using the “pathogen” flag. RNAseq data generated from FERA 1166 and FERA 650 were aligned to each genome using STAR v.2.5.3a (Dobin et al., 2013), and used in the training of Braker1 and CodingQuarry gene models. Orthology was identified between the 12 predicted proteomes using OrthoMCL v.2.0.9 (Li et al., 2003) with an inflation value of 5.
Draft functional genome annotations were determined for gene models using InterProScan-5.18-57.0 (Jones et al., 2014) and through identifying homology (BLASTP, e-value < 1 × 10⁻¹⁰⁰) between predicted proteins and those contained in the March 2018 release of the SwissProt database (Bairoch and Apweiler, 2000). Putative secreted proteins were identified through prediction of signal peptides using SignalP v.4.1 and removing those predicted to contain transmembrane domains using TMHMM v.2.0 (Käll et al., 2004; Krogh et al., 2001). Additional programs were used to provide evidence of effectors and pathogenicity factors. EffectorP v.1.0 was used to screen secreted proteins for characteristics of length, net charge and amino acid content typical of fungal effectors (Sperschneider et al., 2016). Secreted proteins were also screened for carbohydrate active enzymes using HMMER3 (Mistry et al., 2013) and HMM models from the dbCAN database (Huang et al., 2018). DNA binding domains associated with transcription factors (Shelest, 2017) were identified along with two additional fungal-specific transcription factor domains (IPR007219 and IPR021858). Annotated assemblies were submitted as Whole Genome Shotgun projects to DDBJ/ENA/GenBank (Table 1). This included passing assemblies through the NCBI contamination screen, which did not identify the presence of contaminant organisms.
Table 1. NCBI biosample and genome accession numbers of data generated in this study.
BUSCO hits of single copy core ascomycete genes to assemblies were extracted and retained if a single hit was found in all of the 12 sequenced genomes and 23 publicly available Alternaria spp. genomes from the Alternaria genomes database (Dang et al., 2015). Nucleotide sequences from the resulting hits of 500 loci were aligned using MAFFT v6.864b (Katoh and Standley, 2013), before alignments were trimmed using trimAl v.1.4.1 (Capella-Gutiérrez et al., 2009), and trees calculated for each locus using RAxML v.8.1.17 (Liu et al., 2011). The most parsimonious tree from each RAxML run was used to determine a single consensus phylogeny of the 500 loci using ASTRAL v.5.6.1 (Zhang et al., 2018). The resulting tree was visualized using the R package GGtree v.1.12.4 (Yu et al., 2016).
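The locus-retention rule described above (keep a BUSCO gene only if it has exactly one hit in every genome) can be sketched as a set intersection. This is a minimal illustration, not the authors' pipeline: the function name, the input layout, and the toy genome and locus names are all hypothetical.

```python
from collections import Counter


def shared_single_copy_loci(hits_by_genome):
    """Return BUSCO IDs that occur exactly once in every genome.

    hits_by_genome maps a genome name to a list of BUSCO IDs, one entry
    per hit, so a duplicated gene appears more than once in its list.
    (Hypothetical input layout, chosen for illustration.)
    """
    shared = None
    for busco_ids in hits_by_genome.values():
        counts = Counter(busco_ids)
        # Loci with exactly one hit in this genome.
        single = {locus for locus, n in counts.items() if n == 1}
        shared = single if shared is None else shared & single
    return sorted(shared or [])


# Hypothetical toy input: locus2 is duplicated in genome_B and absent
# from genome_C, so only locus1 and locus3 are retained.
toy_hits = {
    "genome_A": ["locus1", "locus2", "locus3"],
    "genome_B": ["locus1", "locus2", "locus2", "locus3"],
    "genome_C": ["locus1", "locus3"],
}
print(shared_single_copy_loci(toy_hits))
```

Applied to the 35 genomes in the study, a filter of this kind would leave the loci that are then aligned and used for per-locus tree building.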
Contigs unique to apple and pear pathotypes were identified through read alignment to assembled genomes. Short read alignment was performed using Bowtie2 (Langmead and Salzberg, 2012), returning a single best alignment for each paired read, whereas long read alignments were performed using Minimap2 v.2.8-r711-dirty (Li, 2018). Read coverage was quantified from these alignments using Samtools (Li et al., 2009).
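One way to picture how read coverage can flag contigs that are absent from some isolates is sketched below. It assumes a table of mean read depth per contig per isolate has already been computed from the alignments; the `min_ratio` threshold, the function name, and the toy depths are hypothetical and are not taken from the study.

```python
from statistics import median


def variable_contigs(coverage, min_ratio=0.1):
    """Flag contigs with near-zero relative depth in at least one isolate.

    coverage: {isolate: {contig: mean read depth}} (hypothetical layout).
    A contig is flagged when its depth divided by that isolate's median
    contig depth falls below min_ratio (an assumed threshold).
    """
    flagged = set()
    for depths in coverage.values():
        typical = median(depths.values())
        if typical == 0:
            continue  # no usable alignments for this isolate
        for contig, depth in depths.items():
            if depth / typical < min_ratio:
                flagged.add(contig)
    return sorted(flagged)


# Hypothetical toy depths: contig_3 is covered in isolate_x but
# essentially uncovered in isolate_y.
toy_coverage = {
    "isolate_x": {"contig_1": 40.0, "contig_2": 42.0, "contig_3": 38.0},
    "isolate_y": {"contig_1": 30.0, "contig_2": 31.0, "contig_3": 0.5},
}
print(variable_contigs(toy_coverage))
```

Normalizing by each isolate's own median depth keeps the comparison independent of differences in sequencing depth between isolates.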
Toxin-Synthesis Genes in Alternaria Genomes
Sequence data for 40 genes located in A. alternata HST gene clusters were downloaded from GenBank. BLASTn searches were performed for all 40 gene sequences against one another to identify homology between these sequences. Genes were considered homologous where they had >70% identical sequences over the entire query length and an e-value of 1 × 10⁻³⁰. tBLASTx was used to search for the presence of these genes in assemblies.
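The homology criteria above can be expressed as a simple filter over BLAST hits. The sketch below is one reading of those criteria, assuming that "over the entire query length" means the alignment spans the query from its first to its last base and that the e-value is an upper bound; the `Hit` record and its field names are hypothetical, standing in for a tabular BLAST output that includes the query length.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST alignment (hypothetical record for illustration)."""
    query: str
    subject: str
    pident: float  # percent identity of the alignment
    qstart: int    # 1-based alignment start on the query
    qend: int      # 1-based alignment end on the query
    qlen: int      # full length of the query sequence
    evalue: float


def is_homologous(hit, min_identity=70.0, max_evalue=1e-30):
    """Apply the >70% identity, full-query-length, e-value criteria."""
    lo, hi = sorted((hit.qstart, hit.qend))
    covers_query = lo == 1 and hi == hit.qlen
    return hit.pident > min_identity and covers_query and hit.evalue <= max_evalue


# Hypothetical hits between placeholder genes.
full = Hit("geneA", "geneB", 85.0, 1, 900, 900, 1e-120)
partial = Hit("geneA", "geneC", 95.0, 1, 400, 900, 1e-80)
print(is_homologous(full), is_homologous(partial))
```

A partial alignment such as the second hit fails the filter even with high identity, which is the behavior the full-length requirement is meant to enforce.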
Signatures of Genetic Exchange
Mating type idiomorphs present in publicly available genomes were identified using BLASTn searches. A wider assessment within 89 characterized Alternaria isolates (Armitage et al., 2015) was undertaken using specific PCR primers (Arie et al., 2000). PCR primers (AAM1-3: 5′-TCCCAAACTCGCAGTGGCAAG-3′; AAM1-3: 5′-GATTACTCTTCTCCGCAGTG-3′; M2F: 5′-AAGGCTCCTCGACCGATGAA-3′; M2R: 5′-CTGGGAGTATACTTGTAGTC-3′) were run in multiplex, with PCR reaction mixtures consisting of 10 μl REDTaq (REDTaq ReadyMix PCR Reaction Mix, Sigma-Aldrich), 2 μl DNA, 1 μl of each primer (20 μM), and 4 μl purified water (Sigma-Aldrich). PCR reaction conditions comprised an initial 60 s denaturing step at 94°C followed by 30 cycles of a melting step at 94°C for 30 s, an annealing step at 57°C for 30 s, and an extension step at 72°C for 60 s. These cycles were followed by a final extension step at 72°C for 420 s. MAT1-1-1 or MAT1-2-1 idiomorphs were determined through presence of a 271 or 576 bp product, respectively, following gel electrophoresis.
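The arithmetic behind the multiplex reaction and the band-size call can be checked with a short sketch. The volumes, step times, and product sizes come from the protocol above; the `tolerance_bp` allowance for reading band sizes off a gel and the function name are assumptions added for illustration.

```python
# Multiplex reaction components in microlitres (four primers at 1 ul each).
MULTIPLEX_MIX_UL = {
    "REDTaq ReadyMix": 10.0,
    "DNA": 2.0,
    "primers (4 x 1 ul)": 4.0,
    "purified water": 4.0,
}
total_volume_ul = sum(MULTIPLEX_MIX_UL.values())

# Programmed hold times only; ramping between temperatures is ignored.
initial_denature_s = 60
cycle_s = 30 + 30 + 60  # melt + anneal + extend
final_extension_s = 420
programmed_seconds = initial_denature_s + 30 * cycle_s + final_extension_s

# Expected product sizes for each idiomorph, as stated in the text.
EXPECTED_PRODUCT_BP = {"MAT1-1-1": 271, "MAT1-2-1": 576}


def call_idiomorph(band_sizes_bp, tolerance_bp=25):
    """Return idiomorphs whose expected product matches an observed band.

    tolerance_bp is an assumed allowance for gel sizing error.
    """
    calls = []
    for idiomorph, expected in EXPECTED_PRODUCT_BP.items():
        if any(abs(band - expected) <= tolerance_bp for band in band_sizes_bp):
            calls.append(idiomorph)
    return calls


print(total_volume_ul, programmed_seconds, call_idiomorph([270]))
```

The components sum to a 20 μl reaction, and the programmed steps add up to 4,080 s (68 min) before ramp times.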
PCR Screens for Apple and Pear Toxin-Synthesis Genes
A set of 89 previously characterized isolates was used to further investigate the distribution of pathotypes throughout the A. alternata species group. PCR primers were designed for the amplification of three genes (AMT4, AMT14, AKT3) located within CDC gene clusters involved in toxin synthesis. Primers for AMT4 were designed to amplify apple pathotype isolates, AKT3 to amplify pear pathotype isolates and AMT14 to identify both apple and pear pathotype isolates. These primers were then used to screen isolates for the presence of these genes in 30 cycles of PCR using 0.25 μl DreamTaq, 1 μl of 10x PCR buffer, 1 μl of dNTPs, 1 μl of gDNA, 1 μl of each primer (5 μM), and 4.75 μl purified water (Sigma-Aldrich). PCR products were visualized using gel electrophoresis and amplicon identity confirmed through Sanger sequencing. Primers AMT4-EMR-F (5′-CTCGACGACGGTTTGGAGAA-3′) and AMT4-EMR-R (5′-TTCCTTCGCATCAATGCCCT-3′) were used for amplification of AMT4. Primers AKT3-EMR-F (5′-GCAATGGACGCAGACGATTC-3′) and AKT3-EMR-R (5′-CTTGGAAGCCAGGCCAACTA-3′) were used for amplification of AKT3. Primers AMT14-EMR-F (5′-TTTCTGCAACGGCGKCGCTT-3′) and AMT14-EMR-R (5′-TGAGGAGTYAGACCRGRCGC-3′) were used for amplification of AMT14. PCR reaction conditions were the same as described above for mating type loci, but with annealing performed at 66°C for all primer pairs.
Pathogenicity assays were performed on apple cv. Spartan and cv. Bramley’s seedling to determine differences in isolate virulence between A. tenuissima isolates possessing the apple pathotype CDC (FERA 635, FERA 743 or FERA 1166) and non-pathotype isolates lacking the CDC (FERA 648, FERA 1082 or FERA 1164). Briefly, leaves were inoculated with 10 μl of a 1 × 10⁵ spores ml⁻¹ suspension at six points and the number of leaf spots counted at 14 days post inoculation. Each leaf was inoculated with a single isolate, with 10 replicates per cultivar. Binomial regression using a generalized linear model (GLM) was used to analyse the number of resulting lesions per leaf.
Unfolded adult apple leaves, less than 10 cm in length, were cut from young (less than 12 months old) apple cv. Spartan trees or cv. Bramley’s seedling trees. These were quality-checked to ensure that they were healthy and free from disease. Leaves were grouped by similar size and age and organized into ten experimental replicates of nine leaves. Leaves were placed in clear plastic containers, with the abaxial leaf surface facing upwards. The base of these boxes was lined with two sheets of paper towel and wetted with 50 ml of sterile distilled water (SDW). The cultivars were assessed in two independent experiments.
Spore suspensions were made by growing A. alternata isolates on 1% PDA plates for 4 weeks at 23°C before flooding the plate with 2 ml of SDW and scraping the plate with a disposable L-shaped spreader. Each leaf was inoculated with 10 μl of 1 × 10⁵ spores ml⁻¹ A. alternata spore suspension or 10 μl of SDW at six points on the abaxial leaf surface. Of the nine leaves in each box, three leaves were inoculated with spore suspensions from isolates carrying the apple pathotype CDC, three leaves were inoculated with non-pathotype isolates lacking the CDC, and three leaves were inoculated with SDW. Following inoculation, each container was sealed and placed in plastic bags to prevent moisture loss. Boxes were then kept at 23°C with a 12 h light/12 h dark cycle.
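The AMT14 primers contain IUPAC degenerate bases (K, Y, and R), so each primer is in practice a pool of sequences. The sketch below expands a degenerate primer into the concrete sequences it represents, using the standard IUPAC nucleotide codes and the primer sequences written without internal spaces; the function name is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


AMT14_F = "TTTCTGCAACGGCGKCGCTT"  # K = G or T
AMT14_R = "TGAGGAGTYAGACCRGRCGC"  # Y = C or T; R = A or G (twice)

print(len(expand_degenerate(AMT14_F)), len(expand_degenerate(AMT14_R)))
```

The forward primer expands to two sequences and the reverse primer to eight, which is consistent with primers designed to match AMT14 homologs that differ at a few positions between pathotypes.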
Generation of Near-Complete Genomes for the Apple and Pear Pathotype Using MinION Sequencing
Assemblies using nanopore long-read sequence data for the apple pathotype isolate FERA 1166 and pear pathotype isolate FERA 650 were highly contiguous, with the former totaling 35.7 Mb in 22 contigs and the latter totaling 34.3 Mb in 27 contigs (Table 2). Whole genome alignments of these assemblies to the 10 chromosomes of A. solani showed overall macrosynteny between genomes (Figure 1), but with structural rearrangement of apple pathotype chromosomes in comparison to A. solani chromosomes 1 and 10. The Asian pear pathotype had distinct structural rearrangements in comparison to A. solani chromosomes 1 and 2 (Figure 1). Scaffolded contigs spanned the entire length of A. solani chromosomes 2, 3, 6, 8, and 9 for FERA 1166 and chromosomes 4, 5, 6, and 10 for FERA 650 (Figure 1). Interestingly, sites of major structural rearrangements within A. solani chromosome 1 were flanked by telomere-like TTAGGG sequences.
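Telomere-like repeats of the kind mentioned above can be located by scanning contigs for tandem copies of TTAGGG or its reverse complement CCCTAA. The following is a minimal sketch rather than the method used in the study; the minimum copy number and the toy sequence are assumptions.

```python
import re


def find_telomere_repeats(sequence, min_copies=3):
    """Return (start, end) spans of tandem TTAGGG or CCCTAA repeats.

    Coordinates are 0-based and end-exclusive. min_copies is an assumed
    cut-off for how many tandem copies count as telomere-like.
    """
    pattern = re.compile(
        r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (min_copies, min_copies)
    )
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy contig: four forward-strand copies, a spacer, then
# three reverse-complement copies.
toy_contig = "ACGT" + "TTAGGG" * 4 + "ACGTACGT" + "CCCTAA" * 3
print(find_telomere_repeats(toy_contig))
```

Repeats found at contig ends are consistent with true chromosome ends, whereas spans reported in the interior of a contig mark telomere-like sequence of the sort noted at the rearrangement sites.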
Table 2. Sequence data, assembly and gene prediction statistics for genomes from three A. alternata species group lineages, including apple and Asian pear pathotype isolates.
Figure 1. Genome alignment between the reference A. solani genome and long-read assemblies of A. alternata apple (A) and Asian pear (B) pathotypes. Links are shown between aligned regions. Locations of telomere repeat sequences are marked within assembled contigs. Contig order in reference to A. solani chromosomes is summarized, with those contigs displaying evidence of structural rearrangement marked with an asterisk.
Genome assembly of 10 Illumina sequenced isolates yielded assemblies of a similar total size to MinION assemblies (33.9–36.1 Mb) but fragmented into 167–912 contigs. Assembled genomes were repeat sparse, with 1.41–2.83% of genomes repeat masked (Table 2). Genome assemblies of A. arborescens isolates (33.8–33.9 Mb) were of similar total size to non-pathotype A. tenuissima isolates and had similar repetitive content (2.51–2.83 and 1.41–2.83%, respectively). Despite this, identification of transposon families in both sets of genomes showed expansion of DDE (T5df = 5.36, P < 0.01) and gypsy (T5df = 6.35, P < 0.01) families in A. arborescens genomes (Figure 2).
Figure 2. Distinct numbers of DDE and gypsy family transposons between genomes of A. arborescens (arb) and A. tenuissima (ten) clade isolates. Numbers of identified transposons are also shown for hAT, TY1 copia, mariner, cacta, LINE, MuDR/Mu transposases, helitrons, and the Ant1-like mariner elements.
Phylogeny of Sequenced Isolates
The relationship between the 12 sequenced isolates and 23 Alternaria spp. with publicly available genomes was investigated through phylogenetic analysis of 500 shared core ascomycete genes. A. pori and A. destruens genomes were excluded from the analysis due to low numbers of complete single copy ascomycete genes being found in their assemblies (Supplementary Table S1). The 12 sequenced isolates were distributed throughout A. gaisen, A. tenuissima, and A. arborescens clades (Figure 3). The resulting phylogeny (Figure 3), formed the basis for later assessment of CDC presence and mating type distribution among newly sequenced and publicly available genomes, as discussed below.
Figure 3. Phylogeny of sequenced and publicly available Alternaria spp. genomes. Maximum parsimony consensus phylogeny of 500 conserved single copy loci. Dotted lines show branches with support from […]
[…] 0.01, F2,8df = 51.19). BUSCO analysis identified that gene models included over 97% of the single copy conserved ascomycete genes, indicating well trained gene models. Apple pathotype isolates possessed greater numbers of secondary metabolite clusters (P < 0.01, F2,8df = 8.96) and genes encoding secreted proteins (P < 0.01, F2,8df = 44.21) than non-pathotype A. tenuissima isolates, indicating that CDCs contain additional secreted effectors. Non-pathotype A. tenuissima clade isolates were found to possess greater numbers of genes encoding secreted proteins than A. arborescens isolates (P < 0.01, F2,8df = 44.21), including secreted CAZymes (P < 0.01, F2,8df = 9.83). The basis of differentiation between these taxa was investigated further.
Genomic Differences Between A. tenuissima and A. arborescens Clades
Orthology analysis was performed upon the combined set of 158,280 total proteins from the 12 sequenced isolates. In total, 99.2% of proteins clustered into 14,187 orthogroups. Of these, 10,669 orthogroups were shared between all isolates, with 10,016 consisting of a single gene from each isolate. This analysis allowed the identification of 239 orthogroups that were either unique to A. arborescens isolates or expanded in comparison to non-pathotype A. tenuissima isolates.
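The orthogroup categories reported above (shared between all isolates, and single-copy in every isolate) follow from per-isolate gene counts within each orthogroup. The sketch below shows that classification on a toy table; the data layout, function name, and orthogroup and isolate labels are hypothetical.

```python
def classify_orthogroups(gene_counts, isolates):
    """Split orthogroups into shared and single-copy sets.

    gene_counts: {orthogroup: {isolate: number of genes}}; isolates with
    no member may be missing or set to 0 (hypothetical layout).
    Returns (shared, single_copy), where single_copy is a subset of shared.
    """
    shared, single_copy = [], []
    for orthogroup, counts in gene_counts.items():
        per_isolate = [counts.get(isolate, 0) for isolate in isolates]
        if all(n >= 1 for n in per_isolate):
            shared.append(orthogroup)
            if all(n == 1 for n in per_isolate):
                single_copy.append(orthogroup)
    return shared, single_copy


# Hypothetical toy table for three isolates.
toy_isolates = ["iso1", "iso2", "iso3"]
toy_counts = {
    "OG1": {"iso1": 1, "iso2": 1, "iso3": 1},  # shared, single copy
    "OG2": {"iso1": 2, "iso2": 1, "iso3": 1},  # shared, expanded in iso1
    "OG3": {"iso1": 1, "iso2": 1},             # absent from iso3
}
print(classify_orthogroups(toy_counts, toy_isolates))
```

Orthogroups like the toy OG2 and OG3, where counts differ between lineages, are the kind that would be examined further as expanded or unique.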
Genes expanded in or unique to A. arborescens isolates were further investigated using FERA 675 (Supplementary Table S2). Genes involved in reproductive isolation were in this set, including 21 of the 148 heterokaryon incompatibility (HET) loci from FERA 675. CAZymes were also identified within this set, three of which showed presence of chitin binding activity and the other three having roles in xylan or pectin degradation. In total, 25 genes encoding secreted proteins were within this set. Secreted proteins with pathogenicity-associated functional annotations included a lipase, a chloroperoxidase, an aerolysin-like toxin, a serine protease and an aspartic peptidase. A further six genes encoding secreted proteins were predicted by EffectorP to have an effector-like structure but had no further functional annotations. Furthermore, one gene from this set was predicted to encode a fungal-specific transcription factor unique to A. arborescens isolates.
Further to the identification of genes unique or expanded in A. arborescens, 220 orthogroups were identified as unique or expanded in A. tenuissima. These orthogroups were further investigated using isolate FERA 648 (Supplementary Table S2). This set also contained genes involved in reproductive isolation, including nine of the 153 HET loci from FERA 648. CAZymes within the set included two chitin binding proteins, indicating a divergence of LysM effectors between A. tenuissima and A. arborescens lineages. The five additional CAZymes in this set represented distinct families from those expanded/unique in A. arborescens, including carboxylesterases, chitooligosaccharide oxidase, and sialidase. In total, 18 proteins from this set were predicted as secreted, including proteins with cupin protein domains, leucine-rich repeats, and astacin family peptidase domains, with four predicted to have effector-like structures but no further annotations. A. tenuissima isolates had their own complement of transcription factors, represented by four genes within this set.
Identification of CDC Contigs and Assessment of Copy Number
Alignment of Illumina reads to the apple and Asian pear pathotype MinION reference assemblies identified variable presence of some contigs, identifying these as contigs representing CDCs (CDC contigs). Six contigs totaling 1.87 Mb were designated as CDCs in the apple pathotype reference (Table 3) and four contigs totaling 1.47 Mb designated as CDCs in the pear pathotype reference (Table 4). Two A. tenuissima clade non-pathotype isolates (FERA 1082, FERA 1164) were noted to possess apple pathotype CDC contigs 14 and 19 totaling 0.78 Mb (Table 3).
Table 3. Identification of CDC regions in the A. alternata apple pathotype reference genome.