Tools to find miRNA ID

How can I get the accession numbers of bread wheat miRNAs from their sequences? Which tools or websites can be used?


  1. If you have a sequence, run a nucleotide BLAST search at NCBI, specifying Triticum aestivum (taxid:4565) as the organism.

  2. The miRBase page for Triticum aestivum has details of wheat microRNAs, and a page where you can do a sequence search with either BLAST or SSEARCH (as @WYSIWYG kindly pointed out).

  3. Perhaps the Triticum aestivum page at PlantGDB may help.
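For the simplest case, where the query is expected to match a known mature miRNA exactly, the lookup can also be done locally against a downloaded FASTA file of mature miRNA sequences. The sketch below is only an illustration: it assumes headers laid out as "ID ACCESSION description" and a "tae-" prefix for Triticum aestivum entries, and the records in `EXAMPLE_FASTA` are invented placeholders, not real database entries. It finds exact matches only, so it does not replace a BLAST or SSEARCH search for near matches.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalise(seq):
    """Upper-case the sequence and use the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def find_accessions(query, fasta_text, species_prefix="tae-"):
    """Return (miRNA ID, accession) for exact matches within one species.

    Assumes each header looks like 'ID ACCESSION description'.
    """
    wanted = normalise(query)
    hits = []
    for header, seq in parse_fasta(fasta_text):
        fields = header.split()
        if len(fields) < 2 or not fields[0].startswith(species_prefix):
            continue
        if normalise(seq) == wanted:
            hits.append((fields[0], fields[1]))
    return hits


# Invented placeholder records; IDs and accessions are NOT real entries.
EXAMPLE_FASTA = """\
>tae-miR-example1 MIMAT9999991 Triticum aestivum miR-example1
UUUGGAUUGAAGGGAGCUCUA
>osa-miR-example2 MIMAT9999992 Oryza sativa miR-example2
UUUGGAUUGAAGGGAGCUCUA
>tae-miR-example3 MIMAT9999993 Triticum aestivum miR-example3
UGACAGAAGAGAGUGAGCAC
"""

print(find_accessions("TTTGGATTGAAGGGAGCTCTA", EXAMPLE_FASTA))
```

A DNA-alphabet query is accepted because both sides are normalised to RNA before comparison.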


Common features of microRNA target prediction tools


  • 1 Center for Molecular Medicine, Maine Medical Center Research Institute, Scarborough, ME, USA
  • 2 Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
  • 3 Department of Computer Science, University of Southern Maine, Portland, ME, USA

The human genome encodes over 1800 microRNAs (miRNAs), which are short non-coding RNA molecules that regulate gene expression post-transcriptionally. Because one miRNA can target multiple gene transcripts, miRNAs are recognized as a major mechanism for regulating gene expression and mRNA translation. Computational prediction of miRNA targets is a critical initial step in identifying miRNA:mRNA target interactions for experimental validation. The available tools for miRNA target prediction encompass a range of computational approaches, from the modeling of physical interactions to the incorporation of machine learning. This review provides an overview of the major computational approaches to miRNA target prediction. Our discussion highlights three tools for their ease of use, reliance on relatively updated versions of miRBase, and range of capabilities: DIANA-microT-CDS, miRanda-mirSVR, and TargetScan. Across all miRNA target prediction tools, four main aspects of the miRNA:mRNA target interaction emerge as common features on which most target prediction is based: seed match, conservation, free energy, and site accessibility. This review explains these features and identifies how they are incorporated into currently available target prediction tools. miRNA target prediction is a dynamic field with increasing attention on the development of new analysis tools. This review attempts to provide a comprehensive assessment of these tools in a manner that is accessible across disciplines. Understanding the basis of these prediction methodologies will aid users in selecting the appropriate tools and interpreting their output.


Introduction

Prion diseases (or transmissible spongiform encephalopathies) are an invariably fatal class of progressive neurodegenerative disorders that includes sporadic Creutzfeldt-Jakob disease (sCJD) and Gerstmann-Sträussler-Scheinker (GSS) syndrome in humans [1]. The disease pathology arises from the conformational conversion of normal cellular PrPC to the disease-associated isoform PrPSc, which leads to the characteristic spongiform vacuolation in the brain and progressive loss of neurons. sCJD commonly presents as a dementing encephalopathy with myoclonus and cerebellar ataxia, with a median survival of four months [2]. Clinical diagnosis of prion diseases is typically made at an advanced stage of neurological decline, when affected individuals are often immobile and non-communicative. Confirmation of the diagnosis requires either a brain biopsy or an autopsy. To improve diagnosis and intervention strategies for prion disease, we need to further understand its pathogenesis and the key regulators of the molecular processes during infection that drive prion infectivity and disease spread. Here, we seek to determine the molecular fingerprint of microRNA (miRNA) changes in the brain and serum of prion-infected mice that could be used as pre-clinical and clinical markers of sCJD.

miRNA molecules are recognised as key modulators of translational repression and mRNA degradation, acting by binding to the 3′ untranslated region (UTR) of their target mRNAs. In the context of prion diseases, animal models have revealed miRNA signatures implicated in vital biological pathways including synaptic plasticity, neuronal development and cell survival [3–6]. Furthermore, miRNAs packaged in extracellular vesicles (EVs), such as exosomes, have been increasingly studied to determine whether they can serve as diagnostic biomarkers when isolated from blood [7–9]. The recent isolation of neural-derived exosomes from blood [10] suggests that exosomes derived from the brain may cross the blood-brain barrier (BBB). This may make it possible to diagnose neurodegenerative diseases through a non-invasive blood test, known as a liquid brain biopsy.

To investigate temporal miRNA expression patterns in a specific brain region, we utilised a well-characterised in vivo mouse model with the M1000 prion strain, a mouse-adapted human strain isolated from a patient who died from GSS [11, 12]. Mice infected with M1000 prions develop the first signs of neuropathology after the mid-incubation period, at 13 weeks post-inoculation (wpi), in the thalamus and hippocampus; the pathology spreads to the occipital cortex and brain stem from 14 wpi [3]. In this study, the thalamus was collected from M1000-infected mice and used to profile miRNA at timepoints representing the pre-clinical stage of disease, 3 and 13 wpi (M1000-infected: week 3, n = 6; week 13, n = 5; uninfected: week 3, n = 6; week 13, n = 6), to identify possible pre-clinical miRNA markers. For comparison, tissues were also collected at the clinical stage, when the infected mice display symptoms (20 wpi, terminal; M1000-infected, n = 6; uninfected, n = 5). In addition, to identify potential pre-clinical blood-based biomarkers for sCJD, we profiled EVs isolated from matching serum to determine the temporal relationship between miRNA expression changes in the thalamus and serum EVs across the course of the disease and how these changes are involved in the pathogenesis of prion diseases. Such serum biomarkers could be used to develop a method to monitor the stage of disease progression in sCJD patients and to assist in therapeutic treatment and trials.

Statistically significant miRNAs identified in the M1000-infected mouse model at pre-clinical and clinical timepoints were selected for validation in a set of human clinical samples collected from sCJD patients and controls. Prion diseases occur in three forms, sporadic, familial and acquired by infection, which can display different pathological features. While the incubation or pre-clinical period may be long (up to four decades), sCJD can progress very rapidly after clinical diagnosis. Codon 129 of the prion protein gene (PRNP) is the site of a common methionine (M)/valine (V) polymorphism that is associated with certain phenotypes of human prion diseases such as sCJD. In the Caucasian population, 52% of individuals are M homozygous (MM), 36% are heterozygous (MV) and 12% are V homozygous (VV) [13]. In individuals who are MM homozygous or MV heterozygous, the mean age of onset is 65 years and the average clinical duration is four months, with a range of 1–18 months. Individuals who are V homozygous have been observed to have a clinical duration of 3–18 months, with a mean age of onset between 41 and 81 years (reviewed in ref. 14).

A non-invasive blood test would be an invaluable tool for sCJD, which is a rare condition that is typically difficult to diagnose. Patients are often misdiagnosed due to poor clinical measures and a lack of effective systemic drug therapies. The miRNA panels identified in this study are potential biomarkers for early diagnosis and could improve the outcomes of patients with suspected sCJD (rapidly progressive dementia), which has a short clinical duration.


Results and discussion

Canonical miRNA target sites function primarily in Drosophila 3′ UTRs

To acquire datasets suitable for quantitative analysis of miRNA targeting in fly cells, we monitored the changes in mRNA levels after co-transfecting S2 cells with one of six different miRNA duplexes and a green fluorescent protein (GFP)-encoding plasmid. The six transfected miRNAs (miR-1, miR-4, miR-92a, miR-124, miR-263a, and miR-994) were chosen because they (or related miRNAs in the same seed family) were not endogenously expressed in S2 cells [8], and they had diverse starting-nucleotide identities, a range of GC content within their seeds, and a moderate-to-high range of predicted target-site abundances. After enriching for transfected, GFP-positive cells by fluorescence-activated cell sorting (FACS), mRNA-seq was performed, and mRNA fold changes were calculated for each miRNA transfection condition relative to a mock transfection, in which the GFP plasmid was transfected without any miRNA duplex (Additional file 1: Table S1). We then normalized the data to reduce batch effects (Additional file 2: Figure S1A–D), some of which were attributable to modest but statistically significant de-repression of the predicted targets of highly expressed endogenous miRNAs, such as bantam miRNA (Additional file 2: Figure S1E–G) [50, 51]. With this new dataset, we began investigating the features of miRNA target sites that correlate with mRNA repression in Drosophila cells.

In mammals, the presence of an A opposite the first nucleotide of a miRNA is preferentially conserved and correlates with enhanced repression, regardless of the identity of the first nucleotide of the miRNA—observations explained by a pocket within human Argonaute2 (hsAGO2) that preferentially binds this A [34, 37, 39, 52]. In flies, an A at this position of the target site is also associated with enhanced conservation compared to otherwise identical sites missing this A [20], whereas in nematodes conservation and efficacy of a site with perfect pairing to miRNA nucleotides 2–8 followed by a U (8mer-U1 sites) resembles that of 8mer-A1 sites [20, 53, 54]. We therefore examined the influence of the nucleotide at target position 1 in flies, considering the data from all miRNA transfections pooled together. Of the mRNAs possessing a single match to miRNA nucleotides 2–8 in their 3′ UTR, those with an A opposite miRNA position 1 (i.e., those with the 8mer-A1 site) tended to be more repressed than those with each of the other three possibilities opposite miRNA position 1 (8mer-C1, 8mer-G1, and 8mer-U1, respectively), with the identity of the other three possibilities having little influence on repression (Fig. 1a). As expected based on the observation that the first position of the guide RNA is buried within Argonaute and unavailable for pairing [52, 55, 56], this observation generally held when considering each miRNA transfection independently, regardless of whether the identity of the first nucleotide of the miRNA was a U (Additional file 2: Figure S2). Thus Drosophila exhibits a preference for A at target position 1 resembling that of mammals, implying that this target nucleotide is recognized by a pocket within dmAgo1 resembling that of hsAGO2. With respect to nomenclature, these results further supported consideration of the 8mer-A1 site as the canonical 8mer site of Drosophila, as was done originally in mammals [34].

Fig. 1 Drosophila miRNAs mediate mRNA repression through the targeting of canonical site types, preferentially in 3′ UTRs. a The increased efficacy in Drosophila of sites with an A across from miRNA position 1. Shown is the response of mRNAs to the transfection of a miRNA (either miR-1, miR-4, miR-92a, miR-124, miR-263a, or miR-994). Data were pooled across these six independent experiments. Plotted are cumulative distributions of mRNA fold changes observed upon miRNA transfection for mRNAs that contained a single site of the indicated type to the transfected miRNA. The site types compared are 8mers that perfectly match miRNA positions 2–8 and have the specified nucleotide (A, C, G, or U) across from position 1 of the miRNA. Also plotted for comparison is the cumulative distribution of mRNA fold changes for mRNAs that did not contain a canonical 7- or 8-nt site to the transfected miRNA in their 3′ UTR (no site). Similarity of site-containing distributions to the no-site distribution was tested with the one-sided Kolmogorov–Smirnov (K–S) test (P values). Shown in parentheses are the numbers of mRNAs analyzed in each category. b The six canonical site types for which a signal for repression was detected after transfecting a miRNA into Drosophila cells. c–e The efficacy of the canonical site types observed in Drosophila 3′ UTRs (c), ORFs (d), and 5′ UTRs (e). These panels are as in a, but compare fold-change distributions for mRNAs possessing a single canonical site in the indicated region to those with no canonical sites in the entirety of the mRNA. See also Additional file 2: Figures S1 and S2.

Analogous analyses of mRNA fold-change values in mammalian systems have demonstrated the function and relative efficacy of 8mer, 7mer-m8, 7mer-A1, 6mer, and offset 6mer sites [37, 57]. Accordingly, we examined the function of these site types in Drosophila, again pooling the data and focusing on mRNAs with a single site to the cognate miRNA. We also considered a sixth site type, the 6mer-A1 site, which has implied function in nematodes [20] and completes the set of all possible 8-, 7-, and 6-nt perfect matches to the 8-nt seed region, which we refer to as the canonical site types (Fig. 1b; note the distinction between the 6-nt seed and the 8-nt seed region). When located in the context of 3′ UTRs, each canonical site type was associated with repression, with the magnitude of repression following the hierarchy of 8mer > 7mer-m8 > 7mer-A1 > 6mer ≈ offset 6mer ≈ 6mer-A1 (Fig. 1c), as indicated from statistical testing of differences in fold-change distributions (Additional file 3: Table S2). This hierarchy resembled that of mammals, except that in mammals the efficacy of the different 6-nt sites is much more distinct, with 6mer > offset 6mer > 6mer-A1, and with the 6mer-A1 difficult to distinguish from background [37, 57].
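To make the six canonical site types concrete, the sketch below derives the target-site motif for each type from a miRNA sequence and scans a UTR for them. It assumes the conventional definitions (8mer: match to miRNA positions 2–8 plus an A opposite position 1; 7mer-m8: positions 2–8; 7mer-A1: positions 2–7 plus the A; 6mer: positions 2–7; offset 6mer: positions 3–8; 6mer-A1: positions 2–6 plus the A). The function names, the example guide sequence, and the toy UTR are illustrative choices, not taken from the study.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}
# Most specific type first, so shorter motifs nested in a longer site are skipped.
PRIORITY = ["8mer", "7mer-m8", "7mer-A1", "6mer", "offset 6mer", "6mer-A1"]


def to_rna(seq):
    return seq.upper().replace("T", "U")


def revcomp(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def site_motifs(mirna):
    """Map each canonical site type to its motif on the mRNA (5' to 3')."""
    m = to_rna(mirna)
    return {
        "8mer": revcomp(m[1:8]) + "A",         # positions 2-8, then A opposite 1
        "7mer-m8": revcomp(m[1:8]),            # positions 2-8
        "7mer-A1": revcomp(m[1:7]) + "A",      # positions 2-7, then A opposite 1
        "6mer": revcomp(m[1:7]),               # positions 2-7
        "offset 6mer": revcomp(m[2:8]),        # positions 3-8
        "6mer-A1": revcomp(m[1:6]) + "A",      # positions 2-6, then A opposite 1
    }


def find_sites(utr, mirna):
    """Return sorted (start, site type) pairs, keeping the longest type per locus."""
    utr = to_rna(utr)
    motifs = site_motifs(mirna)
    hits, covered = [], set()
    for name in PRIORITY:
        motif = motifs[name]
        start = utr.find(motif)
        while start != -1:
            span = range(start, start + len(motif))
            if not covered.issuperset(span):   # not nested in an accepted site
                hits.append((start, name))
                covered.update(span)
            start = utr.find(motif, start + 1)
    return sorted(hits)


# Illustrative guide sequence whose positions 2-8 read GGAAUGU.
EXAMPLE_MIRNA = "UGGAAUGUAAAGAAGUAUGGAG"
EXAMPLE_UTR = "GGGACAUUCCAGGGGCAUUCCGG"
print(find_sites(EXAMPLE_UTR, EXAMPLE_MIRNA))
```

Scanning in priority order and skipping matches that lie entirely inside an accepted site is a simplification; partially overlapping sites are reported separately.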

We also examined the efficacy of canonical sites in mRNA regions outside of the 3′ UTR. Some repression was observed for mRNAs with a site in their open reading frame (ORF) (and no canonical site elsewhere in the mRNA), most convincingly for 8mer sites, although the efficacy of these sites was much less than that observed in 3′ UTRs (Fig. 1d). These observations are consistent with those in mammals [37, 58, 59]. In contrast to observations in mammals, however, repression was also observed for mRNAs with an 8mer site in their 5′ UTR (Fig. 1e). Taking these findings together, we conclude that miRNA targeting in flies resembles that of mammals, except that the efficacy of the three 6-nt canonical sites is more uniform in flies and repression of endogenous mRNAs is more readily detected in fly 5′ UTRs.

Widespread conservation of canonical miRNA target sites in Drosophila UTRs

A previous evolutionary analysis of mammalian miRNA target sites provided a framework for estimating the likelihood that predicted miRNA target sites are conserved across species, while controlling for factors such as differential species relatedness, differential background conservation in UTRs, and differential rates of dinucleotide substitutions [57]. Although this approach has also been applied to Drosophila genomes [20], we improved and extended it by (1) updating conserved miRNA family classifications and 3′ UTR annotations, (2) using an expanded evolutionary tree that incorporated additional insect species, (3) extending analyses to Drosophila 5′ UTRs, (4) using a modified evolutionary analysis pipeline [51], and (5) comparing our evolutionary results to our functional data. Towards this end, we compiled miRNA annotations from multiple studies [8, 10, 11, 15] and classified 91 miRNA families as broadly conserved among Drosophila species, 29 of which have been conserved since the last bilaterian ancestor (Additional file 4: Table S3). We also extracted multiple sequence alignments corresponding to annotated D. melanogaster 5′ UTRs and 3′ UTRs, assigning each UTR to one of five bins based on its background UTR conservation rates [20]. For each bin, we computed phylogenetic trees with a fixed species tree topology that encompassed 27 insect species, allowing for variable branch lengths to capture slower or faster substitution rates among the UTRs of the bin (Fig. 2a). These trees were then used to assign a branch-length score (BLS) [17] to each motif occurrence in D. melanogaster UTRs, which quantified the extent of conservation of that occurrence while controlling for the background conservation rate of its overall UTR context [57]. 
For example, a motif occurrence detected among all Sophophora species in the 3′ UTR alignment would be assigned a BLS of 4.50, 2.53, or 1.69, depending upon whether the corresponding 3′ UTR in which it resided was in the first, third, or fifth conservation bin, respectively (Fig. 2a).

Fig. 2 Evolutionary conservation of canonical sites in Drosophila 5′ UTRs and 3′ UTRs. a Phylogenetic tree of the 27 species used to examine miRNA site conservation. Outgroups of the genus Drosophila include Musca domestica (the housefly), Anopheles gambiae (the mosquito), Apis mellifera (the European honey bee), and Tribolium castaneum (the red flour beetle). D. melanogaster 3′ UTRs were assigned to one of five conservation bins based upon the median conservation of nucleotides across the entire 3′ UTR. The tree is drawn using the branch lengths and topology reported from genome-wide alignments in the UCSC Genome Browser. To the left of the tree are color-coded branch-length scores corresponding to a site conserved among an entire subgroup of species indicated by a bar of the same color, showing scores for a site within a 3′ UTR in the lowest, middle, and highest conservation bins, labeled in parentheses as bins 1, 3, or 5, respectively. b, c Signal-to-background ratios for indicated site types at increasing branch-length cutoffs, computed for sites located in 3′ UTRs (b) or 5′ UTRs (c). Broken lines indicate 5% lower confidence limits (z-test). These panels were modeled after the one originally shown for the analysis of mammalian 3′ UTR sites [57]. d, e Signal above background for indicated site types at increasing branch-length cutoffs, computed for sites located in 3′ UTRs (d) or 5′ UTRs (e). Broken lines indicate 5% lower confidence limits (z-test). These panels were modeled after the one originally shown for the analysis of mammalian 3′ UTR sites [57]. f Signal-to-background ratios for the 8mer sites of 91 conserved miRNA seed families, calculated at near optimal sensitivity (a branch-length cutoff of 1.0), comparing the ratios observed for sites in 5′ UTRs to those for sites in 3′ UTRs (r_s, Spearman correlation). Seed families conserved since the ancestor of bilaterian animals are distinguished from those that emerged more recently (orange and blue, respectively). Boxplots on the sides show the distributions of ratios for these two sets of families, with statistical significance for differences in these distributions evaluated using the one-sided Wilcoxon rank-sum test (*P < 0.01). See also Additional file 4: Table S3. g Relationship between site conservation rate and repression efficacy. The fraction of sites conserved above background was calculated as ([Signal – Background]/Signal) at a branch-length cutoff of 1.0. The minimal fraction of sites conferring destabilization was determined from the cumulative distributions (e.g., those in Additional file 2: Figure S2), considering the maximal vertical displacement from the no-site distribution (error bars, standard deviation, n = 6 miRNAs). Colors and shapes represent the canonical site types and UTR location, respectively. This panel was modeled after the one originally shown for the analysis of mammalian 3′ UTR sites [57]. h Relationship between site efficacy and site PCT. mRNAs were selected to have either one 7mer-A1, one 7mer-m8, or one 8mer 3′ UTR site to the transfected miRNA and no other canonical 3′ UTR site. mRNAs with sites of each type were grouped into six equal bins based on the site PCT. For each bin, mean mRNA fold change in the transfection data (error bars, standard error) is plotted with respect to the mean PCT, with the dashed lines showing the least-squares fit to the data. The slopes for each are negative and significantly different from zero (P < 10^−10, linear regression using unbinned data).

For each site type of each of the 91 broadly conserved miRNA families, we computed the “signal” as the number of times that site occurred in D. melanogaster UTRs and had a BLS that equaled or surpassed a particular value (i.e., the “branch-length cutoff”). In parallel, we also computed the “background” as the number of conserved occurrences expected by chance, based upon the mean fraction of conserved motif instances for 50 length-matched k-mer controls, each of which was predicted to have background conservation resembling that of the miRNA site, as estimated from aggregated dinucleotide conservation rates [57]. This allowed us to compute a signal-to-background ratio at each branch-length cutoff, which represented the estimated enrichment of preferentially conserved miRNA sites in fly UTRs (Fig. 2b and c). It also allowed us to compute the signal above background, which represented the estimated number of miRNA sites that have been preferentially conserved in fly UTRs (Fig. 2d and e).
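The quantities defined above can be sketched in a few lines. The function below is one reading of the description, not the published pipeline: it takes the branch-length scores (BLS) of all occurrences of a site, plus the BLS lists of the control k-mers, and returns the signal, background, signal-to-background ratio, and signal above background at a given cutoff. It also reports the fraction conserved above background, (Signal − Background)/Signal. The function name and the toy numbers are assumptions for illustration.

```python
def conservation_summary(site_bls, control_bls, cutoff):
    """Summarise site conservation at one branch-length cutoff.

    site_bls:    BLS of every occurrence of the miRNA site in the UTRs
    control_bls: one list of BLS values per length-matched control k-mer
    """
    # Signal: occurrences whose BLS equals or surpasses the cutoff.
    signal = sum(1 for b in site_bls if b >= cutoff)

    # Mean fraction of control occurrences that pass the same cutoff.
    fractions = [
        sum(1 for b in bls if b >= cutoff) / len(bls) for bls in control_bls
    ]
    mean_fraction = sum(fractions) / len(fractions)

    # Background: conserved occurrences expected by chance for this site.
    background = len(site_bls) * mean_fraction

    return {
        "signal": signal,
        "background": background,
        "ratio": signal / background if background else float("inf"),
        "above_background": signal - background,
        "fraction_above_background": (
            (signal - background) / signal if signal else 0.0
        ),
    }


# Toy numbers, for illustration only.
summary = conservation_summary(
    site_bls=[0.2, 1.5, 2.0, 0.9],
    control_bls=[[0.1, 1.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.2]],
    cutoff=1.0,
)
print(summary)
```

Repeating the call over a range of cutoffs would trace out curves of the kind described for the ratio and for the signal above background.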

As expected, the signal-to-background ratios increased as the evolutionary conservation criteria became more stringent, with 8mers in 3′ UTRs reaching a ratio of nearly five conserved sites for every one control site at the greater branch-length cutoffs (Fig. 2b). For each site type, the ratios were consistently greater in the 3′ UTRs than they were in 5′ UTRs (Fig. 2b and c). For example, in 5′ UTRs the signal-to-background ratio for 8mers did not surpass 1.6 (Fig. 2c). These results showed that sites are more likely to be conserved if they reside in 3′ UTRs, presumably because this is where they are also more effective (Fig. 1). Nonetheless, when comparing the signal-to-background ratios for different miRNA families, ratios in 5′ UTRs correlated with those in 3′ UTRs (Fig. 2f; Additional file 4: Table S3). The greatest ratios tended to be for the fly miRNA families that have been conserved since the ancestor of bilaterian animals (Fig. 2f), as might be expected for these ancient families that have had more time to acquire more roles in gene-regulatory networks.

Although the sequence-conservation signal-to-background hierarchy of 8mer > 7mer > 6mer observed in both 5′ and 3′ UTRs matched the hierarchy observed for efficacy, some differences were observed. Most notably, the conservation signal for the 6mer site was robustly above background, whereas those for the offset 6mer and 6mer-A1 sites were both indistinguishable from background (Fig. 2b), even though these three 6-nt sites had similar efficacies in our repression data (Fig. 1c). Conversely, the 5′ UTR 7mer-A1 site exhibited a detectable signal for conservation (Fig. 2c), even though it had no detectable efficacy in mediating repression (Fig. 1e).

For sites in both 3′ and 5′ UTRs, the signal above background peaked near a branch-length cutoff of 1.0 (Fig. 2d). At this and other branch-length cutoffs, the signal above background was far higher in the 3′ UTR than in the 5′ UTR (Fig. 2d and e), which can be attributed to both a higher fraction of the sites preferentially conserved in 3′ UTRs, as indicated by the higher signal-to-background ratio in 3′ UTRs, and more sites residing in 3′ UTRs, mostly a consequence of 3′ UTRs generally being longer than 5′ UTRs. Including site types whose lower 5% confidence intervals exceeded zero, our results provided an estimate of 12,285 sites conserved above background in 3′ UTRs (2738 ± 31 8mer, 2837 ± 68 7mer-m8, 4062 ± 100 7mer-A1, 2128 ± 221 6mer, and 520 ± 244 offset 6mer sites, calculated at a branch-length cutoff of 1.0 and reported ± 90% confidence interval) (Fig. 2d). When added to our estimate of 840 sites conserved above background in 5′ UTRs (350 ± 18 8mer, 165 ± 46 7mer-m8, and 325 ± 44 7mer-A1 sites) (Fig. 2e), the estimated number of preferentially conserved sites in Drosophila UTRs totaled 13,125. Simulations that considered all of the conserved instances of site types, and then accounted for those that were estimated to be conserved by chance in 5′ UTRs and 3′ UTRs, indicated that these 13,125 preferentially conserved sites reside within 5035 ± 83 (90% confidence interval) of the 13,550 unique mRNAs with annotated UTRs of Drosophila, implying that mRNAs from 37.2% ± 0.6% of the Drosophila genes are conserved targets of the broadly conserved miRNAs.
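The totals quoted here follow directly from the per-site-type estimates, which the short check below reproduces. It uses only the numbers stated in the text; the variable names are arbitrary.

```python
# Per-site-type point estimates quoted in the text (branch-length cutoff 1.0).
sites_3utr = {
    "8mer": 2738,
    "7mer-m8": 2837,
    "7mer-A1": 4062,
    "6mer": 2128,
    "offset 6mer": 520,
}
sites_5utr = {"8mer": 350, "7mer-m8": 165, "7mer-A1": 325}

total_3utr = sum(sites_3utr.values())   # sites conserved above background, 3' UTRs
total_5utr = sum(sites_5utr.values())   # sites conserved above background, 5' UTRs
total_sites = total_3utr + total_5utr

# Share of mRNAs with annotated UTRs that carry at least one such site.
targeted_mrnas, margin, annotated_mrnas = 5035, 83, 13550
percent_targeted = round(100 * targeted_mrnas / annotated_mrnas, 1)
percent_margin = round(100 * margin / annotated_mrnas, 1)

print(total_3utr, total_5utr, total_sites)
print(f"{percent_targeted}% ± {percent_margin}%")
```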

Additional comparison of the results from our analyses of site conservation and site efficacy revealed that, as observed for mammalian 3′ UTR sites [57], there was a striking correlation between the fraction of sites conserved above background for each site type and the corresponding fraction of sites mediating mRNA destabilization (Fig. 2g). Slightly deviating from this trend were 3′ UTR 6mer-A1 sites, which appeared to mediate some repression despite lacking a signal for conservation, and 5′ UTR 7mer-A1 sites, which had a modest signal for conservation despite undetectable efficacy of repression (Fig. 2g).
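The fraction of sites mediating destabilization is described in the figure legend as the maximal vertical displacement of a site-containing cumulative distribution from the no-site distribution. A minimal sketch of that calculation, assuming it is the largest gap between the two empirical cumulative distribution functions (the one-sided Kolmogorov–Smirnov statistic), is given below. The function names and fold-change values are illustrative only.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical cumulative distribution function of a sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_vertical_displacement(site_changes, no_site_changes):
    """Largest amount by which the site CDF lies above the no-site CDF.

    Repressed mRNAs have more negative fold changes, which shifts the
    site-containing CDF to the left of (and therefore above) the no-site CDF.
    """
    site_cdf = ecdf(site_changes)
    no_site_cdf = ecdf(no_site_changes)
    points = sorted(set(site_changes) | set(no_site_changes))
    return max(site_cdf(x) - no_site_cdf(x) for x in points)


# Toy log2 fold changes, for illustration only.
with_site = [-1.0, -0.8, -0.6, 0.1]
without_site = [-0.2, 0.0, 0.1, 0.3]
print(max_vertical_displacement(with_site, without_site))
```

Because both functions are step functions that change only at observed values, evaluating the difference at the pooled data points is enough to find the maximum.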

To estimate the extent to which each instance of each of the three most effective sites has been preferentially conserved, we computed the probability of conserved targeting (PCT) score for each of the 8mer, 7mer-m8, and 7mer-A1 sites residing in D. melanogaster 3′ UTRs. PCT scores, which range from 0 to 1, summarize the estimated probability that a given site has been evolutionarily conserved because of its pairing to the cognate miRNA, while controlling for other factors, such as its length, surrounding genomic context, and dinucleotide content [57]. These scores provide a valuable resource for biologists wanting to focus on conserved targeting interactions. They also can help predict targeting efficacy [51, 57]. Indeed, sites with greater PCT scores tended to confer more repression (Fig. 2h), implying that as expected, conserved sites were more likely to reside within contexts that favored their efficacy.

Features useful for predicting site efficacy in flies

Before beginning to explore the features of site context associated with site efficacy, we improved the 3′ UTR annotations in S2 cells, the cell line in which we had acquired our functional data. We reasoned that more accurate annotation of these UTRs would allow us to reduce the impact of false-positive sites while appropriately weighting sites by the frequency of their inclusion within 3′ UTR isoforms [51, 60]. Knowledge of abundant alternative 3′ UTR isoforms for the mRNAs of a gene would also provide a more informed assessment of 3′ UTR-related features, such as 3′ UTR length and distance from the closest 3′ UTR end. Accordingly, we identified and quantified the 3′ UTR isoforms of S2 cells using poly(A)-position profiling by sequencing (3P-seq) [20]. Although the majority of the 3P-seq-supported poly(A) sites corresponded to either 3′ UTR isoforms that had been previously annotated by FlyBase or a large-scale study that annotated additional poly(A) sites [61], nearly 47% of the 3P-seq-supported poly(A) sites did not correspond to existing annotations, and most of these novel sites could be linked to a nearby gene with the support of RNA-seq evidence (Fig. 3a). In cases in which the longest 3′ UTR isoform for a gene annotated using 3P-seq differed from that annotated in FlyBase, it was more often longer, although for nearly 1000 genes the 3P-seq results implicated the dominant use of a shorter 3′ UTR isoform in S2 cells (Fig. 3b). Using this information, we compiled a set of 3826 mRNAs that passed our expression threshold in S2 cells and for which ≥ 90% of the 3P-seq tags corresponded to a single dominant 3′ UTR isoform in these cells, and we used this set to investigate features of site context associated with site efficacy.

Fig. 3 Refinement of 3′ UTR annotations in S2 cells and development of a regression model that predicts miRNA targeting efficacy in Drosophila. a Poly(A) sites detected in S2 cells by 3P-seq, classified with respect to their previous annotation status. b Extension and contraction of longest 3′ UTR isoforms relative to the FlyBase annotations. For each gene with a poly(A) site detected using 3P-seq, the length of the longest 3′ UTR isoform annotated using 3P-seq was compared to that of the longest 3′ UTR isoform annotated at FlyBase. The differences were then binned as indicated, and the number of sites assigned to each bin is plotted. c Optimization of scoring of predicted 3′ supplementary pairing in flies. Predicted thermodynamic energy scores were computed for the pairing between a 9-nt region upstream of canonical 7–8-nt 3′ UTR sites and a variable-length region of the miRNA with the indicated size (window size) that began at the indicated position of the miRNA. The heatmap displays the partial correlations between these scores and the repression associated with the corresponding sites, determined while controlling for site type. d Optimization of the scoring of predicted structural accessibility in flies. Predicted RNA structural accessibility scores were computed as the average pairing probabilities for variable-length (window size) regions that centered at the indicated mRNA position, shown with respect to the seed match of each canonical 7–8-nt 3′ UTR site. The heatmap displays the partial correlations between these values and the repression associated with the corresponding sites, determined while controlling for site type. e The contributions of site type and each of the six features of the context model. For each site type, the coefficients for the multiple linear regression are plotted for each feature. Because features were each scored on a similar scale, the relative contribution of each feature in discriminating between more or less effective sites was roughly proportional to the absolute value of its coefficient. Also plotted are the intercepts, which roughly indicate the discriminatory power of site type. Bars indicate the 95% confidence intervals of each coefficient. See also Additional file 2: Table S4, Table S5, and Figure S3A.

With this set of mRNAs and repression values in hand, we examined two of the more complex features of site context, confirming their effects in Drosophila cells and developing scoring schemes that best correlated with their influence in these cells. The first of these two features was 3′ supplementary pairing, i.e., pairing to the target by miRNA nucleotides outside of the seed region. The strength of this pairing was evaluated as the predicted thermodynamic energy of pairing between the 3′ region of the miRNA and a corresponding mRNA region upstream of the seed match. This predicted energy of pairing was evaluated for mRNAs that possessed a single 7–8-nt 3′ UTR site for the transfected miRNA and then compared to the repression observed for the mRNAs upon miRNA transfection by computing a partial correlation between 3′ supplementary pairing energies and mRNA changes, controlling for site type.
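A partial correlation that controls for a categorical variable such as site type can be computed by removing each group's mean from both variables and then correlating the residuals. The sketch below illustrates this under that assumption; it is not the authors' code, and the function names and toy values are invented for the example.

```python
from collections import defaultdict
from math import sqrt


def residualise(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def partial_correlation(x, y, groups):
    """Pearson correlation of x and y after removing group (site type) means."""
    rx = residualise(x, groups)
    ry = residualise(y, groups)
    sxy = sum(a * b for a, b in zip(rx, ry))
    sxx = sum(a * a for a in rx)
    syy = sum(b * b for b in ry)
    return sxy / sqrt(sxx * syy)


# Toy data: pairing energies, log2 fold changes, and site type per mRNA.
energies = [-3.0, -2.0, -1.0, -6.0, -5.0, -4.0]
fold_changes = [-0.6, -0.4, -0.2, -1.0, -0.8, -0.6]
site_types = ["8mer"] * 3 + ["7mer-m8"] * 3
print(partial_correlation(energies, fold_changes, site_types))
```

Centering within each site type is equivalent to regressing both variables on site-type indicator variables, so differences in average repression between site types do not inflate the correlation.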

In mammalian cells, 3′ supplementary pairing is most influential when centered on nucleotides 13–17 [37], but in flies the pairing possibilities most consequential for repression had not been identified. To systematically examine these possibilities, we varied three parameters: (1) the start position of the miRNA region considered, examining all start possibilities from positions 9 to 19, (2) the length of the miRNA region considered, examining lengths from 4 to 13 nt, and (3) the length of the target region upstream of the seed match, examining lengths from 4 to 20 nt. A grid search over all parameter combinations revealed that the predicted energy of 3′ supplementary pairing was optimally predictive of repression efficacy when it was calculated for the pairing that can occur between miRNA nucleotides 13–17 and a 9-nt region upstream of the seed match (Fig. 3c).
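The grid search itself is simple to express. The sketch below enumerates the three parameter ranges given in the text and returns the combination that maximises a user-supplied score. In the study, the score would come from the partial correlation between predicted pairing energy and repression, which requires RNA-folding software and the transfection data; here a made-up `toy_score` stands in for it so that the example runs on its own.

```python
from itertools import product

START_POSITIONS = range(9, 20)   # miRNA start positions 9 to 19
MIRNA_LENGTHS = range(4, 14)     # miRNA region lengths 4 to 13 nt
TARGET_LENGTHS = range(4, 21)    # upstream target region lengths 4 to 20 nt


def grid_search(score):
    """Return the (start, miRNA length, target length) with the highest score."""
    grid = product(START_POSITIONS, MIRNA_LENGTHS, TARGET_LENGTHS)
    return max(grid, key=lambda params: score(*params))


def toy_score(start, mirna_len, target_len):
    """Hypothetical stand-in score that peaks at start 13, length 5, target 9."""
    return -(abs(start - 13) + abs(mirna_len - 5) + abs(target_len - 9))


n_combinations = len(START_POSITIONS) * len(MIRNA_LENGTHS) * len(TARGET_LENGTHS)
print(n_combinations)
print(grid_search(toy_score))
```

A miRNA region starting at position 13 with a length of 5 nt corresponds to nucleotides 13–17. The sketch does not filter out combinations that would run past the end of a particular miRNA.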

The second feature we investigated was the influence of 3′ UTR structure on target-site accessibility. This feature has been evaluated previously using two approaches, either evaluating nucleotide composition near the site, reasoning that sites residing in high local AU content would be more accessible [37], or attempting to predict site accessibility using various RNA-folding algorithms [38, 51, 62,63,64,65]. With respect to the second approach, a method originally developed to predict small interfering RNA (siRNA) target-site accessibility [62] appears to be one of the more effective methods for predicting miRNA target-site accessibility in mammals [51]. This method folds the 80-nt region centered on the seed match and then reports a structural accessibility (SA) score calculated as the mean unpaired probabilities for a smaller window in the vicinity of the seed match [51, 62]. To determine the optimal location and width of this window for scoring SA in flies, we again computed partial correlations, this time between mean pairing probabilities and mRNA changes, varying two parameters: (1) the position of the center of the window within the target mRNA, examining each position within 20 nt of the seed match, and (2) the size of this window, considering sizes of 1 to 25 nt. A grid search over all parameter combinations indicated that a 25-nt window centered on the nucleotide that pairs to miRNA position 7 was optimal for calculating SA in flies (Fig. 3d). Although the optimal window size fell at the edge of the range, larger windows were not considered because they were more prone to extend beyond 3′ UTR boundaries, which reduced the sample size.
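A minimal sketch of the SA calculation, assuming the per-nucleotide unpaired probabilities have already been produced by an RNA-folding tool (not shown here). The function name, the 0-based indexing, and the choice to truncate windows at the ends of the folded region are assumptions made for illustration.

```python
def structural_accessibility(unpaired_probs, center, window):
    """Mean unpaired probability in a window of `window` nt around index `center`.

    unpaired_probs: per-nucleotide unpaired probabilities for the folded region.
    center: 0-based index of the window center (e.g. the nucleotide that pairs
            to miRNA position 7).
    The window is truncated if it would extend past the ends of the region.
    """
    lo = max(0, center - (window - 1) // 2)
    hi = min(len(unpaired_probs), center + window // 2 + 1)
    segment = unpaired_probs[lo:hi]
    return sum(segment) / len(segment)
```

Sweeping `center` and `window` over the ranges given above and correlating the result with mRNA changes would reproduce the structure of the second grid search.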

A quantitative model for predicting site efficacy in flies

To identify and evaluate additional features associated with site efficacy in flies and generate a resource for placing fly miRNAs into gene regulatory networks, we developed a quantitative model of miRNA targeting efficacy for flies, which resembled our models developed for mammals [37, 51, 66]. The smaller scope of our fly dataset imposed some limitations on the features we could examine in flies as well as the strategy used to train the model. In particular, the number of training examples was an order of magnitude lower in the fly dataset relative to the human dataset. This was due to (1) fewer small-RNA transfection datasets in S2 cells compared to those available in HeLa cells, (2) a smaller number of genes expressed in S2 cells compared to those expressed in HeLa cells, and (3) shorter 3′ UTRs in flies, which further decreased the number of 3′ UTRs with a site for a miRNA of interest. Thus, we did not consider features related to the identity of the miRNA seed, such as estimated target-site abundance within the transcriptome, predicted seed-pairing stability, and nucleotide identity at the miRNA or target position 8, which are each informative for predicting targeting efficacy in human cells [51, 66]. Moreover, rather than considering features for each site type independently, we trained a single, unified regression model that considered the site type itself as a potential feature of targeting. In addition to site type, seven other features of the sites and their surrounding context and nine features of the target mRNAs were considered as potentially informative of targeting efficacy, either because they had been previously shown to correlate with targeting efficacy in flies or mammals, or because they were related to features shown to correlate with efficacy (Table 1).

Starting with these features, we trained models of targeting efficacy using a variety of machine-learning algorithms. To evaluate each algorithm, we partitioned our dataset into 1000 bootstrapped samples to estimate the held-out prediction performance. Each sample included 70% of the mRNAs with a single 7–8-nt 3′ UTR site from each miRNA transfection experiment (randomly selected without replacement); we reserved the remaining 30% for testing. Among the different algorithms, a stepwise regression strategy that maximized the Akaike information criterion (AIC) led to the best empirical performance (Additional file 2: Figure S3A). This stepwise regression strategy was the same algorithm that we had recently used to build a model of mammalian miRNA targeting efficacy [51]. Relative to a model that considered only site type (the “site only” model), the stepwise regression model that considered features of site context was twofold to threefold better at predicting the mRNA fold-change measurements (median r² of 0.08 and 0.19, respectively; P < 0.001, paired Wilcoxon signed-rank test; Additional file 2: Figure S3A).
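The resampling scheme can be sketched as repeated random 70/30 splits, scoring each split by the r² between held-out predictions and observations. For brevity the model here is a one-feature least-squares line rather than stepwise regression, and all names are hypothetical.

```python
import random
from statistics import median


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    spp = sum((p - mp) ** 2 for p in pred)
    soo = sum((o - mo) ** 2 for o in obs)
    return spo ** 2 / (spp * soo)


def repeated_holdout_r2(xs, ys, rounds=1000, train_frac=0.7, seed=0):
    """Median held-out r^2 over repeated random splits (sampling without replacement)."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    cut = round(train_frac * len(idx))
    scores = []
    for _ in range(rounds):
        rng.shuffle(idx)
        train, test = idx[:cut], idx[cut:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        pred = [slope * xs[i] + intercept for i in test]
        scores.append(r_squared(pred, [ys[i] for i in test]))
    return median(scores)
```

The sketch assumes the held-out predictions and observations are never constant; a production version would need to guard against zero variance.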

At first glance, an r² of only 0.19 for the best algorithm might seem to be a concern, as it implies that the method accounts for only 19% of the variability observed in our datasets. However, no model of miRNA targeting can explain variability arising from either experimental noise or the secondary effects of repressing the primary targets, which together contribute a large fraction of the variability observed in miRNA transfection datasets. Indeed, our analysis of the changes observed for predicted targets of one miRNA when another miRNA was transfected indicated that experimental noise and secondary effects together accounted for nearly half of the variability observed in our datasets, implying that a perfect model of direct targeting could explain at most 52% of the variability (Additional file 2: Figure S3B). Thus, the r² of 0.19, which resembled that obtained in mammalian analyses [51], implied that the model explained approximately 37% (0.19/0.52) of the variability attributable to direct targeting.

The features most informative for the stepwise regression model were presumably those with the greatest impact on site efficacy in flies. To identify these key features, we quantified the percentage of bootstrapped samples in which each feature was chosen (Table 1). Seven of the 17 features were selected in ≥ 90% of the bootstrap samples (Table 1), and multiple linear regression models trained with only these seven features performed at least as well as those that considered all 17 features (median r² of 0.20; Additional file 2: Figure S3A). Aside from site type, which has long been considered in TargetScanFly [8], these robustly selected features included three features of the site, namely energy of 3′ supplementary pairing (3P_energy), SA, and evolutionary conservation (PCT), and three features of the mRNA, namely ORF length (len_ORF), 3′ UTR length (len_3UTR), and the number of weak sites within the mRNA (other_sites) (Table 1). Notably, all of these features were previously selected when modeling site efficacy in mammals [51], with the nuance that in flies 3P_energy outperformed 3P_score, another method of evaluating 3′ supplementary pairing, which had been optimized on mammalian data [37]. However, two features strongly associated with site efficacy in mammals were not consistently selected in the fly analysis. These were AU composition in the vicinity of the target site (local_AU) and the minimum distance of a site from 3′ UTR boundaries (min_dist) [37]. Perhaps these features did not strongly discriminate effective targets from ineffective ones in flies because, compared to mammalian 3′ UTRs, fly 3′ UTRs are constitutively more AU-rich and much shorter. (The median 3′ UTR length is 661 nt and 202 nt for human and fly, respectively, considering the longest UTR annotation per gene after removing genes with the longest UTR annotations ≤2 nt.)
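The bookkeeping behind this feature-selection frequency can be sketched as below. The input is assumed to be one collection of selected feature names per bootstrap sample; the function names are hypothetical, and the 90% cutoff is the one quoted in the text.

```python
from collections import Counter


def selection_frequency(selected_per_sample):
    """Percentage of samples in which each feature was selected."""
    counts = Counter(f for sample in selected_per_sample for f in set(sample))
    n = len(selected_per_sample)
    return {feature: 100.0 * c / n for feature, c in counts.items()}


def robust_features(selected_per_sample, min_percent=90.0):
    """Features selected in at least `min_percent` of the samples, sorted by name."""
    freq = selection_frequency(selected_per_sample)
    return sorted(f for f, pct in freq.items() if pct >= min_percent)
```

For example, a feature chosen in 9 of 10 samples passes the cutoff, whereas one chosen in a single sample does not.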

Using the seven consistently selected features and the entire dataset of 3′ UTRs containing single 7mer-A1, 7mer-m8, or 8mer sites, we trained independent multiple linear regression models for each of these three canonical sites. These three models were then combined to generate a model for fly miRNA targeting, which we call the “context model” because it resembled our context models developed for mammalian miRNA targeting in that it modeled site context in addition to site type. The sign of each coefficient revealed the relationship of each feature to repression (Fig. 3e). For example, mRNAs with longer ORFs or longer 3′ UTRs, and sites with weaker 3′ supplementary pairing energy were more refractory to repression (as indicated by a positive coefficient), whereas target sites that were more structurally accessible or more conserved, and mRNAs with other weak sites were more prone to repression (as indicated by a negative coefficient). Normalizing the scores of each feature to a similar scale enabled assessment of the relative contribution of each feature to the context model (Fig. 3e). As expected, site type was also a major predictor of repression in the model, as indicated by the large magnitude of the intercept term (Fig. 3e). The signs and relative magnitudes of the features largely paralleled those found in mammals [51], indicating that the influence of these features might reflect evolutionarily conserved aspects of miRNA targeting in bilaterian species. One difference was that PCT scores contributed relatively more to the fly context model than they do to the analogous mammalian model [51], implying that the detection and scoring of the molecular features of target efficacy have more room for improvement in flies, presumably because less data were available in flies for feature identification and evaluation.
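To make the structure of such a model concrete, the sketch below scores a site as a site-type-specific intercept plus a weighted sum of normalized features. Every numeric value is an invented placeholder: only the signs follow the text (positive for len_ORF, len_3UTR, and 3P_energy; negative for SA, PCT, and other_sites), and the same placeholder coefficients are reused for all three site types purely for brevity, whereas the text describes independently trained models.

```python
# Placeholder coefficients: signs follow the text, magnitudes are invented.
PLACEHOLDER_COEFS = {
    "len_ORF": 0.05, "len_3UTR": 0.05, "3P_energy": 0.05,
    "SA": -0.05, "PCT": -0.05, "other_sites": -0.05,
}

# One model per canonical site type; intercepts are also invented placeholders.
HYPOTHETICAL_MODELS = {
    "8mer": {"intercept": -0.30, "coef": dict(PLACEHOLDER_COEFS)},
    "7mer-m8": {"intercept": -0.20, "coef": dict(PLACEHOLDER_COEFS)},
    "7mer-A1": {"intercept": -0.10, "coef": dict(PLACEHOLDER_COEFS)},
}


def context_score(site_type, features):
    """Predicted repression for one site; more negative means stronger repression."""
    model = HYPOTHETICAL_MODELS[site_type]
    return model["intercept"] + sum(
        weight * features[name] for name, weight in model["coef"].items()
    )
```

Under this sign convention, raising a feature with a negative coefficient (such as SA) makes the score more negative, which corresponds to stronger predicted repression.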

Comparison to the performance of previous methods

We next compared the performance of the fly context model to that of previously reported methods, measuring how successfully each method predicted and ranked the mRNAs that respond to the gain or loss of a miRNA in Drosophila. For training, our context model had considered only mRNAs that had a single 7–8-nt site to the cognate miRNA within their 3′ UTR, but for testing it needed to be extended to mRNAs that had multiple sites to the same miRNA within their 3′ UTRs. Accordingly, for each predicted target, we generated a total context score, calculated as the sum of the context scores of the sites to the cognate miRNA [37], and used these total context scores to rank all of the predicted targets for each miRNA. The response of the top-ranked targets was then compared to that of 14 previously reported methods, chosen because predictions for Drosophila targets were available online, as was information needed to rank the predictions. Having already generated the PCT scores of the Drosophila sites, we also combined the scores of multiple 7–8-nt canonical sites when present within the same 3′ UTRs to generate Aggregate PCT scores, which were also used to rank predictions based solely on the probability that they were preferentially conserved targets of the miRNA [57].
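The total context score described above is a simple sum over sites, which can be sketched as follows. The input layout (pairs of mRNA identifier and per-site context score) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def total_context_scores(sites):
    """Sum per-site context scores for each mRNA.

    sites: iterable of (mrna_id, context_score) pairs for one miRNA.
    """
    totals = defaultdict(float)
    for mrna_id, score in sites:
        totals[mrna_id] += score
    return dict(totals)


def rank_targets(sites):
    """mRNA identifiers ordered from most to least negative total context score."""
    totals = total_context_scores(sites)
    return sorted(totals, key=totals.get)
```

Because scores are negative for repressed targets, an mRNA with two moderate sites can outrank an mRNA with a single stronger site.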

We took precautions to perform a fair comparison of the algorithms. First, for each algorithm, we considered only predicted targets that corresponded to mRNAs expressed above the quantification threshold in the relevant test-set sample lacking the miRNA. Second, we avoided testing the context model on the same transfection data upon which it was trained. More specifically, we implemented a cross-validation strategy when testing the results of the context model using the transfection datasets, sequentially holding out each dataset and retraining the coefficients for the features in our context model using the five remaining transfection datasets before generating predictions for the held-out dataset. Further reducing the concern of overfitting was the observation that most top-ranked targets contained two or more canonical 3′ UTR sites and thus were not used during the development and training of our model. Third, for all testing of the context model, we used coefficients retrained on publicly available FlyBase 3′ UTR annotations, reasoning that training on improved 3′ UTR annotations derived from our 3P-seq data would have imparted an advantage to our model.
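The hold-one-dataset-out strategy can be sketched generically. Here `train_fn` and `predict_fn` are hypothetical callables standing in for coefficient retraining and prediction; nothing about their internals is taken from the text.

```python
def leave_one_dataset_out(datasets, train_fn, predict_fn):
    """Retrain on all datasets but one, then predict the held-out dataset.

    datasets: dict mapping a dataset name to its rows.
    train_fn: callable taking a list of row collections and returning a model.
    predict_fn: callable taking (model, rows) and returning predictions.
    """
    results = {}
    for held_out in datasets:
        training = [rows for name, rows in datasets.items() if name != held_out]
        model = train_fn(training)
        results[held_out] = predict_fn(model, datasets[held_out])
    return results
```

With six transfection datasets, each round trains on five and predicts the sixth, so no prediction is scored against data used to fit its coefficients.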

Another key consideration for the fair comparison of prediction performance is the choice of the approach used to evaluate performance. The use of standard methods for evaluating a binary classifier, such as a receiver operating characteristic (ROC) curve, is not appropriate for several reasons. First, for miRNA target predictions, there is no suitable set of known true positives or true negatives, because databases of validated targets miss many of the actual targets and are strongly biased in favor of the prediction algorithms used to identify the target candidates that are then validated. In the absence of suitable sets of known positives and negatives, ROC analyses can be performed using the molecular effects of perturbing the miRNA, but this approach requires choosing a threshold to separate mRNAs that respond from those that do not. Choosing a stringent threshold misses many of the authentic targets, whereas choosing a less stringent threshold that has a chance of capturing most of the authentic targets brings in too many false positives. The problems with ROC curves compound when trying to compare the performance of different algorithms, some of which predict 100 times more targets than others. Picking a high-stringency cutoff does not do justice to the algorithms that provide many predictions with the goal of achieving greater prediction sensitivity, whereas picking a low-stringency cutoff is unfair to the algorithms that provide relatively few predictions in an effort to achieve greater prediction specificity. Moreover, the use of a binary threshold obscures how accurately the algorithms rank their predicted targets. For these reasons, recasting the quantitative phenomenon of miRNA targeting as a binary classification problem is not appropriate, and fairly comparing prediction performance using ROC curves is not possible.

Recognizing these issues, a new approach has been developed for evaluating miRNA target-prediction performance [67], which we first implemented using our six datasets that each examined mRNA changes after transfecting a miRNA into S2 cells (Fig. 4a). For each algorithm and each transfected miRNA, we computed the mean mRNA fold change of the top-ranked targets of the transfected miRNA and then plotted the mean value for the six different miRNAs at various ranking thresholds, thereby summarizing repression efficacy of the top-ranked targets at each threshold. This approach of plotting mean repression over a range of ranking thresholds has several key features that make it suitable for fairly comparing target-prediction performance: (1) it is designed to test performance using global molecular measurements and thus does not require knowledge of true positives and true negatives, (2) it uses a sliding threshold and thus allows for simultaneous comparisons at all stringency cutoffs, and (3) its sliding threshold is well suited for evaluating the ability of algorithms to rank predicted targets (given by the relationship between mean repression and stringency threshold).

Performances of different target-prediction algorithms in flies. a The differential ability of algorithms to predict the mRNAs most responsive to miRNAs transfected into Drosophila cells. Shown for each algorithm in the key are mean mRNA fold changes observed for top-ranked predicted targets, evaluated over a sliding sensitivity threshold using the six miRNA transfection datasets. Some methods, such as PicTar, which generated relatively few predictions, could be evaluated at only a few thresholds, whereas others, such as RNA22 and TargetSpy, could be evaluated at many more. For each algorithm, predictions for each of the six miRNAs were ranked according to their scores, and the mean fold-change values were plotted at each sensitivity threshold. For example, at a threshold of 16, the 16 top predictions for each miRNA were identified (not considering predictions for mRNAs expressed too low to be accurately quantified). mRNA fold-change values for these predictions were collected from the cognate transfections, and the mean fold-change values were computed for each transfection for which the threshold did not exceed the number of reported predictions. The mean of the available mean values was then plotted. Also plotted are the mean of mean mRNA fold changes for all mRNAs with at least one cognate canonical 7–8-nt site in their 3′ UTR (dashed line), the mean of mean fold change for all mRNAs with at least one conserved cognate canonical 7–8-nt site in their 3′ UTR (dotted line), and the 95% confidence interval for the mean fold changes of randomly selected mRNAs, determined using 1000 resamplings (without replacement) at each cutoff (shading). Sites were considered conserved if their branch-length scores exceeded a cutoff with a signal:background ratio of 2:1 for the corresponding site type (cutoffs of 1.0, 1.6, and 1.6 for 8mer, 7mer-m8, and 7mer-A1 sites, respectively; Fig. 2b).
Thresholds at which the distribution of fold changes for predicted targets of the context model was significantly greater than that of any other model are indicated (*, one-sided Wilcoxon rank-sum test, P value < 0.05). See also Additional file 2: Figure S4. b The differential ability of algorithms to predict the mRNAs most responsive to knocking out miRNAs in flies. Shown for each algorithm in the key are mean mRNA fold changes observed for top-ranked predicted targets, evaluated over a sliding sensitivity threshold using the three knockout datasets. Otherwise, this panel is as in a. c and d The differential ability of algorithms to predict targets that respond to the miRNA despite lacking a canonical 7–8-nt 3′ UTR site. These panels are as in a and b, except they plot results for only the predicted targets that lack a canonical 7–8-nt site in their 3′ UTR. Results for our context model and other algorithms that only predict targets with canonical 7–8-nt 3′ UTR sites are not shown. Instead, results are shown for a 6mer context model, which considers only the additive effects of 6mer, offset 6mer, and 6mer-A1 sites and their corresponding context features. e and f The difficulty of predicting mRNAs that respond to miRNA transfection or knockout despite lacking canonical 6–8-nt 3′ UTR sites. These panels are as in c and d, respectively, except they plot results for mRNAs with 3′ UTRs that lack a canonical 6–8-nt site

When applying this analysis of performance, we found that all algorithms except RNA22 predicted repressed targets better than expected by chance (Fig. 4a). However, some, including ComiR, PicTar, MinoTar, RNAhybrid, TargetSpy, and mirSVR, performed similarly or worse than a naïve strategy of selecting all mRNAs that have at least one 7–8-nt canonical site in their 3′ UTR. Of the previously reported algorithms, TargetScanFly, EMBL, and PITA.Top performed the best. Nevertheless, our context model performed better than all previous methods, providing predictions that were the most responsive to transfection of the miRNA at each threshold tested (Fig. 4a).

Although our cross-validation strategy avoided testing our model on the same measurements as used for its training, some concerns regarding testing on the transfection data remained, because these data were used to optimize scoring of some features of our model. Moreover, transfection introduces high concentrations of miRNAs into cells in which they normally are not acting, raising the concern that a model developed and tested solely on transfection datasets might not accurately predict the response of miRNAs in their endogenous physiological contexts. Therefore, we searched for a test set that had not been used to develop any of the algorithms and that monitored the transcriptome response to endogenous miRNAs expressed at physiological levels. Instead of monitoring the new repression observed upon ectopic addition of a miRNA, such a test set would examine the de-repression observed upon loss of an endogenous miRNA. Surveying the Drosophila literature, we identified three miRNA knockout datasets with compelling signals for de-repression. Pooling these datasets, which monitored mRNA changes after deleting either miR-14 [31], miR-34 [32], or miR-277 [33], and carrying out the same type of analysis as we had done for the transfection datasets (but monitoring de-repression following loss of a miRNA instead of repression following introduction of a miRNA) revealed performances that generally resembled those observed with the transfection datasets (Fig. 4b). The relative performances of the previous methods shifted somewhat, with improvement observed for Aggregate PCT, miRanda-MicroCosm, and PicTar and worsening observed for MinoTar, TargetScanFly, and TargetSpy. Importantly, however, when testing on these consequences of endogenous miRNA targeting in flies, the context model again performed better than all previous models. 
Results for miR-277 resembled those for the other two miRNAs (data not shown), even though miR-277 is unusual in that it primarily resides within Ago2 rather than Ago1 [2].

Using the mean fold change to evaluate repression (or de-repression) of top-ranked targets had several potential limitations. For example, it can exaggerate the influence of individual outliers or more heavily weight datasets with a greater variance in their fold-change distributions. Nonetheless, examination of plots showing the mean of median mRNA changes did not substantially change our assessment of the relative performance of each algorithm, which indicated that we did not arrive at erroneous conclusions because of outliers (Additional file 2: Figure S4). Another potential caveat is that our test sets looking at mRNA changes might miss targets that are repressed only at the level of translation, without changes in mRNA stability. Although such translation-only repression is widespread in early fish embryos [68, 69], examination of later embryos and post-embryonic mammalian cells and tissues has failed to find a set of targets convincingly regulated at only the level of translation [69,70,71], and we have no reason to suspect that such targets exist in post-embryonic flies. Also potentially influencing our comparisons was the fact that, for some previous algorithms, predictions were missing for some miRNAs of our test sets. For example, EMBL predictions were not available for miR-263a and miR-994, and because targets for these two miRNAs happened to undergo less repression in our transfections, the testing of EMBL on only the remainder of the transfection datasets presumably inflated its relative performance.

Target-prediction algorithms have been developed with divergent priorities regarding prediction accuracy. Out of concern for prediction specificity, some, including our context model, consider only predictions with the most effective types of sites, i.e., 7–8-nt seed-matched sites within 3′ UTRs. In contrast, other algorithms, out of concern for prediction sensitivity, do not limit their predictions to those with these most effective site types, and some of these include predictions with a vast array of non-canonical sites that show no evidence of efficacy when tested using data from mammals and fish [51]. To begin to explore the tradeoffs of these divergent priorities when predicting miRNA targets in flies, we removed predictions containing 7–8-nt canonical sites to the cognate miRNA in their 3′ UTRs, and tested the behavior of the remaining predictions that lacked these more effective canonical sites. When testing on the transfection data, most algorithms that do not strictly focus on 3′ UTRs with 7–8-nt canonical sites generated predictions that were repressed more than expected by chance (Fig. 4c).
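Filtering predictions by site type requires classifying seed matches first. The sketch below follows the commonly used TargetScan convention for canonical sites (6mer: match to miRNA nucleotides 2–7; 7mer-m8: plus a match to nucleotide 8; 7mer-A1: plus an A opposite nucleotide 1; 8mer: both). These definitions are not spelled out in this excerpt, so they are an assumption here, and the offset 6mer and 6mer-A1 sites mentioned later are deliberately not handled.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def classify_sites(mirna, utr):
    """Return (position, site_type) for each match to miRNA nucleotides 2-7.

    mirna and utr are RNA or DNA strings written 5' to 3'; position is the
    0-based index in the UTR where the 6-nt seed match begins.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed_match = reverse_complement(mirna[1:7])  # pairs to miRNA nt 2-7
    m8_match = reverse_complement(mirna[7])      # pairs to miRNA nt 8
    sites = []
    start = utr.find(seed_match)
    while start != -1:
        end = start + 6
        has_m8 = start > 0 and utr[start - 1] == m8_match
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        sites.append((start, site_type))
        start = utr.find(seed_match, start + 1)
    return sites
```

A prediction "lacking canonical 7–8-nt sites" would then be one whose 3′ UTR yields no 8mer, 7mer-m8, or 7mer-A1 entry for the cognate miRNA.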

Encouraged by these results, we used our context features to build a model that considered predictions that lacked canonical 7–8-nt 3′ UTR sites but had at least one offset 6mer, 6mer, or 6mer-A1 site in their 3′ UTR. When using either test set and testing only predictions that lacked canonical 7–8-nt 3′ UTR sites to the cognate miRNA, this model, which we call the “6mer context” model, performed better than all existing algorithms, although statistically significant improvement was observed at only two thresholds when testing on de-repression of endogenous targets (Fig. 4c and d). The other algorithm that yielded predictions consistently repressed better than background was DIANA-microT-CDS, which includes predictions with only canonical ORF sites. Thus, taken together, our analysis indicates that two distinct strategies that focus on only marginally effective sites can be predictive in flies, as judged by both transfection and knockout results: one approach focuses on canonical 6-nt sites in 3′ UTRs, and the other focuses on canonical ORF sites. However, at best, the average repression of the four to eight top predictions from these approaches was much less than that of the top targets of the standard context model and instead resembled that of the hundreds of mRNAs that contained 7–8-nt canonical 3′ UTR sites (Fig. 4a–d).

The observation that models could be built that successfully predicted targets with only marginal canonical sites was consistent with the demonstrated efficacy of these marginal sites in Drosophila cells (Fig. 1). A larger challenge has been to predict effective non-canonical sites, which lack at least a 6-nt perfect match to the seed region. Although two types of non-canonical sites, known as the 3′ supplementary sites and centered sites, can mediate repression, these sites are rare, indeed so rare that it is difficult to observe a signal for their action in mammalian cells without aggregating many datasets [5, 72]. Nonetheless, some algorithms yield many predictions that have only non-canonical sites. Analyses of mammalian datasets indicate that these predictions are no more repressed than expected by chance [51], raising the question as to whether any of the algorithms might successfully predict non-canonical sites in Drosophila. To answer this question, we used our two test sets to measure the response of predictions that lacked any canonical 6–8-nt site to the cognate miRNA in their 3′ UTR (Fig. 4e, f). The only predictions with a convincing signal above background in either test set were those of EMBL, DIANA-microT-CDS, and MinoTar. Manually examining the top-ranked predictions from EMBL revealed that the signal observed for its predictions was attributable to canonical sites located in ORFs and 3′ UTRs of alternative last exons, whereas the signal for the predictions of DIANA-microT-CDS and MinoTar was attributable to canonical ORF sites. We conclude that in flies, as in mammals [51], non-canonical sites only rarely mediate repression, although we cannot exclude the formal possibility that effective non-canonical sites are abundant yet for some reason not predicted above background by any of the existing algorithms.

TargetScanFly (v7)

Having found that the context model performed better than the models that have been providing target predictions to the Drosophila research community (Fig. 4a, b), we overhauled TargetScanFly (available at targetscan.org) to display these improved predictions. Because of the diminishing returns of predicting targets with only marginal sites (Fig. 4c–f), we continued to limit TargetScanFly to predictions with 7–8-nt canonical 3′ UTR sites, with ranks driven by a version of the context model that was trained on the entire transfection dataset.

For simplicity, we had developed the context model using mRNAs without abundant alternative 3′ UTR isoforms (Fig. 3), and to make fair comparisons with the output of previous models, we had tested the context model using only the longest FlyBase-annotated isoform (Fig. 4). Nevertheless, because considering the usage of alternative 3′ UTR isoforms significantly improves the performance of miRNA targeting models [51, 60], our overhaul of the TargetScanFly predictions incorporated both the context scores and current isoform information when ranking mRNAs with canonical 7–8-nt miRNA sites in their 3′ UTRs.

Because the main gene-annotation databases (e.g., Ensembl/FlyBase) were still in the process of incorporating the information available on 3′ UTR isoforms, the first step in the overhaul was to compile a set of reference 3′ UTRs that represented the longest 3′ UTR isoforms for representative ORFs of the fly. These representative ORFs were chosen among the set of transcript annotations sharing the same stop codon, with alternative last exons generating multiple representative ORFs per gene. To compile this set of fly 3′ UTRs, we started with FlyBase annotations [73] for which 3′ UTRs were extended, when possible, using recently identified long 3′ UTR isoforms [74] and 3′-end reads marking additional distal cleavage and polyadenylation sites. The extension of these 3′ UTRs led to a substantial increase in the number of predicted regulatory interactions, with the median number of targets for conserved miRNAs increasing by 78% over the previous version of TargetScanFly (Additional file 2: Figure S5).

For each of these reference 3′ UTR isoforms, 3′-end datasets were used to quantify the relative abundance of tandem isoforms, thereby generating the isoform profiles needed to score features that vary with 3′ UTR length (len_3UTR and other_sites) and assign a weight to the context score of each site, which accounted for the fraction of 3′ UTR molecules containing the site [60]. Our 3P-seq data from S2 cells were combined with 3′-seq data from a range of developmental stages of the fly [74] to generate a meta 3′ UTR isoform profile for each representative ORF, as illustrated for Ultrabithorax (Ubx) (Fig. 5), which is known to undergo alternative cleavage and polyadenylation [75]. Although this meta approach is not expected to be as accurate as using individual datasets to generate isoform profiles and predictions tailored for an individual stage or cell type [61, 75,76,77], it simplifies the summary ranking of predicted targets for each miRNA and still outperforms the previous approach of not considering isoform abundance at all, presumably because isoform profiles for many genes are highly correlated in diverse cell types [60].

An example of a TargetScanFly page, which displays the predicted sites of conserved miRNAs within the Ubx 3′ UTR. At the top is the 3′ UTR profile, showing the relative expression of tandem 3′ UTR isoforms, as measured using 3′-seq [74] as well as our 3P-seq data. Shown on this profile is the end of the longest FlyBase annotation (blue vertical line) and the number of 3′-end reads (525) used to generate the profile (labeled on the y-axis). Below the profile are conserved and poorly conserved sites for miRNAs broadly conserved among insects (colored according to the key), with options to also display sites for poorly conserved miRNAs and other miRBase annotations. Boxed are the predicted miR-iab-8 sites, with the site selected by the user indicated with a darker box. The multiple sequence alignment shows the species in which an orthologous site can be detected (white highlighting) among 27 insect species. Below the alignment is the predicted consequential pairing between the selected miRNA and its conserved and poorly conserved sites, showing also for each site its position, site type, context score, context score percentile, weighted context score, branch-length score, and PCT score

For each 7–8-nt canonical site, we used the corresponding 3′ UTR profile to compute the context score and to weight this score based on the relative abundance of tandem 3′ UTR isoforms that contained the site [60]. Scores for multiple sites to the same miRNA family were also combined to generate cumulative weighted context scores for the 3′ UTR profile of each representative ORF, which provided the default approach for ranking predicted targets with at least one 7–8-nt site to that miRNA family [51]. As an option, the user can instead request that predicted targets of broadly conserved miRNAs be ranked based on their aggregate PCT scores [57], as updated in this study. The user can also obtain predictions from the perspective of each protein-coding gene, viewed either as the mapping of 7–8-nt sites shown beneath the 3′ UTR profile and above the 3′ UTR sequence alignment (Fig. 5), or as a table of miRNAs ranked by either cumulative weighted context score or aggregate PCT score.
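The isoform weighting can be sketched as follows, assuming a 3′ UTR profile is available as a mapping from each tandem isoform's end position to its 3′-end read count. The data layout, the function names, and the rule that an isoform contains a site when it ends at or beyond the site's last nucleotide are assumptions for illustration.

```python
def fraction_containing_site(isoform_ends, site_end):
    """Fraction of 3' UTR molecules long enough to include the site.

    isoform_ends: dict mapping an isoform's end position in the 3' UTR to the
        number of 3'-end reads supporting it.
    site_end: last 3' UTR position occupied by the site.
    """
    total = sum(isoform_ends.values())
    containing = sum(count for end, count in isoform_ends.items() if end >= site_end)
    return containing / total


def cumulative_weighted_context_score(sites, isoform_ends):
    """Sum of context scores, each weighted by the fraction of isoforms with the site.

    sites: iterable of (site_end, context_score) pairs for one miRNA family.
    """
    return sum(
        score * fraction_containing_site(isoform_ends, site_end)
        for site_end, score in sites
    )
```

A site near the stop codon is present in nearly every isoform and keeps almost its full score, whereas a distal site found only in a rare long isoform contributes little to the cumulative score.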


Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.5015282.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.


MicroRNA COOPERATIVITY IN CANCER: MECHANISMS, FUNCTIONS AND POTENTIAL THERAPIES

Although the use of miRNA mimics and miRNA inhibitors as therapeutics seems promising, only a small number of miRNA therapeutics has so far progressed into clinical development. One major challenge is the identification of the best miRNA candidates or miRNA targets for different types of cancers. An intuitive strategy is to combine gene expression analysis with miRNA target prediction algorithms, such as TargetScan (30). However, this strategy is compromised by indirect effects and complex gene regulation involving different molecular species and their dynamical interactions. Thus, biochemical techniques based on Argonaute and miRNA immunoprecipitation (e.g. HITS-CLIP and PAR-CLIP) have been developed to experimentally identify miRNA–target interactions at a transcriptome-wide scale (31). Another challenge for miRNA therapeutics is to avoid or minimize toxicity and off-target effects. As a single miRNA often represses its targets quite weakly (21,32), high doses of effective miRNA mimics are usually required to achieve the expected effect. However, high doses also provoke undesired consequences, including unintended targeting by the administered miRNAs. For example, the MRX34 clinical trial had to be terminated due to immune-related adverse events involving patient deaths. Such failures may be prevented by lowering the dosage of the miRNA mimic and consequently reducing off-target effects, but therapeutic benefits could dwindle equally. To overcome this problem, a reasonable approach would be to use lower-dose combinations of miRNAs that synergistically regulate the expression of a shared target. In this context, we expect a reduction or avoidance of undesired events in patients when multiple miRNAs are co-administered at lower levels compared to an individual high-dose miRNA treatment.

Cooperative and synergistic miRNA regulation is an intriguing yet poorly explored mechanism. Different miRNAs can for example cooperate by regulating multiple, complementary targets in a pathway (Figure 1). There are a few experimentally validated examples where miRNAs exert synergistic effects in cancer (Table 1). Co-transfection of miR-34a and miR-15a/16 led to increased cell cycle arrest in non-small cell lung cancer cells due to the fact that miR-15a and miR-16 specifically downregulate CCNE1 and CCND3. Such gene regulation exerts a complementary effect to cell-cycle regulation by miR-34a (33). Pencheva et al. identified that miR-1908, miR-199a-5p and miR-199a-3p jointly target ApoE signaling in melanoma. LNA-mediated inhibition of these miRNAs strongly suppressed melanoma metastasis (34). In comparison to single-miRNA treatment in acute lymphoblastic leukemia cells of children, co-expression of miR-125b, miR-100 and miR-99a resulted in downregulation of multiple targets that is causally linked to the resistance to the chemotherapeutic agent vincristine (35). Through the investigation of nine miRNA pairs in glioma cells, Zhao et al. found that extensive synergy occurred among upregulated miRNAs. They showed that the highest synergistic effect increasing apoptosis of glioma cells is achieved through simultaneous inhibition of miR-20a and miR-21 (36). Derepression of tumor suppressor genes (PDCD4, BTG2 and NEDD4L) by inhibiting miR-21, miR-23a and miR-27a showed synergistic effects toward reducing pancreatic tumor growth and progression (37). Furthermore, considering multiple targeting of miRNAs, researchers proposed miRNAs as adjuvants in conjunction with available cancer therapies (38). For instance, miRNA modulation can be used to increase the efficiency of small-molecule inhibitors targeting oncogenes (39) and to lower effective doses of chemotherapies (40). We have demonstrated how a potential therapeutic gain can be achieved by utilizing miR-205 and miR-342 as co-adjuvants to sensitize tumor cells to a genotoxic anti-cancer drug (41).

Implementation of miRNA cooperativity through targeting of a shared pathway or of a shared protein-coding gene. Targeting of several interlinked protein-coding genes by multiple miRNAs leads to the regulation of a pathway and thereby modulation of the phenotypic outcome (pathway A). Concerted targeting of a protein-coding gene by two miRNAs can induce efficient regulation of a biological process that is controlled by the gene (pathway B). miRNA targets are highlighted in red.


1. Target Identification & Validation

Target Identification & Characterization

Target identification and characterization begins with identifying the function of a possible therapeutic target (gene/protein) and its role in the disease. Identification of the target is followed by characterization of the molecular mechanisms addressed by the target. A good target should be efficacious, safe, meet clinical and commercial requirements and be "druggable".

Approaches:

  • Data mining using bioinformatics
    — identifying, selecting and prioritizing potential disease targets
  • Genetic association
    — genetic polymorphism and connection with the disease
  • Expression profile
    — changes in mRNA/protein levels
  • Pathway and phenotypic analysis
    — in vitro cell-based mechanistic studies
  • Functional screening
    — knockdown, knockout or using target specific tools

Target Validation

Target validation shows that a molecular target is directly involved in a disease process, and that modulation of the target is likely to have a therapeutic effect. The most important criterion for target validation is to take a multi-validation approach.

Approaches:

  • Genetic manipulation of target genes (in vitro)
    — knocking down the gene (shRNA, siRNA, miRNA), knocking out the gene (CRISPR, ZFNs), knocking in the gene (viral transfection of mutant genes)
  • Antibodies
    — binding to the target with high affinity and blocking further interactions
  • Chemical genomics
    — chemical approaches against genome-encoded proteins



miRBase

The numbering of miRNA genes is simply sequential. For instance, at the time of writing the last published miRNA was mouse mir-352. The next novel published miRNA will get the number 353. However, if you submit a Xenopus miRNA that is identical to human mir-121, for example, we will suggest that you also name your sequence mir-121.

The names/identifiers in the database are of the form hsa-mir-121. The first three letters signify the organism. The mature miRNA is designated miR-121 in the database and in much of the literature, whilst mir-121 refers to the miRNA gene and also to the predicted stem-loop portion of the primary transcript. Distinct precursor sequences and genomic loci that express identical mature sequences get names of the form hsa-mir-121-1 and hsa-mir-121-2. Lettered suffixes denote closely related mature sequences -- for example hsa-miR-121a and hsa-miR-121b would be expressed from precursors hsa-mir-121a and hsa-mir-121b respectively.

miRNA cloning studies sometimes identify two ~22 nt sequences which originate from the same predicted precursor. When the relative abundances clearly indicate which is the predominantly expressed miRNA, the mature sequences are assigned names of the form miR-56 (the predominant product) and miR-56* (from the opposite arm of the precursor). When the data are not sufficient to determine which sequence is the predominant one, the sequences are given names like miR-142-5p (from the 5' arm) and miR-142-3p (from the 3' arm). An older convention sometimes used miR-142-s and miR-142-as.

miRNAs that do not conform to these ideas have in some cases been renamed in the database. There are however a few published exceptions to these rules that are accommodated. For example, different organisms have slightly different naming conventions -- in plants, published names are of the form MIR121. Viral miRNAs also adopt a slightly different naming scheme. For this reason it is unwise to rely on capitalisation to confer information, such as the mir/miR precursor/mature convention. let-7 and lin-4 are obvious exceptions to the numbering scheme, and these names are retained for historical reasons. New submissions of homologues of let-7 or lin-4 will also acquire these names.
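The naming scheme described above can be illustrated with a small parser. This is a minimal sketch and not a miRBase tool: the regular expression and the function name `parse_mirna_name` are assumptions made for illustration, and they cover only the forms mentioned in the text (a three-letter species prefix, mir/miR/MIR, let and lin stems, lettered suffixes, numbered loci, the 5p/3p arm suffixes and the star form). Viral names and other published exceptions are not handled. Because capitalisation is unreliable, the parser reports the stem as written and does not infer precursor or mature status from it.

```python
import re

# Assumed pattern covering only the name forms described in the text.
MIRNA_NAME = re.compile(
    r"(?:(?P<species>[a-z]{3})-)?"      # optional organism prefix, e.g. hsa
    r"(?P<stem>mir|miR|MIR|let|lin)-?"  # stem; plant names omit the hyphen
    r"(?P<number>\d+)"                  # sequential number
    r"(?P<letter>[a-z])?"               # closely related mature sequences: a, b, ...
    r"(?:-(?P<locus>\d+))?"             # distinct loci with identical mature sequence
    r"(?:-(?P<arm>5p|3p))?"             # arm of the precursor
    r"(?P<star>\*)?"                    # less abundant product from the opposite arm
)


def parse_mirna_name(name):
    """Split a miRNA name into its parts, or return None if it does not fit the pattern."""
    match = MIRNA_NAME.fullmatch(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["number"] = int(parts["number"])
    parts["star"] = parts["star"] is not None
    return parts


parsed = [parse_mirna_name(n) for n in ("hsa-mir-121-1", "hsa-miR-121a", "miR-142-5p", "MIR121")]
```

As the next paragraph stresses, a parse like this recovers only what the name itself encodes. Relationships between sequences should come from dedicated annotation fields, not from the name.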

Please note that miRNA names are able to convey only limited information, and are entirely unsuitable to encode information about complex sequence relationships. You should not therefore rely on the name to tell you all you need to know about the sequence. Sensible database approaches should instead use dedicated fields and annotation to describe such relationships, such as the "family" data provided here.

Criteria and conventions for miRNA identification and naming are described in the following short article:

Victor Ambros, Bonnie Bartel, David P. Bartel, Christopher B. Burge, James C. Carrington, Xuemei Chen, Gideon Dreyfuss, Sean R. Eddy, Sam Griffiths-Jones, Mhairi Marshall, Marjori Matzke, Gary Ruvkun, and Thomas Tuschl. A uniform system for microRNA annotation. RNA 2003 9(3):277-279.

In addition to a name or ID, each miRBase Sequence entry has a unique accession number. The accession number is the only truly stable identifier for an entry -- miRNA names may change from those published as relationships between sequences become clear. The advantage of the accessioned system is that such changes can be tracked in the database, allowing names to evolve to remain consistent, whilst providing the user with full access to the data and history. However, accessions convey little biological meaning, and it is expected that miRNAs are referred to by name in publications.
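Because the accession is the stable identifier, a common task is to look up accessions from a known mature sequence. The sketch below does this by exact matching against a downloaded FASTA file of mature sequences. It assumes header lines of the form `>name accession description`, and it treats DNA and RNA spellings of the same sequence as identical. The function names are hypothetical and the two records are made-up placeholders, not real miRBase entries. For near matches, a BLAST or SSEARCH search is the appropriate tool.

```python
def parse_mature_fasta(text):
    """Yield (name, accession, sequence) from FASTA text with '>name accession ...' headers."""
    name = accession = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, accession, "".join(chunks)
            fields = line[1:].split()
            name = fields[0]
            accession = fields[1] if len(fields) > 1 else None
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, accession, "".join(chunks)


def normalise(seq):
    """Upper-case the sequence and write it in the RNA alphabet (T becomes U)."""
    return seq.strip().upper().replace("T", "U")


def build_index(text):
    """Map each normalised sequence to the list of (accession, name) pairs that carry it."""
    index = {}
    for name, accession, seq in parse_mature_fasta(text):
        index.setdefault(normalise(seq), []).append((accession, name))
    return index


def find_accessions(index, query):
    """Return all (accession, name) pairs whose sequence equals the query exactly."""
    return index.get(normalise(query), [])


# Placeholder records for illustration only; names and accessions are invented.
EXAMPLE_FASTA = """\
>xxx-miR-0001 MIMAT9999991 Hypothetical species miR-0001
UGAGGUAGUAGGUUGUAUAGUU
>xxx-miR-0002 MIMAT9999992 Hypothetical species miR-0002
UCCCUGAGACCUCAAGUGUGA
"""

index = build_index(EXAMPLE_FASTA)
hits = find_accessions(index, "tgaggtagtaggttgtatagtt")
```

Keying the index on the sequence and returning a list of hits allows for identical mature sequences that are recorded under more than one entry.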


miRBase: the microRNA database

miRBase provides the following services:

  • The miRBase database is a searchable database of published miRNA sequences and annotation. Each entry in the miRBase Sequence database represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), with information on the location and sequence of the mature miRNA sequence (termed miR). Both hairpin and mature sequences are available for searching and browsing, and entries can also be retrieved by name, keyword, references and annotation. All sequence and annotation data are also available for download.
  • The miRBase Registry provides miRNA gene hunters with unique names for novel miRNA genes prior to publication of results. Visit the help pages for more information about the naming service.
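Because all sequence data can be downloaded, an exact-match lookup from a mature sequence to its name and accession can also be done locally. The following is a minimal sketch, assuming a FASTA file whose header lines carry the name as the first whitespace-separated field and the accession as the second; that layout should be checked against the file actually downloaded. The sample records, names, and accessions are made up for illustration, and near matches would still need a tool such as BLAST or SSEARCH.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Made-up records in the assumed layout: ">name accession description".
SAMPLE_FASTA = """\
>xxx-miR-0001 ACC0000001 Hypothetical species miR-0001
AUGCAUGCAUGCAUGCAUGCA
>xxx-miR-0002 ACC0000002 Hypothetical species miR-0002
GGCCAAUUGGCCAAUUGGCCA
"""


def normalize(seq: str) -> str:
    """Strip whitespace, uppercase, and write the sequence as RNA (T -> U)."""
    return "".join(seq.split()).upper().replace("T", "U")


def parse_records(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (name, accession, sequence) tuples from FASTA lines."""
    records: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) < 2:
                raise ValueError(f"header lacks name and accession: {line!r}")
            records.append([fields[0], fields[1], ""])
        elif records:
            records[-1][2] += line  # tolerate sequences wrapped over lines
    return [(name, acc, normalize(seq)) for name, acc, seq in records]


def build_index(records) -> Dict[str, List[Tuple[str, str]]]:
    """Map each normalized sequence to every (name, accession) sharing it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, accession, seq in records:
        index[seq].append((name, accession))
    return index


def lookup(index, query: str) -> List[Tuple[str, str]]:
    """Exact-match lookup; DNA or RNA input, any letter case."""
    return index.get(normalize(query), [])


if __name__ == "__main__":
    idx = build_index(parse_records(SAMPLE_FASTA.splitlines()))
    print(lookup(idx, "atgcatgcatgcatgcatgca"))
```

The index maps a sequence to a list because identical mature sequences can appear under more than one entry, for example in different species.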

To receive email notification of data updates and feature changes, please subscribe to the miRBase announcements mailing list. Any queries about the website or naming service should be directed to [email protected].

miRBase is managed by the Griffiths-Jones lab at the Faculty of Biology, Medicine and Health, University of Manchester with funding from the BBSRC. miRBase was previously hosted and supported by the Wellcome Trust Sanger Institute.

