This search tool helps you identify the chromatograms in the GMD (and, in turn, the samples and experiments) that are most similar to a given metabolite profile, i.e. a list of metabolites and their concentrations, according to the dot-product distance metric.
The GMD provides orthogonal metabolite profile visualisation; that is, you can profile a single metabolite across multiple experiments/samples/chromatograms, or profile all metabolites within a single experiment/sample/chromatogram. To adjust the metabolite profiles prior to comparison, we set the concentrations of missing metabolites to zero. Further, to negate any scaling effect, we normalize the two profiles’ concentration vectors independently to unit vectors, i.e. each concentration vector is divided by its magnitude.
The “1-DotProduct” distance score is derived from the dot product of the two profiles, namely the query profile and the hit profile. The dot product is the sum of the multiplied concentrations across all metabolites matching in both profiles. To be consistent with other metrics, which associate small values with similar profiles and large values with dissimilar profiles, we report the difference to one, i.e. 1 – dot product. The “1-DotProduct” score therefore ranges from 0 (perfect match) to 1 (complete mismatch).
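The normalisation and scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GMD's actual implementation: profiles are represented as plain dictionaries mapping metabolite names to concentrations, and the function names `unit_vector` and `one_minus_dot_product` as well as the example metabolites are hypothetical.

```python
import math


def unit_vector(profile):
    """Scale a {metabolite: concentration} profile to unit length."""
    magnitude = math.sqrt(sum(c * c for c in profile.values()))
    if magnitude == 0:
        raise ValueError("profile contains no non-zero concentrations")
    return {m: c / magnitude for m, c in profile.items()}


def one_minus_dot_product(query, hit):
    """Return (distance, number of matching metabolites) for two profiles."""
    q = unit_vector(query)
    h = unit_vector(hit)
    # A metabolite missing from one profile counts as zero, so it adds
    # nothing to the sum; only metabolites present in both are multiplied.
    shared = q.keys() & h.keys()
    dot = sum(q[m] * h[m] for m in shared)
    return 1.0 - dot, len(shared)


# Hypothetical example data: the hit is the query scaled by a factor of two.
query = {"glucose": 3.0, "fructose": 4.0}
hit = {"glucose": 6.0, "fructose": 8.0}
distance, matching = one_minus_dot_product(query, hit)
```

Because each profile is normalised independently, the two profiles in the example differ only in scale and their distance is (up to floating-point rounding) zero. Two profiles that share no metabolite have a dot product of zero and hence a distance of 1.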
To increase search flexibility, each metabolite can optionally be tagged as required (r, must be present), optional (o, may be present) or excluded (x, must not be present) in the matching profiles.
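One way to read the three tags is as a filter applied to each candidate profile before it is scored. The sketch below assumes exactly that; how the GMD actually evaluates the tags is not stated here. The function name `passes_tags` and the representation of a hit as a set of metabolite names are hypothetical.

```python
def passes_tags(tags, hit_metabolites):
    """Check a candidate profile against r/o/x tags.

    tags: {metabolite: "r" | "o" | "x"}
    hit_metabolites: set of metabolite names present in the candidate.
    """
    for metabolite, tag in tags.items():
        present = metabolite in hit_metabolites
        if tag == "r" and not present:
            return False  # a required metabolite is missing
        if tag == "x" and present:
            return False  # an excluded metabolite is present
        # "o" (optional) never rejects a candidate
    return True


# Hypothetical example: glucose is required, proline must be absent.
tags = {"glucose": "r", "fructose": "o", "proline": "x"}
accepted = passes_tags(tags, {"glucose", "fructose"})
rejected = passes_tags(tags, {"glucose", "proline"})
```

In this reading, optional metabolites still contribute to the score when they are present, but they never cause a candidate to be dropped.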
To document the matching process, we created a two-step workflow. In the first step (Step 1), a data table shows the recognised metabolites that are available in the GMD, which is a prerequisite for the matching process. In the second step (Step 2), the profile search result is reported as a sortable table with the following columns:
- Chromatogram, a link to the chromatogram details page
- 1-DotProduct score, the “1-DotProduct” distance between the user metabolite profile and the database hit
- Count Matching Metabolites, the number of metabolites that are present in both the user metabolite profile and the database hit and that were used to calculate the score
- Factor(s), describes the experiment from which the chromatogram was obtained
- Label, the actual environmental condition under which this chromatogram was taken
- Species, the species of the sample