



The goal of the CarboGrove database is to make working with lectins and glycan binding proteins easier so that these probes can be used effectively by non-experts. This is achieved by modeling and summarizing glycan microarray data with the MotifFinder array analysis software, which enables platform-independent analysis of glycan array data. With the introduction of the CarboGrove API, this objective has expanded to include serving as a resource for bioinformatic applications.

Why "CarboGrove"?

Other motif analysis software for glycan array analysis uses frequent subtree mining algorithms and adopts "mining" nomenclature, e.g. "Glycan Motif Miner." The MotifFinder platform used to create these models has a regression tree at the core of its algorithm. The name "CarboGrove" reflects this, since the database is a collection of regression trees.


Using the database

The CarboGrove database can be searched by simply typing a keyword into the search box and selecting "Search." Keywords include all common glycan binding protein aliases or partial glycan binding protein names. To search the entire database simply leave this blank.

Results can be sorted by affinity to a motif: open the "Sort Motif" dropdown and select the motif to sort by. To find proteins which bind a specific motif, leave the keyword search box blank and sort by that motif.

The largest source of data for the CarboGrove database is the CFG database, and CarboGrove includes some of the metadata contained therein. For metadata not carried over into this database, I direct you to search the CFG database for the necessary information.

For all results, the reference for the array and the reference for the data are both available. The array reference is linked under the "Data Source" column, and the data reference is linked in the expanded row for each dataset under "Data Source."

In the expanded row for results you can find links to other databases where available. Protein Info gives the UniProt ID and links to the UniLectin database, which includes standardized lectin information and interactive lectin structures where available. Protein Family gives the PFAM family and links to the PFAM database.

To narrow down the database or search results, select the "Advanced" button for advanced search options. From there, an option can be selected from the "Sample Provider," "Protein Family," "Model Complexity," or "Model Source" drop-down selectors. Note that the model complexity refers to the settings used in MotifFinder when developing a model of the protein's binding. A low-complexity model will consist of fewer motifs and will be easier to understand, while a higher-complexity model might have many motifs and be difficult to understand.

Note that the most appropriate model may differ from protein to protein. When a protein has only a few bound glycans (2-4) it will often be better to use the high-complexity model (which requires a motif to have 2 or more glycans) but when many glycans are bound it is often better to use a lower complexity model (requiring 5 or more glycans). Manually curated "good" settings for each dataset have been chosen and are displayed with the "Suggested" option. Select "All" to see all models for a set of datasets.

Search examples:

To search for lectins that share a common name such as "MAL" (for example MAL-I and MAL-II):

To search for lectins whose names include "Helix" (for example HPA and HAA):

To search for data on recombinant lectins:


Using the Results:

Fundamentally, this database is designed to help researchers in two ways: first, by making a detailed summary of glycan binding protein binding readily available for the design of experiments that use these probes; second, by making it easier to analyze experiments that use these probes in a quantitative way.

To get a detailed summary of glycan binding protein binding, simply select "View" from any result listed. The resulting web page gives a set of motifs which work together to model glycan binding (through a multivariate regression tree). A number of graphs and example glycans are given to enable researchers to critically assess the model. See below for a breakdown of the reports.

To be able to use these models to analyze datasets of glycan binding results, researchers can download the model objects by selecting "Download" for any of the results. The downloaded "*.gbp" file is a file recognized by MotifFinder (file>Load Model). MotifFinder then can be used to make predictions on the binding to new glycans using the model.

Analysis methods consist of defining potential glycans or partial-glycans which may exist in the sample(s) and making predictions for binding of all glycan binding proteins in the analysis. Then the researcher only needs to look for some composition of the potential glycans which gives the least error between predicted and observed binding. One example of this approach is the GlycanSolver algorithm which uses a non-negative regression.
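The composition search described above can be sketched as a non-negative least-squares fit. This is only an illustration of the general idea and not the GlycanSolver implementation: the predicted binding matrix, the observed values, and the simple projected-gradient solver are all assumptions made for the example.

```python
def nnls_projected_gradient(P, y, steps=2000):
    """Minimise ||P w - y||^2 subject to w >= 0 by projected gradient descent.

    P[i][j]: predicted binding of protein i to candidate glycan j.
    y[i]:    observed binding of protein i in the sample.
    Returns the estimated non-negative amount w[j] of each candidate glycan.
    """
    n_rows, n_cols = len(P), len(P[0])
    # The gradient 2 * P^T (P w - y) has Lipschitz constant at most
    # 2 * (sum of squared entries of P), which gives a safe step size.
    step = 1.0 / (2.0 * sum(v * v for row in P for v in row))
    w = [0.0] * n_cols
    for _ in range(steps):
        residual = [sum(P[i][j] * w[j] for j in range(n_cols)) - y[i]
                    for i in range(n_rows)]
        grad = [2.0 * sum(P[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(n_cols)]
    return w


# Hypothetical predictions for 3 proteins (rows) and 2 candidate glycans (columns).
predicted = [[1.0, 0.0],
             [0.0, 1.0],
             [0.5, 0.5]]
# Hypothetical observed binding, generated here from a 70% / 30% mixture.
observed = [0.7, 0.3, 0.5]

weights = nnls_projected_gradient(predicted, observed)
print([round(w, 3) for w in weights])
```

In practice a tested solver (for example a library non-negative least-squares routine) would be preferable; the hand-written loop is used here only to keep the sketch dependency-free.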

Reading the Reports:

The CarboGrove reports contain detailed tables and plots to summarize the binding of proteins to glycans. Though the motifs can often be useful independently, they are best thought of in the context of the model which made them. The regression tree model starts with one or more initial motifs and identifies variations of those motifs which bind stronger or weaker. The "Relative Binding" of a motif is only meaningful in the context of the model, as each motif is evaluated in the context of its parent.

For a detailed breakdown of report components see the annotated Example Report.


Using the API

Generally, the API is intended to be run in two stages: first, search for the results or datasets to pull and get the associated metadata; then use the DataID or ResultID to retrieve the data with the get functions.

The different functions of the API and their return values are detailed on the API home page. In practice, the API is quite simple to use: options are passed to the API functions as GET parameters, which form part of the URL, for example:

This API call passes the text "ECL" to the "search" parameter of the list_gbp function.

Multiple parameters are simply separated by "&" like so:

All search parameters are matched based on the presence of the term in the field, so a search for CFG may also return results from NCFG.
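GET parameters can also be assembled programmatically rather than typed by hand. The sketch below builds the query string with Python's standard library. Only the list_gbp function and its "search" parameter come from the text above; the base URL is a placeholder, the path layout is assumed, and the second parameter name "provider" is hypothetical, so check the API home page for the real values.

```python
from urllib.parse import urlencode

# Placeholder address (an assumption): substitute the real API location.
API_BASE = "https://example.org/api"


def build_api_url(function_name, **params):
    """Return the URL for an API function with its GET parameters appended."""
    url = f"{API_BASE}/{function_name}"
    if params:
        # urlencode joins key=value pairs with "&" and escapes special characters.
        url += "?" + urlencode(params)
    return url


single = build_api_url("list_gbp", search="ECL")
# "provider" is a hypothetical parameter name used only to show the "&" separator.
multiple = build_api_url("list_gbp", search="ECL", provider="CFG")
print(single)
print(multiple)
```

The resulting string can be passed to any HTTP client, and the response body parsed as JSON.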

Return values are given as JSON objects. Large datasets are passed as a list of columns (where values are associated by row position), while searches return one object per row, with columns as fields.
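The column-oriented layout is compact but often less convenient than per-row records. A minimal sketch of the conversion follows; the payload and its field names ("GlycanID", "Binding") are invented for illustration and are not actual API output.

```python
import json

# Invented example of a column-oriented payload; real field names may differ.
payload = json.loads('{"GlycanID": [1, 2, 3], "Binding": [0.1, 2.5, 0.0]}')


def columns_to_rows(columns):
    """Turn {"col": [v0, v1, ...], ...} into [{"col": v0, ...}, {"col": v1, ...}, ...]."""
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("columns have unequal lengths")
    # zip(*...) walks the columns in parallel, one row position at a time.
    return [dict(zip(names, values))
            for values in zip(*(columns[name] for name in names))]


rows = columns_to_rows(payload)
print(rows[1])
```

After conversion, dataset rows have the same shape as the per-row objects returned by searches, so both can be handled by the same downstream code.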


Use Cases

Aiding Experimental Design:

CarboGrove can be a valuable aid for researchers designing experiments using glycan binding proteins. For this purpose, it is recommended to filter results by a single provider (see advanced search options). Where the glycan binding proteins are fixed, binding observations of interest can be investigated by looking up the results page for the glycan binding protein to see what sorts of glycans might be bound.
Where the researcher wishes to find glycan binding proteins with a specific binding preference, researchers may sort results by a motif that contains the epitope of interest. For example, to study terminal GalNAc structures you can sort by Blood Group A, terminal LacDiNAc, and Tn Antigen. Furthermore, all results which bind one of those structures can be downloaded, loaded into MotifFinder, and then, either by manually defining a list of glycans or using the MotifFinder's "Generate Glycans" tool, you can make predictions of the binding to glycans of interest and compare the binding profile of the glycan binding proteins.

Comparing Results Between Arrays:

The comparison between arrays demonstrated in the publication is fairly easy to do manually but could also be automated. A manual comparison can be made by searching for a glycan binding protein of interest and copying motif structures and relative binding values into a figure. Similar results can be achieved using the API: first use the search_results function to get ResultIDs for a protein of interest, then use the get_result function to pull the data for each of those ResultIDs. The "MotifReadable" version of the motif structure corresponds to the motif structures presented in the reports. Using the motif identifiers you can reproduce the relative binding shown in the reports: for all glycans at the concentration listed in the report, calculate the mean binding of the glycans assigned to each motif. These means are then rescaled such that the highest mean is 1 and the nonbinder group (Motif id "0") is 0. So if X is an array of means, then relative binding is equal to (X-X['0'])/max(X-X['0']).
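The rescaling formula above can be written out in a few lines. This is a sketch under the assumption that binding values have already been grouped by motif id for a single concentration; the numbers are made up for illustration.

```python
from statistics import mean


def relative_binding(binding_by_motif):
    """Rescale per-motif mean binding so motif "0" (nonbinders) is 0 and the top motif is 1."""
    means = {motif: mean(values) for motif, values in binding_by_motif.items()}
    baseline = means["0"]  # mean of the nonbinder group
    shifted = {motif: m - baseline for motif, m in means.items()}
    top = max(shifted.values())
    if top <= 0:
        raise ValueError("no motif binds more strongly than the nonbinder group")
    # (X - X['0']) / max(X - X['0'])
    return {motif: s / top for motif, s in shifted.items()}


# Made-up binding values keyed by motif id, all at one concentration.
example = {
    "0": [100.0, 200.0],
    "1": [1000.0, 1200.0],
    "2": [600.0, 700.0],
}
relative = relative_binding(example)
print({motif: round(value, 3) for motif, value in relative.items()})
```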

Comparing Glycan Binding Proteins:

To perform an analysis that requires a high-level summary of the results, it is best to use the motifsort data obtained using the API get_motifsort_data function. This data, combined with result details obtained from a call to the search_results function, allows for a comparison of the apparent affinity for different motifs as observed in the results. This is best done by using the result details to filter the results, keeping those which are derived from the same source and excluding results which aren't relevant (e.g. antibodies, PFAM "ig") or are derived from too few datasets. From here we can ask if there are patterns in apparent affinity for motifs based on PFAM or sequence (using the UniProt ID to pull sequence information where available).
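The filtering step might look something like the sketch below. The record layout is an assumption: the field names "Source", "PFAM", and "NDatasets", the example values, and the minimum of two datasets are all hypothetical and should be replaced with the fields actually returned by search_results and a threshold suited to the analysis.

```python
# Hypothetical result records; real field names and values may differ.
results = [
    {"ResultID": 1, "Source": "CFG", "PFAM": "Lectin_legB", "NDatasets": 6},
    {"ResultID": 2, "Source": "CFG", "PFAM": "ig", "NDatasets": 9},
    {"ResultID": 3, "Source": "CFG", "PFAM": "Gal-bind_lectin", "NDatasets": 1},
    {"ResultID": 4, "Source": "Other", "PFAM": "Lectin_legB", "NDatasets": 8},
]


def select_results(records, source, excluded_pfam=("ig",), min_datasets=2):
    """Keep results from one source, dropping excluded families and thinly supported results."""
    return [
        record for record in records
        if record["Source"] == source
        and record["PFAM"] not in excluded_pfam
        and record["NDatasets"] >= min_datasets
    ]


kept = select_results(results, source="CFG")
print([record["ResultID"] for record in kept])
```

The ResultIDs that survive the filter can then be used to select the matching entries in the motifsort data.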


Frequently Asked Questions

Where can I download the glycan binding data?

The raw glycan binding data is available through FigShare. This data consists of three tables: one for the binding observations, one for the dataset metadata, and one for the glycan structures. The only changes made to the binding data are to the results for the CUPRA array, which were transformed from their original values (depletion index or DI) to have a positive association with binding (transformed to be 1-DI).

Why aren't GlyTouCanIDs linked in the Report Page?

The conversion of glycan structures and assignment of GlyTouCanIDs is a newer addition to the database. Additionally the release version of the database presents the results as static reports. We are actively developing interactive versions of the reports which will feature glycan link outs, presentation of all concentrations, and binding curves for all glycans.

Where is X or how can I do Y?

If it's not explained here and you can't find a way to do it yourself, let me know via the contact page and I'll be happy to help figure out a way to get what you need.