Database Features

ADFS (Allergen Database for Food Safety) is an allergen database comprising allergen information and allergenicity prediction tools for food safety. Around the world, people, especially children, are more likely than ever before to develop food allergies. Severe allergic reactions, called “anaphylaxis”, can be life-threatening, which is why governments require major allergens to be labeled on food products to reduce this risk in advance. However, sensitivity to allergens varies among individuals, and food ingredients that are not known allergens may induce allergic reactions in some cases; it is therefore crucial to update allergen information based on clinical medicine. NIHS, as a public regulatory institute, provides up-to-date allergen information on ADFS so that food safety can be properly evaluated for those who need it. The allergen prediction tool AllerSTAT, a machine learning-based algorithm for identifying allergens, may offer the best available knowledge on the allergy safety of unidentified allergens and of new proteins in genetically modified organisms (GMOs).

Aims of ADFS:

・Providing access to a searchable allergen database

・Assessing allergenic potential with a prediction model based on machine learning

Expected outcomes:

・Understanding the features of known allergens

・Predicting the potential risk of food allergy

・Implementing optimal risk assessment in food product development

Listed data:

・IgE epitopes: collected from peer-reviewed literature by the ADFS multidisciplinary team

・Allergenic proteins:

The list is compiled from literature sources and AllergenOnline (http://www.allergenonline.org/), whose allergens have been peer-reviewed by international allergology experts. Supplemental information on each allergen, such as sequence, glycosylation, and structure, was added by referring to public sequence databases (e.g., UniProt).

・Low-molecular-weight (LowMolWt) allergenic compounds:

The list was extracted from the general allergen information provided in AllAllergy (the database is currently not available) after our careful review.

This database has been supported by the Health and Labor Sciences Research Grants program.

Use case

Cutting-edge technologies such as genome editing and synthetic biology allow us to produce novel foods and functional proteins. Although these foods and proteins can benefit human health, their toxicity and allergenicity need to be assessed accurately. Predicting the allergenicity of such foods is a very important topic, yet progress in this area remains unsatisfactory. ADFS provides an allergen list and sequence search tools that help predict the allergenic potential of candidate proteins.

Allergen List

Those who want to know which substances cause food allergies can access allergen-related information consisting of allergen names, sources, sequences, structures, domains, IgE epitopes, and literature references, as well as links to the major protein (UniProt, PDB, InterPro, Pfam) and medical literature (NCBI PubMed) servers.

Allergen Prediction

ADFS tools can be used to assess potential allergenic cross-reactivity with four different types of sequence identity search.

・Homology Search (Epitope and Protein)

Using the FASTA program, “Epitope Search” and “Protein Search” check whether a query protein is homologous to known IgE epitopes or highly similar to known allergens, respectively.

・FAO/WHO

A report from a Joint FAO/WHO Expert Consultation on Foods Derived from Biotechnology proposed that cross-reactivity between a query protein and a known allergen must be considered when there is 1) more than 35% identity in the amino acid sequence of the expressed protein, using a window of 80 amino acids and a suitable gap penalty, or 2) identity of 6 contiguous amino acids. Using the FAO/WHO allergenicity prediction on our site, you can analyze the potential allergenicity of your query protein according to the two criteria above. It should be noted that a positive prediction does not necessarily mean that your query protein is allergenic.

・AllerSTAT

ADFS introduces a data-driven approach: a machine learning method was constructed to detect amino-acid subsequences of proteins that contribute to allergenicity. Subsequences that occur only in allergenic proteins with statistical significance are used to assess allergenic potential.

About NIHS

The National Institute of Health Sciences (NIHS) conducts testing, research, and studies toward the proper evaluation of the quality, safety, and efficacy of pharmaceutical products, foods, and the numerous chemicals in the living environment. NIHS also provides information on drugs, medical devices, food, and chemicals.

Contact us

Please use the contact form to reach the ADFS support team.

We apologize for any inconvenience that may arise and thank you for your understanding.

About this site

1) The Purpose of This Site
2) Who We Are
3) Privacy Policy
4) Data Collection
5) Allergen Data
6) Search tools
7) Allergenicity Prediction
8) Acknowledgement

1) The Purpose of This Site

Food allergy is on the rise around the world and is an increasingly common health problem. New biotechnology techniques, including gene editing and genetic modification, are bringing about a food revolution that benefits the food supply. Such techniques, however, have the potential to generate new proteins that have never been expressed before, which has raised further public concern about allergy. Bringing these food sources to market requires a risk assessment of food allergenicity, but progress in this area remains unsatisfactory. Using the vast amount of data produced by genomic, functional, and structural studies, bioinformatic approaches might provide useful insights into allergenicity. The database component of the ADFS contains information on allergen names, sources, sequences, structures, domains, IgE epitopes, and literature references, as well as links to the major protein (UniProt, PDB, InterPro, Pfam) and medical literature (NCBI PubMed) servers. This site also provides sequence search tools that enable the user to examine the sequence homology between a given protein or peptide and known allergens (protein and epitope search). These and other bioinformatics tools can be used to rapidly determine potential cross-reactivities between allergens and to screen novel proteins for the presence of IgE epitopes that they may share with known allergens.

2) Who We Are

We are members of the Division of Biochemistry of the National Institute of Health Sciences, located in Kanagawa, Japan.

3) Privacy Policy

If you provide us with personal information, as in an e-mail message, we will store it in order to respond to you by e-mail or through any other contact information that you have provided. We will not disclose, give, sell, or transfer any personal information about our visitors unless required by law enforcement officials or a legal statute.

4) Data Collection

While we aim to provide data that is as accurate as possible, you are strongly recommended to check the data and outcomes that you retrieve from this site against original scientific literature and other sources of information. No rights or damage can be claimed based on data provided and/or generated by this site and its tools.

4-1) Allergens listed in ADFS

The allergen and isoallergen data in the ADFS were collected from literature sources and AllergenOnline (http://www.allergenonline.org/), whose allergens have been peer-reviewed by international allergology experts. As the sequence data of allergens in AllergenOnline are represented in NCBI GenInfo Identifier (gi) format, we replaced these gi numbers with the corresponding UniProt IDs when adding them to our database. The latest data were collected in Jan 2025.

4-2) Epitope

Data published in the NCBI PubMed database were used to obtain the epitope sequences listed in the ADFS; these data were primarily collected using the following keywords: Immunoglobulin E, IgE, IgE-epitope, IgE-binding, Epitope, Epitope Mapping, Linear, Conformational, Structural, Discontinuous, Three-dimensional, Sequence, Analysis, Identification, Peptide, Mimotope, Bacteriophages, Phage, Display. All epitope sequences that were retrieved were added to the database. Some of the epitope data reported before 2002 were cited from the SDAP database (http://fermi.utmb.edu/SDAP/). The latest data were collected in Jan 2025. The epitope sequences collected using the above method were linear and were thus labeled as "L". In addition to linear epitopes, we also collected data on conformational epitopes described in published references in the NCBI PubMed database using the following keywords: conformational, structural, discontinuous, and three-dimensional. We labeled conformational epitopes as "C". In several cases, a carbohydrate moiety itself acts as an IgE-binding epitope, and we labeled such epitopes as "S".

5) Allergen Data

Allergen data in the ADFS is presented in two different windows. A summarized table or sequence search results are shown in the main window, and detailed allergen data are shown in an Entry View window. Optional windows can be opened, if necessary. Note that each row in the main window tables corresponds to each allergen/isoallergen data entry which has an individual gi number. That means the number of rows in the table shown on the top-left is not necessarily identical to the number of unique allergens since it is often the case that one allergen has multiple gi numbers. However, if multiple data entries have an identical allergen/isoallergen name, clicking the name will open the same Entry View window.

5-1) Allergen Names

The allergens listed in the ADFS were named according to IUIS Allergen Nomenclature (http://www.allergen.org/) as far as possible. However, some allergen entries did not have corresponding names assigned by IUIS. In such cases, we have named these allergens as follows: the first three or four letters of the genus of the organism + the first one or two letter(s) of the species of the organism + “?”. For example, we have named an unassigned allergen in peanut (Arachis hypogaea) which has gi:14347293 as "Ara h ?". Taxonomic and common names of allergens and allergen descriptions were automatically collected from UniProt and GenBank.
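The placeholder naming rule above can be sketched in a few lines. This is only an illustration: the function name and its parameters are hypothetical, and because the text does not say when three versus four genus letters (or one versus two species letters) are used, that choice is left to the caller.

```python
def placeholder_allergen_name(genus: str, species: str,
                              genus_len: int = 3, species_len: int = 1) -> str:
    """Build a placeholder name for an allergen without an IUIS name.

    Hypothetical helper: the rule allows 3-4 genus letters and 1-2 species
    letters; which length applies in a given case is an assumption here.
    """
    if genus_len not in (3, 4) or species_len not in (1, 2):
        raise ValueError("genus_len must be 3 or 4; species_len must be 1 or 2")
    genus_part = genus[:genus_len].capitalize()
    species_part = species[:species_len].lower()
    # The trailing "?" marks the allergen number as unassigned.
    return f"{genus_part} {species_part} ?"


# Peanut example from the text above.
print(placeholder_allergen_name("Arachis", "hypogaea"))  # Ara h ?
```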

5-3) Sequences

As described in 4-1, our allergen sequences were derived from AllergenOnline by converting gi numbers to the corresponding UniProt IDs whenever possible. For the epitope sequences, amino acid residue numbers are given such that the first methionine is regarded as position one. If the corresponding sequence in UniProt does not include the first methionine, the first residue of the sequence is regarded as position one.
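As a small illustration of this numbering convention, the sketch below reports where an epitope occurs in a protein sequence using 1-based, inclusive residue positions. The function name and the toy sequence are hypothetical and are not taken from the database.

```python
from typing import List, Tuple


def epitope_positions(protein: str, epitope: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of each exact occurrence.

    Position 1 is the first residue of the stored sequence (the initial
    methionine when the sequence includes it).
    """
    if not epitope:
        raise ValueError("epitope must not be empty")
    hits = []
    start = protein.find(epitope)
    while start != -1:
        hits.append((start + 1, start + len(epitope)))
        # Continue one residue later so overlapping occurrences are found too.
        start = protein.find(epitope, start + 1)
    return hits


toy_protein = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
print(epitope_positions(toy_protein, "AYIA"))  # [(4, 7)]
```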

5-4) Structural Information

Based on the UniProt data, PDB and/or HSSP IDs were included as structural information on the allergens. The corresponding PDB entry page will open in a new window.

5-5) Domain Information

Based on the UniProt data, InterPro and/or Pfam IDs were represented as domain information applicable to the allergens. The corresponding InterPro and/or Pfam entry page will open in a new window.

5-6) Epitope Information

Linear epitope sequences are shown on the individual pages for each allergen, and each sequence is directly linked to the BLAST search tool. If you click "Epitope" or "Protein" (next to an epitope sequence), an Epitope Search or Protein Search window will open, respectively, with the epitope sequence pre-entered. Using this function, you can easily search for other allergens or proteins that might be cross-reactive with the original allergen through the epitope. In addition to the linear epitope sequences, information on the original reference (PubMed No., UniProt accession number, reference title, author, journal name, and methods of epitope analysis) is also presented. When multiple references are available, the epitope information from each reference is cited individually. For conformational epitopes, only information on the original reference is cited. If a carbohydrate moiety itself acts as a bioactive epitope, the deglycosylation treatment, assay method, and residue number of glycosylation are indicated, if available.

5-7) Carbohydrate Information

If an allergen has carbohydrate information in its UniProt data, a "Sugar" icon will be displayed in the main window, and the carbohydrate residue number(s) and the type of sugar linkage will be shown in the Entry View window.

5-8) Searching for related allergens

You can search for structurally related allergens by clicking a loupe icon next to the PDB/HSSP/InterPro/Pfam IDs. The results of a keyword search using the ID will then be opened.

6) Search tools

You can search for allergens in the ADFS by name, keywords, or category. Keyword and category searches are novel characteristics of our database. The site also provides sequence search tools that enable you to obtain the sequence homology of certain allergen-related proteins or peptides. The details of these search methods are described below.

6-1) Allergen Search (Name Search)

The allergens and isoallergens are alphabetically listed by name. You can select an initial letter to list the allergens.

6-2) Allergen Search (Category Search)

The allergens are grouped into 13 categories: aero animal, aero fungi, aero insect, aero mite, aero plant, contact, food animal, food fungi, food plant, gliadin, protozoan, venom/salivary, and worm. Each category is distinguished by a different color. You can select a category to list all the allergens belonging to it.

6-4) Sequence Search (Protein Search)

You can search for allergen(s) in the ADFS similar to any query protein based on the NCBI BLAST algorithm (protein-protein BLAST (blastp) or position-specific iterated BLAST (PSI-BLAST)) using the appropriate optional parameters. You can also search for proteins in UniProt that are similar to your query protein. As a rule of thumb, if the E-value is smaller than 0.0001 (1.0e-4), the query protein should be considered structurally homologous to the allergen protein. In particular, high identity (> 50%) together with a smaller E-value (< 1.0e-7) indicates possible cross-reactivity.
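The rule of thumb above can be written as a small decision function. This is a minimal sketch that assumes the E-value and percent identity have already been read from a BLAST hit; the function name and the returned labels are hypothetical and are not output produced by the site.

```python
def interpret_hit(e_value: float, identity_pct: float) -> str:
    """Interpret one BLAST hit using the thresholds quoted above."""
    # Stricter rule first: high identity and a very small E-value.
    if e_value < 1e-7 and identity_pct > 50.0:
        return "possible cross-reactivity"
    # General rule: E-value below 1.0e-4 suggests structural homology.
    if e_value < 1e-4:
        return "structurally homologous"
    return "no significant similarity"


print(interpret_hit(1e-10, 62.0))
print(interpret_hit(1e-5, 30.0))
print(interpret_hit(0.5, 80.0))
```

Checking the stricter rule first matters because any hit that satisfies it also satisfies the general E-value rule.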

6-5) Sequence Search (Epitope Search)

You can search for epitope sequence(s) in ADFS similar to any query peptide based on the NCBI BLAST algorithm (search for short, nearly exact matches) using the appropriate optional parameters. More than four amino acid residues are required to perform a search. You can also search for peptide sequences in UniProt that are similar to your query peptide.

6-6) Low Mol Wt Allergens (Name Search)

The low-molecular-weight allergens are listed alphabetically by name. You can select an initial letter to list the allergens.

6-7) Low Mol Wt Allergens (Keyword Search)

You can search several data fields for low-molecular-weight allergen(s) by entering keywords (phrases). Allergens that include all of your keywords will be returned. If you want results that include a long, exact phrase, please put quotation marks around the phrase. A wildcard search option is also available by checking the “Use Wildcard” checkbox.
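The keyword search behavior described above (all keywords must match, quoted phrases are kept together, optional wildcards) can be sketched as follows. How the site actually implements this is not documented here, so this is an assumption-laden illustration: matching is taken to be case-insensitive, `*` is taken to be the wildcard character, and the function name and the toy records are hypothetical.

```python
import fnmatch
import shlex
from typing import Dict, List


def keyword_search(records: List[Dict[str, str]], query: str,
                   use_wildcard: bool = False) -> List[Dict[str, str]]:
    """Return records whose combined fields contain every term of the query."""
    # shlex.split keeps a double-quoted phrase together as a single term.
    terms = [term.lower() for term in shlex.split(query)]
    results = []
    for record in records:
        text = " ".join(record.values()).lower()
        if use_wildcard:
            # Assumption: "*" matches any run of characters inside the text.
            matched = all(fnmatch.fnmatchcase(text, f"*{term}*") for term in terms)
        else:
            matched = all(term in text for term in terms)
        if matched:
            results.append(record)
    return results


# Toy records for illustration only; not real database entries.
records = [
    {"name": "example compound alpha", "source": "toy entry one"},
    {"name": "example compound beta", "source": "toy entry two"},
]
print(keyword_search(records, "compound alpha"))
print(keyword_search(records, '"compound beta"'))
print(keyword_search(records, "comp*beta", use_wildcard=True))
```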

7) Allergenicity Prediction

This site provides tools for allergenicity prediction based on two different methods (the FAO/WHO method and the AllerSTAT method). It should be noted that the predictions do not necessarily mean that the query protein exhibits allergenicity. Nevertheless, allergenicity predictions based on protein sequences provide important information that is helpful in identifying potential cross-reactivity with known allergens.

7-1) FAO/WHO method

The FAO/WHO method for allergenicity prediction used on this site is based on a report from a Joint FAO/WHO Expert Consultation on Foods Derived from Biotechnology. Your query sequence is compared against the ADFS allergen sequence database for shared structural similarity using FASTA, and your sequence is predicted to be allergenic if both the amino acid identity and the overlap length between your query and any allergen exceed their threshold values. The default threshold values for identity and overlap length are 35% and 80 residues, respectively. You can also obtain a full FASTA alignment with E-value for your sequence and any allergen in the ADFS database. If a positive result is obtained using full-sequence identities, the allergenicity of the query protein is judged as positive and the window cells are colored pink. When the window cells are colored yellow, either the amino acid identity or the overlap length between the query and an allergen exceeded its threshold value.
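The pink/yellow logic described above can be sketched as a simple classification. This is only an illustration, assuming the identity and overlap length have already been taken from a FASTA alignment; the function name, the "none" label, and the strict "greater than" comparison at the boundary are assumptions rather than documented site behavior.

```python
def fao_who_flag(identity_pct: float, overlap_len: int,
                 identity_threshold: float = 35.0,
                 overlap_threshold: int = 80) -> str:
    """Classify one query-allergen alignment as 'pink', 'yellow', or 'none'."""
    identity_ok = identity_pct > identity_threshold
    overlap_ok = overlap_len > overlap_threshold
    if identity_ok and overlap_ok:
        return "pink"    # both criteria exceeded: predicted allergenic
    if identity_ok or overlap_ok:
        return "yellow"  # only one of the two criteria exceeded
    return "none"        # hypothetical label for "neither exceeded"


print(fao_who_flag(40.0, 120))
print(fao_who_flag(40.0, 50))
print(fao_who_flag(20.0, 60))
```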

In addition to the FASTA comparisons used to assess overall structural similarity, short-range similarities can be analyzed to find exact word matches between contiguous amino acid residues in the query sequence and sequences within the ADFS database. Your sequence is predicted to be allergenic if the length of the longest identical segment is more than the threshold value (6 is the default value). If a positive result is obtained using a small exact word match, the number of exact word matches and the detailed results will be shown.
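The exact word match can be illustrated by finding the longest identical contiguous segment shared by a query and one allergen sequence. The sketch below uses a standard dynamic-programming longest-common-substring search, which is not necessarily how the site computes it. The function name and toy sequences are hypothetical, and the final comparison treats a segment of exactly 6 residues as a match, following the "6 contiguous amino acids" wording of the FAO/WHO criterion; the site's handling of that boundary is an assumption.

```python
def longest_shared_segment(query: str, allergen: str) -> str:
    """Return the longest contiguous segment found in both sequences."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best segment within query
    prev = [0] * (len(allergen) + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (len(allergen) + 1)
        for j in range(1, len(allergen) + 1):
            if query[i - 1] == allergen[j - 1]:
                # Extend the identical run that ends at the previous residues.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return query[best_end - best_len:best_end]


# Toy sequences for illustration only.
segment = longest_shared_segment("GGGACDEFGHKK", "TTACDEFGHPP")
print(segment, len(segment))  # ACDEFGH 7
print(len(segment) >= 6)      # True (boundary handling is an assumption)
```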

7-2) AllerSTAT

Data-driven machine-learning predictions of allergenicity are also available in the ADFS. Following the method described by Goto et al. (submitted, under review), allergen motifs were extracted from allergen sequences using the Fast-WY motif discovery tool. The potential allergenicity of your query protein is analyzed based on the structural similarity between your query sequence and the allergen motifs using BLAST-P. If your query matches any of the allergen motifs, the protein is predicted to be allergenic, and the analysis result window is colored pink.

8) Acknowledgement

The construction of this site and its updates were supported by a grant from the Ministry of Health, Labor and Welfare.