Allergy has become a common chronic health problem in recent years, and the introduction of recombinant proteins into foods and other products has further raised public concern about allergy. Predicting food allergenicity is an important goal, but progress in this area remains unsatisfactory. Using the vast amount of data produced by genomic, functional, and structural studies, bioinformatic comparisons of proteins can provide useful insights into allergenicity. The Allergen Database for Food Safety (ADFS) is a web-based database of allergenic proteins relevant to food safety. The ADFS was launched as a project of the Division of Biochemistry of the National Institute of Health Sciences, with partial support from a grant from the Ministry of Health, Labor and Welfare of Japan. The database component of the ADFS contains information on allergen names, sources, sequences, structures, domains, IgE epitopes, and literature references, as well as links to the major protein databases (UniProt, PDB, InterPro, Pfam) and the medical literature database (NCBI PubMed). The site also provides sequence search tools (Protein Search and Epitope Search) that show how similar a given protein or peptide is to known allergens. These and other bioinformatics tools can be used to rapidly identify potential cross-reactivity between allergens and to screen novel proteins for IgE epitopes that they may share with known allergens.



We are members of the Division of Biochemistry of the National Institute of Health Sciences in Tokyo, Japan. Dr. Reiko ADACHI and Dr. Shinobu SAKAI supervised the construction of this site, and CTC Life Science Corporation constructed the database system.



When you register with the ADFS website, you will be asked to provide your e-mail address. Your address will be kept strictly private, and we will not send any information to you except in emergencies. We will also collect information regarding the name of the domain from which you access the internet and the internet address of the website from which you linked directly to our site. We cannot identify who you are from this information. If you choose to provide us with personal information, as in an e-mail message, we will store the question and your personal information so that we can answer your question and respond by e-mail or any other contact information that you have provided. We will not disclose, give, sell or transfer any personal information about our visitors unless required by law enforcement officials or a legal statute.



While we aim to provide data that is as accurate as possible, you are strongly recommended to check the data and outcomes that you retrieve from this site against original scientific literature and other sources of information. No rights or damage can be claimed based on data provided and/or generated by this site and its tools.


The allergen and isoallergen data in the ADFS were collected from the literature and from AllergenOnline (http://www.allergenonline.org/), whose allergen entries are peer-reviewed by international experts in allergology. Because AllergenOnline identifies allergen sequences by NCBI GenInfo Identifier (gi) numbers, we replaced these gi numbers with the corresponding UniProt IDs before adding the entries to our database. The allergen data were last updated in July 2014.


The epitope sequences listed in the ADFS were obtained from publications indexed in NCBI PubMed, retrieved primarily with the following keywords: Immunoglobulin E, IgE, IgE-epitope, IgE-binding, Epitope, Epitope Mapping, Linear, Conformational, Structural, Discontinuous, Three-dimensional, Sequence, Analysis, Identification, Peptide, Mimotope, Bacteriophages, Phage, Display. All retrieved epitope sequences were added to the database. Some of the epitope data reported before 2002 were taken from the SDAP database (http://fermi.utmb.edu/SDAP/). The epitope data were last updated in July 2010. The epitopes collected in this way are linear and are labeled "L". In addition to linear epitopes, we collected data on conformational epitopes described in publications indexed in NCBI PubMed, using the keywords conformational, structural, discontinuous, and three-dimensional; these are labeled "C". In several cases a carbohydrate moiety itself acts as an IgE-binding epitope; such epitopes are labeled "S".



Allergen data in the ADFS are presented in two kinds of windows. Summary tables and sequence search results are shown in the main window, and detailed allergen data are shown in an Entry View window. Additional windows can be opened as needed. Note that each row in a main-window table corresponds to one allergen/isoallergen data entry with its own gi number. Because a single allergen often has multiple gi numbers, the row count shown at the top left of the table is not necessarily equal to the number of unique allergens. If multiple entries share the same allergen/isoallergen name, clicking that name opens the same Entry View window.
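The difference between the row count and the number of unique allergens can be pictured as a simple grouping step. The sketch below is illustrative only: the row format, the helper name, and the identifiers are hypothetical placeholders, not real ADFS or gi records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def group_rows_by_allergen(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (allergen name, sequence ID) table rows by allergen name.

    Each name maps to one Entry View, so the number of keys is the number
    of unique allergens, while len(rows) is the table's row count.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, seq_id in rows:
        groups[name].append(seq_id)
    return dict(groups)

# Hypothetical rows: two entries share one allergen name.
rows = [("Allergen A", "id-001"), ("Allergen A", "id-002"), ("Allergen B", "id-003")]
groups = group_rows_by_allergen(rows)
print(len(rows), len(groups))  # 3 rows, 2 unique allergens
```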


The allergens listed in the ADFS are named according to the IUIS Allergen Nomenclature (http://www.allergen.org/) wherever possible. However, some allergen entries have no name assigned by the IUIS. We named these allergens as follows: the first three or four letters of the genus of the source organism, followed by the first one or two letters of the species, followed by "?". For example, an unassigned allergen in peanut (Arachis hypogaea) with gi:14347293 is named "Ara h ?". Taxonomic names, common names, and allergen descriptions were collected automatically from UniProt and GenBank.
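The naming rule for unassigned allergens can be expressed as a short function. This is a sketch assuming the source organism's scientific name is given as "Genus species"; the function name and its parameters are hypothetical, and the choice between three or four genus letters (or one or two species letters) is left to the caller because the rule does not say when each is used.

```python
def placeholder_allergen_name(scientific_name: str,
                              genus_letters: int = 3,
                              species_letters: int = 1) -> str:
    """Build an ADFS-style name for an allergen with no IUIS name.

    Hypothetical helper: genus prefix + species prefix + "?".
    """
    if genus_letters not in (3, 4) or species_letters not in (1, 2):
        raise ValueError("use 3-4 genus letters and 1-2 species letters")
    genus, species = scientific_name.split()[:2]
    return f"{genus[:genus_letters].capitalize()} {species[:species_letters].lower()} ?"

print(placeholder_allergen_name("Arachis hypogaea"))  # Ara h ?
```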


The allergens in the ADFS were classified into 13 categories: aero animal, aero fungi, aero insect, aero mite, aero plant, contact, food animal, food fungi, food plant, gliadin, protozoan, venom/salivary, and worm, based on allergen types in AllergenOnline.


As described above for the allergen data, our allergen sequences were derived from AllergenOnline by mapping gi numbers to the corresponding UniProt IDs whenever possible. Residue numbers of epitope sequences are counted with the initial methionine as position 1. If the corresponding UniProt sequence does not include the initial methionine, its first residue is counted as position 1.
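The residue-numbering convention can be illustrated by locating an epitope in its parent sequence and reporting 1-based positions. This is a minimal sketch assuming exact string matching on one-letter amino acid codes; the helper name and the toy sequences are hypothetical.

```python
from typing import List, Tuple

def epitope_positions(protein: str, epitope: str) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) positions of every exact occurrence.

    Position 1 is the first residue of `protein`: the initial methionine
    when the sequence includes it, otherwise whatever residue comes first.
    """
    positions = []
    start = protein.find(epitope)
    while start != -1:
        positions.append((start + 1, start + len(epitope)))
        start = protein.find(epitope, start + 1)
    return positions

print(epitope_positions("MKDPYSPS", "DPYS"))  # [(3, 6)]
print(epitope_positions("KDPYSPS", "DPYS"))   # [(2, 5)], no initial Met
```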


Based on the UniProt data, PDB and/or HSSP IDs are provided as structural information on each allergen. Clicking an ID opens the corresponding PDB entry page in a new window.


Based on the UniProt data, InterPro and/or Pfam IDs are provided as domain information for each allergen. Clicking an ID opens the corresponding InterPro or Pfam entry page in a new window.


Linear epitope sequences are shown on each allergen's Entry View page, and each sequence is linked directly to the BLAST search tools. Clicking "Epitope" or "Protein" next to an epitope sequence opens an Epitope Search or Protein Search window, respectively, with the epitope sequence already entered. With this function you can easily search for other allergens or proteins that might cross-react with the original allergen through that epitope. Along with each linear epitope sequence, information on the original reference is presented (PubMed ID, UniProt accession number, title, authors, journal name, and method of epitope analysis). When several references report epitopes for the same allergen, the epitopes from each reference are cited separately. For conformational epitopes, only the original reference information is given. If a carbohydrate moiety itself acts as an epitope, the deglycosylation treatment, the assay method, and the glycosylated residue number are shown where available.


If the UniProt entry for an allergen contains carbohydrate information, a "Sugar" icon is displayed in the main window, and the glycosylated residue number(s) and the type of sugar linkage are shown in the Entry View window.


You can search for structurally related allergens by clicking the magnifying-glass (loupe) icon next to a PDB, HSSP, InterPro, or Pfam ID. This runs a keyword search for that ID and opens the results.



You can search for allergens in the ADFS by name, keyword, or category. Keyword and category searches are novel features of our database. The site also provides sequence search tools that let you assess the sequence similarity of allergen-related proteins or peptides. These search methods are described below.


The allergens and isoallergens are alphabetically listed by name. You can select an initial letter to list the allergens.


The allergens are grouped into 13 categories: aero animal, aero fungi, aero insect, aero mite, aero plant, contact, food animal, food fungi, food plant, gliadin, protozoan, venom/salivary, and worm. Each category is distinguished by a different color. You can select a category to list all the allergens belonging to it.


You can search several data fields for allergens by entering keywords or phrases. Only allergens that contain all of your keywords are returned. To match a long phrase exactly, enclose it in quotation marks (e.g. "lipid transfer protein"). A wildcard search is also available by checking the "Use Wildcard" checkbox. Optionally, you can restrict your search to a particular category or to allergens that have epitope, structure, and/or sugar information.
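The matching rules described above (every keyword must match, quoted phrases match as a unit, wildcards are optional) can be sketched as follows. This is an assumption-laden illustration, not the site's implementation: it checks a single text string rather than several data fields, the function names are hypothetical, and it assumes `*` is the wildcard character, which the text does not specify.

```python
import re
import shlex

def _term_regex(term: str, use_wildcard: bool) -> "re.Pattern[str]":
    if use_wildcard:
        # ASSUMPTION: "*" stands for any run of characters.
        pattern = ".*".join(re.escape(part) for part in term.split("*"))
    else:
        pattern = re.escape(term)  # "*" is matched literally
    return re.compile(pattern, re.IGNORECASE)

def matches_keywords(query: str, text: str, use_wildcard: bool = False) -> bool:
    """True if `text` contains every keyword or quoted phrase in `query`."""
    terms = shlex.split(query)  # a quoted phrase stays one term
    return all(_term_regex(t, use_wildcard).search(text) for t in terms)

desc = "Non-specific lipid transfer protein"
print(matches_keywords("lipid protein", desc))     # True: both words occur
print(matches_keywords('"lipid protein"', desc))   # False: not an exact phrase
print(matches_keywords("lipid*protein", desc, use_wildcard=True))  # True
```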


You can search the ADFS for allergens similar to a query protein using the NCBI BLAST algorithm (protein-protein BLAST (blastp) or position-specific iterated BLAST (PSI-BLAST)) with adjustable optional parameters. You can also search UniProt for proteins similar to your query protein.
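For readers who want to run a comparable search locally, the two program choices correspond to the NCBI BLAST+ command-line programs `blastp` and `psiblast`. The sketch below only builds the argument list; it assumes BLAST+ is installed and that a protein database has been built with `makeblastdb` under the hypothetical name `adfs_allergens`. The helper name and parameter values are illustrative, not the site's settings.

```python
from typing import List

def build_protein_search_cmd(query_fasta: str, db: str = "adfs_allergens",
                             program: str = "blastp", evalue: float = 10.0,
                             iterations: int = 3) -> List[str]:
    """Build a BLAST+ command line for a protein-protein search.

    `db` is a hypothetical local database name; `iterations` is used only
    by PSI-BLAST and is an illustrative value.
    """
    if program == "blastp":
        cmd = ["blastp"]
    elif program == "psiblast":
        cmd = ["psiblast", "-num_iterations", str(iterations)]
    else:
        raise ValueError("program must be 'blastp' or 'psiblast'")
    # -outfmt 6 requests tabular output (one hit per line).
    return cmd + ["-query", query_fasta, "-db", db,
                  "-evalue", str(evalue), "-outfmt", "6"]

print(" ".join(build_protein_search_cmd("query.fasta", program="psiblast")))
```

If BLAST+ is available, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`; building a list rather than a shell string avoids quoting problems with file names.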


You can search the ADFS for epitope sequences similar to a query peptide using the NCBI BLAST algorithm in its mode for short, nearly exact matches, with adjustable optional parameters. The query must be longer than four amino acid residues. You can also search UniProt for peptide sequences similar to your query peptide.
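What a "short, nearly exact match" means can be illustrated without BLAST by sliding the query peptide along a target sequence and counting identical residues at each ungapped offset. This is a deliberately simplified stand-in, not the BLAST algorithm the site uses (no scoring matrix, no gaps, no E-values); the function name and toy sequences are hypothetical, and the length check mirrors the "longer than four residues" rule stated above.

```python
from typing import Tuple

def best_ungapped_match(query: str, target: str) -> Tuple[int, int]:
    """Return (identical residues, 1-based start in target) of the best window.

    Returns (0, 0) if the target is shorter than the query or nothing matches.
    """
    if len(query) <= 4:
        raise ValueError("query must be longer than four residues")
    best = (0, 0)
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        identical = sum(a == b for a, b in zip(query, window))
        if identical > best[0]:  # keep the first best-scoring window
            best = (identical, offset + 1)
    return best

print(best_ungapped_match("DPYSPS", "MKDPYSPSRL"))  # (6, 3): exact match
print(best_ungapped_match("DPWSPS", "MKDPYSPSRL"))  # (5, 3): one mismatch
```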


The low-molecular-weight allergens are alphabetically listed by name. You can select an initial letter to list the allergens.


You can search several data fields for low-molecular-weight allergens by entering keywords or phrases. Only allergens that contain all of your keywords are returned. To match a long phrase exactly, enclose it in quotation marks. A wildcard search is also available by checking the "Use Wildcard" checkbox.
After selecting the "Mol Weight" field, you can enter a numeric query for molecular weight as follows:
(1) Enter =200 to find compounds whose molecular weight is equal to 200.
(2) Enter < 200 to find compounds whose molecular weight is less than 200.
(3) Enter >= 100 to find compounds whose molecular weight is greater than or equal to 100.
(4) Enter > 100 , < 200 to find compounds whose molecular weight is greater than 100 and less than 200. Separate the two conditions with a comma.
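The numeric query syntax above can be parsed into a list of conditions that must all hold. This sketch assumes only the four operators shown in the examples (=, <, >=, >) and comma-separated conditions; the function names are hypothetical, and the site's actual parser may accept more forms.

```python
import operator
import re
from typing import Callable, List, Tuple

# Only the operators shown in the examples above.
_OPS = {"=": operator.eq, "<": operator.lt, ">=": operator.ge, ">": operator.gt}
# ">=" is tried before ">" so the two-character operator wins.
_CONDITION = re.compile(r"^\s*(>=|=|<|>)\s*(\d+(?:\.\d+)?)\s*$")

def parse_mw_query(query: str) -> List[Tuple[Callable[[float, float], bool], float]]:
    """Parse e.g. '> 100 , < 200' into (operator, value) pairs."""
    conditions = []
    for part in query.split(","):
        match = _CONDITION.match(part)
        if match is None:
            raise ValueError(f"unrecognized condition: {part.strip()!r}")
        symbol, value = match.groups()
        conditions.append((_OPS[symbol], float(value)))
    return conditions

def mw_matches(query: str, mol_weight: float) -> bool:
    """True if mol_weight satisfies every comma-separated condition."""
    return all(op(mol_weight, value) for op, value in parse_mw_query(query))

print(mw_matches("> 100 , < 200", 150.2))  # True
print(mw_matches("> 100 , < 200", 200))    # False: upper bound is exclusive
```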


You can also search for allergens by chemical structure, drawing the query structure in JChemPaint. Select the type of structure search: Sub Structure or Exact Match.



This site provides tools for allergenicity prediction based on two different methods: the FAO/WHO method and the Motif-based method. Note that a positive prediction does not necessarily mean that the query protein is allergenic. Nevertheless, sequence-based allergenicity predictions provide important information for identifying potential cross-reactivity with known allergens.


The FAO/WHO method used on this site is based on the report of a Joint FAO/WHO Expert Consultation on Foods Derived from Biotechnology. Your query sequence is compared with the ADFS allergen sequence database using FASTA, and it is predicted to be allergenic if, for any allergen, both the amino acid identity and the length of the aligned (overlapping) region are more than their threshold values. The default thresholds are 35% identity and an overlap of 80 residues. You can also obtain the full FASTA alignment, with E-value, between your sequence and any allergen in the ADFS database. If the full-sequence comparison gives a positive result, the query protein is judged allergenic and the corresponding window cells are colored pink. Yellow cells indicate that only one of the two criteria (amino acid identity or overlap length) was above its threshold.
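The pink/yellow cell logic for full-length comparisons can be sketched as a small classifier over FASTA results. This assumes the identity percentage and overlap length have already been read from a FASTA alignment; the `FastaHit` type, the function names, and the allergen labels are hypothetical, and "more than" is read as strictly greater, which the text does not pin down.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FastaHit:
    """One FASTA alignment between the query and an ADFS allergen (hypothetical type)."""
    allergen: str
    identity_pct: float  # percent identity over the aligned region
    overlap_len: int     # length of the aligned (overlapping) region

def classify_hit(hit: FastaHit, min_identity: float = 35.0, min_overlap: int = 80) -> str:
    # ASSUMPTION: "more than the threshold" means strictly greater.
    above_identity = hit.identity_pct > min_identity
    above_overlap = hit.overlap_len > min_overlap
    if above_identity and above_overlap:
        return "pink"    # both criteria met: predicted allergenic
    if above_identity or above_overlap:
        return "yellow"  # only one criterion met
    return "none"

def fao_who_full_length(hits: List[FastaHit]) -> Tuple[bool, List[str]]:
    """Overall prediction is positive if any hit is pink."""
    colors = [classify_hit(h) for h in hits]
    return "pink" in colors, colors

hits = [FastaHit("allergen-1", 42.0, 120), FastaHit("allergen-2", 51.0, 40)]
print(fao_who_full_length(hits))  # (True, ['pink', 'yellow'])
```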
In addition to the FASTA comparisons used to assess overall structural similarity, short-range similarity can be analyzed by searching for exact word matches, that is, runs of contiguous amino acid residues in the query sequence that are identical to segments of sequences in the ADFS database. Your sequence is predicted to be allergenic if the length of the longest identical segment is more than the threshold value (default: 6). If a positive result is obtained, the number of exact word matches and the detailed results are shown.
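The exact word-match check can be sketched by collecting every run of `word_len` contiguous residues in each allergen and looking for the same run in the query. A shared run of k residues exists exactly when the longest identical segment is at least k residues long, so this sketch treats any shared word of `word_len` residues as a hit; whether the site counts a segment of exactly the threshold length is an assumption left to the `word_len` parameter. The function name is hypothetical and the allergen sequence is a toy string, not real data.

```python
from typing import Dict, List, Tuple

def exact_word_matches(query: str, allergens: Dict[str, str],
                       word_len: int = 6) -> List[Tuple[str, int, str]]:
    """List (allergen, 1-based query position, word) for every shared word."""
    hits = []
    for name, seq in allergens.items():
        # All contiguous runs of word_len residues in this allergen.
        words = {seq[i:i + word_len] for i in range(len(seq) - word_len + 1)}
        for i in range(len(query) - word_len + 1):
            word = query[i:i + word_len]
            if word in words:
                hits.append((name, i + 1, word))
    return hits

toy = {"allergen-1": "MVKLSCLLLALALLGA"}
print(exact_word_matches("XXKLSCLLYY", toy))  # [('allergen-1', 3, 'KLSCLL')]
```

The number of exact word matches reported by the site corresponds here to `len(hits)`.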


Motif-based prediction of the allergenicity of a query sequence, by comparison with allergens in the ADFS, is also available. Following the method described by Stadler et al. (FASEB J. 17, 1141, 2003), 83 allergen motifs and 158 allergen sequences were extracted from 1,412 individual allergen sequences in the ADFS using the MEME motif discovery tool. The potential allergenicity of your query protein is assessed in two steps. First, your query sequence is compared with the allergen motifs using the profile analysis tool pftool; if it matches any motif, the protein is predicted to be allergenic. If not, your query sequence is then compared, using the BLAST algorithm (protein-protein BLAST (blastp)) with an adjustable E-value cut-off, with the 158 allergen sequences that did not match any of the allergen motifs. If your query protein is predicted to be allergenic, the Total cell and the Motif and/or BLAST result cells are colored pink.
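The two-step decision can be written as a small function that runs the BLAST step only when no motif matches. The motif scan (pftool in the original method) and the BLAST search are passed in as hypothetical callables because their invocation details are not described here; the function name is hypothetical, and counting a hit when its E-value is at or below the cut-off is an assumption about how the cut-off is applied.

```python
from typing import Callable, Dict, Iterable, Optional

def motif_based_prediction(
    query: str,
    matches_motif: Callable[[str], bool],
    blast_evalues: Callable[[str], Iterable[float]],
    evalue_cutoff: float,
) -> Dict[str, Optional[bool]]:
    """Two-step prediction: motif match first, BLAST only as a fallback.

    `matches_motif` stands in for the motif scan and `blast_evalues` for a
    blastp search against the sequences that matched no motif.
    """
    motif_hit = matches_motif(query)
    blast_hit: Optional[bool] = None  # None means step 2 was not run
    if not motif_hit:
        # ASSUMPTION: a hit counts if its E-value is at or below the cut-off.
        blast_hit = any(e <= evalue_cutoff for e in blast_evalues(query))
    return {"Motif": motif_hit, "BLAST": blast_hit,
            "Total": motif_hit or bool(blast_hit)}

no_motif = lambda q: False
print(motif_based_prediction("MKTAYIAK", no_motif, lambda q: [0.5, 1e-5], 1e-3))
```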

The construction of this site and its updates were supported by a grant from the Ministry of Health, Labor and Welfare.


Copyright NIHS. All Rights Reserved.