1) What is domIns all about?
Welcome to the new and updated domIns website! domIns is a web resource
aimed at providing comprehensive information on domain insertions
in proteins of known structure. To identify insertions, we follow the
definition of protein domains used in the SCOP (Structural Classification
of Proteins) database. The server is currently based on SCOP version 1.71
and the March 2006 version of PDB_Select. The previous version of domIns
used SCOP version 1.61 and the April 2002 version of PDB_Select.
2) What are domain insertions?

In the above figure, the E. coli protein Malonyl-CoA:Acyl Carrier Protein
transacylase has two domains: the catalytic domain (coloured blue and green)
is interrupted by the insertion of the ACP-binding domain (coloured
red and yellow). The parent domain (catalytic domain) has two regions,
with residue positions 3-127 and 128-307 in the same domain.
The parent and insert domains belong to different superfamilies.
A similar arrangement is seen in Streptomyces coelicolor
malonyl-CoA:ACP transacylase (PDB code: 1nm2). This is an
example of a single insertion, where the parent domain is interrupted
by a single insert domain. In multiple insertions, there is more
than one insert domain.
3) How does one identify domain insertions?
Although there are several schemes for classifying protein structures,
SCOP is important because it is a manually curated classification of
proteins of known structure from the Protein Data Bank, based on their
structural and evolutionary relatedness. In SCOP, a protein domain is
considered a unit of evolution if it occurs independently or in
combination with other domains, on the basis of evidence from proteins
of known structure. SCOP has a hierarchical classification scheme with
the principal levels being family, superfamily, fold and class. Proteins
clustered together into families are clearly evolutionarily related, and
the relationship is usually detectable at the sequence level. Proteins
brought together into superfamilies have low sequence identity, but their
structural and functional features suggest a common evolutionary origin.
Superfamilies with similar topology, but without evidence for evolutionary
relatedness, are grouped under a fold. Folds are then classified into
classes based on the secondary structure elements present.
We have considered only the first five classes (All-alpha, All-beta,
alpha/beta, alpha+beta and Small proteins) and the fold and superfamily
levels of the SCOP hierarchy for determining insertions. We excluded
mono-domain proteins and considered only chains with at least two domains.
In multi-domain proteins, two domains are usually linked in a linear
fashion, i.e., the C-terminus of the first domain is covalently linked to
the N-terminus of the second domain. We instead looked for domains that
are interrupted in the middle by the insertion of another domain, so that
the second domain (insert) begins and ends inside the first domain
(parent domain). The domains involved in insertions can come from the
same or different SCOP superfamilies.
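The criterion above (an insert that begins and ends inside its parent) can be sketched in a few lines of Python. This is only an illustration, not the code behind domIns: the function name find_insertions, the dictionary layout and the residue ranges in the usage note are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One contiguous stretch of a domain: inclusive (start, end) residue numbers.
Segment = Tuple[int, int]


def find_insertions(domains: Dict[str, List[Segment]]) -> List[Tuple[str, str]]:
    """Return (parent, insert) pairs for the domains of a single chain.

    `domains` maps a domain label to its list of segments. The layout is
    hypothetical; it only mimics the idea of SCOP-style domain boundaries.
    """
    pairs = []
    for parent, parent_segs in domains.items():
        if len(parent_segs) < 2:
            continue  # a continuous domain cannot be interrupted
        segs = sorted(parent_segs)
        for insert, insert_segs in domains.items():
            if insert == parent:
                continue
            ins_start = min(start for start, _ in insert_segs)
            ins_end = max(end for _, end in insert_segs)
            # The whole insert must lie in a gap between two consecutive
            # segments of the parent, i.e. begin and end inside the parent.
            for (_, left_end), (right_start, _) in zip(segs, segs[1:]):
                if left_end < ins_start and ins_end < right_start:
                    pairs.append((parent, insert))
                    break
    return pairs
```

For a made-up chain such as {"parent": [(1, 100), (181, 300)], "insert": [(101, 180)]}, the function reports the pair ("parent", "insert"), whereas two domains linked in a linear fashion, such as {"first": [(1, 100)], "second": [(101, 200)]}, yield no pairs.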
4) About the access methods, how does one use the “browse all entries” option?
This option allows you to browse all PDB entries with at least one
insertion. There is also an option to view entries from a non-redundant
set of proteins. We have used PDB_Select to obtain a representative list
of protein chains from the PDB. PDB_Select contains several lists, each at
a different similarity cutoff. Although the most stringent is the 25%
list, in which no two proteins have more than 25% sequence identity (for
alignments of 80 or more residues), we have used the 90% list.
The lists can be obtained from:
http://bioinfo.tg.fh-giessen.de/pdbselect/
The algorithm to extract the lists is explained in:
"Selection of a representative set of structures from the Brookhaven
Protein Data Bank", Protein Science 1 (1992), 409-417.
5) How can you do a simple search?
A simple search allows one to look for insertions, given a PDB identifier
with or without chain information. A query may return no result for one
of the following reasons:
(a) There is no known insertion
(b) There is no SCOP classification available for the structure
(c) The structure is not part of a true class (a to g) as defined
in SCOP
(d) We may have missed the insertion, in which case we would be
grateful if you could let us know.
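As an illustration of accepting a PDB identifier with or without chain information, here is a minimal Python sketch of query parsing. The query syntax (a 4-character identifier optionally followed by a colon and a chain character) and the name parse_query are assumptions for this example; the actual domIns search form may accept input differently.

```python
import re
from typing import Optional, Tuple

# Assumed query form: a 4-character PDB ID (a digit followed by three
# alphanumerics), optionally followed by ':' and one chain character.
QUERY_RE = re.compile(r"^([0-9][A-Za-z0-9]{3})(?::([A-Za-z0-9]))?$")


def parse_query(text: str) -> Tuple[str, Optional[str]]:
    """Split a query into (pdb_id, chain); chain is None when not given."""
    match = QUERY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PDB identifier: {text!r}")
    pdb_id, chain = match.groups()
    # PDB IDs are case-insensitive, chain identifiers are not.
    return pdb_id.lower(), chain
```

Under these assumptions, parse_query("1nm2") gives ("1nm2", None) and parse_query("1NM2:A") gives ("1nm2", "A").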
6) Can one obtain a list going by insertion type?
We have categorized known insertions as single or multiple depending on
the number of insert domains in a given chain. In single insertions, a
domain belonging to a particular superfamily is inserted into another
domain of the same superfamily or of a different superfamily. In multiple
insertions, more than one insert, of the same or different superfamilies,
is inserted into the parent domain. There is a feature to display entries
belonging to either of these categories.
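The single/multiple distinction reduces to counting the insert domains of a chain. A minimal Python sketch follows, assuming the inserts of a chain are already available as a list of labels; the function name insertion_type and the labels are hypothetical.

```python
from typing import List


def insertion_type(insert_domains: List[str]) -> str:
    """Classify a chain by the number of insert domains it contains."""
    if not insert_domains:
        raise ValueError("chain has no insert domains")
    # Exactly one insert is a single insertion; anything more is multiple.
    return "single" if len(insert_domains) == 1 else "multiple"
```

For example, insertion_type(["insert_1"]) returns "single", while insertion_type(["insert_1", "insert_2"]) returns "multiple".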
7) How can we obtain a list based on insertion combination?
We have provided a search facility in which insertions are grouped by the
combination of SCOP classes. For example, clicking the cell marked 1 will
retrieve the list of entries where the parent domain belongs to the
alpha/beta class and the insert belongs to the alpha+beta class.
The list of entries with a specific parent or insert class can also be
obtained by clicking the individual classes on the top-most horizontal
row for parent classes or the first vertical column for insert classes.
For example, the cell marked 2 will retrieve all entries which have at
least one parent domain belonging to the All-alpha class, while clicking
the cell marked 3 will retrieve all entries which have at least one insert
domain belonging to the alpha+beta class.
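The grouping behind such a class-combination table can be sketched in Python as three lookups: one per (parent class, insert class) cell, one per parent class and one per insert class. The record layout, the function name index_by_combination and the entry labels are assumptions for illustration, not the actual domIns schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One record per insertion: (entry label, parent class, insert class).
Record = Tuple[str, str, str]


def index_by_combination(records: Iterable[Record]):
    """Build the three lookups that a class-combination table needs."""
    by_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    by_parent: Dict[str, Set[str]] = defaultdict(set)
    by_insert: Dict[str, Set[str]] = defaultdict(set)
    for entry, parent_cls, insert_cls in records:
        by_pair[(parent_cls, insert_cls)].add(entry)  # one table cell
        by_parent[parent_cls].add(entry)              # parent class header
        by_insert[insert_cls].add(entry)              # insert class header
    return by_pair, by_parent, by_insert
```

With two made-up records, ("entry_1", "alpha/beta", "alpha+beta") and ("entry_2", "All-alpha", "alpha+beta"), the cell ("alpha/beta", "alpha+beta") holds only entry_1, while the insert class "alpha+beta" holds both entries. Sets are used so that a chain with several insertions of the same kind is listed once.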
For each entry (chain) in the database, we provide the following
information: the name of the protein, its biochemical function, the
Medline reference for the structure, the number of domains, their
boundaries (based on the SCOP domain definitions), sequence information,
and links to SCOP, CATH, FSSP, PDBSum and MMDB.
8) What sort of software packages/tools went into the making of domIns?
We have used MySQL and HTML pages to create the resource.