Regular article
Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides

https://doi.org/10.1006/jmbi.1999.3444

Abstract

By analyzing the human antibody repertoire in terms of structure, amino acid sequence diversity and germline usage, we found that seven VH and seven VL (four Vκ and three Vλ) germline families cover more than 95 % of the human antibody diversity used. A consensus sequence was derived for each family and optimized for expression in Escherichia coli. In order to make all six complementarity determining regions (CDRs) accessible for diversification, the synthetic genes were designed to be modular and mutually compatible by introducing unique restriction endonuclease sites flanking the CDRs. Molecular modeling verified that all canonical classes were present. We could show that all master genes are expressed as soluble proteins in the periplasm of E. coli. A first set of antibody phage display libraries totalling 2×10^9 members was created after cloning the genes in all 49 combinations into a phagemid vector, itself devoid of the restriction sites in question. Diversity was created by replacing the VH and VL CDR3 regions of the master genes by CDR3 library cassettes, generated from mixed trinucleotides and biased towards natural human antibody CDR3 sequences. The sequencing of 257 members of the unselected libraries indicated that the frequency of correct and thus potentially functional sequences was 61 %. Selection experiments against many antigens yielded a diverse set of binders with high affinities. Due to the modular design of all master genes, either single binders or even pools of binders can now be rapidly optimized without knowledge of the particular sequence, using pre-built CDR cassette libraries. The small number of 49 master genes will allow future improvements to be incorporated quickly, and the separation of the frameworks may help in analyzing why nature has evolved these distinct subfamilies of antibody germline genes.

Introduction

The selection of antibody fragments from libraries using enrichment technologies such as phage-display (Smith & Scott, 1993), ribosome display (Hanes & Plückthun, 1997), bacterial display (Georgiou et al., 1997) or yeast display (Kieke et al., 1997) has proven to be a successful alternative to classical hybridoma technology (for recent reviews, see Winter et al., 1994; Hoogenboom et al., 1998; Spada et al., 1997; Rodi & Makowski, 1999). Phage display was developed first (Smith, 1985) and has been improved the furthest, especially in the antibody field. Conventional hybridoma technology is likely to be superseded by a combination of these technologies, since these approaches are faster, involve no animals, yield antibodies of at least comparable affinities and also work with self-antigens or toxic molecules (Hoogenboom et al., 1998). The selection of antibodies must start from an initial, highly diverse library. Here, we describe the construction of such a library by total gene synthesis, based on a structural analysis of the human antibody repertoire.

Human antibodies are of particular interest, since they are considered to be valuable for therapeutic applications (Carter & Merchant, 1997), avoiding the HAMA (human anti-mouse antibody) response frequently observed with rodent antibodies. Although it has been demonstrated in many examples (Dall’Acqua & Carter, 1998) that chimerization or humanization of rodent antibodies through protein engineering can successfully retain the affinity and specificity of the parental molecule (Baca et al., 1997), this strategy is time-consuming and still does not yield fully human antibodies.

Previous phage-display libraries of human antibodies have been generated from immunized donors (Barbas & Burton, 1996), germline sequences (Griffiths et al., 1994) or, most recently, naive B-cell Ig repertoires (Vaughan et al., 1996; Sheets et al., 1998; De Haard et al., 1999). Selection from these libraries by phage-display has yielded human antibodies against numerous haptens, peptides and proteins. While these libraries have all been successful, their uncontrollable composition and problems with the subsequent expression of the antibodies (see below) and restricted engineering possibilities made it desirable to use a complete protein engineering approach to solve the problem.

The success of obtaining high-affinity antibodies is generally assumed to be related to the initial library size (Perelson, 1989), even though the exact relation may not be tractable by theoretical considerations, as it may be antigen-dependent. Consequently, successful “one-pot” libraries have all been large (Griffiths et al., 1994; Vaughan et al., 1996; Sheets et al., 1998; De Haard et al., 1999). Importantly, only the functional library size, i.e. the number of correctly assembled clones without any frameshift, stop codon or deletion, contributes to the diversity. This number can be orders of magnitude below the apparent diversity usually reported, which is normally obtained by counting the number of transformants.
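The distinction between apparent and functional library size can be made concrete with a small calculation. The sketch below uses the two figures quoted in the abstract (2×10^9 members, and 61 % correct sequences among 257 sequenced clones). It assumes that the member count corresponds to the apparent size and that the sampled fraction is representative of the whole library; the function names are illustrative and not taken from the paper.

```python
import math


def functional_library_size(apparent_size: float, fraction_correct: float) -> float:
    """Scale the apparent size (e.g. transformant count) by the fraction of correct clones."""
    if not 0.0 <= fraction_correct <= 1.0:
        raise ValueError("fraction_correct must lie between 0 and 1")
    return apparent_size * fraction_correct


def fraction_standard_error(fraction_correct: float, clones_sequenced: int) -> float:
    """Binomial standard error of a fraction estimated from a finite sample of clones."""
    return math.sqrt(fraction_correct * (1.0 - fraction_correct) / clones_sequenced)


# Figures quoted in the abstract; treating them this way is an assumption of this sketch.
apparent = 2e9
fraction = 0.61
sequenced = 257

functional = functional_library_size(apparent, fraction)
std_err = fraction_standard_error(fraction, sequenced)

print(f"functional size estimate: {functional:.2e}")
print(f"standard error of the correct fraction: {std_err:.3f}")
```

Under these assumptions the functional size stays within the same order of magnitude as the apparent size, which is the property the library design aims for.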

It has been shown that the Escherichia coli expression yields of functional antibody fragments can vary dramatically, even if the antibody gene is expressed in the same format, vector and expression strain. This effect has been shown to depend on cellular folding, which in turn is influenced by the antibody sequence and can be successfully improved by protein engineering (Knappik & Plückthun, 1995). There is growing evidence that critical amino acid residues located in turns at the surface or at the variable-constant (V-C) interface are responsible for the misfolding, aggregation or even toxic effects on the E. coli cells, hence leading to poor expression yields. Mutating those residues improved expression titers several-fold, without adversely affecting the binding properties (Deng et al., 1994; Knappik & Plückthun, 1995; Ulrich et al., 1995; Jung & Plückthun, 1997; Nieba et al., 1997; Forsberg et al., 1997). As phage display depends on correctly folded antibodies, there is some selection against poor folders (Deng et al., 1994; Jackson et al., 1995; Jung & Plückthun, 1997; Bothmann & Plückthun, 1998), and thus the functional library size will be decreased. However, the selection is clearly not stringent enough to ensure that all molecules selected from a phage display library will have acceptable folding properties. Thus, to maintain diversity and secure reasonable expression properties of the selected molecules, it would be advantageous to create antibody libraries starting from well-expressed frameworks. While such approaches have been reported (Pini et al., 1998; Jirholt et al., 1998), only single frameworks have been used in these attempts, and consequently, the structural diversity does not approach that of other naive libraries.

The humoral immune system, however, does not work by the “single-pot” approach (Nissim et al., 1994), but rather uses an evolutionary strategy. The initial, antigen-independent variability is first generated during B-cell development by gene rearrangements (V(D)J-joining), leading to more than 10^9 different molecules at any one time in a human being (Winter, 1998). After a B-cell is activated, the antigen-driven process of somatic mutation is initiated (Rajewsky, 1996), and remarkable improvements in binding can be found. It has been shown that mutations occurring in CDRs 1 and 2 are preferentially selected (Wagner & Neuberger, 1996; Ignatovich et al., 1997; Green et al., 1998), as their diversity in the initial germline variants is much more limited than that of the CDR3s (Tomlinson et al., 1996). The design of an artificial library should make it convenient to follow this same approach. Indeed, previous experiments with peptides (Cwirla et al., 1997), RNA-aptamers (He et al., 1996) and antibodies (Schier et al., 1996a; Hanes et al., 1998) have shown that the evolutionary approach and, in the case of antibodies, CDR walking (Yang et al., 1995; Schier et al., 1996a; Wu et al., 1998) can dramatically improve affinities. However, in the absence of suitably engineered genes, such an optimization can be extremely laborious.

The human antibody germline repertoire has recently been completely sequenced. There are about 50 functional VH germline genes located on chromosome 14 (Tomlinson et al., 1992; Matsuda & Honjo, 1996), which can be grouped into six subfamilies according to sequence homology. About 40 functional VL kappa genes comprising seven subfamilies are located on chromosome 2 (Cox et al., 1994; Barbie & Lefranc, 1998), and about 30 functional VL lambda genes grouped into ten subfamilies can be found on chromosome 22 (Williams et al., 1996; Kawasaki et al., 1997; Pallares et al., 1998). The groups vary in size from one member (e.g. VH6 and Vκ4) to up to 22 members (VH3), and the members of each group share a high degree of sequence homology. By comparing rearranged sequences of human antibodies with their germline counterparts we (this work) and others (Cox et al., 1994; Ignatovich et al., 1997) have found that many human germline genes are never or only very rarely used during an immune response.

In structural terms, the VH and VL domains comprising the antigen binding Fv moiety (see Figure 1) share a common fold that, in its central portions, is almost perfectly superimposable, even when fragments from different species are compared (Chothia et al., 1998). Larger differences are observed only in the conformation of the CDRs, and it has been shown in a series of studies (Chothia & Lesk, 1987; Chothia et al., 1989; Al-Lazikani et al., 1997) that all CDRs except VH CDR3 adopt only a few distinct conformations. Hence the repertoire of conformations is limited to a relatively small number of discrete structural classes, depending on both the CDR length and the so-called canonical amino acid residues (Chothia & Lesk, 1987).

Here, we report the design, construction and analysis of a novel human antibody library concept designated HuCAL (Human Combinatorial Antibody Libraries). Each of the human VH and VL subfamilies that is frequently used during an immune response is represented by one consensus framework, resulting in seven HuCAL master genes for heavy chains and seven for light chains, and thus 49 combinations. All genes were made by total synthesis, thereby taking into consideration codon usage, unfavorable residues that promote protein aggregation as well as unique and general restriction sites flanking all CDRs, leading to modular genes that contain readily accessible CDRs and can be easily converted into different antibody formats.
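The combinatorial layout described above (seven heavy-chain master genes paired with four kappa and three lambda light-chain master genes) can be enumerated in a few lines. This is a minimal sketch; the labels such as "VH-1" or "Vkappa-1" are placeholders invented for illustration and are not the names used for the actual master genes.

```python
from itertools import product

# Placeholder labels (hypothetical): seven VH, four Vkappa and three Vlambda master genes.
heavy_chains = [f"VH-{i}" for i in range(1, 8)]
kappa_chains = [f"Vkappa-{i}" for i in range(1, 5)]
lambda_chains = [f"Vlambda-{i}" for i in range(1, 4)]
light_chains = kappa_chains + lambda_chains

# Every heavy chain is paired with every light chain.
master_combinations = list(product(heavy_chains, light_chains))

print(len(heavy_chains), "heavy x", len(light_chains), "light =", len(master_combinations))
```

Because each chain is a separate module, adding or improving one master gene changes only one row or column of this grid rather than requiring every pairing to be redesigned.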

A first set of antibody libraries based on the HuCAL concept was created by randomizing both the VH and VL CDR3 encoding regions of the 49 master genes using trinucleotide cassette mutagenesis (Virnekäs et al., 1994), which leads to high-quality libraries. The cassettes were designed such that the naturally occurring diversity was covered, both in terms of length and amino acid composition. The final HuCAL antibody libraries (HuCAL version 1) were extensively characterized by sequencing, expression behavior and numerous selection experiments against a wide variety of antigens.
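The idea behind randomizing with trinucleotides can be illustrated as follows: each randomized position is filled with a whole codon drawn from a defined mixture, so stop codons and unwanted residues are excluded by construction and the amino acid composition can be biased. The sketch below is only a simulation of that principle. The codon chosen for each amino acid, the omission of cysteine and the weights are all assumptions made for illustration; the actual codons and mixing ratios are not given in this excerpt.

```python
import random

# One codon per amino acid (illustrative choice; cysteine deliberately left out).
CODON_FOR = {
    "A": "GCT", "D": "GAT", "E": "GAA", "F": "TTT", "G": "GGT",
    "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG", "M": "ATG",
    "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT", "S": "TCT",
    "T": "ACT", "V": "GTT", "W": "TGG", "Y": "TAT",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical bias: a few residues are over-represented, all others equal.
WEIGHTS = {aa: 1.0 for aa in CODON_FOR}
WEIGHTS.update({"Y": 3.0, "G": 3.0, "S": 3.0})


def make_cassette(n_positions: int, weights: dict, rng: random.Random) -> str:
    """Draw one trinucleotide per randomized position according to the weights."""
    residues = rng.choices(list(weights), weights=list(weights.values()), k=n_positions)
    return "".join(CODON_FOR[aa] for aa in residues)


def translate(dna: str) -> str:
    """Translate a cassette back to amino acids using the same codon table."""
    codon_to_aa = {codon: aa for aa, codon in CODON_FOR.items()}
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, len(dna), 3))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
cassette = make_cassette(10, WEIGHTS, rng)
print(cassette, translate(cassette))
```

Varying `n_positions` across cassettes would model the length diversity mentioned above, and changing `WEIGHTS` models the compositional bias.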

Section snippets

Sequence analysis

Amino acid sequences from variable domains of human immunoglobulins were collected from Kabat (Kabat et al 1991, Johnson et al 1996;†) and Genbank (Benson et al., 1997) and incorporated into three databases, V heavy chain (VH), V kappa (Vκ) and V lambda (Vλ), and aligned, using the Kabat numbering system. For each of the three chain types, rearranged sequences were collected whenever more than 70 positions had been determined, giving 386, 149 and 675 entries for Vκ, Vλ and VH, respectively, at

Design of consensus frameworks

The compilation of rearranged sequences was first divided into separate groups (four Vκ, three Vλ and seven VH) according to the germline families described above. These protein sequence databases were used to compute the consensus sequences of each subgroup. By using the rearranged sequences instead of the germline sequences for calculating the consensus, the consensus was automatically weighted according to the frequency of usage. Additionally, frequently mutated and highly conserved
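A per-position majority consensus over aligned sequences can be sketched as below. This is not the authors' procedure, only an illustration of why computing the consensus from rearranged sequences weights it by usage: a family member that appears more often contributes more votes at each position. The toy sequences are invented for the example.

```python
from collections import Counter


def consensus(aligned_sequences: list) -> str:
    """Return the most frequent residue at each position of equally long aligned sequences."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # majority residue in this column
        for column in zip(*aligned_sequences)
    )


# Toy aligned fragments (hypothetical): three derive from one frequently used
# germline gene, one from a rarely used gene of the same family.
rearranged = ["QVQLV", "QVQLL", "QVKLV", "EVQLV"]

print(consensus(rearranged))
```

In this toy set each variant residue is outvoted by the three sequences that carry the common residue at that position, which mirrors the automatic weighting by frequency of usage.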

Molecular modeling and analysis

To obtain more information about the packing, CDR conformations and framework properties, all seven VH frameworks, all four Vκ frameworks and the three Vλ frameworks were built via homology modeling. As a basis, a complete structural alignment of the approximately 100 independent antibody sequences available in the PDB (Bernstein et al., 1977) was carried out as indicated in the legend to Figure 3. Usually, the template with the highest resolution and the fewest mutations relative to the

Construction of the seven VH and seven VL master genes

The final result of the analysis described above was a collection of 14 amino acid sequences, which represent the frequently used antibody repertoire of the human immune system. These sequences were then back-translated into DNA sequences. In a first step, the back-translation was carried out using only codons that are known to be used frequently in E. coli. In a second step, these gene sequences were then examined for all possible restriction endonuclease sites, which could be introduced
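Checking where a restriction site could be introduced without changing the encoded protein amounts to searching the synonymous codon choices for each stretch of residues. The sketch below does this by brute force for a six-base site using the standard genetic code. It does not model the preference for frequently used E. coli codons, and EcoRI (GAATTC) is used purely as an example, since the enzymes chosen by the authors are not named in this excerpt.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def silent_site_positions(peptide: str, site: str) -> list:
    """Residue indices at which `site` can start while the peptide stays unchanged."""
    hits = []
    for start in range(len(peptide)):
        segment = peptide[start:start + 3]  # a 6-base site spans at most 3 codons
        for codons in product(*(SYNONYMS[aa] for aa in segment)):
            offset = "".join(codons).find(site)
            if 0 <= offset < 3:  # site begins inside the first codon of the window
                hits.append(start)
                break
    return hits


# Hypothetical peptide fragment; Glu-Phe can be encoded as GAA TTC, i.e. an EcoRI site.
print(silent_site_positions("MEFK", "GAATTC"))
```

A design tool along these lines would additionally have to confirm that each chosen site is unique across all master genes and the vector, which this sketch does not attempt.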

E. coli expression analysis

The E. coli expression of the 49 scFv genes (all containing the same VH and VL CDR3s from the antibody hu4D5, see Carter et al., 1992a) was studied similarly as described by Knappik & Plückthun (1995). We found that all 49 master genes could be expressed as soluble proteins in the periplasm of E. coli, yielding a band of the correct size in FLAG Western blots of soluble E. coli crude extracts (data not shown). This indicates that all 49 combinations are most likely capable of forming VH/VL

Design and construction of CDR3 library cassettes

Our rational approach to creating an antibody library aims at defining, with the smallest number of molecules possible, a structural diversity as large as possible. At the same time, it was important to design molecules that are likely to be stable and fold well. Furthermore, it was essential to direct the sequence diversity to those residues most likely in contact with the antigen. We decided for the first set of HuCAL libraries to randomize both CDR3 regions of the VH and VL genes

Diversity and binding constants

Phage-display as well as ribosome-display selection experiments were performed against a variety of antigens, including proteins, peptides, or whole cells. The HuCAL1 library comprising all 49 combinations was used for selection experiments. Two or three panning rounds of phage display, or five or six rounds of ribosome display were performed in each case. After the final round, the selected scFv genes were subcloned as a pool in an expression vector and the transformants were screened for

Discussion

Here, we describe the realization of the concept of fully synthetic human antibody libraries, designated HuCAL, which are built on seven VH and seven VL consensus frameworks, yielding 49 combinations in total.

We have extensively used these first libraries for the successful selection of highly specific binders against all kinds of antigens, including haptens, DNA, peptides, and proteins, including cell-bound receptor antigens (unpublished results). Intrinsic affinities down to the sub-nanomolar

Conclusions and perspective

The HuCAL concept is based on covering the essential features of the human antibody repertoire with a minimal number of different sequences, which are designed to facilitate extensive manipulation with standard protein engineering techniques. The 49 combinations of master genes have been cloned as scFv genes in both orientations and as Fab genes. Other formats like Fv fragments stabilized for example by disulfide-bridges Glockshuber et al 1990, Brinkmann et al 1995, Rodrigues et al 1995 or

Bacterial strains, phages, vectors

Molecular cloning was carried out using the E. coli strains JM83 (Yanisch-Perron et al., 1985), XL1-Blue (Stratagene) or Top10 (Invitrogen). For expression experiments, JM83 was used. Phage-display libraries were generated and propagated using E. coli TG1 as host strain and M13KO7 or VCSM13 as helper phage (all from Stratagene). The products from gene synthesis were cloned in pZero-1 (Invitrogen) or pCR-Script SK(+) (Stratagene) for sequencing. The pBS vector series used for antibody cloning

Acknowledgements

We thank Ilona Meyer, Peter Rudolph, Martina Mayer, Isabel Rapp, Ursi Holzinger and Siegfried Hirler for excellent technical assistance, Leodevico Ilag for advice in structural analysis, and Titus Kretzschmar and William Reisdorf for critical reading of the manuscript. We thank Josef Hanes, Barbara Krebs, Titus Kretzschmar, Ralf Ostendorp, Josef Prassler, Christine Rothe, Silke Reiffert, Christiane Schaffitzel and Robert Schier for providing the data given in Table 4. We thank David Fischer,

References (118)

  • C. Chothia et al.

    Structural determinants in the sequences of immunoglobulin variable domain

    J. Mol. Biol

    (1998)
  • G.P. Cook et al.

    The human immunoglobulin V-H repertoire

    Immunol. Today

    (1995)
  • W. Dall’Acqua et al.

    Antibody engineering

    Curr. Opin. Struct. Biol

    (1998)
  • H.J. De Haard et al.

    A large non-immunized human Fab fragment phage library that permits rapid isolation and kinetic analysis of high affinity antibodies

    J. Biol. Chem

    (1999)
  • J. De Kruif et al.

    Selection and application of human single chain Fv antibody fragments from a semi-synthetic phage antibody display library with designed CDR3 regions

    J. Mol. Biol

    (1995)
  • S.J. Deng et al.

    Selection of antibody single-chain variable fragments with improved carbohydrate binding by phage display

    J. Biol. Chem

    (1994)
  • H. Dorsam et al.

    Antibodies to steroids from a small human naive IgM library

    FEBS Letters

    (1997)
  • J. Foote et al.

    Antibody framework residues affecting the conformation of the hypervariable loops

    J. Mol. Biol

    (1992)
  • G. Forsberg et al.

    Identification of framework residues in a secreted recombinant antibody fragment that control production level and localization in Escherichia coli

    J. Biol. Chem

    (1997)
  • Y.Y. He et al.

    In vitro evolution of the DNA binding sites of Escherichia coli methionine repressor, MetJ

    J. Mol. Biol

    (1996)
  • H.R. Hoogenboom et al.

    By-passing immunisation. Human antibodies from synthetic repertoires of germline VH gene segments rearranged in vitro

    J. Mol. Biol

    (1992)
  • H.R. Hoogenboom et al.

    Antibody phage display technology and its applications

    Immunotechniques

    (1998)
  • O. Ignatovich et al.

    The creation of diversity in the human immunoglobulin V-lambda repertoire

    J. Mol. Biol

    (1997)
  • P. Jirholt et al.

    Exploiting sequence space-shuffling in vivo formed complementarity determining regions into a master framework

    Gene

    (1998)
  • A. Krebber et al.

    Reliable cloning of functional antibody variable domains from hybridomas and spleen cell repertoires employing a reengineered phage display system

    J. Immunol. Methods

    (1997)
  • A.C. Langedijk et al.

    The nature of antibody heavy chain residue H6 strongly influences the stability of a VH domain lacking the disulfide bridge

    J. Mol. Biol

    (1998)
  • F. Matsuda et al.

    Organization of the human immunoglobulin heavy-chain locus

    Advan. Immunol

    (1996)
  • V. Morea et al.

    Conformations of the third hypervariable region in the VH domain of immunoglobulins

    J. Mol. Biol

    (1998)
  • S. Munro et al.

    An Hsp70-like protein in the ER: identity with the 78 kD glucose-regulated protein and immunoglobulin heavy chain binding protein

    Cell

    (1986)
  • B. Oliva et al.

    Automated classification of antibody complementarity determining region 3 of the heavy chain (H3) loops into canonical forms and its application to protein structure prediction

    J. Mol. Biol

    (1998)
  • P.A. Patten et al.

    Applications of DNA shuffling to pharmaceuticals and vaccines

    Curr. Opin. Biotechnol

    (1997)
  • A. Pini et al.

    Design and use of a phage display library-human antibodies with subnanomolar affinity against a marker of angiogenesis eluted from a two-dimensional gel

    J. Biol. Chem

    (1998)
  • K. Proba et al.

    Antibody scFv fragments without disulfide bonds made by molecular evolution

    J. Mol. Biol

    (1998)
  • D.J. Rodi et al.

    Phage-display technology-finding a needle in a vast molecular haystack

    Curr. Opin. Biotechnol

    (1999)
  • F.A. Saul et al.

    Structural patterns at residue positions 9, 18, 67 and 82 in the VH framework regions of human and murine immunoglobulins

    J. Mol. Biol

    (1993)
  • R. Schier et al.

    Isolation of high-affinity monomeric human anti-c-erbb-2 single chain Fv using affinity-driven selection

    J. Mol. Biol

    (1996)
  • R. Schier et al.

    Isolation of picomolar affinity anti-c-erbb-2 single-chain Fv by molecular evolution of the complementarity determining regions in the center of the antibody binding site

    J. Mol. Biol

    (1996)
  • H. Shirai et al.

    Structural classification of CDR-H3 in antibodies

    FEBS Letters

    (1996)
  • G.P. Smith et al.

    Libraries of peptides and proteins displayed on filamentous phage

    Methods Enzymol

    (1993)
  • E. Söderlind et al.

    Domain libraries: synthetic diversity for de novo design of antibody V-regions

    Gene

    (1995)
  • S. Spada et al.

    Reproducing the natural evolution of protein structural features with the selectively infective phage (SIP) technology: the kink in the first strand of antibody kappa domains

    J. Mol. Biol

    (1998)
  • B. Steipe et al.

    Sequence statistics reliably predict stabilizing mutations in a protein domain

    J. Mol. Biol

    (1994)
  • I.M. Tomlinson et al.

    The repertoire of human germline VH sequences reveals about 50 groups of VH segments with different hypervariable loops

    J. Mol. Biol

    (1992)
  • J.C. Almagro et al.

    Structural differences between the repertoires of mouse and human germline genes and their evolutionary implications

    Immunogenetics

    (1998)
  • C.F. Barbas et al.

    Semisynthetic combinatorial antibody libraries: a chemical solution to the diversity problem

    Proc. Natl Acad. Sci. USA

    (1992)
  • S.M. Barbas et al.

    Recognition of DNA by synthetic antibodies

    J. Am. Chem. Soc

    (1994)
  • V. Barbie et al.

    The human immunoglobulin kappa variable (IGKV) genes and joining (IGKJ) segments

    Exp. Clin. Immunogenet

    (1998)
  • S. Barre et al.

    Structural conservation of hypervariable regions in immunoglobulins evolution

    Nature Struct. Biol

    (1994)
  • B. Baskin et al.

    Characterization of the CDR3 region of rearranged alpha heavy chain genes in human fetal liver

    Clin. Exp. Immunol

    (1998)
  • D.A. Benson et al.

    Genbank

    Nucl. Acids Res

    (1997)

    Supplementary material for this paper is available from JMB Online.

    1

    Edited by I. A. Wilson

    2

    Present addresses: L. Ge, Xerion Pharmaceuticals, Fraunhoferstr. 9, 82152 Martinsried, Germany; P. Pack, MTM Laboratories AG, Heidelberg, Germany.
