Current Organic Chemistry, 2006, 10, 00-001

Quantitative Analysis of Biomolecular NMR Spectra: A Prerequisite for the Determination of the Structure and Dynamics of Biomolecules

Thérèse E. Malliavin*

Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie PhysicoChimique, 13 rue P. et M. Curie, 75 005 Paris, France.

*Address correspondence to this author at the Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie PhysicoChimique, 13 rue P. et M. Curie, 75 005 Paris, France; Tel: (33) 1 58 41 51 68; Fax: (33) 1 58 41 50 26; E-mail: [email protected]

© 2006 Bentham Science Publishers Ltd.

Abstract: Over the last two decades, Nuclear Magnetic Resonance (NMR) has become an important method for biomolecular structure determination. NMR makes it possible to study biomolecules in solution and gives access to molecular flexibility at the atomic level over a complete structure; in that respect, it occupies a unique place in structural biology. During the first years of its development, NMR tried to meet the requirements previously defined in X-ray crystallography. NMR then started to determine its own criteria for the definition of a structure. Indeed, the atomic coordinates of an NMR structure are calculated using restraints on geometrical parameters (angles and distances) of the structure, which are only indirectly related to atom positions; in that respect, NMR and X-ray crystallography are very different. Because the relation between the NMR measurements and the molecular structure and dynamics is indirect, the precision and interpretation of the NMR parameters, and the development of quantitative analysis methods, are critical. The methods published since 1997 for liquid-NMR of proteins are reviewed here. First, methods for structure determination are presented, as well as methods for spectral assignment and for structure quality assessment. Second, the quantitative analysis of structure mobility is reviewed.

1. INTRODUCTION

At the beginning of the 1990s, Nuclear Magnetic Resonance (NMR) became a method of choice for structural biology in solution. NMR was facing a previously developed and well-established technique, X-ray crystallography, and the use of NMR had to be developed in structural biology. To favor this development, NMR had to keep characteristics similar to those of X-ray crystallography, i.e.: (i) use as much as possible the protocols, software and force-field parameters developed for X-ray crystallography, (ii) obtain a convergence of the atomic coordinates similar to that obtained for crystal structures.

Since the end of the 1990s, the definition of a biomolecular structure has evolved. The unfolded or flexible state of a protein came to be seen as a possible biologically significant state, and conformational transitions of proteins were proposed to be the basis of neurodegenerative diseases. The ability of NMR to observe the internal mobility of molecules in solution thus became a major advantage of the method.

Structure determination by NMR is based on the measurement of geometrical parameters (distances and angles) between atoms. The influence of the NMR measurements on the structure determination is a key point for several reasons. (i) The parameters measured by NMR are qualitatively very different from the electron density from which crystallographic structures are determined, and the scientific knowledge accumulated in X-ray crystallography thus cannot be transferred directly to NMR. (ii) The relationship between the geometrical and the NMR parameters is fuzzy, because each NMR parameter depends not only on the structure, but also on the internal dynamics of the molecule, and these two aspects are difficult to separate.

The development of methods to quantitatively investigate structure and/or internal mobility from NMR measurements thus plays an important role in the development of NMR in structural biology, and the present review is oriented towards the presentation of these methods.

The review focuses on proteins, on liquid-NMR, and on articles published since 1997. The experiments developed for the measurement of new NMR phenomena and parameters are not described. This review is not intended to present the processing methods used to transform the free induction decay signal into a spectral signal.

Other reviews were recently published about specific aspects of the quantitative analysis of NMR spectra. Two reviews deal with the validation of protein models determined by NMR [1, 2]. A larger number of reviews present methods for automatic spectral assignment and structure calculation [3-8]. Other reviews [9, 10] focus on the methods for structure calculation and refinement: long-range orientational and distance restraints, rigid-body dynamics, database potentials. Ref. 11 presents methods for structure and assignment determination using dipolar couplings. Two reviews present the interpretation of chemical shifts and coupling constants in macromolecules [12] and the theory of chemical shift anisotropy [13].

Several reviews present the analysis of internal motions using NMR relaxation experiments [14-16], the analysis of Brownian tumbling [17] and the prediction of NMR relaxation data from protein structures [18]. Two more recent reviews deal more specifically with the

applications of the studies of protein internal dynamics: analyzing protein disorder [19] or molecular recognition [20].

Protein liquid-NMR is now facing two main challenges: (i) to make structure determination easier, more precise and of better quality, (ii) to combine structure and internal mobility, in order to obtain a complete view of biomolecules. The present review is thus organized in two main parts, one devoted to structure determination, and the other to the analysis of internal mobility.

2. STRUCTURE DETERMINATION

NMR structure determination first requires the assignment of the NMR spectra, i.e. the assignment of each signal resonance frequency to at least one nucleus in the molecule. Such an assignment is usually solved progressively, by first determining the spin systems (clustering the chemical shifts according to the residues), then by determining the sequential assignment (ordering the spin systems in the sequence), and finally by assigning the nuclear Overhauser effects (NOEs).

As an NMR molecular structure is mainly defined by interatomic distances obtained from NOE measurements, the first attempts at the end of the 1980s to calculate the atomic coordinates of a structure from distance restraints were based on Distance Geometry. This method supplements the NMR distance restraints with other restraints derived from the properties that a Euclidean object such as a molecular structure must have in 3D space. Unfortunately, it turned out that the problem of calculating the coordinates from the distances is quite underdetermined, because the number of NMR restraints is small with respect to the number of degrees of freedom, and also because the NMR restraints have a small upper bound, i.e. they usually correspond to distances smaller than 5 Å.
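As a minimal sketch of the coordinates-from-distances problem behind Distance Geometry, the following Python snippet recovers 3D coordinates from a complete and exact distance matrix using the metric-matrix construction (classical multidimensional scaling). The function names `embed_from_distances` and `distance_matrix` and the random toy data are illustrative assumptions, not taken from any NMR package. Real NOE data provide only a sparse set of short, imprecise upper bounds, so this idealized case shows what the restraints would need to supply rather than what they do supply.

```python
import numpy as np

def embed_from_distances(dist, dim=3):
    """Recover coordinates (up to rotation, translation and reflection)
    from a complete, exact distance matrix."""
    n = dist.shape[0]
    d2 = dist ** 2
    # Double centering turns squared distances into a Gram (metric) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # The eigenvectors with the largest eigenvalues give the coordinates.
    eigval, eigvec = np.linalg.eigh(gram)
    order = np.argsort(eigval)[::-1][:dim]
    scale = np.sqrt(np.clip(eigval[order], 0.0, None))
    return eigvec[:, order] * scale

def distance_matrix(coords):
    """All pairwise Euclidean distances between the rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy "structure": 10 random points (arbitrary values, in Å).
rng = np.random.default_rng(0)
true_coords = rng.normal(scale=5.0, size=(10, 3))
full = distance_matrix(true_coords)

rebuilt = embed_from_distances(full)
max_error = np.abs(distance_matrix(rebuilt) - full).max()

# Fraction of atom pairs that fall under a 5 Å cutoff in this toy example.
short_fraction = (full[np.triu_indices(10, k=1)] < 5.0).mean()
```

With all distances available the embedding is exact to numerical precision. The value of `short_fraction` gives a feel for how few pairs fall under a 5 Å cutoff even in this small example.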
It is thus difficult to define a Euclidean object from these distances, and this results in distortions of the obtained structures, which must then be removed by running molecular dynamics simulations in order to overcome energy barriers. Because of these disadvantages, the majority of structures are now calculated by optimization methods, the most popular of which is simulated annealing.

NMR structure calculation is based on the optimization of atomic coordinates with respect to restraints derived from NMR measurements, in the framework of a molecular modeling force-field. The main source of restraints is usually the interatomic distances, evaluated from the intensity of magnetization transfer between spins through 3D space (nuclear Overhauser effect). This evaluation is imprecise, because of the existence of indirect pathways for magnetization transfer (spin diffusion), and because of the influence of internal mobility on the phenomenon. Additional restraints are provided by the J-coupling constants, which permit an estimation of the dihedral angles. The residual dipolar couplings, which can be measured on an aligned biomolecule, for example in a liquid crystalline medium, give access to the angles between internuclear vectors and the orientational tensor of the molecule [21]. The angles between interatomic vectors can be determined by using the effects of dipole-dipole cross-correlated relaxation [22].

2.1. Spectral Assignment

The assignment of protein NMR spectra is classically performed in two steps: the sequential assignment, which determines the chemical shifts of the backbone nuclei, and the NOE assignment, which is usually performed in parallel with the structure calculation, and which provides the distance restraints determining the fold. If these two steps are performed manually, they demand several months of human expertise.
Thus, following the development of structural proteomics [23], the search for fast and automatic methods to assign NMR spectra intensified.

The NOE assignment, performed in parallel with the structure determination, is the subject of many developments [24-31]. A seminal work in that direction is the ARIA approach [32, 33]. The general framework of automatic NOE assignment is the following. Using the sequential assignment information, the NOE cross-peaks can be automatically assigned to several spin pairs. Many of these assignments are false, and the automatic approaches separate the false assignments from the possible ones by iteratively running structure calculations and removing the assignments that are inconsistent with the structure. The lack of consistency can be evaluated from statistics on the restraint violations (ARIA: Ref. 32, NOAH/DIAMOND: Ref. 25), by a network-anchoring score (CANDID: Ref. 28) and by Bayesian (AUREMOL: Ref. 27) or other probabilistic analysis (PASD: Ref. 31). The effect of the chemical shift tolerance on the result of ARIA was investigated [34], and it was shown that the ARIA protocol can deal with a large number of assignment possibilities for each peak, provided the correct option is present. A further step in the direction of automatic NOE assignment was proposed for the case of incomplete sidechain assignment [35], by simulating the missing chemical shifts from preliminary structures and by introducing the simulated chemical shift information into the ARIA protocol.

Automatic methods for sequential assignment are based on algorithms previously developed in computer science, such as artificial intelligence (AUTOASSIGN: Ref. 36) or pattern recognition [37]. Other approaches (TATAPRO: Ref. 38, MAPPER: Ref. 39) are based on statistical models and require the measurement of 13C and 15N chemical shifts for TATAPRO, and of (13Cα, 13Cβ) chemical shifts for MAPPER.
Another method, MARS [40], uses only the 13Cα/13Cβ connectivity information, and was extensively tested with respect to missing peaks and distortions in the peak alignment. A hierarchical algorithm, HYPER [41], was proposed to perform stereospecific assignment, and two Bayesian approaches, SPI and BACUS [42, 43], determine (i) the spin systems from homonuclear and 15N heteronuclear spectra, (ii) the probabilistic identities of NOESY cross-peaks in terms of the chemical shifts provided by SPI. General Bayesian frameworks for backbone resonance assignment were proposed [44-46]. Methods for automatic sequential assignment using the 15N connectivities observed in HNN and HN(C)N experiments were proposed [47], based on the different typical peak patterns observed in these experiments. The computational complexity of the sequential assignment problem using only 13Cα chemical shift data and Cα(i, i - 1) sequential connectivity information was explored [48].
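The sequential matching step underlying these methods can be illustrated with a short Python sketch. It assumes that each spin system carries its own Cα shift and the Cα shift of the preceding residue, and it links two spin systems when these values agree within a tolerance. The function name `sequential_links`, the tolerance values and the toy shifts are hypothetical and do not reproduce any of the cited programs.

```python
def sequential_links(spin_systems, tolerance=0.2):
    """For each spin system, list the spin systems that could precede it,
    i.e. those whose own Ca shift matches its Ca(i-1) shift."""
    links = {}
    for name, (ca_own, ca_prev) in spin_systems.items():
        links[name] = [
            other
            for other, (other_own, _) in spin_systems.items()
            if other != name and abs(other_own - ca_prev) <= tolerance
        ]
    return links

# Toy data in ppm, invented for illustration: name -> (Ca(i), Ca(i-1)).
toy = {
    "A": (56.1, 62.3),
    "B": (62.4, 54.0),
    "C": (54.1, 58.7),
    "D": (58.6, 45.2),
}

links = sequential_links(toy)                 # tight tolerance: unique chain
loose = sequential_links(toy, tolerance=3.0)  # loose tolerance: ambiguities
```

With the tight tolerance every spin system has at most one candidate predecessor, giving the chain D, C, B, A. With the loose tolerance several candidates appear for the same spin system, which is the source of the combinatorial difficulty mentioned above.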

One approach to automatic assignment focuses on the direct determination of the protein tertiary [49, 50] or secondary [51] structure from the unassigned NMR spectra. The assignment is then a consequence of the structure determination. Up to now, the best result in that direction was obtained using the CLOUD approach [52], which determines the protein structure as a density of protons (cloud) from the NOEs. The protein structure is then determined by threading [53] from the proton densities. An algorithm was proposed [54] for the NMR-constrained threading of a protein on a given structure. Several tools for computer-aided spectral assignment are available [55-60].

A complete set of tools, partially based on AUTOASSIGN, was developed [61] for rapid and automatic determination of medium-accuracy protein backbone structures. Simulations of 3D NOESY-HSQC spectra were proposed to help the spectral assignment [62].

2.2. Structure Calculation

The calculation of a biomolecular structure under NMR restraints faces the problem of finding the global energy minimum of a multi-dimensional conformational space without performing an exhaustive search. Simulated annealing procedures use high temperatures (i.e. high kinetic energies) to overcome the energy barriers of the potential hypersurface, but, in Cartesian coordinates, the high-frequency vibrations of the bond and bond angle potentials introduce dynamical instabilities at high temperatures, and thus limit the efficiency of the simulated annealing. An alternative to Cartesian coordinates is to perform the calculations in torsion angle space, where the bonds and bond angles are intrinsically defined and cannot give rise to high-frequency vibrations. In recent years, several algorithms were described for implementing torsion angle space dynamics in CNS [63, 64], CYANA [65], XPLOR [66], XPLOR-NIH [67, 68] and ROSETTA [69].
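The principle of simulated annealing against restraints can be sketched in a few lines of Python. This toy version is an assumption-laden illustration: it works in Cartesian coordinates, uses Metropolis Monte Carlo moves instead of the molecular dynamics used by the programs cited above, and has no force field, only a penalty on violated upper-bound distance restraints. The names `violation_energy` and `anneal`, the cooling schedule and the four-atom example are all hypothetical.

```python
import math
import random

def violation_energy(coords, restraints):
    """Sum of squared violations of upper-bound distance restraints."""
    energy = 0.0
    for i, j, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d > upper:
            energy += (d - upper) ** 2
    return energy

def anneal(coords, restraints, t_start=10.0, t_end=0.01, steps=5000, seed=0):
    """Metropolis simulated annealing; returns the best coordinates found."""
    rng = random.Random(seed)
    current = [list(p) for p in coords]
    e_current = violation_energy(current, restraints)
    best, e_best = [p[:] for p in current], e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        atom = rng.randrange(len(current))
        old = current[atom][:]
        current[atom] = [x + rng.gauss(0.0, 0.5) for x in old]
        e_new = violation_energy(current, restraints)
        # Uphill moves are accepted more often while the system is hot.
        accept = e_new <= e_current or rng.random() < math.exp(
            (e_current - e_new) / temperature
        )
        if accept:
            e_current = e_new
            if e_new < e_best:
                best, e_best = [p[:] for p in current], e_new
        else:
            current[atom] = old
    return best, e_best

# Four atoms placed far apart, with four 5 Å upper-bound restraints.
start = [[0.0, 0.0, 0.0], [20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]]
restraints = [(0, 1, 5.0), (0, 2, 5.0), (1, 2, 5.0), (2, 3, 5.0)]

e_start = violation_energy(start, restraints)
final, e_final = anneal(start, restraints)
```

Moving one atom at a time in Cartesian space is the simplest possible choice. A torsion angle implementation would instead perturb dihedral angles and rebuild the coordinates, which keeps bond lengths and bond angles fixed by construction.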
The use of torsion angle space was shown [63] to dramatically improve the convergence radius for a DNA duplex and to alleviate the steric hindrance observed in structures calculated in Cartesian coordinates. Torsion angle space was also used to perform a systematic search of the conformational space [70, 71], and to determine the relative orientation of covalently linked protein domains using dipolar coupling restraints [72].

Additional knowledge obtained from the analysis of previously determined structures can be included in the structure determination algorithm. The analysis of X-ray structure databases was used to propose potential energies that bias the biomolecular conformation during the calculation of NMR structures. These methods were applied to the definition of a torsion angle potential [73-75], and to the definition of a Ramachandran potential [76]. Similarly, the use of the protein gyration radius calculated from the protein size [77, 78] was proposed as an additional restraint in the structure calculation.

During simulated annealing procedures, the molecule undergoes steep conformational transitions. Moreover, the potentials between non-bonded atoms used in the first stages of the simulated annealing are simplified. For these reasons, the structures obtained by simulated annealing often present steric hindrances, or chemical parameters which disagree with the knowledge based on protein structure databases. Short molecular dynamics simulations in water have been shown to improve the quality of the calbindin D9k [79] and interleukin 4 (IL-4) [80] structures. Such an approach was then more extensively explored [81, 82] on IL-4, crambin and ubiquitin and was shown to improve some of the parameters used to assess the structure quality.
The influence of the non-bonded force field parameters on the quality of NMR structures was studied [80] in the case of interleukin 4, and the PROLSQ non-bonded energy function was shown to achieve a higher structure quality than other non-bonded representations. A correction factor was derived to take into account the bias induced by spin diffusion in the distance restraints [83].

Once the spectral assignment is performed, ambiguity still remains for some distance restraints. Two frequently encountered cases concern the stereospecific assignments and the disulfide bridges. It was shown [84] that the application of a floating restraint on the methylene groups yields structures of a quality comparable to those obtained using experimental stereospecific assignments. A similar approach was used [85] for the assignment of disulfide bridges.

A small RMSD between the conformers of an NMR structure is usually considered as a sign of good convergence of the simulated annealing procedure. Nevertheless, it is possible [86] to maximize the RMSD of an ensemble of structures while maintaining the agreement with the experimentally measured restraints. This result indicates that the RMSD of NMR structures is not a good estimate of the true uncertainty in the atomic coordinates.

Distributed computing was applied [87] to the determination of the NMR structure of the bio-active peptide endothelin-1: the number of generated conformers was 100 times the number usually generated in a structure determination, in order to allow a better exploration of the conformational space. An implementation of the torsion potential that decreases the CPU time needed for structure calculation was proposed [88].

A general sampling algorithm was recently proposed [89] to explore the probability densities arising in Bayesian data analysis problems. This algorithm was shown to significantly decrease the backbone RMSD between the NMR conformers of the SH3 domain.

2.3. Structure Determination from a Minimal Set of Restraints

A disadvantage of structure determination by NMR is that it relies on a heterogeneous set of redundant experimental restraints. This makes it difficult to assess the precision of the obtained structure from the precision of the measured NMR parameters. The search for the minimal set of restraints that yields a reasonably accurate structure is thus a question that has been explored since the first years of biomolecular NMR.

The effect of the number of restraints on the quality of the obtained structure was examined in recent years, in the framework of the development of structural proteomics. The purpose of this analysis was the prediction of a 3D structure from a minimal set of restraints. This problem was approached in several ways. A genetic algorithm, adapted to

the identification of solutions in combinatorial optimization problems, was applied [90] to NMR structure determination, with the intention of using it in the case of large proteins and few restraints. The precision of long-range HN-HN distances measured on deuterated samples was tested [91], as well as the efficiency of these restraints for rapid protein fold determination [92].

A search approach in a protein fragment database was shown [93] to be an efficient way to generate a backbone fold from residual dipolar coupling data. The variation of dipolar couplings along the protein sequence was shown [94-97] to allow the prediction of protein structural motifs and topology. The threading of a sequence through a library of candidate folds using secondary structure information from NMR is successful [98], provided that the candidate folds contain the correct protein fold.

The TOUCHSTONEX approach [99] performs protein structure prediction using a very limited set of NMR distance restraints: N/8 long-range restraints between sidechains, N being the number of residues. The conformational search is reduced by using a lattice model and a reduced representation of the residues. Additional restraints, predicted from a threading of the protein sequence, are used. A branch and bound algorithm was used [100] for protein structure refinement along with a reduced representation of the protein, in which each residue is represented by six atoms: N, H, Cα, Cβ, C′ and O.

The prediction of the protein structure directly from unassigned NMR data was also explored [101] in the framework of the ROSETTA algorithm for ab initio 3D structure prediction. The method was shown to produce correctly folded models, provided that at least 4 % of the backbone atoms are correctly assigned in the initial models.
The RMSD of the protein conformations obtained was in the 4.4-6.5 Å range. The method was then evaluated on a larger scale by applying it to a benchmark set from the Protein Data Bank [102].

2.4. J-Coupling Restraints

The number of J-coupling restraints used for a structure determination is usually one order of magnitude smaller than the number of NOE restraints, and they thus have a smaller influence on the structure definition. Moreover, as for the NOE restraints, the determination of precise dihedral angle values from J-coupling constants is not straightforward, in particular because of the intrinsic degeneracy of the Karplus relation. Refs. 103 and 104 proposed a method to perform a self-consistent analysis of J-coupling constants to improve the determination of dihedral angles inside a protein structure; this approach also permits a new parametrization of the Karplus curve. A software, MULDER [105], was developed to extract the torsion angle information from NMR data: 3J-coupling constants and sugar pucker data. The precision of protein structures calculated using only angle restraints was evaluated [106], and a protocol to calculate the structure using restraints on secondary structures, hydrogen bonds and distances between hydrophobic core residues was proposed.

2.5. Dipolar Coupling Restraints

The values of the residual dipolar couplings (RDCs) are related to the orientation of chemical bonds with respect to the alignment tensor of the molecule. In certain cases, in the presence of axial symmetry or if the structure is already known, the alignment tensor is known. In general, however, it has to be determined in order to use the RDC measurements. A structure calculation protocol was proposed [107] in which a simulated annealing refinement against the RDCs, along with a grid search, is used to simultaneously refine the structure and determine the axial and rhombic components of the tensor.
This method nevertheless requires that a number of NOE restraints sufficient to determine the fold be available. Another method was proposed [108] to determine the alignment tensor directly from the values of the RDCs, without knowledge of the structure. Provided that the bond vectors for which RDCs were measured are uniformly distributed in 3D space, the histogram of their RDCs approximates a powder pattern, from which the components of the alignment tensor can be extracted. This approach was recently continued and extended in Ref. 109. The alignment of a well-defined domain in a protein can also be determined from a few RDCs [110] using an inversion of the order matrix. The precision of the determination of the alignment tensor was found to be determined not only by the accuracy of the measured couplings, but also by the uncertainty in the structure [111]. This uncertainty leads to an underestimation of the magnitude of the alignment if large numbers of dipolar couplings are available, and leads to errors in the alignment if few couplings were measured. For ligand-protein complexes, it was shown [112] that the symmetry properties of molecular complexes can aid in the definition of the reference frame. The molecular alignment tensors of two partners in a complex can be determined prior to backbone assignment, using dipolar couplings and characteristic Cα/Cβ chemical shifts [113].

The RDCs were also used in fast and/or automatic structure determination. It was shown [114, 115] that the fold of a protein can be calculated using RDCs and a few long-range NOEs, by determining the alignment tensor on rigid fragments or on peptide planes. This determination is rapid, as it does not require the complete assignment of the NOEs. A strategy based on the RDCs was proposed [116] to simultaneously assign the spectral resonances and determine the structure.
Another approach [117] is based on the measurement of sequential resonances between amide 1H and 15N and 13Cα (corresponding to the intra-residue and to the sequential connectivities), along with 13Cα-H dipolar couplings. This single-step determination of protein structure is intended to provide an efficient tool for structure determination in structural proteomics. The software DipoCoup was designed [118] for 3D-structure homology comparison based on RDCs and pseudo-contact shifts, in order to recognize protein fold motifs. The use of dipolar couplings to recognize protein structural motifs was proposed [94] for inclusion in the annotation produced by proteomics projects. The RDCs and the chemical shifts are sufficient [119, 120] for the spectral assignment of a protein of known structure, without using any sequential NMR connectivity information. An algorithm called Nuclear Vector Replacement (NVR) [121, 122] was introduced to perform the assignment of a protein of known structure, based on RDCs and NOEs. The RDCs were also used as restraints to improve ab initio protein structure prediction [123, 124]. The φ and ψ angles can be determined

from the dipolar couplings, and the subsequent use of a program for ab initio structure prediction (ROSETTA) made it possible to determine the structure of ubiquitin [125].

Structure determination based only on dipolar couplings attracted much interest, as the dipolar coupling values are easier to measure quantitatively than the NOE intensities. This approach became attractive when it was shown [126, 127] that it is possible to eliminate the alignment tensor from the penalty function incorporating restraints from the RDCs. Similarly, the orientation restraints with respect to the alignment tensor can be replaced by inter-vector projection angles [128, 129], which are independent of the orientation of the alignment tensor.

The orientations of the dipolar vectors in the alignment tensor frame cannot be uniquely determined by a single set of RDCs, because of the uncertainty in the axes of the alignment tensor [130]. A method was developed [131] to incorporate the RDCs into the structure calculation in the case of near axial symmetry of the alignment. A two-step approach, MECCANO, was proposed [132] for the determination of the protein backbone structure using only RDCs measured for two alignments of the molecule: the parameters of the alignment tensors are first determined by a least-squares search algorithm, and the orientations of the peptide planes are then constructed from the alignment information determined previously and refined by RDC-restrained molecular dynamics (software SCULPTOR). This approach was then applied to the structure determination of the reaction site of methionine sulfoxide reductase [133]. Similarly, from RDCs measured in two media, the direction of an internuclear vector and the backbone angles φ and ψ can be exactly computed [134]; this result was used to propose a systematic search algorithm for the backbone structure. An interactive tool was developed [135] for rigid-body modeling of multidomain macromolecules using RDCs.
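When the bond vector orientations are known, the least-squares determination of an alignment tensor reduces to a linear problem, which the following Python sketch illustrates. It uses the generic textbook formulation in which each RDC is proportional to the quadratic form of a traceless symmetric order matrix with the unit bond vector; it is not the specific algorithm of any work cited above. The function names, the scale factor `d_max`, the tensor values and the noise-free synthetic data are all assumptions made for the example.

```python
import numpy as np

def design_matrix(vectors):
    """One row per unit bond vector; columns multiply the five independent
    elements (Sxx, Syy, Sxy, Sxz, Syz) of a traceless symmetric tensor."""
    x, y, z = vectors[:, 0], vectors[:, 1], vectors[:, 2]
    return np.column_stack(
        [x**2 - z**2, y**2 - z**2, 2 * x * y, 2 * x * z, 2 * y * z]
    )

def fit_alignment_tensor(vectors, rdcs, d_max=1.0):
    """Least-squares fit of the order matrix from RDCs and known vectors."""
    params, *_ = np.linalg.lstsq(
        d_max * design_matrix(vectors), rdcs, rcond=None
    )
    sxx, syy, sxy, sxz, syz = params
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, -sxx - syy]])

# Synthetic, noise-free test case (all numbers are arbitrary).
rng = np.random.default_rng(1)
vectors = rng.normal(size=(30, 3))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

true_tensor = 1e-3 * np.array([[0.2, 0.1, -0.3],
                               [0.1, 0.5, 0.2],
                               [-0.3, 0.2, -0.7]])
d_max = 1.0e4  # arbitrary illustrative scale factor, in Hz

# Each RDC is d_max times the quadratic form v^T S v.
rdcs = d_max * np.einsum("ni,ij,nj->n", vectors, true_tensor, vectors)

fitted = fit_alignment_tensor(vectors, rdcs, d_max)
principal_values = np.linalg.eigvalsh(fitted)
```

Diagonalizing the fitted matrix gives the principal values, from which the axial and rhombic components follow. With measured data, noise in the couplings and uncertainty in the vector orientations would both propagate into the fitted tensor.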
A principal component analysis applied to RDC sets measured in at least six media with different alignment tensors is sensitive to structural heterogeneity effects induced by the media [136] and was applied to the analysis of ubiquitin [137]. The graphical analysis of the relative values of the RDCs in two media makes it possible [138] to determine the relative orientation of the molecular alignment tensors.

Several methods were proposed [139-141] to predict the sterically induced alignment from the general shape of a molecule. A model taking into account the short-range steric and long-range electrostatic interactions was recently proposed [142] and makes it possible to predict the orientation of a protein in the liquid crystals used for biological NMR.

An approach to calculate the alignment tensor from the orientation-dependent 15N TROSY chemical shift changes [143] gives orientation angles consistent with those determined using the RDCs, and may be useful for large proteins. A strategy for structure determination using RDCs along with long-range order restraints available from paramagnetic systems [144] has permitted the de novo determination of a cytochrome structure.

Quantum chemical calculations were performed [145] to evaluate the vibrational averaging effects on the dipolar coupling between directly bonded nuclei. These effects can be expressed as effective bond lengths that are 0.3-5 % larger than the true bond lengths.

2.6. Chemical Shift Restraints

The chemical shift values depend on the electronic environment of the observed nuclei.
During the first years of NMR, they were seldom used as restraints in structure calculation, but their efficiency in reducing the exploration of the conformational space became more apparent later. Indeed, the comparison of chemical shifts inside homologous protein families showed [146] that it is possible to predict the chemical shift values when the sequence identity is high, and to use this information to help protein assignment [147]. A method for automatic prediction of chemical shifts was then proposed [148], based on local sequence alignment between the analyzed protein and sequences from the BioMagResBank.

The protein chemical shifts were analyzed, from a statistical point of view, with respect to several structural and chemical parameters: the amino-acid types, the secondary structures, the role of the nearest-neighbor residue [149], the type of β-sheet structure [150]. From these analyses, methods were proposed to predict the amino-acid type and the secondary structure [151-156]. The chemical shift mean values were used to predict the structural class [157] or the secondary structure content [158]. A method was also proposed [159] to predict 15N chemical shifts in proteins
