On Apr 19, 2010, at 2:56 PM, Nathaniel Echols wrote:

On Mon, Apr 19, 2010 at 11:56 AM, Francis E Reyes <Francis.Reyes@colorado.edu> wrote:

Now let's talk about flexibility during refinement. Depending on the quality of the phases, I've seen phenix.refine and Refmac pull bases that we know to be paired in an A-form helix out of the coplanar base-pair orientation. This happens unless the bases are held by a strict geometry weight or, as the OP asked for, by explicit coplanarity restraints on canonical Watson-Crick pairs. If I were solving an RNA from scratch, I'd know a priori that this is a true base pair, and it's almost offensive that a refinement program would say otherwise. The distortion isn't drastic, but any RNA/DNA structural biologist will look at your structure and clearly see that something is wrong with the geometry. While it may be a minor nuisance to correct this manually, I can only wonder how it affects the refinement.

Okay; it probably isn't very difficult to add base-pair restraints to Phenix. We're just not quite sure how to make a general solution (one that would support planarity restraints in addition to H-bonds). The main bottlenecks right now are a) figuring out a convenient reduced representation for base pairs, and b) identifying base pairs in a model.
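For what it's worth, here is a crude sketch of what "pulled out of coplanarity" means numerically: take three ring atoms per base, compute each base's plane normal, and report the angle between the two normals (0 degrees for parallel planes, which for two bases sitting side by side in a pair means coplanar). The helper names are hypothetical. Real planarity restraints in refinement programs are computed against a least-squares plane through all the atoms involved, not a three-atom plane, so treat this only as an illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def base_normal(p1, p2, p3):
    """Unit normal of the plane through three (x, y, z) ring-atom positions."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(_dot(n, n))
    if length == 0.0:
        raise ValueError("atoms are collinear; no unique plane")
    return (n[0] / length, n[1] / length, n[2] / length)

def interplanar_angle(base_a, base_b):
    """Angle in degrees between two base planes (0 = parallel planes).

    Each argument is a sequence of three atom positions. abs() is used
    because the sign of a plane normal is arbitrary.
    """
    na = base_normal(*base_a)
    nb = base_normal(*base_b)
    cosine = min(1.0, abs(_dot(na, nb)))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

# Toy coordinates (not real nucleotide geometry): both "bases" lie in z = 0.
flat = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
partner = ((5, 0, 0), (6, 0, 0), (5, 1, 0))
print(interplanar_angle(flat, partner))  # 0.0
```

A refinement target would penalize this kind of deviation; the sketch only measures it.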

Aren't non-WC base pairs going to be very important in large RNA structures? Are there (free, open-source) tools that will generate a listing of *all* base pairs found in a model, not just the canonical ones? (Actually, a simple and more-or-less machine-readable listing of the bonds formed by each base pair type would be close enough.)
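As an illustration of how small such a listing could be, here is a hypothetical machine-readable table covering only the two canonical Watson-Crick pairs (G-C: N1-N3, N2-O2, O6-N4; A-U: N1-N3, N6-O4). The 2.9 Å target is an assumed round number within the typical 2.8-3.0 Å donor-acceptor range, not a value taken from Phenix, CNS, or any other program. The non-canonical pair types are exactly what this sketch leaves out.

```python
# Hypothetical table of H-bonds in the canonical Watson-Crick pairs,
# listed as (atom on first base, atom on second base).
WC_HBONDS = {
    ("G", "C"): [("N1", "N3"), ("N2", "O2"), ("O6", "N4")],
    ("A", "U"): [("N1", "N3"), ("N6", "O4")],
}

# Assumed target donor-acceptor distance in angstroms (illustrative only).
TARGET_DISTANCE = 2.9

def hbond_restraints(res1, res2):
    """Return (atom on res1, atom on res2, target) tuples for a canonical pair.

    Works in either residue order; returns [] for anything not in the table
    (i.e. all non-canonical pairs, which this sketch does not cover).
    """
    if (res1, res2) in WC_HBONDS:
        return [(a, b, TARGET_DISTANCE) for a, b in WC_HBONDS[(res1, res2)]]
    if (res2, res1) in WC_HBONDS:
        # Swap atom order so the first atom always belongs to res1.
        return [(b, a, TARGET_DISTANCE) for a, b in WC_HBONDS[(res2, res1)]]
    return []

print(hbond_restraints("C", "G"))
# [('N3', 'N1', 2.9), ('O2', 'N2', 2.9), ('N4', 'O6', 2.9)]
```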

http://nar.oxfordjournals.org/cgi/content/full/31/13/3450

http://rna.bgsu.edu/FR3D/basepairs/

Leontis-Westhof nomenclature for RNA base pairs would be the one I'd look to. They have a program called RNAVIEW that'll create an RNAML file specifying the base pair geometries if you give it a PDB.

However, completely specifying base-pair restraints for every kind of non-canonical (non-WC) base pair is a lot of work (though it could be worked on eventually).

As a starting point for phenix.refine, it would be nice to restrain the sugar puckers, the planarity of the bases themselves, and the planarity of WC base pairs and the distances between the paired bases (as implemented in CNS). A useful machine-readable format the user could supply for this is dot-bracket notation (http://rna.tbi.univie.ac.at/help.html#A6), with square brackets ( [ ] ) included for pseudoknots. Given the dot-bracket string and the sequence, it's easy to write out the secondary structure (essentially establishing all the base pairs).
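A minimal sketch of the dot-bracket idea, assuming only '.', '( )' and '[ ]' characters (some tools also use '{ }' or '< >' for additional pseudoknot levels, which this ignores). The function names are hypothetical. Each bracket type gets its own stack, so crossing (pseudoknotted) pairs still resolve correctly.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) pairs from dot-bracket notation.

    '(' ')' mark nested pairs, '[' ']' mark pseudoknot pairs, '.' is unpaired.
    """
    openers = {"(": ")", "[": "]"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for i, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(i)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)

def base_pairs(sequence, structure):
    """Attach residue identities to each pair, e.g. (0, 8, 'G', 'C')."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return [(i, j, sequence[i], sequence[j])
            for i, j in parse_dot_bracket(structure)]

print(base_pairs("GGGAAACCC", "(((...)))"))
# [(0, 8, 'G', 'C'), (1, 7, 'G', 'C'), (2, 6, 'G', 'C')]
```

The resulting pair list is the piece a refinement program would then need to turn into actual restraints.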

Eventually it would be nice to input the type of interaction among bases (according to http://rna.bgsu.edu/FR3D/basepairs/).
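For reference, the Leontis-Westhof scheme classifies a pair by which edge of each base it uses (Watson-Crick, Hoogsteen, or Sugar) and by the cis/trans orientation of the glycosidic bonds, which gives 12 geometric families; the canonical WC pair is cWW. Here is a small sketch that enumerates the family labels. The names are hypothetical, and it only produces labels; it does not classify coordinates.

```python
from itertools import combinations_with_replacement

EDGES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "Sugar"}

def leontis_westhof_families():
    """Enumerate the 12 family labels: cis/trans x unordered edge pairs."""
    families = []
    for orientation in ("c", "t"):
        # WW, WH, WS, HH, HS, SS: 6 unordered edge combinations
        for e1, e2 in combinations_with_replacement("WHS", 2):
            families.append(orientation + e1 + e2)
    return families

def describe(family):
    """Expand a label such as 'tHS' into words."""
    orientation = {"c": "cis", "t": "trans"}[family[0]]
    return f"{orientation} {EDGES[family[1]]}/{EDGES[family[2]]}"

print(describe("cWW"))  # cis Watson-Crick/Watson-Crick
```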

Off topic:

Including base-pair information in phenix.refine would definitely help with refinements when the phases or resolution are poor. However, including this kind of information in AutoSol would greatly help with RNA/DNA building. I bring AutoSol into this conversation because the language for specifying this information is (mostly) established; see the RNA Ontology Consortium for efforts toward more complete descriptions of RNA motifs.

Extremely off topic:

Francois Major (http://www.major.iric.ca/MajorLabEn/Home.html) has developed a pipeline that takes a user-supplied sequence and folds it in 3D (which I believe is based on matching your sequence against what's seen in the PDB). You and I talked at RapiData about solving structures without experimental phases (say, via molecular replacement). I think MC-Sym/MC-Fold is a step in the right direction. I can only dream of a stage where AutoSol has a brute-force RNA molecular replacement module that takes the models from the MC-Sym/MC-Fold pipeline and performs a distributed molecular replacement across cores or clusters.