[phenixbb] RSRZ in Phenix?

Sun Dec 27 14:46:42 PST 2015

Well, actually, there's another wrinkle: since my refinement was twinned, I get the following message:

"Twinned refinement - real-space correlations not available."

Is this just a to-do item for the Phenix team, or is it somehow more profoundly tricky to do this with twinned data? I think the PDB validation software does it for twinned data.

JPK

-----Original Message-----
From: Pavel Afonine [mailto:pafonine at lbl.gov] 
Sent: Sunday, December 27, 2015 3:40 PM
To: Randy Read
Cc: Keller, Jacob; phenixbb at phenix-online.org
Subject: Re: [phenixbb] RSRZ in Phenix?

Hi Randy,

> The argument for RSRZ last time we had the PDB X-ray validation task 
> force was that it's meant to be independent of resolution and residue 
> type, i.e. if you're trying to judge how well someone did in fitting a 
> structure, you expect a much better CC for a Trp in a 1.8A resolution 
> structure than you do for a Lys in a 2.8A resolution structure.  The 
> RSRZ score compares the RSR for the residue type at its resolution 
> with other residues of the same type at similar resolution.  (We used 
> RSR because Gerard K had generated the statistics for that one and 
> hadn't done the same for RSCC, which may or may not have had 
> advantages.)
>
> However, it definitely has its problems.  For example, the RSRZ for a side chain in a 6A structure makes no sense whatsoever because, in order to get enough representatives of each residue type to generate statistics, you have to include all the structures from something like 3.5 to 8A resolution!  So we're looking for better alternatives this time around in the VTF.

Thanks a lot for the explanation, it is very helpful!

> Suggestions with evidence are welcome!

Assuming we are talking about local model-to-map fit, so that we focus on the goodness of fit of one atom or a few atoms (such as a residue or ligand), I still think that a triplet of numbers {2mFo-DFc map value, mFo-DFc map value, CC} calculated per group of atoms in question is something to consider.

In the case of a group that contains just one atom, we will be looking at the 2mFo-DFc and mFo-DFc map values at the atom center, and at the CC between the map in question and the Fmodel map, calculated in a sphere of radius R around the atom, with the Fmodel map being the Fourier transform of the total model structure factor Fmodel = ktotal * (Fcalc_atoms + Fbulk_solvent).

In the case of a group that contains several atoms, we will be looking at the average of the 2mFo-DFc and mFo-DFc map values at the atom centers, and the CC is computed as above using map values inside the atom spheres.
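
As a rough illustration of the triplet just described, here is a minimal NumPy sketch. It is not how Phenix or cctbx compute these values. It assumes the three maps are already sampled on the same orthogonal grid with spacing `step`, uses the nearest grid point instead of interpolation, ignores periodic wrapping, and all function and argument names (`group_triplet`, `sphere_mask`, `value_at`, `radius`) are hypothetical.

```python
import numpy as np

def sphere_mask(shape, step, centers, radius):
    """Boolean mask of grid points within `radius` of any atom center.

    Assumption: orthogonal grid with spacing `step`, no periodic wrapping.
    """
    coords = np.indices(shape).reshape(3, -1).T * step  # (N, 3) grid coordinates
    mask = np.zeros(len(coords), dtype=bool)
    for c in centers:
        mask |= np.sum((coords - c) ** 2, axis=1) <= radius ** 2
    return mask.reshape(shape)

def value_at(map_, step, center):
    """Map value at the grid point nearest to `center` (no interpolation)."""
    i = tuple(np.rint(np.asarray(center) / step).astype(int))
    return map_[i]

def group_triplet(map_2fofc, map_fofc, map_model, step, centers, radius=1.5):
    """Return (mean 2mFo-DFc, mean mFo-DFc, CC) for one group of atoms.

    The two means are taken over the atom centers; the CC is between the
    2mFo-DFc map and the Fmodel map inside the union of the atom spheres.
    """
    centers = np.asarray(centers, dtype=float)
    mean_2fofc = float(np.mean([value_at(map_2fofc, step, c) for c in centers]))
    mean_fofc = float(np.mean([value_at(map_fofc, step, c) for c in centers]))
    mask = sphere_mask(map_2fofc.shape, step, centers, radius)
    cc = float(np.corrcoef(map_2fofc[mask], map_model[mask])[0, 1])
    return mean_2fofc, mean_fofc, cc
```

A group with a single atom is simply the case where `centers` has one entry. The default `radius=1.5` is a placeholder, not a recommended value of R.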

Why do I think {2mFo-DFc map value, mFo-DFc map value, CC} is good?

1. If the 2mFo-DFc map is scaled by its standard deviation, we kind of know, as a rule of thumb, what is a good value and what is a poor one. With cctbx/Phenix tools it is a day-long exercise to collect any desired statistics for 2mFo-DFc map values over the entire PDB: we can bin them by resolution, by various types of completeness, by residue/atom type, etc., you name it. Thus we could turn the "rule of thumb" above into something more scientific.

2. Even if the 2mFo-DFc map value is good, an atom may still be slightly misplaced, or its parameters (xyz, q, B) may not be well refined. This will be immediately highlighted by the mFo-DFc map value. In fact, it is a good target in the sense that we don't even need "rule of thumb" values for it: the flatter the mFo-DFc map, the better.

3. The 2mFo-DFc and mFo-DFc map values suggested above are still insufficient, because the map shape is not taken into account. One may envision a situation where incorrect atoms fill the map such that 2mFo-DFc is strong enough and mFo-DFc is rather flat, but the shape of the construct does not resemble the actual map well. This is what the map CC will help us with, as it is scale-insensitive but shape-sensitive! Actually, CCr from

http://journals.iucr.org/d/issues/2014/10/00/kw5094/kw5094.pdf
Acta Cryst. (2014). D70, 2593-2606

would be even better than the regular CC.

4. An advantage of using the above-mentioned metrics (all together!) is that they are old, well-established quality measures that the majority of crystallographers have a more or less clear idea about, both in meaning and in what to expect as a numerical value. No need to re-invent the wheel
(except for gathering some statistics on expected map values).
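
Two of the points above can be illustrated with a small NumPy sketch: scaling a map by its standard deviation and binning per-residue values by resolution (point 1), and the fact that a plain correlation coefficient ignores scale but reacts to shape (point 3). This is only a toy under stated assumptions: the helper names `sigma_scale`, `mean_by_resolution_bin` and `map_cc` are hypothetical, the "maps" are synthetic one-dimensional Gaussian profiles, and `map_cc` is the regular CC, not the CCr from the paper cited above.

```python
import numpy as np

def sigma_scale(map_):
    """Subtract the mean and divide by the standard deviation of the map."""
    map_ = np.asarray(map_, dtype=float)
    sd = map_.std()
    if sd == 0:
        raise ValueError("flat map cannot be sigma-scaled")
    return (map_ - map_.mean()) / sd

def mean_by_resolution_bin(values, resolutions, edges):
    """Average per-residue map values within resolution bins.

    Bin 0 holds resolutions below edges[0]; bin len(edges) holds
    resolutions at or above edges[-1]. Empty bins are omitted.
    """
    values = np.asarray(values, dtype=float)
    which = np.digitize(np.asarray(resolutions, dtype=float), edges)
    return {int(b): float(values[which == b].mean()) for b in np.unique(which)}

def map_cc(a, b):
    """Regular (Pearson) correlation coefficient between two sets of map values."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

# Synthetic 1D "density" profiles, for illustration only.
x = np.linspace(-3.0, 3.0, 61)
target = np.exp(-x ** 2)                # reference peak
rescaled = 5.0 * target + 2.0           # same shape, different scale and offset
shifted = np.exp(-(x - 1.0) ** 2)       # same height, misplaced by 1 unit

print(map_cc(target, rescaled))  # essentially 1: only scale and offset changed
print(map_cc(target, shifted))   # clearly lower: the shape no longer matches
```

The shifted peak has the same height as the reference, which is the kind of case where map values alone would not flag a problem but the CC does.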

All the best,
Pavel