Dear Pavel,

I agree with every single one of your points. As I mentioned, it is not phenix.refine that reports a cutoff, but the Protein Data Bank. My point was to have the Protein Data Bank record correctly what phenix.refine provides, and prevent confusion (as was the case for the original post). The "wishful thinking" you have pointed out is being done by the PDB on every single phenix-generated structure, so I hope that they change this practice if they see this post.

I understand that PHENIX developers cannot be responsible for what third parties, such as the Protein Data Bank, do with files provided by your software.

Thank you for the awesome software,

Engin


On 1/8/18 10:35 AM, Pavel Afonine wrote:
Hi Engin,

thanks for the feedback!

There is one complication that arises from the report of the MIN(FOBS/SIGMA_FOBS) value in pdb files from phenix.refine. Nearly every PDB entry using phenix.refine reports a F/sig(F) cutoff value of 1.3x,

As I alluded to yesterday, this is not a cutoff but a reported fact about your data. No data are removed or otherwise manipulated based on this number.

while Buster and Refmac-generated pdbs have 0 or -/None for that value (just checked again with this week's released PDBs). This is clearly not intended. Again, the value the Protein Data Bank is reporting is a cutoff, based on the minimum value phenix.refine appears to report. Since I use French-Wilson for I to F conversions, I have had to correct this cutoff value by communicating with PDB with every deposition, but it appears that most users rarely go through the trouble.

REMARK 3 records are free format. It's up to program authors to choose what to print there.

Nowhere does the record in question produced by phenix.refine say "cutoff":
REMARK   3   MIN(FOBS/SIGMA_FOBS)              : 1.380                          
Refmac and Buster print:
REMARK   3   DATA CUTOFF            (SIGMA(F)) : 0.000                          
which clearly says "cutoff".
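
To illustrate the distinction, here is a minimal Python sketch of how a reader of the file could keep the two records apart. It is only an illustration: the function name classify_sigma_records and the regular expression are hypothetical and are not part of phenix.refine, Refmac, Buster, or any PDB deposition tool. It assumes the "label : value" layout of the two REMARK 3 lines shown above.

```python
import re

# Hypothetical helper for illustration only; it assumes REMARK 3 lines of the
# form "REMARK   3   <label> : <value>" as in the two examples quoted above.
_REMARK3 = re.compile(r"^REMARK\s+3\s+(?P<label>.+?)\s*:\s*(?P<value>\S+)\s*$")


def classify_sigma_records(pdb_lines):
    """Separate an explicit sigma cutoff from an observed minimum F/sigma(F).

    Returns a dict with keys 'cutoff' and 'observed_min'; each value is the
    raw string found in the file, or None if that record is absent.
    """
    result = {"cutoff": None, "observed_min": None}
    for line in pdb_lines:
        match = _REMARK3.match(line)
        if match is None:
            continue
        # Collapse runs of spaces so column padding does not matter.
        label = " ".join(match.group("label").split())
        value = match.group("value")
        if label.startswith("DATA CUTOFF") and "SIGMA" in label:
            # The label says "cutoff": a threshold applied to the data.
            result["cutoff"] = value
        elif label.startswith("MIN(FOBS/SIGMA_FOBS)"):
            # A statistic of the data; nothing was removed because of it.
            result["observed_min"] = value
    return result
```

With only the MIN(FOBS/SIGMA_FOBS) line present, 'cutoff' stays None, so the observed minimum is never silently promoted to a cutoff.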

So I guess we are fine as long as there is no wishful thinking involved and people carefully read what's written!

The issue seems to arise from the lack of a cutoff value in the .pdb files generated by phenix.refine; Refmac has a DATA CUTOFF (SIGMA(F)) record (set to NONE by default), which is picked up during structure deposition. So, either the PDB has to be told that the phenix.refine min value is just a minimum value and not a cutoff, or phenix.refine might add another REMARK card for DATA CUTOFF (SIGMA(F)) under the DATA USED IN REFINEMENT section of the REMARKs.

Sure, we can add a "cutoff" record if you think it is helpful.

In general, there are way more facts to report about the data than this single number. As long as people deposit 1) the data actually used in refinement (which may have been truncated by sigma or resolution, subjected to automated outlier rejection, converted from Iobs to Fobs, converted from anomalous F+/- to non-anomalous Imean, etc.) and 2) the original data (not manipulated in any way), and as long as the PDB actually accepts these data, then all should be fine. Note: phenix.refine always outputs an MTZ file containing both the original input data and the data actually used in refinement.

All the best,
Pavel