[phenixbb] staraniso/phenix.refine

Gerard Bricogne gb10 at globalphasing.com
Wed Dec 21 08:31:02 PST 2022


Dear Andrea,

     As Clemens Vonrhein is not subscribed to the phenixbb, I am sending
the message below on his behalf.

     With best wishes,

          Gerard.

----------------------------------------------------------------------------

Dear Andrea,

Besides the paragraph in the Release Notes of our latest BUSTER release that
Luca directed you to, you may find it useful to consult the following Wiki
page:

     https://www.globalphasing.com/buster/wiki/index.cgi?DepositionMmCif

Although it has not yet been updated so as to mention explicitly the cases
where refinement was done with REFMAC or phenix.refine, the procedure and
commands given there will work in these two cases as well. Our intention was
to get feedback from our users first, then to announce the extended
capability of aB_deposition_combine more broadly - but your question clearly
shows that we should make that announcement sooner rather than later.


Getting back to your message: although uploading either the Phenix-generated
MTZ file or the mmCIF file generated by aB_deposition_combine would be
acceptable to the OneDep system, we would highly recommend using the files
from aB_deposition_combine, for several reasons. First of all, these files
not only contain the data from Phenix but also provide additional data
blocks with richer reflection data (and metadata!), going right back to the
scaled and unmerged data without any cut-off applied. Furthermore, using
them ensures that the correct data quality metrics (i.e. merging statistics)
are included in the mmCIF files uploaded during deposition. Of course, you
don't need to use aB_deposition_combine to generate such a set of mmCIF
files (model and reflection data) - these are, after all, just text files -
but the devil is often in the details.
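One such detail, as it happens, is the one from your second question: the
_refln.status column expected by the PDB encodes the test (free) set as 'f'
and the working set as 'o', and the integer free-flag convention differs
between packages (CCP4-style FreeR_flag columns usually mark the test set
with 0, whereas Phenix typically uses 1). A minimal Python sketch of that
mapping (hypothetical function name, for illustration only - a real
conversion should go through a proper mmCIF toolkit):

```python
def status_from_free_flag(flags, free_value=0):
    """Map integer free-R flags to mmCIF _refln.status codes:
    'f' for the test (free) set, 'o' for the working set.

    free_value is the integer that marks the test set: 0 for
    CCP4-style FreeR_flag columns, usually 1 for Phenix flags.
    """
    return ['f' if f == free_value else 'o' for f in flags]
```

Getting the free_value convention wrong silently swaps the working and test
sets, which is exactly the kind of mistake a dedicated tool guards against.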


It may be useful to add some fairly general background information in order
to avoid misunderstandings and false impressions - so here are some relevant
points you may wish to consider.

 (a) The deposition of model and associated data into the PDB should allow
     for two things: (1) the validation of the current model and of its
     parametrisation on their own, as well as a check against the data used
     during refinement, and (2) the provision of additional data to allow
     further analysis of, and improvements to, the data handling as well as
     the model itself.

     For the first task one needs to deposit the data exactly as used by the
     depositor to arrive at the current model, i.e. the input reflection
     data (as used as input to the refinement program) plus all available
     Fourier coefficients (as output by that refinement program) needed to
     compute the maps used in modeling.

     The second task requires deposition of less and less "processed"
     versions of the experimental data - ultimately going back to the raw
     diffraction images. This might involve several sets of reflection data,
     e.g. (i) scaled, merged and corrected reflection data, (ii) scaled and
     merged data before correction and/or cutoff, and (iii) data scaled and
     unmerged before outlier rejection - plus additional variants.

 (b) When going from the raw diffraction images towards the final model, a
     lot of selection and modification of the initial integrated intensities
     takes place - from profile fitting, partiality scaling, polarization
     correction, outlier rejection, empirical corrections or analytical
     scale determination and error model adjustments, all the way to the
     application of truncation thresholds (isotropically, elliptically or
     anisotropically), conversion to amplitudes (and special treatment of
     weak intensities) and anisotropy correction.

     There are often good reasons for doing all or some of these (with sound
     scientific reasons underpinning them - not just waving some "magic
     stick"), although under special circumstances a deviation from standard
     protocols may be advisable. What is important from a developer's viewpoint
     is to provide users with choices to influence these various steps and
     to ensure that as many intermediate versions of reflection data as
     possible are available for downstream processes and deposition.

 (c) At deposition time, the use of a single reflection datablock is often
     not adequate to provide all that information (e.g. refinement programs
     might output not the original Fobs going in, but those anisotropically
     rescaled against the current model - so a second block might be needed
     to hold the original Fobs data, free from that rescaling). If different
     types of map coefficients are to be provided, they too need to come in
     different mmCIF datablocks (2mFo-DFc and mFo-DFc for observed data
     only; 2mFo-DFc filled in with DFc in a sphere going out to the highest
     diffraction limit; 2mFo-DFc filled in with DFc for only the reflections
     deemed "observable" by STARANISO; F(early)-F(late) map coefficients for
     radiation damage analysis; coefficients for anomalous Fourier maps etc).

     So ultimately we need to combine (going backwards): the refinement
     output data, the refinement input data and all intermediate versions of
     reflection data (see above) ... ideally right back to the
     scaled+unmerged intensities with a full description of context (image
     numbers, detector positions etc). This is what the "Data Processing
     Subgroup" of the PDBx/mmCIF Working Group has been looking at
     extensively over the last months, and about which a paper has just been
     submitted.
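     As a schematic illustration of such a multi-datablock reflection file
     (block names and comments are made up for this sketch; the item names
     are from the PDBx/mmCIF dictionary):

```cif
data_r1xyzsf                      # hypothetical: refinement input/output
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.status                     # 'f' = free set, 'o' = working set
_refln.F_meas_au
_refln.F_meas_sigma_au
_refln.pdbx_FWT                   # 2mFo-DFc map coefficient amplitude
_refln.pdbx_PHWT                  # ... and its phase
# ... reflection rows ...

data_r1xyzBsf                     # hypothetical: scaled, unmerged intensities
loop_
_diffrn_refln.diffrn_id
_diffrn_refln.index_h
_diffrn_refln.index_k
_diffrn_refln.index_l
_diffrn_refln.intensity_net
_diffrn_refln.intensity_sigma
# ... one row per observation ...
```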

 (d) autoPROC (running e.g. XDS, AIMLESS and STARANISO) and the STARANISO
     server provide multi-datablock mmCIF files to simplify the submission
     of a rich set of reflection data. autoPROC provides two versions here:
     one with the traditional isotropic analysis, and another for the
     anisotropic analysis done in STARANISO. It is up to the user to decide
     which one to use for which downstream steps.

     To help in combining the reflection data from data processing with that
     from refinement, and in transferring all relevant metadata (data
     quality metrics) into the model mmCIF file, we provide a tool called
     aB_deposition_combine: it should hopefully work for autoPROC (with or
     without STARANISO) data in conjunction with either BUSTER, REFMAC or
     Phenix refinement results. At the end the user is provided with two
     mmCIF files for deposition: (1) a model file with adequate data quality
     metrics, and (2) a reflection mmCIF file with multiple datablocks all
     the way back to the scaled+unmerged reflection data.

 (e) It is important at deposition time not just to select whatever
     reflection data file happens to be the first to make it through the
     OneDep system - which often means picking up the simplest version of
     an MTZ or mmCIF file - but to choose, if at all possible, the most
     complete reflection data file containing the richest metadata.
     Otherwise we simply increase the number of those rather unsatisfying
     PDB entries whose reflection data files contain only the very minimum
     of information about what they actually represent and what data quality
     metrics (such as multiplicity, internal consistency criteria etc) would
     have been attached to them in a richer deposition.
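     Concretely, these are the kinds of data quality items a richer
     deposition carries (values here are purely illustrative; item names
     are from the PDBx/mmCIF dictionary):

```cif
_reflns.pdbx_redundancy          6.8     # multiplicity
_reflns.pdbx_Rmerge_I_obs        0.081   # internal consistency
_reflns.pdbx_Rrim_I_all          0.088
_reflns.pdbx_CC_half             0.997
_reflns.pdbx_netI_over_sigmaI    12.4
```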

 *** Please note that the current OneDep system seems to complain about the
     fact that unmerged data blocks come without a test-set flag column:
     this looks to us like an oversight, since test-set flags are attributes
     belonging to the refinement process, so that it makes no logical sense
     to require them for unmerged data. This will probably require some
     rethinking/clarification/changes on the OneDep side.


A final remark: Instead of trying to give the impression that there is only
one right way of doing things, and therefore only a single set of reflection
data that should (or needs to) be deposited, it would seem more helpful and
constructive to try and provide a clear description of the various "views"
of the same raw diffraction data that can be provided by the various
approaches and data analysis methods we have at our disposal. Together with
more developments regarding the PDBx/mmCIF dictionary, and coordinated
developments in the OneDep deposition infrastructure, this will enable
better and richer depositions to be made, helping future (re)users as well
as software developers.


Cheers,

Clemens


----------------------------------------------------------------------------

--
On Mon, Dec 19, 2022 at 05:42:24PM -0500, Andrea Piserchio wrote:
> So,
> 
> Both the phenix-generated mtz file (silly me for not checking this first)
> and the cif file generated by aB_deposition_combine can be uploaded on the
> PDB server.
> 
> Thank you all for your help!!
> 
> Andrea
> 
> On Sat, Dec 17, 2022 at 5:06 PM Pavel Afonine <pafonine at lbl.gov> wrote:
> 
> > Hi,
> >
> > two hopefully relevant points:
> >
> > - phenix.refine always produces an MTZ file that contains the copy of
> > all inputs plus all is needed to run refinement (free-r flags, for
> > example). So if you use that file for deposition you have all you need.
> >
> > - Unless there are strongly advocated reasons to do otherwise in your
> > particular case, you better use in refinement and deposit the original
> > data and NOT the one touched by any of available these days magic sticks
> > (that "correct" for anisotropy, sharpen or else!).
> >
> > Other comments:
> >
> > > - However, CCP41/Refmac5 does not (yet) read .cif reflection files. As
> > > far as I know, Phenix Refine does not (yet) either.
> >
> > Phenix supports complete input / outputs of mmcif/cif format. For
> > example, phenix.refine can read/write model and reflection data in cif
> > format. It's been this way for a long time now.
> >
> > Pavel
> >
> >
> > On 12/16/22 17:32, Andrea Piserchio wrote:
> > > Dear all,
> > >
> > >
> > > I am trying to validate and then (hopefully) deposit a structure
> > > generated using the autoproc/staraniso staraniso_alldata-unique.mtz
> > > file as input for phenix.refine.
> > >
> > > Autoproc also produces a cif file ( Data_1_autoPROC_STARANISO_all.cif)
> > > specifically for deposition.
> > >
> > > Long story short, the PDB validation server complains about the lack
> > > of a freeR set for both files. I realized that, at least for the cif
> > > file, the r_free_flag is missing (but why does the .mtz for the
> > > isotropic dataset work??), so I then tried to use for validation the
> > > *.reflections.cif file that can be generated by phenix.refine. This
> > > file can actually produce a validation report, but I still have some
> > > questions:
> > >
> > > 1) Is it proper to use the .reflections.cif file for this purpose?
> > > During the upload I do see some errors (see pics); also, the final
> > > results show various RSRZ outliers in regions of the structure that
> > > look reasonably good by looking at the maps on coot, which seems odd ...
> > >
> > > 2) In case the *.reflections.cif is not adequate/sufficient for
> > > deposition (I sent an inquiry to the PDB, but they did not respond
> > > yet), can I just add a _refln.status column to the autoproc cif file
> > > (within the loop containing the r_free_flag) where I insert “f” for
> > > r_free_flag = 0 and “o” everywhere else?
> > >
> > >
> > > Thank you in advance,
> > >
> > >
> > > Andrea
> > >
> > > _______________________________________________
> > > phenixbb mailing list
> > > phenixbb at phenix-online.org
> > > http://phenix-online.org/mailman/listinfo/phenixbb
> > > Unsubscribe: phenixbb-leave at phenix-online.org
> >

> _______________________________________________
> phenixbb mailing list
> phenixbb at phenix-online.org
> http://phenix-online.org/mailman/listinfo/phenixbb
> Unsubscribe: phenixbb-leave at phenix-online.org
