[phenixbb] staraniso/phenix.refine
Gerard Bricogne
gb10 at globalphasing.com
Wed Dec 21 08:31:02 PST 2022
Dear Andrea,
As Clemens Vonrhein is not subscribed to the phenixbb, I am sending
the message below on his behalf.
With best wishes,
Gerard.
----------------------------------------------------------------------------
Dear Andrea,
Besides the paragraph in the Release Notes of our latest BUSTER release that
Luca directed you to, you may find it useful to consult the following Wiki
page:
https://www.globalphasing.com/buster/wiki/index.cgi?DepositionMmCif
Although it has not yet been updated so as to mention explicitly the cases
where refinement was done with REFMAC or phenix.refine, the procedure and
commands given there will work in these two cases as well. Our intention was
to get feedback from our users first, then to announce the extended
capability of aB_deposition_combine more broadly - but your question clearly
shows that we should make that announcement sooner rather than later.
Getting back to your message: although uploading either the phenix-generated
mtz file or the mmCIF file generated by aB_deposition_combine would be
acceptable to the OneDep system, we would highly recommend using those from
aB_deposition_combine for several reasons. First of all, these files should
already contain the data from Phenix but also provide additional data blocks
with richer reflection data (and metadata!), going right back to the scaled
and unmerged data without any cut-off applied. Furthermore, using them should
ensure that the correct data quality metrics (i.e. merging statistics) are
included in the mmCIF files uploaded during deposition. Of course, you don't need
to use aB_deposition_combine to generate such a set of mmCIF files (model
and reflection data) - these are after all just text files - but the devil
is often in the details.
It may be useful to add some fairly general background information in order
to avoid misunderstandings and false impressions - so here are some relevant
points you may wish to consider.
(a) The deposition of model and associated data into the PDB should allow
for two things: (1) the validation of the current model and of its
parametrisation on their own, as well as a check against the data used
during refinement, and (2) the provision of additional data to allow
further analysis of, and improvements to, the data handling as well as
the model itself.
For the first task one needs to deposit the data exactly as used by the
depositor to arrive at the current model, i.e. the input reflection
data (as used as input to the refinement program) plus all available
Fourier coefficients (as output by that refinement program) needed to
compute the maps used in modeling.
The second task requires deposition of less and less "processed"
versions of the experimental data - ultimately going back to the raw
diffraction images. This might involve several sets of reflection data,
e.g. (i) scaled, merged and corrected reflection data, (ii) scaled and
merged data before correction and/or cutoff, and (iii) data scaled and
unmerged before outlier rejection - plus additional variants.
(b) When going from the raw diffraction images towards the final model, a
lot of selection and modification of the initial integrated intensities
takes place - from profile fitting, partiality scaling, polarization
correction, outlier rejection, empirical corrections or analytical
scale determination and error model adjustments, all the way to the
application of truncation thresholds (isotropically, elliptically or
anisotropically), conversion to amplitudes (and special treatment of
weak intensities) and anisotropy correction.
There are often good reasons for doing all or some of these (with sound
scientific reasons underpinning them - not just waving some "magic
stick"), even if under special circumstances a deviation from standard
protocols is advisable. What is important from a developer's viewpoint
is to provide users with choices to influence these various steps and
to ensure that as many intermediate versions of reflection data as
possible are available for downstream processes and deposition.
(c) At deposition time, the use of a single reflection datablock is often
not adequate to provide all that information (e.g. refinement programs
might output not the original Fobs going in, but those anisotropically
rescaled against the current model - so a second block might be needed
to hold the original Fobs data, free from that rescaling). If different
types of map coefficients are to be provided, they too need to come in
different mmCIF datablocks (2mFo-DFc and mFo-DFc for observed data
only; 2mFo-DFc filled in with DFc in a sphere going out to the highest
diffraction limit; 2mFo-DFc filled in with DFc for only the reflections
deemed "observable" by STARANISO; F(early)-F(late) map coefficients for
radiation damage analysis; coefficients for anomalous Fourier maps etc).
So ultimately we need to combine (going backwards): the refinement
output data, the refinement input data and all intermediate versions of
reflection data (see above) ... ideally right back to the
scaled+unmerged intensities with a full description of context (image
numbers, detector positions etc). This is what the "Data Processing
Subgroup" of the PDBx/mmCIF Working Group has been looking at
extensively over the last months, and about which a paper has just been
submitted.
(d) autoPROC (running e.g. XDS, AIMLESS and STARANISO) and the STARANISO
server provide multi-datablock mmCIF files to simplify the submission
of a rich set of reflection data. autoPROC provides two versions here:
one with traditional isotropic analysis, and another for the
anisotropic analysis done in STARANISO. It is up to the user to decide
which one to use for which downstream steps.
To help in combining the reflection data from data processing with that
from refinement, and in transferring all relevant meta data (data
quality metrics) into the model mmCIF file, we provide a tool called
aB_deposition_combine: it should hopefully work for autoPROC (with or
without STARANISO) data in conjunction with either BUSTER, REFMAC or
Phenix refinement results. At the end the user is provided with two
mmCIF files for deposition: (1) a model file with adequate data quality
metrics, and (2) a reflection mmCIF file with multiple datablocks all
the way back to the scaled+unmerged reflection data.
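To see what a multi-datablock reflection file of this kind contains before uploading it, a rough inventory of its data blocks can help. The following is only a minimal sketch and not part of aB_deposition_combine: the function name, the block names and the example content are hypothetical, and the line-based scan ignores CIF subtleties such as multi-line text fields, so a proper mmCIF parser would be preferable for real files.

```python
def list_datablocks(cif_text):
    """Return {block_name: sorted list of category names} for an mmCIF text.

    Naive line-based scan (assumption: every tag sits at the start of its
    own line and no multi-line text field contains 'data_' or '_' lines).
    """
    blocks = {}
    current = None
    for line in cif_text.splitlines():
        token = line.strip()
        if token.lower().startswith("data_"):
            # A new data block starts here; its name follows 'data_'.
            current = token[5:]
            blocks[current] = set()
        elif current is not None and token.startswith("_") and "." in token:
            # '_refln.index_h' -> category 'refln'
            category = token.split()[0].split(".", 1)[0][1:]
            blocks[current].add(category)
    return {name: sorted(cats) for name, cats in blocks.items()}


# Hypothetical two-block file: merged data first, unmerged data second.
EXAMPLE = """\
data_r1xyzsf
_cell.length_a 50.0
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.status
1 0 0 12.3 o
data_r1xyzAsf
loop_
_diffrn_refln.index_h
_diffrn_refln.index_k
_diffrn_refln.index_l
_diffrn_refln.intensity_net
1 0 0 151.2
"""

for name, categories in list_datablocks(EXAMPLE).items():
    print(name, categories)
```

In this sketch a block holding merged data shows up with the refln category, while a block holding unmerged data shows up with diffrn_refln, which gives a quick impression of how far back towards the scaled+unmerged data a given file goes.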
(e) It is important at the time of deposition to not just select whatever
reflection data file happens to be the first to make it through the
OneDep system, as this can often lead to picking up the simplest
version of an MTZ or mmCIF file, but to choose, if at all possible, the
most complete reflection data file containing the richest metadata.
Otherwise we simply increase the number of those rather unsatisfying
PDB entries whose reflection data files contain only the very minimum
of information about what they actually represent and what data quality
metrics (such as multiplicity, internal consistency criteria etc) would
have been attached to them in a richer deposition.
*** Please note that the current OneDep system seems to complain about the
fact that unmerged data blocks come without a test-set flag column:
this looks to us like an oversight, since test-set flags are attributes
belonging to the refinement process, so that it makes no logical sense
to require them for unmerged data. This will probably require some
rethinking/clarification/changes on the OneDep side.
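To anticipate which blocks might trigger that complaint, one can check which data blocks declare a test-set flag at all. The sketch below is an illustration under stated assumptions, not a description of what OneDep actually checks: it assumes the flag would be carried by the _refln.status or _refln.pdbx_r_free_flag items, it looks only at tag names (not at the values), and the function and block names are hypothetical.

```python
# Tags assumed (for this sketch) to carry test-set information.
FLAG_TAGS = ("_refln.status", "_refln.pdbx_r_free_flag")


def free_flag_report(cif_text):
    """Map each data block name to True if it declares one of FLAG_TAGS."""
    report = {}
    current = None
    for line in cif_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        first = parts[0]
        if first.lower().startswith("data_"):
            # Start of a new data block; assume no flag until a tag is seen.
            current = first[5:]
            report[current] = False
        elif current is not None and first in FLAG_TAGS:
            report[current] = True
    return report


# Hypothetical file: a merged block with flags, an unmerged block without.
EXAMPLE = """\
data_merged_example
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.intensity_meas
_refln.status
1 0 0 151.2 o
2 0 0 87.4 f
data_unmerged_example
loop_
_diffrn_refln.index_h
_diffrn_refln.index_k
_diffrn_refln.index_l
_diffrn_refln.intensity_net
1 0 0 149.8
"""

for name, has_flag in free_flag_report(EXAMPLE).items():
    print(name, "has test-set flag" if has_flag else "no test-set flag")
```

A block of unmerged data reported without a flag here is expected, for the reason given above: the flags belong to the refinement process and not to the unmerged measurements.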
A final remark: Instead of trying to give the impression that there is only
one right way of doing things, and therefore only a single set of reflection
data that should (or needs to) be deposited, it would seem more helpful and
constructive to try and provide a clear description of the various "views"
of the same raw diffraction data that can be provided by the various
approaches and data analysis methods we have at our disposal. Together with
more developments regarding the PDBx/mmCIF dictionary, and coordinated
developments in the OneDep deposition infrastructure, this will enable
better and richer depositions to be made, helping future (re)users as well
as software developers.
Cheers,
Clemens
----------------------------------------------------------------------------
--
On Mon, Dec 19, 2022 at 05:42:24PM -0500, Andrea Piserchio wrote:
> So,
>
> Both the phenix-generated mtz file (silly me for not checking this first)
> and the cif file generated by aB_deposition_combine can be uploaded on the
> PDB server.
>
> Thank you all for your help!!
>
> Andrea
>
> On Sat, Dec 17, 2022 at 5:06 PM Pavel Afonine <pafonine at lbl.gov> wrote:
>
> > Hi,
> >
> > two hopefully relevant points:
> >
> > - phenix.refine always produces an MTZ file that contains the copy of
> > all inputs plus all is needed to run refinement (free-r flags, for
> > example). So if you use that file for deposition you have all you need.
> >
> > - Unless there are strongly advocated reasons to do otherwise in your
> > particular case, you better use in refinement and deposit the original
> > data and NOT the one touched by any of available these days magic sticks
> > (that "correct" for anisotropy, sharpen or else!).
> >
> > Other comments:
> >
> > > - However, CCP41/Refmac5 does not (yet) read .cif reflection files. As
> > > far as I know, Phenix Refine does not (yet) neither.
> >
> > Phenix supports complete input / outputs of mmcif/cif format. For
> > example, phenix.refine can read/write model and reflection data in cif
> > format. It's been this way for a long time now.
> >
> > Pavel
> >
> >
> > On 12/16/22 17:32, Andrea Piserchio wrote:
> > > Dear all,
> > >
> > >
> > > I am trying to validate and then (hopefully) deposit a structure
> > > generated using the autoproc/staraniso staraniso_alldata-unique.mtz
> > > file as input for phenix.refine.
> > >
> > > Autoproc also produces a cif file ( Data_1_autoPROC_STARANISO_all.cif)
> > > specifically for deposition.
> > >
> > > Long story short, the PDB validation server complains about the lack
> > > of a freeR set for both files. I realized that, at least for the cif
> > > file, the r_free_flag is missing (but why does the .mtz for the
> > > isotropic dataset work??),so I then tried to use for validation the
> > > *.reflections.cif file that can be generated by phenix.refine. This
> > > file can actually produce a validation report, but I still have some
> > > questions:
> > >
> > > 1) Is it proper to use the .reflections.cif file for this purpose?
> > > During the upload I do see some errors (see pics); also, the final
> > > results show various RSRZ outliers in regions of the structure that
> > > look reasonably good by looking at the maps on coot, which seems odd ...
> > >
> > > 2) In case the *.reflections.cif is not adequate/sufficient for
> > > deposition (I sent an inquiry to the PDB, but they did not respond
> > > yet), can I just add a _refln.status column to the autoproc cif file
> > > (within the loop containing the r_free_flag) where I insert “f” for
> > > r_free_flag = 0 and “o” everywhere else?
> > >
> > >
> > > Thank you in advance,
> > >
> > >
> > > Andrea
> > >
> > > _______________________________________________
> > > phenixbb mailing list
> > > phenixbb at phenix-online.org
> > > http://phenix-online.org/mailman/listinfo/phenixbb
> > > Unsubscribe: phenixbb-leave at phenix-online.org
> >