Re: [phenixbb] staraniso/phenix.refine
Dear Andrea,

As Clemens Vonrhein is not subscribed to the phenixbb, I am sending the message below on his behalf.

With best wishes,
Gerard.

----------------------------------------------------------------------------

Dear Andrea,

Besides the paragraph in the Release Notes of our latest BUSTER release that Luca directed you to, you may find it useful to consult the following Wiki page:

https://www.globalphasing.com/buster/wiki/index.cgi?DepositionMmCif

Although it has not yet been updated to mention explicitly the cases where refinement was done with REFMAC or phenix.refine, the procedure and commands given there will work in these two cases as well. Our intention was to get feedback from our users first, then to announce the extended capability of aB_deposition_combine more broadly - but your question clearly shows that we should make that announcement sooner rather than later.

Getting back to your message: although uploading either the phenix-generated mtz file or the mmCIF file generated by aB_deposition_combine would be acceptable to the OneDep system, we would highly recommend using those from aB_deposition_combine, for several reasons. First of all, these files should not only contain the data from Phenix but also provide additional data blocks with richer reflection data (and metadata!), going right back to the scaled and unmerged data without any cut-off applied. Furthermore, it should ensure that the correct data quality metrics (i.e. merging statistics) are included in the mmCIF files uploaded during deposition. Of course, you don't need to use aB_deposition_combine to generate such a set of mmCIF files (model and reflection data) - these are after all just text files - but the devil is often in the details.

It may be useful to add some fairly general background information in order to avoid misunderstandings and false impressions - so here are some relevant points you may wish to consider.

(a) The deposition of model and associated data into the PDB should allow for two things: (1) the validation of the current model and of its parametrisation on their own, as well as a check against the data used during refinement, and (2) the provision of additional data to allow further analysis of, and improvements to, the data handling as well as the model itself.

For the first task one needs to deposit the data exactly as used by the depositor to arrive at the current model, i.e. the input reflection data (as used as input to the refinement program) plus all available Fourier coefficients (as output by that refinement program) needed to compute the maps used in modeling.

The second task requires deposition of less and less "processed" versions of the experimental data - ultimately going back to the raw diffraction images. This might involve several sets of reflection data, e.g. (i) scaled, merged and corrected reflection data, (ii) scaled and merged data before correction and/or cutoff, and (iii) data scaled and unmerged before outlier rejection - plus additional variants.

(b) When going from the raw diffraction images towards the final model, a lot of selection and modification of the initial integrated intensities takes place - from profile fitting, partiality scaling, polarization correction, outlier rejection, empirical corrections or analytical scale determination and error model adjustments, all the way to the application of truncation thresholds (isotropically, elliptically or anisotropically), conversion to amplitudes (and special treatment of weak intensities) and anisotropy correction. There are often good reasons for doing all or some of these (with sound scientific reasons underpinning them - not just waving some "magic stick"), even if under special circumstances a deviation from standard protocols is advisable.
What is important from a developer's viewpoint is to provide users with choices to influence these various steps and to ensure that as many intermediate versions of reflection data as possible are available for downstream processes and deposition.

(c) At deposition time, the use of a single reflection datablock is often not adequate to provide all that information (e.g. refinement programs might output not the original Fobs going in, but those anisotropically rescaled against the current model - so a second block might be needed to hold the original Fobs data, free from that rescaling). If different types of map coefficients are to be provided, they too need to come in different mmCIF datablocks (2mFo-DFc and mFo-DFc for observed data only; 2mFo-DFc filled in with DFc in a sphere going out to the highest diffraction limit; 2mFo-DFc filled in with DFc for only the reflections deemed "observable" by STARANISO; F(early)-F(late) map coefficients for radiation damage analysis; coefficients for anomalous Fourier maps etc).

So ultimately we need to combine (going backwards): the refinement output data, the refinement input data and all intermediate versions of reflection data (see above) ... ideally right back to the scaled+unmerged intensities with a full description of context (image numbers, detector positions etc). This is what the "Data Processing Subgroup" of the PDBx/mmCIF Working Group has been looking at extensively over the last months, and about which a paper has just been submitted.

(d) autoPROC (running e.g. XDS, AIMLESS and STARANISO) and the STARANISO server provide multi-datablock mmCIF files to simplify the submission of a rich set of reflection data. autoPROC provides two versions here: one with the traditional isotropic analysis, and the other for the anisotropic analysis done in STARANISO. It is up to the user to decide which one to use for which downstream steps.
To help in combining the reflection data from data processing with that from refinement, and in transferring all relevant metadata (data quality metrics) into the model mmCIF file, we provide a tool called aB_deposition_combine: it should hopefully work for autoPROC (with or without STARANISO) data in conjunction with BUSTER, REFMAC or Phenix refinement results. At the end the user is provided with two mmCIF files for deposition: (1) a model file with adequate data quality metrics, and (2) a reflection mmCIF file with multiple datablocks all the way back to the scaled+unmerged reflection data.

(e) It is important at the time of deposition not to just select whatever reflection data file happens to be the first to make it through the OneDep system, as this can often lead to picking up the simplest version of an MTZ or mmCIF file, but to choose, if at all possible, the most complete reflection data file containing the richest metadata. Otherwise we simply increase the number of those rather unsatisfying PDB entries whose reflection data files contain only the very minimum of information about what they actually represent and what data quality metrics (such as multiplicity, internal consistency criteria etc) would have been attached to them in a richer deposition.

*** Please note that the current OneDep system seems to complain about the fact that unmerged data blocks come without a test-set flag column: this looks to us like an oversight, since test-set flags are attributes belonging to the refinement process, so that it makes no logical sense to require them for unmerged data. This will probably require some rethinking/clarification/changes on the OneDep side.
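
As a rough aid for point (e), the following minimal Python sketch lists the datablocks of a reflection mmCIF file and reports, for each, whether it holds merged data (`_refln.*` items), unmerged data (`_diffrn_refln.*` items) and test-set information. It is not part of autoPROC, aB_deposition_combine or OneDep. The function name `summarize_datablocks` and the block names in the example are hypothetical, and the line-based scan deliberately ignores multi-line text fields, so a proper mmCIF parser would be preferable for real files.

```python
def summarize_datablocks(cif_text):
    """Map each datablock name to flags describing its reflection content.

    Simplified line scan: assumes no multi-line (semicolon-delimited) text
    field contains a line starting with 'data_' or '_'.
    """
    summary = {}
    current = None
    for line in cif_text.splitlines():
        token = line.strip()
        if token.lower().startswith("data_"):
            current = token[5:]
            summary[current] = {"merged": False, "unmerged": False, "test_set": False}
        elif current is not None and token.startswith("_"):
            tag = token.split()[0].lower()
            if tag.startswith("_refln."):
                summary[current]["merged"] = True
            if tag.startswith("_diffrn_refln."):
                summary[current]["unmerged"] = True
            # Test-set membership is usually carried by one of these two items.
            if tag in ("_refln.status", "_refln.pdbx_r_free_flag"):
                summary[current]["test_set"] = True
    return summary


# Hypothetical two-block file: one merged block, one unmerged block.
example = """\
data_block_merged
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.status
1 0 0 12.3 o
data_block_unmerged
loop_
_diffrn_refln.index_h
_diffrn_refln.index_k
_diffrn_refln.index_l
_diffrn_refln.intensity_net
1 0 0 150.2
"""

for name, flags in summarize_datablocks(example).items():
    print(name, flags)
```

A file reporting only a single merged block is likely to be the "simplest version" warned about above, while an unmerged block without test-set information is what one would expect, in line with the note about OneDep.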

A final remark: Instead of trying to give the impression that there is only one right way of doing things, and therefore only a single set of reflection data that should (or needs to) be deposited, it would seem more helpful and constructive to try and provide a clear description of the various "views" of the same raw diffraction data that can be provided by the various approaches and data analysis methods we have at our disposal. Together with more developments regarding the PDBx/mmCIF dictionary, and coordinated developments in the OneDep deposition infrastructure, this will enable better and richer depositions to be made, helping future (re)users as well as software developers.

Cheers,
Clemens

----------------------------------------------------------------------------

--
On Mon, Dec 19, 2022 at 05:42:24PM -0500, Andrea Piserchio wrote:
So,
Both the phenix-generated mtz file (silly me for not checking this first) and the cif file generated by aB_deposition_combine can be uploaded on the PDB server.
Thank you all for your help!!
Andrea
On Sat, Dec 17, 2022 at 5:06 PM Pavel Afonine wrote:
Hi,
two hopefully relevant points:
- phenix.refine always produces an MTZ file that contains a copy of all inputs plus everything needed to run refinement (free-R flags, for example). So if you use that file for deposition you have all you need.
- Unless there are strongly advocated reasons to do otherwise in your particular case, you had better refine against, and deposit, the original data and NOT data touched by any of the magic sticks available these days (that "correct" for anisotropy, sharpen or else!).
Other comments:
- However, CCP41/Refmac5 does not (yet) read .cif reflection files. As far as I know, Phenix Refine does not (yet) either.
Phenix fully supports input and output in mmCIF/CIF format. For example, phenix.refine can read and write model and reflection data in CIF format. It's been this way for a long time now.
Pavel
On 12/16/22 17:32, Andrea Piserchio wrote:
Dear all,
I am trying to validate and then (hopefully) deposit a structure generated using the autoproc/staraniso staraniso_alldata-unique.mtz file as input for phenix.refine.
Autoproc also produces a cif file ( Data_1_autoPROC_STARANISO_all.cif) specifically for deposition.
Long story short, the PDB validation server complains about the lack of a freeR set for both files. I realized that, at least for the cif file, the r_free_flag is missing (but why does the .mtz for the isotropic dataset work??), so I then tried to use for validation the *.reflections.cif file that can be generated by phenix.refine. This file can actually produce a validation report, but I still have some questions:
1) Is it proper to use the .reflections.cif file for this purpose? During the upload I do see some errors (see pics); also, the final results show various RSRZ outliers in regions of the structure that look reasonably good when inspecting the maps in Coot, which seems odd ...
2) In case the *.reflections.cif is not adequate/sufficient for deposition (I sent an inquiry to the PDB, but they have not responded yet), can I just add a _refln.status column to the autoproc cif file (within the loop containing the r_free_flag) where I insert "f" for r_free_flag = 0 and "o" everywhere else?
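
For illustration only, the mapping described in question 2) can be sketched in Python as below. The sketch assumes the loop has already been parsed into a list of tags and a list of rows, that the flag column is named `_refln.pdbx_r_free_flag` (the thread only says "r_free_flag"), and that a flag of 0 marks the test set as stated in the question; free-flag conventions differ between programs, so that value has to be checked for the file at hand. The function name `add_status_column` is hypothetical, and other `_refln.status` codes (e.g. for unobserved reflections) are not handled.

```python
def add_status_column(tags, rows, flag_tag="_refln.pdbx_r_free_flag", free_value=0):
    """Return (tags, rows) with a _refln.status column appended.

    Reflections whose free flag equals `free_value` get 'f' (test set);
    all others get 'o' (observed, working set). Assumes integer flags.
    """
    if "_refln.status" in tags:
        raise ValueError("loop already has a _refln.status column")
    flag_index = tags.index(flag_tag)  # raises ValueError if the flag column is absent
    new_tags = list(tags) + ["_refln.status"]
    new_rows = []
    for row in rows:
        status = "f" if int(row[flag_index]) == free_value else "o"
        new_rows.append(list(row) + [status])
    return new_tags, new_rows


# Hypothetical three-reflection loop: h, k, l, F, free flag.
tags = ["_refln.index_h", "_refln.index_k", "_refln.index_l",
        "_refln.F_meas_au", "_refln.pdbx_r_free_flag"]
rows = [["1", "0", "0", "12.3", "0"],
        ["1", "1", "0", "45.6", "7"],
        ["2", "0", "1", "7.8", "3"]]

new_tags, new_rows = add_status_column(tags, rows)
for row in new_rows:
    print(" ".join(row))
```

Hand-editing of this kind only adds the test-set labels; it does not carry over the merging statistics or the additional datablocks discussed in Clemens' reply.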
Thank you in advance,
Andrea
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb Unsubscribe: [email protected]
participants (1)
-
Gerard Bricogne