Questions about phenix.refine with twin_law
Dear Phenix developers,

I'm working with pseudo-merohedrally twinned data (twin fraction 45%), using Phenix-dev-616. I have three questions about phenix.refine with twin_law (running on the command line).

1. During refinement with twin_law, the log file says the x-ray target function is set to "twin_lsq_f". Does this mean we cannot use a maximum-likelihood target for twinned data? If so, is it recommended to refine without twin_law for the first several cycles?

2. When I tried to refine using the model refined without twin_law, the target function was still ml, not twin_lsq_f, although I specified twin_law. Is this a bug?

3. When I used the twin_law option, the "Scale factor" in the table "R-free likelihood based estimates" was None in all resolution bins. Can the scale factor not be calculated during twin refinement?

Thanks in advance, and wishing you the best holidays,
Keitaro
Hi Keitaro,
During refinement with twin_law, the log file says the x-ray target function is set to "twin_lsq_f". Does this mean we cannot use a maximum-likelihood target for twinned data? If so, is it recommended to refine without twin_law for the first several cycles?
I have never done a systematic study to answer this question. How good is your model?
When I tried to refine using the model refined without twin_law, the target function was still ml, not twin_lsq_f although I specified twin_law. Is this a bug?
Not sure, can you send (off-list) logfile of that run?
When I used the twin_law option, the "Scale factor" in the table "R-free likelihood based estimates" was None in all resolution bins. Can the scale factor not be calculated during twin refinement?
The scale factor (one per dataset) is refined during twin refinement and subsequently fixed during positional and ADP refinement. The twin_lsq_f target function doesn't have a resolution-dependent scale factor.
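For reference, a least-squares twin target on amplitudes for a single twin law generally has the form sketched below; this is the standard textbook expression (twin fraction alpha, twin operator T, single overall scale k), not necessarily the exact formula implemented in phenix.refine:

  T_{\mathrm{twin\_lsq\_f}} \approx \sum_{h} \left( |F_{\mathrm{obs}}(h)| - k \sqrt{(1-\alpha)\,|F_{\mathrm{calc}}(h)|^{2} + \alpha\,|F_{\mathrm{calc}}(Th)|^{2}} \right)^{2}

Since this expression contains only the single overall scale k (the per-dataset scale mentioned above) and no per-resolution-bin scale, there is nothing for the binned "Scale factor" column to report.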
Thanks in advance, and wishing you the best holidays,
Same to you! HTH P
Keitaro
--
P.H. Zwart
Research Scientist
Berkeley Center for Structural Biology
Lawrence Berkeley National Laboratories
1 Cyclotron Road, Berkeley, CA-94703, USA
Cell: 510 289 9246
BCSB:   http://bcsb.als.lbl.gov
PHENIX: http://www.phenix-online.org
SASTBX: http://sastbx.als.lbl.gov
Dear Peter and Pavel,

Thank you very much for your replies. I just didn't know that the target function is limited to least squares in the twinned case; I didn't suspect that as a bug.

Currently, R-work and R-free are 22% and 24%, respectively (with twin_law), at 2.5 Angstrom resolution. In my understanding, a least-squares target function has very poor convergence since it is strongly biased toward the current model. I'm afraid the refinement might not converge correctly with twin_lsq_f. This is why I thought it could be better to refine without twin_law for the first several cycles. (But if twinning is ignored, the target amplitudes will be inappropriate... I have no idea which way to turn.)
The next CCN (Computational Crystallography Newsletter; http://www.phenix-online.org/newsletter/) that comes out beginning of January will contain an article about using ML target in twin refinement, which will be implemented in some (hopefully near) future.
I'm looking forward to using it. Will it be a different formulation from Refmac's?
Not sure, can you send (off-list) logfile of that run? This is weird... If you could send me the inputs that I can use to reproduce this problem then I will be able to explain what is going on.
I'm sorry, the model under refinement must be kept confidential, so I cannot give it to you. But in the log file, the twin_law keyword is definitely accepted:
Command line parameter definitions:
  refinement.input.xray_data.labels = F,SIGF
  refinement.refine.strategy = individual_sites+individual_adp
  refinement.twinning.twin_law = -k,-h,-l
However, the refinement target is still ml:
=============================== refinement start ==============================
...
| maximum likelihood estimate for coordinate error: 0.37 A                    |
| x-ray target function (ml) for work reflections: 6.900796                   |
|-----------------------------------------------------------------------------|
Does phenix.refine read some information from the PDB file other than the coordinates?

Thank you very much,
Keitaro
Hi Keitaro,
Currently, R-work and R-free are 22%, 24%, respectively (with twin_law) at 2.5 Angstrom resolution.
the difference between Rfree and Rwork seems suspiciously small given the resolution. Here is the typical distribution of Rwork, Rfree and Rfree-Rwork for structures in the PDB refined at 2.5 A resolution. Command used: phenix.r_factor_statistics 2.5

Histogram of Rwork for models in PDB at resolution 2.40-2.60 A:
  0.115 - 0.141 :    5
  0.141 - 0.168 :   81
  0.168 - 0.194 :  492
  0.194 - 0.220 : 1100
  0.220 - 0.246 :  808
  0.246 - 0.273 :  166
  0.273 - 0.299 :   25
  0.299 - 0.325 :    5
  0.325 - 0.352 :    0
  0.352 - 0.378 :    1
Histogram of Rfree for models in PDB at resolution 2.40-2.60 A:
  0.146 - 0.178 :    4
  0.178 - 0.210 :   48
  0.210 - 0.242 :  483
  0.242 - 0.274 : 1273
  0.274 - 0.305 :  750
  0.305 - 0.337 :  116
  0.337 - 0.369 :    8
  0.369 - 0.401 :    0
  0.401 - 0.433 :    0
  0.433 - 0.465 :    1
Histogram of Rfree-Rwork for all models in PDB at resolution 2.40-2.60 A:
  0.002 - 0.012 :   30
  0.012 - 0.022 :   93
  0.022 - 0.031 :  266
  0.031 - 0.041 :  502
  0.041 - 0.051 :  531
  0.051 - 0.061 :  580
  0.061 - 0.071 :  349
  0.071 - 0.080 :  202
  0.080 - 0.090 :   93
  0.090 - 0.100 :   37
Number of structures considered: 2683

Did you use PHENIX to select free-R flags? It is important.
In my understanding, a least-squares target function has very poor convergence since it is strongly biased toward the current model.
ML is better than LS because ML better accounts for model errors and incompleteness, taking the latter into account statistically.
I'm afraid the refinement might not converge correctly with twin_lsq_f.
It may or may not converge, depending on how far your current model is from the correct one. Small-molecule folks use LS all the time.
This is why I thought it could be better to refine without twin_law for the first several cycles.
You may try, but in that case you will not be accounting for twinning. By the way, if you run the command:

  phenix.model_vs_data model.pdb data.mtz

does it suggest that you have twinning?
The next CCN (Computational Crystallography Newsletter; http://www.phenix-online.org/newsletter/) that comes out beginning of January will contain an article about using ML target in twin refinement, which will be implemented in some (hopefully near) future. I'm looking forward to using it. Will it be different formulation from Refmac?
I do not know what's implemented in Refmac - I'm not aware of a corresponding publication.
Not sure, can you send (off-list) logfile of that run? This is weird... If you could send me the inputs that I can use to reproduce this problem then I will be able to explain what is going on. I'm sorry, the model under refinement must be kept confidential so I could not give it to you.
Typically, when people send us the "reproducer" (all inputs that are enough to reproduce the problem) then we can work much more efficiently, otherwise it takes a lot of emails before one can start having a clue about the problem.
But in the log file, the twin_law keyword is definitely accepted:

Command line parameter definitions:
  refinement.input.xray_data.labels = F,SIGF
  refinement.refine.strategy = individual_sites+individual_adp
  refinement.twinning.twin_law = -k,-h,-l

However, the refinement target is still ml:

=============================== refinement start ==============================
...
| maximum likelihood estimate for coordinate error: 0.37 A                    |
| x-ray target function (ml) for work reflections: 6.900796                   |
|-----------------------------------------------------------------------------|
I could reproduce this myself; this is a bug. Sorry for the problem. A workaround for now is to remove the PDB file header (all REMARK 3 records) from the input PDB file. This problem will be fixed in the next available PHENIX version.
Does phenix.refine read some information from the PDB file other than the coordinates?
Yes, and this was the root of the problem. This is why I suggested removing the REMARK 3 records from the PDB file header.

All the best!
Pavel.
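If it helps, a minimal sketch of that workaround in Python is below; it simply copies the PDB file while dropping all REMARK 3 records (the file names are placeholders):

  # Sketch of the REMARK 3 workaround: copy a PDB file, dropping all
  # REMARK 3 records so the old refinement header is not read back in.
  # File names are placeholders.
  with open("model.pdb") as pdb_in, open("model_no_remark3.pdb", "w") as pdb_out:
      for line in pdb_in:
          fields = line.split()
          if len(fields) >= 2 and fields[0] == "REMARK" and fields[1] == "3":
              continue  # skip REMARK 3 records
          pdb_out.write(line)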
On 26 December 2010 09:49, Pavel Afonine wrote:
Hi Keitaro,
Currently, R-work and R-free are 22%, 24%, respectively (with twin_law) at 2.5 Angstrom resolution.
If the twin mates of the free set are members of the working set, this is what can happen. phenix.refine selects the proper test set. HTH Peter
Dear Pavel and Peter,
Here is the typical distribution of Rfree, Rwork and Rfree-Rwork for structures in PDB refined at 2.5A resolution:
Are these statistics applicable to twinned cases? I think such statistics should be (slightly) different from normal cases, no?
Did you use PHENIX to select free-R flags? It is important.
Yes, I used phenix to select R-free flags with use_lattice_symmetry=true. But my data have pseudo-translation, too (~20% of the origin peak height in the Patterson map). I'm afraid I should have considered the pseudo-translation as well as the twinning when selecting R-free flags, e.g. use_dataman_shells=true. Do you have any way to know whether the refinement is biased because of a wrong R-free flag selection?
ML is better than LS because ML better accounts for model errors and incompleteness, taking the latter into account statistically.
Do they come from sigma-A estimation?
phenix.model_vs_data model.pdb data.mtz
does it suggest that you have twinning?
Yes, it says:

  Data: twinned : -k,-h,-l
I do not know what's implemented in Refmac - I'm not aware of a corresponding publication.
FYI, I think slide 13 of this presentation describes the likelihood function for the twinned case: http://www.ysbl.york.ac.uk/refmac/Presentations/refmac_Osaka.pdf
Typically, when people send us the "reproducer" (all inputs that are enough to reproduce the problem) then we can work much more efficiently, otherwise it takes a lot of emails before one can start having a clue about the problem.
I fully understand, but I'm sorry I couldn't. I will do my best to give you sufficient information. Thank you for giving me the solution!

Cheers,
Keitaro
Hi Keitaro,
Here is the typical distribution of Rfree, Rwork and Rfree-Rwork for structures in PDB refined at 2.5A resolution:

Are these statistics applicable to twinned cases? I think such statistics should be (slightly) different from normal cases, no?
you are right: this analysis does not discriminate structures by twinning, although I don't see why the R-factor stats should be (much) different.
Did you use PHENIX to select free-R flags? It is important. Yes, I used phenix to select R-free-flags with use_lattice_symmetry=true.
Good.
Do you have any way to know whether the refinement is biased because of a wrong R-free flag selection?
If you used PHENIX to select free-R flags then they are unlikely to be wrong. By wrong I mean:
- not taking lattice symmetry into account;
- not making the distribution of flags uniform across the resolution range, such that each relatively thin resolution bin receives "enough" test reflections; etc.

However, the refinement outcome may vary depending on the choice of free-R flags anyway, but for different reasons: refinement artifacts. For example, if you run a hundred identical simulated-annealing refinement jobs where the only difference between jobs is the random seed, you will get an ensemble of somewhat (mostly slightly) different structures, and depending on resolution the R-factors may range within 0-3% (the lower the resolution, the larger the spread). We know that the profile of the function we optimize in refinement is very complex, and the optimizers we use are too simple to search this profile thoroughly. So by the end of refinement we never end up in the global minimum, but ALWAYS get stuck in a local minimum. Depending on the initial conditions, the optimization may take a different pathway and end up in a different local minimum. Even plus/minus one reflection may trigger this change, or even rounding errors, etc. So the ensemble of models you see after multi-start SA refinement does not necessarily reflect what's in the crystal. Yes, among the models in the whole ensemble some side chains may adopt one or another alternative conformation, and then this variability of refinement results would reflect what's in the crystal. This is extensively discussed in this paper:

Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, P.D. Adams, N.W. Moriarty, P.H. Zwart, R.J. Read, D. Turk and L.-W. Hung. Acta Cryst. D63, 597-610 (2007).

Some illustrative discussion is here: http://www.phenix-online.org/presentations/latest/pavel_validation.pdf

Having said this, it shouldn't be too surprising if you select, say, 10 different free-R flag sets, then do thorough refinement (to achieve convergence and remove memory of the test reflections), and in the end get somewhat different Rwork/Rfree. You can try it to get some "confidence range" for the spread of Rwork and Rfree. You can also do the above experiment with SA. However, apart from academic interest / making yourself confident about the numbers you get, I don't really see any practical use for these tests.
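For completeness, a sketch of such a multi-start experiment is below: it launches several otherwise-identical phenix.refine jobs that differ only in the random seed, so you can gauge the spread of Rwork/Rfree for your own data. The parameter names (simulated_annealing, main.random_seed, output.prefix) are written from memory, so please check them against phenix.refine --show-defaults before relying on this.

  # Multi-start experiment: identical refinements differing only in the
  # random seed, to gauge the spread of Rwork/Rfree.
  # Parameter names are assumptions; verify with "phenix.refine --show-defaults".
  import subprocess

  for seed in range(1, 6):
      subprocess.check_call([
          "phenix.refine", "model.pdb", "data.mtz",
          "simulated_annealing=true",
          "main.random_seed=%d" % seed,
          "output.prefix=sa_seed_%02d" % seed,
      ])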
ML is better than LS because ML better accounts for model errors and incompleteness, taking the latter into account statistically.

Do they come from sigma-A estimation?
See (and references therein):

- Lunin, V.Yu. & Skovoroda, T.P. (1995). Acta Cryst. A51, 880-887. "R-free likelihood-based estimates of errors for phases calculated from atomic models."
- Pannu, N.S., Murshudov, G.N., Dodson, E.J. & Read, R.J. (1998). Acta Cryst. D54, 1285-1294. "Incorporation of Prior Phase Information Strengthens Maximum-Likelihood Structure Refinement."
- Lunin, V.Y., Afonine, P.V. & Urzhumtsev, A.G. (2002). Acta Cryst. A58, 270-282. "Likelihood-based refinement. I. Irremovable model errors."
- Urzhumtsev, A.G., Skovoroda, T.P. & Lunin, V.Y. (1996). J. Appl. Cryst. 29, 741-744. "A procedure compatible with X-PLOR for the calculation of electron-density maps weighted using an R-free-likelihood approach."
- Read, R.J. (1986). Acta Cryst. A42, 140-149. "Improved Fourier coefficients for maps using phases from partial structures with errors."
I do not know what's implemented in Refmac - I'm not aware of a corresponding publication.

FYI, I think slide 13 of this presentation describes the likelihood function for the twinned case: http://www.ysbl.york.ac.uk/refmac/Presentations/refmac_Osaka.pdf
I see a lot of handwaving and jiggling of the magic words "maximum" and "likelihood", but I don't see any details about the underlying statistical model, approximation assumptions (if any), derivation, or mathematical analysis of the new function's behavior, etc... I know all this is beyond the scope of conference slides, which is why I said above "I'm not aware of a corresponding publication", meaning a proper peer-reviewed publication where all these important details are explained.
Typically, when people send us the "reproducer" (all inputs that are enough to reproduce the problem) then we can work much more efficiently, otherwise it takes a lot of emails before one can start having a clue about the problem. I fully understand it, but I'm sorry I couldn't..
No problem. Please let us know if we can be of any help.

All the best,
Pavel.
Hello all,

I'm looking for a description of how the "secondary_map_and_map_cc_filter" parameter operates. The default is set to a 2mFobs-DFmodel map, and I assume a correlation coefficient is being calculated as the basis for filtering out waters that are present in the mFobs-DFmodel map (i.e. the primary map) but not in the 2mFo-DFc map. However, I'd like to confirm that my assumption is true, and also to learn whether the cutoff value that determines whether a putative water molecule gets culled can be changed. I haven't found this information in the online documentation or the Phenixbb archives.

Thanks,
-Andy Torelli
Hi Andy,
I'm looking for a description of how the "secondary_map_and_map_cc_filter" parameter operates. The default is set to be a 2mFobs-DFmodel, and I assume it's calculating a correlation coefficient as the basis for filtering waters that are present in the mFobs-DFmodel maps (i.e. primary map) but not in the 2mFo-DFc map. However, I'd like to confirm my assumption is true and also learn whether the cutoff value for whether or not a putative water molecule gets culled can be changed. I haven't found the information in the online documentation or the Phenixbb archives.
yes, the second map, 2mFo-DFc, is used to compute the map CC (poor_cc_threshold). It is also used to filter waters by the absolute value of the 2mFo-DFc map computed at the water oxygen center (poor_map_value_threshold). The default values should be good for most cases (I hope).

Pavel.
Thanks Pavel,

So the default value for poor_map_value_threshold is 1, which I assume means 1 sigma. So if the electron density for a putative water is less than 1 sigma in the 2mFo-DFc map OR if the calculated correlation coefficient is less than the poor_cc_threshold value (default 0.7), then the water is removed, right?

For my own education, what would be the best way of calculating the correlation coefficient values for all the waters or ligands in a given model? What I want to do is get a sense of the relationship between electron density and correlation coefficient values for waters in my structure, to understand how "strict" the default poor_cc_threshold value is and whether/how much I might want to raise it.

Thanks for your help,
-Andy
So the default value for poor_map_value_threshold is 1, which I assume means 1 sigma. So if the electron density for a putative water is less than 1 sigma in the 2mFo-DFc map OR if the calculated correlation coefficient is less than the poor_cc_threshold value (default 0.7), then the water is removed, right?
yes.
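So the acceptance rule amounts to something like the sketch below: a conceptual restatement of the two criteria with their default values, not the actual phenix.refine code.

  # Conceptual restatement of the secondary-map water filter discussed above;
  # not the actual phenix.refine implementation.
  POOR_MAP_VALUE_THRESHOLD = 1.0  # 2mFo-DFc value (sigma) at the water oxygen
  POOR_CC_THRESHOLD = 0.7         # map correlation coefficient

  def keep_water(density_at_oxygen, map_cc):
      # A water is removed if either criterion fails.
      return (density_at_oxygen >= POOR_MAP_VALUE_THRESHOLD and
              map_cc >= POOR_CC_THRESHOLD)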
For my own education, what would be the best way of calculating the correlation coefficient values for all the waters or ligands in a given model? What I want to do is get a sense for the relationship between electron density and correlation coefficient values for waters in my structure to understand how "strict" the default poor_cc_threshold value is and whether/how much I might want to raise it.
You can use

  phenix.model_vs_data model.pdb data.mtz --comprehensive

and that will list the map CC for all atoms or per residue, OR use phenix.real_space_correlation if you need more fine-tuning. I have to run now, otherwise I'll miss my flight to London -:)

All the best!
Pavel.
For my own education, what would be the best way of calculating the correlation coefficient values for all the waters or ligands in a given model? What I want to do is get a sense for the relationship between electron density and correlation coefficient values for waters in my structure to understand how "strict" the default poor_cc_threshold value is and whether/how much I might want to raise it.
Continuing while waiting for boarding...

Actually, looking at the map CC alone is not too informative, although it is definitely useful. Imagine you compare two densities both having a maximum value of, say, a ridiculously small 0.1 sigma. In this case you will still get a high map CC (by high I almost arbitrarily mean something like 0.7-0.8 and up to 1). I looked into this at some point, and it seems like you need to look at both the map CC and the actual density values. Of course this is very resolution dependent...

Also, what matters is the region where you compute the map CC. Say you have a large residue where only a couple of terminal atoms are misplaced (not in density). In this case the map CC computed for the whole residue will still be good, so you will never catch those couple of atoms. Therefore, it's better to compute the map CC per atom, not per residue. But there is a trick here too: it makes sense to compute the map CC per atom only when the map shows atomicity, so you can more or less determine individual atoms. At resolutions like 3 A and lower you see a blob of density for a residue, and computing the CC per atom doesn't really make sense...

I'm not aware of any systematic research on this subject. I guess it could be a nice month- or two-long project for a student.

Good luck,
Pavel.
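For what it's worth, the map CC being discussed is just a Pearson correlation of two map values over the grid points of whatever region you select (an atom, a residue, ...). A minimal numpy illustration of the idea (not phenix code) is:

  # Minimal illustration of a map correlation coefficient: a Pearson
  # correlation of two maps sampled at the same grid points of a region.
  # This shows the underlying idea only, not the phenix implementation.
  import numpy as np

  def map_cc(map1_values, map2_values):
      # map1_values, map2_values: 1-D arrays of density values taken at the
      # same grid points (e.g. around one atom or one residue)
      return float(np.corrcoef(map1_values, map2_values)[0, 1])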
Pavel,

Thanks for the helpful comments. Do both phenix.model_vs_data and phenix.real_space_correlation produce the same CC values that phenix.refine uses to keep/remove waters when the ordered-solvent routine is used?

Also, are the default 2mFo-DFc and mFo-DFc map coefficients output by phenix.refine (in the "_maps_coeffs.mtz" file) the same coefficients used by phenix.refine during the ordered-solvent checking? Specifically, does phenix.refine use "filled" maps for ordered-solvent checking?

If you haven't guessed, I am thinking of a way to intelligently pick values for the ordered-solvent parameters such as primary_map_cutoff, poor_cc_threshold and poor_map_threshold. My idea is to first run a quick round of refinement with generous ordered-solvent parameters so that I get a model that is somewhat overpopulated with automatically picked waters. Then I can manually inspect the mFobs-DFmodel and 2mFobs-DFmodel maps (i.e. the primary and secondary maps) and the CC values in order to decide where to set appropriate cutoffs to limit the addition of spurious waters for a given model/dataset, which will differ based on map quality, resolution, etc. So obviously I want to calculate CC values and maps in the same way phenix.refine would judge waters, so that I can "see what phenix sees" in this regard.

Thanks for your help (and I hope you made it to London... haha),
-Andy
Hi Andrew,
Thanks for the helpful comments. Do both phenix.model_vs_data and phenix.real_space_correlation produce the same values for CC that phenix.refine uses to keep/remove waters when the ordered solvent routine is used?
yes, they should. They use the same code - that's the joy of having libraries: multiple applications can re-use the same core routines.
Also, are the default 2mFo-DFc and mFo-DFc map coefficients output by phenix.refine (in the "_maps_coeffs.mtz" file) the same coefficients used by phenix.refine during the ordered solvent checking?
This I would need to check since I don't remember. As I hope all phenix.refine users know, by default phenix.refine writes out two 2mFo-DFc maps (one using the original set of Fobs, and one using a set where missing Fobs are filled with DFc), and one mFo-DFc map. Most likely, for water picking phenix.refine uses the 2mFo-DFc map computed from the original set of Fobs.
Specifically, does phenix.refine use "filled" maps for ordered solvent checking?
Unlikely.
If you haven't guessed, I am thinking of a way to intelligently pick values for the ordered solvent parameters such as primary_map_cutoff, poor_cc_threshold and poor_map_threshold.
Yes, I kind of figured this out... -:) Although I'm wondering why you are not happy with the default settings, which are supposed to be good enough most of the time?
My idea is to first run a quick round of refinement with generous ordered solvent parameters so that I get a model that is somewhat overpopulated with automatically-picked waters. Then I can manually inspect the mFobs-DFmodel and 2mFobs-DFmodel maps (i.e. primary and secondary maps) and the CC values in order to decide where to select appropriate cutoffs to limit addition of spurious waters for a given model/dataset, which will differ based on map quality, resolution, etc. So obviously I want to calculate CC values and maps in the same way phenix.refine would to judge waters so that I can "see what phenix sees" in this regard.
I see. I think you are on the right track to achieve this. Depending on how generous you set the water selection criteria, you can get as many waters as you want. This is not always bad (as many typically think); it just emulates some earlier versions of the ARP idea, which is in the end pretty successful. That is: modeling some density peaks that you can't interpret in terms of your model right now with "dummy waters" may improve the overall map quality, which might be helpful.

Pavel.
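As a concrete starting point for that experiment, a deliberately generous water-picking run might look something like the sketch below. The parameter names come from this thread, but their exact scopes/paths and the loosened values shown are assumptions, so check them against phenix.refine --show-defaults first.

  # Sketch of a deliberately generous water-picking run, to be pruned later.
  # Parameter names are from this thread; the exact paths and the loosened
  # values are assumptions -- verify with "phenix.refine --show-defaults".
  import subprocess

  subprocess.check_call([
      "phenix.refine", "model.pdb", "data.mtz",
      "ordered_solvent=true",
      "ordered_solvent.primary_map_cutoff=2.5",
      "ordered_solvent.secondary_map_and_map_cc_filter.poor_cc_threshold=0.5",
      "ordered_solvent.secondary_map_and_map_cc_filter.poor_map_value_threshold=0.5",
      "output.prefix=generous_waters",
  ])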
Pavel,

Your answers have been helpful, thank you. Regarding why I'm not happy with the default settings for the ordered-solvent parameters: I actually expect them to be appropriate in most cases. However, I think it's good to "trust but verify" ;) I'm more interested in seeing firsthand how map density and CC correlate for waters in general for a given model/dataset.

It's low priority (and probably won't make much difference), but if you run across the answer to which 2mFo-DFc map is used for water picking, I'd be interested. I would have guessed it was the one computed from the original Fobs.

Best Regards,
-Andy Torelli
Hi Andrew,
It's low priority (and probably won't make too much difference), but if you run across the answer to which 2mFo-DFc map is used for water picking, I'd be interested. I would have guessed it was the original Fobs that was used.
yes, we use 2mFo-DFc map computed using original Fobs. Pavel.
Hi Keitaro,

phenix.refine uses the ml target all the time by default (or mlhl if experimental phases are available). If you ask phenix.refine to use information about twinning (by providing a twin law), then phenix.refine will use a least-squares target function called "twin_lsq_f". So what you observe is expected. Therefore I'm not sure I understand what exactly you suspect is a bug...

The next CCN (Computational Crystallography Newsletter; http://www.phenix-online.org/newsletter/) that comes out at the beginning of January will contain an article about using an ML target in twin refinement, which will be implemented in some (hopefully near) future.

Also, when using "twin_lsq_f" some of the refinement statistics are not available, and this is why you see "None" for some of them - this is nothing to worry about, just a matter of fact. Maybe you can explain some more what you believe is not right?
When I tried to refine using the model refined without twin_law, the target function was still ml, not twin_lsq_f although I specified twin_law. Is this a bug?
This is weird... If you could send me the inputs that I can use to reproduce this problem, then I will be able to explain what is going on.

Thanks!
Pavel.
participants (4)

- Andrew T. Torelli
- Keitaro Yamashita
- Pavel Afonine
- Peter Zwart