Picking Rfree in thin resolution shells using command line

Simon Kolstoe

30 Jan 2012

11:43 a.m.

Dear phenixbb,

I see from a quick Google search that it is possible to pick my Rfree set in thin resolution shells (I have 20-fold NCS). However, as I try to avoid the GUI wherever possible, could someone tell me the command-line way of doing this?

Thanks,

Simon

---------------------------------------------------------------
Dr Simon Kolstoe
Laboratory for Protein Crystallography
Wolfson Drug Discovery Unit
University College London
Rowland Hill Street, London NW3 2PF
Tel: 020 7433 2765
http://www.ucl.ac.uk/~rmhasek
---------------------------------------------------------------


Weiergräber, Oliver H.

30 Jan

12:04 p.m.

I routinely use sftools for this task. After reading the original MTZ file, delete the old Rfree column (if any) and create a new one with

rfree 0.05 shell

(replace 0.05 with whatever fraction you prefer).

Regards,

Oliver

================================================
PD Dr. Oliver H. Weiergräber
Institute of Complex Systems
ICS-6: Structural Biochemistry
Forschungszentrum Juelich GmbH
52425 Juelich
Tel.: +49 2461 61-2028
Fax: +49 2461 61-1448
================================================

_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
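For readers unfamiliar with the idea, the following is a minimal Python sketch of what "picking the free set in thin resolution shells" means. It is not the algorithm used by sftools or Phenix; the function name, the equal-count binning and the made-up d-spacings are all assumptions chosen for illustration. Reflections are sorted by resolution, split into bins, and a contiguous run in the middle of each bin is flagged as free, so the test set forms a few thin shells rather than a random scatter.

```python
import random


def thin_shell_free_flags(d_spacings, fraction=0.05, n_shells=20):
    """Flag roughly `fraction` of the reflections as free, in thin resolution shells.

    d_spacings: one resolution value (in Angstrom) per reflection.
    Returns a list of booleans, True for free (test) reflections.
    Illustrative only; not the sftools or Phenix implementation.
    """
    n = len(d_spacings)
    # Indices sorted from low resolution (large d) to high resolution (small d).
    order = sorted(range(n), key=lambda i: -d_spacings[i])
    flags = [False] * n
    for k in range(n_shells):
        # Equal-count resolution bin number k.
        start = k * n // n_shells
        stop = (k + 1) * n // n_shells
        # A contiguous run in the middle of the bin forms one thin free shell.
        n_free = round(fraction * (stop - start))
        first = start + (stop - start - n_free) // 2
        for i in order[first:first + n_free]:
            flags[i] = True
    return flags


# Hypothetical data: 2000 reflections with made-up d-spacings between 2 and 50 A.
rng = random.Random(0)
d_spacings = [1.0 / rng.uniform(0.02, 0.5) for _ in range(2000)]
flags = thin_shell_free_flags(d_spacings)
print(sum(flags), "of", len(flags), "reflections flagged as free")
```

A real implementation would also have to keep symmetry-related reflections together, which this sketch ignores.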

Simon Kolstoe

12:22 p.m.

Ah, sorry - I should have said specifically using Phenix. I know how to do it in sftools and dataman; however, command-line Phenix is so simple and easy compared to these other programs!

Simon

Nathaniel Echols

3:26 p.m.

On Mon, Jan 30, 2012 at 3:43 AM, Simon Kolstoe wrote:

> I see from a quick google that it is possible to pick my Rfree's using thin resolution shells (coz I've got 20 fold NCS), however as I am someone who tries to avoid the GUI where at all possible,

Why? Some things are simply easier to do in the GUI, or at least more obvious - otherwise we wouldn't bother writing one.

> could someone let me know what the command line way of doing this is?

In phenix.refine, you probably want something like this (some parameters are optional, but the defaults are probably not what most people expect):

xray_data.r_free_flags.generate=True
xray_data.r_free_flags.fraction=0.05
xray_data.r_free_flags.max_free=None
xray_data.r_free_flags.use_dataman_shells=True
xray_data.r_free_flags.n_shells=20

Randy and Paul claim that this doesn't help very much with the NCS issue, however.

-Nat
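As a rough sketch of how the parameters above might be passed on the command line, the Python snippet below assembles a phenix.refine invocation from them. The parameter names and values are copied from the post; the file names model.pdb and data.mtz and the helper function are hypothetical, and the parameter names may differ in other Phenix versions. The snippet only prints the command and does not run Phenix.

```python
import shlex

# Parameter names and values as quoted in the post above.
R_FREE_PARAMS = {
    "generate": True,
    "fraction": 0.05,
    "max_free": None,
    "use_dataman_shells": True,
    "n_shells": 20,
}


def phenix_refine_command(model, data, params, scope="xray_data.r_free_flags"):
    """Build the argument list for phenix.refine (hypothetical helper)."""
    args = ["phenix.refine", model, data]
    # Each parameter becomes one scope.name=value argument.
    args += [f"{scope}.{name}={value}" for name, value in params.items()]
    return args


# model.pdb and data.mtz are placeholder file names.
cmd = phenix_refine_command("model.pdb", "data.mtz", R_FREE_PARAMS)
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run(cmd) once the file names point at real input files.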

Randy Read

5:06 p.m.

I'd been meaning to contribute to this debate, and now that I see my name mentioned...

I used to be a very strong believer in selecting the cross-validation data in thin shells when you have NCS. I even had a recollection (a case of false memory syndrome, it seems) that we did this for our own case of 20-fold NCS, i.e. four copies of the Shiga-like toxin B-subunit pentamer cocrystallized with the Gb3 trisaccharide (Ling et al, 1998).

As a believer in thin shells, I was trying to convince Pavel to put an option for this in Phenix (like the one in sftools). He said that he'd never seen any evidence that it was necessary or made any difference. So I went back to the Shiga-like toxin structure and started parallel refinements from the MR solution, choosing the cross-validation data either randomly or in thin shells. And, guess what, I couldn't see any significant difference in how well the refinement went, even though I was pretty certain before doing that experiment that it would make a big difference. In fact, both refinements went pretty well.

So if thin shells aren't necessary even in an extreme case of NCS, then I suspect that they're not that useful in the more usual case of lower-order NCS.

In any case, there is a problem even with the thin shells (which Bart Hazes pointed out even as he implemented it in sftools). The theory suggests that reflections within some distance in reciprocal space of some reflection, or of a point related to it by an NCS rotation, should be correlated to the original reflection. All the points related by rotation will fall into the same resolution shell but, since the reciprocal-space distance is related to the inverse of the diameter of the molecule, the shell would have to have some thickness, and the reflections at the edge of the shell would still be correlated to reflections not in the shell. So even thin-shell cross-validation doesn't get around all the theoretical problems.

I'd be interested if someone has an example where it really does make a difference, but in the meantime it's hard to argue with Pavel's point of view!

Regards,

Randy

------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research
Wellcome Trust/MRC Building
Hills Road
Cambridge CB2 0XY, U.K.
Tel: + 44 1223 336500
Fax: + 44 1223 336827
E-mail: [email protected]
www-structmed.cimr.cam.ac.uk
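Randy's point about shell thickness can be made concrete with a small back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the correlation distance in reciprocal space is exactly 1/D for a molecule of diameter D; the post only says the two are related. The shell limits (3.00 to 2.98 A) and the 50 A diameter are made-up numbers, and the function names are hypothetical.

```python
def shell_width(d_low, d_high):
    """Radial width of a resolution shell in reciprocal space (1/Angstrom)."""
    return 1.0 / d_high - 1.0 / d_low


def fraction_near_edge(d_low, d_high, diameter):
    """Fraction of the shell's radial width lying within 1/diameter of an edge.

    Assumes a correlation distance of exactly 1/diameter (an illustrative
    simplification, not a result from the thread).
    """
    reach = 1.0 / diameter
    width = shell_width(d_low, d_high)
    # The reach eats into the shell from both edges, capped at the whole shell.
    return min(1.0, 2.0 * reach / width)


# Hypothetical thin shell from 3.00 to 2.98 A and a molecule 50 A across.
print(shell_width(3.00, 2.98))
print(fraction_near_edge(3.00, 2.98, 50.0))
# A much thicker shell (3.0 to 2.0 A) for comparison.
print(fraction_near_edge(3.0, 2.0, 50.0))
```

Under these assumptions the thin shell is narrower than the correlation distance, so every free reflection in it sits within reach of working reflections just outside the shell, which is the theoretical problem described above.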

A Leslie

31 Jan

8:47 a.m.

Hi Randy,

I can't remember if I ever mentioned this to you, but when I was working on the HepB capsid structure (30-fold NCS, if I remember correctly) I tried using a "thin shell within a thick shell" method of selecting Rfree, to avoid the issue that within a thin shell there are still relationships between those reflections within the shell and those just outside it. I forget the details, but I think I used a thin shell 1-2 rlps wide for the reflections to be used for Rfree, but I also excluded from the refinement reflections within a thick shell 4-5 rlps wide (the thin shell was in the middle of the thick shell). Because this excluded so many reflections I could only have 3 thick/thin shells altogether, so I chose them at low, middle and highish resolution.

The upshot of all this was that it was no help at all. Almost regardless of, say, the relative weight I put on the X-ray terms, or anything else I did, I could never get the Rfree to go up! The strict NCS restraints were so strong that the refinement essentially always "behaved".

This for me destroyed all my faith in this thin shell idea!

So this is definitely NOT an example where it worked.

I have not sent this to the bulletin board because my memory of exactly what I did is a bit hazy, but the message was clear enough.

Cheers

Andrew
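The "thin shell within a thick shell" selection described above can be sketched as a three-way classification of reflections. This is only an illustration of the idea, not a reconstruction of what was actually done: the shell centres, the 400 A cell edge used to define one reciprocal lattice point (rlp), and the widths of 1.5 and 4.5 rlps are all hypothetical values picked from within the ranges mentioned in the message.

```python
def classify_reflection(s, centers, rlp, thin_width=1.5, thick_width=4.5):
    """Classify a reflection at reciprocal-space radius s (1/Angstrom).

    centers: radii of the shell centres; thick shells are assumed not to overlap.
    rlp: size of one reciprocal lattice point spacing (1/Angstrom).
    Returns "free" (inside a thin shell), "excluded" (inside the surrounding
    thick shell, used for neither refinement nor Rfree) or "work".
    """
    for center in centers:
        offset = abs(s - center)
        if offset <= 0.5 * thin_width * rlp:
            return "free"
        if offset <= 0.5 * thick_width * rlp:
            return "excluded"
    return "work"


# Hypothetical setup: 400 A cell edge, shells at low, middle and highish resolution.
rlp = 1.0 / 400.0
centers = [0.05, 0.15, 0.25]  # 20 A, about 6.7 A and 4 A
for s in (0.05, 0.054, 0.10):
    print(s, classify_reflection(s, centers, rlp))
```

The guard band removes several times more reflections than end up in the free set, which is consistent with the remark that only three such shells were affordable.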

Simon Kolstoe

9:46 a.m.

Thanks for the interesting comments.

I was just wondering what sort of "difference" we are expecting to see? Is it just a case of preventing an artificially lowered Rfree, or is there an expectation of seeing a difference in the quality of the electron density?

Simon

Pavel Afonine

3:35 p.m.

Hi Simon,

the difference is well illustrated in

F. Fabiola, A. Korostelev and M. S. Chapman (2006). Bias in cross-validated free R factors: mitigation of the effects of non-crystallographic symmetry. Acta Cryst. D62, 227-238.

The question is whether we can reproduce it in the exact same set of test structures.

Pavel

Felix Frolow

6:07 p.m.

New subject: Picking Rfree in thin resolution shells using command line

I have a question of general significance: in what resolution NCS restrains and Rfree become IRRELEVANT? Axel Brunger invented Rfree to save our necks from refining garbage into the structure distantly looking like protein. Since than Rfree was idolized. However, there is a big difference between structures at 4.1 Angstrom and 1.4 Angstrom. In small molecule crystallography we can easily achieve 10 or 20 observations per refined parameter (depends on presence or absence of inversion center), therefore, no one care about Rfree in the small molecules community. In the well ordered protein structures, the bulk water region is working against us lowering diffraction strength contributing to 1/Volume, but it is also on our side minimizing a volume occupied by protein molecules (less atoms, fewer parameters). I have a structure (not yet published) where for 18000 protein atoms and about 9000 other atoms (water molecules, sulfate ions, sugars from cryo-protection etc) there are 750,000 independent observations. It makes about 28 observations per atom and together with the chemical observations such as bonds and angles which rarely differs from their classical values defined by small structures, if we keep anomalous data properly scaled and separated (there will be differences in good data sets that depends on S atoms and some other ions in solute, or even oxygen atoms) - we have quite good ratio of observations per refined parameter. So my question is: Do WE and WHAT FOR need to mess with Rfree in structures of relatively/very high resolutions? Dr Felix Frolow Professor of Structural Biology and Biotechnology Department of Molecular Microbiology and Biotechnology Tel Aviv University 69978, Israel Acta Crystallographica F, co-editor e-mail: [email protected] Tel: ++972-3640-8723 Fax: ++972-3640-9407 Cellular: 0547 459 608 On Jan 31, 2012, at 17:35 , Pavel Afonine wrote:

...

Hi Simon,

the difference is well illustrated in

F. Fabiola, A. Korostelev and M. S. Chapman Acta Cryst. (2006). D62, 227-238 Bias in cross-validated free R factors: mitigation of the effects of non-crystallographic symmetry

The question is whether we can reproduce it in the exact same set of test structures.

Pavel

On 1/31/12 1:46 AM, Simon Kolstoe wrote:

...
Thanks for the interesting comments.

I was just wondering what sort of "difference" we are expecting to see? Is it just a case of preventing an artificially lowered Rfree or is there an expectation to see a difference in the quality of the electron density?

Simon

--------------------------------------------------------------- Dr Simon Kolstoe Laboratory for Protein Crystallography Wolfson Drug Discovery Unit University College London Rowland Hill Street, London NW3 2PF

Tel: 020 7433 2765 http://www.ucl.ac.uk/~rmhasek ---------------------------------------------------------------

On 31 Jan 2012, at 08:47, A Leslie wrote:

...
Hi Randy,

I can't remember if I ever mentioned this to you, but when I was working on the HepB capsid structure (30 fold ncs if i remember correctly) I tried using a "thin shell within a thick shell" method of selecting Rfree, to avoid the issue that within a thin shell there are still relationships between those reflections within the shell and those just outside it. I forget the details, but I think I used a thin shell of 1-2 rlps wide for the reflections to be used for Rfree, but I also excluded from the refinement reflections within a thick shell 4-5 rlps wide (the thin shell was in the middle of the thick shell). Because this excluded so many reflections I could only have 3 thick/thin shells altogether, so I chose them at low, middle and highish resolution.

The upshot of all this was that it was no help at all. Almost regardless of, say, the relative weight I put on the Xray terms, or anything else I did, I could never get the Rfree to go up ! The strict NCS restraints were so strong that the refinement essentially always "behaved".

This for me destroyed all my faith in this thin shell idea !

So this is definitely NOT an example where it worked.

I have not sent this to the bulletin board because my memory of exactly what I did is a bit hazy, but the message was clear enough.

Cheers

Andrew

On 30 Jan 2012, at 17:06, Randy Read wrote:

...
I'd been meaning to contribute to this debate, and now that I see my name mentioned...

I used to be a very strong believer in selecting the cross-validation data in thin shells, when you have NCS. I even had a recollection (a case of false memory syndrome, it seems) that we did this for our own case of 20-fold NCS, i.e. four copies of the Shiga-like toxin B-subunit pentamer cocrystallized with the Gb3 trisaccharide (Ling et al, 1998).

As a believer in thin shells, I was trying to convince Pavel to put an option for this in Phenix (like the one in sftools). He said that he'd never seen any evidence that it was necessary or made any difference. So I went back to the Shiga-like toxin structure and started parallel refinements from the MR solution, either choosing the cross-validation data randomly or in thin shells. And, guess what, I couldn't see any significant difference in how well the refinement went, even though I was pretty certain before doing that experiment that it would make a big difference. In fact, both refinements went pretty well.

So if thin shells aren't necessary even in an extreme case of NCS, then I suspect that they're not that useful in the more usual case of lower-order NCS.

In any case, there is a problem even with the thin shells (which Bart Hazes pointed out even as he implemented it in sftools). The theory suggests that reflections within some distance in reciprocal space of some reflection or a point related to it by an NCS rotation should be correlated to the original reflection. All the points related by rotation will fall into the same resolution shell but, since the reciprocal-space distance is related to the inverse of the diameter of the molecule, the shell would have to have some thickness, and the reflections at the edge of the shell would still be correlated to reflections not in the shell. So even thin-shell cross-validation doesn't get around all the theoretical problems.
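A back-of-the-envelope version of this argument, as a sketch only. It assumes that reflections fill reciprocal space uniformly out to the resolution limit, that the free set is split evenly over the shells, and that the correlation distance is simply 1/diameter (the text above only says the two are "related"). The data-set limit, particle size and function name are hypothetical.

```python
def thin_shell_width(s, s_max, free_fraction, n_shells):
    """Approximate radial width (1/Angstrom) of one free shell centred at radius s.

    A shell of width w at radius s holds a fraction 3 * s**2 * w / s_max**3 of
    all reflections; setting that equal to free_fraction / n_shells gives w.
    """
    return free_fraction * s_max ** 3 / (3.0 * n_shells * s ** 2)


s_max = 1.0 / 2.5                     # hypothetical 2.5 Angstrom data set
width = thin_shell_width(s=0.3, s_max=s_max, free_fraction=0.05, n_shells=20)
correlation_distance = 1.0 / 100.0    # assumed: 1 / (100 Angstrom particle diameter)
ratio = correlation_distance / width  # how many shell widths the correlation spans
```

With these made-up numbers the shell comes out more than ten times thinner than the assumed correlation distance, which is the point being made: essentially every reflection in such a shell sits close to working-set reflections just outside it.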

I'd be interested if someone has an example where it really does make a difference, but in the meantime it's hard to argue with Pavel's point of view!

Regards,

Randy

On 30 Jan 2012, at 15:26, Nathaniel Echols wrote:

...
On Mon, Jan 30, 2012 at 3:43 AM, Simon Kolstoe wrote:

...
I see from a quick google that it is possible to pick my Rfree's using thin resolution shells (coz I've got 20 fold NCS), however as I am someone who tries to avoid the GUI where at all possible,

Why? Some things are simply easier to do in the GUI, or at least more obvious - otherwise we wouldn't bother writing one.

...
could someone let me know what the command line way of doing this is?

In phenix.refine, you probably want something like this (some parameters optional, but the defaults are probably not what most people expect):

xray_data.r_free_flags.generate=True
xray_data.r_free_flags.fraction=0.05
xray_data.r_free_flags.max_free=None
xray_data.r_free_flags.use_dataman_shells=True
xray_data.r_free_flags.n_shells=20
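For readers who want to see what "picking the test set in thin resolution shells" amounts to, here is a rough sketch in plain Python. It is an illustration of the idea only: the function name and the equal-count binning are assumptions, and the actual selection done by phenix.refine with use_dataman_shells=True (or by sftools) may well differ in detail. The fraction and n_shells arguments simply mirror the parameters quoted above.

```python
def pick_thin_shell_flags(d_spacings, fraction=0.05, n_shells=20):
    """Flag roughly `fraction` of the reflections as free, in n_shells thin shells.

    Reflections are sorted by resolution and cut into n_shells equal-count
    blocks; the highest-resolution end of each block becomes one free shell.
    Any remainder left after the last full block stays in the working set.
    """
    n = len(d_spacings)
    if n < n_shells:
        raise ValueError("need at least one reflection per shell")
    # Indices ordered from low resolution (large d) to high resolution (small d).
    order = sorted(range(n), key=lambda i: d_spacings[i], reverse=True)
    block = n // n_shells
    n_free = max(1, round(block * fraction))
    flags = [False] * n
    for k in range(n_shells):
        end = (k + 1) * block
        for i in order[end - n_free:end]:
            flags[i] = True
    return flags


# Hypothetical data: 2000 reflections with d-spacings from 2.0 to 21.99 Angstrom.
d_values = [2.0 + 0.01 * i for i in range(2000)]
flags = pick_thin_shell_flags(d_values, fraction=0.05, n_shells=20)
n_free_total = sum(flags)
```

Because whole shells are taken, free reflections are neighbours in resolution of other free reflections rather than of working reflections, which is the property the thin-shell approach is after.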

Randy and Paul claim that this doesn't help very much with the NCS issue, however.

-Nat

Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research    Tel: + 44 1223 336500
Wellcome Trust/MRC Building                 Fax: + 44 1223 336827
Hills Road                                  E-mail: [email protected]
Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk


Pavel Afonine

7:01 p.m.


Hi Felix,

NCS: given state-of-the-art NCS restraints there is (probably) no single clear-cut answer, but rather three: "definitely yes", "definitely no", and "try to find out". Obviously, at low enough resolution (say ~2A and lower) NCS should always be used, simply because it provides the luxury of additional a priori information to alleviate the poor data-to-parameters ratio. Obviously, at high enough resolution (~1.5-1.7A or so) NCS should not be used, since the amount of data may be enough to see actual differences between NCS copies, and using NCS would probably wipe out these differences (or at least there is such a risk). In the grey area, ~1.7-2.0A, one should try refining with and without NCS to know for sure. Also, it may be good to mention that if the NCS groups are selected perfectly (which includes, for example, making sure not to apply NCS to atoms that do not obey it) then most likely NCS could be used at any resolution.

Rfree: at very high resolution, leaving 5-10% of the data out of refinement probably wouldn't hurt too much (provided that the data are complete). Having free-R may be handy even at subatomic resolution: for example, to illustrate/prove that using IAS (Interatomic Scatterers model) or multipoles actually improves your model rather than overfitting the data. Note that when a multipolar model is used, there are 32 (or 28 - I forget?) refinable parameters per atom, so the data-to-parameters ratio for a macromolecule may not be that great even at ~0.9-0.7A resolution!

Computing less biased maps is another reason to keep the free set of reflections: note that the m and D in 2mFo-DFc and mFo-DFc maps have to be computed using free reflections.

Pavel

On 1/31/12 10:07 AM, Felix Frolow wrote:

...

I have a question of general significance: at what resolution do NCS restraints and Rfree become IRRELEVANT?

Axel Brunger invented Rfree to save our necks from refining garbage into a structure distantly resembling a protein. Since then Rfree has been idolized. However, there is a big difference between structures at 4.1 Angstrom and 1.4 Angstrom. In small molecule crystallography we can easily achieve 10 or 20 observations per refined parameter (depending on the presence or absence of an inversion center); therefore, no one cares about Rfree in the small-molecule community. In well-ordered protein structures, the bulk water region works against us, lowering diffraction strength by contributing to 1/Volume, but it is also on our side, minimizing the volume occupied by protein molecules (fewer atoms, fewer parameters).

I have a structure (not yet published) where for 18000 protein atoms and about 9000 other atoms (water molecules, sulfate ions, sugars from cryo-protection etc) there are 750,000 independent observations. That makes about 28 observations per atom, and together with the chemical observations such as bonds and angles, which rarely differ from their classical values defined by small structures, we have quite a good ratio of observations per refined parameter, provided we keep anomalous data properly scaled and separated (in good data sets there will be differences that depend on S atoms and some other ions in the solute, or even oxygen atoms).

So my question is: do WE need to mess with Rfree in structures of relatively/very high resolution, and WHAT FOR?

Dr Felix Frolow Professor of Structural Biology and Biotechnology Department of Molecular Microbiology and Biotechnology Tel Aviv University 69978, Israel

Acta Crystallographica F, co-editor

e-mail: [email protected] Tel: ++972-3640-8723 Fax: ++972-3640-9407 Cellular: 0547 459 608
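The ratio quoted in this message can be checked with a few lines of arithmetic. The atom and observation counts are the ones given above; the number of refined parameters per atom is not stated in the message, so the two values used here (4 for x, y, z plus an isotropic B, 9 for x, y, z plus six anisotropic displacement terms) are assumptions, and restraints are ignored.

```python
n_observations = 750_000
n_atoms = 18_000 + 9_000          # protein atoms + waters, ions, sugars

obs_per_atom = n_observations / n_atoms

# Assumed parameter counts per atom (not stated in the message):
# 4 = x, y, z + isotropic B; 9 = x, y, z + six anisotropic displacement terms.
obs_per_param_iso = n_observations / (4 * n_atoms)
obs_per_param_aniso = n_observations / (9 * n_atoms)

print(f"{obs_per_atom:.1f} observations per atom")
print(f"{obs_per_param_iso:.1f} per parameter (isotropic B)")
print(f"{obs_per_param_aniso:.1f} per parameter (anisotropic)")
```

Under these assumptions the X-ray data alone give roughly 7 observations per parameter for an isotropic model and roughly 3 for a fully anisotropic one, so the comparison with the 10 or 20 of small-molecule work depends on how much weight the geometric restraints are given as additional observations.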

On Jan 31, 2012, at 17:35 , Pavel Afonine wrote:

...

Nathaniel Echols

7:22 p.m.


On Tue, Jan 31, 2012 at 11:01 AM, Pavel Afonine wrote:

...

NCS: given state-of-the-art NCS restraints there is (probably) no clear-cut answer, but there are three ones: "definitely yes", "definitely no", and "try to find out". Obviously, at low enough resolution NCS should be always used (say ~2A and lower), simply because this provides a luxury of additional a priori information to alleviate the poor data-to-parameters ratio problem. Obviously, at high enough resolution (~1.5-1.7A or so) NCS should not be used since the amount of data may be enough to see actual differences between NCS copies, and using NCS would probably wipe out these differences (or at least there is such a risk). In the grey area, ~1.7-2.0A, one should try using vs not using NCS to know for sure.

A more quantitative answer: using the last six months' worth of PDB releases, for structures in the range 1.75 - 2.0 A the best choice of NCS restraint (judged by R-free) was:

none: 27%
cartesian/global NCS: 11% (average improvement: 0.0073)
torsion NCS: 62% (average improvement: 0.0036)

However, some random checks (across a wider resolution range) indicate that a significant fraction of the structures where Cartesian NCS works best have under-assigned symmetry. Jeff is working on much more thorough tests, so a more definitive answer will eventually be available - but Pavel's recommendation will probably remain true.

-Nat

A Leslie

9:21 a.m.


Phil Evans kindly pointed out that I did send my message to the bulletin board after all (I just hit reply, and as the email was from Randy I assumed that it would simply go back to Randy!). So, as I suggested in that email, my memory of the details is a bit hazy; please do not quote me on any of the numbers.

Andrew

On 30 Jan 2012, at 17:06, Randy Read wrote:

...


Francis E Reyes

1:27 p.m.


I remember the days I had a simple two-fold NCS issue and someone commented that NCS may be strong in the low-res reflections but is usually not as strict in the high-res reflections. Lo and behold, while the backbone was pretty close among the NCS copies, side chains (particularly on the surface) were in different conformations.

Carefully jumping over into reciprocal space, would it be unwise to choose an R-free set that wasn't evenly distributed across all resolution bins but rather weighted more towards the high resolution reflections?

F

--------------------------------------------- Francis E. Reyes M.Sc. 215 UCB University of Colorado at Boulder

Nathaniel Echols

3:27 p.m.


On Tue, Jan 31, 2012 at 5:27 AM, Francis E Reyes wrote:

...

Carefully jumping over into reciprocal space, would it be unwise to choose an R-free set that wasn't evenly distributed across all resolution bins but rather weighted more towards the high resolution reflections?

I think so, but not necessarily because of any effects on R-free - the test set is also used for the maximum likelihood equations in refinement, which assume that it's more or less evenly distributed across the entire resolution range. Pavel probably can explain the messy details.

-Nat
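One simple way to see whether a given test set is "more or less evenly distributed" is to tabulate the free fraction per resolution bin. The sketch below does this in plain Python with equal-count bins; the function name, the binning scheme and the toy data are assumptions made for illustration, not something taken from Phenix.

```python
def free_fraction_per_bin(d_spacings, free_flags, n_bins=10):
    """Return the fraction of free reflections in n_bins equal-count resolution bins.

    Bins run from low resolution (large d) to high resolution (small d).
    """
    if len(d_spacings) < n_bins:
        raise ValueError("need at least one reflection per bin")
    order = sorted(range(len(d_spacings)), key=lambda i: d_spacings[i], reverse=True)
    bin_size = len(order) // n_bins
    fractions = []
    for b in range(n_bins):
        start = b * bin_size
        # The last bin absorbs any remainder.
        end = len(order) if b == n_bins - 1 else start + bin_size
        members = order[start:end]
        fractions.append(sum(1 for i in members if free_flags[i]) / len(members))
    return fractions


# Toy data: 1000 reflections, d-spacings from 2.0 to 11.99 Angstrom.
d_values = [2.0 + 0.01 * i for i in range(1000)]
even_flags = [i % 20 == 0 for i in range(1000)]      # 5% spread over all resolutions
skewed_flags = [d < 2.5 for d in d_values]           # 5% taken only at high resolution

even = free_fraction_per_bin(d_values, even_flags)
skewed = free_fraction_per_bin(d_values, skewed_flags)
```

For the evenly spread set every bin holds 5% free reflections; for the skewed set nine bins hold none at all and the highest-resolution bin holds 50%, so any per-bin quantity estimated from the free reflections would have nothing to work with over most of the resolution range.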

Christian Roth

12:06 p.m.


Hi,

I worked with crystals with 40-fold NCS and did refinements with the test set picked randomly and in thin shells. The R-values from the respective runs were comparable, and the deltaR was around 4.5 to 6% in both cases. So if there is a bias due to NCS, it is equally strong in both cases.

I found a paper by Fabiola et al., Acta Cryst. (2006) D62, 227-238, about the effects of NCS. For me it was quite complicated to follow all the mathematics, but they mentioned, if I understood and remember it correctly, that they excluded up to 60% of the data from the refinement to remove the model bias. The data/parameter ratio will become a real problem, I guess.

Christian

On Monday, 30 January 2012, 16:26:57, Nathaniel Echols wrote:

...


