Five sets of PDFs about Refinement, Maps, Validation and PHENIX
Hi Everyone,

recently I had to update, or make from scratch, my talk "slides" on various subjects such as crystallographic structure refinement, maps, validation and a general PHENIX overview. Since I spent a pretty significant amount of time making all these "slides", I was thinking that they might be of use to some of you, so here they are:

- 42 pages of general introduction to structure refinement: http://www.phenix-online.org/presentations/latest/pavel_refinement_general.p...
- 45 pages of phenix.refine overview (including extended details about its use from the command line): http://www.phenix-online.org/presentations/latest/pavel_phenix_refine.pdf
- 42 pages of "Some Facts About Maps": http://www.phenix-online.org/presentations/latest/pavel_maps.pdf
- 50 pages of "Crystallographic Structure Validation": http://www.phenix-online.org/presentations/latest/pavel_validation.pdf
- 31 pages of introduction to PHENIX: http://www.phenix-online.org/presentations/latest/pavel_phenix_intro.pdf

Most of the slides in "Introduction to PHENIX" came from Paul Adams, Tom Terwilliger and Nat Echols. Thanks to Nat and Jeff Headd for providing interesting examples of structures with unusual geometry.

All the best!
Pavel.
Hi Pavel

A bunch of extremely useful slides, thanks for making them available to the world!

Inevitably, though, given the nature of our field, there's one slide I must challenge, the 2nd-last one of the validation pack, where you recommend against the use of UNK atoms, but don't say why:

<snip>
Some programs and people tend to interpret unknown density using "dummy atoms". In PDB files it typically looks like this:

ATOM     10  O   UNK     2       6.348 -11.323  10.667  1.00  8.06           X
ATOM     11  O   UNK     2       6.994 -12.600  10.740  1.00  7.16           X
ATOM     12  O   UNK     2       6.028 -13.737  10.607  1.00  6.58           X
ATOM     13  DUM UNK     2       6.796 -15.043  10.583  1.00  8.28
ATOM     14  DUM UNK     2       5.099 -13.727  11.792  1.00  7.15

- *Do not deposit this in PDB*, especially if chemical element type is undefined (rightmost column)
</snip>

Why should one not be allowed to indicate in the model that there was very clear, atomic density whose chemistry nevertheless could not be explained? The reason we group atoms into chemical entities is purely so we can impose restraints on interatomic distances and thereby compensate for the poor data-parameter ratio. But chemistry is not a substitute for making /scientific/ sense of the model; that problem frequently lies well beyond the reach of the model -- yet that should not stop the model from being deposited, as would be the logical conclusion of your recommendation. The scenario is universal, particularly acute in structural genomics but possible even in ligand-binding studies.

(Of course, if what you meant to say is that UNK atoms are to be used extremely judiciously, and it is *not* okay to flood a bad model with UNK atoms only to get the R-factor down -- then I'm totally with you!)

phx.

_Full disclosure_: I am one of the culprits behind the JCSG persuading the PDB to accept UNK as a valid residue type, after we had run into simply too many models where there was clearly something bound, but no quick/cheap way of figuring out what it was.
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
Hi Frank,

thanks a lot for your feedback - as always very useful and critical, which is great!

> the 2nd-last one of the validation pack, where you recommend against the use of UNK atoms, but don't say why:

Sorry for not saying "why". If I ever get to show these slides again at some school, I promise to improve them to be as clear as possible.

The problems with records like:

ATOM     10  O   UNK     2       6.348 -11.323  10.667  1.00  8.06           X
ATOM     10  O   UNK     2       6.348 -11.323  10.667  1.00  8.06
ATOM     13  DUM UNK     2       6.796 -15.043  10.583  1.00  8.28
ATOM     13  DUM UNK     2       6.796 -15.043  10.583  1.00  8.28           X

are:

- the chemical element type (columns 77-78), the one that is used in the Fcalc calculation and may also provide the charge, is undefined (simply blank or "X"), so there is no way to include these dummy atoms in structure factor calculations;
- even if you have "O" as in the first example, this often contradicts the "X" in the rightmost column, so you have to use guesswork, which is not good for interpreting well-defined formatted data files. Plus, of course, there is no way to tell the charge;
- even if you have "O" as in the second example, the element type in the rightmost column is missing. The atom name alone is weak information: we cannot reliably extract the scattering type from the atom label - the classical example is CA (calcium) and CA (C-alpha);
- of course, we can make the program simply ignore these atoms (hm... sounds like bad practice: don't read it if you can't read it - this way we may end up being ignorant -:) ). But are we sure that the original program that placed these dummies was not also using them in the Fcalc calculation? Or maybe it was using some default scattering factor for them? Which one: H, O or N (N is a better approximation than O)?
- furthermore, since we are lacking such a fundamental property of these dummy atoms as the scattering type, it is laughable to assign B-factors to them! Look through the PDB: you will find some smart-looking B-factors, such as 8.06 A**2 for a non-existent element X -:)

In summary:

- do not put anything there hoping that future generations of smarter software will find out what it is;
- if you want to put something there (which actually has valid reasons: it will improve the overall map quality, which is good), then please define it properly.

All the best!
Pavel.
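The element-column problem described in this message can be illustrated with a minimal Python sketch. It assumes the standard fixed-column PDB layout with the element symbol in columns 77-78; the helper names `make_atom_record` and `element_status` are hypothetical and not part of PHENIX or any other package.

```python
def make_atom_record(serial, name, resname, chain, resseq,
                     x, y, z, occ, b_factor, element=""):
    """Format a fixed-column PDB ATOM record; element lands in columns 77-78."""
    return (
        f"ATOM  {serial:>5d} {name:<4s} {resname:>3s} {chain:1s}{resseq:>4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b_factor:6.2f}"
        f"          {element:>2s}"
    )


def element_status(record):
    """Classify the element field (columns 77-78) of an ATOM/HETATM record."""
    element = record[76:78].strip().upper()
    if not element:
        return "missing"      # blank field, or the line is too short
    if element == "X":
        return "undefined"    # placeholder, no scattering type can be derived
    return "defined"


# Records modelled on the examples quoted in this thread.
records = [
    make_atom_record(10, " O", "UNK", "", 2, 6.348, -11.323, 10.667, 1.00, 8.06, "X"),
    make_atom_record(13, " DUM", "UNK", "", 2, 6.796, -15.043, 10.583, 1.00, 8.28),
    make_atom_record(1959, " O", "UNK", "A", 1, -8.762, 8.060, 25.324, 1.00, 31.23, "O"),
]
for record in records:
    print(element_status(record), "|", record)
```

The sketch deliberately does not try to guess the element from the atom name, since that is exactly the guesswork (CA as calcium versus C-alpha) the message argues against.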
I think it is very important to be able to include unknown atoms in a deposited PDB file (while echoing the caveat about flooding the structure with UNKs to lower the R-factor).

For one thing, these structures are produced not just for structure-factor calculation and validation. Many of the end users will never even bother to do a structure factor calculation. It is important for the depositor to be able to refer to an unknown but likely significant ligand and for the reader to be able to go and look at that position (ideally surrounded by electron density).

For another thing, the structure factor calculation will give exactly the same result whether the dummy atoms are omitted or are flagged with zero occupancy or atom-type X to be ignored in the sf calculation. In the first case the person calculating structure factors can feel good because the results are exactly right for that model. In the second case he feels bad because he wasn't able to correctly account for those atoms. But the second case is actually the better model. Better to get a slightly wrong value for the better model than the correct result for the less good model, especially when the two results are exactly the same.

Essentially we are faced with an insurmountable problem: we cannot do a proper job of calculating sf's because of the UNK atoms. Better to include them but ignore them in the sf calculation, I think, than to eliminate them and kid ourselves that now we have the right answer.

However, if the depositor has refined them (as suggested by the B-factors present in some of the files), and perhaps chosen an atom type which results in B-factors compatible with the surroundings, it should be possible to include the atom type so his R-factor can be reproduced. This runs the risk of someone over-interpreting the PDB ("I thought I knew what the UNK residue is, but my candidate has 3 C and one N where the UNK has 4 C").

my 2 cents,
Ed
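Ed's point that omitted atoms and zero-occupancy atoms give the same calculated structure factor can be checked with a small Python sketch. It is a simplification under stated assumptions: each atom carries a constant scattering weight, the B-factor attenuation is left out, and the coordinates are hypothetical fractional values rather than data from any deposited model.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum occ * f * exp(2*pi*i*(h*x + k*y + l*z)) over all atoms.

    Assumption: f is a constant per-atom scattering weight and the
    B-factor term is omitted, so this is not a full Fcalc.
    """
    h, k, l = hkl
    total = 0j
    for occ, f, (x, y, z) in atoms:
        phase = 2.0 * math.pi * (h * x + k * y + l * z)
        total += occ * f * cmath.exp(1j * phase)
    return total


# Hypothetical atoms: (occupancy, scattering weight, fractional coordinates).
known_atoms = [
    (1.0, 6.0, (0.10, 0.20, 0.30)),
    (1.0, 7.0, (0.15, 0.25, 0.35)),
    (1.0, 8.0, (0.40, 0.10, 0.05)),
]
dummy_zero_occ = (0.0, 8.0, (0.50, 0.50, 0.50))  # kept in the model, flagged
dummy_full_occ = (1.0, 8.0, (0.50, 0.50, 0.50))  # actually contributes

hkl = (1, 2, 3)
f_omitted = structure_factor(hkl, known_atoms)
f_flagged = structure_factor(hkl, known_atoms + [dummy_zero_occ])
f_included = structure_factor(hkl, known_atoms + [dummy_full_occ])

print(abs(f_omitted - f_flagged))   # difference made by a zero-occupancy atom
print(abs(f_omitted - f_included))  # difference made by a fully occupied atom
```

The comparison only holds if the program that wrote the file really did ignore the dummy atoms, which is the question raised elsewhere in the thread.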
Dear Ed,

> I think it is very important to be able to include unknown atoms in a deposited pdb file (with echoing the caveat about flooding the structure with UNK's to lower the R-factor).

yes, as I wrote in my original reply, including these atoms may improve the map, which in turn may reveal or improve other (biologically) important places in it. The only point is: please define these dummy atoms properly, providing all the information, such as the scattering element type that you or your program used for such an approximation.

> For one thing, these structures are produced not just for structure-factor calculation and validation. Many of the end users will never even bother to do a structure factor calculation.

The ability to reproduce the R-factor is not only for someone's pleasure but, at the very least, for validation purposes. If I've got a PDB file for which I can't compute the R-factors (and, by the way, not even the map), then I don't need the deposited Fobs either, unless I'm going to re-determine the structure from scratch.

> It important for the depositor to be able to refer to an unknown but likely significant ligand and for the reader to be able to go and look at that position (ideally surrounded by electron density).

Sure, it is important.

> For another thing, the structure factor calculation will give exactly the same result whether the dummy atoms are omitted or are flagged with zero occupancy or atom-type X to be ignored in sf calculation.

If you look in the PDB you will find that very often the occupancies are not set to 1. Plus, as I mentioned, often the B-factors for these atoms are set to some funny numbers (it looks like they were refined). Are we sure that these programs were ignoring these dummies in Fcalc calculations? If so, how were the B-factors refined, or were they made up?

Again, if it is defined properly, for example, like this:

ATOM   1959  O   DUM A   1      -8.762   8.060  25.324  1.00 31.23           O

or

ATOM   1959  O   UNK A   1      -8.762   8.060  25.324  1.00 31.23           O

then it is absolutely OK to have such entries, because they are completely defined and can be used in any calculations without any unnecessary guesswork. But if you start masking things with X or blanks then I (and the software I write) will start asking all these nasty questions...

All the best!
Pavel.
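As a rough illustration of the "nasty questions" a program has to ask, here is a minimal Python sketch that slices occupancy, B-factor and element out of a fixed-column record. The class `AtomRecord` and the functions `parse_atom` and `open_questions` are hypothetical names for this example only; the column positions assume the standard PDB ATOM layout.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    name: str
    resname: str
    occupancy: float
    b_factor: float
    element: str


def parse_atom(record):
    """Slice the fixed PDB columns that matter for this discussion."""
    return AtomRecord(
        name=record[12:16].strip(),
        resname=record[17:20].strip(),
        occupancy=float(record[54:60]),
        b_factor=float(record[60:66]),
        element=record[76:78].strip().upper(),
    )


def open_questions(atom):
    """List what a program cannot know about this atom without guessing."""
    questions = []
    if atom.element in ("", "X"):
        questions.append("which scattering type, if any, was used in Fcalc?")
        if atom.occupancy > 0.0:
            questions.append("was the non-zero occupancy really applied?")
        questions.append("how was the B-factor obtained?")
    return questions


# First 66 columns of the record from the message; the element is appended
# so that it lands in columns 77-78.
body = "ATOM   1959  O   UNK A   1      -8.762   8.060  25.324  1.00 31.23"
defined = body.ljust(76) + " O"
masked = body.ljust(76) + " X"

for record in (defined, masked):
    atom = parse_atom(record)
    print(atom.resname, atom.element or "(blank)", open_questions(atom))
```

A record with a real element symbol yields an empty list, which matches the statement that such entries are completely defined.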
UNK residues have another valid use where you can see peptide but not assign a sequence register. A poly-Ala model in that case is better labelled UNK than ALA, since it isn't ALA.

Phil
One disadvantage of using UNK is that it is often a loss of information. For example, in the case Phil mentions, we do think that we have a polypeptide. By labelling protein residues UNK we no longer distinguish them from DNA or, depending on HETATM vs ATOM identification, from ligands.

-Tom T
Thomas C. Terwilliger
Mail Stop M888
Los Alamos National Laboratory
Los Alamos, NM 87545
Tel: 505-667-0072
email: [email protected]
Fax: 505-665-3024
SOLVE web site: http://solve.lanl.gov
PHENIX web site: http://www.phenix-online.org
ISFI Integrated Center for Structure and Function Innovation web site: http://techcenter.mbi.ucla.edu
TB Structural Genomics Consortium web site: http://www.doe-mbi.ucla.edu/TB
CBSS Center for Bio-Security Science web site: http://www.lanl.gov/cbss
I agree with Phil about UNK: it does seem better to label an unknown (undefined) residue as it appears in the map rather than call something ALA that is in fact TYR, and then later get confused by the mismatch between the actual sequence and the one derived from the PDB file. This is actually what confuses me all the time when looking at the results of model building programs, because the first thing I always do is compare the actual sequence with the one derived from the PDB file, just to validate the result of model building.

However, I agree with Tom too about losing identity in cases where we really do know what to expect: polypeptide or RNA/DNA.

Hm... interesting situation -:)

I guess UNK may still be better, but ONLY IF you go one level deeper and look at atom names (or make sure you do that consistently). Say you name a "residue" UNK and name the corresponding atoms within this residue CA, N, C, O (a kind of peptide pattern): then you have a chance to guess what it is. Of course, the question is then how you know where to place those CA, N, C and O...

Pavel.
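Pavel's idea of going one level deeper and looking at the atom names inside a UNK residue could be sketched as below. This is only an illustration: the function `guess_unk_kind` is hypothetical, the backbone name sets are assumptions based on the usual peptide and nucleotide atom naming, and older files that write sugar atoms with `*` instead of `'` are not handled.

```python
# Atom-name patterns assumed to indicate a polymer backbone.
PEPTIDE_BACKBONE = {"N", "CA", "C", "O"}
NUCLEOTIDE_BACKBONE = {"P", "O5'", "C5'", "C4'", "C3'", "O3'"}


def guess_unk_kind(atom_names):
    """Guess what a UNK residue is from the names of its atoms."""
    names = {name.strip().upper() for name in atom_names}
    if PEPTIDE_BACKBONE <= names:
        return "peptide-like"
    if NUCLEOTIDE_BACKBONE <= names:
        return "nucleotide-like"
    return "unknown"


# Hypothetical UNK residues given as lists of atom names.
print(guess_unk_kind([" N  ", " CA ", " C  ", " O  ", " CB "]))
print(guess_unk_kind(["P", "O5'", "C5'", "C4'", "C3'", "O3'", "C1'"]))
print(guess_unk_kind(["O", "O", "DUM"]))
```

The guess is only as good as the naming discipline of the program that wrote the file, which is the caveat in the message about doing this consistently.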
I was thinking of the case (which we have had) where we could place the peptide plausibly (e.g. in a helix) but not identify the side chain.

Maybe there should be different UNK-likes, for unknown amino acid, unknown nucleotide, unknown thing.

Phil
UNP - for unknown peptides, UND- for DNA, UNR for RNA, UNL- for ligands and UNK for things that we don't know that we don't know (quoting D. Ramsfeld). Boaz ----- Original Message ----- From: Phil Evans
Date: Thursday, July 29, 2010 2:29 Subject: Re: [phenixbb] Dummy atoms To: PHENIX user mailing listI was thinking of the case (which we have had) where we could place the peptide plausibly (eg in a helix) but not identify the side chain. Maybe there should be different UNK-likes, for unknown amino-acid, unknown nucleotide, unknown thing
Phil
On 28 Jul 2010, at 23:34, Pavel Afonine wrote:
I agree with Phil about UNK - it does seem better to label an unknown (undefined) residue as it appears in the map rather than call something ALA that is in fact TYR, and then later get confused by the mismatch between the actual sequence and the one derived from the PDB file. This is what confuses me all the time when I look at the results of model building programs, because the first thing I always do is compare the actual sequence with the one derived from the PDB file - just to validate the result of model building.
However, I agree with Tom too about losing identity in cases where we really do know what to expect: polypeptide or RNA/DNA.
Hm... interesting situation -:)
I guess UNK may still be better, but ONLY IF you go one level deeper and look at the atom names (or make sure you do that consistently). Say you name a "residue" UNK and name the corresponding atoms within this residue CA, N, C, O (a kind of peptide pattern) - then you have a chance to guess what it is. Of course, there is then the question of how you know where to place those CA, N, C and O...
Pavel.
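The atom-name idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `guess_unk_kind` and the two backbone name sets are hypothetical choices for this example (standard PDB backbone atom names, with the older `*` spelling of primes normalised), not something PHENIX or the PDB provides.

```python
# Hypothetical helper: guess what kind of polymer an UNK residue belongs to,
# based only on which backbone atom names are present in it.

# Standard PDB backbone atom names (assumed sufficient for this sketch).
PEPTIDE_BACKBONE = {"N", "CA", "C", "O"}
NUCLEOTIDE_BACKBONE = {"P", "O5'", "C5'", "C4'", "C3'", "O3'"}


def guess_unk_kind(atom_names):
    """Return 'peptide-like', 'nucleotide-like' or 'unknown'."""
    # Normalise: strip padding, upper-case, and map old-style O5* to O5'.
    names = {n.strip().upper().replace("*", "'") for n in atom_names}
    if PEPTIDE_BACKBONE <= names:
        return "peptide-like"
    if NUCLEOTIDE_BACKBONE <= names:
        return "nucleotide-like"
    return "unknown"


kind_backbone = guess_unk_kind(["N", "CA", "C", "O", "CB"])
kind_single = guess_unk_kind(["O"])
kind_nucleic = guess_unk_kind(["P", "O5*", "C5*", "C4*", "C3*", "O3*"])
```

A lone `O` or `DUM` atom stays "unknown", which is the case where the residue name alone cannot carry the missing information.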
On 7/28/10 3:19 PM, Tom Terwilliger wrote:
One disadvantage of using UNK is that it is often a loss of information. For example in the case Phil mentions...we do think that we have a polypeptide. By labelling protein residues UNK we no longer distinguish them from DNA, or depending on HETATM vs ATOM identification, from ligands.
-Tom T
On Jul 28, 2010, at 4:01 PM, Phil Evans wrote:
UNK residues have another valid use where you can see peptide but not assign a sequence register. A poly-Ala model in that case is better labelled UNK than ALA, since it isn't ALA
Phil
On 28 Jul 2010, at 19:12, Pavel Afonine wrote:
Dear Ed,
> I think it is very important to be able to include unknown atoms
> in a deposited pdb file (with echoing the caveat about flooding
> the structure with UNK's to lower the R-factor).
yes, as I wrote in my original reply, including these atoms may improve the map and in turn may reveal or improve some of its other (biologically) important places. The only point is: please define these dummy atoms properly, providing all the information, such as the scattering element type that you or your program used for such an approximation.
> For one thing, these structures are produced not just for structure-factor
> calculation and validation. Many of the end users will never even
> bother to do a structure factor calculation.
The ability to reproduce the R-factor is not only for someone's pleasure but for validation purposes at least. If I've got a PDB file for which I can't compute the R-factors (and, by the way, even the map too), then I don't need the deposited Fobs either, unless I'm going to re-determine the structure from scratch.
> It is important for the
> depositor to be able to refer to an unknown but likely significant
> ligand and for the reader to be able to go and look at that position
> (ideally surrounded by electron density).
Sure, it is important.
> For another thing, the structure factor calculation will give exactly
> the same result whether the dummy atoms are omitted or are flagged
> with zero occupancy or atom-type X to be ignored in sf calculation.
If you look in the PDB you will find that very often the occupancies are not set to 1. Plus, as I mentioned, often the B-factors for these atoms are set to some funny numbers (looks like they were refined).
Are we sure that these programs were ignoring these dummies in Fcalc calculations? If so, how were the B-factors refined, or were they made up?
Again, if it is defined properly, for example, like this:
ATOM 1959 O DUM A 1 -8.762 8.060 25.324 1.00 31.23 O
or
ATOM 1959 O UNK A 1 -8.762 8.060 25.324 1.00 31.23 O
then it is absolutely OK to have such entries, because it is completely defined and can be used in any calculations without any unnecessary guesswork. But if you start masking things with X or blanks then I (and the software I write) will start asking all these nasty questions...
All the best! Pavel.
Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel Phone: 972-8-647-2220 ; Fax: 646-1710 Skype: boaz.shaanan
Unfortunately, UND is undecane, but UNP and UNR are not specified yet.
On Thu, Jul 29, 2010 at 11:18 AM, Boaz Shaanan wrote:
> UNP for unknown peptides, UND for DNA, UNR for RNA, UNL for ligands and UNK for things that we don't know that we don't know (quoting D. Rumsfeld). Boaz
-- Nigel W. Moriarty Building 64R0246B, Physical Biosciences Division Lawrence Berkeley National Laboratory Berkeley, CA 94720-8235 Phone : 510-486-5709 Email : [email protected] Fax : 510-486-5909 Web : CCI.LBL.gov
Folks
I think that UNK is an unknown residue, UNL is an unknown ligand, UNX
is an unknown atom or ion and UPL is "UNKNOWN BRANCHED FRAGMENT OF
PHOSPHOLIPID". Some of these have atoms in the chemical components
entry (which is strange) and some don't. Naturally, the user has to
make the decision to use the correct one ...
Nigel
On Wed, Jul 28, 2010 at 3:19 PM, Tom Terwilliger wrote:
One disadvantage of using UNK is that it is often a loss of information. For example in the case Phil mentions...we do think that we have a polypeptide. By labelling protein residues UNK we no longer distinguish them from DNA, or depending on HETATM vs ATOM identification, from ligands.
-Tom T
On Jul 28, 2010, at 4:01 PM, Phil Evans wrote:
UNK residues have another valid use where you can see peptide but not assign a sequence register. A poly-Ala model in that case is better labelled UNK than ALA, since it isn't ALA
Phil
On 28 Jul 2010, at 19:12, Pavel Afonine wrote:
Dear Ed,
> I think it is very important to be able to include unknown atoms
> in a deposited pdb file (with echoing the caveat about flooding
> the structure with UNK's to lower the R-factor).
yes, as I wrote in my original reply, including these atoms may improve the map and in turn may reveal or improve some of its other (biologically) important places. The only point is: please define these dummy atoms properly, providing all the information, such as the scattering element type that you or your program used for such an approximation.
> For one thing, these structures are produced not just for structure-factor
> calculation and validation. Many of the end users will never even
> bother to do a structure factor calculation.
The ability to reproduce the R-factor is not only for someone's pleasure but for validation purposes at least. If I've got a PDB file for which I can't compute the R-factors (and, by the way, even the map too), then I don't need the deposited Fobs either, unless I'm going to re-determine the structure from scratch.
> It is important for the
> depositor to be able to refer to an unknown but likely significant
> ligand and for the reader to be able to go and look at that position
> (ideally surrounded by electron density).
Sure, it is important.
> For another thing, the structure factor calculation will give exactly
> the same result whether the dummy atoms are omitted or are flagged
> with zero occupancy or atom-type X to be ignored in sf calculation.
If you look in the PDB you will find that very often the occupancies are not set to 1. Plus, as I mentioned, often the B-factors for these atoms are set to some funny numbers (looks like they were refined).
Are we sure that these programs were ignoring these dummies in Fcalc calculations? If so, how were the B-factors refined, or were they made up?
Again, if it is defined properly, for example, like this:
ATOM 1959 O DUM A 1 -8.762 8.060 25.324 1.00 31.23 O
or
ATOM 1959 O UNK A 1 -8.762 8.060 25.324 1.00 31.23 O
then it is absolutely OK to have such entries, because it is completely defined and can be used in any calculations without any unnecessary guesswork. But if you start masking things with X or blanks then I (and the software I write) will start asking all these nasty questions...
All the best!
Pavel.
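The difference between a fully defined dummy atom and one masked with X or a blank can be checked mechanically. Below is a minimal Python sketch, assuming standard fixed-column PDB ATOM/HETATM records (occupancy in columns 55-60, B-factor in 61-66, element symbol in 77-78). The helper names `make_record`, `parse_atom` and `check_dummy_atoms` are made up for this illustration and are not part of PHENIX; the sample atoms are the ones quoted in this thread, re-laid out into fixed columns.

```python
# Sketch: flag dummy atoms whose scattering element type is undefined.

def make_record(serial, name, resname, chain, resseq, x, y, z, occ, b, element):
    """Build a fixed-column PDB ATOM record (78 characters wide)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain:1s}{resseq:4d}"
        + " " * 4
        + f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
        + " " * 10
        + f"{element:>2s}"
    )


def parse_atom(line):
    """Extract the fields needed here from one ATOM/HETATM record."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip().upper(),  # blank if column is empty
    }


def check_dummy_atoms(lines, dummy_resnames=("UNK", "DUM", "UNL", "UNX")):
    """Return (serial, message) pairs for dummy atoms lacking an element."""
    problems = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom = parse_atom(line)
        if atom["resname"] not in dummy_resnames:
            continue
        if atom["element"] in ("", "X"):
            problems.append((atom["serial"], "undefined element"))
    return problems


records = [
    make_record(1959, "O", "DUM", "A", 1, -8.762, 8.060, 25.324, 1.00, 31.23, "O"),
    make_record(10, "O", "UNK", " ", 2, 6.348, -11.323, 10.667, 1.00, 8.06, "X"),
    make_record(13, "DUM", "UNK", " ", 2, 6.796, -15.043, 10.583, 1.00, 8.28, ""),
]
problems = check_dummy_atoms(records)
print(problems)  # [(10, 'undefined element'), (13, 'undefined element')]
```

Atom 1959 passes because its element column says O, so structure factors can be computed from it without guessing; atoms 10 and 13 are flagged.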
This is a very nice gift to the community - thank you Pavel et al.
Laurie Betts
UNC Chapel Hill
participants (8)
- Boaz Shaanan
- Edward A. Berry
- Frank von Delft
- Laurie Betts
- Nigel Moriarty
- Pavel Afonine
- Phil Evans
- Tom Terwilliger