model vs Wilson b-factor
Hi all,

we've solved a large structure (~20,000 residues/asymm unit), with 4-fold ncs and diffraction data to 3.3 Å. The Rfree/Rfac is ~28%-24% with OK geometry and no major outliers in the Ramachandran plot. I would think I'm done (.. after 6 years!).

However, my refined model b-factor (~130 Å²) is >> Wilson b-factor (~80 Å²). Obviously I'm not too happy with it.

Here is what I tried to resolve this discrepancy:
--> play with wxu_scale
--> play with B-factor weight in ncs restraint (4-fold ncs)
--> play with number of macrocycles
--> Redefine tls groups

So far nothing really works, except switching from individual_adp to group_adp. However, this increases my Rfree by almost 3%.

Any ideas?

Thanks in advance,
Gino

******************************************************************************
Gino Cingolani, Ph.D.
Associate Professor
Thomas Jefferson University
Dept. of Biochemistry & Molecular Biology
233 South 10th Street - Room 826
Philadelphia PA 19107
Office (215) 503 4573
Lab (215) 503 4595
Fax (215) 923 2117
E-mail: [email protected]
******************************************************************************
"Nati non foste per viver come bruti, ma per seguir virtute e canoscenza"
("You were not born to live like brutes, but to follow virtue and knowledge")
Dante, The Divine Comedy (Inferno, XXVI, vv. 119-120)
Hi Gino,

here are a few points:

- my understanding (please correct me if I'm wrong) is that the accuracy of the Wilson B estimate drops with resolution: the lower the resolution, the less accurate the estimate;
- the Wilson B is not a given calculated value, it's just an estimate;
- the total atomic B-factor includes the trace of the overall anisotropic scale matrix (see the Fmodel formula for the total model structure factor:
  Fmodel = scale_overall * exp(-h*U_overall*ht) * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask) ).
  You can try to disable this and see if this was the cause (use the "apply_back_trace_of_b_cart=true" keyword for this);
- the things you "tried to resolve this discrepancy" are unlikely to change the average B-factor;
- assuming that you used the proper model parameterization and refinement strategy given your model and data quality, I would just accept these values as a matter of fact.

Pavel.
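To make the third point above concrete: the overall anisotropic scale term exp(-h*U_overall*ht) has an isotropic component, and moving that component into the atoms raises every atomic B by the same constant. Below is a minimal Python sketch, assuming the overall matrix is supplied as a symmetric Cartesian B tensor and that its isotropic part is one third of the trace. The function names and the sample numbers are made up for illustration; this is not phenix.refine code.

```python
def split_overall_b(b_cart):
    """Split a symmetric overall B tensor into isotropic and traceless parts.

    b_cart is (b11, b22, b33, b12, b13, b23) in A^2 (hypothetical input).
    Returns (b_iso, b_traceless), where b_iso is one third of the trace.
    """
    b11, b22, b33, b12, b13, b23 = b_cart
    b_iso = (b11 + b22 + b33) / 3.0
    b_traceless = (b11 - b_iso, b22 - b_iso, b33 - b_iso, b12, b13, b23)
    return b_iso, b_traceless


def apply_isotropic_shift(atomic_b, b_iso):
    """Add the isotropic part of the overall scale to every atomic B."""
    return [b + b_iso for b in atomic_b]


# Made-up numbers, for illustration only.
b_iso, b_rest = split_overall_b((40.0, 55.0, 55.0, 0.0, 0.0, 0.0))
shifted = apply_isotropic_shift([80.0, 95.0, 120.0], b_iso)
print(b_iso, shifted)
```

Whether such a shift accounts for the gap in a particular refinement has to be checked from the program's own output.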
Hi,
I recently refined a structure at 3.7 A resolution using PHENIX and I got the following result.
R/R(free)        rmsd bond/angle   B(ave)   Ramachandran outliers
0.2436/0.2822    0.011/0.920       260.12   1.23%
With different trials I found B(ave) to vary between 230 and 280. Is this B(ave) OK for PDB submission? What options should I try to reduce B(ave)? Is there any chart showing the acceptable range of B(ave) at different resolutions?
Thanks...
Raja
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
Perhaps worth pointing out that the Wilson B is based on the assumption of randomly distributed atoms. This is not at all how proteins are: in particular, secondary structures give a preponderance of spacings in the 4 angstrom-ish region and a peak of mean intensity in these shells. For this reason the apparent fall-off in the Wilson plot in this resolution range is steeper, as you are falling down off the peak caused by these favoured secondary-structure spacings, which produce a deviation from randomness in this resolution range. So it will depend on the secondary structure of the individual protein. The Wilson estimate gets it about right when you deal with spacings that tend to be unbiased by secondary structure, which unfortunately is the bit that is missing in low resolution crystal data. The Wilson fall-off at low resolution looks steep because the random assumption is invalid.
Maybe you can guess the secondary structure content of your protein from where the bump is in the Wilson plot: beta gives a bulge in the 4 ang region, alpha in the 5 to 9 ang region.
all the best
Martyn
Martyn Symmons
Cambridge
(In reply to Pavel Afonine's message of Wednesday, 21 April, 2010 18:56:47.)
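For reference, a Wilson B estimate is the slope of a straight-line fit to the Wilson plot. Below is a minimal Python sketch, assuming the usual random-atom model <I> = K * sum(f^2) * exp(-2*B*s^2) with s = sin(theta)/lambda = 1/(2d), and assuming that per-shell mean intensities and sum(f^2) values are already available. The function name and the synthetic shells are illustrative only.

```python
import math


def wilson_b_from_shells(shells):
    """Estimate (B, K) from resolution shells by a least-squares line fit.

    shells: list of (d_spacing_A, mean_intensity, sum_f_squared) tuples.
    Fits ln(<I> / sum f^2) = ln K - 2*B*s^2 with s^2 = 1/(4*d^2).
    """
    xs = [1.0 / (4.0 * d * d) for d, _, _ in shells]
    ys = [math.log(mean_i / sum_f2) for _, mean_i, sum_f2 in shells]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope / 2.0, math.exp(intercept)


# Synthetic shells that follow the random-atom model exactly (not real data).
true_b, true_k = 80.0, 2.5
shells = []
for d in (6.0, 5.0, 4.5, 4.0, 3.7, 3.5, 3.3):
    s2 = 1.0 / (4.0 * d * d)
    shells.append((d, true_k * 100.0 * math.exp(-2.0 * true_b * s2), 100.0))

b_est, k_est = wilson_b_from_shells(shells)
print(round(b_est, 2), round(k_est, 2))
```

Real data deviate from this straight line, for instance around the secondary-structure peak described above, so the fitted slope depends on which shells are included in the fit.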
It's also true that the Wilson B calculation assumes that all the B factors in the crystal are the same, which is also far from true in most macromolecular crystals. A person who holds to the practice of aggressively building water molecules and loops will create a model with a higher average B than one who uses the same data but is more restrained. The Wilson B will, of course, be unchanged.

If you have a protein that is equally ordered throughout, you do not build in weak water molecules, and the crystal diffracts to high enough resolution to allow a reasonably accurate calculation of the Wilson B, your average B and the Wilson B should be close to each other. If your protein has mobile loops, which most lower resolution crystals do (and most higher resolution crystals for that matter), then your average B will be larger than your Wilson B.

Since the Wilson B and the average B are such different quantities, I don't believe there are any useful conclusions that can be made by comparing the two. If you want to see if your model is consistent with your Wilson B, you should calculate F squared values from it and calculate a Wilson B from those. If the calculated Wilson B doesn't match the observed Wilson B your model has a serious problem, but I expect that every refinement program will produce models that match quite closely; even models that are quite wrong will at least match the Wilson B. A discrepancy between calculated and observed Wilson B causes a horrible increase in R values (both kinds), which is easily fixed by the refinement program adjusting whatever B values are being refined.

Dale Tronrud

(In reply to Martyn Symmons's message of 04/21/10 15:01.)
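The difference between the two quantities can be illustrated numerically. Below is a minimal Python sketch, assuming equal point-like atoms whose contribution to the mean intensity falls off as exp(-2*B_j*s^2) with s^2 = 1/(4*d^2); this is a deliberate simplification, not what any refinement program does. The function name, the two-population model and its B values are hypothetical.

```python
import math


def apparent_wilson_b(atom_b_values, d_spacings):
    """Fit a single Wilson B to intensities from atoms with differing B.

    Mean intensity per shell is modelled as the average of exp(-2*B_j*s^2)
    over all atoms; the fit is a least-squares line of ln(I) against s^2.
    """
    xs, ys = [], []
    for d in d_spacings:
        s2 = 1.0 / (4.0 * d * d)
        intensity = sum(math.exp(-2.0 * b * s2) for b in atom_b_values)
        xs.append(s2)
        ys.append(math.log(intensity / len(atom_b_values)))
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -slope / 2.0


# Hypothetical model: 70% well-ordered atoms (B = 60), 30% mobile loops (B = 250).
b_values = [60.0] * 70 + [250.0] * 30
average_b = sum(b_values) / len(b_values)
wilson_b = apparent_wilson_b(b_values, [6.0, 5.0, 4.5, 4.0, 3.7, 3.5, 3.3])
print(round(average_b, 1), round(wilson_b, 1))
```

In this toy model the high-B atoms contribute almost nothing to the intensities in the fitted shells, so the fitted Wilson B stays near the B of the well-ordered atoms while the average B is pulled up by the loops.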
Hi,

Although your resolution is low, I was wondering if you have tried adding hydrogens during refinement. This sometimes makes a difference, although mainly with high resolution data.

Shya
Of course, I did both geometry and shape correction, using added hydrogens and TLS respectively.
Raja
(In reply to Shya's message of Wednesday, April 21, 2010 11:22 am.)
A couple of questions:

- How did you calculate the Wilson B-factor?
- What does the Wilson plot look like? (It can be seen in the Phenix GUI using xtriage.)

(In reply to Gino Cingolani's message of Apr 21, 2010, 10:42 AM.)
-- Paul Adams Acting Division Director, Physical Biosciences Division, Lawrence Berkeley Lab Adjunct Professor, Department of Bioengineering, U.C. Berkeley Vice President for Technology, the Joint BioEnergy Institute Head, Berkeley Center for Structural Biology Building 64, Room 248 Tel: 1-510-486-4225, Fax: 1-510-486-5909 http://cci.lbl.gov/paul Lawrence Berkeley Laboratory 1 Cyclotron Road BLDG 64R0121 Berkeley, CA 94720, USA. Executive Assistant: Patty Jimenez [ [email protected] ] [ 1-510-486-7963 ] --
Dear Gino,

*Sometimes* ignoring the strict B-factor police is a valid option. Much of the time, it comes down to building in weaker density vs not building, and it is a subjective call. I bet if you removed all weaker-density loops, your average B-factor would decrease, but you probably would have gained nothing, and not improved your model at all (you would probably make it worse, actually). And as Pavel pointed out, at >3 Angstroms, Wilson values are not so accurate. I would also look at phenix.polygon results.

By the way, I have a related question for Pavel. When you guys were culling the PDB for structures, was there any attention paid to correcting the ones that have residual B-factors in the PDB file? (Urzhumtseva, 2009 is not clear on this.) I have found a structure, refined with refmac and deposited in 2005, where the residual B-factors are in the B column. This used to be a fairly common practice, before it was (thankfully!) settled by the PDB that the B column should contain the full B-factor (the way Phenix outputs by default). I still sometimes read articles where people report an average B-factor of 8 for a 2.2 Angstrom structure, and I know what is going on!

I am asking this because a few structures I solved recently to ~2 Angstrom resolution were all closer to the upper range of observed structures according to POLYGON, so I thought this could possibly be one reason.

Engin

(In reply to Gino Cingolani's message of 4/21/10 10:42 AM.)
-- Engin Özkan Post-doctoral Scholar Howard Hughes Medical Institute Dept of Molecular and Cellular Physiology 279 Campus Drive, Beckman Center B173 Stanford School of Medicine Stanford, CA 94305 ph: (650)-498-7111
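Since a quoted "average B" is just the mean of the B column, its meaning depends on what the program that wrote the file put there. Below is a minimal Python sketch for computing that mean, assuming standard fixed-width PDB ATOM/HETATM records with the temperature factor in columns 61-66. The function names are illustrative. The sketch does not add back any TLS contribution, so for a file that stores residual B-factors it reports the residual average.

```python
def average_b_from_pdb_lines(lines):
    """Mean of the B-factor column over ATOM/HETATM records.

    Reads the fixed-width temperature-factor field (columns 61-66, i.e.
    the slice [60:66]). Returns None if no atom records are found.
    """
    b_values = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            b_values.append(float(line[60:66]))
    if not b_values:
        return None
    return sum(b_values) / len(b_values)


def average_b_from_pdb_file(path):
    """Convenience wrapper: compute the mean B of a PDB-format file on disk."""
    with open(path) as handle:
        return average_b_from_pdb_lines(handle)
```

A value that looks far too low for the resolution, such as the average of 8 at 2.2 Angstrom mentioned above, is a hint that the column may hold residual rather than total B-factors.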
> By the way, I have a related question for Pavel. When you guys were culling the PDB for structures, was there any attention paid to correct the ones that have residual B-factors in the PDB file?

Yes, it's taken care of automatically by phenix.model_vs_data: it automatically recognizes whether a PDB file contains residual or total B-factors. Twinning is taken care of too. There will be a paper about it soon. The results of phenix.model_vs_data runs for "all" PDB entries form the database for POLYGON.

> I am asking this, because a few structures I solved recently to ~2 Angstrom resolution were all closer to the upper range of observed structures according to POLYGON, so I thought this could possibly be one reason.

If you tell me which structures concern you then I will check.

Pavel.
It is reassuring to know that you guys took care of that. I will be looking forward to the publication.

Thanks,
Engin

(In reply to Pavel Afonine's message of 4/21/10 12:50 PM.)
participants (8)
- Dale Tronrud
- Engin Ozkan
- Gino Cingolani
- MARTYN SYMMONS
- Paul Adams
- Pavel Afonine
- Raja Dey
- sbiswas2@ncsu.edu