model vs Wilson b-factor

Gino Cingolani

21 Apr 2010

7:42 p.m.

Hi all,

we've solved a large structure (~20,000 residues/asymm unit), with 4-fold ncs and diffraction data to 3.3 Å.

The Rfree/Rfac is ~28%/24% with OK geometry and no major outliers in the Ramachandran plot. I would think I'm done (... after 6 years!).

However, my refined model B-factor (~130 Å^2) is >> Wilson B-factor (~80 Å^2). Obviously I'm not too happy with it.

Here is what I tried to resolve this discrepancy:
--> play with wxu_scale
--> play with B-factor weight in ncs restraint (4-fold ncs)
--> play with number of macrocycles
--> Redefine tls groups

So far nothing really works, except switching from individual_adp to group_adp. However, this increases my Rfree by almost 3%.

Any ideas?

Thanks in advance,
Gino

******************************************************************************
Gino Cingolani, Ph.D.
Associate Professor
Thomas Jefferson University
Dept. of Biochemistry & Molecular Biology
233 South 10th Street - Room 826
Philadelphia PA 19107
Office (215) 503 4573
Lab (215) 503 4595
Fax (215) 923 2117
E-mail: [email protected]
******************************************************************************
"Nati non foste per viver come bruti, ma per seguir virtute e canoscenza"
("You were not born to live like brutes, but to follow virtue and knowledge")
Dante, The Divine Comedy (Inferno, XXVI, vv. 119-120)

Pavel Afonine

21 Apr

7:56 p.m.

Hi Gino,

here are a few points:

- my understanding (please correct me if I'm wrong) is that the accuracy of the Wilson B estimate drops with resolution: the lower the resolution, the less accurate the estimate;

- Wilson B is not an exactly calculated value, it's just an estimate;

- the total atomic B-factor includes the trace of the overall anisotropic scale matrix (see the Fmodel formula for the total model structure factor:

  Fmodel = scale_overall * exp(-h*U_overall*ht) * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask)

  ). You can try to disable this and see if this was the cause (use the "apply_back_trace_of_b_cart=true" keyword for this);

- the things you "tried to resolve this discrepancy" are unlikely to change the average B-factor;

- assuming that you used the proper model parameterization and refinement strategy given your model and data quality, I would just accept these values as a matter of fact.

Pavel.
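The third point can be illustrated with a toy calculation. This is a minimal sketch, not how phenix.refine does it internally: it assumes a hypothetical 1D cell, made-up atoms, and the common convention that a B-factor damps amplitudes by exp(-B*s^2/4) with s = 1/d. It only shows that an isotropic overall B in the scale term is numerically the same as adding that B to every atom, which is why moving the isotropic part of the overall scale (one third of the trace of the B matrix) into the atoms raises the reported average B without changing Fmodel.

```python
import cmath
import math

CELL = 50.0  # hypothetical 1D cell length in Angstrom (assumption)

def debye_waller(b, h):
    """Amplitude damping exp(-B s^2 / 4), with s = 1/d = h / CELL."""
    s_sq = (h / CELL) ** 2
    return math.exp(-b * s_sq / 4.0)

def f_calc(h, atoms):
    """Toy structure factor; atoms are (fractional x, scattering power, B)."""
    return sum(f * debye_waller(b, h) * cmath.exp(2j * math.pi * h * x)
               for x, f, b in atoms)

# Made-up atoms and a made-up isotropic overall B, for illustration only.
atoms = [(0.10, 6.0, 60.0), (0.35, 7.0, 90.0), (0.70, 8.0, 120.0)]
b_overall = 50.0

for h in (1, 5, 10, 15):
    # Overall B kept in the scale term ...
    scaled = debye_waller(b_overall, h) * f_calc(h, atoms)
    # ... versus the same B added to every atomic B-factor.
    shifted = f_calc(h, [(x, f, b + b_overall) for x, f, b in atoms])
    assert abs(scaled - shifted) < 1e-9

mean_b_atoms = sum(b for _, _, b in atoms) / len(atoms)
mean_b_total = mean_b_atoms + b_overall
```

The structure factors are identical either way; only the bookkeeping of where the B lives changes, and with it the average B written to the model.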


Raja Dey

8:19 p.m.

Hi,

I recently refined a structure at 3.7 Å resolution using PHENIX and got the following result:

R/R(free)        rmsd bond/angle   B(ave)   Ramachandran outliers
0.2436/0.2822    0.011/0.920       260.12   1.23%

With different trials I found the B(ave) to vary between 230 and 280. Is this B(ave) OK for PDB submission? What options should I try to reduce B(ave)? Is there any chart showing the acceptable range of B(ave) at different resolutions?

Thanks...
Raja

_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb

MARTYN SYMMONS

22 Apr

12:01 a.m.

Perhaps worth pointing out that the Wilson B is based on the assumption of randomly distributed atoms. This is not at all how proteins are; in particular, secondary structures give a preponderance of spacings in the 4 angstrom-ish region and a peak of mean intensity in these shells. For this reason the apparent fall-off in the Wilson plot in this resolution range is steeper: you are falling down off the peak caused by secondary structure, whose favoured spacings produce a deviation from randomness in this resolution range. So it will depend on the secondary structure of the individual protein.

So the Wilson estimate gets it about right when you deal with spacings that tend to be unbiased by secondary structure - which unfortunately is the bit that is missing in low resolution crystal data. The Wilson fall-off at low resolution looks steep because the random assumption is invalid.

Maybe you can guess the secondary structure content of your protein from where the bump is in the Wilson plot - beta gives a bulge in the 4 ang region - alpha in the 5 to 9 ang region.

all the best
Martyn

Martyn Symmons
Cambridge


Dale Tronrud

7:02 p.m.

It's also true that the Wilson B calculation assumes that all the B factors in the crystal are the same - which is also far from true in most macromolecular crystals. A person who holds to the practice of aggressively building water molecules and loops will create a model with a higher average B than one who uses the same data but is more restrained. The Wilson B will, of course, be unchanged.

If you have a protein that is equally ordered throughout, you do not build in weak water molecules, and the crystal diffracts to high enough resolution to allow a reasonably accurate calculation of the Wilson B, your average B and the Wilson B should be close to each other. If your protein has mobile loops, which most lower resolution crystals do (and most higher resolution crystals for that matter), then your average B will be larger than your Wilson B.

Since the Wilson B and the average B are such different quantities, I don't believe there are any useful conclusions that can be made by comparing the two. If you want to see if your model is consistent with your Wilson B, you should calculate F squared values from it and calculate a Wilson B from those. If the calculated Wilson B doesn't match the observed Wilson B your model has a serious problem, but I expect that every refinement program will produce models that match quite closely - even models that are quite wrong will at least match the Wilson B. A discrepancy between calculated and observed Wilson B causes a horrible increase in R values (both kinds), which is easily fixed by the refinement program adjusting whatever B values are being refined.

Dale Tronrud
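The argument above can be checked with a small numerical sketch. It rests on simplifying assumptions that are not from the thread: point atoms with unit scattering power, randomly placed so that the expected mean intensity is the sum of exp(-2*B*x) over atoms with x = sin^2(theta)/lambda^2 = 1/(4*d^2), and a made-up two-population B distribution. A "Wilson B" is then obtained from a straight-line fit of ln<I> against x over 6.0 to 3.3 Å.

```python
import math

def mean_intensity(x, b_values):
    """Expected mean intensity for random unit-weight atoms at x = 1/(4 d^2)."""
    return sum(math.exp(-2.0 * b * x) for b in b_values) / len(b_values)

def wilson_b(b_values, d_max=6.0, d_min=3.3, n_shells=20):
    """Least-squares slope of ln<I> versus x, converted to a B-factor."""
    x_lo = 1.0 / (4.0 * d_max ** 2)
    x_hi = 1.0 / (4.0 * d_min ** 2)
    xs = [x_lo + i * (x_hi - x_lo) / (n_shells - 1) for i in range(n_shells)]
    ys = [math.log(mean_intensity(x, b_values)) for x in xs]
    x_mean = sum(xs) / n_shells
    y_mean = sum(ys) / n_shells
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -slope / 2.0  # because ln<I> = const - 2 * B * x

# Two hypothetical models with the same average B of 130 A^2.
uniform = [130.0] * 100               # equally ordered throughout
mixed = [60.0] * 50 + [200.0] * 50    # ordered core plus mobile regions

b_uniform = wilson_b(uniform)
b_mixed = wilson_b(mixed)
mean_b_mixed = sum(mixed) / len(mixed)
```

For the uniform model the fitted value reproduces the atomic B. For the mixed model the high-B atoms contribute almost nothing to the intensities near 3.3 Å, so the fit is dominated by the ordered atoms and comes out well below the average B of 130, in the same direction as the discrepancy reported at the start of the thread.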


sbiswas2＠ncsu.edu

21 Apr

8:22 p.m.

Hi,

Although your resolution is low, I was wondering if you have tried adding hydrogens during refinement; this sometimes makes a difference, although mainly with high resolution data.

Shya


Raja Dey

8:27 p.m.

Of course, I did both geometry and shape correction, by adding hydrogens and using TLS respectively.

Raja


Paul Adams

8:30 p.m.

A couple of questions:

- How did you calculate the Wilson B-factor?

- What does the Wilson plot look like (it can be seen in the Phenix GUI using xtriage)?


-- Paul Adams Acting Division Director, Physical Biosciences Division, Lawrence Berkeley Lab Adjunct Professor, Department of Bioengineering, U.C. Berkeley Vice President for Technology, the Joint BioEnergy Institute Head, Berkeley Center for Structural Biology Building 64, Room 248 Tel: 1-510-486-4225, Fax: 1-510-486-5909 http://cci.lbl.gov/paul Lawrence Berkeley Laboratory 1 Cyclotron Road BLDG 64R0121 Berkeley, CA 94720, USA. Executive Assistant: Patty Jimenez [ [email protected] ] [ 1-510-486-7963 ] --

Engin Ozkan

8:32 p.m.

Dear Gino,

*Sometimes* ignoring the strict B-factor police is a valid option. Much of the time, it comes down to building in weaker density vs not building, and it is a subjective call. I bet if you removed all weaker-density loops, your average B-factor would decrease, but you probably would have gained nothing, and not improved your model at all (you would probably make it worse, actually). And as Pavel pointed out, at >3 Angstroms, Wilson values are not so accurate. I would also look at phenix.polygon results.

By the way, I have a related question for Pavel. When you guys were culling the PDB for structures, was there any attention paid to correct the ones that have residual B-factors in the PDB file? (Urzhumtseva, 2009 is not clear on this.) I have found a structure, refined by refmac and deposited in 2005, where the residual B-factors are in the B column. This used to be a fairly common practice, before it was (thankfully!) settled by PDB that the B column should contain the full B factor (the way Phenix outputs by default). I still sometimes read articles where people report an average B-factor of 8 for a 2.2 Angstrom structure, and I know what is going on!

I am asking this because a few structures I solved recently to ~2 Angstrom resolution were all closer to the upper range of observed structures according to POLYGON, so I thought this could possibly be one reason.

Engin


-- Engin Özkan Post-doctoral Scholar Howard Hughes Medical Institute Dept of Molecular and Cellular Physiology 279 Campus Drive, Beckman Center B173 Stanford School of Medicine Stanford, CA 94305 ph: (650)-498-7111
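For readers who want to check what the B column of a given file actually averages to, here is a minimal sketch. It assumes the standard fixed-column PDB layout, in which the temperature factor occupies columns 61-66 of ATOM and HETATM records; the helper names and the two example records are made up for illustration. It cannot tell residual from total B-factors by itself: an unexpectedly low average, like the value of 8 mentioned above, is only a hint that the column may hold residual values.

```python
def b_factors(pdb_lines):
    """Yield the B column (columns 61-66) of ATOM/HETATM records."""
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield float(line[60:66])

def mean_b(pdb_lines):
    """Unweighted average of the B column over all atom records."""
    values = list(b_factors(pdb_lines))
    return sum(values) / len(values)

# Two made-up records, built with a template that follows the PDB columns:
# serial 7-11, atom name 13-16, residue 18-20, chain 22, residue number 23-26,
# x/y/z 31-54, occupancy 55-60, B-factor 61-66.
template = "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
example = [
    template % (1, 1, 0.0, 0.0, 0.0, 1.0, 20.0),
    template % (2, 2, 3.8, 0.0, 0.0, 1.0, 40.0),
    "TER",
]
average = mean_b(example)
```

In practice one would pass the lines of a real file, for example `mean_b(open("model.pdb"))`, where `model.pdb` is a placeholder name.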

Pavel Afonine

9:50 p.m.

> By the way, I have a related question for Pavel. When you guys were culling the PDB for structures, was there any attention paid to correct the ones that have residual B-factors in the PDB file?

Yes, it's taken care of automatically by phenix.model_vs_data: it automatically recognizes whether a PDB file contains residual or total B-factors. Twinning is taken care of too. There will be a paper about it soon. The results of the phenix.model_vs_data run for "all" PDB entries form the database for POLYGON.

> I am asking this, because a few structures I solved recently to ~2 Angstrom resolution were all closer to the upper range of observed structures according to POLYGON, so I thought this could possibly be one reason.

If you tell me which structures concern you then I will check.

Pavel.

Engin Ozkan

10:57 p.m.

It is reassuring to know that you guys took care of that. I will be looking forward to the publication.

Thanks,
Engin

