[phenixbb] Cross-validation when test set is minuscule
Pavel Afonine
pafonine at lbl.gov
Fri Dec 19 08:50:44 PST 2014
One more item I forgot to mention: if necessary you may want to do
weight optimization.
Pavel
On 12/19/14 8:38 AM, Pavel Afonine wrote:
> Hi Derek,
>
> Choosing 5% for the free set is not dogma. I always use 10%, which is
> what CNS did for years. In your case this gives roughly 200
> reflections: not a whole lot, but better than 100.
>
> You can generate several (say 10-50) different test sets and
> independently refine the model against each of them (from the very
> beginning). Then make a note of the differences (in model, R-factors).
> Those differences give an estimate of the uncertainty that is likely
> due to the choice of test set.
>
> I realize it may be tedious to do 10-50 refinements for each model
> parametrization and refinement strategy that you want to test. In this
> case I would simply narrow the choices down to the most reasonable
> ones given the resolution and model quality:
>
> - Use individual B-factor refinement. With the type of restraints we
> have, this is OK in most cases. Switch to group B refinement only if
> you have strong reasons to believe that individual B refinement isn't
> good for your case.
> - Use torsion NCS.
> - Use Ramachandran plot restraints only to keep (preserve) good
> conformations during refinement, not to fix bad ones (outliers). That
> is: in case of an outlier, fix it manually first, then refine with
> Ramachandran restraints so that it does not become an outlier again.
> - If you have a good higher-resolution model, you can use it as a
> reference model, if needed.
>
> In future we will investigate using ideas recently published in Acta D
> that suggest ways to overcome the problem of too small test sets.
>
> Pavel
>
>
> On 12/19/14 3:18 AM, Derek Logan wrote:
>> Hi everyone,
>>
>> Right now we have one of those very difficult Rfree situations where
>> it's impossible to generate a single meaningful Rfree set. Since
>> we're in a bit of a hurry with this structure it would be good if
>> someone could point me in the right direction. We have crystals with
>> 1542 non-H atoms in the asymmetric unit that diffract to only 3.6 Å
>> in P65, which gives us a whopping 2300 reflections in total. 5% of
>> this is only about 100 reflections. Luckily the protein is only a
>> single point mutation of a wild type that has been solved to much
>> better resolution, so we know what it should look like and I simply
>> want to investigate the effect of different levels of conservatism in
>> the refinement, e.g. NCS in xyz and B, group B-factors, reference
>> model, Ramachandran restraints etc. However since the quality
>> criterion for this is Rfree I'm not able to do this.
>>
>> I believe the correct approach is k-fold statistical
>> cross-validation, but can someone remind me of the correct way to do
>> this? I've done a bit of Googling without finding anything very helpful.
>>
>> Thanks
>> Derek
>> ________________________________________________________________________
>> Derek Logan tel: +46 46 222 1443
>> Associate Professor mob: +46 76 8585 707
>> Dept. of Biochemistry and Structural Biology www.cmps.lu.se
>> <http://www.cmps.lu.se>
>> Centre for Molecular Protein Science www.maxlab.lu.se/crystal
>> Lund University, Box 124, 221 00 Lund, Sweden www.saromics.com
>>
>
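A minimal sketch of the test-set bookkeeping discussed in this thread, assuming reflections are identified only by their index (0 to 2299 for the roughly 2300 reflections mentioned). The function names `kfold_test_sets`, `random_test_sets` and `spread` are hypothetical helpers, not part of Phenix or CNS. The first helper builds disjoint k-fold test sets (the cross-validation Derek asks about), the second draws independent random test sets (closer to the 10-50 sets suggested above), and the third summarizes how much a statistic such as R-free varies between the separate refinements. Running the refinements themselves is outside the scope of the sketch.

```python
import random
import statistics


def kfold_test_sets(n_reflections, k, seed=0):
    """Split reflection indices 0..n-1 into k disjoint test sets of near-equal size."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    # Taking every k-th shuffled index gives k non-overlapping sets
    # that together cover every reflection exactly once.
    return [sorted(indices[i::k]) for i in range(k)]


def random_test_sets(n_reflections, fraction, n_sets, seed=0):
    """Draw n_sets independent random test sets; sets may overlap each other."""
    rng = random.Random(seed)
    size = round(n_reflections * fraction)
    return [sorted(rng.sample(range(n_reflections), size)) for _ in range(n_sets)]


def spread(values):
    """Mean and sample standard deviation of a per-refinement statistic (e.g. R-free)."""
    return statistics.mean(values), statistics.stdev(values)


# Assumed numbers from the thread: about 2300 reflections, 10% test sets.
folds = kfold_test_sets(2300, k=10)
print([len(f) for f in folds])
```

Each index list would be turned into a free-R flag array for one refinement started from the same initial model; the R-free values collected from those runs can then be passed to `spread` to gauge the uncertainty attributable to the choice of test set.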