On Thu, Dec 6, 2012 at 2:35 PM, Douglas Theobald wrote:
Many have argued that we should include weak data in refinement (e.g., reflections much weaker than I/sigI = 2) in order to take advantage of the useful information contained in large numbers of uncertain data points, as argued in the recent Karplus and Diederichs Science paper on CC1/2. This makes sense to me as long as the uncertainty attached to each HKL is properly accounted for. However, I was surprised to hear rumors that with phenix "the data are not properly weighted in refinement by incorporating observed sigmas" and the like. I was wondering if the phenix developers could comment on the sanity of including weak data in phenix refinement, and on how phenix handles it.
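(To be concrete about the CC1/2 argument: as I understand it, CC1/2 is just the Pearson correlation between the per-reflection mean intensities obtained from two randomly chosen half-datasets, and the point is that a correlation can be detected over many reflections even when every individual observation is far below I/sigI = 2. A toy numpy sketch, with made-up arrays standing in for the half-dataset means:

import numpy as np

def cc_half(i_half1, i_half2):
    # Pearson correlation between per-reflection mean intensities
    # from two randomly assigned half-datasets (the Karplus & Diederichs CC1/2).
    d1 = np.asarray(i_half1, float) - np.mean(i_half1)
    d2 = np.asarray(i_half2, float) - np.mean(i_half2)
    return (d1 * d2).sum() / np.sqrt((d1 * d1).sum() * (d2 * d2).sum())

# Hypothetical example: per-observation noise much larger than the signal,
# yet CC1/2 over many reflections comes out small but clearly non-zero.
rng = np.random.default_rng(0)
true_i = rng.gamma(2.0, 1.0, size=10000)            # made-up "true" intensities
half1 = true_i + rng.normal(0.0, 5.0, size=10000)   # noise sigma >> mean intensity
half2 = true_i + rng.normal(0.0, 5.0, size=10000)
print("CC1/2 =", cc_half(half1, half2))

This is obviously not code from the paper, just the shape of the argument.)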
As a supplement to what Pavel said: yes, phenix.refine does not use experimental sigmas in the refinement target. I am very hazy on the details, but it is not at all clear that this actually matters in refinement when maximum-likelihood weighting is used; if there is a reference that argues otherwise, I would be interested in seeing it. (I don't know how the least-squares targets handle this; they are problematic even with experimental sigmas.) In practice I do not think it will be a problem to use data out to whatever resolution you feel is appropriate, and it may indeed help, but of course your overall R-factors will be slightly higher and the refinement will take much longer. Having experimented with this recently, my instinct would be to refine at a "traditional" resolution cutoff for as long as possible and add the extra, weak data near the end. If I were working at a resolution that was borderline for automated building, I might be more aggressive.

-Nat
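P.S. To illustrate what "weighting by experimental sigmas" would mean for a simple least-squares target (purely a toy sketch, not what phenix.refine actually computes; as I said above, the ML target does not use the experimental sigmas, and as I understand it its variances come from the estimated model error instead):

import numpy as np

def ls_target(f_obs, f_calc, sigmas=None):
    # Toy residual sum(w * (Fobs - k*Fcalc)**2), with the overall scale k
    # chosen to minimize it. w = 1/sigma**2 if sigmas are supplied
    # (inverse-variance weighting), otherwise w = 1 (unweighted).
    f_obs = np.asarray(f_obs, float)
    f_calc = np.asarray(f_calc, float)
    w = np.ones_like(f_obs) if sigmas is None else 1.0 / np.asarray(sigmas, float) ** 2
    k = (w * f_obs * f_calc).sum() / (w * f_calc ** 2).sum()
    return (w * (f_obs - k * f_calc) ** 2).sum()

The only point of the toy version is that with w = 1/sigma^2 a very weak, very uncertain reflection contributes almost nothing to the target, which is what "properly accounting for the uncertainty" amounts to in the least-squares picture; the ML target down-weights reflections through its variance term in a different way.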