[phenixbb] random half data sets

Tue Aug 12 14:15:06 PDT 2014

On Tue, Aug 12, 2014 at 4:14 PM, Michael Thompson <miket at chem.ucla.edu>
wrote:

> I'm interested in doing some comparisons of random half data sets,
> inspired by statistics like CC1/2, CC*, etc. Does phenix contain some tool
> to split unmerged data into 2 random sets? Thought I would ask before
> trying to write my own script.
>

Didn't see this before I replied off-list to your ccp4bb post, but for the
record:

You can do this very easily with CCTBX, for instance:

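from iotbx.file_reader import any_file
from cctbx import miller
from scitbx.array_family import flex

# set to True to treat I(+) and I(-) as separate reflections
keep_friedel_pairs_separate = False

hkl_in = any_file("data.hkl")
# as_miller_arrays() returns a list of Miller arrays, not a single array
miller_arrays = hkl_in.file_object.as_miller_arrays(merge_equivalents=False)
# assumption: the unmerged intensities are the first array in the file
i_obs = miller_arrays[0]
i_obs = i_obs.select(i_obs.sigmas() > 0) # filter out bad sigmas
if (not keep_friedel_pairs_separate) :
  i_obs = i_obs.as_non_anomalous_array().map_to_asu()
# randomly assign the observations of each reflection to two half data sets
split_datasets = miller.split_unmerged(
      unmerged_indices=i_obs.indices(),
      unmerged_data=i_obs.data(),
      unmerged_sigmas=i_obs.sigmas())
data_1 = split_datasets.data_1
data_2 = split_datasets.data_2
# correlation between the two half data sets, i.e. CC1/2
cc = flex.linear_correlation(data_1, data_2).coefficient()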

(If you use an unmerged Scalepack file, you may need to supply the unit
cell separately, since that format does not record it.)
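
If you want to see what the splitting amounts to without CCTBX, here is a
rough pure-Python sketch. It is not the CCTBX implementation, only an
illustration of the idea: group the unmerged observations by index, shuffle
each group, average the two halves, and correlate the half-set means. The
names split_half_means and pearson_cc are made up for this example, and it
assumes the indices have already been mapped to the asymmetric unit and that
an unweighted mean is acceptable.

```python
import math
import random
from collections import defaultdict


def split_half_means(indices, intensities, seed=0):
    """Randomly split each reflection's observations into two halves.

    Returns two equal-length lists holding the mean intensity of each half,
    one pair per unique index with at least two observations.
    (Hypothetical helper, not part of CCTBX.)
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for hkl, value in zip(indices, intensities):
        groups[hkl].append(value)
    half_1, half_2 = [], []
    for hkl in sorted(groups):
        obs = groups[hkl]
        if len(obs) < 2:
            continue  # a single observation cannot be split
        rng.shuffle(obs)
        mid = len(obs) // 2
        half_1.append(sum(obs[:mid]) / mid)
        half_2.append(sum(obs[mid:]) / (len(obs) - mid))
    return half_1, half_2


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Calling pearson_cc(*split_half_means(indices, intensities)) then gives a
CC1/2-style number for one random split; changing the seed gives a different
split.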

Note that this is already built in to the Miller array class, i.e. this
would work:

cc = i_obs.cc_one_half(anomalous_flag=keep_friedel_pairs_separate)

or this:

cc_anom = i_obs.cc_anom()

although each call redoes the splitting, which is somewhat inefficient if
you need more than one of these statistics.
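
Since you mentioned CC* as well: once you have CC1/2, CC* follows from the
usual relation CC* = sqrt(2*CC1/2 / (1 + CC1/2)), so no further splitting is
needed. A minimal sketch in plain Python; the function name cc_star is made
up for this example, and returning 0.0 for a non-positive CC1/2 is a
convention chosen here, not something the formula defines.

```python
import math


def cc_star(cc_half):
    """Estimate CC* from CC1/2 (hypothetical helper, not a CCTBX call)."""
    if cc_half <= 0:
        # assumption for this sketch: report 0.0 where the formula
        # is not meaningful
        return 0.0
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))
```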

-Nat