On Tue, Aug 12, 2014 at 4:14 PM, Michael Thompson <miket@chem.ucla.edu> wrote:
> I'm interested in doing some comparisons of random half data sets, inspired by statistics like CC1/2, CC*, etc. Does phenix contain some tool to split unmerged data into 2 random sets? Thought I would ask before trying to write my own script.

Didn't see this before I replied off-list to your ccp4bb post, but for the record:

You can do this very easily with CCTBX, for instance:

from iotbx.file_reader import any_file
from cctbx import miller
from scitbx.array_family import flex
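keep_friedel_pairs_separate = False # set to True to treat Friedel mates as distinct
hkl_in = any_file("data.hkl")
# as_miller_arrays() returns a list of arrays; this assumes the first one
# holds the unmerged intensities, so check the labels for your own file
miller_arrays = hkl_in.file_object.as_miller_arrays(merge_equivalents=False)
i_obs = miller_arrays[0]
i_obs = i_obs.select(i_obs.sigmas() > 0) # filter out bad sigmas
if (not keep_friedel_pairs_separate) :
  i_obs = i_obs.as_non_anomalous_array().map_to_asu()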
split_datasets = miller.split_unmerged(
      unmerged_indices=i_obs.indices(),
      unmerged_data=i_obs.data(),
      unmerged_sigmas=i_obs.sigmas())
data_1 = split_datasets.data_1
data_2 = split_datasets.data_2
cc = flex.linear_correlation(data_1, data_2).coefficient()

(Note that if you use an unmerged Scalepack file, you may need to supply the unit cell information separately, since that format does not record it.)

This is also already built into the Miller array class, i.e. this would work:

cc = i_obs.cc_one_half(anomalous_flag=keep_friedel_pairs_separate)

or this:

cc_anom = i_obs.cc_anom()

although each of these calls redoes the splitting, so it is somewhat inefficient if you want several statistics from the same data.
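
If it helps to see what the random split is doing conceptually, here is a minimal pure-Python sketch of the idea. It is not the cctbx implementation: the names split_half_means and pearson_cc are made up for illustration, the half-set means are unweighted (sigmas are ignored), and it assumes the indices have already been mapped to the asymmetric unit so that equivalent observations share one index.

```python
import random
from collections import defaultdict
from statistics import mean


def split_half_means(observations, seed=0):
    """Randomly split the observations of each reflection into two halves.

    observations: iterable of ((h, k, l), intensity) pairs (hypothetical
    input layout, chosen for this sketch only).
    Returns two equal-length lists of mean intensities, one entry per
    reflection that was observed at least twice.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    half_1, half_2 = [], []
    for hkl in sorted(groups):
        values = groups[hkl]
        if len(values) < 2:
            continue  # a single observation cannot be split
        rng.shuffle(values)
        mid = len(values) // 2
        half_1.append(mean(values[:mid]))
        half_2.append(mean(values[mid:]))
    return half_1, half_2


def pearson_cc(x, y):
    """Plain Pearson correlation coefficient between two sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Calling pearson_cc(*split_half_means(observations)) then gives a CC1/2-style number for the whole data set. Changing the seed gives a different random split, which is a quick way to see how much the value depends on the particular split.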

-Nat