scale_and_merge is a tool for scaling unmerged anomalous data or multiple data files and creating a scaled dataset and two scaled half-datasets. It is normally used in combination with anomalous_signal to create a scaled dataset and analyze the anomalous signal in a SAD dataset, but it can also be used to scale any other group of datasets.
scale_and_merge provides a summary of the half-dataset correlations in your dataset. Here is an example with very weak (but present) anomalous signal:
Scale and merge...analysis of multi-dataset SAD data

input\_files {
  data = "dd"
  data\_labels = None
  paired\_group\_ids = None
}
output\_files {
  output\_file = "scaled\_data.mtz"
  output\_half\_dataset\_a = "half\_dataset\_a.mtz"
  output\_half\_dataset\_b = "half\_dataset\_b.mtz"
  output\_file\_format = *mtz sca
}
crystal\_info {
  resolution = None
  low\_resolution = None
  space\_group = None
  unit\_cell = None
}
data\_selection {
  minimum\_datafile\_fraction = 0.3
  require\_fpfm = True
  only\_similar\_datasets = True
  relative\_length\_tolerance = None
  absolute\_angle\_tolerance = None
  choose\_optimal\_datasets = False
  sort\_datasets\_by\_anomalous\_cc = True
}
scaling {
  make\_anisotropy\_uniform = True
  overallscale = False
  skip\_scaling = False
  lowest\_resolution\_range = 6
}
merging {
  optimize\_anomalous = True
  use\_best\_group\_as\_target = False
  rescale\_sigmas = False
}
half\_dataset\_cc {
  get\_half\_dataset\_cc = True
  half\_dataset\_cc\_by\_files = True
  split\_as\_first\_second\_half = True
  split\_alternately = True
  split\_randomly = True
}
directories {
  temp\_dir = None
  output\_dir = ""
  gui\_output\_dir = None
}
control {
  verbose = False
  random\_seed = 714215
  comparison\_file = None
  clean\_up = True
}

Comparing crystal symmetries and noting anisotropy in data
Relative length tolerance: 0.010 Angle tolerance: 1.000 degrees
Taking initial symmetry from /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL
with 3 similar files

File                             SG        A        B        C       Alpha   Beta    Gamma
Symmetry group 1
WNV\_NS1\_202\_w1\_2.9\_90.HKL   P 3 2 1   166.997  166.997  94.156  90.000  90.000  120.000
WNV\_NS1\_202\_w2\_2.9\_90.HKL   P 3 2 1   166.704  166.704  94.113  90.000  90.000  120.000
WNV\_NS1\_205\_w1\_2.9\_90.HKL   P 3 2 1   167.578  167.578  93.934  90.000  90.000  120.000
WNV\_NS1\_205\_w2\_2.9\_90.HKL   P 3 2 1   167.639  167.639  93.859  90.000  90.000  120.000

Scaling 4 datasets

Scaling data files with local scaling
Files to scale:
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w1\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w2\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w1\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL

Selecting just reflections that have both F+ and F- in the same dataset or are centric
Splitting datafiles into sub-files with one copy of each unique hkl

File                                        Refl  (rejected)   B1     B2     B3     B-avg
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1.sca:   138180  ( 6378)     63.9   63.9   55.00  60.9
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_2.sca:      229  (    0)     63.9   63.9   55.00  60.9
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1.sca:   136254  ( 6096)     59.4   59.4   49.78  56.2
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_2.sca:      206  (    0)     59.4   59.4   49.78  56.2
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1.sca:   148714  ( 4464)     73.7   73.7   63.92  70.4
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_2.sca:      239  (    0)     73.7   73.7   63.92  70.4
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1.sca:   147256  ( 4560)     70.3   70.3   60.50  67.0
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_2.sca:      241  (    0)     70.3   70.3   60.50  67.0

Notes:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w1\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w2\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w1\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL into 2 files:

Keeping split datafiles with at least 44614 reflections
High-resolution limit: 2.89

Scaling data in batches from individual data files
List of scaled data files:
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz (30844 refl)
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz (30828 refl)
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz (29641 refl)
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz (29527 refl)

Scaling and merging data files with overall scale factor
Files to scale:
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz

****** Putting all data on common scale ******

Standard dataset: 1 I/sigma: 8.30 Nrefl: 59133
Total of 4 datasets to be used

Scale factors for data groups:
ID  Scale  file\_name
1   1.000  TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
2   0.683  TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
3   1.198  TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
4   1.366  TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz

Mean I of scaled unmerged datasets:
1: 9940.78 (N=59133)
2: 9926.98 (N=59085)
3: 10203.75 (N=56663)
4: 10305.51 (N=56463)

Getting overall merged dataset using original sigmas
Datasets to be merged in estimation of variances: 1 2 3 4
Datasets to be merged in final merging step: 1 2 3 4

Getting scale factors, dataset variances and scaled intensities
Merged mean I: 8966.65 (N=64581)

Dataset variances (RMS difference from target dataset after accounting for sigmas)
                 Dataset
Resolution       1        2        3        4
48.40 - 4.94     2923.90  2353.89  6175.92  8461.67
4.94 - 3.92      1348.27  1407.39  1191.09  1267.29
3.92 - 3.43      0.00     0.00     0.00     0.00
3.43 - 3.11      0.00     0.00     0.00     0.00
3.11 - 2.89      0.00     0.00     0.00     0.00
ALL              2515.66  1935.84  3450.29  5499.59

Getting merged dataset including dataset variances

Dataset correlations with merged dataset
                 Dataset
Resolution       1     2     3     4
48.40 - 4.94     1.00  1.00  1.00  0.99
4.94 - 3.92      1.00  1.00  1.00  0.99
3.92 - 3.43      0.99  0.99  0.98  0.97
3.43 - 3.11      0.93  0.92  0.87  0.85
3.11 - 2.89      0.84  0.94  0.69  0.55
ALL              1.00  1.00  1.00  0.99

New merged mean I: 9035.38 (N=64581) and I/sigma: 10.89
NOTE: I/sigma cannot be directly compared to original due to including estimates of dataset variances and changes in number of reflections.

Optimizing anomalous differences
Merged mean anomalous difference: -3.48 (N=31009)

Dataset variances (anomalous differences) (RMS difference from target dataset after accounting for sigmas)
                 Dataset
Resolution       1       2        3       4
47.38 - 4.94     0.00    1652.13  993.28  0.00
4.94 - 3.92      732.44  0.00     503.86  0.00
3.92 - 3.43      690.71  0.00     0.00    0.00
3.43 - 3.11      272.68  0.00     0.00    0.00
3.11 - 2.89      0.00    0.00     0.00    0.00
ALL              468.90  0.00     435.50  0.00

Getting merged dataset including dataset variances

Dataset correlations with merged dataset (anomalous differences)
                 Dataset
Resolution       1     2     3     4
47.38 - 4.94     0.77  0.79  0.69  0.72
4.94 - 3.92      0.65  0.73  0.64  0.67
3.92 - 3.43      0.70  0.70  0.63  0.66
3.43 - 3.11      0.72  0.71  0.70  0.66
3.11 - 2.89      0.73  0.81  0.71  0.78
ALL              0.73  0.75  0.67  0.70

Anom correlation on I of std merged and anom scaled: 0.83 (N=31009)
Merged scaled data optimized for anomalous differences: scaled\_data.mtz

==============================================================================

Splitting data into groups for half-dataset CC
Keeping datafiles intact within each half-dataset

Half-dataset groups:
Group A
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
Group B
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz

---- Calculating anomalous CC between half-datasets A,B----

Half-dataset A: half\_dataset\_a.mtz
Half-dataset B: half\_dataset\_b.mtz
Unique reflections: half\_dataset\_a.mtz: 30242 half\_dataset\_b.mtz: 30222
Reflections in common: 29455
Overall resolution: 2.90 A Nrefl: 29455
Overall anomalous correlation: -0.018

Anomalous correlation with varying high-resolution limits
d\_min   ---- CC ------        ---- N -----
         Shell    Cumulative   Shell   Cumulative
6.00      0.080    0.080       3392     3392
5.50     -0.060    0.043       1052     4444
5.00     -0.115   -0.006       1519     5963
4.50     -0.040   -0.018       2256     8219
4.00     -0.065   -0.036       3540    11759
3.50     -0.016   -0.026       5862    17621
3.00     -0.009   -0.018       9997    27618
2.90     -0.022   -0.018       1837    29455
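
The final table in the log lists, for each high-resolution limit d_min, an anomalous correlation for the shell ending at that limit and a cumulative value for all data out to that limit. The following is a minimal sketch of how such a table could be computed, not the code that scale_and_merge itself runs. It assumes a plain Pearson correlation of the anomalous differences for reflections present in both half-datasets. The function names (`pearson_cc`, `anomalous_cc_table`) and the synthetic data are hypothetical, and reading the half-dataset MTZ files is left out.

```python
import numpy as np


def pearson_cc(x, y):
    """Pearson correlation of two equal-length arrays (nan if undefined)."""
    if x.size < 2:
        return float("nan")
    dx = x - x.mean()
    dy = y - y.mean()
    denom = np.sqrt((dx * dx).sum() * (dy * dy).sum())
    return float((dx * dy).sum() / denom) if denom > 0 else float("nan")


def anomalous_cc_table(d, dano_a, dano_b, d_min_limits):
    """Shell and cumulative CC of anomalous differences.

    d            : d-spacing (Angstrom) of each common reflection
    dano_a/b     : anomalous differences from half-datasets A and B
    d_min_limits : high-resolution limits, in decreasing order
    Returns rows of (d_min, cc_shell, cc_cumulative, n_shell, n_cumulative).
    """
    rows = []
    upper = np.inf  # low-resolution edge of the current shell
    for d_min in d_min_limits:
        in_shell = (d >= d_min) & (d < upper)
        in_cum = d >= d_min
        rows.append((
            d_min,
            pearson_cc(dano_a[in_shell], dano_b[in_shell]),
            pearson_cc(dano_a[in_cum], dano_b[in_cum]),
            int(in_shell.sum()),
            int(in_cum.sum()),
        ))
        upper = d_min
    return rows


# Synthetic stand-in for real data (assumption): a shared anomalous signal
# of unit variance plus independent noise of variance 9 in each half-dataset.
# The uniform d-spacing distribution is for illustration only.
rng = np.random.default_rng(0)
n = 20000
d = rng.uniform(2.9, 20.0, n)
signal = rng.normal(0.0, 1.0, n)
dano_a = signal + rng.normal(0.0, 3.0, n)
dano_b = signal + rng.normal(0.0, 3.0, n)

limits = [6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.9]
table = anomalous_cc_table(d, dano_a, dano_b, limits)

print("d_min  cc_shell  cc_cum  n_shell   n_cum")
for d_min, cc_shell, cc_cum, n_shell, n_cum in table:
    print(f"{d_min:5.2f}  {cc_shell:8.3f} {cc_cum:7.3f}  {n_shell:7d} {n_cum:7d}")
```

In this synthetic case the noise variance is nine times the signal variance in each half-dataset, so the expected correlation is about 1 / (1 + 9) = 0.1. The first shell and the first cumulative entry cover the same reflections, which is why the first row of the log table shows identical shell and cumulative values.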