scale_and_merge is a tool for scaling unmerged anomalous data or multiple data files and creating a scaled dataset and two scaled half-datasets. It is normally used in combination with anomalous_signal to create a scaled dataset and analyze the anomalous signal in a SAD dataset. It can also be used to scale any other group of datasets.
scale_and_merge provides a summary of half-dataset correlations in your dataset. Here is an example with very weak (but present) anomalous signal:
Scale and merge...analysis of multi-dataset SAD data
input\_files {
data = "dd"
data\_labels = None
paired\_group\_ids = None
}
output\_files {
output\_file = "scaled\_data.mtz"
output\_half\_dataset\_a = "half\_dataset\_a.mtz"
output\_half\_dataset\_b = "half\_dataset\_b.mtz"
output\_file\_format = *mtz sca
}
crystal\_info {
resolution = None
low\_resolution = None
space\_group = None
unit\_cell = None
}
data\_selection {
minimum\_datafile\_fraction = 0.3
require\_fpfm = True
only\_similar\_datasets = True
relative\_length\_tolerance = None
absolute\_angle\_tolerance = None
choose\_optimal\_datasets = False
sort\_datasets\_by\_anomalous\_cc = True
}
scaling {
make\_anisotropy\_uniform = True
overallscale = False
skip\_scaling = False
lowest\_resolution\_range = 6
}
merging {
optimize\_anomalous = True
use\_best\_group\_as\_target = False
rescale\_sigmas = False
}
half\_dataset\_cc {
get\_half\_dataset\_cc = True
half\_dataset\_cc\_by\_files = True
split\_as\_first\_second\_half = True
split\_alternately = True
split\_randomly = True
}
directories {
temp\_dir = None
output\_dir = ""
gui\_output\_dir = None
}
control {
verbose = False
random\_seed = 714215
comparison\_file = None
clean\_up = True
}
Comparing crystal symmetries and noting anisotropy in data
Relative length tolerance: 0.010 Angle tolerance: 1.000 degrees
Taking initial symmetry from /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL
with 3 similar files
File SG A B C Alpha Beta Gamma
Symmetry group 1
WNV\_NS1\_202\_w1\_2.9\_90.HKL P 3 2 1 166.997 166.997 94.156 90.000 90.000 120.000
WNV\_NS1\_202\_w2\_2.9\_90.HKL P 3 2 1 166.704 166.704 94.113 90.000 90.000 120.000
WNV\_NS1\_205\_w1\_2.9\_90.HKL P 3 2 1 167.578 167.578 93.934 90.000 90.000 120.000
WNV\_NS1\_205\_w2\_2.9\_90.HKL P 3 2 1 167.639 167.639 93.859 90.000 90.000 120.000
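The symmetry comparison above can be illustrated with a short sketch. It assumes that each cell is compared with the cell of the file the initial symmetry was taken from, that cell lengths are compared as relative differences, and that angles are compared as absolute differences in degrees. The function name `is_similar` and this exact rule are illustrative assumptions, not the program's actual implementation; the cell values and tolerances are copied from the log.

```python
# Unit cells (a, b, c, alpha, beta, gamma) copied from the table above.
cells = {
    "WNV_NS1_202_w1_2.9_90.HKL": (166.997, 166.997, 94.156, 90.0, 90.0, 120.0),
    "WNV_NS1_202_w2_2.9_90.HKL": (166.704, 166.704, 94.113, 90.0, 90.0, 120.0),
    "WNV_NS1_205_w1_2.9_90.HKL": (167.578, 167.578, 93.934, 90.0, 90.0, 120.0),
    "WNV_NS1_205_w2_2.9_90.HKL": (167.639, 167.639, 93.859, 90.0, 90.0, 120.0),
}

# File the log reports taking the initial symmetry from.
REFERENCE = "WNV_NS1_205_w2_2.9_90.HKL"

# Tolerances reported in the log.
RELATIVE_LENGTH_TOLERANCE = 0.010
ABSOLUTE_ANGLE_TOLERANCE = 1.000  # degrees


def is_similar(cell, ref, rel_tol, abs_tol):
    """Hypothetical similarity rule: relative test on lengths, absolute on angles."""
    lengths_ok = all(abs(x - r) / r <= rel_tol for x, r in zip(cell[:3], ref[:3]))
    angles_ok = all(abs(x - r) <= abs_tol for x, r in zip(cell[3:], ref[3:]))
    return lengths_ok and angles_ok


similar = {
    name: is_similar(cell, cells[REFERENCE],
                     RELATIVE_LENGTH_TOLERANCE, ABSOLUTE_ANGLE_TOLERANCE)
    for name, cell in cells.items()
}

for name, ok in similar.items():
    print(name, "similar" if ok else "different")
```

Under these assumptions all four cells fall within tolerance of the reference, which is consistent with the log reporting the reference file "with 3 similar files".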
Scaling 4 datasets
Scaling data files with local scaling
Files to scale:
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w1\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w2\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w1\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL
Selecting just reflections that have both F+ and F- in the same
dataset or are centric
Splitting datafiles into sub-files with one copy of each unique hkl
File Refl (rejected) B1 B2 B3 B-avg
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1.sca: 138180 ( 6378) 63.9 63.9 55.00 60.9
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_2.sca: 229 ( 0) 63.9 63.9 55.00 60.9
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1.sca: 136254 ( 6096) 59.4 59.4 49.78 56.2
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_2.sca: 206 ( 0) 59.4 59.4 49.78 56.2
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1.sca: 148714 ( 4464) 73.7 73.7 63.92 70.4
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_2.sca: 239 ( 0) 73.7 73.7 63.92 70.4
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1.sca: 147256 ( 4560) 70.3 70.3 60.50 67.0
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_2.sca: 241 ( 0) 70.3 70.3 60.50 67.0
Notes:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w1\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w2\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w1\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL into 2 files:
Keeping split datafiles with at least 44614 reflections
High-resolution limit: 2.89
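The cutoff of 44614 reflections reported above is consistent with minimum_datafile_fraction = 0.3 applied to the largest split sub-file (148714 reflections). The sketch below reproduces that arithmetic. Treating the cutoff as the fraction times the largest sub-file, truncated to an integer, is an assumption inferred from the numbers in this log, not a documented rule.

```python
# Reflection counts per split sub-file, copied from the table above.
sub_file_reflections = {
    "WNV_NS1_202_w1_2.9_90_1.sca": 138180,
    "WNV_NS1_202_w1_2.9_90_2.sca": 229,
    "WNV_NS1_202_w2_2.9_90_1.sca": 136254,
    "WNV_NS1_202_w2_2.9_90_2.sca": 206,
    "WNV_NS1_205_w1_2.9_90_1.sca": 148714,
    "WNV_NS1_205_w1_2.9_90_2.sca": 239,
    "WNV_NS1_205_w2_2.9_90_1.sca": 147256,
    "WNV_NS1_205_w2_2.9_90_2.sca": 241,
}

MINIMUM_DATAFILE_FRACTION = 0.3  # value shown in the data_selection parameters

# Assumed rule: cutoff = fraction * size of the largest sub-file, truncated.
largest = max(sub_file_reflections.values())
threshold = int(MINIMUM_DATAFILE_FRACTION * largest)

# Keep only the sub-files that reach the cutoff.
kept = [name for name, n in sub_file_reflections.items() if n >= threshold]

print("threshold:", threshold)
for name in kept:
    print("kept:", name)
```

Only the four `_1.sca` sub-files pass this cutoff, which matches the list of scaled data files later in the log.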
Scaling data in batches from individual data files
List of scaled data files:
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz (30844 refl)
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz (30828 refl)
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz (29641 refl)
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz (29527 refl)
Scaling and merging data files with overall scale factor
Files to scale:
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz
****** Putting all data on common scale ******
Standard dataset: 1 I/sigma: 8.30 Nrefl: 59133
Total of 4 datasets to be used
Scale factors for data groups:
ID Scale file\_name
1 1.000 TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
2 0.683 TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
3 1.198 TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
4 1.366 TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz
Mean I of scaled unmerged datasets:
1: 9940.78 (N=59133)
2: 9926.98 (N=59085)
3: 10203.75 (N=56663)
4: 10305.51 (N=56463)
Getting overall merged dataset using original sigmas
Datasets to be merged in estimation of variances: 1 2 3 4
Datasets to be merged in final merging step: 1 2 3 4
Getting scale factors, dataset variances and scaled intensities
Merged mean I: 8966.65 (N=64581)
Dataset variances
(RMS difference from target dataset after accounting for sigmas)
Dataset
Resolution 1 2 3 4
48.40 - 4.94 2923.90 2353.89 6175.92 8461.67
4.94 - 3.92 1348.27 1407.39 1191.09 1267.29
3.92 - 3.43 0.00 0.00 0.00 0.00
3.43 - 3.11 0.00 0.00 0.00 0.00
3.11 - 2.89 0.00 0.00 0.00 0.00
ALL 2515.66 1935.84 3450.29 5499.59
Getting merged dataset including dataset variances
Dataset correlations with merged dataset
Dataset
Resolution 1 2 3 4
48.40 - 4.94 1.00 1.00 1.00 0.99
4.94 - 3.92 1.00 1.00 1.00 0.99
3.92 - 3.43 0.99 0.99 0.98 0.97
3.43 - 3.11 0.93 0.92 0.87 0.85
3.11 - 2.89 0.84 0.94 0.69 0.55
ALL 1.00 1.00 1.00 0.99
New merged mean I: 9035.38 (N=64581) and I/sigma: 10.89
NOTE: I/sigma cannot be directly compared to original due to including
estimates of dataset variances and changes in number of reflections.
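The note above says that I/sigma changes once dataset variances are included. One common way to include such a term is inverse-variance weighting, where each observation is weighted by 1/(sigma^2 + V) and V is an extra variance assigned to the dataset it came from. The sketch below shows that generic technique only. The function `merge_observations`, the treatment of V as an additive variance, and the numbers are illustrative assumptions and are not taken from the program.

```python
def merge_observations(observations):
    """Inverse-variance weighted mean of (intensity, sigma, dataset_variance) tuples.

    Generic weighting scheme, assumed here for illustration only.
    Returns (merged_intensity, merged_sigma).
    """
    weights = [1.0 / (sigma ** 2 + variance) for _, sigma, variance in observations]
    total_weight = sum(weights)
    merged_intensity = sum(
        w * intensity for w, (intensity, _, _) in zip(weights, observations)
    ) / total_weight
    merged_sigma = total_weight ** -0.5
    return merged_intensity, merged_sigma


# Made-up example: two observations of one reflection with equal sigmas.
merged_equal = merge_observations([(100.0, 10.0, 0.0), (120.0, 10.0, 0.0)])

# Same observations, but the second dataset carries an extra variance of 300,
# so it is down-weighted and the merged sigma grows.
merged_weighted = merge_observations([(100.0, 10.0, 0.0), (120.0, 10.0, 300.0)])

print(merged_equal)
print(merged_weighted)
```

In this toy case the extra variance pulls the merged intensity toward the first observation and enlarges the merged sigma, which illustrates why I/sigma before and after such a step is not directly comparable.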
Optimizing anomalous differences
Merged mean anomalous difference: -3.48 (N=31009)
Dataset variances (anomalous differences)
(RMS difference from target dataset after accounting for sigmas)
Dataset
Resolution 1 2 3 4
47.38 - 4.94 0.00 1652.13 993.28 0.00
4.94 - 3.92 732.44 0.00 503.86 0.00
3.92 - 3.43 690.71 0.00 0.00 0.00
3.43 - 3.11 272.68 0.00 0.00 0.00
3.11 - 2.89 0.00 0.00 0.00 0.00
ALL 468.90 0.00 435.50 0.00
Getting merged dataset including dataset variances
Dataset correlations with merged dataset (anomalous differences)
Dataset
Resolution 1 2 3 4
47.38 - 4.94 0.77 0.79 0.69 0.72
4.94 - 3.92 0.65 0.73 0.64 0.67
3.92 - 3.43 0.70 0.70 0.63 0.66
3.43 - 3.11 0.72 0.71 0.70 0.66
3.11 - 2.89 0.73 0.81 0.71 0.78
ALL 0.73 0.75 0.67 0.70
Anom correlation on I of std merged and anom scaled: 0.83 (N=31009)
Merged scaled data optimized for anomalous differences: scaled\_data.mtz
==============================================================================
Splitting data into groups for half-dataset CC
Keeping datafiles intact within each half-dataset
Half-dataset groups:
Group A
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
Group B
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz
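The grouping above keeps each data file intact and places it in one of the two half-datasets. Assigning the scaled files alternately, in the order they appear in the list of scaled data files, reproduces groups A and B exactly. The log does not state which of the split options produced this grouping, so the alternating rule below is an assumption that happens to match. Splitting the same list into first and second halves would give different groups.

```python
# Scaled files in the order given in "List of scaled data files" above.
scaled_files = [
    "TEMP0/WNV_NS1_205_w1_2.9_90_1_scale.mtz",
    "TEMP0/WNV_NS1_205_w2_2.9_90_1_scale.mtz",
    "TEMP0/WNV_NS1_202_w1_2.9_90_1_scale.mtz",
    "TEMP0/WNV_NS1_202_w2_2.9_90_1_scale.mtz",
]

# Assumed rule: whole files go alternately to group A and group B.
group_a = scaled_files[0::2]
group_b = scaled_files[1::2]

print("Group A")
for name in group_a:
    print(" ", name)
print("Group B")
for name in group_b:
    print(" ", name)
```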
---- Calculating anomalous CC between half-datasets A,B----
Half-dataset A: half\_dataset\_a.mtz
Half-dataset B: half\_dataset\_b.mtz
Unique reflections:
half\_dataset\_a.mtz: 30242
half\_dataset\_b.mtz: 30222
Reflections in common: 29455
Overall resolution: 2.90 A Nrefl: 29455
Overall anomalous correlation: -0.018
Anomalous correlation with varying high-resolution
limits
d\_min ---- CC ------ ---- N -----
Shell Cumulative Shell Cumulative
6.00 0.080 0.080 3392 3392
5.50 -0.060 0.043 1052 4444
5.00 -0.115 -0.006 1519 5963
4.50 -0.040 -0.018 2256 8219
4.00 -0.065 -0.036 3540 11759
3.50 -0.016 -0.026 5862 17621
3.00 -0.009 -0.018 9997 27618
2.90 -0.022 -0.018 1837 29455
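The table above can be read with the help of a small sketch. It assumes the anomalous CC is the ordinary Pearson correlation between anomalous differences of reflections present in both half-datasets, and that each cumulative entry covers all reflections down to that d_min. The reflection data and the names `pearson_cc`, `half_a` and `half_b` are made up for illustration. Only the shell counts are copied from the log.

```python
from math import sqrt


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up anomalous differences keyed by Miller index, one dict per half-dataset.
half_a = {(1, 0, 0): 5.0, (1, 1, 0): -3.0, (2, 0, 0): 1.0, (2, 1, 0): 4.0}
half_b = {(1, 0, 0): 4.0, (1, 1, 0): -2.0, (2, 0, 0): 0.5, (3, 0, 0): 7.0}

# Only reflections measured in both halves contribute to the correlation.
common = sorted(set(half_a) & set(half_b))
cc = pearson_cc([half_a[hkl] for hkl in common], [half_b[hkl] for hkl in common])
print("reflections in common:", len(common), "anomalous CC:", round(cc, 3))

# Shell counts copied from the table; the running total gives the cumulative N.
shell_counts = [3392, 1052, 1519, 2256, 3540, 5862, 9997, 1837]
cumulative_counts = []
running_total = 0
for count in shell_counts:
    running_total += count
    cumulative_counts.append(running_total)
print(cumulative_counts)
```

The running totals of the shell counts match the cumulative N column, ending at the 29455 reflections in common. The cumulative CC is computed over all reflections in the range rather than averaged over the shell values, so it cannot be recovered from the shell column alone.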