Scaling unmerged anomalous data or multiple datasets with scale_and_merge

Author(s)

Purpose

scale_and_merge is a tool for scaling unmerged anomalous data or multiple data files and creating a scaled dataset and two scaled half-datasets. This tool normally is used in combination with anomalous_signal to create a scaled dataset and analyze anomalous signal in a SAD dataset. It can also be used to scale any other group of datasets.

Usage

How scale_and_merge works:

Output from scale_and_merge

scale_and_merge provides a summary half-dataset correlations in your dataset. Here is an example with very weak anomalous signal (but present):

Scale and merge...analysis of multi-dataset SAD data
input\_files {
  data = "dd"
  data\_labels = None
  paired\_group\_ids = None
}
output\_files {
  output\_file = "scaled\_data.mtz"
  output\_half\_dataset\_a = "half\_dataset\_a.mtz"
  output\_half\_dataset\_b = "half\_dataset\_b.mtz"
  output\_file\_format = *mtz sca
}
crystal\_info {
  resolution = None
  low\_resolution = None
  space\_group = None
  unit\_cell = None
}
data\_selection {
  minimum\_datafile\_fraction = 0.3
  require\_fpfm = True
  only\_similar\_datasets = True
  relative\_length\_tolerance = None
  absolute\_angle\_tolerance = None
  choose\_optimal\_datasets = False
  sort\_datasets\_by\_anomalous\_cc = True
}
scaling {
  make\_anisotropy\_uniform = True
  overallscale = False
  skip\_scaling = False
  lowest\_resolution\_range = 6
}
merging {
  optimize\_anomalous = True
  use\_best\_group\_as\_target = False
  rescale\_sigmas = False
}
half\_dataset\_cc {
  get\_half\_dataset\_cc = True
  half\_dataset\_cc\_by\_files = True
  split\_as\_first\_second\_half = True
  split\_alternately = True
  split\_randomly = True
}
directories {
  temp\_dir = None
  output\_dir = ""
  gui\_output\_dir = None
}
control {
  verbose = False
  random\_seed = 714215
  comparison\_file = None
  clean\_up = True
}
Comparing crystal symmetries and noting anisotropy in data
Relative length tolerance:  0.010  Angle tolerance: 1.000 degrees
Taking initial symmetry from /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL
with 3 similar files


            File             SG         A       B       C      Alpha   Beta   Gamma

Symmetry group 1
WNV\_NS1\_202\_w1\_2.9\_90.HKL    P 3 2 1  166.997 166.997  94.156  90.000  90.000 120.000
WNV\_NS1\_202\_w2\_2.9\_90.HKL    P 3 2 1  166.704 166.704  94.113  90.000  90.000 120.000
WNV\_NS1\_205\_w1\_2.9\_90.HKL    P 3 2 1  167.578 167.578  93.934  90.000  90.000 120.000
WNV\_NS1\_205\_w2\_2.9\_90.HKL    P 3 2 1  167.639 167.639  93.859  90.000  90.000 120.000

Scaling 4 datasets

Scaling data files with local scaling

Files to scale:
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w1\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w2\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w1\_2.9\_90.HKL
/Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL

Selecting just reflections that have both F+ and F- in the same
dataset or are centric

Splitting datafiles into sub-files with one copy of each unique hkl

              File                    Refl       (rejected)   B1      B2     B3        B-avg

TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1.sca:   138180     (    6378)    63.9    63.9   55.00      60.9
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_2.sca:      229     (       0)    63.9    63.9   55.00      60.9
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1.sca:   136254     (    6096)    59.4    59.4   49.78      56.2
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_2.sca:      206     (       0)    59.4    59.4   49.78      56.2
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1.sca:   148714     (    4464)    73.7    73.7   63.92      70.4
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_2.sca:      239     (       0)    73.7    73.7   63.92      70.4
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1.sca:   147256     (    4560)    70.3    70.3   60.50      67.0
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_2.sca:      241     (       0)    70.3    70.3   60.50      67.0

Notes:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w1\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_202\_w2\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w1\_2.9\_90.HKL into 2 files:
Splitting /Users/terwill/unix/misc/scale\_and\_merge/dd/WNV\_NS1\_205\_w2\_2.9\_90.HKL into 2 files:
Keeping split datafiles with at least 44614 reflections

High-resolution limit:    2.89

Scaling data in batches from individual data files

List of scaled data files:
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz (30844 refl)
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz (30828 refl)
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz (29641 refl)
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz (29527 refl)


Scaling and merging data files with overall scale factor

Files to scale:
TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz

 ****** Putting all data on common scale ******

Standard dataset: 1  I/sigma:    8.30  Nrefl: 59133


Total of 4 datasets to be used

Scale factors for data groups:
 ID    Scale   file\_name
  1  1.000  TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
  2  0.683  TEMP0/WNV\_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
  3  1.198  TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz
  4  1.366  TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz

Mean I of scaled unmerged datasets:
1:   9940.78 (N=59133)
2:   9926.98 (N=59085)
3:  10203.75 (N=56663)
4:  10305.51 (N=56463)


Getting overall merged dataset using original sigmas


Datasets to be merged in estimation of variances:  1 2 3 4
Datasets to be merged in final merging step:  1 2 3 4

Getting scale factors, dataset variances and scaled intensities

Merged mean I:  8966.65 (N=64581)


Dataset variances
(RMS difference from target dataset after accounting for sigmas)
                          Dataset
   Resolution        1        2        3        4
 48.40 -  4.94   2923.90  2353.89  6175.92  8461.67
  4.94 -  3.92   1348.27  1407.39  1191.09  1267.29
  3.92 -  3.43      0.00     0.00     0.00     0.00
  3.43 -  3.11      0.00     0.00     0.00     0.00
  3.11 -  2.89      0.00     0.00     0.00     0.00

  ALL            2515.66  1935.84  3450.29  5499.59

Getting merged dataset including dataset variances

Dataset correlations with merged dataset
                          Dataset
   Resolution        1        2        3        4
 48.40 -  4.94      1.00     1.00     1.00     0.99
  4.94 -  3.92      1.00     1.00     1.00     0.99
  3.92 -  3.43      0.99     0.99     0.98     0.97
  3.43 -  3.11      0.93     0.92     0.87     0.85
  3.11 -  2.89      0.84     0.94     0.69     0.55

  ALL               1.00     1.00     1.00     0.99

New merged mean I:  9035.38 (N=64581) and I/sigma:   10.89

NOTE: I/sigma cannot be directly compared to original due to including
estimates of dataset variances and changes in number of reflections.


Optimizing anomalous differences

Merged mean anomalous difference:    -3.48 (N=31009)


Dataset variances (anomalous differences)
(RMS difference from target dataset after accounting for sigmas)
                          Dataset
   Resolution        1        2        3        4
 47.38 -  4.94      0.00  1652.13   993.28     0.00
  4.94 -  3.92    732.44     0.00   503.86     0.00
  3.92 -  3.43    690.71     0.00     0.00     0.00
  3.43 -  3.11    272.68     0.00     0.00     0.00
  3.11 -  2.89      0.00     0.00     0.00     0.00

  ALL             468.90     0.00   435.50     0.00

Getting merged dataset including dataset variances

Dataset correlations with merged dataset (anomalous differences)
                          Dataset
   Resolution        1        2        3        4
 47.38 -  4.94      0.77     0.79     0.69     0.72
  4.94 -  3.92      0.65     0.73     0.64     0.67
  3.92 -  3.43      0.70     0.70     0.63     0.66
  3.43 -  3.11      0.72     0.71     0.70     0.66
  3.11 -  2.89      0.73     0.81     0.71     0.78

  ALL               0.73     0.75     0.67     0.70
Anom correlation on I of std merged and anom scaled:     0.83 (N=31009)
Merged scaled data optimized for anomalous differences: scaled\_data.mtz


==============================================================================

Splitting data into groups for half-dataset CC

Keeping datafiles intact within each half-dataset

Half-dataset groups:

Group A
 TEMP0/WNV\_NS1\_205\_w1\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w1\_2.9\_90\_1\_scale.mtz

Group B
 TEMP0/WNV\n_NS1\_205\_w2\_2.9\_90\_1\_scale.mtz
TEMP0/WNV\_NS1\_202\_w2\_2.9\_90\_1\_scale.mtz

---- Calculating anomalous CC between half-datasets A,B----

Half-dataset A: half\_dataset\_a.mtz
Half-dataset B: half\_dataset\_b.mtz
Unique reflections:
half\_dataset\_a.mtz: 30242
half\_dataset\_b.mtz: 30222
Reflections in common:    29455
Overall resolution:    2.90 A  Nrefl: 29455
Overall anomalous correlation:   -0.018

Anomalous correlation with varying high-resolution
limits

  d\_min     ---- CC ------       ----  N -----
           Shell   Cumulative   Shell  Cumulative
   6.00     0.080    0.080      3392      3392
   5.50    -0.060    0.043      1052      4444
   5.00    -0.115   -0.006      1519      5963
   4.50    -0.040   -0.018      2256      8219
   4.00    -0.065   -0.036      3540     11759
   3.50    -0.016   -0.026      5862     17621
   3.00    -0.009   -0.018      9997     27618
   2.90    -0.022   -0.018      1837     29455

Possible Problems

Literature

List of all available keywords