Python-based Hierarchical ENvironment for Integrated Xtallography
Data quality assessment with phenix.xtriage
Author(s)

Purpose

xtriage is a tool for analyzing structure factor data to identify outliers, the presence of twinning, and other conditions that the user should be aware of.

Usage

How xtriage works

The basic sanity checks performed by xtriage are illustrated in "Interpreting Xtriage output" below.

Output files from xtriage

Xtriage keywords in detail

Scope: parameters.asu_contents
keys:
  * n_residues :: Number of residues per monomer/unit
  * n_bases :: Number of nucleotides per monomer/unit
  * n_copies_per_asu :: Number of copies in the ASU.

These keywords control the determination of the absolute scale. If the number of residues/bases is not specified, a solvent content of 50% is assumed.

Scope: parameters.misc_twin_parameters.missing_symmetry
keys:
  * tanh_location :: tanh decision rule parameter
  * tanh_slope :: tanh decision rule parameter

The tanh_location and tanh_slope parameters control which R-value is considered low enough for an operator to count as a 'proper' symmetry operator. The tanh_location parameter corresponds to the inflection point of the approximate step function. Increasing tanh_location results in larger R-value thresholds. tanh_slope is set to 50 by default, which should be fine.

Scope: parameters.misc_twin_parameters.twinning_with_ncs
keys:
  * perform_test :: can be set to True or False
  * n_bins :: Number of bins in determination of D_ncs

perform_test is set to False by default. Setting it to True triggers the determination of the twin fraction while taking into account NCS parallel to the twin axis.

Scope: parameters.misc_twin_parameters.twin_test_cuts
keys:
  * high_resolution :: high resolution for twin tests
  * low_resolution :: low resolution for twin tests
  * isigi_cut :: I/sig(I) threshold in automatic determination of high resolution limit
  * completeness_cut :: completeness threshold in automatic determination of high resolution limit

The resolution limit for the twinning tests is determined automatically from the completeness that remains after removing intensities for which I/sigI < isigi_cut. The lowest limit obtained in this way is 3.5 A. The value determined by the automatic procedure can be overruled by specifying the high_resolution keyword. The low resolution limit is set to 10 A by default.
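
The automatic resolution cut described for twin_test_cuts can be pictured with a short Python sketch. This is an illustration only, not the xtriage implementation: the function name, the shell tuples, and the reading of the 3.5 A rule as "never cut at a lower resolution than 3.5 A" are assumptions. The default completeness_cut of 0.85 is the one listed in the PHIL defaults later in this document; the I/sigI filter (isigi_cut) is assumed to have been applied before the shell completeness values were computed.

```python
def auto_high_resolution(shells, completeness_cut=0.85, weakest_limit=3.5):
    """Pick a high-resolution limit (in Angstrom) for the twin tests.

    shells: list of (d_max, d_min, completeness) tuples ordered from low to
    high resolution, where completeness is the fraction of reflections that
    survive the I/sigI > isigi_cut filter in that shell.
    """
    limit = None
    for d_max, d_min, completeness in shells:
        if completeness < completeness_cut:
            break  # the first shell that is too incomplete ends the search
        limit = d_min
    # Assumed reading of the 3.5 A rule: never end up at lower resolution.
    if limit is None or limit > weakest_limit:
        limit = weakest_limit
    return limit


# Hypothetical shells, for illustration only.
strong = [(20.0, 4.0, 0.95), (4.0, 3.0, 0.90), (3.0, 2.5, 0.70)]
weak = [(20.0, 5.0, 0.90), (5.0, 4.0, 0.50)]
print(auto_high_resolution(strong))  # 3.0
print(auto_high_resolution(weak))    # 3.5
```

Specifying high_resolution explicitly, as described above, bypasses this kind of automatic choice.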

Scope: parameters.reporting
keys:
  * verbose :: verbosity level.
  * log :: log file name
  * ccp4_style_graphs :: Either True or False. Determines whether or not ccp4 style loggraph plots are written to the log file

Scope: xray_data
keys:
  * file_name :: file name with xray data.
  * obs_labels :: labels for observed data if format is mtz or XPLOR/CNS
  * calc_labels :: optional; labels for calculated data
  * unit_cell :: overrides unit cell in reflection file (if present)
  * space_group :: overrides space group in reflection file (if present)
  * high_resolution :: High resolution limit of the data
  * low_resolution :: Low resolution limit of the data

Note that the matching of specified and present labels involves a sub-string matching algorithm.

Scope: optional
keys:
  * hklout :: output mtz file
  * twinning.action :: Whether to detwin the data
  * twinning.twin_law :: using this twin law (h,k,l or x,y,z notation)
  * twinning.fraction :: The detwinning fraction.
  * b_value :: the resulting Wilson B value

The output mtz file contains anisotropy-corrected data with suspected outliers removed. The data is scaled and has the specified Wilson B value. These options have an associated expert level of 10 and are not shown by default. Specifying the expert level on the command line as 'level=100' will show all available options.

Interpreting Xtriage output

  % phenix.xtriage some_data.sca residues=290 log=some_data.log

results in the following output (parts omitted).

Matthews analysis

First, a cell contents analysis is performed. Matthews coefficients, solvent content and solvent content probabilities are listed, and the most likely composition is guessed:

  Matthews coefficient and Solvent content statistics
  ----------------------------------------------------------------
  | Copies | Solvent content | Matthews Coef. | P(solvent cont.) |
  |--------|-----------------|----------------|------------------|
  |      1 |      0.705      |     4.171      |      0.241       |
  |      2 |      0.411      |     2.085      |      0.750       |
  |      3 |      0.116      |     1.390      |      0.009       |
  ----------------------------------------------------------------
  |              Best guess :    2  copies in the asu            |
  ----------------------------------------------------------------

Data strength

In the next step, the strength of the data is gauged by determining the completeness of the data in resolution bins after applying several I/sigI cutoff values:

  Completeness and data strength analysis

  The following table lists the completeness in various resolution
  ranges, after applying a I/sigI cut. Miller indices for which
  individual I/sigI values are larger than the value specified in
  the top row of the table, are retained, while other intensities
  are discarded. The resulting completeness profiles are an indication
  of the strength of the data.

  ----------------------------------------------------------------------------------------
  | Res. Range   | I/sigI>1  | I/sigI>2  | I/sigI>3  | I/sigI>5  | I/sigI>10 | I/sigI>15 |
  ----------------------------------------------------------------------------------------
  | 19.87 - 7.98 |   96.4%   |   95.3%   |   94.5%   |   93.6%   |   91.7%   |   89.3%   |
  |  7.98 - 6.40 |   99.2%   |   98.2%   |   97.1%   |   95.5%   |   90.9%   |   84.7%   |
  |  6.40 - 5.61 |   97.8%   |   95.4%   |   93.3%   |   87.1%   |   76.6%   |   66.8%   |
  |  5.61 - 5.11 |   98.2%   |   95.9%   |   94.0%   |   87.9%   |   74.1%   |   58.0%   |
  |  5.11 - 4.75 |   97.9%   |   96.2%   |   94.5%   |   91.1%   |   79.2%   |   62.5%   |
  |  4.75 - 4.47 |   97.4%   |   95.4%   |   93.1%   |   88.9%   |   76.6%   |   56.9%   |
  |  4.47 - 4.25 |   96.5%   |   94.5%   |   92.1%   |   88.0%   |   75.3%   |   56.5%   |
  |  4.25 - 4.07 |   96.6%   |   94.0%   |   91.2%   |   85.4%   |   69.3%   |   44.9%   |
  |  4.07 - 3.91 |   95.6%   |   92.1%   |   87.8%   |   80.1%   |   61.9%   |   34.8%   |
  |  3.91 - 3.78 |   94.3%   |   89.6%   |   83.7%   |   71.1%   |   48.7%   |   20.5%   |
  |  3.78 - 3.66 |   95.7%   |   90.9%   |   85.6%   |   71.5%   |   42.4%   |   14.8%   |
  |  3.66 - 3.56 |   91.6%   |   85.0%   |   78.0%   |   63.3%   |   34.1%   |    9.5%   |
  |  3.56 - 3.46 |   89.8%   |   80.4%   |   70.2%   |   52.8%   |   22.2%   |    3.8%   |
  |  3.46 - 3.38 |   87.4%   |   76.3%   |   64.6%   |   46.7%   |   15.5%   |    1.7%   |
  ----------------------------------------------------------------------------------------

This analysis is also used in the automatic determination of the high resolution limit used in the intensity statistics and twin analyses.

Absolute, likelihood based Wilson scaling

The (anisotropic) B value of the data is determined using a likelihood based approach. The resulting B value/tensor is reported:

  Maximum likelihood isotropic Wilson scaling
  ML estimate of overall B value of sec17.sca:i_obs,sigma:
  75.85 A**(-2)
  Estimated -log of scale factor of sec17.sca:i_obs,sigma:
  -2.50

  Maximum likelihood anisotropic Wilson scaling
  ML estimate of overall B_cart value of sec17.sca:i_obs,sigma:
  68.92,  0.00,  0.00
          68.92,  0.00
                  91.87
  Equivalent representation as U_cif:
  0.87, -0.00, -0.00
         0.87,  0.00
                1.16

  ML estimate of -log of scale factor of sec17.sca:i_obs,sigma:
  -2.50
  Correcting for anisotropy in the data

A large spread in the values (especially the diagonal ones) indicates anisotropy. The anisotropy is corrected for, which cleans up the intensity statistics.

Low resolution completeness analysis

Most data processing software does not provide a clear picture of the completeness of the data at low resolution. For this reason, xtriage lists the completeness of the data up to 5 Angstrom:

  Low resolution completeness analysis

  The following table shows the completeness
  of the data to 5 Angstrom.
  unused:         - 19.8702 [  0/68 ] 0.000
  bin  1: 19.8702 - 10.3027 [425/455] 0.934
  bin  2: 10.3027 -  8.3766 [443/446] 0.993
  bin  3:  8.3766 -  7.3796 [446/447] 0.998
  bin  4:  7.3796 -  6.7336 [447/449] 0.996
  bin  5:  6.7336 -  6.2673 [450/454] 0.991
  bin  6:  6.2673 -  5.9080 [428/429] 0.998
  bin  7:  5.9080 -  5.6192 [459/466] 0.985
  bin  8:  5.6192 -  5.3796 [446/450] 0.991
  bin  9:  5.3796 -  5.1763 [437/440] 0.993
  bin 10:  5.1763 -  5.0006 [460/462] 0.996
  unused:  5.0006 -         [  0/0  ]

This analysis allows one to quickly see if there is any unusually low completeness at low resolution, for instance due to missing overloads.

Wilson plot analysis

A Wilson plot analysis a la ARP/wARP is carried out, albeit with a slightly different standard curve:

  Mean intensity analysis
  Analysis of the mean intensity.
  Inspired by: Morris et al. (2004). J. Synch. Rad.11, 56-59.
  The following resolution shells are worrisome:
  ------------------------------------------------
  | d_spacing | z_score | compl. | <Iobs>/<Iexp> |
  ------------------------------------------------
  |     5.773 |    7.95 |   0.99 |         0.658 |
  |     5.423 |    8.62 |   0.99 |         0.654 |
  |     5.130 |    6.31 |   0.99 |         0.744 |
  |     4.879 |    5.36 |   0.99 |         0.775 |
  |     4.662 |    4.52 |   0.99 |         0.803 |
  |     3.676 |    5.45 |   0.99 |         1.248 |
  ------------------------------------------------

  Possible reasons for the presence of the reported
  unexpected low or elevated mean intensity in
  a given resolution bin are :
  - missing overloaded or weak reflections
  - suboptimal data processing
  - satellite (ice) crystals
  - NCS
  - translational pseudo symmetry (detected elsewhere)
  - outliers (detected elsewhere)
  - ice rings (detected elsewhere)
  - other problems
  Note that the presence of abnormalities
  in a certain region of reciprocal space might
  confuse the data validation algorithm throughout
  a large region of reciprocal space, even though
  the data is acceptable in those areas.

A very long list of warnings could indicate a serious problem with your data.
Deciding whether the data is useful, should be cut, or should be thrown away altogether is not straightforward and falls beyond the scope of xtriage.

Outlier detection and rejection

Possible outliers are detected on the basis of Wilson statistics:

  Possible outliers
  Inspired by: Read, Acta Cryst. (1999). D55, 1759-1764

  Acentric reflections:

  -----------------------------------------------------------------
  | d_space |      H     K     L |  |E|  | p(wilson) | p(extreme) |
  -----------------------------------------------------------------
  |   3.716 |      8,    6,   31 |  3.52 |  4.06e-06 |   5.87e-02 |
  -----------------------------------------------------------------

  p(wilson)  : 1-(1-exp[-|E|^2])
  p(extreme) : 1-(1-exp[-|E|^2])^(n_acentrics)
  p(wilson) is the probability that an E-value of the specified
  value would be observed when it would selected at random from
  the given data set.
  p(extreme) is the probability that the largest |E| value is
  larger or equal than the observed largest |E| value.

  Both measures can be used for outlier detection. p(extreme)
  takes into account the size of the data set.

Outliers are removed from the data set in the further analysis. Note that if pseudo translational symmetry is present, a large number of 'outliers' will be present.

Ice ring detection

Ice rings in the data are detected by analyzing the completeness and the mean intensity:

  Ice ring related problems

  The following statistics were obtained from ice-ring
  insensitive resolution ranges
  mean bin z_score      : 3.47
  ( rms deviation       : 2.83 )
  mean bin completeness : 0.99
  ( rms deviation       : 0.00 )

  The following table shows the z-scores
  and completeness in ice-ring sensitive areas.
  Large z-scores and high completeness in these
  resolution ranges might be a reason to re-assess
  your data processing if ice rings were present.

  ------------------------------------------------
  | d_spacing | z_score | compl. | Rel. Ice int. |
  ------------------------------------------------
  |     3.897 |    0.12 |   0.97 |         1.000 |
  |     3.669 |    0.96 |   0.95 |         0.750 |
  |     3.441 |    2.14 |   0.94 |         0.530 |
  ------------------------------------------------

  Abnormalities in mean intensity or completeness at
  resolution ranges with a relative ice ring intensity
  lower then 0.10 will be ignored.

  At 3.67 A there is an lower occupancy
  then expected from the rest of the data set.
  Even though the completeness is lower as expected,
  the mean intensity is still reasonable at this resolution

  At 3.44 A there is an lower occupancy
  then expected from the rest of the data set.
  Even though the completeness is lower as expected,
  the mean intensity is still reasonable at this resolution

  There were 2 ice ring related warnings
  This could indicate the presence of ice rings.

Anomalous signal

If the input reflection file contains separate intensities for each Friedel mate, a quality measure of the anomalous signal is reported:

  Analysis of anomalous differences

  Table of measurability as a function of resolution

  The measurability is defined as the fraction of
  Bijvoet related intensity differences for which
  |delta_I|/sigma_delta_I > 3.0
  min[I(+)/sigma_I(+), I(-)/sigma_I(-)] > 3.0
  holds. The measurability provides an intuitive feeling
  of the quality of the data, as it is related to the
  number of reliable Bijvoet differences.
  When the data is processed properly and the standard
  deviations have been estimated accurately, values larger
  than 0.05 are encouraging.

  unused:         - 19.8704 [   0/68  ]
  bin  1: 19.8704 -  7.0211 [1551/1585] 0.1924
  bin  2:  7.0211 -  5.6142 [1560/1575] 0.0814
  bin  3:  5.6142 -  4.9168 [1546/1555] 0.0261
  bin  4:  4.9168 -  4.4729 [1563/1582] 0.0081
  bin  5:  4.4729 -  4.1554 [1557/1577] 0.0095
  bin  6:  4.1554 -  3.9124 [1531/1570] 0.0083
  bin  7:  3.9124 -  3.7178 [1541/1585] 0.0069
  bin  8:  3.7178 -  3.5569 [1509/1552] 0.0028
  bin  9:  3.5569 -  3.4207 [1522/1606] 0.0085
  bin 10:  3.4207 -  3.3032 [1492/1574] 0.0044
  unused:  3.3032 -         [   0/0   ]

  The anomalous signal seems to extend to about 5.9 A
  (or to 5.2 A, from a more optimistic point of view)
  The quoted resolution limits can be used as a guideline
  to decide where to cut the resolution for phenix.hyss
  As the anomalous signal is not very strong in this data set
  substructure solution via SAD might prove to be a challenge.
  Especially if only low resolution reflections are used,
  the resulting substructures could contain a significant
  amount of of false positives.

Determination of twin laws

Twin laws are found using a modified Le Page algorithm and classified as merohedral or pseudo-merohedral:

  Determining possible twin laws.

  The following twin laws have been found:

  -------------------------------------------------------------------------------
  | Type | Axis   | R metric (%) | delta (le Page) | delta (Lebedev) | Twin law |
  -------------------------------------------------------------------------------
  |   M  | 2-fold | 0.000        | 0.000           | 0.000           | -h,k,-l  |
  -------------------------------------------------------------------------------
  M:  Merohedral twin law
  PM: Pseudomerohedral twin law

  1 merohedral twin operators found
  0 pseudo-merohedral twin operators found
  In total, 1 twin operator were found

Non-merohedral (reticular) twinning is not considered.
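
A twin law such as -h,k,-l maps every Miller index onto its twin-related mate, and the twin-law specific tests compare the intensities of such pairs. The sketch below is illustrative only and is not taken from xtriage: the function name is hypothetical, and it handles only signed permutations of h, k and l (the kind of operator shown in the table above), not general expressions or the x,y,z notation.

```python
def apply_twin_law(twin_law, hkl):
    """Return the twin-related Miller index for a simple twin law string.

    twin_law: string such as "-h,k,-l" (signed permutations of h, k, l only).
    hkl: tuple of three integers (h, k, l).
    """
    values = dict(zip("hkl", hkl))
    terms = [term.strip() for term in twin_law.split(",")]
    if len(terms) != 3:
        raise ValueError("expected three comma-separated terms")
    result = []
    for term in terms:
        sign = -1 if term.startswith("-") else 1
        symbol = term.lstrip("+-")
        if symbol not in values:
            raise ValueError("unsupported term: %r" % term)
        result.append(sign * values[symbol])
    return tuple(result)


print(apply_twin_law("-h,k,-l", (1, 2, 3)))  # (-1, 2, -3)
```

Applying a 2-fold operator twice returns the original index, which is a quick sanity check on any twin law entered by hand.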

The R-metric is equal to:

  Sum (M_i-N_i)^2 / Sum M_i^2

M_i are elements of the original metric tensor and N_i are elements of the metric tensor after 'idealizing' the unit cell, in compliance with the restrictions the twin law would pose on the lattice if it were a 'true' symmetry operator. The delta le-Page is the familiar obliquity. The delta Lebedev is a twin law quality measure developed by A. Lebedev (Lebedev, Vagin & Murshudov; Acta Cryst. (2006). D62, 83-95.). Note that for merohedral twin laws, all quality indicators are 0. For pseudo-merohedral twin laws, these values are larger than or equal to zero. If a twin law is classified as pseudo-merohedral but has a delta le-Page equal to zero, the twin law is sometimes referred to as a metric merohedral twin law.

Locating translational pseudo symmetry (TPS)

TPS is located by inspecting a low resolution Patterson function. Peaks and their significance levels are reported:

  Largest Patterson peak with length larger then 15 Angstrom

  Frac. coord.        :  0.027  0.057  0.345
  Distance to origin  : 17.444
  Height (origin=100) :  3.886
  p_value(height)     : 9.982e-01

  The reported p_value has the following meaning:
  The probability that a peak of the specified height
  or larger is found in a Patterson function of a
  macro molecule that does not have any translational
  pseudo symmetry is equal to 9.982e-01
  p_values smaller then 0.05 might indicate
  weak translation pseudo symmetry, or the self vector of
  a large anomalous scatterer such as Hg, whereas values
  smaller then 1e-3 are a very strong indication for
  the presence of translational pseudo symmetry.

Moments of the observed intensities

The moments of the observed intensity/amplitude distribution are reported, as well as their expected values:

  Wilson ratio and moments

  Acentric reflections
  <I^2>/<I>^2    :1.955   (untwinned: 2.000; perfect twin 1.500)
  <F>^2/<F^2>    :0.796   (untwinned: 0.785; perfect twin 0.885)
  <|E^2 - 1|>    :0.725   (untwinned: 0.736; perfect twin 0.541)

  Centric reflections
  <I^2>/<I>^2    :2.554   (untwinned: 3.000; perfect twin 2.000)
  <F>^2/<F^2>    :0.700   (untwinned: 0.637; perfect twin 0.785)
  <|E^2 - 1|>    :0.896   (untwinned: 0.968; perfect twin 0.736)

Significant departure from the ideal values could indicate the presence of twinning or pseudo translations. For instance, an <I^2>/<I>^2 value significantly lower than 2.0 might point to twinning, whereas a value significantly larger than 2.0 might point towards pseudo translational symmetry.

Cumulative intensity distribution

The cumulative intensity distribution is reported:

  -----------------------------------------------
  |  Z  | Nac_obs | Nac_theo | Nc_obs | Nc_theo |
  -----------------------------------------------
  | 0.0 |  0.000  |  0.000   | 0.000  |  0.000  |
  | 0.1 |  0.081  |  0.095   | 0.168  |  0.248  |
  | 0.2 |  0.167  |  0.181   | 0.292  |  0.345  |
  | 0.3 |  0.247  |  0.259   | 0.354  |  0.419  |
  | 0.4 |  0.321  |  0.330   | 0.420  |  0.474  |
  | 0.5 |  0.392  |  0.394   | 0.473  |  0.520  |
  | 0.6 |  0.452  |  0.451   | 0.521  |  0.561  |
  | 0.7 |  0.506  |  0.503   | 0.570  |  0.597  |
  | 0.8 |  0.552  |  0.551   | 0.603  |  0.629  |
  | 0.9 |  0.593  |  0.593   | 0.636  |  0.657  |
  | 1.0 |  0.635  |  0.632   | 0.673  |  0.683  |
  -----------------------------------------------
  | Maximum deviation acentric      :  0.015    |
  | Maximum deviation centric       :  0.080    |
  |                                             |
  | <NZ(obs)-NZ(twinned)>_acentric  : -0.004    |
  | <NZ(obs)-NZ(twinned)>_centric   : -0.039    |
  -----------------------------------------------

The N(Z) test is related to the moments based test discussed above. Nac_obs is the observed cumulative distribution of normalized intensities of the acentric data; the test uses the full distribution rather than just a moment. Twinning shows itself as a more sigmoidal Nac_obs curve. In the case of pseudo centering, Nac_obs will tend towards Nc_theo.

The L test

The L-test is an intensity statistic developed by Padilla and Yeates (Acta Cryst. (2003), D59: 1124-1130) and is reasonably robust in the presence of anisotropy and pseudo centering, especially if the Miller indices are partitioned properly. Partitioning is carried out on the basis of a Patterson analysis. A significant deviation of both <|L|> and <L^2> from the expected values indicates twinning or other problems:

  L test for acentric data

  using difference vectors (dh,dk,dl) of the form:
  (2hp,2kp,2lp)
  where hp, kp, and lp are random signed integers such that
  2 <= |dh| + |dk| + |dl| <= 8

  Mean |L|   :0.482  (untwinned: 0.500; perfect twin: 0.375)
  Mean  L^2  :0.314  (untwinned: 0.333; perfect twin: 0.200)

  The distribution of |L| values indicates a twin fraction of
  0.00. Note that this estimate is not as reliable as obtained
  via a Britton plot or H-test if twin laws are available.

Whether or not <|L|> and <L^2> differ significantly from the expected values is shown in the final summary (see below).

Analysis of twin laws

Twin law specific tests (Britton, H and RvsR) are performed:

  Results of the H-test on a-centric data:

  (Only 50.0% of the strongest twin pairs were used)

  mean |H| : 0.183   (0.50: untwinned; 0.0: 50% twinned)
  mean H^2 : 0.055   (0.33: untwinned; 0.0: 50% twinned)
  Estimation of twin fraction via mean |H|: 0.317
  Estimation of twin fraction via cum. dist. of H: 0.308

  Britton analysis

  Extrapolation performed on  0.34 < alpha < 0.495
  Estimated twin fraction: 0.283
  Correlation: 0.9951

  R vs R statistic:
  R_abs_twin = <|I1-I2|>/<|I1+I2|>
  Lebedev, Vagin, Murshudov. Acta Cryst. (2006). D62, 83-95

  R_abs_twin observed data   : 0.193
  R_abs_twin calculated data : 0.328

  R_sq_twin = <(I1-I2)^2>/<(I1+I2)^2>
  R_sq_twin observed data    : 0.044
  R_sq_twin calculated data  : 0.120

  Maximum Likelihood twin fraction determination
  Zwart, Read, Grosse-Kunstleve & Adams, to be published.

  The estimated twin fraction is equal to 0.227

These tests allow one to estimate the twin fraction and (if calculated data is provided) determine if rotational pseudo symmetry is present.
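
The H-test estimate quoted in the log can be reproduced from the standard relation for acentric twin-related pairs, mean |H| = 1/2 - alpha with H = (I1 - I2)/(I1 + I2): a mean |H| of 0.183 gives alpha = 0.317, as printed above. The following Python sketch is a minimal illustration under that relation, not the xtriage code. The function names are hypothetical, and the selection of the strongest 50% of twin pairs mentioned in the log is deliberately left out.

```python
def twin_fraction_from_mean_h(mean_abs_h):
    """Twin fraction from mean |H| for acentric data: alpha = 1/2 - <|H|>."""
    return 0.5 - mean_abs_h


def mean_abs_h(pairs):
    """Mean |H| over twin-related intensity pairs (I1, I2).

    Pairs whose intensities sum to zero or less are skipped, because H is
    undefined (or meaningless) for them.
    """
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    if not h_values:
        raise ValueError("no usable twin-related pairs")
    return sum(h_values) / len(h_values)


# Value taken from the log output above.
print(round(twin_fraction_from_mean_h(0.183), 3))  # 0.317

# Hypothetical intensity pairs, for illustration only.
pairs = [(1.0, 1.0), (3.0, 1.0)]
print(twin_fraction_from_mean_h(mean_abs_h(pairs)))  # 0.25
```

An estimate close to zero corresponds to untwinned data, while identical twin-related intensities throughout (mean |H| = 0) correspond to a perfect twin with alpha = 0.5.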
Another option (albeit more computationally expensive) is to estimate the correlation between error-free, untwinned, twin-related normalized intensities (use the key perform=True on the command line):

  Estimation of twin fraction, while taking into account the
  effects of possible NCS parallel to the twin axis.
  Zwart, Read, Grosse-Kunstleve & Adams, to be published.

  A parameters D_ncs will be estimated as a function of resolution,
  together with a global twin fraction.
  D_ncs is an estimate of the correlation coefficient between
  untwinned, error-free, twin related, normalized intensities.
  Large values (0.95) could indicate an incorrect point group.
  Value of D_ncs larger than say, 0.5, could indicate the presence
  of NCS. The twin fraction should be smaller or similar to other
  estimates given elsewhere.

  The refinement can take some time.
  For numerical stability issues, D_ncs is limited between 0 and 0.95.
  The twin fraction is allowed to vary between 0 and 0.45.
  Refinement cycle numbers are printed out to keep you entertained.

  . . . . 5 . . . . 10 . . . . 15 . . . . 20 . . . . 25 . . . . 30
  . . . . 35 . . . . 40 . . . . 45 . . . . 50 . . . . 55 . . . . 60
  . . . . 65 . . . . 70 . . . . 75 . . .

  Cycle : 78
  -----------
  Log[likelihood]: 22853.700
  twin fraction: 0.201
  D_ncs in resolution ranges:
  9.8232 -- 4.5978 :: 0.830
  4.5978 -- 3.7139 :: 0.775
  3.7139 -- 3.2641 :: 0.745
  3.2641 -- 2.9747 :: 0.746
  2.9747 -- 2.7666 :: 0.705
  2.7666 -- 2.6068 :: 0.754
  2.6068 -- 2.4784 :: 0.735

  The correlation of the calculated F^2 should be similar to
  the estimated values.

  Observed correlation between twin related, untwinned calculated F^2
  in resolution ranges, as well as estimates D_ncs^2 values:
  Bin   d_max     d_min     CC_obs  D_ncs^2
  1)   9.8232 -- 4.5978 ::  0.661   0.689
  2)   4.5978 -- 3.7139 ::  0.544   0.601
  3)   3.7139 -- 3.2641 ::  0.650   0.556
  4)   3.2641 -- 2.9747 ::  0.466   0.557
  5)   2.9747 -- 2.7666 ::  0.426   0.497
  6)   2.7666 -- 2.6068 ::  0.558   0.569
  7)   2.6068 -- 2.4784 ::  0.531   0.540

The twin fraction obtained via this method is usually lower than what is obtained by refinement. The estimated correlation coefficient (D_ncs^2) between the twin related F^2 values is, however, reasonably accurate.

Exploring higher metric symmetry

The presence of a twin law could also indicate that the data was processed incorrectly. The example below shows a P41212 data set processed in P1:

  Exploring higher metric symmetry

  Point group of data as dictated by the space group is P 1
  the point group in the Niggli setting is P 1
  The point group of the lattice is P 4 2 2
  A summary of R values for various possible point groups follow.

  -----------------------------------------------------------------------------------------------
  | Point group              | mean R_used | max R_used | mean R_unused | min R_unused | choice |
  -----------------------------------------------------------------------------------------------
  | P 1                      | None        | None       | 0.022         | 0.017        |        |
  | P 4 2 2                  | 0.022       | 0.025      | None          | None         | <---   |
  | P 1 2 1                  | 0.017       | 0.017      | 0.026         | 0.024        |        |
  | Hall:  C 2y (x-y,x+y,z)  | 0.025       | 0.025      | 0.022         | 0.017        |        |
  | P 4                      | 0.025       | 0.028      | 0.025         | 0.025        |        |
  | Hall:  C 2 2 (x-y,x+y,z) | 0.024       | 0.025      | 0.017         | 0.017        |        |
  | Hall:  C 2y (x+y,-x+y,z) | 0.024       | 0.024      | 0.023         | 0.017        |        |
  | P 1 1 2                  | 0.028       | 0.028      | 0.021         | 0.017        |        |
  | P 2 1 1                  | 0.027       | 0.027      | 0.022         | 0.017        |        |
  | P 2 2 2                  | 0.023       | 0.028      | 0.025         | 0.025        |        |
  -----------------------------------------------------------------------------------------------

  R_used: mean and maximum R value for symmetry operators *used* in this point group
  R_unused: mean and minimum R value for symmetry operators *not used* in this point group
  The likely point group of the data is: P 4 2 2

As in phenix.explore_metric_symmetry, the possible space groups are listed as well (not shown here).

Twin analysis summary

The results of the twin analysis are summarized. Typical outputs are shown below for wrong symmetry, for twin laws present but no suspected twinning, and for twinned data, respectively.

Wrong symmetry:

  -------------------------------------------------------------------------------
  Twinning and intensity statistics summary (acentric data):

  Statistics independent of twin laws
  - <I^2>/<I>^2 : 2.104
  - <F>^2/<F^2> : 0.770
  - <|E^2-1|>   : 0.757
  - <|L|>, <L^2>: 0.512, 0.349
  Multivariate Z score L-test: 2.777
  The multivariate Z score is a quality measure of the given
  spread in intensities. Good to reasonable data is expected
  to have a Z score lower than 3.5.
  Large values can indicate twinning, but small values do not
  necessarily exclude it.

  Statistics depending on twin laws
  ------------------------------------------------------
  | Operator | type | R obs. | Britton alpha | H alpha |
  ------------------------------------------------------
  | k,h,-l   |  PM  | 0.025  | 0.458         | 0.478   |
  | -h,k,-l  |  PM  | 0.017  | 0.459         | 0.487   |
  | -k,h,l   |  PM  | 0.024  | 0.458         | 0.478   |
  | -k,-h,-l |  PM  | 0.024  | 0.458         | 0.478   |
  | -h,-k,l  |  PM  | 0.028  | 0.458         | 0.476   |
  | h,-k,-l  |  PM  | 0.027  | 0.458         | 0.477   |
  | k,-h,l   |  PM  | 0.024  | 0.457         | 0.478   |
  ------------------------------------------------------

  Patterson analysis
  - Largest peak height   : 6.089
  (corresponding p value : 6.921e-01)

  The largest off-origin peak in the Patterson function is 6.09% of the
  height of the origin peak. No significant pseudo-translation is detected.

  The results of the L-test indicate that the intensity statistics
  behave as expected. No twinning is suspected.
  The symmetry of the lattice and intensity however suggests that the
  input space group is too low. See the relevant sections of the log
  file for more details on your choice of space groups.
  As the symmetry is suspected to be incorrect, it is advisable to
  reconsider data processing.
  -------------------------------------------------------------------------------

Twin laws present but no suspected twinning:

  -------------------------------------------------------------------------------
  Twinning and intensity statistics summary (acentric data):

  Statistics independent of twin laws
  - <I^2>/<I>^2 : 1.955
  - <F>^2/<F^2> : 0.796
  - <|E^2-1|>   : 0.725
  - <|L|>, <L^2>: 0.482, 0.314
  Multivariate Z score L-test: 1.225
  The multivariate Z score is a quality measure of the given
  spread in intensities. Good to reasonable data is expected
  to have a Z score lower than 3.5.
  Large values can indicate twinning, but small values do not
  necessarily exclude it.

  Statistics depending on twin laws
  ------------------------------------------------------
  | Operator | type | R obs. | Britton alpha | H alpha |
  ------------------------------------------------------
  | -h,k,-l  |   M  | 0.455  | 0.016         | 0.035   |
  ------------------------------------------------------

  Patterson analysis
  - Largest peak height   : 3.886
  (corresponding p value : 9.982e-01)

  The largest off-origin peak in the Patterson function is 3.89% of the
  height of the origin peak. No significant pseudo-translation is detected.

  The results of the L-test indicate that the intensity statistics
  behave as expected. No twinning is suspected.
  Even though no twinning is suspected, it might be worthwhile carrying
  out a refinement using a dedicated twin target anyway, as twinned
  structures with low twin fractions are difficult to distinguish from
  non-twinned structures.
  -------------------------------------------------------------------------------

Twinned data:

  -------------------------------------------------------------------------------
  Twinning and intensity statistics summary (acentric data):

  Statistics independent of twin laws
  - <I^2>/<I>^2 : 1.587
  - <F>^2/<F^2> : 0.871
  - <|E^2-1|>   : 0.568
  - <|L|>, <L^2>: 0.387, 0.212
  Multivariate Z score L-test: 11.589
  The multivariate Z score is a quality measure of the given
  spread in intensities. Good to reasonable data is expected
  to have a Z score lower than 3.5.
  Large values can indicate twinning, but small values do not
  necessarily exclude it.

  Statistics depending on twin laws
  ------------------------------------------------------
  | Operator | type | R obs. | Britton alpha | H alpha |
  ------------------------------------------------------
  | -l,-k,-h |  PM  | 0.170  | 0.330         | 0.325   |
  ------------------------------------------------------

  Patterson analysis
  - Largest peak height   : 7.300
  (corresponding p value : 4.454e-01)

  The largest off-origin peak in the Patterson function is 7.30% of the
  height of the origin peak. No significant pseudo-translation is detected.

  The results of the L-test indicate that the intensity statistics
  are significantly different then is expected from good to reasonable,
  untwinned data.
  As there are twin laws possible given the crystal symmetry, twinning could
  be the reason for the departure of the intensity statistics from normality.
  It might be worthwhile carrying refinement with a twin specific target function.
  -------------------------------------------------------------------------------

In the summary, the significance of the departure of the L-test values from normality is stated. The multivariate Z-score (also known as the Mahalanobis distance) is used for this purpose.

Examples

Standard run of xtriage

Running xtriage is easy. From the command line you can type:

  phenix.xtriage data.sca

When an MTZ or CNS file is used, labels have to be specified:

  phenix.xtriage file=my_brilliant_data.mtz obs_labels='F(+),SIGF(+),F(-),SIGF(-)'

In order to perform a Matthews analysis, it might be useful to specify the number of residues/nucleotides in the crystallized macromolecule:

  phenix.xtriage data.sca n_residues=230 n_bases=25

By default, the screen output plus additional ccp4 style graphs (viewable with the ccp4 program loggraph) are echoed to a file named logfile.log.
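
When xtriage is run from a script, the command lines shown above can be assembled programmatically. The helper below is a hypothetical convenience sketch, not part of Phenix: it only builds the argument list from keywords that appear in this document (n_residues, n_bases, obs_labels, log) and leaves the actual launch to the caller, for example via subprocess.run. The first argument is passed through verbatim, so both the data.sca and the file=... forms used above can be given.

```python
import shlex


def build_xtriage_command(data_arg, n_residues=None, n_bases=None,
                          obs_labels=None, log=None):
    """Assemble an argument list for phenix.xtriage (hypothetical helper).

    data_arg is passed through as given, e.g. "data.sca" or
    "file=my_brilliant_data.mtz".
    """
    args = ["phenix.xtriage", data_arg]
    options = [("n_residues", n_residues), ("n_bases", n_bases),
               ("obs_labels", obs_labels), ("log", log)]
    for key, value in options:
        if value is not None:
            args.append("%s=%s" % (key, value))
    return args


def as_shell_string(args):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_xtriage_command("data.sca", n_residues=230, n_bases=25)
print(as_shell_string(cmd))  # phenix.xtriage data.sca n_residues=230 n_bases=25
```

Passing the list form to subprocess.run avoids shell quoting problems with labels such as F(+),SIGF(+),F(-),SIGF(-).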
The command line arguments and all other default settings are summarized in a PHIL parameter data block given at the beginning of the logfile / screen output:

  scaling.input {
    parameters {
      asu_contents {
        n_residues = None
        n_bases = None
        n_copies_per_asu = None
      }
      misc_twin_parameters {
        missing_symmetry {
          tanh_location = 0.08
          tanh_slope = 50
        }
        twinning_with_ncs {
          perform_analysis = False
          n_bins = 7
        }
        twin_test_cuts {
          low_resolution = 10
          high_resolution = None
          isigi_cut = 3
          completeness_cut = 0.85
        }
      }
      reporting {
        verbose = 1
        log = "logfile.log"
        ccp4_style_graphs = True
      }
    }
    xray_data {
      file_name = "some_data.sca"
      obs_labels = None
      calc_labels = None
      unit_cell = 64.5 69.5 45.5 90 104.3 90
      space_group = "P 1 21 1"
      high_resolution = None
      low_resolution = None
    }
  }

The defaults are good for most applications.

Possible Problems

Specific limitations and problems

Literature

Additional information

List of all xtriage keywords

-------------------------------------------------------------------------------
Legend:
  black bold - scope names
  black - parameter names
  red - parameter values
  blue - parameter help
  blue bold - scope help
Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------

scaling
input
  expert_level= 1  Expert level
asu_contents  Defines the ASU contents
  sequence_file= None  File containing protein or nucleic acid sequences. Values for n_residues and n_bases will be extracted automatically if this is provided.
  n_residues= None  Number of residues in structural unit
  n_bases= None  Number of nucleotides in structural unit
  n_copies_per_asu= None  Number of copies per ASU. If not specified, a Matthews analysis is performed
xray_data  Defines xray data
  file_name= None  File name with data
  obs_labels= None  Labels for observed data
  calc_labels= None  Labels for calculated data
  unit_cell= None  Unit cell parameters
  space_group= None  Space group
  high_resolution= None  High resolution limit
  low_resolution= None  Low resolution limit
reference  A reference data set. For the investigation of possible reindexing options
data  Defines an x-ray dataset
  file_name= None  File name
  labels= None  Labels
  unit_cell= None  Unit cell parameters
  space_group= None  Space group
structure
  file_name= None  Filename of reference PDB file
parameters  Basic settings
reporting  Some output issues
  verbose= 1  Verbosity
  log= logfile.log  Logfile
  ccp4_style_graphs= True  Shall we include ccp4 style graphs?
misc_twin_parameters  Various settings for twinning or symmetry tests
  apply_basic_filters_prior_to_twin_analysis= True  Keep data cutoffs from the basic_analyses module (I/sigma, Wilson scaling, Anisotropy) when twin stats are computed.
missing_symmetry  Settings for missing symmetry tests
  sigma_inflation= 1.25  Standard deviations of intensities can be increased to make point group determination more reliable.
twinning_with_ncs  Analysing the possibility of an NCS operator parallel to a twin law.
  perform_analyses= False  Determines whether or not this analysis is carried out.
  n_bins= 7  Number of bins used in NCS analyses.
twin_test_cuts  Various cuts used in determining resolution limit for data used in intensity statistics
  low_resolution= 10.0  Low resolution
  high_resolution= None  High resolution
  isigi_cut= 3.0  I/sigI ratio used in completeness cut
  completeness_cut= 0.85  Data is cut at resolution where intensities with I/sigI greater than isigi_cut are more than completeness_cut complete
optional  Optional data massage possibilities
  hklout= None  HKL out
  hklout_type= mtz sca *mtz_or_sca  Output format
  label_extension= "massaged"  Label extension
aniso  Parameters dealing with anisotropy correction
  action= *remove_aniso None  Remove anisotropy?
  final_b= *eigen_min eigen_mean user_b_iso  Final b value
  b_iso= None  User specified B value
outlier  Outlier analyses
  action= *extreme basic beamstop None  Outlier protocol
parameters  Parameters for outlier detection
basic_wilson
  level= 1E-6
extreme_wilson
  level= 0.01
beamstop
  level= 0.001
  d_min= 10.0
symmetry
  action= detwin twin *None
twinning_parameters
  twin_law= None
  fraction= None
gui  GUI-specific parameters, not applicable to command-line version.
  result_file= None  Pickled result file for Phenix GUI
  job_title= None  Job title in PHENIX GUI, not used on command line