#############################################################
##                     phenix.xtriage                      ##
##                                                         ##
##     P.H. Zwart, R.W. Grosse-Kunstleve & P.D. Adams      ##
##                                                         ##
#############################################################
#phil __OFF__

Date 2011-01-13 Time 17:31:58 EST -0500 (1294957918.07 s)

##-------------------------------------------##
## WARNING:                                  ##
## Number of residues unspecified            ##
##-------------------------------------------##

##-------------------------------------------##
## Unit cell defined manually, will ignore
## specification in reflection file:
## From file : (88.66, 151.07, 88.74, 90, 107.01, 90)
## From input: (88.66, 151.07, 88.74, 90, 107.01, 90)
##-------------------------------------------##

##-------------------------------------------##
## Space group defined manually, will ignore
## specification in reflection file:
## From file : P 1 21 1
## From input: P 1 21 1
##-------------------------------------------##

Symmetry, cell and reflection file content summary

Miller array info: /Users/nick/k94e_208/Refine_30M/k94e_2_refine_data.mtz:F-obs,SIGF-obs
Observation type: xray.amplitude
Type of data: double, size=131093
Type of sigmas: double, size=131093
Number of Miller indices: 131093
Anomalous flag: False
Unit cell: (88.66, 151.07, 88.74, 90, 107.01, 90)
Space group: P 1 21 1 (No. 4)
Systematic absences: 0
Centric reflections: 2068
Resolution range: 43.3056 2.08007
Completeness in resolution range: 0.980963
Completeness with d_max=infinity: 0.980861

##----------------------------------------------------##
##                    Basic statistics                ##
##----------------------------------------------------##

Matthews coefficient and Solvent content statistics

Number of residues unknown, assuming 50% solvent content

----------------------------------------------------------------
|              Best guess : 2079 residues in the asu           |
----------------------------------------------------------------

Completeness and data strength analyses

  The following table lists the completeness in various resolution
  ranges, after applying an I/sigI cut. Miller indices for which
  individual I/sigI values are larger than the value specified in
  the top row of the table are retained, while other intensities
  are discarded. The resulting completeness profiles are an indication
  of the strength of the data.

----------------------------------------------------------------------------------------
| Res. Range   | I/sigI>1  | I/sigI>2  | I/sigI>3  | I/sigI>5  | I/sigI>10 | I/sigI>15 |
----------------------------------------------------------------------------------------
| 43.31 - 5.13 | 98.4%     | 98.3%     | 98.3%     | 98.1%     | 97.4%     | 94.8%     |
|  5.13 - 4.07 | 97.8%     | 97.7%     | 97.7%     | 97.4%     | 96.5%     | 93.6%     |
|  4.07 - 3.56 | 97.9%     | 97.7%     | 97.5%     | 97.0%     | 94.5%     | 88.2%     |
|  3.56 - 3.23 | 97.9%     | 97.4%     | 96.9%     | 95.4%     | 88.6%     | 74.8%     |
|  3.23 - 3.00 | 98.4%     | 97.0%     | 95.8%     | 92.3%     | 79.3%     | 57.8%     |
|  3.00 - 2.82 | 98.6%     | 96.2%     | 93.8%     | 87.9%     | 67.9%     | 41.0%     |
|  2.82 - 2.68 | 98.6%     | 93.4%     | 88.9%     | 79.9%     | 54.2%     | 26.1%     |
|  2.68 - 2.56 | 98.4%     | 89.7%     | 83.0%     | 71.2%     | 39.4%     | 14.2%     |
|  2.56 - 2.47 | 98.5%     | 86.6%     | 78.5%     | 63.3%     | 28.5%     |  7.6%     |
|  2.47 - 2.38 | 98.4%     | 82.3%     | 72.2%     | 54.2%     | 19.9%     |  4.0%     |
|  2.38 - 2.31 | 98.7%     | 76.5%     | 63.8%     | 43.9%     | 12.7%     |  1.7%     |
|  2.31 - 2.24 | 98.5%     | 71.1%     | 56.5%     | 35.3%     |  7.5%     |  0.8%     |
|  2.24 - 2.18 | 98.4%     | 64.0%     | 46.8%     | 25.0%     |  3.8%     |  0.2%     |
|  2.18 - 2.13 | 98.4%     | 55.3%     | 36.5%     | 16.6%     |  1.4%     |  0.1%     |
----------------------------------------------------------------------------------------

  The completeness of data for which I/sig(I)>3.00 exceeds 85%
  for resolution ranges lower than 2.68A.
  The data are cut at this resolution for the potential twin tests
  and intensity statistics.

Maximum likelihood isotropic Wilson scaling
ML estimate of overall B value of None:
33.95 A**2
Estimated -log of scale factor of None:
0.44

Maximum likelihood anisotropic Wilson scaling
ML estimate of overall B_cart value of None:
34.95,  0.00,  0.07
       34.22, -0.00
              32.86
Equivalent representation as U_cif:
 0.44, -0.00,  0.12
        0.43,  0.00
               0.42

Eigen analyses of B-cart:
                 Value   Vector
Eigenvector 1 :  34.950  ( 1.00, 0.00, 0.03)
Eigenvector 2 :  34.220  (0.00, 1.00, 0.00)
Eigenvector 3 :  32.857  (-0.03, 0.00, 1.00)

ML estimate of -log of scale factor of None:
0.44
Correcting for anisotropy in the data

Some basic intensity statistics follow.
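The resolution cut reported after the completeness table (85% completeness for I/sigI>3, giving 2.68 A) can be reproduced by hand from the I/sigI>3 column. The following is a minimal sketch, assuming the cut is taken at the high-resolution edge of the last shell that still exceeds the threshold when scanning from low to high resolution. The function name `resolution_cut` and the `SHELLS` list are hypothetical; this is not xtriage's own code, and its internal binning may differ.

```python
# Each entry: (d_max, d_min, completeness in % for I/sigI > 3),
# copied from the I/sigI>3 column of the completeness table in this log.
SHELLS = [
    (43.31, 5.13, 98.3),
    (5.13, 4.07, 97.7),
    (4.07, 3.56, 97.5),
    (3.56, 3.23, 96.9),
    (3.23, 3.00, 95.8),
    (3.00, 2.82, 93.8),
    (2.82, 2.68, 88.9),
    (2.68, 2.56, 83.0),
    (2.56, 2.47, 78.5),
    (2.47, 2.38, 72.2),
    (2.38, 2.31, 63.8),
    (2.31, 2.24, 56.5),
    (2.24, 2.18, 46.8),
    (2.18, 2.13, 36.5),
]


def resolution_cut(shells, threshold=85.0):
    """Return d_min of the last shell whose completeness exceeds threshold.

    Shells must be ordered from low to high resolution. Returns None if
    even the first shell fails the threshold (assumed convention).
    """
    cut = None
    for _d_max, d_min, completeness in shells:
        if completeness <= threshold:
            break  # first shell that is too incomplete: stop scanning
        cut = d_min
    return cut


print(resolution_cut(SHELLS))  # 2.68
```

With the values from this log the sketch stops at the 2.68 - 2.56 shell (83.0%), so the cut is 2.68 A, in agreement with the reported value.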
Low resolution completeness analyses

 The following table shows the completeness
 of the data to 5 Angstrom.
unused:         - 43.3072 [  0/14  ] 0.000
bin  1: 43.3072 - 10.7233 [987/1006] 0.981
bin  2: 10.7233 -  8.5327 [977/986 ] 0.991
bin  3:  8.5327 -  7.4604 [941/955 ] 0.985
bin  4:  7.4604 -  6.7811 [969/977 ] 0.992
bin  5:  6.7811 -  6.2966 [943/957 ] 0.985
bin  6:  6.2966 -  5.9263 [950/973 ] 0.976
bin  7:  5.9263 -  5.6302 [970/988 ] 0.982
bin  8:  5.6302 -  5.3856 [976/993 ] 0.983
bin  9:  5.3856 -  5.1786 [932/948 ] 0.983
bin 10:  5.1786 -  5.0002 [934/950 ] 0.983
unused:  5.0002 -         [  0/0   ]

Mean intensity analyses
 Analyses of the mean intensity.
 Inspired by: Morris et al. (2004). J. Synch. Rad. 11, 56-59.
 The following resolution shells are worrisome:
------------------------------------------------
| d_spacing | z_score | compl. | <Iobs>/<Iexp> |
------------------------------------------------
|     8.451 |   4.88  |   0.99 |      1.351    |
------------------------------------------------

 Possible reasons for the presence of the reported
 unexpectedly low or elevated mean intensity in
 a given resolution bin are:
 - missing overloaded or weak reflections
 - suboptimal data processing
 - satellite (ice) crystals
 - NCS
 - translational pseudo symmetry (detected elsewhere)
 - outliers (detected elsewhere)
 - ice rings (detected elsewhere)
 - other problems
 Note that the presence of abnormalities
 in a certain region of reciprocal space might
 confuse the data validation algorithm throughout
 a large region of reciprocal space, even though
 the data are acceptable in those areas.

Possible outliers
 Inspired by: Read, Acta Cryst. (1999). D55, 1759-1764

Acentric reflections:
 None

Centric reflections:
 None

Ice ring related problems

 The following statistics were obtained from ice-ring
 insensitive resolution ranges
 mean bin z_score      : 1.24
     ( rms deviation   : 1.16 )
 mean bin completeness : 0.98
     ( rms deviation   : 0.02 )

 The following table shows the z-scores
 and completeness in ice-ring sensitive areas.
 Large z-scores and high completeness in these
 resolution ranges might be a reason to re-assess
 your data processing if ice rings were present.
------------------------------------------------
| d_spacing | z_score | compl. | Rel. Ice int. |
------------------------------------------------
|     3.897 |   2.06  |   0.98 |     1.000     |
|     3.669 |   1.04  |   0.98 |     0.750     |
|     3.441 |   3.08  |   0.98 |     0.530     |
|     2.671 |   0.91  |   0.98 |     0.170     |
|     2.249 |   0.00  |   0.99 |     0.390     |
------------------------------------------------

 Abnormalities in mean intensity or completeness at
 resolution ranges with a relative ice ring intensity
 lower than 0.10 will be ignored.

 No ice ring related problems detected.
 If ice rings were present, the data does not look
 worse at ice ring related d_spacings as compared
 to the rest of the data set.

Basic analyses completed

##----------------------------------------------------##
##                   Twinning Analyses                ##
##----------------------------------------------------##

Using data between 10.00 and 2.68 Angstrom.

Determining possible twin laws.

The following twin laws have been found:

-------------------------------------------------------------------------------
| Type | Axis   | R metric (%) | delta (le Page) | delta (Lebedev) | Twin law |
-------------------------------------------------------------------------------
|  PM  | 2-fold | 0.069        | 0.054           | 0.001           | l,-k,h   |
-------------------------------------------------------------------------------
M:  Merohedral twin law
PM: Pseudomerohedral twin law

  0 merohedral twin operators found
  1 pseudo-merohedral twin operator found
In total, 1 twin operator was found

 The presence of twin laws indicates the following:
 The symmetry of the lattice (unit cell) is higher (has more elements)
 than the point group of the assigned space group.

 There are four likely scenarios associated with the presence of twin laws:
    i. The assigned space group is incorrect (too low).
   ii. The assigned space group is correct and the data *is not* twinned.
  iii. The assigned space group is correct and the data *is* twinned.
   iv. The assigned space group is not correct (too low) and at the same
       time, the data *is* twinned.

 Xtriage tries to distinguish between these cases by inspecting the
 intensity statistics. It never hurts to carefully inspect statistics
 yourself and make sure that the automated interpretation is correct.

Details of automated twin law derivation
----------------------------------------
Below, the results of the coset decomposition are given.
Each coset represents a single twin law, and all symmetry equivalent
twin laws are given.
For each coset, the operator is given in (x,y,z) and (h,k,l) notation.
The direction of the axis (in fractional coordinates), the type and
possible offsets are given as well.
Furthermore, the result of combining a certain coset with the input
space group is listed.
This table can be useful when comparing twin laws generated by xtriage
with those listed in lookup tables.
In the table subgroup H denotes the *presumed intensity symmetry*.
Group G is the symmetry of the lattice.

Left cosets of :
  subgroup  H: P 1 2 1
  and group G: C 2 2 2 (-x+y,z,x+y)

  Coset number :     0   (all operators from H)

       x,y,z        h,k,l   Rotation: 1 ; direction: (0, 0, 0)  ; screw/glide: (0,0,0)
     -x,y,-z      -h,k,-l   Rotation: 2 ; direction: (0, 1, 0)  ; screw/glide: (0,0,0)

  Coset number :     1   (H+coset[1] = C 2 2 2 (-x+y,z,x+y))

      z,-y,x       l,-k,h   Rotation: 2 ; direction: (1, 0, 1)  ; screw/glide: (0,0,0)
    -z,-y,-x     -l,-k,-h   Rotation: 2 ; direction: (-1, 0, 1) ; screw/glide: (0,0,0)

Note that if group H is centered (C,P,I,F), elements corresponding to
centering operators are omitted.
(This is because internally the calculations are done with the symmetry
of the reduced cell)

Splitting data in centrics and acentrics
  Number of centrics  : 1114
  Number of acentrics : 59115

Patterson analyses
------------------

 Largest Patterson peak with length larger than 15 Angstrom

 Frac. coord.        :    0.127    0.000   -0.127
 Distance to origin  :   18.144
 Height (origin=100) :    3.857
 p_value(height)     :    9.985e-01

   The reported p_value has the following meaning:
     The probability that a peak of the specified height
     or larger is found in a Patterson function of a
     macromolecule that does not have any translational
     pseudo symmetry is equal to 9.985e-01.
     p_values smaller than 0.05 might indicate
     weak translational pseudo symmetry, or the self vector of
     a large anomalous scatterer such as Hg, whereas values
     smaller than 1e-3 are a very strong indication for
     the presence of translational pseudo symmetry.

Systematic absences
-------------------

The following table gives information about systematic absences.
For each operator, the reflections are split in three classes:

  Absent    : Reflections that are absent for this operator.
  Non Absent: Reflections of the same type (i.e. (0,0,l)) as above, but
              they should be present.
  Complement: All other reflections.

For each class, the <I/sigI> is reported, as well as the number of
'violations'. A 'violation' is designated as a reflection for which an
I/sigI criterion is not met. The criteria are

  Absent violation     : I/sigI > 3.0
  Non Absent violation : I/sigI < 3.0
  Complement violation : I/sigI < 3.0

Operators with low associated violations for *both* absent and non absent
reflections are likely to be true screw axes or glide planes. Both the
number of violations and their percentages are given. The number of
violations within the 'complement' class can be used as a comparison for
the number of violations in the non-absent class.
--------------------------------------------------------------------------------------------------------------------------------------------
| Operator | absent under operator  |          | not absent under operator  |              | all other reflections   |          |          |
|          | <I/sigI> (violations)  | n absent | <I/sigI> (violations)      | n not absent | <I/sigI> (violations)   | n compl  | Score    |
--------------------------------------------------------------------------------------------------------------------------------------------
| 2_0 (b)  | 0.00 (0, 0.0%)         | 0        | 15.68 (0, 0.0%)            | 21           | 17.74 (1483, 2.5%)      | 60208    | 1.69e+00 |
| 2_1 (b)  | 0.00 (0, 0.0%)         | 0        | 15.68 (0, 0.0%)            | 21           | 17.74 (1483, 2.5%)      | 60208    | 1.69e+00 |
--------------------------------------------------------------------------------------------------------------------------------------------

Analysis of the absences table indicates a number of likely space group
candidates, which are listed below. For each space group, the number of
absent violations is listed under the '+++' column. The number of present
violations (weak reflections) is listed under '---'. The last column is a
likelihood based score for the particular space group. Note that
enantiomorphic space groups will have equal scores. Also, if absences were
removed while processing the data, they will be regarded as missing
information, rather than as enforcing that absence in the space group choices.
-----------------------------------------------------------------------------------
| space group | n absent | <Z>_absent | <Z/sigZ>_absent | +++ | --- | score       |
-----------------------------------------------------------------------------------
| P 1 2 1     | 0        |     0.00   |     0.00        |  0  |  0  |  0.000e+00  |
| P 1 21 1    | 0        |     0.00   |     0.00        |  0  |  0  |  0.000e+00  |
-----------------------------------------------------------------------------------

Wilson ratio and moments

Acentric reflections
   <I^2>/<I>^2    :1.685   (untwinned: 2.000; perfect twin 1.500)
   <F>^2/<F^2>    :0.852   (untwinned: 0.785; perfect twin 0.885)
   <|E^2 - 1|>    :0.614   (untwinned: 0.736; perfect twin 0.541)

Centric reflections
   <I^2>/<I>^2    :2.577   (untwinned: 3.000; perfect twin 2.000)
   <F>^2/<F^2>    :0.722   (untwinned: 0.637; perfect twin 0.785)
   <|E^2 - 1|>    :0.850   (untwinned: 0.968; perfect twin 0.736)

NZ test (0<=z<1) to detect twinning and possible translational NCS

-----------------------------------------------
|  Z  | Nac_obs | Nac_theo | Nc_obs | Nc_theo |
-----------------------------------------------
| 0.0 |   0.000 |    0.000 |  0.000 |   0.000 |
| 0.1 |   0.033 |    0.095 |  0.146 |   0.248 |
| 0.2 |   0.095 |    0.181 |  0.249 |   0.345 |
| 0.3 |   0.167 |    0.259 |  0.315 |   0.419 |
| 0.4 |   0.240 |    0.330 |  0.396 |   0.474 |
| 0.5 |   0.313 |    0.394 |  0.457 |   0.520 |
| 0.6 |   0.381 |    0.451 |  0.507 |   0.561 |
| 0.7 |   0.446 |    0.503 |  0.550 |   0.597 |
| 0.8 |   0.506 |    0.551 |  0.583 |   0.629 |
| 0.9 |   0.561 |    0.593 |  0.622 |   0.657 |
| 1.0 |   0.611 |    0.632 |  0.664 |   0.683 |
-----------------------------------------------
| Maximum deviation acentric      :  0.092    |
| Maximum deviation centric       :  0.104    |
|                                             |
| <NZ(obs)-NZ(twinned)>_acentric  : -0.058    |
| <NZ(obs)-NZ(twinned)>_centric   : -0.059    |
-----------------------------------------------

L test for acentric data

 using difference vectors (dh,dk,dl) of the form:
 (2hp,2kp,2lp)
 where hp, kp, and lp are random signed integers such that
 2 <= |dh| + |dk| + |dl| <= 8

  Mean |L|   :0.409  (untwinned: 0.500; perfect twin: 0.375)
  Mean  L^2  :0.235  (untwinned: 0.333; perfect twin: 0.200)

  The distribution of |L| values indicates a twin fraction of
  0.00. Note that this estimate is not as reliable as one obtained
  via a Britton plot or H-test if twin laws are available.

---------------------------------------------
 Analysing possible twin law : l,-k,h
---------------------------------------------

Results of the H-test on acentric data:

 (Only 50.0% of the strongest twin pairs were used)

mean |H| : 0.185   (0.50: untwinned; 0.0: 50% twinned)
mean H^2 : 0.063   (0.33: untwinned; 0.0: 50% twinned)
Estimation of twin fraction via mean |H|: 0.315
Estimation of twin fraction via cum. dist. of H: 0.318

Britton analyses

  Extrapolation performed on 0.38 < alpha < 0.495
  Estimated twin fraction: 0.321
  Correlation: 0.9954

R vs R statistic:
  R_abs_twin = <|I1-I2|>/<|I1+I2|>
  Lebedev, Vagin, Murshudov. Acta Cryst. (2006). D62, 83-95

   R_abs_twin observed data   : 0.196
   R_abs_twin calculated data : 0.416

  R_sq_twin = <(I1-I2)^2>/<(I1+I2)^2>
   R_sq_twin observed data    : 0.054
   R_sq_twin calculated data  : 0.191

Performing correlation analyses
  The supplied calculated data are normalized and artificially twinned.
  Subsequently a correlation with the observed data is computed.

  Results:
  Correlation : 0.882179755975
  Estimated twin fraction : 0.28

Maximum Likelihood twin fraction determination
    Zwart, Read, Grosse-Kunstleve & Adams, to be published.

   The estimated twin fraction is equal to 0.097

Exploring higher metric symmetry

The point group of data as dictated by the space group is P 1 2 1
  The point group in the Niggli setting is P 1 1 2
The point group of the lattice is C 2 2 2 (x-y,x+y,z)
A summary of R values for various possible point groups follows.
------------------------------------------------------------------------------------------------------
| Point group         | mean R_used | max R_used | mean R_unused | min R_unused | BIC       | choice |
------------------------------------------------------------------------------------------------------
| P 1 1 2             | None        | None       | 0.196         | 0.196        | 3.314e+05 | <---   |
| C 2 2 2 (x-y,x+y,z) | 0.196       | 0.196      | None          | None         | 5.632e+05 |        |
------------------------------------------------------------------------------------------------------

R_used: mean and maximum R value for symmetry operators *used* in this point group
R_unused: mean and minimum R value for symmetry operators *not used* in this point group

 An automated point group suggestion is made on the basis of the BIC
 (Bayesian information criterion).

The likely point group of the data is: P 1 1 2

Possible space groups in this point group are:
   Unit cell: (88.66, 151.07, 88.74, 90, 107.01, 90)
   Space group: P 1 21 1 (No. 4)

Note that this analysis does not take into account the effects of twinning.
If the data are (almost) perfectly twinned, the symmetry will appear to be
higher than it actually is.

-----------------------------------------------------------------------------------------------------------------------------
Merging in *highest possible* point group C 2 2 2 (x-y,x+y,z).
 ***** THIS MIGHT NOT BE THE BEST POINT GROUP SYMMETRY *****
-----------------------------------------------------------------------------------------------------------------------------
R-linear = sum(abs(data - mean(data))) / sum(abs(data))
R-square = sum((data - mean(data))**2) / sum(data**2)
In these sums single measurements are excluded.
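
The R-linear and R-square definitions just given can be illustrated with a short Python sketch. It is not xtriage's implementation. The function name `merging_r_factors` and the input layout (one list of intensities per group of symmetry-equivalent reflections) are assumptions made for illustration, and `mean(data)` is taken to be the mean within each group.

```python
def merging_r_factors(groups):
    """Compute (R-linear, R-square) over groups of equivalent measurements.

    groups: iterable of lists of intensities; each list holds the
    measurements that merge into one unique reflection (assumed layout).
    Groups with a single measurement are excluded from all sums.
    """
    num_lin = den_lin = num_sq = den_sq = 0.0
    for data in groups:
        if len(data) < 2:
            continue  # single measurements are excluded
        mean = sum(data) / len(data)
        num_lin += sum(abs(x - mean) for x in data)
        den_lin += sum(abs(x) for x in data)
        num_sq += sum((x - mean) ** 2 for x in data)
        den_sq += sum(x * x for x in data)
    if den_lin == 0.0 or den_sq == 0.0:
        raise ValueError("no multiply-measured reflections with non-zero intensity")
    return num_lin / den_lin, num_sq / den_sq


# Toy input: one pair of equivalents and one singleton (ignored).
r_linear, r_square = merging_r_factors([[10.0, 14.0], [5.0]])
print(round(r_linear, 4), round(r_square, 4))  # 0.1667 0.027
```

For the toy pair the group mean is 12, so R-linear = 4/24 and R-square = 8/296. The singleton contributes nothing, matching the exclusion rule stated above.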
                          Redundancy       Mean      Mean
                          Min  Max   Mean  R-linear  R-square
unused:        - 9.9938
bin  1: 9.9938 - 5.4746     1    2  1.869    0.1903    0.0704
bin  2: 5.4746 - 4.4712     1    2  1.892    0.1999    0.0768
bin  3: 4.4712 - 3.9456     1    2  1.892    0.2021    0.0783
bin  4: 3.9456 - 3.6034     1    2  1.907    0.2071    0.0803
bin  5: 3.6034 - 3.3556     1    2  1.906    0.2173    0.0862
bin  6: 3.3556 - 3.1644     1    2  1.917    0.2259    0.0904
bin  7: 3.1644 - 3.0105     1    2  1.921    0.2423    0.0975
bin  8: 3.0105 - 2.8827     1    2  1.925    0.2515    0.1007
bin  9: 2.8827 - 2.7742     1    2  1.933    0.2576    0.1014
bin 10: 2.7742 - 2.6804     1    2  1.917    0.2669    0.1045
unused: 2.6804 -

Suggesting various space group choices on the basis of systematic absence analyses

Analysis of the absences table indicates a number of likely space group
candidates, which are listed below. For each space group, the number of
absent violations is listed under the '+++' column. The number of present
violations (weak reflections) is listed under '---'. The last column is a
likelihood based score for the particular space group. Note that
enantiomorphic space groups will have equal scores. Also, if absences were
removed while processing the data, they will be regarded as missing
information, rather than as enforcing that absence in the space group choices.

-----------------------------------------------------------------------------------
| space group | n absent | <Z>_absent | <Z/sigZ>_absent | +++ | --- | score       |
-----------------------------------------------------------------------------------
| C 2 2 21    | 0        |     0.00   |     0.00        |  0  |  0  |  0.000e+00  |
| C 2 2 2     | 0        |     0.00   |     0.00        |  0  |  0  |  0.000e+00  |
-----------------------------------------------------------------------------------

-------------------------------------------------------------------------------
Twinning and intensity statistics summary (acentric data):

Statistics independent of twin laws
  <I^2>/<I>^2 : 1.685
  <F>^2/<F^2> : 0.852
  <|E^2-1|>   : 0.614
  <|L|>, <L^2>: 0.409, 0.235
  Multivariate Z score L-test: 7.632

 The multivariate Z score is a quality measure of the given
 spread in intensities.
 Good to reasonable data are expected to have a Z score lower than 3.5.
 Large values can indicate twinning, but small values do not
 necessarily exclude it.

Statistics depending on twin laws
-----------------------------------------------------------------------------------
| Operator | type | R_abs obs. | R_abs calc. | Britton alpha | H alpha | ML alpha |
-----------------------------------------------------------------------------------
| l,-k,h   |  PM  | 0.196      | 0.416       | 0.321         | 0.318   | 0.097    |
-----------------------------------------------------------------------------------

Patterson analyses
  - Largest peak height   : 3.857
   (corresponding p value : 0.99852)

The largest off-origin peak in the Patterson function is 3.86% of the
height of the origin peak. No significant pseudotranslation is detected.

The results of the L-test indicate that the intensity statistics
are significantly different from what is expected from good to reasonable,
untwinned data.
As there are twin laws possible given the crystal symmetry, twinning could
be the reason for the departure of the intensity statistics from normality.
It might be worthwhile carrying out refinement with a twin specific target function.

-------------------------------------------------------------------------------
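
The H-test twin fraction reported for twin law l,-k,h can be cross-checked by hand. For acentric data the standard H-test relations are <|H|> = 1/2 - alpha and <H^2> = (1 - 2*alpha)^2 / 3, which agree with the reference values printed in the log (0.50 and 0.33 untwinned, 0.0 for a 50% twin). The sketch below inverts both relations. The function names are hypothetical and this is not xtriage's own code, which also uses the cumulative distribution of H.

```python
import math


def twin_fraction_from_mean_abs_h(mean_abs_h):
    """Invert <|H|> = 1/2 - alpha (acentric data)."""
    return 0.5 - mean_abs_h


def twin_fraction_from_mean_h_sq(mean_h_sq):
    """Invert <H^2> = (1 - 2*alpha)^2 / 3 (acentric data).

    The result is clamped at 0 so that values slightly above 1/3,
    which can arise from noise, do not give a negative fraction.
    """
    return max(0.0, 0.5 * (1.0 - math.sqrt(3.0 * mean_h_sq)))


# mean |H| = 0.185 as reported for twin law l,-k,h
print(round(twin_fraction_from_mean_abs_h(0.185), 3))  # 0.315
```

The value from mean |H| reproduces the 0.315 reported in the H-test section. The estimate from mean H^2 is not printed in the log and is included only for comparison.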