Sorting heteroatoms

The program phenix.sort_hetatms is a utility that regroups the non-polymeric molecules (heteroatoms) in a model in a manner roughly similar to the format used by the Protein Data Bank. It matches each heteroatom group to the nearest polymer chain, resets the chain IDs and residue numbers, converts ATOM labels to HETATM, and optionally sorts waters by B-factor. It is primarily intended for internal use, and is also run within phenix.ligand_pipeline. The only input required is a PDB file, since the program does not deal with molecular geometry or experimental data. By default all REMARK records are removed; set preserve_remarks=True to keep them.

Note that the PDB will, inevitably, further modify the contents of the model according to its own rules. However, the program output is significantly closer to the PDB conventions than the output of phenix.refine or similar programs.
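The matching step described above can be illustrated with a minimal sketch in plain Python. This is not the program's actual implementation: the function name assign_to_nearest_chain and the dictionary-based data layout are hypothetical, symmetry-related copies are not considered (comparable to ignore_symmetry=True), and the brute-force distance search is chosen only for clarity. The distance_cutoff and loose_chain_id arguments mirror the parameters of the same name listed below.

```python
import math

def assign_to_nearest_chain(het_groups, polymer_atoms,
                            distance_cutoff=6.0, loose_chain_id="X"):
    """Map each heteroatom group to the closest polymer chain.

    het_groups:    dict of group label -> list of (x, y, z) coordinates
    polymer_atoms: dict of chain ID    -> list of (x, y, z) coordinates
    Both layouts are hypothetical, chosen for illustration only.
    """
    assignments = {}
    for label, het_coords in het_groups.items():
        best_chain, best_dist = None, float("inf")
        # Brute-force search for the shortest atom-atom distance
        for chain_id, chain_coords in polymer_atoms.items():
            for a in het_coords:
                for b in chain_coords:
                    d = math.dist(a, b)
                    if d < best_dist:
                        best_chain, best_dist = chain_id, d
        # Groups with no chain within the cutoff get the fallback chain ID
        if best_chain is not None and best_dist <= distance_cutoff:
            assignments[label] = best_chain
        else:
            assignments[label] = loose_chain_id
    return assignments

# Toy input: two polymer chains, two waters and one ligand
polymer = {"A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], "B": [(20.0, 0.0, 0.0)]}
hets = {
    "HOH1": [(2.0, 1.0, 0.0)],
    "LIG": [(18.0, 0.0, 0.0), (17.0, 0.0, 0.0)],
    "HOH2": [(50.0, 50.0, 50.0)],
}
result = assign_to_nearest_chain(hets, polymer)
```

In this toy input, HOH1 lies close to chain A, LIG lies 2 Å from chain B, and HOH2 is far from both chains, so it falls back to the loose chain ID.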

List of all available parameters

file_name = None Input file
output_file = None
unit_cell = None
space_group = None
ignore_symmetry = False Don't take symmetry-related chains into account when determining the nearest macromolecule.
preserve_remarks = False Propagate all REMARK records to the output file.
verbose = False
remove_hetatm_ter_records = True The official PDB format only allows TER records at the end of polymer chains, whereas the CCTBX PDB-handling tools will insert TER after each chain of any type. If this parameter is True, the extra TER records will be removed.
preserve_chain_id = False The default behavior is to group heteroatoms with the nearest macromolecule chain, whose ID is inherited. This parameter disables the change of chain ID, and preserves the original chain ID.
waters_only = False Rearrange waters, but leave all other ligands alone.
sort_waters_by = none *b_iso Ordering of waters - by default it will sort them by the isotropic B-factor.
set_hetatm_record = True Convert ATOM to HETATM where appropriate.
ignore_selection = None Selection of atoms to skip. Any residue group which overlaps with this selection will be preserved with the original chain ID and numbering.
renumber = True Renumber heteroatoms once they are in new chains.
sequential_numbering = True If True, the heteroatoms will be renumbered starting from the next available residue number after the end of the associated macromolecule chain. Otherwise, numbering will start from 1.
distance_cutoff = 6.0 Cutoff for identifying nearby macromolecule chains. This should be kept relatively small for speed reasons, but a small cutoff may miss waters that are far out in solvent channels.
remove_waters_outside_radius = False Remove waters that are farther than the specified distance cutoff from the nearest polymer chain (to avoid PDB complaints).
loose_chain_id = X Chain ID assigned to heteroatoms that can't be mapped to a nearby macromolecule chain.
job_title = None Job title in PHENIX GUI, not used on command line
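
How renumber, sequential_numbering and sort_waters_by might interact can be sketched in plain Python. This is an illustration under stated assumptions, not the program's code: the function renumber_hetero_groups and the dictionary layout are hypothetical, waters are assumed to be sorted by ascending B-factor, and non-water ligands are assumed to be placed before the waters. The source states only that numbering continues from the next available residue number when sequential_numbering is True and starts from 1 otherwise.

```python
def renumber_hetero_groups(groups, last_polymer_resseq,
                           sequential_numbering=True, sort_waters_by="b_iso"):
    """Return (resseq, resname, b_iso) tuples for the heteroatoms of one chain.

    groups: list of dicts with keys "resname" and "b_iso" (hypothetical layout)
    last_polymer_resseq: last residue number of the associated polymer chain
    """
    ligands = [g for g in groups if g["resname"] != "HOH"]
    waters = [g for g in groups if g["resname"] == "HOH"]
    if sort_waters_by == "b_iso":
        # Assumption: ascending order, lowest B-factor first
        waters = sorted(waters, key=lambda g: g["b_iso"])
    # Assumption: ligands are listed before waters
    ordered = ligands + waters
    # Continue after the polymer chain, or restart from 1
    start = last_polymer_resseq + 1 if sequential_numbering else 1
    return [(start + i, g["resname"], g["b_iso"]) for i, g in enumerate(ordered)]

groups = [
    {"resname": "HOH", "b_iso": 30.0},
    {"resname": "GOL", "b_iso": 25.0},
    {"resname": "HOH", "b_iso": 12.5},
]
sequential = renumber_hetero_groups(groups, last_polymer_resseq=250)
from_one = renumber_hetero_groups(groups, last_polymer_resseq=250,
                                  sequential_numbering=False)
```

With a polymer chain ending at residue 250, the sequential variant numbers the three groups 251 to 253, while sequential_numbering=False numbers them 1 to 3.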