mmtbx.prepare_pdb_deposition

Description

This tool is primarily for adding sequence information to the mmCIF output from phenix.refine to make the file suitable for deposition into the Protein Data Bank. The minimum inputs for this use case are the model from phenix.refine and a sequence file.

If you are starting with a model in PDB format, we recommend that you run phenix.refine on your model and data to generate the necessary fields related to the data, then run this tool to add the sequence information. If your model contains ligands, additional restraints files for the ligands may be necessary.

The input sequence should be the canonical one letter sequence. For non-standard amino acids or residues, this tool will replace the one letter code with the three letter code surrounded by parentheses. For example, if there is a selenomethionine, your input sequence should represent that residue as "M". This tool will replace the "M" with "(MSE)" in the correct field in the output mmCIF file.
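The substitution described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name annotate_sequence and the "modified" mapping (0-based sequence position to three letter code) are assumptions made only for this example.

```python
def annotate_sequence(sequence, modified):
    """Return the sequence with modified residues written as "(XXX)".

    sequence: one letter canonical sequence, e.g. "GSMKL"
    modified: hypothetical mapping of 0-based position -> three letter code
    """
    parts = []
    for index, letter in enumerate(sequence):
        if index in modified:
            # Replace the one letter code with the parenthesized three letter code.
            parts.append("(" + modified[index] + ")")
        else:
            parts.append(letter)
    return "".join(parts)


# A selenomethionine at position 2 is given as "M" in the input sequence.
print(annotate_sequence("GSMKL", {2: "MSE"}))  # prints GS(MSE)KL
```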

However, this feature depends on curated monomer libraries, so if the parent residue is ambiguous, the conversion will not be automatic (the parent residue is the standard residue that a modified residue derives from, e.g. methionine for selenomethionine). In this situation, you can manually specify which residues to convert from the one letter code to the three letter code with the "custom_residues" parameter, given as a space-separated list of three letter codes. Two examples where this is necessary are SUI in PDB code 2jjj and OTY in PDB code 4jye.

Command line usage examples

% mmtbx.prepare_pdb_deposition model.cif sequence.fa

% mmtbx.prepare_pdb_deposition 2jjj.cif sequence.fa custom_residues="SUI OTY"

% mmtbx.prepare_pdb_deposition 4jye.cif sequence.fa custom_residues=OTY
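
When the tool is called from a script, the same command lines can be assembled programmatically. The following Python sketch only builds the argument list. The helper name build_command is an assumption for illustration, and the file names are the ones used in the examples above.

```python
import shlex


def build_command(model, sequence, custom_residues=None):
    """Assemble the command line as a list of arguments.

    custom_residues: optional list of three letter codes, e.g. ["SUI", "OTY"]
    """
    args = ["mmtbx.prepare_pdb_deposition", model, sequence]
    if custom_residues:
        # The codes are passed as a single space-separated value.
        args.append("custom_residues=" + " ".join(custom_residues))
    return args


command = build_command("2jjj.cif", "sequence.fa", ["SUI", "OTY"])
# shlex.join quotes the argument that contains a space for display in a shell.
print(shlex.join(command))
```

On a machine where Phenix is set up, a list like this could be passed to subprocess.run(command, check=True). Passing the arguments as a list avoids the need for shell quoting around the space-separated value.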

GUI

This tool is available in the Phenix GUI in the "PDB deposition" section.

List of all available keywords

custom_residues = None Space-separated list of three letter codes to be included in the entity_poly.pdbx_one_letter_code mmCIF loop (e.g. custom_residues='SUI PYL')
keep_original_loops = True Preserves mmCIF data from the input model file (if available) that is not overwritten by other input
align_columns = False When set to True, the columns are aligned. This will take longer because the column widths need to be determined before outputting.
job_title = None Job title in PHENIX GUI, not used on command line
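
The extra cost of align_columns comes from needing a first pass over every row to find the width of each column before anything can be written. A minimal Python sketch of that two-pass idea follows. It is a generic illustration with a hypothetical align_rows helper and made-up rows, not the mmCIF writer used by the tool.

```python
def align_rows(rows):
    """Pad each column to the width of its longest value."""
    # First pass: find the widest entry in every column.
    widths = [max(len(value) for value in column) for column in zip(*rows)]
    # Second pass: left-justify each value to its column width.
    return [
        " ".join(value.ljust(width) for value, width in zip(row, widths)).rstrip()
        for row in rows
    ]


rows = [
    ["ATOM", "1", "N", "MET"],
    ["ATOM", "10", "CA", "MET"],
    ["HETATM", "100", "SE", "MSE"],
]
for line in align_rows(rows):
    print(line)
```
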
mmtbx
- validation
  - sequence
    - sequence_alignment
      - similarity_matrix = blosum50 dayhoff *identity
      - min_allowable_identity = 0.5
pdb_interpretation
- sort_atoms = True
- use_ncs_to_build_restraints = True
- show_restraints_histograms = True
- flip_symmetric_amino_acids = True
- superpose_ideal_ligand = *None all SF4 F3S DVT
- disable_uc_volume_vs_n_atoms_check = False
- allow_polymer_cross_special_position = False
- correct_hydrogens = True
- c_beta_restraints = True
- use_neutron_distances = False Use neutron X-H distances (which are longer than X-ray ones)
- disulfide_bond_exclusions_selection_string = None
- exclusion_distance_cutoff = 3 If SG of CYS forming SS bond is closer than this distance to an atom that it may coordinate then this SG is excluded from SS bond.
- link_distance_cutoff = 3 Length of link between the linked residues
- disulfide_distance_cutoff = 3
- add_angle_and_dihedral_restraints_for_disulfides = True
- dihedral_function_type = *determined_by_sign_of_periodicity all_sinusoidal all_harmonic
- chir_volume_esd = 0.2
- max_reasonable_bond_distance = 50.0
- nonbonded_distance_cutoff = None
- default_vdw_distance = 1
- min_vdw_distance = 1
- nonbonded_buffer = 1 **EXPERIMENTAL, developers only**
- nonbonded_weight = None Weighting of nonbonded restraints term. By default, this will be set to 16 if explicit hydrogens are used (this was the default in earlier versions of Phenix), or 100 if hydrogens are missing.
- const_shrink_donor_acceptor = 0 **EXPERIMENTAL, developers only**
- vdw_1_4_factor = 0.8
- min_distance_sym_equiv = 0.5
- custom_nonbonded_symmetry_exclusions = None
- translate_cns_dna_rna_residue_names = None
- proceed_with_excessive_length_bonds = False
- restraints_library
  - cdl = True Use Conformation Dependent Library (CDL) for geometry restraints
  - include_modified_amino_acid_in_cdl = False
  - mcl = True Use Metal Coordination Library (MCL) for tetrahedral Zn++ and iron-sulfur clusters SF4, FES, F3S, ...
  - cis_pro_eh99 = True
  - omega_cdl = False Use Omega Conformation Dependent Library (omega-CDL) for geometry restraints
  - cdl_svl = False
  - rdl = False
  - rdl_selection = *all TRP
  - hpdl = False
  - cdl_nucleotides
    - enable = False Use RestraintsLib for DNA and RNA for geometry restraints
    - esd = *phenix csd
    - factor = 2.0
  - user_supplied
    - path = None Root directory of user supplied restraints
    - action = *pre post Choice to look here before GeoStd, the default, or after
- secondary_structure
  - ss_by_chain = True Only applies if search_method = from_ca. Find secondary structure only within individual chains. Alternative is to allow H-bonds between chains. Can be much slower with ss_by_chain=False. If your model is complete use ss_by_chain=True. If your model is many fragments, use ss_by_chain=False.
  - from_ca_conservative = False Various parameters changed to make the from_ca method more conservative, so that it more closely resembles ksdssp.
  - max_rmsd = 1 Only applies if search_method = from_ca. Maximum rmsd to consider two chains with identical sequences as the same for ss identification
  - use_representative_chains = True Only applies if search_method = from_ca. Use a representative of all chains with the same sequence. Alternative is to examine each chain individually. Can be much slower with use_representative_chains=False if there are many symmetry copies. Ignored unless ss_by_chain is True.
  - max_representative_chains = 100 Only applies if search_method = from_ca. Maximum number of representative chains
  - enabled = False Turn on secondary structure restraints (main switch)
  - protein
    - enabled = True Turn on secondary structure restraints for protein
    - search_method = *ksdssp from_ca cablam Particular method to search protein secondary structure.
    - distance_ideal_n_o = 2.9 Target length for N-O hydrogen bond
    - distance_cut_n_o = 3.5 Hydrogen bond with length exceeding this value will not be established
    - remove_outliers = True If true, h-bonds exceeding distance_cut_n_o length will not be established
    - restrain_hbond_angles = True
    - helix
      - serial_number = None
      - helix_identifier = None
      - enabled = True Restrain this particular helix
      - selection = None
      - helix_type = *alpha pi 3_10 unknown Type of helix, defaults to alpha. Only alpha, pi, and 3_10 helices are used for hydrogen-bond restraints.
      - sigma = 0.05
      - slack = 0
      - top_out = False
      - angle_sigma_scale = 1 Multiply sigmas for h-bond angles by this value. Original sigmas range from 5 to 10.
      - angle_sigma_set = None Use this parameter to set sigmas for h-bond angles to a particular value
      - hbond
        donor = None
        acceptor = None
    - sheet
      - enabled = True Restrain this particular sheet
      - first_strand = None
      - sheet_id = None
      - sigma = 0.05
      - slack = 0
      - top_out = False
      - angle_sigma_scale = 1 Multiply sigmas for h-bond angles by this value. Original sigmas range from 5 to 10.
      - angle_sigma_set = None Use this parameter to set sigmas for h-bond angles to a particular value
      - strand
        selection = None
        sense = parallel antiparallel *unknown
        bond_start_current = None
        bond_start_previous = None
      - hbond
        donor = None
        acceptor = None
  - nucleic_acid
    - enabled = True Turn on secondary structure restraints for nucleic acids
    - hbond_distance_cutoff = 3.4 Hydrogen bonds with length exceeding this limit will not be established
    - scale_bonds_sigma = 1. All sigmas for h-bond length will be multiplied by this number. A smaller number gives tighter restraints.
    - base_pair
      - enabled = True Restrain this particular base-pair
      - base1 = None Selection string selecting at least one atom in the desired residue
      - base2 = None Selection string selecting at least one atom in the desired residue
      - saenger_class = 0 Type of base-pairing
      - restrain_planarity = False Apply planarity restraint to this base-pair
      - planarity_sigma = 0.176
      - restrain_hbonds = True Restrain hydrogen bonds length for this base-pair
      - restrain_hb_angles = True Restrain angles around hydrogen bonds for this base-pair
      - restrain_parallelity = True Apply parallelity restraint to this base-pair
      - parallelity_target = 0
      - parallelity_sigma = 0.0335
    - stacking_pair
      - enabled = True Restrain this particular stacking pair
      - base1 = None Selection string selecting at least one atom in the desired residue
      - base2 = None Selection string selecting at least one atom in the desired residue
      - angle = 0
      - sigma = 0.027
- reference_coordinate_restraints Restrains coordinates in Cartesian space to stay near their starting positions. This is intended for use in generating simulated annealing omit maps, to prevent refined atoms from collapsing in on the region missing atoms. For conserving geometry quality at low resolution, the more flexible reference model restraints should be used instead.
  - enabled = False
  - exclude_outliers = True
  - selection = all
  - sigma = 0.2
  - limit = 1.0
  - top_out = False
  - reference_is_average_alt_confs = False Sets the reference coordinate to the average of two alt. confs. Ignores selection option.
- automatic_restraints
  - side_chain_parallelity = False
- automatic_linking
  - link_all = False If True, bond restraints will be generated for any appropriate ligand-protein or ligand-nucleic acid covalent bonds. This includes sugars, amino acid modifications, and other prosthetic groups.
  - link_none = False
  - link_metals = Auto
  - link_residues = True
  - link_amino_acid_rna_dna = False
  - link_carbohydrates = True
  - link_ligands = True
  - link_small_molecules = False
  - metal_coordination_cutoff = 3.0
  - amino_acid_bond_cutoff = 1.9
  - inter_residue_bond_cutoff = 2.2
  - buffer_for_second_row_elements = 0.5
  - carbohydrate_bond_cutoff = 1.99
  - ligand_bond_cutoff = 1.99
  - small_molecule_bond_cutoff = 1.98
  - exclude_hydrogens_from_bonding_decisions = False
- include_in_automatic_linking
  - selection_1 = None
  - selection_2 = None
  - bond_cutoff = 4.5
- exclude_from_automatic_linking
  - selection_1 = None
  - selection_2 = None
- apply_cis_trans_specification
  - cis_trans_mod = cis *trans
  - residue_selection = None Residues containing C-alpha atom of omega dihedral
- apply_cif_restraints
  - restraints_file_name = None
  - residue_selection = None
- apply_cif_modification
  - data_mod = None
  - residue_selection = None
- apply_cif_link
  - data_link = None
  - residue_selection_1 = None
  - residue_selection_2 = None
- peptide_link
  - ramachandran_restraints = False !!! OBSOLETED. Kept for backward compatibility only !!! Restrains peptide backbone to fall within allowed regions of Ramachandran plot. Although it does not eliminate outliers, it can significantly improve the percent favored and percent outliers at low resolution. Probably not useful (and maybe even harmful) at resolutions much higher than 3.5A.
  - cis_threshold = 45
  - apply_all_trans = False
  - discard_omega = False
  - discard_psi_phi = True
  - apply_peptide_plane = False
  - omega_esd_override_value = None
  - rama_weight = 1.0
  - scale_allowed = 1.0
  - rama_potential = *oldfield emsley
  - rama_selection = None Selection of part of the model for which Ramachandran restraints will be set up.
  - restrain_rama_outliers = True Apply restraints to Ramachandran outliers
  - restrain_rama_allowed = True Apply restraints to residues in allowed region on Ramachandran plot
  - restrain_allowed_outliers_with_emsley = False In case of restrain_rama_outliers=True and/or restrain_rama_allowed=True, still restrain these residues with emsley. Makes sense only when using the oldfield potential.
  - oldfield
    - esd = 10.0
    - weight_scale = 1.0
    - dist_weight_max = 10.0
    - weight = None
    - plot_cutoff = 0.027
- ramachandran_plot_restraints
  - enabled = False
  - favored = *oldfield emsley emsley8k phi_psi_2
  - allowed = *oldfield emsley emsley8k phi_psi_2
  - outlier = *oldfield emsley emsley8k phi_psi_2
  - selection = None Selection of part of the model for which Ramachandran restraints will be set up.
  - inject_emsley8k_into_oldfield_favored = True Backdoor to disable temporary dirty hack to use both
  - oldfield
    - weight = 0. Direct weight value. If 0, the weight will be calculated as follows: weight_scale(=0.01) * max(distance_weight_min(=2.), min(distance_weight_max(=10.), current_distance_to_allowed))
    - weight_scale = 0.01
    - distance_weight_min = 2.0 minimum coefficient when scaling depending on how far the residue is from allowed region.
    - distance_weight_max = 10.0 maximum coefficient when scaling depending on how far the residue is from allowed region.
    - plot_cutoff = 0.027
  - emsley
    - weight = 1.0
    - scale_allowed = 1.0
  - emsley8k
    - weight_favored = 5.0
    - weight_allowed = 10.0
    - weight_outlier = 10.0
  - phi_psi_2
    - favored_strategy = *closest highest_probability random weighted_random
    - allowed_strategy = *closest highest_probability random weighted_random
    - outlier_strategy = *closest highest_probability random weighted_random
- rna_sugar_pucker_analysis
  - bond_min_distance = 1.2
  - bond_max_distance = 1.8
  - epsilon_range_min = 155.0
  - epsilon_range_max = 310.0
  - delta_range_2p_min = 129.0
  - delta_range_2p_max = 162.0
  - delta_range_3p_min = 65.0
  - delta_range_3p_max = 104.0
  - p_distance_c1p_outbound_line_2p_max = 2.9
  - o3p_distance_c1p_outbound_line_2p_max = 2.4
  - bond_detection_distance_tolerance = 0.5
  - enable = True
- show_histogram_slots
  - bond_lengths = 5
  - nonbonded_interaction_distances = 5
  - bond_angle_deviations_from_ideal = 5
  - dihedral_angle_deviations_from_ideal = 5
  - chiral_volume_deviations_from_ideal = 5
- show_max_items
  - not_linked = 5
  - bond_restraints_sorted_by_residual = 5
  - nonbonded_interactions_sorted_by_model_distance = 5
  - bond_angle_restraints_sorted_by_residual = 5
  - dihedral_angle_restraints_sorted_by_residual = 3
  - chirality_restraints_sorted_by_residual = 3
  - planarity_restraints_sorted_by_residual = 3
  - residues_with_excluded_nonbonded_symmetry_interactions = 12
  - fatal_problem_max_lines = 10
- ncs_group The definition of one NCS group. Note that refinement programs will almost always check these groups and filter them if needed.
  - reference = None Residue selection string for the complete master NCS copy
  - selection = None Residue selection string for each NCS copy location in ASU
- ncs_search Set of parameters for the NCS search procedure. Some of them are also used for filtering user-supplied ncs_group.
  - enabled = False Enable NCS restraints or constraints in refinement (in some cases may be switched on inside refinement program).
  - exclude_selection = "element H or element D or water" Atoms selected by this selection will be excluded from the model before any NCS search and/or filtering procedures. There is no way atoms defined by this selection will be in NCS.
  - chain_similarity_threshold = 0.85 Threshold for sequence similarity between matching chains. A smaller value may cause more chains to be grouped together and can lower the number of common residues
  - chain_max_rmsd = 2. limit of rms difference between chains to be considered as copies
  - residue_match_radius = 4.0 Maximum allowed distance difference between pairs of matching atoms of two residues
  - try_shortcuts = False Try very quick check to speed up the search when chains are identical. If failed, regular search will be performed automatically.
  - minimum_number_of_atoms_in_copy = 3 Do not create ncs groups where master and copies would contain fewer than the specified number of atoms
  - validate_user_supplied_groups = True Enable validation of user-supplied ncs_group. Need to exercise a lot of caution turning this off. This option is for developers only.
- clash_guard
  - nonbonded_distance_threshold = 0.5
  - max_number_of_distances_below_threshold = 100
  - max_fraction_of_distances_below_threshold = 0.1
geometry_restraints
- edits
  - excessive_bond_distance_limit = 10
  - bond
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - symmetry_operation = None The bond is between atom_1 and symmetry_operation * atom_2, with atom_1 and atom_2 given in fractional coordinates. Example: symmetry_operation = -x-1,-y,z
    - distance_ideal = None
    - sigma = None
    - slack = None
    - limit = -1.0
    - top_out = False
  - angle
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - atom_selection_3 = None
    - angle_ideal = None
    - sigma = None
  - dihedral
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - atom_selection_3 = None
    - atom_selection_4 = None
    - angle_ideal = None
    - alt_angle_ideals = None
    - sigma = None
    - periodicity = 1
  - planarity
    - action = *add delete change
    - atom_selection = None
    - sigma = None
  - parallelity
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - sigma = 0.027
    - target_angle_deg = 0
- remove
  - angles = None
  - dihedrals = None
  - chiralities = None
  - planarities = None
  - parallelities = None
reference_model
- enabled = False Restrains the dihedral angles to a high-resolution reference structure to reduce overfitting at low resolution. You will need to specify a reference PDB file (in the input list in the main window) to use this option.
- file = None
- use_starting_model_as_reference = False
- sigma = 1.0
- limit = 15.0
- hydrogens = False Include dihedrals with hydrogen atoms
- main_chain = True Include dihedrals formed by main chain atoms
- side_chain = True Include dihedrals formed by side chain atoms
- fix_outliers = True Try to fix rotamer outliers in refined model
- strict_rotamer_matching = False Make sure that rotamers in the refinement model match those in the reference model even when they are not outliers
- auto_shutoff_for_ncs = False Do not apply to parts of structure covered by NCS restraints
- secondary_structure_only = False Only apply reference model restraints to secondary structure elements (helices and sheets)
- reference_group
  - reference = None
  - selection = None
  - file_name = None This is used internally to disambiguate cases where multiple reference models contain the same chain ID. This normally does not need to be set by the user
- search_options Set of parameters for the NCS search procedure. Some of them are also used for filtering user-supplied ncs_group.
  - exclude_selection = "element H or element D or water" Atoms selected by this selection will be excluded from the model before any NCS search and/or filtering procedures. There is no way atoms defined by this selection will be in NCS.
  - chain_similarity_threshold = 0.85 Threshold for sequence similarity between matching chains. A smaller value may cause more chains to be grouped together and can lower the number of common residues
  - chain_max_rmsd = 100. limit of rms difference between chains to be considered as copies
  - residue_match_radius = 1000 Maximum allowed distance difference between pairs of matching atoms of two residues
  - try_shortcuts = False Try very quick check to speed up the search when chains are identical. If failed, regular search will be performed automatically.
  - minimum_number_of_atoms_in_copy = 3 Do not create ncs groups where master and copies would contain fewer than the specified number of atoms
  - validate_user_supplied_groups = True Enable validation of user-supplied ncs_group. Need to exercise a lot of caution turning this off. This option is for developers only.
amber Parameters for using Amber in refinement.
- use_amber = False Use Amber for all the gradients in refinement
- topology_file_name = None A topology file needed by Amber. Can be generated using phenix.AmberPrep.
- coordinate_file_name = None A coordinate file needed by Amber. Can be generated using phenix.AmberPrep.
- order_file_name = None A file that maps amber atom numbers to phenix atom numbers.
- wxc_factor = 1.
- restraint_wt = 0.
- restraintmask = ''
- reference_file_name = ''
- bellymask = '' If given, turn on belly in sander
- qmmask = '' If given, turn on QM/MM with the given mask
- qmcharge = 0 Charge of the QM/MM region
- netcdf_trajectory_file_name = '' If given, turn on writing netcdf trajectory
- print_amber_energies = False Print details of Amber energies during refinement
qi QM
- working_directory = None
- qm_restraints
  - selection = None selection for core of atoms to calculate new restraints via a QM geometry minimisation
  - run_in_macro_cycles = *first_only first_and_last all last_only test the steps of the refinement that the restraints generation is run
  - buffer = 3.5 distance to include entire residues into the environment of the core
  - ignore_lack_of_h_on_ligand = False skip check on protonation of ligand for entities such as MgF3
  - capping_groups = True
  - cleanup = all *most None
  - calculate = *in_situ_opt starting_energy final_energy starting_strain final_strain starting_bound final_bound starting_binding final_binding gradients Choose QM calculations to run
  - write_files = *restraints pdb_core pdb_buffer pdb_final_core pdb_final_buffer which ligand or cluster files to write
  - protein_optimisation_freeze = *all None main_chain main_chain_to_beta main_chain_to_delta torsions the parts of protein residues that are frozen when an amino acid is the main selection
  - remove_water = False
  - restraints_filename = Auto restraints filename is based on model name if not specified
  - ignore_x_h_distance_protein = False skip check on transfer of proton during QM optimisation
  - include_nearest_neighbours_in_optimisation = False include the side chains of protein in the QM optimisation
  - include_inter_residue_restraints = False
  - do_not_update_restraints = False For testing and maybe getting strain energy of standard restraints
  - buffer_selection = None use this instead of distance from selection
  - specific_atom_charges
    - atom_selection = None
    - charge = None
  - specific_atom_multiplicities
    - atom_selection = None
    - multiplicity = None
  - freeze_specific_atoms
    - atom_selection = None
  - package
    - program = *mopac test
    - charge = Auto
    - multiplicity = Auto
    - method = Auto
    - basis_set = Auto
    - solvent_model = None
    - nproc = 1
    - read_output_to_skip_opt_if_available = False
    - ignore_input_differences = False
    - view_output = None
- qm_gradients
  - selection = None selection for core of atoms to calculate new restraints via a QM geometry minimisation
  - run_in_macro_cycles = first_only first_and_last *all last_only test the steps of the refinement that the restraints generation is run
  - buffer = 3.5 distance to include entire residues into the environment of the core
  - ignore_lack_of_h_on_ligand = False skip check on protonation of ligand for entities such as MgF3
  - capping_groups = True
  - cleanup = all *most None
  - specific_atom_charges
    - atom_selection = None
    - charge = None
  - specific_atom_multiplicities
    - atom_selection = None
    - multiplicity = None
  - package
    - program = *mopac test
    - charge = Auto
    - multiplicity = Auto
    - method = Auto
    - basis_set = Auto
    - solvent_model = None
    - nproc = 1
    - read_output_to_skip_opt_if_available = False
    - ignore_input_differences = False
    - view_output = None
output
- suffix = '.deposit' Suffix string added to automatically generated output filenames
gui GUI-specific parameter required for output directory
- output_dir = None
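
As a worked illustration of one entry in the list above, the fallback used when ramachandran_plot_restraints.oldfield.weight is 0 can be sketched in Python. The sketch follows the final expression given in the description of that keyword, using the listed defaults for weight_scale, distance_weight_min and distance_weight_max. It is a reading of the help text under that assumption, has not been checked against the Phenix source, and the function name oldfield_weight is hypothetical.

```python
def oldfield_weight(distance_to_allowed, weight=0.0, weight_scale=0.01,
                    distance_weight_min=2.0, distance_weight_max=10.0):
    """Return the restraint weight under the reading described above."""
    if weight != 0.0:
        # A nonzero value is treated as the direct weight.
        return weight
    # Clamp the distance between the minimum and maximum coefficients.
    coefficient = max(distance_weight_min,
                      min(distance_weight_max, distance_to_allowed))
    return weight_scale * coefficient


# Distances below the minimum, inside the range, and above the maximum.
for distance in (0.5, 5.0, 50.0):
    print(distance, oldfield_weight(distance))
```

Under this reading, a residue far from the allowed region is weighted more strongly than one close to it, but only up to the cap set by distance_weight_max.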