mmtbx.prepare_pdb_deposition

Description

This tool is primarily for adding sequence information to the mmCIF output from phenix.refine to make the file suitable for deposition into the Protein Data Bank. The minimum inputs for this use case are the model from phenix.refine and a sequence file.

If you are starting with a model in PDB format, we recommend that you run phenix.refine on your model and data to generate the necessary fields related to the data, then run this tool to add the sequence information. If your model contains ligands, additional restraints files for the ligands may be necessary.

The input sequence should contain only the canonical one-letter codes. For non-standard amino acids or residues, this tool replaces the one-letter code with the three-letter code enclosed in parentheses. For example, if there is a selenomethionine, your input sequence should represent that residue as "M". This tool will replace the "M" with "(MSE)" in the correct field in the output mmCIF file.

However, this feature depends on curated monomer libraries, so if the parent residue is ambiguous, the conversion will not be automatic (the parent residue is the standard residue that a modified residue derives from, e.g. methionine for selenomethionine). In this situation, you can manually specify which residues to convert from the one-letter code to the three-letter code with the "custom_residues" parameter. You just need to provide a space-separated list of three-letter residue codes. Two examples where this is necessary are SUI in PDB code 2jjj and OTY in PDB code 4jye.
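The conversion described above can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function name annotate_sequence, the tiny KNOWN_PARENTS table (a stand-in for the curated monomer library), and the toy sequences are all assumptions made for illustration only.

```python
# Hypothetical stand-in for the curated monomer library:
# modified residue (three-letter code) -> parent one-letter code.
KNOWN_PARENTS = {"MSE": "M"}


def annotate_sequence(sequence, model_resnames, custom_residues=()):
    """Return the sequence with modified residues written as (XXX).

    sequence        -- canonical one-letter sequence
    model_resnames  -- three-letter residue names from the model,
                       one per sequence position
    custom_residues -- three-letter codes to convert even when the
                       parent is not known to the lookup table
    """
    forced = set(custom_residues)
    out = []
    for letter, resname in zip(sequence, model_resnames):
        if resname in forced or KNOWN_PARENTS.get(resname) == letter:
            out.append("(%s)" % resname)  # e.g. M -> (MSE)
        else:
            out.append(letter)  # standard or unrecognized: keep the letter
    return "".join(out)


# Automatic case: the parent of MSE is known.
print(annotate_sequence("GMK", ["GLY", "MSE", "LYS"]))

# Toy case where the lookup fails, so the residue is listed explicitly.
print(annotate_sequence("AYG", ["ALA", "OTY", "GLY"], custom_residues=["OTY"]))
```

In this sketch, listing a code in custom_residues plays the role of the "custom_residues" parameter: it forces the parenthesized form when the automatic lookup cannot decide.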

Command line usage examples

% mmtbx.prepare_pdb_deposition model.cif sequence.fa

% mmtbx.prepare_pdb_deposition 2jjj.cif sequence.fa custom_residues="SUI OTY"

% mmtbx.prepare_pdb_deposition 4jye.cif sequence.fa custom_residues=OTY
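When the tool is driven from a script, the same command lines can be assembled programmatically. The following is a minimal sketch, assuming Python 3.8 or later; build_command is a hypothetical helper, not part of Phenix, and it only constructs the argument list shown in the examples above.

```python
import shlex


def build_command(model, sequence, custom_residues=None):
    """Assemble the argument list for mmtbx.prepare_pdb_deposition.

    custom_residues is an optional list of three-letter codes; they are
    joined with spaces into a single custom_residues=... argument.
    """
    args = ["mmtbx.prepare_pdb_deposition", model, sequence]
    if custom_residues:
        args.append("custom_residues=" + " ".join(custom_residues))
    return args


cmd = build_command("2jjj.cif", "sequence.fa", ["SUI", "OTY"])
# shlex.join quotes the argument containing a space for display.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd) would execute it on a machine where Phenix is installed. Because the space-separated value is a single list element, no extra shell quoting is needed in that case.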

GUI

This tool is available in the Phenix GUI in the "PDB deposition" section.

List of all available keywords

custom_residues = None Space-separated list of three letter codes to be included in the entity_poly.pdbx_seq_one_letter_code mmCIF item (e.g. custom_residues='SUI PYL')
keep_original_loops = True Preserves mmCIF data from the input model file (if available) that is not overwritten by other input
align_columns = False When set to True, the columns are aligned. This will take longer because the column widths need to be determined before outputting.
job_title = None Job title in PHENIX GUI, not used on command line
mmtbx
- validation
  - sequence
    - sequence_alignment
      - similarity_matrix = blosum50 dayhoff *identity
      - min_allowable_identity = 0.5
pdb_interpretation
- sort_atoms = True
- superpose_ideal_ligand = *None all SF4 F3S DVT
- flip_symmetric_amino_acids = False
- disable_uc_volume_vs_n_atoms_check = False
- allow_polymer_cross_special_position = False
- correct_hydrogens = True
- c_beta_restraints = True
- use_neutron_distances = False Use neutron X-H distances (which are longer than X-ray ones)
- disulfide_bond_exclusions_selection_string = None
- exclusion_distance_cutoff = 3 If SG of CYS forming SS bond is closer than this distance to an atom that it may coordinate then this SG is excluded from SS bond.
- link_distance_cutoff = 3 Length of link between the linked residues
- disulfide_distance_cutoff = 3
- add_angle_and_dihedral_restraints_for_disulfides = True
- dihedral_function_type = *determined_by_sign_of_periodicity all_sinusoidal all_harmonic
- chir_volume_esd = 0.2
- max_reasonable_bond_distance = 50.0
- nonbonded_distance_cutoff = None
- default_vdw_distance = 1
- min_vdw_distance = 1
- nonbonded_buffer = 1 **EXPERIMENTAL, developers only**
- nonbonded_weight = None Weighting of nonbonded restraints term. By default, this will be set to 16 if explicit hydrogens are used (this was the default in earlier versions of Phenix), or 100 if hydrogens are missing.
- const_shrink_donor_acceptor = 0.6 **EXPERIMENTAL, developers only**
- vdw_1_4_factor = 0.8
- min_distance_sym_equiv = 0.5
- custom_nonbonded_symmetry_exclusions = None
- translate_cns_dna_rna_residue_names = None
- proceed_with_excessive_length_bonds = False
- restraints_library
  - cdl = True Use Conformation Dependent Library (CDL) for geometry restraints
  - mcl = True Use Metal Coordination Library (MCL) for tetrahedral Zn++ and iron-sulfur clusters SF4, FES, F3S, ...
  - cis_pro_eh99 = False
  - omega_cdl = False Use Omega Conformation Dependent Library (omega-CDL) for geometry restraints
  - cdl_svl = False
  - rdl = False
  - hpdl = False
- secondary_structure
  - ss_by_chain = True Find secondary structure only within individual chains. Alternative is to allow H-bonds between chains. Can be much slower with ss_by_chain=False. If your model is complete use ss_by_chain=True. If your model is many fragments, use ss_by_chain=False.
  - from_ca_conservative = False Various parameters are changed to make the from_ca method more conservative, so that it more closely resembles ksdssp.
  - max_rmsd = 1 Maximum rmsd to consider two chains with identical sequences as the same for ss identification
  - use_representative_chains = True Use a representative of all chains with the same sequence. Alternative is to examine each chain individually. Can be much slower with use_representative_chains=False if there are many symmetry copies. Ignored unless ss_by_chain is True.
  - max_representative_chains = 100 Maximum number of representative chains
  - enabled = False Turn on secondary structure restraints (main switch)
  - protein
    - enabled = True Turn on secondary structure restraints for protein
    - search_method = *ksdssp mmtbx_dssp from_ca cablam Particular method to search protein secondary structure.
    - distance_ideal_n_o = 2.9 Target length for N-O hydrogen bond
    - distance_cut_n_o = 3.5 Hydrogen bond with length exceeding this value will not be established
    - remove_outliers = True If true, h-bonds exceeding distance_cut_n_o length will not be established
    - restrain_hbond_angles = True
    - helix
      - serial_number = None
      - helix_identifier = None
      - enabled = True Restrain this particular helix
      - selection = None
      - helix_type = *alpha pi 3_10 unknown Type of helix, defaults to alpha. Only alpha, pi, and 3_10 helices are used for hydrogen-bond restraints.
      - sigma = 0.05
      - slack = 0
      - top_out = False
      - angle_sigma_scale = 1 Multiply sigmas for h-bond angles by this value. Original sigmas range from 5 to 10.
      - angle_sigma_set = None Use this parameter to set sigmas for h-bond angles to a particular value
      - hbond
        - donor = None
        - acceptor = None
    - sheet
      - enabled = True Restrain this particular sheet
      - first_strand = None
      - sheet_id = None
      - sigma = 0.05
      - slack = 0
      - top_out = False
      - angle_sigma_scale = 1 Multiply sigmas for h-bond angles by this value. Original sigmas range from 5 to 10.
      - angle_sigma_set = None Use this parameter to set sigmas for h-bond angles to a particular value
      - strand
        - selection = None
        - sense = parallel antiparallel *unknown
        - bond_start_current = None
        - bond_start_previous = None
      - hbond
        - donor = None
        - acceptor = None
  - nucleic_acid
    - enabled = True Turn on secondary structure restraints for nucleic acids
    - hbond_distance_cutoff = 3.4 Hydrogen bonds with length exceeding this limit will not be established
    - angle_between_bond_and_nucleobase_cutoff = 35.0 If angle between supposed hydrogen bond and basepair plane (defined by C4, C5, C6 atoms) is less than this value (in degrees), the bond will not be established.
    - scale_bonds_sigma = 1. All sigmas for h-bond length will be multiplied by this number. A smaller number gives tighter restraints.
    - base_pair
      - enabled = True Restrain this particular base-pair
      - base1 = None Selection string selecting at least one atom in the desired residue
      - base2 = None Selection string selecting at least one atom in the desired residue
      - saenger_class = 0 Type of base-pairing
      - restrain_planarity = False Apply planarity restraint to this base-pair
      - planarity_sigma = 0.176
      - restrain_hbonds = True Restrain hydrogen bonds length for this base-pair
      - restrain_hb_angles = True Restrain angles around hydrogen bonds for this base-pair
      - restrain_parallelity = True Apply parallelity restraint to this base-pair
      - parallelity_target = 0
      - parallelity_sigma = 0.0335
    - stacking_pair
      - enabled = True Restrain this particular stacking pair
      - base1 = None Selection string selecting at least one atom in the desired residue
      - base2 = None Selection string selecting at least one atom in the desired residue
      - angle = 0
      - sigma = 0.027
- reference_coordinate_restraints Restrains coordinates in Cartesian space to stay near their starting positions. This is intended for use in generating simulated annealing omit maps, to prevent refined atoms from collapsing in on the region missing atoms. For conserving geometry quality at low resolution, the more flexible reference model restraints should be used instead.
  - enabled = False
  - exclude_outliers = True
  - selection = all
  - sigma = 0.2
  - limit = 1.0
  - top_out = False
- automatic_linking
  - link_all = False If True, bond restraints will be generated for any appropriate ligand-protein or ligand-nucleic acid covalent bonds. This includes sugars, amino acid modifications, and other prosthetic groups.
  - link_none = False
  - link_metals = False
  - link_residues = False
  - link_amino_acid_rna_dna = False
  - link_carbohydrates = True
  - link_ligands = True
  - link_small_molecules = False
  - metal_coordination_cutoff = 3.5
  - amino_acid_bond_cutoff = 1.9
  - inter_residue_bond_cutoff = 2.2
  - buffer_for_second_row_elements = 0.5
  - carbohydrate_bond_cutoff = 1.99
  - ligand_bond_cutoff = 1.99
  - small_molecule_bond_cutoff = 1.98
- include_in_automatic_linking
  - selection_1 = None
  - selection_2 = None
  - bond_cutoff = 4.5
- exclude_from_automatic_linking
  - selection_1 = None
  - selection_2 = None
- apply_cis_trans_specification
  - cis_trans_mod = cis *trans
  - residue_selection = None Residues containing C-alpha atom of omega dihedral
- apply_cif_restraints
  - restraints_file_name = None
  - residue_selection = None
- apply_cif_modification
  - data_mod = None
  - residue_selection = None
- apply_cif_link
  - data_link = None
  - residue_selection_1 = None
  - residue_selection_2 = None
- peptide_link
  - ramachandran_restraints = False Restrains peptide backbone to fall within allowed regions of Ramachandran plot. Although it does not eliminate outliers, it can significantly improve the percent favored and percent outliers at low resolution. Probably not useful (and maybe even harmful) at resolutions much higher than 3.5A.
  - cis_threshold = 45
  - apply_all_trans = False
  - discard_omega = False
  - discard_psi_phi = True
  - apply_peptide_plane = False
  - omega_esd_override_value = None
  - rama_weight = 1.0
  - scale_allowed = 1.0
  - rama_potential = *oldfield emsley
  - rama_selection = None Selection of part of the model for which Ramachandran restraints will be set up.
  - restrain_rama_outliers = True Apply restraints to Ramachandran outliers
  - restrain_rama_allowed = True Apply restraints to residues in allowed region on Ramachandran plot
  - restrain_allowed_outliers_with_emsley = False If restrain_rama_outliers=True and/or restrain_rama_allowed=True, still restrain these residues with the emsley potential. This only makes sense when the oldfield potential is used.
  - oldfield
    - esd = 10.0
    - weight_scale = 1.0
    - dist_weight_max = 10.0
    - weight = None
    - plot_cutoff = 0.027
- rna_sugar_pucker_analysis
  - bond_min_distance = 1.2
  - bond_max_distance = 1.8
  - epsilon_range_min = 155.0
  - epsilon_range_max = 310.0
  - delta_range_2p_min = 129.0
  - delta_range_2p_max = 162.0
  - delta_range_3p_min = 65.0
  - delta_range_3p_max = 104.0
  - p_distance_c1p_outbound_line_2p_max = 2.9
  - o3p_distance_c1p_outbound_line_2p_max = 2.4
  - bond_detection_distance_tolerance = 0.5
- show_histogram_slots
  - bond_lengths = 5
  - nonbonded_interaction_distances = 5
  - bond_angle_deviations_from_ideal = 5
  - dihedral_angle_deviations_from_ideal = 5
  - chiral_volume_deviations_from_ideal = 5
- show_max_items
  - not_linked = 5
  - bond_restraints_sorted_by_residual = 5
  - nonbonded_interactions_sorted_by_model_distance = 5
  - bond_angle_restraints_sorted_by_residual = 5
  - dihedral_angle_restraints_sorted_by_residual = 3
  - chirality_restraints_sorted_by_residual = 3
  - planarity_restraints_sorted_by_residual = 3
  - residues_with_excluded_nonbonded_symmetry_interactions = 12
  - fatal_problem_max_lines = 10
- ncs_group The definition of one NCS group. Note that refinement programs will almost always check these groups and filter them if needed.
  - reference = None Residue selection string for the complete master NCS copy
  - selection = None Residue selection string for each NCS copy location in ASU
- ncs_search Set of parameters for the NCS search procedure. Some of them are also used for filtering a user-supplied ncs_group.
  - enabled = False Enable NCS restraints or constraints in refinement (in some cases may be switched on inside refinement program).
  - exclude_selection = "element H or element D or water" Atoms selected by this selection will be excluded from the model before any NCS search and/or filtering procedures. There is no way atoms defined by this selection will be in NCS.
  - chain_similarity_threshold = 0.85 Threshold for sequence similarity between matching chains. A smaller value may cause more chains to be grouped together and can lower the number of common residues
  - chain_max_rmsd = 2. Limit of rms difference between chains to be considered as copies
  - residue_match_radius = 4.0 Maximum allowed distance difference between pairs of matching atoms of two residues
  - try_shortcuts = False Try very quick check to speed up the search when chains are identical. If failed, regular search will be performed automatically.
  - minimum_number_of_atoms_in_copy = 3 Do not create NCS groups where the master and copies would contain fewer than the specified number of atoms
- clash_guard
  - nonbonded_distance_threshold = 0.5
  - max_number_of_distances_below_threshold = 100
  - max_fraction_of_distances_below_threshold = 0.1
geometry_restraints
- edits
  - excessive_bond_distance_limit = 10
  - bond
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - symmetry_operation = None The bond is between atom_1 and symmetry_operation * atom_2, with atom_1 and atom_2 given in fractional coordinates. Example: symmetry_operation = -x-1,-y,z
    - distance_ideal = None
    - sigma = None
    - slack = None
    - limit = -1.0
    - top_out = False
  - angle
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - atom_selection_3 = None
    - angle_ideal = None
    - sigma = None
  - dihedral
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - atom_selection_3 = None
    - atom_selection_4 = None
    - angle_ideal = None
    - alt_angle_ideals = None
    - sigma = None
    - periodicity = 1
  - planarity
    - action = *add delete change
    - atom_selection = None
    - sigma = None
  - parallelity
    - action = *add delete change
    - atom_selection_1 = None
    - atom_selection_2 = None
    - sigma = 0.027
    - target_angle_deg = 0
- remove
  - angles = None
  - dihedrals = None
  - chiralities = None
  - planarities = None
  - parallelities = None
reference_model
- enabled = False Restrains the dihedral angles to a high-resolution reference structure to reduce overfitting at low resolution. You will need to specify a reference PDB file (in the input list in the main window) to use this option.
- file = None
- use_starting_model_as_reference = False
- sigma = 1.0
- limit = 15.0
- hydrogens = False
- main_chain = True
- side_chain = True
- fix_outliers = True
- strict_rotamer_matching = False
- auto_shutoff_for_ncs = False
- secondary_structure_only = False
- reference_group
  - reference = None
  - selection = None
  - file_name = None This is used internally to disambiguate cases where multiple reference models contain the same chain ID. This normally does not need to be set by the user.
- search_options Set of parameters for the NCS search procedure. Some of them are also used for filtering a user-supplied ncs_group.
  - exclude_selection = "element H or element D or water" Atoms selected by this selection will be excluded from the model before any NCS search and/or filtering procedures. There is no way atoms defined by this selection will be in NCS.
  - chain_similarity_threshold = 0.85 Threshold for sequence similarity between matching chains. A smaller value may cause more chains to be grouped together and can lower the number of common residues
  - chain_max_rmsd = 100. Limit of rms difference between chains to be considered as copies
  - residue_match_radius = 1000 Maximum allowed distance difference between pairs of matching atoms of two residues
  - try_shortcuts = False Try very quick check to speed up the search when chains are identical. If failed, regular search will be performed automatically.
  - minimum_number_of_atoms_in_copy = 3 Do not create NCS groups where the master and copies would contain fewer than the specified number of atoms
amber Parameters for using Amber in refinement.
- use_amber = False Use Amber for all the gradients in refinement
- topology_file_name = None A topology file needed by Amber. Can be generated using phenix.AmberPrep.
- coordinate_file_name = None A coordinate file needed by Amber. Can be generated using phenix.AmberPrep.
- order_file_name = None A file that maps amber atom numbers to phenix atom numbers.
- wxc_factor = .2
- restraint_wt = 0.
- restraintmask = ''
- reference_file_name = ''
- bellymask = '' If given, turn on belly in sander
- qmmask = '' If given, turn on QM/MM with the given mask
- qmcharge = 0 Charge of the QM/MM region
- netcdf_trajectory_file_name = '' If given, turn on writing netcdf trajectory
- print_amber_energies = False Print details of Amber energies during refinement
output
- suffix = '.deposit' Suffix string added to automatically generated output filenames
gui GUI-specific parameter required for output directory
- output_dir = None
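
The keyword listing above is nested by scope. The sketch below shows one way to turn a set of overrides into name=value arguments, assuming the usual PHIL convention that a nested parameter is addressed by its dotted scope path (for example output.suffix). The helper flatten and the chosen override values are hypothetical and serve only to illustrate the path construction.

```python
def flatten(params, prefix=""):
    """Convert a nested dict of overrides into dotted name=value strings."""
    args = []
    for name, value in params.items():
        path = prefix + name
        if isinstance(value, dict):
            # Descend into a sub-scope, extending the dotted path.
            args.extend(flatten(value, path + "."))
        else:
            args.append("%s=%s" % (path, value))
    return args


# Example overrides mirroring the scope layout of the listing above.
overrides = {
    "align_columns": True,
    "output": {"suffix": ".final"},
    "pdb_interpretation": {"secondary_structure": {"enabled": True}},
}

for arg in flatten(overrides):
    print(arg)
```

Each printed string has the same name=value form as the custom_residues argument in the command-line examples.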