Python-based Hierarchical ENvironment for Integrated Xtallography
PHENIX FAQS
How should I cite PHENIX?

If you use PHENIX please cite: PHENIX: a comprehensive Python-based system for macromolecular structure solution. P. D. Adams, P. V. Afonine, G. Bunkóczi, V. B. Chen, I. W. Davis, N. Echols, J. J. Headd, L.-W. Hung, G. J. Kapral, R. W. Grosse-Kunstleve, A. J. McCoy, N. W. Moriarty, R. Oeffner, R. J. Read, D. C. Richardson, J. S. Richardson, T. C. Terwilliger and P. H. Zwart. Acta Cryst. D66, 213-221 (2010).

How can I use multiple processors to run a job?

Only AutoBuild, LigandFit, the structure comparison GUI, and phenix.find_tls_groups support runtime configuration of parallel processing. In most cases this is done by adding the "nproc" keyword, for instance:

  phenix.autobuild data.mtz model.pdb seq.dat nproc=5

Equivalent controls are usually displayed in the GUI. In addition to these options, it is also possible to compile phenix.refine and Phaser with the OpenMP library, which automatically parallelizes specific instructions such as the FFT. This requires using the source installer for Phenix, and adding the argument "--openmp" to the install command. Because of threading conflicts, OpenMP is not compatible with the Phenix GUI.

How can I include high-resolution data and phase extend my map?

You can do this in AutoBuild with:

  phenix.autobuild data=data.mtz hires_file=high_res_data.mtz maps_only=True

There are many variations on using maps_only=True as a way to run density modification. You can also specify a model with model=mymodel.pdb, and the model information will be used in density modification. If you have a model, you can also specify ps_in_rebuild=True to get a prime-and-switch map.

Why does mr_rosetta bomb and say "error while loading shared libraries: libdevel.so: cannot open shared object file: No such file or directory"?

This may indicate that somewhere your system is defining the locations of the shared libraries that Rosetta needs, and these point to a place that is not where Rosetta expects them to be.
You can try to ignore the previous definitions this way. If you are using the bash or sh shells:

  export PHENIX_TRUST_OTHER_ENV=1

or csh (C-shell):

  setenv PHENIX_TRUST_OTHER_ENV 1

in the script where you run mr_rosetta, or before you run it from the command line.

Why does mr_rosetta or mr_model_preparation bomb and say "RuntimeError: Cannot contact EBI DbFetch service"?

This could mean just what it says...but it could also mean that you are behind a firewall and there is a proxy server you need to go through. You can use the following command to specify the proxy server (replacing it with YOUR proxy server). If you are using the bash or sh shells:

  export HTTP_PROXY=proxyout.mydomain.edu:8080

or csh (C-shell):

  setenv HTTP_PROXY proxyout.mydomain.edu:8080

Why does AutoBuild bomb and say "Corrupt gradient calculations"?

If an atom is placed very near a special position then sometimes refinement will fail, and an error message starting with "Corrupt gradient calculations" is printed out. If the starting PDB file has the atom near a special position, then the best thing to do is move it away from the special position. If AutoBuild builds a model that has this problem, then it may be easier to rerun the job, specifying "ignore_errors_in_subprocess=True", which should allow it to continue past this error (by simply ignoring that refinement step). You can also try setting correct_special_position_tolerance=0 (to turn off the check) or correct_special_position_tolerance=5 (to check over a wider range of distances from the special position; the default is 1).

Why does AutoBuild bomb and say it cannot find a TEMP file?

By default the AutoBuild Wizard splits jobs into one or more parts (determined by the parameter "nbatch") and runs them as sub-processes. These may run sequentially or in parallel, depending on the value of the parameter "nproc".
In some cases the running of sub-processes can lead to timing errors in which a file is not fully written before it is read by the next process. This appears more often when jobs are run on nfs-mounted disks than on a local disk. If this occurs, a solution is to set the parameter "nbatch=1" so that the jobs are not run as sub-processes. You can also specify "number_of_parallel_models=1", which will do much the same thing. Note that changing the value of "nbatch" will normally change the results of running the Wizard. (Changing the value of "nproc" does not change the results; it changes only how many jobs are run at once.)

Where can I find sample data?

You can find sample data in the directories located in $PHENIX/examples. Additionally there is sample MR data in $PHENIX/phaser/tutorial.

Can I easily run a Wizard with some sample data?

You can run sample data with a Wizard with a simple command. To run p9-sad sample data with the AutoSol wizard, you type:

  phenix.run_example p9-sad

This command copies the $PHENIX/examples/p9-sad directory to your working directory and executes the commands in the file run.sh.

What sample data are available to run automatically?

You can see which sample data are set up to run automatically by typing:

  phenix.run_example --help

This command lists all the directories in $PHENIX/examples/ that have a command file run.sh ready to use. For example:

  phenix.run_example --help
  PHENIX run_example script. Fri Jul 6 12:07:08 MDT 2007
  Use: phenix.run_example example_name [--all] [--overwrite]
  Data will be copied from PHENIX examples into subdirectories
  of this working directory
  If --all is set then all examples will be run (takes a long time!)
  If --overwrite is set then the script will overwrite subdirectories
  List of available examples:
  1J4R-ligand a2u-globulin-mr gene-5-mad p9-build p9-sad

Are any of the sample datasets annotated?

The PHENIX tutorials listed on the main PHENIX web page will walk you through sample datasets, telling you what to look for in the output files. For example, Tutorial 1: Solving a structure using SAD data uses the p9-sad dataset as an example. It tells you how to run this example data in AutoSol and how to interpret the results.

Why does the AutoBuild Wizard say it is doing 2 rebuild cycles but I specified one?

The AutoBuild Wizard adds a cycle just before the rebuild cycles in which nothing happens except refinement and grouping of models from any previous build cycles.

What is the difference between overall_best.pdb and cycle_best_1.pdb in the AutoBuild Wizard?

The AutoBuild Wizard saves the best model (and map coefficient file, etc.) for each build cycle nn as cycle_best_nn.pdb. The Wizard also copies the current overall best model to overall_best.pdb. In this way you can always pull the overall_best.pdb file and you will have the current best model. If you wait until the end of the run, you will get a summary that lists the files corresponding to the best model. These will have the same contents as the overall_best files.

Can PHENIX do MRSAD?

Yes, PHENIX can run MRSAD (molecular replacement combined with SAD phases) by determining the anomalous scatterer substructure from a model-phased anomalous difference Fourier. There are two simple ways to do this; both are described in the AutoSol documentation.

How can I tell the AutoSol Wizard which columns to use from my mtz file?

The AutoSol Wizard will normally try to guess the appropriate columns of data from an input data file. If there are several choices, then you can tell the Wizard which one to use with the command-line keywords labels, peak.labels, infl.labels, etc.
For example, if you have two input datafiles w1 and w2 for a 2-wavelength MAD dataset, and you want to select the w1(+) and w1(-) data from the first file and w2(+) and w2(-) from the second, you could use the following keywords (see "How do I know what my choices of labels are for my data file" to know what to put in these lines):

  input_file_list="w1.mtz w2.mtz"
  group_labels_list="'w1(+) SIGw1(+) w1(-) SIGw1(-)' 'w2(+) SIGw2(+) w2(-) SIGw2(-)'"

Note that all the labels for one set of anomalous data from one file are grouped together in each set of quotes. You could accomplish the same thing from a parameters file specifying something like:

  wavelength {
    wavelength_name = peak
    data = w1.mtz
    labels = w1(+) SIGw1(+) w1(-) SIGw1(-)
  }
  wavelength {
    wavelength_name = infl
    data = w2.mtz
    labels = w2(+) SIGw2(+) w2(-) SIGw2(-)
  }

How do I know what my choices of labels are for my data file?

You can find out what your choices of labels are by running the command:

  phenix.autosol show_labels=w1.mtz

This will provide a listing of the labels in w1.mtz and suggestions for their use in the PHENIX Wizards. For example, w1.mtz yields:

  List of all anomalous datasets in w1.mtz
  'w1(+) SIGw1(+) w1(-) SIGw1(-)'
  List of all datasets in w1.mtz
  'w1(+) SIGw1(+) w1(-) SIGw1(-)'
  List of all individual labels in w1.mtz
  'w1(+)' 'SIGw1(+)' 'w1(-)' 'SIGw1(-)'
  Suggested uses:
  labels='w1(+) SIGw1(+) w1(-) SIGw1(-)'
  input_labels='w1(+) SIGw1(+) None None None None None None None'
  input_refinement_labels='w1(+) SIGw1(+) None'
  input_map_labels='w1(+) None None'

What can I do if a Wizard says this version does not seem big enough?

The Wizards try to automatically determine the size of solve or resolve, but if your data is at very high resolution or has a very large unit cell, you can get the message:

  ***************************************************
  Sorry, this version does not seem big enough...
  (Current value of isizeit is 30)
  Unfortunately your computer will only accept a size of 30
  with your current settings.
  You might try cutting back the resolution
  You might try "coarse_grid" to reduce memory
  You might try "unlimit" to allow full use of memory
  ***************************************************

You cannot get rid of this problem by specifying the resolution with

  resolution=4.0

because the Wizards use the resolution cutoff you specify in all calculations, but the high-resolution data is still carried along. The easiest solution to this problem is to edit your data file to have lower-resolution data. You can do it like this:

  phenix.reflection_file_converter huge.sca --sca=big.sca --resolution=4.0

A second solution is to tell the Wizard to ignore the high-resolution data explicitly with:

  resolution=4.0 \
  resolve_command="'resolution 200 4.0'" \
  solve_command="'resolution 200 4.0'" \
  resolve_pattern_command="'resolution 200 4.0'"

Note the two sets of quotes; both are required for this command-line input. These commands are applied after all other inputs in resolve/solve/resolve_pattern, and therefore all data outside these limits will be ignored.

Why does the AutoBuild Wizard say "Sorry, you need to define FP in labin" but AutoMR was able to read my data file just fine?

When you run AutoMR and let it continue on to the AutoBuild Wizard automatically, the AutoBuild Wizard guesses the input file contents separately from AutoMR. Usually it can guess correctly, but if it cannot, then you can tell it what the labels for FP SIGFP FreeR_flag are like this:

  autobuild_input_labels="myFP mySIGFP myFreeR_flag"

where you can say None for anything that you do not want to define. This has an effect identical to specifying input_labels directly when you run AutoBuild.
Why does the AutoBuild Wizard just stop after a few seconds?

When you run AutoBuild from the command line, it writes the output to a file and says something like:

  Sending output to AutoBuild_run_3_/AutoBuild_run_3_1.log

Usually if something goes wrong with the inputs it will give you an error message right on the screen. However, a few types of errors are only written to the log file, so if AutoBuild just stops after a few seconds, have a look at this log file; it should have an error message at the end.

What is an R-free flags mismatch?

When you run AutoBuild or phenix.refine, you may get this error message or a similar one:

  ************************************************************
  Failed to carry out AutoBuild_build_cycle:
  Please resolve the R-free flags mismatch.
  ************************************************************

Phenix.refine keeps track of which reflections are used as the test set (i.e., not used in refinement but only in estimation of overall parameters). The test set identity is saved as a hex digest and written to the output PDB file produced by phenix.refine as a REMARK record:

  REMARK r_free_flags.md5.hexdigest 41aea2bced48fbb0fde5c04c7b6fb64

Then when phenix.refine reads a PDB file and a set of data, it checks to make sure that the same test set is about to be used in refinement as was used in the previous refinement of this model. If it is not, you get the error message about an R-free flags mismatch. Sometimes the R-free flags mismatch error is telling you something important: you need to make sure that the same test set is used throughout refinement. In this case, you might need to change the data file you are using to match the one previously used with this PDB file. Alternatively you might need to start your refinement over with the desired data and test set. Other times the warning is not applicable.
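As an illustration, you can inspect the recorded digest directly with grep before deciding how to resolve a mismatch. This is a minimal sketch; the file model.pdb and the digest value below are made up for the example, not taken from a real refinement:

```shell
# Sketch: show the test-set digest recorded by phenix.refine in a PDB file.
# model.pdb and the digest value are hypothetical stand-ins.
cat > model.pdb <<'EOF'
REMARK r_free_flags.md5.hexdigest 41aea2bced48fbb0fde5c04c7b6fb64
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
EOF
grep 'r_free_flags.md5.hexdigest' model.pdb
# prints: REMARK r_free_flags.md5.hexdigest 41aea2bced48fbb0fde5c04c7b6fb64
```

Comparing the digests recorded in two PDB files this way tells you whether they were refined against the same test set.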
If you have two datasets with the same test set, but one dataset has one extra reflection that contains no data, only indices, then the two datasets will have different hex digests even though they are for all practical purposes equivalent. In this case you would want to ignore the hex-digest warning. If you get an R-free flags mismatch error, you can tell the AutoBuild Wizard to ignore the warning with:

  skip_hexdigest=True

and you can tell phenix.refine to ignore it with:

  refinement.input.r_free_flags.ignore_pdb_hexdigest=True

You can also simply delete the REMARK record from your PDB file if you wish to ignore the hex-digest warnings.

Can I use the AutoBuild wizard at low resolution?

The standard building with AutoBuild does not work very well at resolutions below about 3-3.2 A. In particular, the wizard tends to build strands into helical regions at low resolution. However, you can specify "helices_strands_only=True" and the wizard will just build regions that are helical or beta-sheet, using a completely different algorithm. This is much quicker than standard building, but much less complete as well.

Why doesn't COOT recognize my MTZ file from AutoBuild?

This happens if you use "auto-open MTZ" in COOT. COOT will say:

  FAILED TO FIND COLUMNS FWT AND PHWT IN THAT MTZ FILE
  FAILED TO FIND COLUMNS DELFWT AND PHDELFWT IN THAT MTZ FILE.

The solution is to use "Open MTZ" and then to select the columns (usually FP PHIM FOMM, and yes, do use weights).

My AutoBuild composite OMIT job crashed because my computer crashed. Can I go on without redoing all the work that has been done?

Yes, but it involves several steps:
Does the RESOLVE database of density distributions contain RNA/protein examples?

The RESOLVE database doesn't have RNA+protein in it, nor does it have low-resolution histograms, but you can create a new entry very easily. Here is how:
Why do I get "None of the solve versions worked" in AutoSol?
If I run AutoBuild with after_autosol=True, how do I know which run of AutoSol it will use?

AutoBuild will look through all the AutoSol runs, choose the solution with the highest final score, and use that one. You can see this near the beginning of the AutoBuild run:

  Appending solution 4060.75360229 1 75.3602294036 exptl_fobs_phases_freeR_flags_1.mtz solve_1.mtz
  Appending solution 59.3469818876 2 59.3469818876 None solve_2.mtz
  Best solution 4060.75360229 1 75.3602294036 exptl_fobs_phases_freeR_flags_1.mtz solve_1.mtz AutoSol_run_2_

In this case it took run 2 with the solution solve_1.mtz (score 4060.7) over the solution solve_2.mtz (score 59). If you want to choose a different AutoSol solution, then you will need to explicitly tell AutoBuild all the files that you want to use:

  phenix.autobuild data=AutoSol_run_5_/exptl_fobs_freer_flags_3.mtz \
    map_file=AutoSol_run_5_/resolve_3.mtz \
    seq_file=my_seq_file.seq

Notes:
How can I do a quick check for iso and ano differences in an MIR dataset?

Run:

  phenix.autosol native.data=native.sca deriv.data=deriv.sca

and wait a couple of minutes until it has scaled the data (once it says "RUNNING HYSS" you are far enough), then have a look at

  AutoSol_run_1_/TEMP0/dataset_1_scale.log

which will say near the end:

  isomorphous differences derivs 1 - native
  Differences by shell:
  shell   dmin  nobs    Fbar     R  scale  SIGNAL  NOISE   S/N
      1  5.600  1018 285.012 0.287  0.998  105.05  26.73  3.93
      2  4.200  1386 324.927 0.216  1.000   84.78  26.76  3.17
      3  3.920   542 330.807 0.214  1.002   85.00  28.36  3.00
      4  3.710   523 286.487 0.237  1.002   81.31  27.29  2.98
      5  3.500   662 282.383 0.235  1.001   75.58  37.12  2.04
      6  3.360   518 255.782 0.241  1.003   72.69  27.18  2.67
      7  3.220   630 237.778 0.253  1.000   68.87  29.94  2.30
      8  3.080   727 208.271 0.255  1.000   61.39  29.19  2.10
      9  2.940   897 190.044 0.254  0.999   42.78  42.99  1.00
     10  2.800  1067 169.022 0.280  0.999   50.54  33.24  1.52
  Total:       7970 256.096 0.245  1.000   75.29  31.41  2.48

Here R is <Fderiv-Fnative>/(2 <Fderiv+Fnative>), noise is <sigma>, signal is sqrt(<(Fderiv-Fnative)**2> - <sigma**2>), and S/N is the ratio of signal to noise.

If you want to force the NCS to come from the heavy-atom file, first identify the NCS with phenix.find_ncs:

  phenix.find_ncs eden-unique.mtz hatom.pdb

This should find the NCS and write out a file called something like find_ncs.ncs_spec. Now use the keyword

  ncs_file=find_ncs.ncs_spec

in phenix.autobuild and you should be OK.

Is there a way to use AutoBuild to combine a set of models created by multi-start simulated annealing?

You can do this in two ways. Both involve the keyword

  consider_main_chain_list="pdb1.pdb pdb2.pdb pdb3.pdb"

which lets you suggest a set of models for AutoBuild to consider in model-building.
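If you have many multi-start models, typing the list by hand is error-prone. Here is a small shell sketch that gathers the model files and assembles the keyword; the file names (model_*.pdb, data.mtz, seq.dat) are hypothetical, and the command is echoed rather than executed:

```shell
# Sketch: collect multi-start SA models and assemble the
# consider_main_chain_list argument for phenix.autobuild.
# Names are hypothetical stand-ins; the command is printed, not run.
touch model_1.pdb model_2.pdb model_3.pdb   # stand-ins for real models
models=$(echo model_*.pdb)
echo "phenix.autobuild data=data.mtz seq_file=seq.dat consider_main_chain_list=\"$models\""
# prints: phenix.autobuild data=data.mtz seq_file=seq.dat consider_main_chain_list="model_1.pdb model_2.pdb model_3.pdb"
```

Once the echoed command looks right, you can paste it back into your shell (or drop the echo) to run it for real.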
Why am I not allowed to use a file with FAVG SIGFAVG DANO SIGDANO in AutoSol or AutoBuild?

The group of MTZ columns FAVG SIGFAVG DANO SIGDANO is a special one that should normally not be used in Phenix. The reason is that Phenix stores this data as F+ SIGF+ F- SIGF-, but in the conversion process between F+/F- and FAVG/DANO, information is lost. Therefore you should normally supply data files with F+ SIGF+ F- SIGF- (or intensities), or fully merged data (F, SIG), to Phenix routines. As a special case, if you have anomalous data saved as FAVG SIGFAVG DANO SIGDANO you can supply this to AutoSol; however, this requires either that (1) you supply a refinement file with F SIG, or that (2) your data file has a separate F SIG pair of columns (other than the FAVG SIGFAVG columns that are part of the FAVG/DANO group).

I am using phenix.automr with a dimer (copies=1), but Phenix gives me a warning that the unit cell is too full.

In this case, check to make sure that you have specified that the contents of the unit cell include two copies of your sequence with component_copies=2. (In automr the composition of the asymmetric unit is specified independently of the model.)

How do I run AutoBuild on a cluster?

Phenix.autobuild is set up so that you can specify the number of processors (nproc) and the number of batches (nbatch).
Additionally you will want to set two more parameters:

  run_command="command you use to submit a job to your system"
  background=False   # probably False on a cluster, True on a multiprocessor machine

If you have a queueing system with 20 nodes, then you probably submit jobs with something like "qsub -someflags myjob.sh", where someflags are whatever flags you use (or just "qsub myjob.sh" if no flags). Then you might use:

  run_command="qsub -someflags" background=False nproc=20 nbatch=20

If you have a 20-processor machine instead, then you might say:

  run_command=sh background=True nproc=20 nbatch=20

so that it would run your jobs with sh on your machine, and run them all in the background (i.e., all at one time).

How do I tell AutoBuild to use phenix.refine maps instead of density-modified maps for model-building?

To use the phenix.refine maps instead of density-modified maps, use the keyword:

  two_fofc_in_rebuild=True

How do I include a twin law for refinement in AutoBuild?

You can include the twin law in AutoBuild for refinement with the keyword:

  refine_eff_file=refinement_params.eff

where refinement_params.eff says something like:

  refinement {
    twinning {
      twin_law = "-k, -h, -l"
    }
  }

(You can get the twin law "-k, -h, -l" from phenix.xtriage.)

Why is there no exptl_fobs_phases_freeR_flags_*.mtz file in my AutoSol_run_xx_ directory?

In AutoSol the file exptl_fobs_phases_freeR_flags_*.mtz normally contains the experimental Fobs and free R flags for refinement, along with phases and HL coefficients from the experimental phasing. (The * here is the solution number.) However, if an anisotropy correction is applied to the data, then by default no refinement is done in AutoSol and no exptl_fobs_phases_freeR_flags_*.mtz file is created. This is to ensure that refinement is not carried out against anisotropy-corrected data (you want to refine against the original data, and have phenix.refine apply an anisotropy correction as part of refinement).
If you supply

  input_refinement_file=my_data.sca

then my_data.sca will be used for refinement and an exptl_fobs_phases_freeR_flags_*.mtz will be created. Note that my_data.sca can be identical to your input data file if you want.

AutoBuild seems to be taking a long time. What is the usual time for a run?

For typical structures, AutoBuild runs can take from 30 minutes to several days using a single processor. You can speed up your jobs by using several processors with a command such as "nproc=4"; for AutoBuild this can give a speedup of up to a factor of 5. You can also speed up rebuild_in_place AutoBuild jobs (where your model is being adjusted, not built from scratch) by specifying fewer cycles: "n_cycle_rebuild_max=1" will use 1 cycle of rebuilding instead of the usual 5. Often that is plenty.

Why does autobuild or ligandfit crash with "sh: not found"?

This usually means that you do not have the application "csh" installed on your system. If you have Ubuntu linux, csh and tcsh are not included in a normal installation. It is easy to install csh and tcsh under linux and it just takes a minute. On rpm-based systems you can say:

  yum install tcsh

(on Ubuntu, use "apt-get install tcsh") and that should do it.

When should I use multi-crystal averaging?

Multi-crystal averaging is going to be useful only if the crystals are completely different or the amplitudes are nearly uncorrelated. In cases where there are only small changes, the averaging procedure has almost nothing different in the two structures to work with, and it won't do much. Another way to say this is that multi-crystal averaging works because two or more very different ways of sampling the Fourier transform of the molecule are occurring, and each must be consistent with the corresponding measured data. If the molecules are nearly the same and the measured data are nearly the same in all cases, then there are few constraints on the phases. Yes, experimental phases can be included in multi-crystal averaging, just as for NCS averaging.
And yes, experimental phases are most helpful. If some regions are different in the different crystals, then the masking procedure needs to be adjusted to exclude the variable regions from the averaging process.

Can I do density-modified phase combination (partial model phases and experimental phases) in PHENIX?

Yes, you get this if you use:

  phenix.autobuild model=partial_model.pdb data=exptl_phases_hl_etc.mtz rebuild_in_place=False seq_file=seq.dat

The model is used to generate phases by a variation on statistical density modification. These phases are then combined with the experimental phases, and the combined phases are density modified. Then the result is density modified including the model. So the file

  image.mtz

is exptl phases + model phases, and

  image_only_dm.mtz

is image.mtz, density modified. Then

  resolve_work.mtz

is image_only_dm.mtz, density modified further using the model as a target for density modification along with histograms, solvent flattening, NCS, etc.

How can I specify a mask for density modification in AutoSol/AutoBuild?

If you want to specify a mask, add this command:

  resolve_command_list=" 'model ../../coords.pdb' 'use_model_mask' "

where there are " and ' quotes and coords.pdb is the model to use for a mask. Note the "../../": coords.pdb is in your working directory, but when resolve runs, the run directory is two directories lower, so relative to that directory your coords.pdb is at "../../coords.pdb". You will know it is working if your resolve_xx.log says:

  Using model mask calculated from coordinates

Note: this command is most appropriate for use with the keyword "maps_only=True", because phenix.autobuild also uses "model=...", so iterative model-building may not work entirely correctly in this case. Two parts that may not function correctly are "build_outside_model" (which will use your model as a mask and not the current one) and evaluate_model (which will evaluate your starting model, not the current model).
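The nested quoting in resolve_command_list is easy to get wrong. A quick way to check it is to echo the argument and confirm what the shell actually delivers; this sketch runs no Phenix programs:

```shell
# Sketch: the outer double quotes are consumed by the shell; the inner
# single quotes must survive so resolve sees two separate commands.
arg="resolve_command_list=\" 'model ../../coords.pdb' 'use_model_mask' \""
echo "$arg"
# prints: resolve_command_list=" 'model ../../coords.pdb' 'use_model_mask' "
```

If the echoed line has lost either the double quotes or the single quotes, the argument will not reach resolve intact.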
Is there any way to get phenix.autobuild NOT to delete multiple conformers when doing a SA-omit map?

At present, if you put multiple conformations in for the protein, AutoBuild will take only conformation 1 and ignore the others. As a work-around, you can call all the protein a "ligand" and put it in this way (you need to give it one complete residue in the model as "one_residue.pdb", or any part of the model that has just one conformation):

  phenix.autobuild data=data.mtz \
    model=one_residue.pdb \
    input_lig_file_list=model.pdb \
    composite_omit_type=sa_omit

AutoBuild treats ligands as a fixed structure during model building and in omit maps, only adjusted during refinement, which is what you want in this case.

What do I do if autobuild says "TRIED resolve_extra_huge ...but not OK"?

In most cases, when you get this error in Phenix:

  TRIED resolve_extra_huge ...but not OK

it actually means "your computer does not have enough memory to run resolve_extra_huge". If that is the case then you are kind of stuck unless you have another computer with even more memory+swap space, or you cut back on the resolution of the input data. (Note that you have to actually lower the resolution in the input file, not just set "resolution=", because all the data is kept but not used if you just set the resolution.) You can also try the keyword

  resolve_command_list=" 'coarse_grid' "

(note the two sets of quotes). Sometimes the "not OK" message can happen if your system and PHENIX are not matching, so that resolve or solve cannot run at all. You can test for this by typing

  phenix.resolve

and if it loads up (just type QUIT or END or control-C to end it) then it runs; if it doesn't, there is a system mismatch.

What are my options for OMIT maps if I have a 4-fold NCS axis?
Problems installing Rosetta? Here are some suggestions:

  find $PHENIX_ROSETTA_PATH -type d -exec chmod 755 '{}' \;
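If you want to see what that find command does before pointing it at your Rosetta tree, here is a sketch on a throwaway directory (rosetta_test is a hypothetical name, not part of a Rosetta install):

```shell
# Sketch: make every directory under a tree traversable (mode 755),
# mirroring the suggestion above for $PHENIX_ROSETTA_PATH.
mkdir -p rosetta_test/sub
chmod 700 rosetta_test/sub                   # start with a restrictive mode
find rosetta_test -type d -exec chmod 755 '{}' \;
ls -ld rosetta_test/sub                      # mode should now be drwxr-xr-x
```

The -type d test restricts chmod to directories, so file permissions (including executables) are left alone.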