Using the Same Test Set in AutoBuild and Phenix.Refine
Hi all,

I have another problem, I'm afraid. I have built a model using phenix.autobuild and now want to run some refinement. While in the long run I'll do some manual rebuilding using Coot, I just wanted to run a test of phenix.refine to ensure I have the script right and have a baseline to compare against later.

My autobuild script is:

    phenix.autobuild model=AutoMR_run_4_/MR.1-protein.pdb data=1M50-2.mtz \
      input_refinement_labels="FP SIGFP None None None None None None FreeR_flag" \
      map_file=AutoMR_run_4_/MR.MAP_COEFFS.1.mtz \
      seq_file=../fmo-ct.pir \
      resolution=2.2 dmax=20 refinement_resolution=2.2 \
      cif_def_file_list=/usr/users/dale/geometry/chromophores/bcl_tnt.cif \
      input_lig_file_list=AutoMR_run_4_/MR.1-Bchl-a.pdb \
      rebuild_in_place=Yes

My refine script is:

    phenix.refine AutoBuild_run_12_/overall_best.pdb \
      refinement.input.xray_data.file_name=1M50-2.mtz 1M50-2.mtz \
      refinement.main.high_resolution=2.2 refinement.main.low_resolution=20 \
      /usr/users/dale/geometry/chromophores/bcl_tnt.cif

As you can guess, my test set flags are in the same mtz file as the amplitudes. I'm feeding exactly the same file into both runs. Despite this I get in my output:

    *******************************************************************************
    *******************************************************************************
    The MD5 checksum for the R-free flags array summarized above is:
      785fd03f6881898dcd91bc5f8c3e5b26
    The corresponding MD5 checksum in the PDB file summarized above is:
      6f86ee71dbcd2dc1f5282cf18547c79b
    These checksums should be identical but are in fact different. This is
    because the R-free flags used at previous stages of refinement are
    different from the R-free flags summarized above. As a consequence, the
    values for R-free will be biased and misleading.
    It is best to avoid this situation by consistently using the same R-free
    flags throughout the refinement of a model. If the previously used R-free
    flags are still available run this command again with the name of the file
    containing the original flags as an additional input.
    If the original R-free flags are unrecoverable, remove the
      REMARK r_free_flags.md5.hexdigest 6f86ee71dbcd2dc1f5282cf18547c79b
    record from the input PDB file to proceed with the refinement. In this case
    the values for R-free will become meaningful only after many cycles of
    refinement.
    *******************************************************************************
    *******************************************************************************
    Sorry: Please resolve the R-free flags mismatch.

While I'm glad that Phenix is checking to ensure that I haven't goofed and tried to switch test sets, I believe I'm being unjustly accused. Why does phenix.refine think I'm a bad boy?

Dale Tronrud
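For anyone who wants to see which digest a model file carries before rerunning anything, here is a minimal Python sketch. It relies only on the record format quoted in the error message above (`REMARK r_free_flags.md5.hexdigest <digest>`); the function names and the two-line PDB fragment are made up for illustration and are not part of Phenix.

```python
REMARK_KEY = "r_free_flags.md5.hexdigest"


def find_flag_digests(pdb_text):
    """Return every digest stored in 'REMARK r_free_flags.md5.hexdigest' lines."""
    digests = []
    for line in pdb_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "REMARK" and fields[1] == REMARK_KEY:
            digests.append(fields[2])
    return digests


def strip_flag_digests(pdb_text):
    """Return the PDB text with the digest REMARK lines removed."""
    kept = [
        line
        for line in pdb_text.splitlines()
        if line.split()[:2] != ["REMARK", REMARK_KEY]
    ]
    return "\n".join(kept) + "\n"


# Hypothetical PDB fragment; only the REMARK line is taken from the thread.
example_pdb = (
    "REMARK r_free_flags.md5.hexdigest 6f86ee71dbcd2dc1f5282cf18547c79b\n"
    "CRYST1  169.100  169.100  169.100  90.00  90.00  90.00 P 43 3 2\n"
)

# Digest that phenix.refine reported for the flags array in the MTZ file.
reported_for_mtz = "785fd03f6881898dcd91bc5f8c3e5b26"

recorded = find_flag_digests(example_pdb)
print("recorded in PDB:", recorded)
print("matches MTZ digest:", recorded == [reported_for_mtz])
```

Note that `strip_flag_digests` corresponds to the fallback the error message describes for unrecoverable flags; as the message itself warns, R-free then becomes meaningful only after many cycles of refinement, so it is a last resort rather than a fix.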
Hi Dale,

1) Why do you specify the reflection MTZ file twice in the phenix.refine script?

2) Try exactly the same AutoBuild and phenix.refine runs but without any resolution limits. Is it still crashing in this case? My guess is that a bug somewhere causes the MD5 to be calculated on the processed data file (with resolution cutoffs applied), so that this "wrong" MD5 record goes into overall_best.pdb and causes an inconsistency when compared with the native 1M50-2.mtz when you start phenix.refine. Again, this is just a guess...

3) Does this work:

a) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk
   phenix.refine junk_001.pdb 1M50-2.mtz

b) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk xray_data.high_res=2.2 xray_data.low_res=20
   phenix.refine junk_001.pdb 1M50-2.mtz

c) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk
   phenix.refine junk_001.pdb 1M50-2.mtz xray_data.high_res=2.2 xray_data.low_res=20

Pavel.

PS> I will be away Dec 30 - Jan 1. I will have no email access during this time. Tom is unreachable Jan 1 - 21.

Dale Tronrud wrote:
(...)
_______________________________________________
phenixbb mailing list
[email protected]
http://www.phenix-online.org/mailman/listinfo/phenixbb
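Pavel's guess in point 2 (a digest computed after resolution cutoffs versus one computed on the untouched file) can be illustrated with a toy Python sketch. This is not how Phenix computes its checksum, which the thread does not describe; the reflection list, the byte serialization, and the function name are all assumptions chosen only to show why hashing a truncated flag array would give a different MD5.

```python
import hashlib


def flags_digest(flags):
    """MD5 hex digest of a sequence of small integer flags (toy serialization)."""
    return hashlib.md5(bytes(flags)).hexdigest()


# Hypothetical reflections: (d-spacing in Angstrom, free-R flag value).
reflections = [(25.0, 0), (12.0, 3), (5.0, 0), (2.5, 7), (2.18, 0), (2.15, 1)]

# Flags for every reflection in the file.
all_flags = [flag for _, flag in reflections]

# Flags that survive a 20 - 2.2 Angstrom cutoff like the one used in the thread.
cut_flags = [flag for d, flag in reflections if 2.2 <= d <= 20.0]

print("all data :", flags_digest(all_flags))
print("cut data :", flags_digest(cut_flags))
print("identical:", flags_digest(all_flags) == flags_digest(cut_flags))
```

If one program hashed the full array and another hashed the truncated one, the same MTZ file would produce two different digests, which is the kind of mismatch being speculated about here.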
Dear Pavel,

Pavel Afonine wrote:
> Hi Dale,
>
> 1) Why do you specify the reflection MTZ file twice in the phenix.refine script?

I put the mtz in twice because if I put it in once phenix.refine complains that I have no free R flags. It seems to want one file with the amplitudes and another with the flags. Since I have both in the same file I put that file on the line twice and phenix.refine finds everything it needs.

> 2) Try exactly the same AutoBuild and phenix.refine runs but without any resolution limits. Is it still crashing in this case? My guess is that a bug somewhere causes the MD5 to be calculated on the processed data file (with resolution cutoffs applied), so that this "wrong" MD5 record goes into overall_best.pdb and causes an inconsistency when compared with the native 1M50-2.mtz when you start phenix.refine. Again, this is just a guess...

If the MD5 hash of the test set depends on the resolution then certainly I could be in trouble. phenix.autobuild does seem to have a problem passing my resolution limits down to phenix.refine, so that program incorrectly uses all the data in the mtz. In my manual run of phenix.refine I give it the proper resolution limits. That is one reason I want to run some additional refinement before heading off to Coot. Does the resolution limit affect the MD5 hash of the test set?

> 3) Does this work:
>
> a) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk
>    phenix.refine junk_001.pdb 1M50-2.mtz
>
> b) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk xray_data.high_res=2.2 xray_data.low_res=20
>    phenix.refine junk_001.pdb 1M50-2.mtz
>
> c) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk
>    phenix.refine junk_001.pdb 1M50-2.mtz xray_data.high_res=2.2 xray_data.low_res=20

I'll try these but it will take a bit of time.

> Pavel.
>
> PS> I will be away Dec 30 - Jan 1. I will have no email access during this time. Tom is unreachable Jan 1 - 21.
>> Hi Dale,
>>
>> 1) Why do you specify the reflection MTZ file twice in the phenix.refine script?

> I put the mtz in twice because if I put it in once phenix.refine complains that I have no free R flags. It seems to want one file with the amplitudes and another with the flags. Since I have both in the same file I put that file on the line twice and phenix.refine finds everything it needs.

phenix.refine looks for free-R flags in your main data file (1M50-2.mtz). Optionally you can provide a separate file containing free-R flags (I have to write about this in the manual). However, if your 1M50-2.mtz contains free-R flags then you don't need to give it twice. So clearly something is wrong at this step, and we need to find out what before doing anything else. Could you send the output of the command "phenix.mtz.dump 1M50-2.mtz" so I can see what's inside your data file? Or I can debug it myself if you send me the data and model.

> If the MD5 hash of the test set depends on the resolution then certainly I could be in trouble.

No. It must always use the original files before any processing.

> Does the resolution limit affect the MD5 hash of the test set?

No. If it does then it is a very bad bug. I will play with this myself later tonight.

>> 3) Does this work:
>> (...)

> I'll try these but it will take a bit of time.

Don't run it to completion. Just make sure it passes through the processing step.

Pavel.
Requested mtz.dump inserted later in letter.

Pavel Afonine wrote:
>>> Hi Dale,
>>>
>>> 1) Why do you specify the reflection MTZ file twice in the phenix.refine script?

>> I put the mtz in twice because if I put it in once phenix.refine complains that I have no free R flags. It seems to want one file with the amplitudes and another with the flags. Since I have both in the same file I put that file on the line twice and phenix.refine finds everything it needs.

> phenix.refine looks for free-R flags in your main data file (1M50-2.mtz). Optionally you can provide a separate file containing free-R flags (I have to write about this in the manual). However, if your 1M50-2.mtz contains free-R flags then you don't need to give it twice. So clearly something is wrong at this step, and we need to find out what before doing anything else. Could you send the output of the command "phenix.mtz.dump 1M50-2.mtz" so I can see what's inside your data file? Or I can debug it myself if you send me the data and model.
    dale@fluorine [2] phenix.mtz.dump 1M50-2.mtz
    Processing: 1M50-2.mtz
    Title: [No title given]
    Space group symbol from file: P 43 3 2
    Space group number from file: 212
    Space group from matrices: P 43 3 2 (No. 212)
    Point group symbol from file: PG432
    Number of crystals: 2
    Number of Miller indices: 45448
    Resolution range: 119.572 2.14861
    History:
      From FREERFLAG 8/12/2007 00:31:08 with fraction 0.050
      From f2mtz 8/12/2007 00:31:02
      data from CAD on 8/12/07
    Crystal 1:
      Name: HKL_base
      Project: HKL_base
      Id: 1
      Unit cell: (169.1, 169.1, 169.1, 90, 90, 90)
      Number of datasets: 1
      Dataset 1:
        Name: HKL_base
        Id: 0
        Wavelength: 0
        Number of columns: 4
        label        #valid  %valid    min      max type
        H             45448 100.00%   0.00    45.00 H: index h,k,l
        K             45448 100.00%   1.00    78.00 H: index h,k,l
        L             45448 100.00%   0.00    55.00 H: index h,k,l
        FreeR_flag    45448 100.00%   0.00    19.00 I: integer
    Crystal 2:
      Name: allen-2002
      Project: FMO-ct
      Id: 2
      Unit cell: (169.1, 169.1, 169.1, 90, 90, 90)
      Number of datasets: 1
      Dataset 1:
        Name: 1
        Id: 1
        Wavelength: 0
        Number of columns: 2
        label        #valid  %valid    min      max type
        FP            41607  91.55%   0.00 15171.00 F: amplitude
        SIGFP         41607  91.55%   0.00  1716.00 Q: standard deviation
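When reading a listing like the phenix.mtz.dump output above, it can help to tabulate which crystal and dataset each column sits in. The Python sketch below does that for text laid out like Dale's dump. The parser is a hypothetical helper, not a Phenix tool: it assumes the line layout seen in this one listing and recognizes only the column types that appear in it, so other Phenix versions may print something it does not handle.

```python
def columns_by_crystal(dump_text):
    """Map each column label to (crystal name, dataset name)."""
    known_types = ("H:", "I:", "F:", "Q:")  # only the types seen in this listing
    crystal = dataset = None
    in_dataset = False
    result = {}
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("Crystal "):
            in_dataset = False
            crystal = None
        elif line.startswith("Dataset "):
            in_dataset = True
            dataset = None
        elif line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
            if in_dataset:
                dataset = name
            else:
                crystal = name
        else:
            fields = line.split()
            # Column rows look like: label #valid %valid min max TYPE: description
            if len(fields) >= 6 and fields[5] in known_types:
                result[fields[0]] = (crystal, dataset)
    return result


# Abbreviated copy of the dump from the thread.
sample_dump = """\
Crystal 1:
  Name: HKL_base
  Number of datasets: 1
  Dataset 1:
    Name: HKL_base
    label #valid %valid min max type
    H 45448 100.00% 0.00 45.00 H: index h,k,l
    FreeR_flag 45448 100.00% 0.00 19.00 I: integer
Crystal 2:
  Name: allen-2002
  Number of datasets: 1
  Dataset 1:
    Name: 1
    label #valid %valid min max type
    FP 41607 91.55% 0.00 15171.00 F: amplitude
    SIGFP 41607 91.55% 0.00 1716.00 Q: standard deviation
"""

for label, (crystal_name, dataset_name) in columns_by_crystal(sample_dump).items():
    print(f"{label:12s} crystal={crystal_name} dataset={dataset_name}")
```

For this file the table shows FreeR_flag under crystal HKL_base and FP/SIGFP under crystal allen-2002. Whether that placement has anything to do with phenix.refine not finding the flags is not established in this thread.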
Pavel Afonine wrote:
> Hi Dale,
>
> 1) Why do you specify the reflection MTZ file twice in the phenix.refine script?
>
> 2) Try exactly the same AutoBuild and phenix.refine runs but without any resolution limits. Is it still crashing in this case? My guess is that a bug somewhere causes the MD5 to be calculated on the processed data file (with resolution cutoffs applied), so that this "wrong" MD5 record goes into overall_best.pdb and causes an inconsistency when compared with the native 1M50-2.mtz when you start phenix.refine. Again, this is just a guess...
>
> 3) Does this work:
>
> a) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk
>    phenix.refine junk_001.pdb 1M50-2.mtz

These commands run.

> b) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk xray_data.high_res=2.2 xray_data.low_res=20
>    phenix.refine junk_001.pdb 1M50-2.mtz

The result is:

    Processing inputs. This may take a minute or two.
    Sorry: Unknown command line parameter definition: high_res = 2.2

Instead I tried:

    phenix.refine AutoMR_run_4_/MR.1-protein.pdb 1M50-2.mtz output.prefix=junk refinement.main.high_resolution=2.2 refinement.main.low_resolution=20
    phenix.refine junk_001.pdb 1M50-2.mtz

This runs w/o error messages, although the resolution limits are not passed to the second phenix.refine. It uses all the data in the mtz.

> c) phenix.refine MR.1-protein.pdb 1M50-2.mtz output.prefix=junk
>    phenix.refine junk_001.pdb 1M50-2.mtz xray_data.high_res=2.2 xray_data.low_res=20

With the parameter names corrected as in (b), I ran:

    phenix.refine AutoMR_run_4_/MR.1-protein.pdb 1M50-2.mtz output.prefix=junk
    phenix.refine junk_001.pdb 1M50-2.mtz refinement.main.high_resolution=2.2 refinement.main.low_resolution=20

These run w/o error messages. The first run uses all the data while the second uses the restricted resolution limits, as requested.

Given these results I don't know why my attempt to run phenix.refine following autobuild is failing. Then again, I didn't write the thing.

> Pavel.
>
> PS> I will be away Dec 30 - Jan 1. I will have no email access during this time. Tom is unreachable Jan 1 - 21.
Participants (2): Dale Tronrud, Pavel Afonine