Memory and CPU usage
Dear community,

I'm planning on buying a new MacBook for my first postdoc venture abroad, and I'm currently trying to decide what amount of RAM & CPU to choose. My question is a little bit intricate: is there any rule of thumb to estimate the needed RAM versus the number of input atoms for any particular refinement type? I'm planning on tackling some big (18,000 residues/asu) refinements, but I don't have any estimate of how many GBs of RAM they might require. The maximum amount of RAM I can install is 16GB, but it is a non-standard configuration that might void Apple's warranty.

About the CPU: how scalable are typical phenix processes? Would it be sensible to invest in a quad-core machine with HT? In this particular case, and since HT would present 8 logical cores, would I get any speedup from launching phenix tasks configured for 8 processors instead of 4?

Thanks in advance!

Jon
On Sat, Jul 21, 2012 at 3:05 AM, Jon Agirre
I'm planning on buying a new MacBook for my first postdoc venture abroad, and I'm currently trying to decide what amount of RAM & CPU to choose. My question is a little bit intricate: is there any rule of thumb to estimate the needed RAM versus the number of input atoms for any particular refinement type?
We do not have any exact rule of thumb, but it is mostly dependent on the resolution and size of the unit cell, rather than the number of atoms. You can blame crystal symmetry for this, and the fact that our FFTs are done in P1.

Assuming for the moment that you have 90x90x90 unit cell edges, you can calculate the approximate memory size of an FFT'd map using this formula:

    map_size = 8 * a * b * c * (d_min/3)^3

So if you are lucky and it crystallizes in P1 with no NCS, it will have much less overhead than if it is (for example) P622 with 3-fold NCS. This still doesn't tell you exactly how much memory the overall program will use, however.

One caution I have is that you can cut down the actual memory usage by being careful with the phenix.refine parameters - in particular, the fill_missing_f_obs feature of the output maps takes up a lot of extra memory, so disable this if you're worried about exceeding the memory limit.
I'm planning on tackling some big (18,000 residues/asu) refinements, but I don't have any estimate of how many GBs of RAM they might require. The maximum amount of RAM I can install is 16GB, but it is a non-standard configuration that might void Apple's warranty.
My instinct is that you probably want to go with the maximum amount of memory just on general principle, but we certainly don't want to encourage anyone to void their hardware warranty. I would check with Apple on this.
About the CPU: how scalable are typical phenix processes? Would it be sensible to invest in a quad-core machine with HT? In this particular case, and since HT would present 8 logical cores, would I get any speedup from launching phenix tasks configured for 8 processors instead of 4?
The HT speedup only really helps for genuinely threaded processes, so the OpenMP FFT (or OpenMP in Phaser) *might* improve it a little bit, but in our experience the OpenMP FFT in phenix.refine is not very effective at reducing overall runtime anyway, certainly much less so than the parallelization of the weight optimization. (Also, you can't use the GUI if you compile with OpenMP.)

Here is a quick summary of the parallelization supported for the default installation:

    AutoBuild: up to 5 cores for building, or unlimited for composite omit map calculation
    LigandFit: up to 7 cores (I think)
    phaser.MRage: unlimited cores
    MR-Rosetta: unlimited cores (Linux/Mac only)
    phenix.refine: up to 18 cores when weight optimization is used (Linux/Mac only)
    phenix.den_refine: up to 30 cores (Linux/Mac only)

I do think getting 4 cores instead of just 2, regardless of hyperthreading, is a good idea if you can afford it. A secondary problem, however, is that these processes will eventually create their own memory segments, so if you're constrained by physical memory, the degree to which you can take advantage of multiple cores will be limited. (OpenMP, in contrast, does not have this problem.)

-Nat
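(For example, the parallel weight optimization can be requested from the command line roughly like this - a sketch, with parameter names assumed from phenix.refine's documented options; check phenix.refine --show-defaults for your installed version:

    phenix.refine model.pdb data.mtz optimize_xyz_weight=true optimize_adp_weight=true nproc=4

Each of the nproc workers is a separate process, so the memory caveat above applies.)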
Hi Nat,
Assuming for the moment that you have 90x90x90 unit cell edges, you can calculate the approximate memory size of an FFT'd map using this formula:
map_size = 8 * a * b * c * (d_min/3)^3
I see (d_min/3) in your formula above as a grid step factor. Most map calculations in phenix.refine and the tools around it use (d_min/4). Putting this into your formula

    map_size = 8 * a * b * c * (d_min/4)^3

will obviously result in a smaller map, which isn't true. Am I missing something?

Pavel
On Sat, Jul 21, 2012 at 9:43 AM, Pavel Afonine
I see (d_min/3) in your formula above as a grid step factor. Most map calculations in phenix.refine and the tools around it use (d_min/4). Putting this into your formula
map_size = 8 * a * b * c * (d_min/4)^3
will obviously result in a smaller map, which isn't true. Am I missing something?
No, I am - the final multiply operation should be a divide:

    map_size = 8 * a * b * c / (d_min * resolution_factor)^3

But the resolution_factor is inconsistent - for the FFT structure factors calculation (which is unavoidable), we are definitely using 1/3 (I assume for speed reasons). For most of the other optional tasks like rotamer correction and filling missing F-obs, it's 1/4.

-Nat
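(As a sanity check, the corrected formula is easy to evaluate - a minimal sketch in Python, where the factor of 8 assumes double-precision grid values and the 90x90x90 cell is the example from earlier in the thread:

    def map_size_bytes(a, b, c, d_min, resolution_factor=1.0/3):
        # Grid spacing is d_min * resolution_factor, e.g. d_min/3.
        grid_step = d_min * resolution_factor
        # Total number of grid points in the P1 box.
        n_points = (a / grid_step) * (b / grid_step) * (c / grid_step)
        # 8 bytes per double-precision map value (an assumption).
        return 8 * n_points

    print(map_size_bytes(90, 90, 90, 2.0) / 1e6)        # ~19.7 MB at d_min/3
    print(map_size_bytes(90, 90, 90, 2.0, 0.25) / 1e6)  # ~46.7 MB at d_min/4

So a single map is modest; it is the number of maps and working copies held at once that adds up.)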
But the resolution_factor is inconsistent - for the FFT structure factors calculation (which is unavoidable), we are definitely using 1/3 (I assume for speed reasons). For most of the other optional tasks like rotamer correction and filling missing F-obs, it's 1/4.
Yes, we use 1/3 for structure factor and gradient calculations, and 1/4 in map calculation if the map is going to be used for things like water picking, real-space refinement, etc. This is intentional.

Pavel
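(A quick aside on the memory cost of that choice: a map sampled at d_min/4 holds (4/3)^3 ≈ 2.37 times as many grid points - and bytes - as one sampled at d_min/3, consistent with Nat's earlier note that the optional fill_missing_f_obs maps take a lot of extra memory.)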
Dear Nat and Pavel,
thank you so much for your explanations. Assuming that the 8 in Nat's formula provides the conversion to bytes, it is not such a big RAM requirement. I guess most virus structures should be approachable with an 8GB machine.

About the CPU, I think I'm going to invest in a quad-core. I feel quite comfortable on the command line and I don't fear parallelizing existing code.

Thanks again,

Jon
2012/7/21 Pavel Afonine
But the resolution_factor is inconsistent - for the FFT structure factors calculation (which is unavoidable), we are definitely using 1/3 (I assume for speed reasons). For most of the other optional tasks like rotamer correction and filling missing F-obs, it's 1/4.
Yes, we use 1/3 for structure factor and gradient calculations, and 1/4 in map calculation if the map is going to be used for things like water picking, real-space refinement, etc. This is intentional.
Pavel
-- Jon Agirre, PhD Biophysics Unit (CSIC-UPV/EHU) http://www.ehu.es/jon.agirre +34656756888
What are the current best practices for modified amino acids? I have a peptide with an N-terminal acetyl-proline which I am trying to model. The initial approach was to locate the amino acid in the chemical components dictionary (N7P) and replace the residue in coot. Unfortunately the link to the next residue is not recognized, so the residue is just floating about. Next I just added an acetyl next to the proline and added some restraints in the .def file. That sort of works, but looks a bit kludgy. Is there a way to do this more cleanly?

Cheers,

Carsten
Carsten
phenix.refine will automatically link a non-standard amino acid into the chain if the main-chain atoms are named in the same fashion as the standard, i.e. C, CA, N, O. Unfortunately, the PDB decided to name the N7P in a "non-compliant" fashion. You have a couple of choices.

1. Use the cif link facility to link them.

    phenix.ligand_linking model.pdb

may provide this for you, but I'd appreciate a copy of the three residues involved to double-check the validity of the links.

2. Rename the atoms in your model to the standard names for the main chain (and ideally change the residue name to something else) and run

    phenix.elbow model.pdb --residue=NWM

to get a new cif file for the new residue. eLBOW attempts to match the main-chain restraints in the other restraints files if it detects that the residue is an amino acid.
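(If neither route works immediately, the hand-edited restraints Carsten mentions can be expressed more cleanly as a phenix.refine geometry edits file - a sketch only, where the chain/resseq selections and the 1.33 A ideal amide C-N distance are assumptions to adapt to the actual model:

    refinement.geometry_restraints.edits {
      bond {
        action = *add
        atom_selection_1 = chain A and resseq 1 and name C
        atom_selection_2 = chain A and resseq 2 and name N
        distance_ideal = 1.33
        sigma = 0.02
      }
    }

)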
Cheers
Nigel
On Fri, Jul 27, 2012 at 10:43 AM, Schubert, Carsten [JRDUS]
What are the current best practices for modified amino acids? I have a peptide with an N-terminal acetyl-proline which I am trying to model. The initial approach was to locate the amino acid in the chemical components dictionary (N7P) and replace the residue in coot. Unfortunately the link to the next residue is not recognized, so the residue is just floating about. Next I just added an acetyl next to the proline and added some restraints in the .def file. That sort of works, but looks a bit kludgy. Is there a way to do this more cleanly?
Cheers,
Carsten
-- Nigel W. Moriarty Building 64R0246B, Physical Biosciences Division Lawrence Berkeley National Laboratory Berkeley, CA 94720-8235 Phone : 510-486-5709 Email : [email protected] Fax : 510-486-5909 Web : CCI.LBL.gov
On 27/07/12 13:43, Schubert, Carsten [JRDUS] wrote:
What are the current best practices for modified amino acids? I have a peptide with an N-terminal acetyl-proline which I am trying to model. The initial approach was to locate the amino acid in the chemical components dictionary (N7P) and replace the residue in coot.
Do the restraints specify N7P as a non-polymer (in which case coot will treat it as such) or as an L-peptide?
Unfortunately the link to the next residue is not recognized, so the residue is just floating about.
Doesn't sound very much like how an L-peptide should behave to me...

Paul.
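(The distinction Paul is asking about is carried in the _chem_comp header of the restraints cif. A sketch of the relevant loop, with the field layout assumed from the CCP4 monomer library convention and illustrative values:

    loop_
    _chem_comp.id
    _chem_comp.three_letter_code
    _chem_comp.name
    _chem_comp.group
    N7P N7P 'acetyl-proline' L-peptide

If group reads non-polymer rather than L-peptide, coot will treat the residue as a free ligand and skip the peptide link.)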
Bump.
I have an S-methylcysteine sulfoxide, so I downloaded it from WebCSD on the CCDC website. So far so good. It is available in three formats, but I wanted it to work as a modified residue, so I followed the advice here: opened it in PyMOL, saved it as a pdb file and edited the atoms in a text editor to fit with a normal residue. But phenix.elbow doesn't treat the sulfur correctly. The geometry is right, but it adds an extra hydrogen to the sulfur and to the oxygen - those two atoms should be joined by a double bond instead. If I run phenix.elbow directly on the files downloaded from WebCSD I get the double bond, but I also get a flat system, as if my sulfur were a carbon.

Yes, I tried --opt.

I've attached the best results.

My phenix version is 1299.
tl;dr phenix.elbow fails at parameterising DMSO.
Help?
Cheers,
Morten
On 29 July 2012 04:41, Paul Emsley
On 27/07/12 13:43, Schubert, Carsten [JRDUS] wrote:
What are the current best practices for modified amino acids? I have a peptide with an N-terminal acetyl-proline which I am trying to model. The initial approach was to locate the amino acid in the chemical components dictionary (N7P) and replace the residue in coot.
Do the restraints specify N7P as a non-polymer (in which case coot will treat it as such) or as an L-peptide?
Unfortunately the link to the next residue is not recognized, so the residue is just floating about.
Doesn't sound very much like how an L-peptide should behave to me...
Paul.
-- Morten K Grøftehauge, PhD Pohl Group Durham University
Morten

It is always best not to use the PDB format as input if another is available, as it suffers from a number of limitations. However, I am surprised that the other formats didn't provide a better result. I'd be interested in the 3 files you downloaded.

To solve your problem, one of the best ways to get restraints is with the help of the Chemical Components. Your ligand, CYM, is specified by the PDB and you can use eLBOW to create a restraints file:

    phenix.elbow --chemical-component=cym

and it's available in the GUI also.

Cheers

Nigel

On Tue, Mar 5, 2013 at 6:16 AM, Morten Groftehauge <[email protected]> wrote:
Bump.
I have an S-methylcysteine sulfoxide, so I downloaded it from WebCSD on the CCDC website. So far so good. It is available in three formats, but I wanted it to work as a modified residue, so I followed the advice here: opened it in PyMOL, saved it as a pdb file and edited the atoms in a text editor to fit with a normal residue. But phenix.elbow doesn't treat the sulfur correctly. The geometry is right, but it adds an extra hydrogen to the sulfur and to the oxygen - those two atoms should be joined by a double bond instead. If I run phenix.elbow directly on the files downloaded from WebCSD I get the double bond, but I also get a flat system, as if my sulfur were a carbon. Yes, I tried --opt. I've attached the best results. My phenix version is 1299.
tl;dr phenix.elbow fails at parameterising DMSO.
Help?
Cheers, Morten
On 29 July 2012 04:41, Paul Emsley wrote:

On 27/07/12 13:43, Schubert, Carsten [JRDUS] wrote:
What are the current best practices for modified amino acids? I have a peptide with an N-terminal acetyl-proline which I am trying to model. The initial approach was to locate the amino acid in the chemical components dictionary (N7P) and replace the residue in coot.
Do the restraints specify N7P as a non-polymer (in which case coot will treat it as such) or as an L-peptide?
Unfortunately the link to the next residue is not recognized, so the residue is just floating about.
Doesn't sound very much like how an L-peptide should behave to me...
Paul.
-- Morten K Grøftehauge, PhD Pohl Group Durham University
-- Nigel W. Moriarty Building 64R0246B, Physical Biosciences Division Lawrence Berkeley National Laboratory Berkeley, CA 94720-8235 Phone : 510-486-5709 Email : [email protected] Fax : 510-486-5909 Web : CCI.LBL.gov
Hi Jon,

you can find the answer without wasting time on guesswork... Take a few extreme cases from the PDB (big model, many reflections, tricky space group), and run them from the command line:

    phenix.refine model.pdb data.mtz ordered_solvent=true --show-process-info

The log file should contain memory usage throughout the run. Look for the max memory intake in the last record (towards the end of the log file). This will give you an idea about how much memory you may need.

Pavel

On 7/22/12 4:31 AM, Jon Agirre wrote:
Dear Nat and Pavel,
thank you so much for your explanations. Assuming that the 8 in Nat's formula provides the conversion to bytes, it is not such a big RAM requirement. I guess most virus structures should be approachable with an 8GB machine.

About the CPU, I think I'm going to invest in a quad-core. I feel quite comfortable on the command line and I don't fear parallelizing existing code.
Thanks again,
Jon
2012/7/21 Pavel Afonine

But the resolution_factor is inconsistent - for the FFT structure factors calculation (which is unavoidable), we are definitely using 1/3 (I assume for speed reasons). For most of the other optional tasks like rotamer correction and filling missing F-obs, it's 1/4.
Yes, we use 1/3 for structure factor and gradient calculations, and 1/4 in map calculation if the map is going to be used for things like water picking, real-space refinement, etc. This is intentional.
Pavel
-- Jon Agirre, PhD Biophysics Unit (CSIC-UPV/EHU) http://www.ehu.es/jon.agirre +34656756888
participants (7)
- Jon Agirre
- Morten Groftehauge
- Nathaniel Echols
- Nigel Moriarty
- Paul Emsley
- Pavel Afonine
- Schubert, Carsten [JRDUS]