Hi Jenn,
1) The B factors are very high (300-400). The Wilson B calculated by phenix.xtriage is ~200. Is this something I should be concerned about?
Probably not (too much). It's typical for low resolution structures. Also, Wilson B is an estimate that becomes less and less reliable as you go towards low resolution.
2) What is the expected (or 'acceptable') gap between R and Rfree at this resolution? (Current R = 27%, Rfree = 33%.) Does this sound reasonable?
Looks more or less like what one can expect at this resolution... Also:

  phenix.r_factor_statistics 4.4 left_offset=0.5 right_offset=0.5 n_bins=5

  Histogram of Rwork for models in PDB at resolution 3.90-4.90 A:
     0.198 - 0.257 : 26
     0.257 - 0.315 : 55
     0.315 - 0.374 : 23 <<< your structure
     0.374 - 0.433 : 10
     0.433 - 0.492 : 4
  Histogram of Rfree for models in PDB at resolution 3.90-4.90 A:
     0.226 - 0.280 : 17 <<< your structure
     0.280 - 0.335 : 48
     0.335 - 0.389 : 36
     0.389 - 0.444 : 10
     0.444 - 0.498 : 7
  Histogram of Rfree-Rwork for all model in PDB at resolution 3.90-4.90 A:
     0.001 - 0.019 : 23
     0.019 - 0.037 : 31
     0.037 - 0.054 : 38
     0.054 - 0.072 : 19 <<< your structure
     0.072 - 0.090 : 7
  Number of structures considered: 118

So, you are not alone -:)
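
To make the comparison concrete, here is a minimal Python sketch that places the quoted values (R = 27%, Rfree = 33%) in the Rfree-Rwork histogram above. The bin edges and counts are copied from that output; the function name locate_gap and the "cumulative fraction" summary are only illustrative and are not part of any Phenix tool.

```python
# Bin edges and counts copied from the Rfree-Rwork histogram above
# (118 PDB entries at 3.90-4.90 A). Each tuple is (low, high, count).
gap_bins = [
    (0.001, 0.019, 23),
    (0.019, 0.037, 31),
    (0.037, 0.054, 38),
    (0.054, 0.072, 19),
    (0.072, 0.090, 7),
]


def locate_gap(r_work, r_free, bins):
    """Return (gap, bin index, fraction of entries in that bin or a lower one).

    The bin index and fraction are None if the gap lies outside all bins.
    """
    gap = r_free - r_work
    total = sum(count for _, _, count in bins)
    for i, (low, high, _) in enumerate(bins):
        if low <= gap < high:
            at_or_below = sum(count for _, _, count in bins[: i + 1])
            return gap, i, at_or_below / total
    return gap, None, None


gap, index, fraction = locate_gap(0.27, 0.33, gap_bins)
print(f"gap = {gap:.3f}, bin = {index}, cumulative fraction = {fraction:.2f}")
# prints: gap = 0.060, bin = 3, cumulative fraction = 0.94
```

So a gap of 0.06 falls in the 0.054 - 0.072 bin: on the wide side, but shared with 19 of the 118 deposited models in this resolution range.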
3) It has been suggested to me that I should try adding riding hydrogens during the last round of refinement (to help with geometry). Is this something I should do?
Try: it may or may not help. I'm working on a better implementation of handling H atoms that, once ready, will always help. Make sure to use "contribute_to_f_calc=false".
4) What are the 'acceptable' coordinate and phase errors for structures at this resolution? (They are now 1.4Å and 36˚.)
These are estimates and should not be taken too literally (for discussion, see R.Read, Methods in Enzymol.).
5) I have been using secondary structure restraints during the refinement, which seems to work reasonably well. I also tried refinement using reference model (the structure of the individual components). But refining with reference models seems to result in high RMSD bonds/angles. Is there something I'm missing?
If NCS is available - use it. If you have many Ramachandran plot outliers - try Ramachandran plot restraints. If using reference model restraints, then the key parameter to play with is "limit". In my recent experience changing it from the default 15 to 180 dropped both Rwork and Rfree by 4%, but you need to try different values before you know. Also, make sure you run as many macro-cycles as necessary to achieve convergence (at low resolution it is slower); in the example I mentioned above I had to do 10 macro-cycles.

Pavel.
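
A minimal sketch of how the "limit" scan suggested above could be scripted. It only builds and prints phenix.refine command lines; it does not run them. The file names are hypothetical, the intermediate limit values (30, 60, 90) are arbitrary, and the parameter spellings (reference_model.enabled, reference_model.file, reference_model.limit, main.number_of_macro_cycles, output.prefix) are assumptions that should be checked against the phenix.refine documentation for your installed version.

```python
import shlex


def build_refine_commands(model, data, reference, limits, macro_cycles=10):
    """Build one phenix.refine command line per reference-model 'limit' value.

    Parameter names are assumed, not verified against a specific Phenix
    release; check them before use.
    """
    commands = []
    for limit in limits:
        args = [
            "phenix.refine",
            model,
            data,
            "reference_model.enabled=True",
            f"reference_model.file={reference}",
            f"reference_model.limit={limit}",
            f"main.number_of_macro_cycles={macro_cycles}",
            f"output.prefix=limit_{limit}",  # keep each run's output separate
        ]
        commands.append(shlex.join(args))
    return commands


# Hypothetical input files; 15 is the default and 180 the value quoted above.
for command in build_refine_commands(
    "model.pdb", "data.mtz", "reference.pdb", limits=[15, 30, 60, 90, 180]
):
    print(command)
```

Comparing Rwork, Rfree and the bond/angle RMSDs across the resulting runs shows which value, if any, helps for a given data set.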