[phenixbb] phenix.map_to_model input mtz file failure --caution on using map_to_model with X-ray data

Dale Tronrud detBB at daletronrud.com
Tue Jun 13 16:44:44 PDT 2017


   First we have to agree on exactly what we are talking about.  I
presumed we were talking about real space refinement against an
experimental map such as one gets in cryo-EM.  In that case there are no
Fobs.  Reciprocal space is a fiction and should be avoided.

   If you are working, instead, with a 2Fo-Fc map then just do
reciprocal space refinement.  I don't know of any reason to do
whole-molecule real space refinement when you are working with crystal
diffraction data.  Reciprocal space is where the experiment lives and
the analysis should be done there.

   In cases of model building it is computationally quicker to do a
local real space refinement to touch up a model just so you can see if
it looks reasonable before going back into reciprocal space.  This real
space refinement is quick-and-dirty and any flaws will be erased by the
proper reciprocal space refinement that follows.

   As for "neighborhood correlation" I was thinking of cryo-EM maps.
Since the individual measurements (pictures) are real space in nature, I
can't imagine an experimental error in the voltage of one voxel wouldn't
tend to show up similarly in its neighbors.  The whole group of voxels
will be illuminated by electrons that all had very similar histories
passing through the microscope.

   We have similar situations with diffraction data.  A reflection whose
neighbor is in a shadow has a much higher chance of being shadowed
itself.  Our spots, however, are much further apart on the detector than
the voxels of an EM image.

   There is another type of correlation that is probably more important.
Our diffraction spots are separated enough that you cannot predict the
intensity of a reflection based on its neighbors.  You can make a very
good prediction of the darkness of a voxel based on its neighbors.  If
you leave out one voxel, as a test set member, you could easily deduce
its hidden value without even building a molecular model - just
interpolate.  You can't do that with diffraction data.
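
   A minimal sketch of the interpolation argument, assuming a toy map
built from a few Gaussian blobs that are several voxels wide (the grid
size, blob positions and blob width are arbitrary illustrative choices,
not values from any real experiment).  One interior voxel is treated as
hidden and is estimated as the mean of its six face-sharing neighbors:

```python
import numpy as np

def smooth_map(n=32, sigma=3.0):
    """Toy 3D map: a sum of Gaussian blobs, smooth on the scale of a voxel."""
    grid = np.indices((n, n, n)).astype(float)
    centers = [(10, 12, 14), (20, 18, 15), (16, 22, 9)]  # hypothetical "atoms"
    rho = np.zeros((n, n, n))
    for c in centers:
        d2 = sum((grid[i] - c[i]) ** 2 for i in range(3))
        rho += np.exp(-d2 / (2.0 * sigma ** 2))
    return rho

def predict_from_neighbors(rho, idx):
    """Estimate an interior voxel as the mean of its six face neighbors."""
    i, j, k = idx
    return (rho[i - 1, j, k] + rho[i + 1, j, k] +
            rho[i, j - 1, k] + rho[i, j + 1, k] +
            rho[i, j, k - 1] + rho[i, j, k + 1]) / 6.0

rho = smooth_map()
hidden = (12, 14, 13)                 # the "test set" voxel
true_value = rho[hidden]
guess = predict_from_neighbors(rho, hidden)
rel_error = abs(guess - true_value) / abs(true_value)
print("true %.4f  interpolated %.4f  relative error %.3f"
      % (true_value, guess, rel_error))
```

No molecular model is involved at any point; the estimate comes only
from the smoothness of the map.  How close the estimate is in a real
map depends on the voxel size relative to the resolution.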

   This means if you want to leave out a chunk of map data for a test
set you have to pull out a big enough piece (many contiguous voxels)
that you can't deduce anything about their opaqueness from the remaining
image.  To do this you have to know something about how the microscope
works.
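
   A rough sketch of what such a hold-out could look like, assuming the
test set is a single contiguous box of voxels and that agreement is
scored with a simple map correlation coefficient.  The box size, the
random stand-in maps and the helper names (block_holdout_mask,
map_correlation) are hypothetical choices for illustration; choosing a
box that is really large enough requires knowing the correlation length
of the actual experiment, which this sketch does not attempt:

```python
import numpy as np

def block_holdout_mask(shape, corner, size):
    """Boolean mask that is True inside one contiguous box of voxels."""
    mask = np.zeros(shape, dtype=bool)
    box = tuple(slice(c, c + s) for c, s in zip(corner, size))
    mask[box] = True
    return mask

def map_correlation(a, b, mask):
    """Pearson correlation between two maps over the masked voxels."""
    x = a[mask] - a[mask].mean()
    y = b[mask] - b[mask].mean()
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

rng = np.random.default_rng(0)
shape = (24, 24, 24)
observed = rng.normal(size=shape)                 # stand-in experimental map
model = observed + 0.5 * rng.normal(size=shape)   # stand-in model density

test = block_holdout_mask(shape, corner=(8, 8, 8), size=(6, 6, 6))
work = ~test                                      # voxels a refinement may use

cc_work = map_correlation(observed, model, work)
cc_test = map_correlation(observed, model, test)
print("CC(work) = %.3f   CC(test) = %.3f" % (cc_work, cc_test))
```

In a real test the model would be refined against the work voxels only
and then scored on the held-out box; here the two maps are just random
stand-ins so that the bookkeeping can be run.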

Dale Tronrud

On 6/13/2017 4:08 PM, Edward A. Berry wrote:
>>  To unbias you would have to calculate a
>> new map with current Fcalc's for every iteration of the model, but this
>> method would not take into account the neighborhood correlation present
>> in experimental maps.)
> 
> Thanks, Dale,
> Could you explain this "neighborhood correlation"?
> 
> My very simple (maybe too simple) understanding of how real space would
> bias reflections is as follows:
> 
> You make a map using phases (and Fc?) from the current model and Fobs.
> But you omit the free set.
> 
> Now if you take the Fourier transform of that unmodified map, you would
> get back exactly the coefficients you put in: 2Fo-Fc (?) for the working
> reflections, and zero for the free set.
> 
> Then you make modifications to the model to make its density match as
> nearly as possible the density of the map. If you were able to make the
> density of the model exactly match that of the map, then the Fc for the
> model would be that of the map.
> 
> Of course you can never make the density of the model exactly match that of
> the map - modelization is the severest form of density modification.
> But, to the extent that you make the model's density more nearly like that
> of the map, the Fourier transform of the new model will be more
> like that of the map.
> 
> That means for the working reflections, Fc will get closer to 2Fo-Fc
> which brings them closer to Fo; and the R-work improves.
> (If there is error in the Fobs, that will be reflected in the map,
> and the Fcalc of the model will tend toward these erroneous Fobs
> (fitting the error) and Rwork will get better than it should (bias).)
> Free reflections will move closer to zero, and most likely Rfree
> will get worse.
> 
> I think that's all consistent with what you wrote, but then
> I had the impression that the bias could be prevented by making the
> map with Fc for the test set (proposed in an old paper by Ivan
> Rayment).  That way the free reflections are following the
> process by their coupling to neighboring reflections in reciprocal
> space (neighborhood correlation?), the same way they do in reciprocal
> space refinement, rather than the Fobs being used. The information
> in these free Fcalc is coming from the neighboring working reflections
> due to redundancy of information in a finely sampled molecular transform.
> 
> Ed
> 
> 
> 
> On 06/13/2017 05:40 PM, Dale Tronrud wrote:
>>
>>
>> On 6/13/2017 12:30 PM, Edward A. Berry wrote:
>>> Thanks, Pavel,
>>> I really appreciate your taking the time to generate the example.
>>>
>>> While I agree with Tim and Ian that refinement to convergence should
>>> remove the bias making it perhaps not a serious problem, my question was
>>> in fact whether there is any bias immediately after the refinement.
>>>
>>> I will need to study this example a bit, but one thing I notice is
>>> that you are doing exactly what I was guessing, comparing Rfree
>>> after real-space refinement with and without using the free set.
>>> Then, I still think, we
>>> have to think about how much of that difference results from
>>> bias towards the observed values (when the reflections are included)
>>> and how much is from bias towards zero (when the free set is excluded).
>>>
>>     Of course the model is refined as though the test set Fourier
>> components were equal to zero.  In reciprocal space refinement when you
>> leave a reflection out of the "sum over all reflections" when
>> calculating the difference map you are saying that you have no opinion
>> about the amplitude of that reflection.  When you calculate a real space
>> map from Fourier coefficients you can't not have an opinion,  i.e. you
>> can't leave a term out of the sum you can only set that term to zero.
>> If your model produces a prediction for that term which is not equal to
>> zero it will be penalized.  (If you set that term to Fcalc you tie your
>> model to its starting point.  To unbias you would have to calculate a
>> new map with current Fcalc's for every iteration of the model, but this
>> method would not take into account the neighborhood correlation present
>> in experimental maps.)
>>
>>     What this means is that Rfree is not a meaningful stat for assessing
>> overfitting of real space refinement.  This is hardly a surprise.  A
>> test of a refinement protocol has to be based on the mathematics of that
>> protocol, not the protocol you happened to have used yesterday.  If you
>> want an unbiased estimate of the quality of a real space refinement you
>> have to leave out a region of the map and then see how well the model
>> fits that region.  This is harder to do in an automated fashion and
>> there will be a lot of caveats about your results (e.g. you know about
>> the ability to fit one region but does that generalize to other areas?).
>>   If you recall there are a lot of caveats about Rfree too - we have just
>> stopped worrying about them.  (e.g. low resolution vs. high resolution
>> reflections, choosing based on shells or randomly, what to do about
>> ncs...)
>>
>>     I think you should consider yourself on the wrong track if you come
>> up with a statistical test, but haven't given any thought to the actual
>> experiment that produced your map.
>>
>> Dale Tronrud
>>
>>> Things I need to look at-
>>> What are R and R-free for the original refined model
>>> What are R and R-free after shaking (did RSR lower R but not Rfree, or
>>> did it raise Rfree)?
>>> What if RSR is done using a map made with fill-in strategy?
>>>
>>> Ed
>>>
>>> On 06/13/2017 02:15 PM, Pavel Afonine wrote:
>>>> Hi Ed,
>>>>
>>>> Including free-r reflections into map calculation and then using such
>>>> map in real-space refinement of entire model will affect Rfree. Here
>>>> is a simple example that illustrates my statement, step-by-step:
>>>>
>>>> 1) Get data and model from PDB:
>>>>
>>>> phenix.fetch_pdb 1f8t --mtz
>>>>
>>>> 2) Compute two 2mFo-DFc maps: one includes all reflections the other
>>>> one has no free-r terms:
>>>>
>>>> phenix.python run.py 1f8t.{pdb,mtz}
>>>>
>>>> This will create an MTZ file (map_coeffs.mtz) that contains Fourier
>>>> map coefficients for both maps.
>>>>
>>>> 3) Shake model a bit:
>>>>
>>>> phenix.dynamics 1f8t.pdb number_of_steps=500
>>>>
>>>> 4) Run real-space refinement using two maps:
>>>>
>>>> phenix.real_space_refine map_coeffs.mtz 1f8t_shaken.pdb
>>>> label="work,PHIwork" ncs_constraints=false output.file_name_prefix=work
>>>>
>>>> phenix.real_space_refine map_coeffs.mtz 1f8t_shaken.pdb
>>>> label="all,PHIall" ncs_constraints=false output.file_name_prefix=all
>>>>
>>>> 5) Compute R-factors using data and real-space refined models:
>>>>
>>>> phenix.model_vs_data 1f8t.mtz all_real_space_refined.pdb
>>>>       r_work(re-computed)                : 0.2419
>>>>       r_free(re-computed)                : 0.2441
>>>>
>>>> phenix.model_vs_data 1f8t.mtz work_real_space_refined.pdb
>>>>       r_work(re-computed)                : 0.2444
>>>>       r_free(re-computed)                : 0.2756
>>>>
>>>> The result is self-explanatory and is in line with Tom's reply to Wei.
>>>>
>>>> All files necessary to reproduce calculations above are here:
>>>> http://cci.lbl.gov/~afonine/tmp/
>>>>
>>>> All the best,
>>>> Pavel
>>>>
>>>>
>>>> On 6/8/17 10:05, Tim Gruene wrote:
>>>>> Hi Ed,
>>>>>
>>>>> including the 'free' reflections in the map for modelling does not
>>>>> taint the value of Rfree. That is a misconception that is very
>>>>> persistent (as prejudices usually are). I believe it was Ian Tickle
>>>>> who formulated that when you simply refine long enough towards
>>>>> convergence, all reflections excluded from refinement will become
>>>>> independent, i.e. you can assign a new set for Rfree every time
>>>>> you refine, if you wish so.
>>>>>
>>>>> This concept is the reason why Rcomplete (the "better" equivalent to
>>>>> Rfree for small data sets with < 10,000 unique reflections), introduced
>>>>> by Axel Brunger, works, as we could demonstrate in
>>>>> doi: 10.1073/pnas.1502136112
>>>>>
>>>>> So nothing to worry about when including all reflections in map
>>>>> calculations.
>>>>>
>>>>> Cheers,
>>>>> Tim
>>>>>
>>>>> On Thursday, June 8, 2017 12:42:53 PM CEST Edward A. Berry wrote:
>>>>>> Hi, Tom,
>>>>>> Please forgive what may be a silly question from an outsider who
>>>>>> hasn't really kept up with the crystallography literature or even all
>>>>>> the Phenix newsletters- What is the evidence that including the free
>>>>>> set in real space refinement biases R-free of the resulting model? Is
>>>>>> this Rfree also biased when map coefficients use "fill-in" for the
>>>>>> excluded free reflections (and is that what
>>>>>> phenix.remove_free_from_map does?).
>>>>>>
>>>>>> My point is that literally excluding the free reflections, as opposed
>>>>>> to substituting their values with Fc, will bias the free set toward
>>>>>> grossly incorrect values (namely zero) and therefore greatly worsen
>>>>>> R-free. Thus if the evidence for bias is that you get worse R-free
>>>>>> when you exclude the free set, you have to think about how much of
>>>>>> that difference results from bias towards the observed values (when
>>>>>> the reflections are included) and how much is from bias towards zero
>>>>>> (when the free set is excluded).
>>>>>> (Again, I realize this may be all very well understood by the
>>>>>> crystallography community and properly taken care of in phenix; I'm
>>>>>> just asking for my own information) eab
>>>>>>
>>>>>> On 06/08/2017 07:28 AM, Terwilliger, Thomas Charles wrote:
>>>>>>> Hi Wei,
>>>>>>>
>>>>>>>
>>>>>>> I want to give a word of caution about how to use
>>>>>>> phenix.map_to_model on crystallographic data... The bottom line is
>>>>>>> you should remove the test set from your map coefficients before
>>>>>>> running phenix.map_to_model on X-ray data.  Here is why:
>>>>>>>
>>>>>>>
>>>>>>> phenix.map_to_model uses real-space refinement, which is refinement
>>>>>>> against the map. If you supply map coefficients that include your
>>>>>>> test reflections, then you will be refining against data that is in
>>>>>>> your test set.   This will make your Rfree invalid when you go back
>>>>>>> and refine your model against the original crystallographic data.
>>>>>>>
>>>>>>>
>>>>>>> To remove the test set from your map coefficients you can use:
>>>>>>>
>>>>>>>
>>>>>>> phenix.remove_free_from_map  map_coeffs=my_map_coeffs.mtz
>>>>>>> free_in=my_data_file_with_freeR_flags.mtz
>>>>>>> mtz_out=my_map_coeffs_no_free.mtz
>>>>>>>
>>>>>>>
>>>>>>> Also note that phenix.map_to_model uses a fixed map (it does not do
>>>>>>> density modification).  Consequently for most crystallographic data
>>>>>>> at moderate resolution or higher phenix.autobuild is going to do
>>>>>>> much better than phenix.map_to_model.
>>>>>>>
>>>>>>>
>>>>>>> All the best,
>>>>>>>
>>>>>>> Tom T
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------
>>>>>>> *From:* dingding830106 at 163.com <dingding830106 at 163.com> on
>>>>>>> behalf of dancingdream at 163.com <dancingdream at 163.com>
>>>>>>> *Sent:* Tuesday, June 6, 2017 9:16 PM
>>>>>>> *To:* Terwilliger, Thomas Charles
>>>>>>> *Cc:* phenixbb at phenix-online.org
>>>>>>> *Subject:* Re:Re: [phenixbb] phenix.map_to_model input mtz file
>>>>>>> failure
>>>>>>> Dear Thomas,
>>>>>>> I used CAD to convert the labels from FDM->FWT, PHIDM->PHFWT, then
>>>>>>> submitted this job again (without map_coeffs_labels=... ), and
>>>>>>> everything seems ok.
>>>>>>> Thank you very much for your help.
>>>>>>> Best!
>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> Wei Ding
>>>>>>> P.O.Box 603
>>>>>>> The Institute of Physics,Chinese Academy of Sciences
>>>>>>> Beijing,China
>>>>>>> 100190
>>>>>>> Tel: +86-10-82649083
>>>>>>>
>>>>>>> E-mail:dingwei at iphy.ac.cn  <mailto:wangli at moon.ibp.ac.cn>
>>>>>>>
>>>>>>> At 2017-06-07 10:32:14, "Terwilliger, Thomas Charles"
>>>>>>> <terwilliger at lanl.gov>  wrote:
>>>>>>>       Hi Wei,
>>>>>>>
>>>>>>>
>>>>>>>       I'm sorry for the trouble!
>>>>>>>
>>>>>>>
>>>>>>>       If you supply an MTZ file that has FWT,PHFWT or similar labels,
>>>>>>>       then you can skip the "labels=...." statement and it should run.
>>>>>>>
>>>>>>>
>>>>>>>       Let me know if that does not work!
>>>>>>>       All the best,
>>>>>>>
>>>>>>>       Tom T
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>       ----------
>>>>>>>       *From:* phenixbb-bounces at phenix-online.org
>>>>>>>       <mailto:phenixbb-bounces at phenix-online.org> on behalf of
>>>>>>>       dancingdream at 163.com <mailto:dancingdream at 163.com>
>>>>>>>       *Sent:* Tuesday, June 6, 2017 8:19 PM
>>>>>>>       *To:* phenixbb at phenix-online.org
>>>>>>>       <mailto:phenixbb at phenix-online.org>
>>>>>>>       *Subject:* [phenixbb] phenix.map_to_model input mtz file failure
>>>>>>>       Dear Phenix bb,
>>>>>>>       I intend to build an initial model with phenix.map_to_model.
>>>>>>>       The command line is as follows:
>>>>>>>
>>>>>>>       phenix.map_to_model_1.12rc0-2787
>>>>>>>       map_coeffs_file=../rep_dm.mtz
>>>>>>>       map_coeffs_labels="'FP,SIGFP' 'PHIDM' 'FOMDM'"
>>>>>>>       seq_file=../resolve.seq  is_crystal=True
>>>>>>>       use_sg_symmetry=True  density_select=False
>>>>>>>       truncate_at_d_min=True
>>>>>>>
>>>>>>>       and the feedback is like this:
>>>>>>>
>>>>>>>       Sorry: No initial assignment made for map_coeffs. Labels used:
>>>>>>>       FP,SIGFP PHIDM FOMDM. Available labels: ['PHIB', 'FOM',
>>>>>>>       'HLA,HLB,HLC,HLD', 'FP,SIGFP', 'PHIDM', 'FOMDM', 'FDM',
>>>>>>>       'HLADM,HLBDM,HLCDM,HLDDM'] NOTE: grouped labels like 'FP,SIGFP'
>>>>>>>       must stay together, have commas, and have no spaces. If they come
>>>>>>>       from an MTZ file, they must be in adjacent columns as well.
>>>>>>>       Suggested labels to use:  PHIDM  FOMDM
>>>>>>>
>>>>>>>       I tried many other input formats for map_coeffs_labels, such as
>>>>>>>       map_coeffs_labels="FP,SIGFP PHIDM FOMDM"
>>>>>>>       map_coeffs_labels=["FP,SIGFP PHIDM FOMDM"]
>>>>>>>       ... ...
>>>>>>>       but the result is the same. Can anyone tell me how to fix this
>>>>>>>       problem? Thanks a lot.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>       --
>>>>>>>       Wei Ding
>>>>>>>       P.O.Box 603
>>>>>>>       The Institute of Physics,Chinese Academy of Sciences
>>>>>>>       Beijing,China
>>>>>>>       100190
>>>>>>>       Tel: +86-10-82649083
>>>>>>>       E-mail:dingwei at iphy.ac.cn  <mailto:wangli at moon.ibp.ac.cn>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> phenixbb mailing list
>>>>>>> phenixbb at phenix-online.org
>>>>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>>>>> Unsubscribe:phenixbb-leave at phenix-online.org