[phenixbb] how to remove hydrogen from pdb

Edward A. Berry BerryE at upstate.edu
Thu Feb 6 13:07:23 PST 2014


For old or new-style pdb files, try:

  awk '$1!~/ATOM|HETATM/ || $3!~/^H/'  old.pdb > new.pdb

But I understand some new H-atom names don't start with H,
in which case this won't work.
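If the file fills in the element symbol (columns 77-78, right-justified), you can test that column instead of the atom name and avoid the naming problem altogether. Here is a minimal sketch that assumes a reasonably standard PDB file. strip_h_by_element is a hypothetical helper name, not an existing tool. It also drops ANISOU records for hydrogens, since they carry the same element column. Any record with a blank element field is kept, so on a file without element symbols (like the second m.pdb quoted below) nothing is removed.

```shell
# strip_h_by_element: hypothetical helper, not part of any package.
# Drops ATOM/HETATM/ANISOU records whose element symbol (columns 77-78)
# is H; every other line, including records with a blank element
# field, is printed unchanged.
strip_h_by_element() {
  awk '{
    rec  = substr($0, 1, 6)      # record name, columns 1-6
    elem = substr($0, 77, 2)     # element symbol, columns 77-78
    gsub(/ /, "", elem)          # right-justified " H" becomes "H"
    if ((rec == "ATOM  " || rec == "HETATM" || rec == "ANISOU") && elem == "H")
      next                       # skip hydrogen records
    print
  }' "$1"
}
```

Usage: strip_h_by_element old.pdb > new.pdb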

And if the atom serial number is large enough (five digits) to fill the second
field and fuse with HETATM in the first field, the first awk command won't work
unless you define fixed field widths:

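  # FIELDWIDTHS is a GNU awk (gawk) extension; fixed-width fields keep
  # their blanks, so the atom-name test must allow leading spaces.
  gawk '$1!~/ATOM|HETATM/ || $3!~/^ *H/' \
  FIELDWIDTHS="6 5 5  4  2 4 4 8 8 8 6 6" \
  old.pdb > new.pdb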
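FIELDWIDTHS is specific to gawk. You can express the same fixed-column idea with substr(), which POSIX awk also provides, so it should run under mawk or BSD awk too. Here is a minimal sketch, where strip_h_by_name is a hypothetical helper name. Like the commands above, it relies on the atom name starting with H. Names that don't start with H are therefore missed, and a mercury atom named HG would be removed as well.

```shell
# strip_h_by_name: hypothetical helper, not part of any package.
# Same fixed-column idea as the FIELDWIDTHS version, using POSIX substr().
strip_h_by_name() {
  awk '{
    rec  = substr($0, 1, 6)      # record name, columns 1-6
    name = substr($0, 13, 4)     # atom name, columns 13-16
    # A five-digit serial in columns 7-11 cannot disturb these columns.
    if ((rec == "ATOM  " || rec == "HETATM") && name ~ /^ *H/)
      next                       # atom name starts with H: skip it
    print
  }' "$1"
}
```

Usage: strip_h_by_name old.pdb > new.pdb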

Maybe a specialized tool like phenix.refine is safest!
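Even with the quoting sorted out, the egrep pattern quoted below still fails, for a precedence reason. In an extended regular expression, | binds more loosely than ^ and $. So '^ATOM|HETATM.*H$' means "starts with ATOM" or "HETATM ... ending in H". Every ATOM line matches and is deleted, which is why only END survived in Pavel's test. Below is a sketch of a grouped pattern that also pins the element symbol to columns 77-78, so an element such as TH is not caught by a trailing H. It uses grep -E because recent GNU grep warns that egrep is obsolescent. strip_h_grep is a hypothetical helper name. It only removes lines from files that fill in the element column.

```shell
# strip_h_grep: hypothetical helper, not part of any package.
# Pattern pieces:
#   ^(ATOM  |HETATM)  record name, columns 1-6 (grouped, so | stays inside)
#   .{70}             columns 7-76, any content
#   " H"              element symbol H, right-justified in columns 77-78
#   " *$"             optional trailing blanks
strip_h_grep() {
  grep -E -v '^(ATOM  |HETATM).{70} H *$' "$1"
}
```

Usage: strip_h_grep m.pdb > m_noH.pdb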

Pavel Afonine wrote:
> Hi Phil,
>
> ok, now I tried this:
>
> egrep -v '^ATOM|HETATM.*H$' m.pdb > m_noH.pdb
>
> Input (m.pdb - file right from PDB):
>
> ATOM      1  N   GLY A   1       0.504  -0.494   0.924  1.00  7.85           N
> ATOM      2  CA  GLY A   1       1.272   0.589   0.277  1.00  6.79           C
> ATOM      3  C   GLY A   1       1.700   1.614   1.301  1.00  5.59           C
> ATOM      4  O   GLY A   1       1.434   1.460   2.496  1.00  6.04           O
> ATOM      0  H1  GLY A   1       0.408  -1.171   0.354  1.00  7.85           H
> ATOM      0  H2  GLY A   1       0.939  -0.775   1.648  1.00  7.85           H
> ATOM      0  H3  GLY A   1      -0.298  -0.189   1.160  1.00  7.85           H
> ATOM      0  HA2 GLY A   1       2.052   0.220  -0.166  1.00  6.79           H
> ATOM      0  HA3 GLY A   1       0.731   1.013  -0.407  1.00  6.79           H
> END
>
> Output (m_noH.pdb):
>
> END
>
> Just to be clear: neither of the two commands suggested so far worked on a
> valid PDB file (above), so I thought it might be useful to point this out.
>
> All the best,
> Pavel
>
>
> On 2/6/14, 12:21 PM, Phil Jeffrey wrote:
>> So I guess your contribution at this point in the thread is just to be
>> as difficult as possible?  I dare say if you use it on PROTIN format
>> it won't work either.
>>
>> Try it on output from a program that actually writes something conforming
>> to the PDB standard, with the element column at the end, not something
>> that almost conforms to the standard, with non-distinct index numbers and
>> apparently missing spaces on the GLY:N atom.
>>
>> I would hope, modulo the usual list of bugs, that phenix.refine
>> actually writes out something more closely resembling the correct
>> format, in which case Tim's regular expression would actually work.
>>
>> From the original post:
>> >>  Actually I refine my structure with phenix along with all hydrogens now
>>
>> Phil Jeffrey
>> Princeton
>>
>> On 2/6/14 3:09 PM, Pavel Afonine wrote:
>>> Thanks Phil,
>>>
>>> did this:
>>>
>>> egrep -v '^ATOM|HETATM.*H$' m.pdb > m_noH.pdb
>>>
>>> Result:
>>>
>>> in input file (m.pdb) I have:
>>>
>>> ATOM      1  N GLY A   1       0.504  -0.494   0.924  1.00  7.85
>>> ATOM      2  CA  GLY A   1       1.272   0.589   0.277  1.00 6.79
>>> ATOM      3  C   GLY A   1       1.700   1.614   1.301  1.00 5.59
>>> ATOM      4  O   GLY A   1       1.434   1.460   2.496  1.00 6.04
>>> ATOM      0  H1  GLY A   1       0.452  -1.280   0.308  1.00 7.85
>>> ATOM      0  H2  GLY A   1       0.959  -0.765   1.772  1.00 7.85
>>> ATOM      0  H3  GLY A   1      -0.420  -0.171   1.131  1.00 7.85
>>> ATOM      0  HA2 GLY A   1       2.157   0.171  -0.225  1.00 6.79
>>> ATOM      0  HA3 GLY A   1       0.659   1.070  -0.499  1.00 6.79
>>> END
>>>
>>> Output file (m_noH.pdb) contains only:
>>>
>>> END
>>>
>>> Pavel
>>>
>>> On 2/6/14, 12:03 PM, Phil Jeffrey wrote:
>>>> Of course, because in the shells that I use it will attempt to do
>>>> variable name substitution in strings that are double-quoted. (I make
>>>> no warranties about all possible shells).  However if you use single
>>>> quotes:
>>>>
>>>> egrep  -v '^ATOM|HETATM.*H$' your.pdb > your_noH.pdb
>>>>
>>>> Should work just fine in tcsh, csh at the very least.
>>>>
>>>> Phil
>>>>
>>>> On 2/6/14 2:52 PM, Pavel Afonine wrote:
>>>>> Hi Tim,
>>>>>
>>>>> On 2/6/14, 10:52 AM, Tim Gruene wrote:
>>>>>> the simple and quick command
>>>>>>
>>>>>> egrep  -v "^ATOM|HETATM.*H$" your.pdb > your_noH.pdb
>>>>>>
>>>>>> should also work.
>>>>>
>>>>> just out of curiosity I did (copy-paste of your example)
>>>>>
>>>>> egrep -v "^ATOM|HETATM.*H$\" m.pdb > m_noH.pdb
>>>>>
>>>>> and I got:
>>>>>
>>>>> Illegal variable name.
>>>>>
>>>>> Pavel
>>>>>
>>>>> _______________________________________________
>>>>> phenixbb mailing list
>>>>> phenixbb at phenix-online.org
>>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>>
>>>
>>
>

