Dear Louis,

Apologies for the late response; I wrote the code a long time ago and had to look at it again before I could answer your query.

I am assuming that you want to process .hhr files from hhsearch (i.e. with multiple hits) as opposed to hhalign. If this is the case, the parser gets you about 50% of the way: it captures the PDB ID and the aligned sequence, but not the midline. It would not be impossible to extend the parser to handle this, but currently it does not. Would this be sufficient?

However, if you plan to process hhalign output, the parser extracts everything, including the midline.

Best wishes, Gabor

On Wed, Apr 22, 2020 at 2:51 PM Louis Dumas <louis.dumas@epfl.ch> wrote:

Dear CCTBX developers,

I am a postdoc at EPFL working with HHpred for homology modeling of membrane proteins.

I had been trying to write my own HHpred alignment parser until I found the Python script under “cctbx_fork/iotbx/bioinformatics/__init__.py/” that contains an HHpred parser.

My goal is to correctly parse the raw HHpred output file (.hhr), which involves unwrapping every alignment and stripping out a lot of surrounding text to finally obtain something like this:

>pdb_name
query-sequence
column score

Example:

>4U15
VYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMAASKKMSHRRAFIMIIFVWLWS
+........+..+..++|+++|++++.++.+.++++ +..+.++.+|+++|++.++...+........ +...|..
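For reference, the target layout above can be produced from hhsearch output with a small standalone function. This is a minimal sketch, not the iotbx.bioinformatics API: the names `parse_hhr_hits` and `to_fasta_like` are hypothetical, and it assumes the usual .hhr alignment layout, in which each hit starts with a `>` header line, residue rows look like `Q query   1 VYGF...   60 (300)`, and the match (midline) row comes directly after the last `Q` row, padded so its symbols line up with the residue column.

```python
import re
from typing import List, Optional, Tuple

# Assumed layout of a residue row in the alignment section of an .hhr file:
#   "Q query           1 VYGF-IG    6 (20)"
# tag, row name, first index, aligned residues, last index, "(length)".
SEQ_ROW = re.compile(r"^([QT]) (\S+)\s+(\d+)\s+(\S+)\s+(\d+)\s+\(\d+\)")

# Rows that carry annotation rather than the query sequence itself (assumed set).
ANNOTATION_ROWS = {"ss_pred", "ss_conf", "ss_dssp", "Consensus"}


def parse_hhr_hits(text: str) -> List[Tuple[str, str, str]]:
    """Return one (template_name, query_alignment, midline) tuple per hit.

    Wrapped blocks of the same hit are concatenated, so each string
    covers the whole alignment of that hit.
    """
    hits: List[Tuple[str, str, str]] = []
    name: Optional[str] = None
    query_parts: List[str] = []
    mid_parts: List[str] = []
    window: Optional[Tuple[int, int]] = None  # (start column, width) of last Q row
    prev_was_q = False

    for line in text.splitlines():
        if line.startswith(">"):
            # A new hit begins: store the previous one, if any.
            if name is not None:
                hits.append((name, "".join(query_parts), "".join(mid_parts)))
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            query_parts, mid_parts = [], []
            prev_was_q = False
            continue

        m = SEQ_ROW.match(line)
        if m:
            tag, row, residues = m.group(1), m.group(2), m.group(4)
            if tag == "Q":
                window = (m.start(4), len(residues))
                if row not in ANNOTATION_ROWS:
                    query_parts.append(residues)
            prev_was_q = tag == "Q"
            continue

        if prev_was_q and name is not None and window is not None:
            # The line right after the last Q row is the match row. Its symbols
            # sit in the same columns as the residues; trailing blanks may have
            # been stripped, so pad back to the full block width.
            start, width = window
            mid_parts.append(line[start:start + width].ljust(width))
        prev_was_q = False

    if name is not None:
        hits.append((name, "".join(query_parts), "".join(mid_parts)))
    return hits


def to_fasta_like(hits: List[Tuple[str, str, str]]) -> str:
    """Format hits as '>name', query alignment, midline (three lines per hit)."""
    return "\n".join(f">{n}\n{q}\n{mid}" for n, q, mid in hits)
```

Usage would be along the lines of `print(to_fasta_like(parse_hhr_hits(open("query.hhr").read())))`, where the file name is only an example. Padding the midline to the width of the residue row keeps the query and midline strings the same length, so columns stay aligned across wrapped blocks.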

Being somewhat new to Python, I was wondering whether the people who wrote this script are still around and could help me figure out whether the parser could be adapted to produce this output.

Thanks for any help you can provide!

Best,

Louis D
_______________________________________________
cctbxbb mailing list
cctbxbb@phenix-online.org
http://phenix-online.org/mailman/listinfo/cctbxbb