<div dir="ltr">OK, the problem occurs in the CIF parser that gets tried in reflection_file_reader.try_all_readers(file_name). In the incomprehensible antlr3 code, every word in the file gets processed into a token, so any large file with many words per line will temporarily use gigabytes of memory when constructing an iotbx.cif.reader, specifically on this line: <div><br></div><div>iotbx/cif/__init__.py:68: self.parser = ext.fast_reader(builder, input_string, file_path, strict)<div><br></div><div>The narrowest fix would be for try_all_readers to try the CIF reader only if the file extension is .cif (case-insensitive). Is that an OK assumption? </div></div><div><br></div><div>Martin, for a temporary fix, you can apply this diff in cctbx_project:</div><div><br></div><div><font face="monospace">diff --git a/iotbx/reflection_file_reader.py b/iotbx/reflection_file_reader.py<br>index 3c1274fb27..3548eb8415 100644<br>--- a/iotbx/reflection_file_reader.py<br>+++ b/iotbx/reflection_file_reader.py<br>@@ -126,6 +126,7 @@ def try_all_readers(file_name):<br>   except Exception: pass<br>   else: return (&quot;shelx_hklf&quot;, content)<br>   try:<br>+    assert os.path.splitext(file_name)[1].lower() == &#39;.cif&#39;<br>     content = cif_reader(file_path=file_name)<br>     looks_like_a_reflection_file = False<br>     for block in content.model().values():</font><br></div><div><font face="monospace"><br></font></div><div><font face="arial, sans-serif">Best,</font></div><div><font face="arial, sans-serif">Dan</font></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 29, 2021 at 11:33 AM Daniel Paley &lt;<a href="mailto:dwpaley@lbl.gov">dwpaley@lbl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="auto">Hi Martin,</div><div dir="auto">Are you able to share the HKL file? 
(Privately, if necessary.)</div><div dir="auto">I just wrote some steps for memory analysis here: <div dir="auto"><a href="https://github.com/cctbx/cctbx_project/blob/master/dox/rst/debug.md" target="_blank">https://github.com/cctbx/cctbx_project/blob/master/dox/rst/debug.md</a> so I might be able to help. </div><div dir="auto">Dan</div></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 29, 2021 at 11:30 AM Martin Malý &lt;<a href="mailto:martin.maly@ibt.cas.cz" target="_blank">martin.maly@ibt.cas.cz</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">Dear PHENIX &amp; CCTBX developers and users,<br>

<br>

I tried to calculate merging statistics with CCTBX tools using Python 3.<br>

I noticed that the memory requirements are much higher than with<br>

Python 2. I first closed all programs, so only 1.6 GB of the 7.6 GB<br>

of total RAM was in use. Then I ran these two lines of code in the<br>

cctbx.python shell:<br>

<br>

import iotbx.merging_statistics<br>

i_obs = iotbx.merging_statistics.select_data(file_name=&quot;XDS_ASCII.HKL&quot;,<br>

data_labels=None)<br>

<br>

Python 2.7: RAM usage rose to 4.8 GB, and I was then able to<br>

calculate merging statistics.<br>

Python 3.7: The module imported successfully, but RAM usage then climbed<br>

to the full 7.6 GB and the process was killed by the operating system<br>

(CentOS 7).<br>

<br>

Do you have any suggestions on how to use this module &quot;more<br>

carefully&quot; to save memory?<br>

Thank you!<br>

Best regards,<br>

Martin Malý<br>

-----<br>

<br>


Disclaimer: If not expressly stated otherwise, this e-mail message (including any attached files) is intended purely for informational purposes and does not represent a binding agreement on the part of Institute of Biotechnology of the Czech Academy of Sciences. The text of this message and its attachments cannot be considered as a proposal to conclude a contract, nor the acceptance of a proposal to conclude a contract, nor any other legal act leading to concluding any contract, nor does it create any pre-contractual liability on the part of Institute of Biotechnology of the Czech Academy of Sciences.<br>

<br>

_______________________________________________<br>

phenixbb mailing list<br>

<a href="mailto:phenixbb@phenix-online.org" target="_blank">phenixbb@phenix-online.org</a><br>

<a href="http://phenix-online.org/mailman/listinfo/phenixbb" rel="noreferrer" target="_blank">http://phenix-online.org/mailman/listinfo/phenixbb</a><br>

Unsubscribe: <a href="mailto:phenixbb-leave@phenix-online.org" target="_blank">phenixbb-leave@phenix-online.org</a><br>

</blockquote></div></div>

</blockquote></div>
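<div>For illustration, here is a minimal Python sketch of the extension gate Dan proposes, written as an explicit check instead of an <code>assert</code>. Python strips <code>assert</code> statements when run with <code>-O</code>, which would silently turn the temporary patch into a no-op. The helpers <code>should_try_cif_reader</code> and <code>try_readers</code> are hypothetical and are not part of iotbx: <code>try_readers</code> only stands in for the sequence of attempts that try_all_readers makes, and the fake reader in the usage example replaces the real, memory-hungry cif_reader.</div>

```python
import os


def should_try_cif_reader(file_name):
    """Hypothetical gate: True only for a .cif extension (case-insensitive).

    Decides from the file name alone, so the file is never opened or
    tokenized when the extension does not match.
    """
    return os.path.splitext(file_name)[1].lower() == ".cif"


def try_readers(file_name, readers):
    """Try (label, reader, gate) triples in order; return the first success.

    Hypothetical stand-in for the reader sequence in try_all_readers.
    A reader is skipped when its gate returns False; a reader that raises
    is treated as "not this format" and the next one is tried.
    Returns (None, None) if no reader succeeds.
    """
    for label, reader, gate in readers:
        if gate is not None and not gate(file_name):
            continue  # skip without touching the file
        try:
            return label, reader(file_name)
        except Exception:
            continue  # mirrors the `except Exception: pass` pattern
    return None, None


if __name__ == "__main__":
    def fake_cif_reader(name):
        # Stands in for the expensive CIF parse; never called for .HKL files.
        return "cif content"

    readers = [("cif", fake_cif_reader, should_try_cif_reader)]
    print(try_readers("XDS_ASCII.HKL", readers))  # (None, None): CIF parse skipped
    print(try_readers("data.CIF", readers))       # ('cif', 'cif content')
```

With a gate like this, a file such as XDS_ASCII.HKL never reaches the CIF parser, so the antlr3 tokenization cost is avoided entirely rather than being caught as an exception after the memory has already been spent.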