OK, the problem occurs in the cif parser that gets tried in reflection_file_reader.try_all_readers(file_name). In incomprehensible antlr3 code, every word in the file gets processed into a token, so any large file with many words per line will temporarily use gigabytes of memory when constructing an iotbx.cif.reader, specifically on this line:

iotbx/cif/__init__.py:68: self.parser = ext.fast_reader(builder, input_string, file_path, strict)
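
For reference, the blow-up can be reproduced directly, without going through try_all_readers (a sketch; any large non-CIF text file with many whitespace-separated values per line will do in place of the HKL file):

import iotbx.cif
# The antlr3-based fast_reader allocates one token object per
# whitespace-delimited word, so a large HKL file spikes to several GB
# of RAM before the parse ultimately fails.
reader = iotbx.cif.reader(file_path="XDS_ASCII.HKL")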

The most limited fix would be for try_all_readers to only try the cif reader if the file extension is .cif (ignoring capitalization). Is that an OK assumption?

Martin, for a temporary fix, you can apply this diff in cctbx_project:

diff --git a/iotbx/reflection_file_reader.py b/iotbx/reflection_file_reader.py
index 3c1274fb27..3548eb8415 100644
--- a/iotbx/reflection_file_reader.py
+++ b/iotbx/reflection_file_reader.py
@@ -126,6 +126,7 @@ def try_all_readers(file_name):
   except Exception: pass
   else: return ("shelx_hklf", content)
   try:
+    assert os.path.splitext(file_name)[1].lower() == '.cif'
     content = cif_reader(file_path=file_name)
     looks_like_a_reflection_file = False
     for block in content.model().values():
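
After applying the patch, a quick sanity check (a sketch; the exact file_type string depends on which reader ends up handling the file):

from iotbx.reflection_file_reader import any_reflection_file
# With the extension assert in place, the cif reader is skipped for
# non-.cif files, so this returns promptly instead of exhausting memory.
refl = any_reflection_file("XDS_ASCII.HKL")
print(refl.file_type())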


Best,
Dan

On Mon, Nov 29, 2021 at 11:33 AM Daniel Paley <dwpaley@lbl.gov> wrote:
Hi Martin,
Are you able to share the hkl file? (Privately if necessary)
I just wrote up some steps for memory analysis here:
https://github.com/cctbx/cctbx_project/blob/master/dox/rst/debug.md
so I might be able to help.
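
For a quick first check without any extra tooling, something like this works from cctbx.python (a sketch; on Linux ru_maxrss is reported in kilobytes, on macOS in bytes):

import resource
import iotbx.merging_statistics
i_obs = iotbx.merging_statistics.select_data(
  file_name="XDS_ASCII.HKL", data_labels=None)
# Peak resident set size of this process so far.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS: %.1f GB" % (peak_kb / 1e6))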
Dan

On Mon, Nov 29, 2021 at 11:30 AM Martin Malý <martin.maly@ibt.cas.cz> wrote:
Dear PHENIX & CCTBX developers and users,

I tried to calculate merging statistics with CCTBX tools using Python 3,
and I noticed that the memory requirements are much higher compared with
Python 2. I first closed all other programs, so only 1.6 GB of the 7.6 GB
total RAM was in use. Then I ran these two lines of code in the
cctbx.python shell:

import iotbx.merging_statistics
i_obs = iotbx.merging_statistics.select_data(
  file_name="XDS_ASCII.HKL", data_labels=None)

Python 2.7: RAM usage rose to 4.8 GB and I was then able to calculate
the merging statistics.
Python 3.7: The module import succeeded, but RAM usage then climbed to
the full 7.6 GB and the process was killed by the operating system
(CentOS 7).

Do you have any suggestions for using this module more carefully to
save memory?
Thank you!
Best regards,
Martin Malý
