The Phenix AI agent will carry out an automated analysis of your X-ray or cryo-EM data. Its purpose is to help beginners see how to run Phenix tools, and to help experienced users quickly evaluate their data and standard approaches to structure determination. The AI analysis tool can be accessed in the Phenix GUI.
The AI Agent expects to be given data files from crystallography (mtz files) or from cryo-EM (mrc or ccp4 maps), a sequence file, and optional model files or ligand files. You can tell it what files to use by specifying a directory containing the files or by loading the files in the AI Agent GUI window.
The AI Agent also accepts instructions. You can say things like "first run xtriage and then molecular replacement", or "solve the structure", or "use main.number_of_macro_cycles=1 in phenix refine" or whatever else you want to tell it to do with your X-ray or cryo-EM data.
Note on instructions to the AI Agent: if you are supplying keywords for the agent, then you usually need to specify them exactly as you would on the command-line. For example you can say for phenix.refine:
use main.number_of_macro_cycles=1 in refinement
but you should not usually say:
use one macro_cycle in refinement
(Actually macro-cycles is a special case that the agent will recognize and convert to main.number_of_macro_cycles=1, but most commands are not converted like this.)
You can supply these directions in the GUI. If your directory contains a README file, the agent can read that file and do (more or less) what it says.
The AI Agent is supplied with a number of standard structure solution pathways. It will try to fit your data into one of these, guided additionally by any instructions you supply. A typical pathway for X-ray data, for example, might be xtriage -> autosol -> autobuild if the data are found to have an anomalous signal in xtriage, or xtriage -> predict_and_build -> refine if they do not. For cryo-EM data, a typical pathway might be mtriage -> resolve_cryo_em -> dock_in_map -> real_space_refine. If you supply a ligand, the pathway will end with fitting the ligand and refining the model with the ligand included.
You can modify most aspects of the structure solution pathways with your instructions, as long as the necessary data is available.
When you run the agent in the GUI, it will load all the successful runs in the normal way, so you have a complete record of what was done and so you can restore any of the runs that it made for you.
This type of AI does not save or learn from your questions. However, the information in your log file is sent to the Phenix server, and from there on to Google Gemini and OpenAI.
The purpose of the AI Agent is to help you interpret your data and to suggest ways to analyze it. The agent will also suggest next steps. Keep in mind that these tools can make mistakes and give you incorrect interpretations and poor suggestions at times, so treat the output as suggestions to think about rather than definitive answers.
You can combine this AI analysis with the Phenix chatbot. The chatbot can give you interactive answers to your questions using the same database of information as the AI analysis. This allows you to follow up on the AI analysis with questions to the chatbot. You can also paste part of the output from the AI analysis into the chatbot along with a question to get more context.
If you have already run an AI agent job, you can restore it in the GUI in the usual way by clicking on Job History and then clicking on the run you are interested in. At the bottom of the Configure panel of the restored AI Agent job, there are keywords you can set for:
Restart mode: Fresh (start a new job) or Resume (keep going with this job)
Display Session and stop: None (do not do this) or Basic (show basic info and stop) or Detailed (show a lot of info and stop)
Remove last N cycles and stop: blank (do not do this) or a number like 5 (remove last 5 cycles and stop).
You can use these commands to see what the AI Agent has done, to remove the last few jobs, or to go on from where you are now.
You can change the advice for the AI Agent when you run with Restart mode = Resume if you want. It will try to do what you ask it to do.
The AI agent is limited to the programs that it has been set up to use, so some Phenix programs cannot be run with the AI agent. The LLMs used in choosing programs to run have varying capabilities, and they can sometimes suggest doing steps that might not be the best choices. The AI analysis used in the agent only knows what is in the documentation, the videos and newsletters, and the papers we have supplied.
AI tools like this one can also simply make mistakes and give incorrect answers. This does not seem to happen too often with this tool, but you need to always be on alert when using it. Use the tool as a helper; don't expect it to always be right.
If a detail is missing in the documentation, the AI may not know about it.
If you use the AI Agent tool, your log files are sent to the Phenix server, and from there on to OpenAI and Gemini. That means the data could potentially be used by OpenAI and Google in any way that they use other AI data that is sent to them.
Note that access to OpenAI and Gemini is non-commercial only.
Normally you will use Google (Gemini) as the LLM for your AI Agent. This requires an API key for Google (supplied with Phenix). You can also use Ollama, which is run on the Phenix server, or OpenAI, which also uses a supplied API key. You can run AI analyses with Google or OpenAI without getting your own key, as a shared set of keys is supplied. The number of analyses with these shared keys is limited, however.
If you install Ollama on your machine (normally a machine with a GPU), you can use your local installation of Ollama for all the LLM calls that the AI Agent makes. In this way, you can run everything locally and not send any data outside of your machine.
Once you install Ollama, you will want to set it up like this to get the right models (note: these may change with Phenix versions):
ollama pull llama3.1:70b
ollama pull llama3.1:8b
ollama pull nomic-embed-text
You will want to run it with environment variables like these:
setenv CUDA_VISIBLE_DEVICES 0,1,2
setenv OLLAMA_NUM_PARALLEL 6
setenv OLLAMA_HOST 0.0.0.0:11434
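The setenv lines above use csh/tcsh syntax. If your shell is bash or zsh, the equivalent settings (same variables and values, just different syntax) would be:

```shell
# bash/zsh equivalents of the csh setenv lines above
export CUDA_VISIBLE_DEVICES=0,1,2   # GPUs that Ollama may use
export OLLAMA_NUM_PARALLEL=6        # number of requests served in parallel
export OLLAMA_HOST=0.0.0.0:11434    # listen on all interfaces, default port
```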
Then when you run the agent, use the setting:
run_on_server=False
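Before pointing the agent at your local installation, you may want to confirm that the Ollama server is running and that the models listed above were pulled. A quick check, assuming the default Ollama port of 11434:

```shell
# List the models the local Ollama server has installed.
# Assumes the default port 11434; adjust if you changed OLLAMA_HOST.
curl -s http://localhost:11434/api/tags

# The same information is available from the command line:
ollama list
```

If either command fails, the server is not running (start it with "ollama serve") or is listening on a different host/port.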