Solving MR solution without sequence information
Hi all,

I'm in a bit of a bind here and am seeking some advice. For context, a former graduate student in our lab set crystal trays of an MBP fusion protein, the fused part after MBP being ~400 amino acids long. This region is also predicted to be mostly unstructured, but has a C-terminal SH3 domain. Our graduate student then graduated, and before throwing out some of her trays a year or two later, we found some hits of the MBP fusion protein that actually diffracted to 2.9 Angstrom. I spent some time working on it after we collected the data (June 2021), but because I didn't know what specifically had crystallized, it was impossible to phase, and replicating the crystals seemed next to impossible, too. The Matthews coefficient predicted ~130 amino acids in the ASU, space group C222 or C2221. Since whatever crystallized was clearly a degradation product of the MBP fusion, I tried phasing with SH3 domains and a lot of other things to no avail. As a last-ditch effort I eventually submitted the .mtz file to SBGrid to perform a Wide Search MR job, and lo and behold, it actually found MR solutions that had TFZ scores of ~17 in space group C2221!

So here's my current situation: I have been able to phase the data set using the MR search model, but again, I don't know what specifically it is that I've crystallized. I'm currently able to get the Rfree to ~0.4, but can't seem to improve it. I am really at a loss as to what to do, since there are obvious backbone issues with the protein (as seen from iterative-build composite omit maps), but every time I try to manually correct them it seems to make the Rfree worse. The MR solution does not align well at all to the MBP fusion, with only ~20% identity, and again, I don't know which ~130 amino acids I crystallized out of the ~400 of the MBP fusion. Is it one continuous stretch, two copies of a shorter stretch, etc.?

I tried phasing with a polyalanine model of the MR search model and then tried autobuilding just a polyalanine sequence to get the backbone right, but that doesn't seem to work. Autobuild also fails when trying various fragments of the MBP fusion sequence. Other than opening Coot and manually building the entire polypeptide chain, is there an easier method? I think that once the backbone is totally right the phases will improve so I can start putting in side chains, but I'm not sure. My latest effort is to just use Sculptor prior to Phaser in order to force the sequences to match, but again, I don't know precisely what sequence was crystallized. I have tried both the Phenix and CCP4 software suites, for reference.

Any and all help would be much appreciated (and yield an acknowledgement on a paper, if this ever works).

Best,
Eric Rosenberg
CRTA Postdoctoral Fellow
Randazzo Lab
Laboratory of Cellular and Molecular Biology
National Cancer Institute, US
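For readers following along, the Matthews estimate mentioned in the post can be reproduced by hand. The sketch below is only an illustration: the unit cell dimensions are hypothetical placeholders (the post does not give the real cell), the mean residue mass of 110 Da is a common rule-of-thumb assumption, and the function names are made up for this example. It uses V_M = V_cell / (Z x MW) and the usual approximation for solvent fraction, 1 - 1.23 / V_M, with Z = 8 symmetry operators for C2221.

```python
def orthorhombic_cell_volume(a, b, c):
    """Unit cell volume in cubic Angstrom for an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c


def matthews_coefficient(cell_volume, n_sym_ops, n_residues,
                         n_copies=1, residue_mass_da=110.0):
    """Matthews coefficient V_M in cubic Angstrom per Dalton.

    residue_mass_da is an assumed average residue mass, not a measured value.
    """
    molecular_weight = n_residues * residue_mass_da
    return cell_volume / (n_sym_ops * n_copies * molecular_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (protein partial specific volume folded into 1.23)."""
    return 1.0 - 1.23 / vm


# Hypothetical cell dimensions in Angstrom; replace with the values from your own data.
volume = orthorhombic_cell_volume(50.0, 80.0, 70.0)

# Try a few candidate contents of the asymmetric unit (residues x copies).
for residues, copies in [(130, 1), (65, 2), (400, 1)]:
    vm = matthews_coefficient(volume, n_sym_ops=8, n_residues=residues, n_copies=copies)
    print(f"{residues} residues x {copies}: V_M = {vm:.2f}, solvent = {solvent_fraction(vm):.2f}")
```

Values of V_M around 2 to 3 are typical for protein crystals; a candidate whose solvent fraction comes out negative simply cannot fit in the cell.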
Hi Eric,

If I found myself in your situation, I would check to see if there are any more crystals in the plate. Fish one out, take it to a mass spec facility, have them 'sequence' it, and see if it matches any known proteins. At the very least, it'll be a way to get information about the sequence of the crystallized protein.

Good luck!
Eta

__
Eta A. Isiorho, Ph.D.
Research Assistant Professor
Macromolecular Crystallization Facility Manager
CUNY Advanced Science Research Center
85 Saint Nicholas Terrace, 3.352B
New York, NY 10031
On Feb 3, 2023, at 3:22 PM, Rosenberg, Eric (NIH/NCI) [F] wrote:
[...]
_______________________________________________
phenixbb mailing list
https://urldefense.com/v3/__http://phenix-online.org/mailman/listinfo/phenix...
It’s possible that the 130 aa is not a degradation product but a contaminant from the host (E. coli?), possibly even one that wasn’t detected on SDS-PAGE (believe me, I’ve been there!). ContaMiner (https://strube.cbrc.kaust.edu.sa/contaminer/submit/) would be a good next stop, as would searching the PDB for unit cells that closely match yours.
Also, as another poster mentioned, try mass spec if you have more crystals available (like I said, I’ve been there!).
Hello Eric,
You've had some good responses as to things to do already, but I'll throw in one 'old school' method.
When I had this situation (although with somewhat higher resolution data), I went through the density with Coot and tried to put in residues where I thought I could identify them (Trp, Phe, Cys, Pro, etc). I did this iteratively (with some refinement) until I came up with a stretch of say 8-10 residues where I thought the sequence fit the density reasonably well. I then did a search for that sequence. In your case, if you obtained the protein from E. coli, then I would just search the E. coli set of proteins using something like UniProt. You obviously need to take into account that you won't be able to tell the difference between Asp/Asn and Glu/Gln, so don't look for 100% matches. This allowed me to narrow down the possible proteins to just one or two and I then had a full sequence to work with.
Might be worth a shot.
Best of luck, tom
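The search step Tom describes can be sketched in a few lines of Python. This is a minimal illustration, not a tool from Phenix or CCP4: the function names, the FASTA path, and the toy sequences are all hypothetical. It treats Asp/Asn and Glu/Gln as indistinguishable, lets `X` stand for a residue you could not identify in the density, and tolerates a small number of mismatches rather than demanding a 100% match.

```python
def collapse(seq):
    """Map residues that look alike in density onto one symbol (Asn -> Asp, Gln -> Glu)."""
    return seq.upper().translate(str.maketrans("NQ", "DE"))


def read_fasta(path):
    """Read a FASTA file (e.g. a downloaded proteome) into {header: sequence}."""
    proteins, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    proteins[name] = "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        proteins[name] = "".join(chunks)
    return proteins


def find_hits(query, proteins, max_mismatches=2):
    """Slide the query along every protein; return (mismatches, name, offset), best first.

    'X' in the query is a wildcard for a residue that could not be identified.
    """
    q = collapse(query)
    hits = []
    for name, seq in proteins.items():
        s = collapse(seq)
        for offset in range(len(s) - len(q) + 1):
            window = s[offset:offset + len(q)]
            mismatches = sum(1 for a, b in zip(q, window) if a != "X" and a != b)
            if mismatches <= max_mismatches:
                hits.append((mismatches, name, offset))
    return sorted(hits)


# Toy data standing in for a proteome; in practice use read_fasta("proteome.fasta").
toy_proteins = {"protA": "MKTWFDPCEELLG", "protB": "MSSQQQAAA"}
print(find_hits("WFNPCQ", toy_proteins, max_mismatches=1))
```

With 8-10 residues read from a 2.9 Angstrom map, expect several near-matches; trying a second stretch from elsewhere in the chain and keeping only proteins that match both narrows the list quickly.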
Hi everyone,

I wanted to thank you sincerely for all of your replies; I learned about so many useful tools and strategies for solving structures in situations like these. As it turns out, it was actually an additive protein that I did not know was in the drop. We contacted our former graduate student, and apparently she added an additive to some, but not all, of the wells from which we took crystals. When using the additive protein for phasing and then performing a single round of refinement, the Rfree was ~0.33. So it wasn't the MBP fusion after all; the additive just crystallized on its own. Lesson for the future: always double-check with the person who actually set the drop!

Thank you again,
Eric Rosenberg
participants (4)
- Dr. Kevin M Jude
- Isiorho, Eta
- Rosenberg, Eric (NIH/NCI) [F]
- Tom Peat