FASTA format for sequence file
Hello, I'm trying to run Phaser-MR in Phenix (an older version) and am having trouble formatting my sequence properly. I keep getting the error "Protein Sequence not recognized in Composition". I have a good idea of what FASTA should look like (a carrot followed by a name, with no spaces or unusual characters, and then the sequence in all capital letters on the next line). I have tried many things, mainly using gedit and plain-text files (.txt). I've also made sure the file name doesn't contain any unusual characters. If I copy and paste a FASTA file from the PDB into gedit and save it as a .txt file, Phaser accepts it. If I paste my own protein sequence into the same file, Phaser rejects it. Suggestions?
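The layout described in the question (a title line starting with ">", then the sequence in capital letters on the next line) can be produced from a pasted sequence with a few lines of Python. This is a minimal sketch, not part of Phaser or Phenix: the helper `to_fasta` and the name `my_protein` are hypothetical, and it removes only whitespace and digits (such as residue numbers copied from a web page), so any other stray character is kept and can still be caught by a check.

```python
import re


def to_fasta(name: str, pasted: str) -> str:
    """Return one FASTA record: '>' + name, then the sequence on one line.

    Hypothetical helper. Whitespace and digits are stripped and letters are
    upper-cased; every other character is kept so it can still be spotted.
    """
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("name must be non-empty and contain no whitespace")
    # \s and \d match Unicode whitespace/digits too, e.g. non-breaking spaces.
    sequence = re.sub(r"[\s\d]", "", pasted).upper()
    return f">{name}\n{sequence}\n"


if __name__ == "__main__":
    # Example: a lower-case sequence with a line break and residue numbering.
    print(to_fasta("my_protein", "mktayiakqr qisfvkshfs\n 21 rqleerlglie"), end="")
```

The example prints a title line `>my_protein` followed by `MKTAYIAKQRQISFVKSHFSRQLEERLGLIE`. Writing the result with `open(path, "w", encoding="ascii")` raises `UnicodeEncodeError` if a non-ASCII look-alike character is still present, which is a cheap way to keep the file plain ASCII.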
Hi,

If you're using an older version, I have some recollection that we may since have made the error report more informative. Off the top of my head, here are some possibilities for what is wrong:

1. The sequence contains a letter other than the 20 that represent the standard amino acids.
2. When you say "carrot", you mean "caret", i.e. the character "^", or something else other than the greater-than sign ">" that is used to identify the line with the name.
3. Something that looks like a normal allowed letter is actually a similar-looking special symbol.

When other people have had similar problems, it has sometimes been because the sequence was saved in RTF or Word format, but I think your tests have ruled out that possibility. On a Unix system you can make sure by giving a command like "file mysequence.seq" and checking that it reports "ASCII text".

If none of those possibilities covers what is happening, could you send me examples of the sequence files that are and are not accepted by Phaser?

Thanks!

Best wishes,

Randy Read
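The three possibilities in the reply, plus the plain-ASCII check, can be automated so that the offending character is reported by line and column. This is a minimal Python sketch, not part of Phaser or Phenix; the function name `check_protein_fasta` is hypothetical. It assumes the file should be pure ASCII, that the first non-blank line starts with ">", and that sequence lines contain only the 20 uppercase one-letter codes. That last rule may be stricter than what Phaser actually accepts (it flags, for example, lowercase letters, "X" and a trailing "*").

```python
import sys
import unicodedata
from typing import List

# One-letter codes of the 20 standard amino acids (point 1 of the reply).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def _non_ascii(ch: str, lineno: int, col: int) -> str:
    """Describe a non-ASCII character, e.g. a look-alike letter (point 3)."""
    name = unicodedata.name(ch, "UNKNOWN")
    return f"line {lineno}, col {col}: non-ASCII character U+{ord(ch):04X} {name}"


def check_protein_fasta(text: str) -> List[str]:
    """Return human-readable problems found in protein FASTA text.

    An empty list only means these checks found nothing; it does not
    guarantee that Phaser will accept the file.
    """
    lines = text.splitlines()
    # Skip leading blank lines when looking for the title line.
    first = next((i for i, line in enumerate(lines) if line.strip()), None)
    if first is None:
        return ["file is empty"]
    problems = []
    header = lines[first]
    if not header.startswith(">"):
        # Point 2: the title line must begin with '>', not '^' or anything else.
        problems.append(
            f"line {first + 1}: title line starts with {header[:1]!r}, expected '>'"
        )
    for lineno, line in enumerate(lines[first:], start=first + 1):
        is_title = lineno == first + 1 or line.startswith(">")
        for col, ch in enumerate(line, start=1):
            if not ch.isascii():
                # Checked first, so non-ASCII spaces (e.g. U+00A0) are reported too.
                problems.append(_non_ascii(ch, lineno, col))
            elif is_title or ch.isspace():
                continue  # title text and ASCII whitespace are not sequence
            elif ch not in STANDARD_AA:
                problems.append(
                    f"line {lineno}, col {col}: {ch!r} is not one of "
                    "the 20 standard amino-acid letters"
                )
    return problems


if __name__ == "__main__":
    # Usage: python check_fasta.py mysequence.seq [more files...]
    for path in sys.argv[1:]:
        # errors="replace" turns undecodable bytes into U+FFFD, which is then reported.
        with open(path, encoding="utf-8", errors="replace") as handle:
            problems = check_protein_fasta(handle.read())
        print(f"{path}: {'problems found' if problems else 'no problems found'}")
        for problem in problems:
            print("  " + problem)
```

Running it on both the file Phaser accepts and the one it rejects should point at the offending character, provided the cause is among those listed in the reply; a byte-order mark saved at the start of the file, for instance, shows up both as a wrong first character and as U+FEFF.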
On 5 Feb 2018, at 23:54, [email protected] wrote:
[Original question quoted in full; trimmed.]
_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research     Tel: + 44 1223 336500
Wellcome Trust/MRC Building                  Fax: + 44 1223 336827
Hills Road                                   E-mail: [email protected]
Cambridge CB2 0XY, U.K.                      www-structmed.cimr.cam.ac.uk
Participants (2):
- egoers@uoregon.edu
- Randy Read