TEXTAL™ Configuration Guide

Understanding TEXTAL™ Environment Variables

An environment variable may contain either the name of a single directory or a list of directories, with each directory path separated by a ':', e.g.,
TEXTAL_DATA=".:/u1/me/mydata:/u2/someone/basic_data:/u3/textal/data"
When TEXTAL™ needs to find a file that can be referenced via an environment variable, it searches the listed directories for that file.  In general, you should always use absolute paths for directories appearing in an environment variable.  The one exception is the current directory, ".".  It is often useful to include "." in a directory list so that the directory where you invoke textal is also searched.

The order of the directories in the list is significant.  The guiding principle is that the further left a directory appears in the list, the higher the priority of the files it contains.  In most cases, TEXTAL™ is looking for the first occurrence of a file, so it searches the list left-to-right and uses the first occurrence found.  For example, suppose TEXTAL™ is looking for a file called ss_capra.map in the list given by TEXTAL_DATA above.  Now suppose there are two copies of this file:
/u1/me/mydata/ss_capra.map
/u3/textal/data/ss_capra.map
When TEXTAL™ looks for this file, it starts searching the list from left to right.  The first occurrence of the file found is in /u1/me/mydata, so this file is used rather than the one in /u3/textal/data.
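
The left-to-right lookup described above can be sketched in a few lines of Python.  This is only an illustration of the rule, not TEXTAL's own code: the function name find_first is hypothetical, empty entries in the list are simply skipped (an assumption), and two temporary directories stand in for /u1/me/mydata and /u3/textal/data.

```python
import os
import tempfile

def find_first(filename, search_path):
    """Return the first match for filename, scanning directories left to right."""
    for directory in search_path.split(":"):
        if not directory:          # assumption: ignore empty list entries
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None                    # no directory in the list holds the file

# Two throwaway directories play the roles of "my data" and "shared data".
with tempfile.TemporaryDirectory() as mine, tempfile.TemporaryDirectory() as shared:
    for d in (mine, shared):
        with open(os.path.join(d, "ss_capra.map"), "w") as fh:
            fh.write("placeholder\n")
    # "mine" is listed first, so its copy is the one that is found.
    found = find_first("ss_capra.map", f"{mine}:{shared}")
    print(os.path.dirname(found) == mine)
```

Listing the directories in the opposite order would return the copy in the second directory instead, which is the whole point of the ordering rule.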

Known TEXTAL™ Environment Variables

Name        | Use                                                                              | Required | Notes
------------+----------------------------------------------------------------------------------+----------+------------------------------------------------------------------
TEXTAL_BASE | Where textal is installed (Perl libraries are fetched relative to this location) | No       | Used only in sequence alignment
TEXTAL_DATA | Where data and configuration files live.                                         | Yes      | Defaults to /usr/local/textal/data.
TEXTAL_DIST | Where the textal source distribution is.                                         | No       | Only used to locate a reference Texrc that's not in $TEXTAL_DATA
PHD_PATH    | Where PHd is installed, or unset if there is no PHd                              | No       |
BLAST_PATH  | Where psi-blast is installed, or unset if no psi-blast is available              | No       |


Searching for Texrc (TEXTAL™ Configuration) files

TEXTAL™ does things slightly differently when searching for Texrc files, because the final configuration is actually the union of all Texrc files found.  This is an important point worth repeating: all Texrc files found will be read.  If two Texrc files set an option to different values, the one read second has priority.  The guiding principle above still applies: the further left a Texrc is found, the higher its priority.  To achieve this, TEXTAL™ reads the Texrc files starting from the right-most directory and works its way left.

Suppose there is a Texrc in /u3/textal/data that sets the map cache size to 200 and another in /u1/me/mydata that sets it to 100.  Since /u1/me/mydata is further left (i.e. read later), the final value of the map cache size will be 100.
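
The right-to-left merge can be pictured with the following Python sketch.  It is a model of the rule only, under stated assumptions: the function merge_texrc is hypothetical, each Texrc is represented as an already-parsed dictionary, and the topk value of 50 is made up for illustration.

```python
def merge_texrc(settings_by_dir, search_path):
    """Merge per-directory settings so that directories further left win."""
    merged = {}
    # Read right-to-left: the left-most directory is applied last and overrides.
    for directory in reversed(search_path.split(":")):
        merged.update(settings_by_dir.get(directory, {}))
    return merged

# Parsed contents of the two Texrc files from the example (topk is illustrative).
settings_by_dir = {
    "/u3/textal/data": {"mapcache_size": "200", "topk": "50"},
    "/u1/me/mydata": {"mapcache_size": "100"},
}
path = ".:/u1/me/mydata:/u2/someone/basic_data:/u3/textal/data"

final = merge_texrc(settings_by_dir, path)
print(final)  # {'mapcache_size': '100', 'topk': '50'}
```

Note that topk survives from the right-most file because the file further left never mentions it; only the settings a later file actually defines are overridden.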

There are several important points to remember here.  The first Texrc that is read in should be considered a "reference" Texrc.  If you want to change some of the Texrc settings, but not all, you do not need a complete copy of the reference Texrc; your local Texrc only needs the settings that differ.  If you change only the settings that matter to you, then system-wide changes made later will also be incorporated into your runs.

As an example, suppose you like to run with your topk parameter much larger than the default, so you set it to 1000 in ~/.texrc.  Now imagine someone updates the feature file and gives it a different name, and changes the system-wide Texrc to reflect this.  If your local Texrc only changes the topk parameter, then your final TEXTAL™ configuration will pick up the new feature file name automatically.  Had you copied the entire Texrc and edited it, the system-wide change would be masked by the feature file name given in your local Texrc file.

Creating and Using Texrc Files

Finding a Texrc file...

As mentioned above, an important thing to note about Texrc files is that their search-path is handled slightly differently than regular environment variables.  In part, this is because they must be searched right-to-left, but also because multiple locations via multiple environment variables are searched.  Remember that the first Texrc read in is expected to be the "reference" Texrc file.  The later a Texrc is read in, the higher its priority will be, i.e. its settings will override previous settings.

The search order for Texrc files is given below, going from top to bottom:

Filename | Environment Variable or Location | Default Paths
---------+----------------------------------+--------------------------------------------------
Texrc    | $TEXTAL_DATA                     | /usr/local/textal/data:/usr/local/lib/textal/data
Texrc    | $TEXTAL_DIST                     |
.texrc   | $HOME                            |
.texrc   | . [1]                            |

Footnotes:
[1] This is the current working directory, i.e. the directory in which TEXTAL™ was invoked.
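
As a rough illustration, the read order in the table can be expressed as a small Python function that lists candidate files from first read (lowest priority) to last read (highest priority).  Several details here are assumptions rather than documented behaviour: the function name texrc_read_order is hypothetical, $TEXTAL_DIST is treated as a possible directory list, the default paths are assumed to apply only when $TEXTAL_DATA is unset, and /home/me is a made-up home directory.

```python
import posixpath

DEFAULT_DATA = "/usr/local/textal/data:/usr/local/lib/textal/data"

def texrc_read_order(env, cwd="."):
    """List candidate Texrc paths in the order they would be read."""
    candidates = []
    for var, default in (("TEXTAL_DATA", DEFAULT_DATA), ("TEXTAL_DIST", "")):
        directories = [d for d in env.get(var, default).split(":") if d]
        # Each list is walked right-to-left so left-most entries are read last.
        for directory in reversed(directories):
            candidates.append(posixpath.join(directory, "Texrc"))
    if env.get("HOME"):
        candidates.append(posixpath.join(env["HOME"], ".texrc"))
    candidates.append(posixpath.join(cwd, ".texrc"))   # current directory is last
    return candidates

env = {"TEXTAL_DATA": "/u1/me/mydata:/u3/textal/data", "HOME": "/home/me"}
order = texrc_read_order(env)
for path in order:
    print(path)
```

Files that do not exist would simply be skipped in practice; the sketch only shows the ordering, in which the .texrc in the current directory is read last and therefore has the final say.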

Texrc Syntax

The syntax of a Texrc file is simple.  It consists of lines that each define a name/value pair:
<identifier>="<value>"
Blank lines are ignored, as is any text following the comment character, '#'.  White-space is not significant, except inside the double quotes.  Identifiers are not case-sensitive (MapCACHE is the same as mapcache).

There is one special kind of value, a list.  A list is just a comma-separated list of sub-values.  White-space within lists is not significant.  Lists are mainly a convenience for instances such as the radii of features to use.

There are also some special values.  The seed option has a special value, "auto", which means that a new random seed should be generated at invocation.  [This is handled internally by setting the seed to '0', which causes TEXTAL™ to auto-generate a new seed.]  Some options, such as use_side_chain_axis, are toggles that can be either "on" or "off".  Synonyms for "on" are "yes" and "true".
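
The syntax rules above are compact enough to sketch as a small Python parser.  This is a minimal illustration and not TEXTAL's actual parser: all function names are hypothetical, a '#' is assumed never to appear inside a quoted value, any toggle value other than "on", "yes" or "true" is assumed to mean off, and the sample values are made up.

```python
def parse_texrc(text):
    """Parse Texrc-style text into a dict keyed by lower-cased identifier."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()     # drop comments and outer blanks
        if not line:
            continue                            # blank lines are ignored
        name, sep, value = line.partition("=")
        value = value.strip()
        if not sep or len(value) < 2 or value[0] != '"' or value[-1] != '"':
            raise ValueError(f'line {lineno}: expected <identifier>="<value>"')
        # Identifiers are case-insensitive; text inside the quotes is kept as-is.
        settings[name.strip().lower()] = value[1:-1]
    return settings

def as_list(value):
    """Split a comma-separated list, ignoring white-space around items."""
    return [item.strip() for item in value.split(",")]

def as_toggle(value):
    """Interpret "on", "yes" and "true" as on; anything else as off (assumption)."""
    return value.strip().lower() in ("on", "yes", "true")

def as_seed(value):
    """Map "auto" to 0, the value described above as triggering a fresh seed."""
    return 0 if value.strip().lower() == "auto" else int(value)

sample = '''
# local overrides (illustrative values)
Seed = "auto"
RADII="3, 4,5"   # a list value
topk = "1000"
'''
cfg = parse_texrc(sample)
print(cfg["topk"], as_seed(cfg["seed"]), as_list(cfg["radii"]), as_toggle("yes"))
```

Keeping the raw string in the dictionary and converting it only when a particular option is looked up mirrors the description above, where lists, toggles and "auto" are special interpretations of otherwise ordinary quoted values.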

The list of currently known settings (identifiers) for a TEXTAL™ Texrc file is given below.  See the comments in a reference Texrc file for more details about what these options are and how they are used.

seed
matching_method
topk
mapcache_size
features
mixtures
xmaps_dir
pdbs_dir
texmode
radii
number_of_peaks
density_cc_radius
good_cc_cutoff
connected_density_only
use_sidechain_axis
error_correcting_cc
halt [1]
distance_measure
ncpus


Footnotes:
[1] halt is a very special "directive."  It is still a name/value pair, so it must be defined as something, but if it is defined while Texrc files are being read, TEXTAL™ will stop reading any further Texrc files after it finishes the current one.  To make it difficult for users to override the system Texrc file, insert
halt="yes"
into /usr/local/textal/data/Texrc, or the appropriate "reference" Texrc file.  This ensures that the reference file is the only Texrc file read.  [Note: It is possible to circumvent this directive, but it takes a little effort.]
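
The effect of halt on the read sequence can be sketched as follows.  This is an illustrative model only: the function read_until_halt is hypothetical, each Texrc is represented as an already-parsed dictionary, the paths and topk values are made up, and any definition of halt (whatever its value) is assumed to trigger the stop, which is how the footnote reads.

```python
def read_until_halt(texrc_files):
    """Merge (path, settings) pairs in read order, stopping after a file that defines halt."""
    merged = {}
    files_read = []
    for path, settings in texrc_files:
        merged.update(settings)        # the current file is always finished first
        files_read.append(path)
        if "halt" in settings:         # assumption: any definition of halt stops reading
            break
    return merged, files_read

# Read order: reference Texrc first, the user's file second (paths are illustrative).
texrc_files = [
    ("/usr/local/textal/data/Texrc", {"topk": "50", "halt": "yes"}),
    ("/home/me/.texrc", {"topk": "1000"}),
]
merged, files_read = read_until_halt(texrc_files)
print(merged["topk"], files_read)
```

Because the reference file is read first, placing halt there means the user's .texrc is never consulted and its topk override has no effect.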

Debugging the Texrc File
While parsing Texrc files, TEXTAL™ prints the name of the file it is currently parsing.  If it detects a parse error, it indicates which line caused the error.  More insidious errors come either from incorrect values for settings in a Texrc or from the wrong Texrc being read in by mistake.  To help catch these cases, TEXTAL™ prints a warning whenever a setting changes value, along with the original value and where it was defined.  Finally, after all Texrc files have been read, TEXTAL™ prints a list of all name/value pairs along with the name of the file containing the definition that is actually in use.  For compatibility purposes, the old-style global parameters are also printed.

Here is a suggested debugging/verification path:
  1. Look for any redefinition warnings
  2. Check the listing of Texrc files read at the start of the log to make sure only those you expect to be read were read.
  3. Check the old-style parameter listing to make sure the values there match what you expect based on the dump of name/value pairs.
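
To make the redefinition warnings and the final name/value listing easier to interpret, the bookkeeping behind them can be modelled with a short Python sketch.  Everything here is an assumption for illustration: the function merge_with_provenance is hypothetical, each Texrc is an already-parsed dictionary, and the message wording is invented and does not reproduce TEXTAL's actual log format.

```python
def merge_with_provenance(texrc_files):
    """Merge (path, settings) pairs in read order, recording where each value came from."""
    values = {}           # name -> (value, path of the file that defined it)
    redefinitions = []    # human-readable notes, one per changed setting
    for path, settings in texrc_files:
        for name, value in settings.items():
            if name in values and values[name][0] != value:
                old_value, old_path = values[name]
                redefinitions.append(
                    f"{name}: {old_value!r} (from {old_path}) redefined as {value!r} in {path}"
                )
            values[name] = (value, path)
    return values, redefinitions

texrc_files = [
    ("/u3/textal/data/Texrc", {"topk": "50", "mapcache_size": "200"}),
    ("/u1/me/mydata/Texrc", {"mapcache_size": "100"}),
]
values, redefinitions = merge_with_provenance(texrc_files)
for note in redefinitions:
    print("warning:", note)
for name, (value, path) in sorted(values.items()):
    print(f"{name} = {value!r}  [{path}]")
```

When checking a real log, the same two questions apply: which settings were redefined, and which file supplied the value that finally won.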