Using SeqApp

Suggestions For Using SeqApp

Dave Carmean, Simon Fraser University (Oct-95) carmean@sfu.ca (Please tell me if you have used this and if you have any suggestions)

SeqApp is a freeware Macintosh program written by Don Gilbert and available by anonymous ftp (Fetch) or gopher from ftp.bio.indiana.edu. SeqApp is a powerful program: it is used for storing and aligning data, for outputting formatted files for PHYLIP, PAUP, and other analysis programs, and for printing hard copies of alignments. It will also compare regions of homology between two sequences (with DottyPlot) and send a sequence to BLAST for similarity searches against all sequences in GenBank. However, it has several bugs and unincorporated features. Exploring new areas of the menu can lead to a program or system crash, so always keep backup copies of your files, and always save your file regularly while working and before doing anything new. These dire warnings are intended to keep your expectations low so that you will be pleasantly surprised when you use the program; they are not intended to discourage its use. I rarely have any problems with SeqApp. Also, there is usually more than one way to do things; please tell me if you learn a simpler method. The alignment window will only display ~2700 bp; the program will tell you if you exceed this. It is still possible to work with the first 2700 bp in the alignment window, and with all of the sequence in the edit windows. SeqApp will not work under System 6 or on an SE.

SeqApp will soon be superseded by SeqPup. Each program has its own advantages, though SeqPup is available for several platforms including Windows.

A couple of hints (of general use for the Macintosh): if the program locks up, do not hit keys randomly. Wait a while, then press command (apple)-option-esc (three keys simultaneously). That will give you the message "Force SeqApp to quit? Unsaved changes will be lost". Click on yes. This will keep other programs running. If nothing happens, simultaneously press the power-on key on the keyboard, the command key, and the ctrl key. This will restart the computer, and all unsaved work will be lost.

I suggest saving your file frequently and using a new name any time you make a major change. Keep an extra sequence at the bottom of the alignment: sometimes a portion of the end of the last sequence is lost when saving (I believe this happens after taking the file between programs, but don't take any chances). Occasionally check the alignment to see that no portions are corrupted, especially the ends of the last sequence. Others have had problems when using the PrettyPrint option.
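The occasional corruption check described above can be partly automated outside SeqApp. The following is a minimal Python sketch, not a SeqApp feature: the function name and the set of allowed characters are assumptions for illustration, and the sequences are assumed to have already been read into (name, sequence) pairs. Extend the allowed set if your data uses other symbols, such as IUPAC ambiguity codes.

```python
# Hypothetical helper for illustration; not part of SeqApp.
# Assumed alphabet: plain bases, N, and the placeholder characters
# mentioned in this guide ("-", "~", ".") plus "?".
ALLOWED = set("ACGTUNacgtun-~.?")


def find_suspect_positions(records, allowed=ALLOWED):
    """Return (name, position, character) for every unexpected character.

    Positions are 1-based, to match how bases are usually counted.
    """
    problems = []
    for name, seq in records:
        for pos, ch in enumerate(seq, start=1):
            if ch not in allowed:
                problems.append((name, pos, ch))
    return problems


records = [
    ("taxonA", "ACGT--ACGT"),
    ("taxonB", "ACGT~~AC#@"),  # garbage at the end of the last sequence
]
for name, pos, ch in find_suspect_positions(records):
    print(f"{name}: unexpected {ch!r} at position {pos}")
```

A check like this only catches garbage characters; a truncated sequence still has to be found by comparing lengths against a backup copy.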

Double click on a sequence name to see that sequence in an edit window. Click and then drag a sequence name up or down to change the position of a taxon in the alignment. Under the edit menu, 'clear' will delete the entire sequence.

SeqApp has two major modes: padlocked and unpadlocked (see the padlock near the upper left corner of the align window). Padlocked: Allows aligning of sequences, either individually or in blocks. Does not allow changing the sequence itself, only inserting or deleting gaps. Highlight a block, click on an area within the block that you wish to bring to another area, and drag it to that area (just to the left, actually, or it will overshoot). SeqApp will insert '~' into the gap. This '~' character and periods are not locked, and will be swallowed instead of pushed while aligning other areas. To lock the gaps, highlight the '~'s that should be locked and choose "Lock indels" (insertions/deletions) under the sequence menu. If you wish to lock the entire alignment (recommended regularly), highlight a single base and choose "Lock indels." If a taxon name is highlighted, only the indels of that sequence will be locked; to undo the highlight, unpadlock the alignment, click inside the alignment, and padlock the alignment again.

Never use the drag-alignment option with several sequences when some of the sequences do not reach the area being dragged. SeqApp normally will add tildes (in the middle of sequence) or periods (at the beginning of sequence), but if there is no sequence it will add garbage, and it may not be possible to undo. For example, if you are working towards the end of a 1000bp alignment, and one of the taxa has only a few hundred base pairs of sequence, and you wish to insert a gap in the alignment, either place the unfinished taxon outside of the working area (at the top or bottom) or add placeholders ("-", "~", ".", or n's) at the end of the short sequence.
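Adding placeholders to the end of short sequences can also be done before the file is opened in SeqApp. The following is a rough Python sketch under stated assumptions: the function name is hypothetical, the sequences are already held as (name, sequence) pairs, and "." is used as the fill character (any of the placeholders listed above could be passed instead).

```python
# Hypothetical helper for illustration; not part of SeqApp.
def pad_to_length(records, fill="."):
    """Pad every sequence on the right so all match the longest one."""
    if not records:
        return []
    longest = max(len(seq) for _, seq in records)
    return [(name, seq + fill * (longest - len(seq))) for name, seq in records]


records = [("long_taxon", "ACGTACGTAC"), ("short_taxon", "ACGT")]
for name, seq in pad_to_length(records):
    print(name, seq)
```

With every sequence padded to the same length, a block drag near the end of the alignment no longer reaches past the end of a short taxon.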

Unpadlocked: Allows editing of the actual sequence, cutting and pasting, searching for sequence (but note the warning that follows), and changing the case of a block of a single sequence. I know of no way in SeqApp to quickly delete a block or change the case of a block of sequence; one can do this in a PAUP-formatted file in Microsoft Word, using the option key and the mouse to highlight a rectangular block of data.

NEVER use command-F or the "find..." item under the edit menu. To search for any string of bases (say gtttaa), with the sequences unpadlocked, highlight the string to be searched for (it may be added or pasted at the beginning of a sequence and then deleted afterwards), then go to the edit menu with the mouse and choose find "gtttaa". Then place the pointer anywhere in the alignment and use command-G (find again).

Initially Putting Sequence into the program:
SeqApp can read many different formats, so you may import files directly from GenBank, GCG, Fasta, PAUP (some limitations), and PHYLIP (few limitations). One may open any text file by dragging it over the SeqApp icon (I keep an alias of SeqApp and other major programs on the desktop expressly for this). To add sequence to an existing alignment, choose open under the file menu and either hold down the shift key while double clicking on a sequence or click on the append box. New sequences are always placed at the bottom of the alignment. Be forewarned that GenBank comments are not saved, even in GenBank format. I find Pearson/FastA to be the most convenient format.

Making an Alignment:

While SeqApp has elegant tools for manual alignment, Clustal automatically gives an alignment that can be refined in SeqApp. Use Pearson/FastA to input a file to Clustal. I suggest tying down the ends of the alignment by changing all periods (which Clustal ignores) to N's or -, and changing the Clustal output file to PIR.
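Changing the periods can be done in a word processor, or with a short script before the file goes to Clustal. The following is a minimal Python sketch, assuming the Pearson/FastA file has been read into a string; the function name is hypothetical. Header lines (those starting with ">") are left alone so that periods in taxon names survive.

```python
# Hypothetical helper for illustration; not part of SeqApp or Clustal.
def replace_periods(fasta_text, replacement="N"):
    """Replace '.' in sequence lines only, leaving '>' header lines as-is."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # taxon name line: do not touch
        else:
            out.append(line.replace(".", replacement))
    return "\n".join(out) + "\n"


example = ">sp.1\n..ACGTAC..\n>sp.2\nGGACGTACTT\n"
print(replace_periods(example), end="")
```

Pass replacement="-" to use dashes instead of N's.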

Making a PAUP file and Getting a Printed Alignment:

One may use PrettyPrint for printing, but I do it by saving the file in PAUP format and then printing it in a word processor. Only do this on a copy of your data file. Change to PAUP format (top right of align window) and highlight the area to be printed (all sequences must be long enough to fill the highlighted area, as SeqApp cannot make a PAUP file with some sequences shorter than the alignment). Under the sequence menu first choose 'lock indels', then under the file menu choose 'Save Selection...', change the filename, and save. If some of the highlighted area has no sequence you will get an error message; fill in those sequences with '.'.

For printing, open the saved file in Word. Highlight all (command-A) and set the font to Courier, size 9. Under the file menu choose page setup and change the orientation to sideways (landscape); then choose print preview and change (drag) the left margin to 0.5" (1.3cm) and drag the page number. You may want to delete the header and add a paragraph return before each occurrence of the first taxon to separate the pieces of alignment, and strategically insert page breaks to avoid breaking up a clump of alignment.

For use in PAUP, change the header so that "missing=;" is "missing=- gap=. ;" and (if in Word) save the file as a 'text only' type. Check that the correct number of bases has been saved: sometimes SeqApp only saves a portion of the highlighted sequences. If this occurs, make a PAUP file by hand: save the sequences (all the same length) in Pearson/FastA format, remove the information after each taxon name (so that the only space is between the taxon name and the data), and add a PAUP header, deleting the word 'interleave.'
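The fallback route, building a non-interleaved PAUP file from equal-length sequences, can be sketched in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the header layout is a generic NEXUS data block and not necessarily the exact header SeqApp writes, and the missing and gap symbols are parameters so they can be set to match the header described above. The length check stands in for counting the saved bases by eye.

```python
# Hypothetical helper for illustration; not part of SeqApp or PAUP.
def to_nexus(records, missing="?", gap="-"):
    """Build a non-interleaved NEXUS data block from (name, sequence) pairs."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    for name, _ in records:
        if any(ch.isspace() for ch in name):
            raise ValueError(f"taxon name contains whitespace: {name!r}")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in records)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        f"  format datatype=dna missing={missing} gap={gap};",
        "  matrix",
    ]
    for name, seq in records:
        # Exactly one run of spaces separates the taxon name from its data.
        lines.append(f"  {name.ljust(width)} {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


records = [("taxonA", "ACGTACGT"), ("taxonB", "ACG.ACGT")]
# Symbols chosen here to mirror the header edit described above.
print(to_nexus(records, missing="-", gap="."), end="")
```

Because the function refuses to write sequences of unequal length, a partially saved selection shows up as an error instead of a silently short matrix.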

Translating Sequence to Amino Acid
Translation divides the sequence into triplets starting with the first character and ignores any triplets with non-coding characters (including gaps and unknowns). In the sequence window, highlight the names of the taxa you wish to translate and choose translate under the sequence menu. It is not possible to translate only a portion of a sequence.
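The triplet rule described above can be illustrated with a short Python sketch. This is not SeqApp's code: it assumes the standard genetic code, and it reads "ignores" as meaning that a triplet containing a gap or unknown character is simply left out of the output, which may differ from what SeqApp actually does. Because it takes a plain string, it can also be given a slice of a sequence.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate from the first character, skipping non-coding triplets."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        amino_acid = CODON_TABLE.get(codon)  # None for gaps, N's, etc.
        if amino_acid is not None:
            protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCC---TAA"))
```

Note that a single gap character inside a coding region shifts every later triplet, so gaps should come in multiples of three if the reading frame is to be preserved.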

From the Help Menu (under the balloon question-mark to the right on the menu) by Don Gilbert:

The main view into a sequence document is the multiple sequence editor window, which lists sequence names to the left and sequence bases as one line that can be scrolled thru. Bases can be colored (now only nucleic colorings) or black. Sequence can be edited here, especially to align them, and subranges and subgroupings can be selected for further operations or analysis. Entire sequence(s) can be cut/copied/pasted by selecting the left name(s). Mouse-down selects one. Shift-mouse down selects many in group, Command-mouse down selects many unconnected. Double click name to open single sequence view. Select name, then grab and move up or down to relocate.

Select the lock/unlock button at the view top to lock/unlock text editing in the sequence line. With lock on (no editing) you can use shift and command mouse to select a subrange of sequences to operate on.

Bases can be slid to left and right, like beads on an abacus, when the edit lock is On (now default). Select a base or group of bases (over one or several sequences), using mouse, shift+mouse, option+mouse, command+mouse. Then grab selected bases with mouse (mouse up, then mouse down on selection), and slide to left or right. Indels "-" or spacing on ends "." will be added and squeezed out as needed to slide the bases. See also the "Degap" menu selection to remove all gaps thus entered from a sequence.

This page is maintained by Dave Carmean with an eye towards speed and clarity, and last modified 13 May 1996. Comments or suggestions are welcomed!
