Nuno D. Mendes

Research interests | Publications | Software | Short CV | Personal | Documents


MUSA



Download

These implementations of MUSA are now available:

Version   Comments   Download
0.5.6                [linux bin]

Usage

MUSA does not require the specification of any parameters in order to search for motifs. There are, however, certain parameters that can be specified in order to focus the search:

Parameter   Description
λ           Defines the length of the λ-mers used to build the motifs. No motif shorter than λ will be found. By default λ = 4. Specifying a larger value focuses the search on longer motifs.
ε           Defines the tolerance for the distance between λ-mers. In complex motifs, it allows some variation in the distance between components across occurrences of the same motif. By default ε = 0.
sieve       Defines the proportion of sequences in which a motif must occur in order to be identified. By default sieve = 30%.
mindist     Defines the minimum distance between two λ-mers in complex motifs. You can use this option to force the search for complex motifs. By default mindist = 1, i.e., no restriction is used.
maxdist     Defines the maximum distance between two λ-mers in complex motifs. You can use this option to focus the search on limited portions of the sequence. By default maxdist = 500.
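As a rough illustration of the sieve parameter, the check below decides whether a motif found in a given number of sequences would meet a sieve expressed as a percentage. The function name is hypothetical, and the inclusive comparison is an assumption: how MUSA rounds the threshold is not stated here.

```python
def passes_sieve(quorum: int, total: int, sieve_percent: float = 30.0) -> bool:
    """Return True if a motif seen in `quorum` of `total` sequences meets the sieve.

    Assumption: the threshold is inclusive (>=). MUSA's own rounding rule
    is not documented on this page.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    # Compare quorum/total with sieve_percent/100 without dividing.
    return quorum * 100 >= sieve_percent * total


# With the default sieve of 30%, one sequence out of three is enough.
print(passes_sieve(1, 3))
# Two sequences out of ten fall short of 30%.
print(passes_sieve(2, 10))
```

Under this reading, the default sieve admits motifs that occur in a single sequence of a three-sequence dataset.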

MUSA is available as a command-line program. Specify any of the options listed below, or simply run the program with the default options. The input sequences must be in FASTA format and can be read either from a file named on the command line or, by giving - in place of the filename, from standard input.
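If the sequences are held in memory rather than in a file, they first need to be written out as FASTA. The helper below is a minimal sketch of that step; the function name and the 60-character line width are illustrative choices, not part of MUSA.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    wrapped at `width` characters per line.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


print(to_fasta([("seq1", "ATGCCTGTCATAT"), ("seq2", "ATGCGTGGCATAT")]), end="")
```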

Usage: musa [OPTION...] FASTA|-
MUSA -- Inference of Complex Motifs

  -b, --bothstrands          Search both strands
  -e, --epsilon=EPSILON      Epsilon parameter (distance tolerance)
  -l, --lambda=LAMBDA        Lambda parameter (size of lambda-mers)
  -m, --mindist=mindist      Minimum distance between lambda-mers
  -M, --maxdist=maxdist      Maximum distance between lambda-mers
  -o, --output=FILE          Output to FILE instead of standard output
  -q, --quiet                Behave quietly
  -s, --sieve=SIEVE          Percent of minimum number of sequences
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
Report bugs to <ndm+musa_bugs@algos.inesc-id.pt>.
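
For scripted use, the options above can be assembled programmatically. The following Python sketch builds an argument list from the long options shown in the help text and optionally pipes FASTA text to the program on standard input. The function names are hypothetical, and it assumes the executable is called musa and is on the PATH; option values are passed through unchanged.

```python
import subprocess


def musa_command(fasta="-", *, lam=None, epsilon=None, sieve=None,
                 mindist=None, maxdist=None, both_strands=False,
                 output=None, quiet=False, executable="musa"):
    """Build the argument list for a MUSA run from the documented long options.

    Options left as None are omitted, so MUSA falls back to its defaults.
    """
    args = [executable]
    if both_strands:
        args.append("--bothstrands")
    if quiet:
        args.append("--quiet")
    valued = (("--lambda", lam), ("--epsilon", epsilon), ("--sieve", sieve),
              ("--mindist", mindist), ("--maxdist", maxdist),
              ("--output", output))
    for flag, value in valued:
        if value is not None:
            args.append(f"{flag}={value}")
    args.append(fasta)  # a FASTA filename, or "-" for standard input
    return args


def run_musa(fasta_text, **options):
    """Pipe FASTA text to MUSA on standard input and return its standard output.

    Assumption: the `musa` binary is installed and reachable via PATH.
    """
    result = subprocess.run(musa_command("-", **options), input=fasta_text,
                            capture_output=True, text=True, check=True)
    return result.stdout


print(musa_command("seqs.fa", lam=6, sieve=50, both_strands=True))
```

Building the list separately from running it makes the command easy to inspect or log before anything is executed.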

Notes on the Output

Version 0.5
# MUSA/0.5 Output
# Sequences: 3 Motifs: 6
# Motif                         Quorum          P-value
ATGCGT <2> CATAT                  2 of 3        6.304220e-07
ATGC  <4> CATAT                   3 of 3        7.824886e-07
ATGC  <3> TCATAT                  2 of 3        1.633340e-05
ATGCCTGTCATAT                     1 of 3        1.723221e-05
ATGCGTGGCATAT                     1 of 3        8.292759e-05
ATGCGTATCATAT                     1 of 3        1.929857e-04

Above is an example of the output produced by version 0.5 of MUSA. The first column shows the motif found. If it is a complex motif, the distance between consecutive components is given between <>. The second column shows the quorum of the motif, i.e., the number of sequences in the dataset in which the motif can be found. The last column shows the statistical significance score computed for each motif.
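
The three columns can be read back into a script with a small parser. The sketch below is written against the sample output shown above only; the record type and function name are hypothetical, and it handles just the single-distance form of <>, so distances printed in any other form would need extra handling.

```python
import re
from typing import List, NamedTuple


class MotifRecord(NamedTuple):
    components: List[str]  # motif components, e.g. ["ATGCGT", "CATAT"]
    gaps: List[int]        # distances between consecutive components, e.g. [2]
    quorum: int            # number of sequences containing the motif
    total: int             # number of sequences in the dataset
    p_value: float


# motif text, then "<quorum> of <total>", then the P-value
LINE_RE = re.compile(
    r"^(?P<motif>.+?)\s+(?P<quorum>\d+) of (?P<total>\d+)\s+(?P<p>\S+)\s*$")


def parse_musa_output(text):
    """Parse MUSA 0.5 output text into a list of MotifRecord."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a motif line in the expected layout
        tokens = match.group("motif").split()
        components = [t for t in tokens if not t.startswith("<")]
        gaps = [int(t[1:-1]) for t in tokens if t.startswith("<")]
        records.append(MotifRecord(components, gaps,
                                   int(match.group("quorum")),
                                   int(match.group("total")),
                                   float(match.group("p"))))
    return records


SAMPLE = """\
# MUSA/0.5 Output
# Sequences: 3 Motifs: 6
# Motif                         Quorum          P-value
ATGCGT <2> CATAT                  2 of 3        6.304220e-07
ATGC  <4> CATAT                   3 of 3        7.824886e-07
ATGC  <3> TCATAT                  2 of 3        1.633340e-05
ATGCCTGTCATAT                     1 of 3        1.723221e-05
ATGCGTGGCATAT                     1 of 3        8.292759e-05
ATGCGTATCATAT                     1 of 3        1.929857e-04
"""

for record in parse_musa_output(SAMPLE):
    print(record)
```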

If the user specifies a value of ε greater than zero, then the distance between each pair of consecutive components will be shown as a pair of distances (minimum,maximum).
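
To make the distance notation concrete, the sketch below checks a complex motif against a set of sequences and counts its quorum, with each distance given as a (minimum, maximum) pair. It assumes the distance is the number of positions between the end of one component and the start of the next, which is consistent with the sample output (ATGC <4> CATAT spans 13 positions). The function names are hypothetical, and the three sequences are stand-ins taken from the simple motifs in the sample, not the original dataset.

```python
import re


def motif_regex(components, gaps):
    """Compile a regex for a motif.

    `gaps` holds one (min_gap, max_gap) pair per adjacent pair of components.
    Assumption: a gap counts the characters between two components.
    """
    parts = [re.escape(components[0])]
    for (low, high), component in zip(gaps, components[1:]):
        parts.append(".{%d,%d}" % (low, high))  # any characters in the gap
        parts.append(re.escape(component))
    return re.compile("".join(parts))


def quorum(components, gaps, sequences):
    """Count the sequences in which the motif occurs at least once."""
    pattern = motif_regex(components, gaps)
    return sum(1 for seq in sequences if pattern.search(seq))


# Hypothetical stand-in dataset built from the sample output above.
sequences = ["ATGCCTGTCATAT", "ATGCGTGGCATAT", "ATGCGTATCATAT"]

print(quorum(["ATGCGT", "CATAT"], [(2, 2)], sequences))
print(quorum(["ATGC", "CATAT"], [(4, 4)], sequences))
print(quorum(["ATGC", "TCATAT"], [(3, 3)], sequences))
```

With ε = 0 each pair collapses to a single value, e.g. (2, 2) for a distance shown as <2>.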

If the user performs the search for motifs in both strands, the output will generally show each motif and its reverse-complemented version. This may not happen, however, since the motif reconstruction procedure is not guaranteed to produce coherent reconstructions of biclusters composed of reverse-complemented pairs of configurations.
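
For reference, the reverse complement of a motif can be computed as sketched below: each component is reverse-complemented and the order of components and distances is reversed. This shows only the standard DNA operation; the function names are hypothetical, and how MUSA itself reconstructs or prints such pairs is not specified here.

```python
# Translation table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_motif(components, gaps):
    """Reverse-complement a complex motif.

    Components are reverse-complemented and listed in reverse order;
    the distances between them are reversed to match.
    """
    flipped = [reverse_complement(c) for c in reversed(components)]
    return flipped, list(reversed(gaps))


print(reverse_complement("ATGCGT"))
print(reverse_complement_motif(["ATGCGT", "CATAT"], [2]))
```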


Citation

To cite MUSA use:

Mendes ND, Casimiro AC, Santos PM, Sá-Correia I, Oliveira AL, Freitas AT. MUSA: a parameter free algorithm for the identification of biologically significant motifs. Bioinformatics. 2006 Dec 15; 22(24): 2996-3002

Or simply copy the following BibTeX entry:

@Article{Mendes:2006:Bioinformatics:17068086,
author = "Mendes, N D and Casimiro, A C and Santos, P M and S{\'a}-Correia, I and Oliveira, A L and Freitas, A T",
title = {MUSA: a parameter free algorithm for the identification of biologically significant motifs},
journal = "Bioinformatics",
year = "2006",
volume = "22",
number = "24",
pages = "2996-3002",
month = "Dec",
pmid = "17068086",
url = "http://www.hubmed.org/display.cgi?uids=17068086",
doi = "10.1093/bioinformatics/btl537"
}