Beyond the GUI

While the graphical interface is very useful for visualising search results and performing individual searches, it also has some limitations. For example, the choice of search matrices is limited, and large iterated batch searches are not easy to perform. These advanced tasks may require writing your own Python code that uses PFMFind as a library. PFMFind supports custom user plugins that provide additional search matrices, and short scripts that automate repetitive tasks are relatively easy to write.

Search matrix plugins

Writing plugins for PFMFind is quite easy as far as the interface is concerned. A PFMFind search matrix plugin is a Python module that defines two global variables, iteration and arg_list, as well as two functions: get_matrix() (mandatory) and print_info() (optional).
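To make the interface concrete, here is a complete but hypothetical plugin module. It is not shipped with PFMFind. It assumes that a plain dictionary keyed by letter pairs can stand in for a Biopython-style score matrix, and it returns the literal values 0 and 0 for matrix_type and ctype (a score matrix holding similarity scores), following the get_matrix() contract described in this section.

```python
# Hypothetical minimal search matrix plugin (a sketch, not part of PFMFind).
# A plain dict keyed by (letter, letter) pairs stands in for a
# Biopython-style score matrix.

# Offered only in the first iteration, so the hit list is not needed.
iteration = False

# (name, type, default_value) triplets: the string 'real' yields an entry
# field, a list of strings yields an option menu.
arg_list = [('Match score', 'real', 4.0),
            ('Mismatch', ['zero', 'minus one'], 'minus one')]

_LETTERS = "ACDEFGHIKLMNPQRSTVWY"
_MISMATCH = {'zero': 0.0, 'minus one': -1.0}


def get_matrix(HL, match_score, mismatch):
    """Return (M, matrix_type, ctype) for an identity-style score matrix."""
    # float() in case the entry-field value arrives as a string (assumption)
    match = float(match_score)
    miss = _MISMATCH[mismatch]
    M = {(a, b): (match if a == b else miss)
         for a in _LETTERS for b in _LETTERS}
    # matrix_type 0: score matrix; ctype 0: similarity scores
    return M, 0, 0


def print_info(HL, match_score, mismatch):
    """Describe the matrix produced by get_matrix() in plain text."""
    return "Identity-style matrix: match %g, mismatch %g" % (
        float(match_score), _MISMATCH[mismatch])
```

Both functions accept HL even though this first-iteration plugin ignores it, because the interface always passes the hit list first.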


search_plugin.iteration

A boolean constant (either True or False). If it is set to True, the plugin can be used in the second and subsequent iterations; otherwise it is used only for the first iteration.

search_plugin.arg_list

A list specifying the arguments of the functions get_matrix() and print_info(). Its elements are triplets of the form (name, type, default_value), where name is a string identifying the variable and type is either a string or a list of strings. The corresponding GUI element, labeled with name, is shown as part of the Matrix Options panel of the Search tab. If type is a string, the GUI element is a Pmw.EntryField widget whose value type is given by that string (see the Pmw documentation). If type is a list of strings, a Pmw.OptionMenu is shown whose options are the members of the list. In both cases, default_value is preselected.
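The hypothetical helper below (not a PFMFind function) mimics how the Matrix Options panel turns arg_list into the positional arguments passed to get_matrix() and print_info(): each argument starts at its default_value, can be overridden by name, and keeps its position in arg_list. Only the widget names come from the text; rejecting values that are not among the option-menu choices is an assumption.

```python
def widget_kind(type_):
    """Name the Pmw widget the documentation describes for an arg_list type."""
    return 'Pmw.EntryField' if isinstance(type_, str) else 'Pmw.OptionMenu'


def resolve_args(arg_list, choices=None):
    """Build positional arguments for get_matrix()/print_info().

    Hypothetical helper: every argument defaults to its default_value and
    may be overridden by name in `choices`; order follows arg_list.
    """
    choices = choices or {}
    args = []
    for name, type_, default in arg_list:
        value = choices.get(name, default)
        # Option menus: the value must be one of the listed strings.
        # list() also accepts dict views such as _MATRIX_CTYPE.keys().
        if not isinstance(type_, str) and value not in list(type_):
            raise ValueError("%r is not an option for %r" % (value, name))
        args.append(value)
    return args
```

With such a helper, a call like get_matrix(HL, *resolve_args(arg_list, {'Scale': 1.5})) would override one argument and use the defaults for all the others.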

search_plugin.get_matrix(HL, *args)

Construct the scoring matrix for a similarity search.

Parameters:
  • HL – a hit list instance from the previous iteration
  • args – additional positional arguments, in the order they appear in arg_list

Returns:
A tuple of the form (M, matrix_type, ctype), where M is a Biopython-style score matrix or PSSM, matrix_type is 0 if the matrix is a score matrix and 1 if it is a PSSM, and ctype should be set to 0 if the matrix contains similarity scores (the other values are for the distance-based matrices used by FSIndex).
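A plugin author may want to check this return contract before loading a plugin into the GUI. The sketch below encodes only what is stated above; check_matrix_result is a hypothetical name, and M is deliberately not inspected because the default profile plugin returns (None, 0, 0) when the hit list is empty.

```python
SCORE, POSITIONAL = 0, 1  # matrix_type values as documented


def check_matrix_result(result):
    """Sanity-check the tuple returned by a plugin's get_matrix().

    Hypothetical helper, not part of PFMFind. Raises ValueError on a
    malformed result; returns True if ctype marks similarity scores (0).
    """
    if not isinstance(result, tuple) or len(result) != 3:
        raise ValueError("get_matrix() must return (M, matrix_type, ctype)")
    M, matrix_type, ctype = result
    if matrix_type not in (SCORE, POSITIONAL):
        raise ValueError("matrix_type must be 0 (score matrix) or 1 (PSSM)")
    if not isinstance(ctype, int):
        raise ValueError("ctype must be an integer")
    return ctype == 0
```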

search_plugin.print_info(HL, *args)

Construct a printable representation of the matrix and of the method used to obtain it. This function is optional: if a plugin omits it, the default printout is produced. It takes the same arguments as get_matrix().

Parameters:
  • HL – a hit list instance from the previous iteration
  • args – additional positional arguments, in the order they appear in arg_list

Returns:
A human-readable string showing the matrix produced by get_matrix().
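Most of the work in print_info() is laying the matrix out as text. The hypothetical format_matrix below renders a pair-keyed score matrix as an aligned table. It looks up each pair in both orientations, on the assumption that a matrix may store each unordered pair only once (the dictionaries in Biopython's MatrixInfo module, for example, hold only one triangle), and it prints '-' for pairs that are missing altogether.

```python
def _lookup(M, a, b):
    """Return the score for (a, b), falling back to (b, a); None if absent."""
    if (a, b) in M:
        return M[a, b]
    return M.get((b, a))


def format_matrix(M, letters):
    """Lay out a pair-keyed score matrix as a fixed-width text table."""
    lines = ["   " + "".join("%5s" % b for b in letters)]  # column header
    for a in letters:
        cells = []
        for b in letters:
            v = _lookup(M, a, b)
            cells.append("%5s" % ("-" if v is None else "%g" % v))
        lines.append("%-3s%s" % (a, "".join(cells)))
    return "\n".join(lines)
```

Inside a plugin, print_info() could then return format_matrix applied to the first element of get_matrix(HL, *args), perhaps preceded by a line naming the method.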


As an example, the listing below shows the code of the default first-iteration search plugin. It extracts the available amino acid scoring matrices from Biopython and removes non-standard letters from them before returning them:

from Bio.SubsMat import MatrixInfo
from import QUASI, MAX, AVG, SCORE
from import SubstitutionMatrix

_MATRIX_CTYPE = {'None': 0, 'Quasi': QUASI, 'Avg': AVG, 'Max': MAX}

iteration = False
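arg_list = [('Matrix Name', MatrixInfo.available_matrices, 'blosum62'),
            ('Conversion', list(_MATRIX_CTYPE.keys()), 'None')]

# Membership table of the 20 standard amino acid letters
_std_alphabet_map = {}.fromkeys(list("ACDEFGHIKLMNPQRSTVWY"))

def _filter_non_standard_letters(S):
    # Iterate over a copy of the keys so that entries can be deleted safely
    for a, b in list(S.keys()):
        if a not in _std_alphabet_map or b not in _std_alphabet_map:
            del S[a, b]

def get_matrix(HL, matrix_name, conv_type):
    # HL is not used: a first-iteration matrix does not depend on earlier hits
    S = SubstitutionMatrix()
    S.update(getattr(MatrixInfo, matrix_name))
    _filter_non_standard_letters(S)
    S.name = matrix_name

    matrix_type = SCORE
    ctype = _MATRIX_CTYPE[conv_type]
    return S, matrix_type, ctype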
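Because the listing above needs Biopython and PFMFind, the filtering step is easiest to try on its own. The sketch below applies the same rule to a toy dictionary standing in for a SubstitutionMatrix. It uses a frozenset instead of the dictionary-based membership table, and it copies the keys before deleting, since deleting entries while iterating over a live dictionary view raises RuntimeError in Python 3.

```python
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def filter_non_standard_letters(S):
    """Delete every entry whose row or column letter is non-standard."""
    # list() snapshots the keys so that deleting inside the loop is safe
    for a, b in list(S.keys()):
        if a not in STANDARD_AA or b not in STANDARD_AA:
            del S[a, b]
    return S
```

Ambiguity codes such as B, Z and X are exactly what this removes, leaving a matrix over the 20 standard residues only.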

The default profile plugin is more complicated:

from cStringIO import StringIO

from import DirichletMix
from import freq_counts
from import henikoff_weights
from import BKGRND_PROBS as bg_dict
from import get_mix
from import NAMES
from import POSITIONAL

iteration = True
arg_list = [('Scale', 'real', 2.0),
            ('Weighting', ['None', 'Henikoff'], 'Henikoff'),
            ('Regulariser', NAMES, 'recode3.20comp')]

def _get_matrix_counts(HL, scale, weight_type, dirichlet_type):

    seqs = HL.get_seqs()

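    # Dirichlet mixture regulariser and raw (unweighted) residue counts
    DM = get_mix(dirichlet_type)
    bcounts = DM.block_counts(seqs)

    # Calculate sequence weights and the corresponding weighted counts
    if weight_type == 'None':
        weights = [1.0]*len(seqs)
        wcounts = bcounts
    elif weight_type == 'Henikoff':
        weights = henikoff_weights(seqs, DM.alphabet, bcounts)
        wcounts = DM.block_counts(seqs, weights)
    else:
        raise ValueError('Unknown weighting: %s' % weight_type)

    # Dirichlet mixture probabilities, converted to a scaled log-odds PSSM
    wprobs = DM.block_probs(wcounts)
    bkgrnd = DM.aa_vector(bg_dict)

    PM = DM.block2pssm(DM.block_log_odds(wprobs, bkgrnd, scale),
                       HL.query_seq)
    PM.name = 'PSSM'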
    PM.module = __name__
    matrix_type = POSITIONAL
    ctype = 0

    return PM, matrix_type, ctype, bcounts, weights, wcounts, wprobs

def get_matrix(HL, scale, weight_type, dirichlet_type):

    if not len(HL):
        return None, 0, 0
    # Keep only (M, matrix_type, ctype); the counts are used by print_info()
    return _get_matrix_counts(HL, scale, weight_type, dirichlet_type)[:3]

def print_info(HL, scale, weight_type, dirichlet_type):

    if not len(HL):
        return "Too few hits to construct PSSM"

    if weight_type is None:
        return ""

    seqs = HL.get_seqs()
    deflines = HL.get_deflines()

    PM, matrix_type, ctype, bcounts, weights, wcounts, wprobs = \
        _get_matrix_counts(HL, scale, weight_type, dirichlet_type)

    DM = get_mix(dirichlet_type)
    file_str = StringIO()
    file_str.write('***** ALIGNMENT *****\n')
    for i in range(len(seqs)):
        file_str.write('%8.4f %s %s\n' % (weights[i], seqs[i], deflines[i]))
    file_str.write('\n***** COUNTS *****\n')
    file_str.write('\n***** WEIGHTED COUNTS *****\n')
    file_str.write(DM.print_block_data(wcounts, 5, 1, 'float'))
    file_str.write('\n***** DIRICHLET MIXTURE PROBABILITIES *****\n')
    bprobs = DM.block_probs(wcounts)
    file_str.write(DM.print_block_data(bprobs, 6, 4, 'float'))
    file_str.write("\n"+ str(PM))
    return file_str.getvalue()
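The Henikoff option delegates to henikoff_weights(), whose exact normalisation is not documented here. As a sketch of the underlying idea, the function below computes position-based sequence weights in the style of Henikoff and Henikoff (1994): in each column, a sequence receives 1/(r·n), where r is the number of distinct letters in the column and n is the number of sequences sharing its letter. It assumes equal-length, ungapped sequences (such as fixed-length fragment hits) and normalises the weights to sum to 1; PFMFind's own function may scale them differently.

```python
from collections import Counter


def position_based_weights(seqs):
    """Henikoff-style position-based weights for equal-length sequences.

    A sketch, not PFMFind's henikoff_weights(); weights sum to 1.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    weights = [0.0] * len(seqs)
    for col in range(length):
        counts = Counter(s[col] for s in seqs)
        r = len(counts)  # number of distinct letters in this column
        for i, s in enumerate(seqs):
            weights[i] += 1.0 / (r * counts[s[col]])
    # Each column contributes exactly 1 in total, so the sum equals length
    total = sum(weights)
    return [w / total for w in weights]
```

Sequences carrying rare residues in a column gain weight, so a cluster of near-identical hits no longer dominates the counts that the listing passes to DM.block_counts(seqs, weights).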
