TMDA Filter Sources

The following list of sources documents the expected match field for each source, along with any optional or required arguments. Square brackets ([]) indicate that the argument is optional. Words in chevrons (<>) should be replaced by the appropriate option, without the chevrons.

NOTE:
The from* and to* sources match against different addresses or sets of addresses depending on whether they are used in an incoming filter or an outgoing filter.

Source    Incoming Filter                         Outgoing Filter
------    ---------------                         ---------------
from*     * envelope sender                       * From: header field set by user's MUA
          * From: header field
          * Reply-To: header field
          * X-Primary-Address header field
            if the PRIMARY_ADDRESS_MATCH
            setting can be satisfied

to*       * envelope recipient                    One of:
                                                  * recipients on tmda-sendmail's command line
                                                  * To: header field

Domain Search

Domain names of the addresses given above are also used in the search. The portion after the first @ in an e-mail address is considered the "domain". For example:

jason@mastaler.com -> mastaler.com
mastaler@cs.yale.edu -> cs.yale.edu

Domains must be listed one per line when used in a file. For example:

wingnet.net
mastaler.com
tmda.net
cs.yale.edu

The matching is exact. This isn't wildcarding, so with the above list, mastaler@cs.yale.edu would match, but mastaler@wopr.cs.yale.edu would not match. You'd have to add wopr.cs.yale.edu to the list first.
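The exact-match behaviour described above can be sketched in a few lines of Python. This is only an illustration, not TMDA's own code: the function names are hypothetical, and lowercasing the domain before the lookup is an assumption.

```python
def domain_of(address):
    """Return the part after the first '@', or '' if there is none."""
    _, sep, domain = address.partition("@")
    # Lowercasing is an assumption made for this sketch.
    return domain.lower() if sep else ""


def matches_domain_list(address, domains):
    """Exact lookup: the whole domain must be a member of the set."""
    return domain_of(address) in domains


# The domain list from the example above.
domains = {"wingnet.net", "mastaler.com", "tmda.net", "cs.yale.edu"}

print(matches_domain_list("mastaler@cs.yale.edu", domains))       # True
print(matches_domain_list("mastaler@wopr.cs.yale.edu", domains))  # False
```

Because the test is a plain membership check, the same lookup works unchanged against any keyed store, which is what makes hashed databases a good fit for it.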

This feature may be useful for sites that wish to check a large number of domain names, but don't want the overhead of the wildcard code. This feature is less flexible than wildcarding, but is much faster since the list of domains can be stored in a CDB or DBM (either directly, or by using -autocdb / -autodbm).

Sources

This group of sources may be used in either incoming or outgoing filter files.

from <email_address>
to <email_address>

from-file [ [-autocdb] | [-autodbm] ] [ -optional ] <textfile>
to-file [ [-autocdb] | [-autodbm] ] [ -optional ] <textfile>

from-cdb [ -optional ] <database.cdb>
to-cdb [ -optional ] <database.cdb>

from-dbm [ -optional ] [<database.db>]
to-dbm [ -optional ] [<database.db>]

from-ezmlm [ -optional ] <path_to_subscribers_parent_dir>
to-ezmlm [ -optional ] <path_to_subscribers_parent_dir>

from-mailman -attr=<attribute> [ -optional ] <path_to_list_dir>
to-mailman -attr=<attribute> [ -optional ] <path_to_list_dir>

from-sql -wildcards | -addr_column=<column_name> [ -action_column=<column_name> ] <SQL_query>
to-sql -wildcards | -addr_column=<column_name> [ -action_column=<column_name> ] <SQL_query>

Wildcard Searches

The *-sql rules can be used in two scenarios. If the -wildcards argument is given, the <SQL_query> is run and the resulting data set is read, in its entirety, from the database. The first column should be the addresses to match against. The second column is optional, but if it is present, it should be the overriding action or NULL. The returned data is searched in exactly the same way as text files containing wildcards. See Email Addresses below.

Any columns beyond the second will be ignored. This can come in handy if you need a column in the SELECT list for an ORDER BY clause. Because the search code stops at the first match, unsorted data could cause an incorrect match and the overriding action might not be what you want. If you use wildcards in the address column and you allow an overriding action, you should sort the returned values using an ORDER BY clause.
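To see why row order matters, here is a minimal Python sketch of a first-match search over a result set. It is not TMDA's implementation: the function name and the sample rows are hypothetical, and the standard fnmatch module stands in for TMDA's wildcard code.

```python
from fnmatch import fnmatchcase


def first_match(address, rows):
    """Return (pattern, action) for the first row whose pattern matches.

    Each row is a tuple: (pattern, [action, [ignored columns...]]).
    Returns None when no row matches.
    """
    for row in rows:
        pattern = row[0]
        action = row[1] if len(row) > 1 else None
        if fnmatchcase(address.lower(), pattern.lower()):
            return pattern, action
    return None


# Hypothetical result set: address pattern, overriding action, sort key.
rows = [
    ("*@example.com", "drop", 2),
    ("friend@example.com", "ok", 1),
]

# Unsorted: the broad wildcard wins and the friend is dropped.
print(first_match("friend@example.com", rows))
# Sorted on the third column: the specific address is seen first.
print(first_match("friend@example.com", sorted(rows, key=lambda r: r[2])))
```

The third column plays the role of the extra SELECT column mentioned above: it is ignored by the search and exists only so the rows can be ordered.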

Exact Match Searches

If an exact match of the sender or recipient is all you need, i.e. you don't need wildcards, then you can use the *-sql rules to have the database perform the search for you, returning only the rows that exactly matched. You should specify the -addr_column argument and provide the name of the column that contains the addresses to search. You do not need to include this column in the SELECT list.

If you have an overriding action column, you should give its name using the -action_column argument. If you use the -action_column argument, you must include that column in the SELECT list.

Caveat: When the exact-match form of the from-sql rule is used, TMDA can search for more than one sender at once. If the SELECT statement returns more than one row, TMDA will use the overriding action from the first row, since it has no way of knowing which sender (the From:, the Reply-To: or the envelope sender address) you care most about. Instead of using overriding actions, consider using separate blacklists and whitelists.

Your SELECT statement can be as complex as you care to make it, including joins, an ORDER BY clause, a LIMIT clause, etc. TMDA must know where to place the search conditions ("<sender1> = <addr_column> OR <sender2> = <addr_column>", etc.). You should include the string "%(criteria)s" in your SELECT statement at the appropriate location. TMDA will build the list of conditions based on the addresses to be matched and will replace "%(criteria)s" with that list. Here's an example to make this clearer.

Assume you have the following rule in your incoming filter.

    from-sql -addr_column=address <SQL_query> ok

An email arrives with a From: header of "friend@example.com" and a Reply-To: header of "friend@another.com". TMDA will generate the following criteria string:

    (address = 'friend@example.com' OR address = 'friend@another.com')

Your SELECT statement (<SQL_query>) might look something like this:

    SELECT address FROM addr_list WHERE %(criteria)s

The SQL code that TMDA actually sends to the database will look like this (reformatted for easier readability):

    SELECT address      
    FROM addr_list      
    WHERE (address = 'friend@example.com' OR             
           address = 'friend@another.com')
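The substitution step can be imitated with Python's dictionary-based string formatting, which uses the same "%(name)s" placeholder syntax. The sketch below only reproduces the text shown above for illustration. The function name build_criteria is hypothetical, and the naive quoting is an assumption made to keep the example short; it is not how values should be passed to a real database.

```python
def build_criteria(addr_column, addresses):
    """Build a display string like the criteria shown above.

    Illustration only: the literals are quoted naively here. Real code
    should hand the values to the database driver as bound parameters.
    """
    conditions = ["%s = '%s'" % (addr_column, addr) for addr in addresses]
    return "(" + " OR ".join(conditions) + ")"


query = "SELECT address FROM addr_list WHERE %(criteria)s"
criteria = build_criteria(
    "address", ["friend@example.com", "friend@another.com"]
)
final_query = query % {"criteria": criteria}
print(final_query)
```

Because the placeholder is replaced as a whole, it can sit anywhere a boolean expression is valid, for example inside a larger WHERE clause joined with AND.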

If you store all of your users' whitelists in a single table (a good schema design), you will need some way to restrict your search to a single user's list: that of the user whose copy of TMDA is querying the database. To facilitate that, the from-sql and to-sql rules provide three strings that can be used anywhere in your SELECT statement.

You can place these in your SELECT statement by using "%(username)s", "%(hostname)s" and/or "%(recipient)s", as needed. TMDA will substitute the appropriate values into the SELECT at the time of the search. Do not put quotes around the above variables. The Python DB API takes care of that for you in a manner appropriate for the database you are using.

The following group of sources may be used only in incoming filter files.

body [ -case ] <regular_expression>
headers [ -case ] <regular_expression>

body-file [ -case ] [ -optional ] [<regexp_file>]
headers-file [ -case ] [-optional ] [<regexp_file>]

size < <size_in_bytes | >size_in_bytes >

pipe <command_string>

pipe-headers <command_string>

Miscellaneous Notes

Email Addresses

In addition to explicit email addresses, you can use expressions based on UNIX shell-style wildcard characters anywhere an email address is expected.

NOTE: Wildcard characters are not recognized in a CDB or DBM file and are only recognized in SQL databases if you specify the -wildcards argument to the rule.

The special characters are:

Character(s)     Description
-------------    -----------
*                Matches everything.
?                Matches any single character.
[seq]            Matches any character in seq.
[!seq]           Matches any character not in seq.

In addition, "@=" (a custom rule) will expand to match both "@" and "@*.".

Here are some common examples:

# match only jdoe@domain.dom
jdoe@domain.dom
# match anyone@domain.dom, but not anyone@sub.domain.dom
*@domain.dom
# match anyone@sub.domain.dom, but not anyone@domain.dom
*@*.domain.dom
# match both anyone@domain.dom, and anyone@sub.domain.dom
*@=domain.dom
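The four examples above can be checked with a short Python sketch. It is an approximation under stated assumptions rather than TMDA's own matcher: it uses the standard fnmatch module for the shell-style wildcards, the helper names are hypothetical, and it lowercases both sides before comparing.

```python
from fnmatch import fnmatchcase


def expand(pattern):
    """Expand the custom '@=' token into its two plain wildcard forms."""
    if "@=" in pattern:
        return [
            pattern.replace("@=", "@", 1),    # anyone@domain.dom
            pattern.replace("@=", "@*.", 1),  # anyone@sub.domain.dom
        ]
    return [pattern]


def address_matches(address, pattern):
    """True if the address matches the pattern or either '@=' expansion."""
    address = address.lower()  # case folding is an assumption of this sketch
    return any(fnmatchcase(address, p.lower()) for p in expand(pattern))


for pattern in ("*@domain.dom", "*@*.domain.dom", "*@=domain.dom"):
    for address in ("anyone@domain.dom", "anyone@sub.domain.dom"):
        print(pattern, address, address_matches(address, pattern))
```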

NOTE: To match the empty envelope sender (the one that bounce messages are sent with), use <> as the expression.

Email Address Files

Email address files are textfiles containing an email address, domain, or wildcarded email address on each line. When using the from-file and to-file sources, the textfile is searched sequentially, with the first match terminating the search.

Address files may contain an optional second field on each line that specifies an action (ok, drop, bounce, etc.). If the action is specified, it overrides the action given in the filter rule.
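A sequential search with an optional per-line action might look like the following Python sketch. It is not TMDA's parser. The function name is hypothetical, and several details are assumptions: lines starting with "#" are treated as comments, fields are separated by whitespace, and both the full address and its domain are tried against each line.

```python
from fnmatch import fnmatchcase


def lookup(address, lines, rule_action):
    """Return the action for the first matching line, or None if no match.

    lines: iterable of text lines, each "<pattern> [<action>]".
    rule_action: the action from the filter rule, used when the
    matching line has no second field.
    """
    address = address.lower()
    # Search keys: the full address and the part after the first '@'.
    candidates = [address, address.partition("@")[2]]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split(None, 1)
        pattern = fields[0].lower()
        if any(fnmatchcase(c, pattern) for c in candidates if c):
            return fields[1].strip() if len(fields) > 1 else rule_action
    return None


# Hypothetical file contents.
lines = [
    "# friends and foes",
    "friend@example.com",
    "*@spam.example drop",
    "mastaler.com ok",
]

print(lookup("friend@example.com", lines, "ok"))      # rule action applies
print(lookup("x@spam.example", lines, "ok"))          # overridden by the file
print(lookup("jason@mastaler.com", lines, "bounce"))  # matched by domain
```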

Auto- Database Flags

If you have lengthy email address textfiles, you might want to consider using the much faster hashed databases instead. The address files used by the auto-building hashed database feature are the same email address textfiles documented above with the sole exception that wildcards are not supported.

The -autocdb and -autodbm arguments are intended to ease the use of CDB/DBM lists in TMDA by automatically rebuilding the CDB or DBM file as necessary. This gives you the performance advantages of hashed databases without the hassle of having to manually maintain them. With the -auto* arguments, TMDA will rebuild the database if it doesn't exist or if its timestamp is older than its source file. If the rebuild fails for some reason, TMDA will fall back to matching from the textfile instead.
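The rebuild condition described above (database missing, or older than its source file) comes down to a timestamp comparison. The Python sketch below shows one way to express it. The function name and file names are hypothetical, and an empty file stands in for a real CDB or DBM build.

```python
import os
import tempfile


def needs_rebuild(textfile, database):
    """True if the database is missing or older than its source textfile."""
    if not os.path.exists(database):
        return True
    return os.path.getmtime(database) < os.path.getmtime(textfile)


# Demonstration with throwaway files.
workdir = tempfile.mkdtemp()
textfile = os.path.join(workdir, "whitelist")
database = textfile + ".db"

with open(textfile, "w") as f:
    f.write("friend@example.com\n")
missing_db = needs_rebuild(textfile, database)  # no database yet

with open(database, "w") as f:
    f.write("")                                 # stand-in for a real build
os.utime(textfile, (1000, 1000))
os.utime(database, (2000, 2000))
fresh_db = needs_rebuild(textfile, database)    # database is newer

os.utime(textfile, (3000, 3000))
stale_db = needs_rebuild(textfile, database)    # textfile edited afterwards

print(missing_db, fresh_db, stale_db)
```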

Before you try the CDB version of this feature, make sure you have the python-cdb extension module installed.

Database Files

CDB and DBM files are hashed databases. TMDA can look up email addresses or domains in these files. Lookup in these files is much faster than in a textfile. On the other hand, wildcards are not supported in database files -- only in textfiles.

In a CDB or DBM, the keys should be the email addresses or domain to match, and their corresponding values (or records) should be empty unless you want to override the action specified in the filter file.
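The key and value layout can be tried out with Python's standard dbm package. The sketch below is an illustration only: it uses dbm.dumb so that it runs anywhere, which is not necessarily the DBM flavour TMDA uses, and the function name, file name and sample entries are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def db_lookup(db, keys, rule_action):
    """Return the action for the first key found in db, else None.

    An empty value means "use the action from the filter rule".
    """
    for key in keys:
        value = db.get(key.encode("utf-8"))
        if value is not None:
            return value.decode("utf-8") or rule_action
    return None


path = os.path.join(tempfile.mkdtemp(), "whitelist")
with dbm.dumb.open(path, "c") as db:
    db[b"friend@example.com"] = b""  # empty: keep the rule's action
    db[b"mastaler.com"] = b"ok"      # non-empty: overrides the rule's action

    # Each lookup tries the full address, then its domain.
    by_address = db_lookup(db, ["friend@example.com", "example.com"], "drop")
    by_domain = db_lookup(db, ["jason@mastaler.com", "mastaler.com"], "bounce")
    missing = db_lookup(db, ["x@nowhere.test", "nowhere.test"], "drop")

print(by_address, by_domain, missing)
```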

CDB or DBM files can be created outside of TMDA and merely referenced by your filter files (use the *-cdb and *-dbm filter rules) or can be automatically created by TMDA if you use the -autocdb or -autodbm flags and the *-file rules.

If you wish to explore CDB databases, make sure you have the python-cdb extension module installed.

Regular Expression Files

A regular expression textfile is simply a text file with a regular expression on each line. The file is read sequentially and each regular expression is used to attempt a match. As soon as there is a match, the search stops.

Because regular expressions may include spaces, you must surround the regular expressions with quotation marks. You may use either single quotes (') or double quotes (") as long as you use the same one at both the beginning and the end.

If you need to match a quote in your regular expression, simply use the other style of quotes to surround the expression or escape the embedded quote with a backslash (\).
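The quoting rules can be illustrated with a small Python sketch that reads such lines and stops at the first match. This is not TMDA's parser, and the function names are hypothetical. Treating matches as case-insensitive unless a case flag is set is an assumption based on the -case argument listed earlier.

```python
import re


def parse_regex_line(line):
    """Strip the surrounding quotes and unescape embedded ones."""
    line = line.strip()
    if len(line) < 2 or line[0] not in "'\"" or line[-1] != line[0]:
        raise ValueError("expression must be quoted: %r" % line)
    quote = line[0]
    # Only a backslash in front of the surrounding quote style is removed;
    # every other backslash is left for the regular expression engine.
    return line[1:-1].replace("\\" + quote, quote)


def first_matching_regex(text, lines, case=False):
    """Return the first expression that matches text, or None."""
    flags = 0 if case else re.IGNORECASE  # assumption: -case means sensitive
    for line in lines:
        if not line.strip():
            continue
        expression = parse_regex_line(line)
        if re.search(expression, text, flags):
            return expression
    return None


# Hypothetical file contents, one quoted expression per line.
lines = [
    '"free money"',
    "'say \"hello\"'",
    '"click \\"here\\""',
]

print(first_matching_regex("Get FREE MONEY now", lines))
print(first_matching_regex('Please click "here" today', lines))
```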


2007-02-24 17:18