Filters

Filters are a feature recently introduced in rbot that can be briefly described as "means to transform data into other data". Yes, they are that generic, although there are a few common applications that are expected to obey some rules. The feature and its applications are extremely new and experimental, so those rules will probably evolve before they stabilize, and what we describe here might become obsolete pretty soon (or never, so let's do it anyway).

Common filter interface

A common (and currently the only) way to create filters is to do it in appropriate plugins. The method is

@bot.register_filter(fname, [gname]) { |s|
  ... stuff ...
}

where fname is the filter name, gname is an optional group name, and the parameter s passed to the block is a DataStream object: a glorified Hash whose :text key is considered its main key. The filter thus takes a DataStream as input and should return a DataStream, or at least a Hash, as output.

For more complex filters, it's best to write the actual filtering code in its own method, and call it from the block:

def fname_filter(s)
  ... stuff ...
end

@bot.register_filter(:fname, [:gname]) { |s| fname_filter(s) }

To use one or more filters, you can call

@bot.filter(:fname1, ..., :fnameN, ds)

where :fname1, ..., :fnameN are filter names, and ds is a DataStream. Alternatively, the last argument can be replaced by a String followed by a Hash, with either the String or the Hash (but not both) being optional. The String will become the DataStream's :text value.

This call will filter ds through all of the specified filters, the output of each of them being passed through as input to the next one.
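
The chaining behaviour described above can be illustrated with a small standalone sketch. ToyFilterRegistry is a hypothetical stand-in written for this page, not rbot's actual implementation: it uses plain Hashes instead of DataStream objects, ignores filter groups, and assumes filter names are always given as Symbols.

```ruby
# Toy stand-in for the bot's filter registry; NOT rbot's actual code.
class ToyFilterRegistry
  def initialize
    @filters = {}
  end

  # Store a block under a filter name (group handling is omitted here).
  def register_filter(name, &block)
    @filters[name.to_sym] = block
  end

  # Accepts Symbol filter names followed by a Hash, a String,
  # or a String and then a Hash.
  def filter(*args)
    opts = args.last.is_a?(Hash) ? args.pop : {}
    text = args.last.is_a?(String) ? args.pop : nil
    ds = opts.dup
    ds[:text] = text if text
    # Feed the output of each filter to the next one, in the order given.
    args.inject(ds) { |stream, name| @filters.fetch(name.to_sym).call(stream) }
  end
end

registry = ToyFilterRegistry.new
registry.register_filter(:strip)  { |s| s.merge(:text => s[:text].strip) }
registry.register_filter(:upcase) { |s| s.merge(:text => s[:text].upcase) }

result = registry.filter(:strip, :upcase, "  hello rbot  ", :source => "example")
puts result[:text]
```

Each toy filter returns a new Hash rather than modifying its input, so extra keys such as :source survive the whole chain unless a filter deliberately drops them.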

Filter groups

Filter groups are used to collect filters with similar or related purposes.

:htmlinfo

The :htmlinfo filter and its filter group are used to summarize web pages. Their output is typically used by the url plugin to display information on web pages linked in channels watched by the bot.

The input DataStream passed to an :htmlinfo filter might or might not have a :headers key. If it does, then the DataStream was created from a (partially downloaded) webpage, and the :headers value holds the HTTP response headers. The Utils.check_location method is used to check the location of the webpage against a given regular expression; it returns nil unless the location matches.

The input DataStream :text holds the webpage. Note that in general the amount of data passed on to :htmlinfo filters is not the entire webpage, but only the amount downloaded for htmlinfo purposes, which is set by the bot's http.info_bytes configuration option.

The DataStream returned by an :htmlinfo filter should contain at least two keys: :title, with the page title, and :content, with the summarized webpage content. Since currently all :htmlinfo filters are tried, nil should be returned when the filter is not able to handle a given page.

def fname_filter(s)
  loc = Utils.check_location(s, /site.regexp/)
  return nil unless loc
  ... stuff ... # retrieve the page title and summarize its content
  return {:title => title, :content => content}
end

@bot.register_filter(:fname, :htmlinfo) { |s| fname_filter(s) }
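
To make the template above more concrete, here is a standalone sketch of what the body of such a filter might look like. It runs without rbot: toy_check_location is a hypothetical, simplified stand-in for Utils.check_location (it assumes the location is stored under a 'location' header), the input is a plain Hash rather than a DataStream, and the regular-expression based title extraction and summary are only illustrative, not how any existing :htmlinfo filter works.

```ruby
# Hypothetical stand-in for Utils.check_location: returns the first matching
# location, or nil when there are no headers or nothing matches.
def toy_check_location(ds, rx)
  headers = ds[:headers]
  return nil unless headers
  Array(headers['location']).flatten.grep(rx).first
end

def example_filter(s)
  loc = toy_check_location(s, /example\.org/)
  return nil unless loc
  title = s[:text][/<title>(.*?)<\/title>/mi, 1]
  return nil unless title
  # Crude summary: drop the head, strip tags, collapse whitespace, truncate.
  body = s[:text].sub(/<head>.*?<\/head>/mi, '')
  content = body.gsub(/<[^>]+>/, ' ').gsub(/\s+/, ' ').strip[0, 80]
  { :title => title.strip, :content => content }
end

page = {
  :text => "<html><head><title>Example page</title></head>" \
           "<body><p>Hello   from the <b>example</b> site.</p></body></html>",
  :headers => { 'location' => ['http://example.org/page'] }
}

info = example_filter(page)
other = example_filter(page.merge(:headers => { 'location' => ['http://elsewhere.net/'] }))
no_headers = example_filter({ :text => page[:text] })

puts info[:title]
puts info[:content]
```

The filter returns nil both when the location does not match and when there are no headers at all, which leaves the page to be handled by some other :htmlinfo filter.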