http://www.zorba-xquery.com/modules/full-text

Module Description
Before using any of the functions below please remember to import the module namespace:
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

This module provides an XQuery API to full-text functions. For general information about Zorba's implementation of the XQuery and XPath Full Text 1.0 specification, as well as instructions for building and installing a thesaurus, see the Full Text Thesaurus documentation.

Notes on languages

To refer to particular human languages, Zorba uses both the ISO 639-1 and ISO 639-2 language codes. Note that Zorba supports only a subset of the complete list of language codes, and not every function supports the same subset.

Most functions in this module take a language as a parameter using the xs:language XML schema data type.
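
Because not every function supports the same subset of languages, it can help to probe support before calling a language-sensitive function. The following is a minimal sketch using only the predeclared constants and the is-*-lang-supported() functions listed in this module; the choice of languages, the "ja" code, and the element names in the result are illustrative assumptions.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Report which full-text features are available for a few languages.
   xs:language( "ja" ) is an arbitrary example of a language that has
   no predeclared constant in this module. :)
for $lang in ( $ft:lang-en, $ft:lang-de, xs:language( "ja" ) )
return
  <language code="{ $lang }">
    <stemming>{ ft:is-stem-lang-supported( $lang ) }</stemming>
    <stop-words>{ ft:is-stop-word-lang-supported( $lang ) }</stop-words>
    <tokenizer>{ ft:is-tokenizer-lang-supported( $lang ) }</tokenizer>
  </language>
```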

Notes on stemming

The stem() functions return the stem of a word. In Zorba, however, the stem of a word is not itself guaranteed to be a word. It is best to consider a stem as an opaque byte sequence. All that is guaranteed about a stem is that, for a given word, the stem of that word will always be the same byte sequence. Hence, you should never compare the result of one of the stem() functions against a non-stemmed string, for example:
  if ( ft:stem( "apples" ) eq "apple" )             (: WRONG :)

Instead, do:
  if ( ft:stem( "apples" ) eq ft:stem( "apple" ) )  (: CORRECT :)
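
As a fuller illustration of the stem-to-stem comparison, the sketch below filters a list of words down to those that share a stem with "apple". The word list is made up for the example, and English is assumed to be supported for stemming on your installation.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Keep only the words whose stem matches the stem of "apple".
   Both sides of the comparison are stemmed; the raw stem is never
   compared against an unstemmed string. :)
let $target := ft:stem( "apple", $ft:lang-en )
for $word in ( "apple", "apples", "orange" )
where ft:stem( $word, $ft:lang-en ) eq $target
return $word
```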
 

Notes on the thesaurus

The thesaurus-lookup() functions have "levels" and "relationship" parameters. The values for these are implementation-defined. Zorba's default implementation uses the WordNet lexical database, version 3.0.

In WordNet, the number of "levels" between two phrases is the number of hierarchical meanings that separate them. For example, "canary" is 5 levels away from "vertebrate" (canary > finch > oscine > passerine > bird > vertebrate).

When using the WordNet implementation, Zorba supports all of the relationships (and their abbreviations) specified by ISO 2788 and ANSI/NISO Z39.19-2005 with the exceptions of "HN" (history note) and "X SN" (see scope note for). These relationships are:

Rel.  Meaning                   WordNet Rel.
BT    broader term              hypernym
BTG   broader term generic      hypernym
BTI   broader term instance     instance hypernym
BTP   broader term partitive    part meronym
NT    narrower term             hyponym
NTG   narrower term generic     hyponym
NTI   narrower term instance    instance hyponym
NTP   narrower term partitive   part holonym
RT    related term              also see
SN    scope note                n/a
TT    top term                  hypernym
UF    non-preferred term        n/a
USE   preferred term            n/a

Note that you can specify relationships either by their abbreviation or their meaning. Relationships are case-insensitive.

In addition to the ISO 2788 and ANSI/NISO Z39.19-2005 relationships, Zorba also supports all of the relationships offered by WordNet. These relationships are:

Relationship                 Meaning
also see                     A word that is related to another, e.g., for "varnished" (furniture) one should also see "finished."
antonym                      A word opposite in meaning to another, e.g., "light" is an antonym for "heavy."
attribute                    A noun for which adjectives express values, e.g., "weight" is an attribute for which the adjectives "light" and "heavy" express values.
cause                        A verb that causes another, e.g., "show" is a cause of "see."
derivationally related form  A word that is derived from a root word, e.g., "metric" is a derivationally related form of "meter."
derived from adjective       An adverb that is derived from an adjective, e.g., "correctly" is derived from the adjective "correct."
entailment                   A verb that presupposes another, e.g., "snoring" entails "sleeping."
hypernym                     A word with a broad meaning that more specific words fall under, e.g., "meal" is a hypernym of "breakfast."
hyponym                      A word of more specific meaning than a general term applicable to it, e.g., "breakfast" is a hyponym of "meal."
instance hypernym            A word that denotes a category of some specific instance, e.g., "author" is an instance hypernym of "Asimov."
instance hyponym             A term that denotes a specific instance of some general category, e.g., "Asimov" is an instance hyponym of "author."
member holonym               A word that denotes a collection of individuals, e.g., "faculty" is a member holonym of "professor."
member meronym               A word that denotes a member of a larger group, e.g., a "person" is a member meronym of a "crowd."
part holonym                 A word that denotes a larger whole comprised of some part, e.g., "car" is a part holonym of "engine."
part meronym                 A word that denotes a part of a larger whole, e.g., an "engine" is a part meronym of a "car."
participle of verb           An adjective that is the participle of some verb, e.g., "breaking" is the participle of the verb "break."
pertainym                    An adjective that classifies its noun, e.g., "musical" is a pertainym in "musical instrument."
similar to                   Similar, though not necessarily interchangeable, adjectives. For example, "shiny" is similar to "bright", but they have subtle differences.
substance holonym            A word that denotes a larger whole containing some constituent substance, e.g., "bread" is a substance holonym of "flour."
substance meronym            A word that denotes a constituent substance of some larger whole, e.g., "flour" is a substance meronym of "bread."
verb group                   A verb that is a member of a group of similar verbs, e.g., "live" is in the verb group of "dwell", "live", "inhabit", etc.
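
To show how the "relationship" and "levels" parameters fit together, here is a minimal sketch of the six-argument thesaurus-lookup() call, asking for hypernyms of "canary" between 1 and 5 levels away. The thesaurus URI is deliberately left as an external variable ($thesaurus-uri is a name chosen for this example) because valid URIs depend on how thesauri are installed on your system; a WordNet thesaurus for English is assumed to be available.

```xquery
xquery version "3.0";
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Hypothetical external variable: bind it to a thesaurus URI that your
   Zorba installation can resolve. :)
declare variable $thesaurus-uri as xs:string external;

(: "hypernym" could equally be written with its abbreviation "BT",
   since relationships may be given by abbreviation or by meaning and
   are case-insensitive. :)
ft:thesaurus-lookup( $thesaurus-uri, "canary", $ft:lang-en,
                     "hypernym", 1, 5 )
```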

Notes on tokenization

For general information about Zorba's implementation of tokenization, including what constitutes a token, see the Full Text Tokenizer documentation.
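
As a quick way to see what the tokenizer treats as a token, the sketch below tokenizes a short string and numbers the resulting tokens. The sample sentence and the <token> wrapper element are illustrative assumptions; the exact tokens returned depend on the tokenizer in use.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Tokenize a string and label each token with its position. :)
for $token at $i in ft:tokenize-string( "The quick brown fox.", $ft:lang-en )
return <token n="{ $i }">{ $token }</token>
```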

Author:

Paul J. Lucas

XQuery version and encoding for this module:

xquery version "3.0" encoding "utf-8";

Zorba version for this module:

The latest version of this module is 2.0. For more information about module versioning in Zorba please check out this resource.

Module Resources
Module Dependencies

Imported schemas:

Please note that the schemas are not automatically imported in the modules that import this module.
In order to import and use the schemas, please add:

import schema namespace ft-schema =  "http://www.zorba-xquery.com/modules/full-text";
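
For instance, a query that works with the ft-schema:token elements returned by tokenize-node() might combine both imports as in the following sketch. The sample <msg> element is made up for the example, and the query only counts the tokens rather than assuming anything about their internal structure.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
import schema namespace ft-schema = "http://www.zorba-xquery.com/modules/full-text";

(: Tokenize a small constructed element and count the tokens found in
   it and its descendants. :)
let $doc := <msg>hello <b>world</b></msg>
let $tokens as element(ft-schema:token)* := ft:tokenize-node( $doc )
return count( $tokens )
```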

Namespaces
err http://www.w3.org/2005/xqt-errors
ft http://www.zorba-xquery.com/modules/full-text
ft-schema http://www.zorba-xquery.com/modules/full-text
ver http://www.zorba-xquery.com/options/versioning
zerr http://www.zorba-xquery.com/errors
Variables
$ft:lang-da as xs:language
Predeclared constant for the Danish xs:language.
$ft:lang-de as xs:language
Predeclared constant for the German xs:language.
$ft:lang-en as xs:language
Predeclared constant for the English xs:language.
$ft:lang-es as xs:language
Predeclared constant for the Spanish xs:language.
$ft:lang-fi as xs:language
Predeclared constant for the Finnish xs:language.
$ft:lang-fr as xs:language
Predeclared constant for the French xs:language.
$ft:lang-hu as xs:language
Predeclared constant for the Hungarian xs:language.
$ft:lang-it as xs:language
Predeclared constant for the Italian xs:language.
$ft:lang-nl as xs:language
Predeclared constant for the Dutch xs:language.
$ft:lang-no as xs:language
Predeclared constant for the Norwegian xs:language.
$ft:lang-pt as xs:language
Predeclared constant for the Portuguese xs:language.
$ft:lang-ro as xs:language
Predeclared constant for the Romanian xs:language.
$ft:lang-ru as xs:language
Predeclared constant for the Russian xs:language.
$ft:lang-sv as xs:language
Predeclared constant for the Swedish xs:language.
$ft:lang-tr as xs:language
Predeclared constant for the Turkish xs:language.
Function Summary
External current-compare-options ( ) as element(ft-schema:compare-options)
Gets the current compare options.
External current-lang ( ) as xs:language
Gets the current language: either the language specified by the declare ft-option using language statement (if any) or the one returned by ft:host-lang() (if none).
External host-lang ( ) as xs:language
Gets the host's current language.
External is-stem-lang-supported ( $lang as xs:language ) as xs:boolean
Checks whether the given language is supported for stemming.
External is-stop-word ( $word as xs:string ) as xs:boolean
Checks whether the given word is a stop-word.
External is-stop-word ( $word as xs:string, $lang as xs:language ) as xs:boolean
Checks whether the given word is a stop-word.
External is-stop-word-lang-supported ( $lang as xs:language ) as xs:boolean
Checks whether the given language is supported for stop words.
External is-thesaurus-lang-supported ( $lang as xs:language ) as xs:boolean
Checks whether the given language is supported for look-up using the default thesaurus.
External is-thesaurus-lang-supported ( $uri as xs:string, $lang as xs:language ) as xs:boolean
Checks whether the given language is supported for look-up using the thesaurus specified by the given URI.
External is-tokenizer-lang-supported ( $lang as xs:language ) as xs:boolean
Checks whether the given language is supported for tokenization.
External stem ( $word as xs:string ) as xs:string
Stems the given word.
External stem ( $word as xs:string, $lang as xs:language ) as xs:string
Stems the given word.
External strip-diacritics ( $string as xs:string ) as xs:string
Strips all diacritical marks from all characters.
External thesaurus-lookup ( $phrase as xs:string ) as xs:string*
Looks up the given phrase in the default thesaurus.
External thesaurus-lookup ( $uri as xs:string, $phrase as xs:string ) as xs:string*
Looks up the given phrase in a thesaurus.
External thesaurus-lookup ( $uri as xs:string, $phrase as xs:string, $lang as xs:language ) as xs:string*
Looks up the given phrase in the thesaurus specified by the given URI.
External thesaurus-lookup ( $uri as xs:string, $phrase as xs:string, $lang as xs:language, $relationship as xs:string ) as xs:string*
Looks up the given phrase in a thesaurus.
External thesaurus-lookup ( $uri as xs:string, $phrase as xs:string, $lang as xs:language, $relationship as xs:string, $level-least as xs:integer, $level-most as xs:integer ) as xs:string*
Looks up the given phrase in a thesaurus.
External tokenize-node ( $node as node() ) as element(ft-schema:token)*
Tokenizes the given node and all of its descendants.
External tokenize-node ( $node as node(), $lang as xs:language ) as element(ft-schema:token)*
Tokenizes the given node and all of its descendants.
External tokenize-nodes ( $includes as node()+, $excludes as node()* ) as element(ft-schema:token)*
Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any.
External tokenize-nodes ( $includes as node()+, $excludes as node()*, $lang as xs:language ) as element(ft-schema:token)*
Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any.
External tokenize-string ( $string as xs:string ) as xs:string*
Tokenizes the given string.
External tokenize-string ( $string as xs:string, $lang as xs:language ) as xs:string*
Tokenizes the given string.
External tokenizer-properties ( ) as element(ft-schema:tokenizer-properties)
Gets properties of the tokenizer for the language returned by ft:current-lang().
External tokenizer-properties ( $lang as xs:language ) as element(ft-schema:tokenizer-properties)
Gets properties of the tokenizer for the given language.
Functions
External current-compare-options
declare function ft:current-compare-options (

) as element(ft-schema:compare-options)

Gets the current compare options.

Returns:
Examples:

External current-lang
declare function ft:current-lang (

) as xs:language

Gets the current language: either the language specified by the declare ft-option using language statement (if any) or the one returned by ft:host-lang() (if none).

Returns:
Examples:
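
A minimal sketch of how the current language relates to the host language: the prolog sets the language with a declare ft-option statement, so ft:current-lang() is expected to reflect that setting while ft:host-lang() reports the host's language. The choice of "de" and the <langs> result element are illustrative assumptions.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Set the full-text language for this query. Without this declaration,
   ft:current-lang() falls back to the value of ft:host-lang(). :)
declare ft-option using language "de";

<langs current="{ ft:current-lang() }" host="{ ft:host-lang() }"/>
```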

External host-lang
declare function ft:host-lang (

) as xs:language

Gets the host's current language. The "host" is the computer on which Zorba is running. The host's current language is obtained as follows:

Returns:

External is-stem-lang-supported
declare function ft:is-stem-lang-supported (
            $lang as xs:language
) as xs:boolean

Checks whether the given language is supported for stemming.

Parameters:
Returns:
Examples:

External is-stop-word
declare function ft:is-stop-word (
            $word as xs:string
) as xs:boolean

Checks whether the given word is a stop-word.

Parameters:
Returns:
Errors:
Examples:

External is-stop-word
declare function ft:is-stop-word (
            $word as xs:string,
            $lang as xs:language
) as xs:boolean

Checks whether the given word is a stop-word.

Parameters:
Returns:
Errors:
Examples:
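
One common use of is-stop-word() is to drop stop words from a tokenized string. The sketch below does so for an English sentence, checking first that stop words are supported for the language; the sample sentence is made up, and which words count as stop words depends on the stop-word list in use.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Return the tokens of a sentence that are not stop words.
   If stop words are not supported for English, return nothing. :)
if ( ft:is-stop-word-lang-supported( $ft:lang-en ) ) then
  for $word in ft:tokenize-string( "the cat sat on the mat", $ft:lang-en )
  where not( ft:is-stop-word( $word, $ft:lang-en ) )
  return $word
else
  ()
```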

External is-stop-word-lang-supported
declare function ft:is-stop-word-lang-supported (
            $lang as xs:language
) as xs:boolean

Checks whether the given language is supported for stop words.

Parameters:
Returns:
Examples:

External is-thesaurus-lang-supported
declare function ft:is-thesaurus-lang-supported (
            $lang as xs:language
) as xs:boolean

Checks whether the given language is supported for look-up using the default thesaurus.

Parameters:
Returns:

External is-thesaurus-lang-supported
declare function ft:is-thesaurus-lang-supported (
            $uri as xs:string,
            $lang as xs:language
) as xs:boolean

Checks whether the given language is supported for look-up using the thesaurus specified by the given URI.

Parameters:
Returns:
Errors:
Examples:

External is-tokenizer-lang-supported
declare function ft:is-tokenizer-lang-supported (
            $lang as xs:language
) as xs:boolean

Checks whether the given language is supported for tokenization.

Parameters:
Returns:

External stem
declare function ft:stem (
            $word as xs:string
) as xs:string

Stems the given word.

Parameters:
Returns:
Errors:
Examples:

External stem
declare function ft:stem (
            $word as xs:string,
            $lang as xs:language
) as xs:string

Stems the given word.

Parameters:
Returns:
Errors:
Examples:

External strip-diacritics
declare function ft:strip-diacritics (
            $string as xs:string
) as xs:string

Strips all diacritical marks from all characters.

Parameters:
Returns:
Examples:
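
A minimal sketch of a diacritics-insensitive comparison: both strings are passed through strip-diacritics() before being compared, in the same spirit as comparing stems with stems. The sample words are illustrative.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Compare two strings while ignoring diacritical marks. :)
let $a := "résumé"
let $b := "resume"
return ft:strip-diacritics( $a ) eq ft:strip-diacritics( $b )
```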

External thesaurus-lookup
declare function ft:thesaurus-lookup (
            $phrase as xs:string
) as xs:string*

Looks up the given phrase in the default thesaurus.

Parameters:
Returns:
Errors:
Examples:

External thesaurus-lookup
declare function ft:thesaurus-lookup (
            $uri as xs:string,
            $phrase as xs:string
) as xs:string*

Looks up the given phrase in a thesaurus.

Parameters:
Returns:
Errors:
Examples:

External thesaurus-lookup
declare function ft:thesaurus-lookup (
            $uri as xs:string,
            $phrase as xs:string,
            $lang as xs:language
) as xs:string*

Looks up the given phrase in the thesaurus specified by the given URI.

Parameters:
Returns:
Errors:
Examples:

External thesaurus-lookup
declare function ft:thesaurus-lookup (
            $uri as xs:string,
            $phrase as xs:string,
            $lang as xs:language,
            $relationship as xs:string
) as xs:string*

Looks up the given phrase in a thesaurus.

Parameters:
Returns:
Errors:
Examples:

External thesaurus-lookup
declare function ft:thesaurus-lookup (
            $uri as xs:string,
            $phrase as xs:string,
            $lang as xs:language,
            $relationship as xs:string,
            $level-least as xs:integer,
            $level-most as xs:integer
) as xs:string*

Looks up the given phrase in a thesaurus.

Parameters:
Returns:
Errors:
Examples:

External tokenize-node
declare function ft:tokenize-node (
            $node as node()
) as element(ft-schema:token)*

Tokenizes the given node and all of its descendants.

Parameters:
Returns:
Errors:
Examples:

External tokenize-node
declare function ft:tokenize-node (
            $node as node(),
            $lang as xs:language
) as element(ft-schema:token)*

Tokenizes the given node and all of its descendants.

Parameters:
Returns:
Errors:
Examples:

External tokenize-nodes
declare function ft:tokenize-nodes (
            $includes as node()+,
            $excludes as node()*
) as element(ft-schema:token)*

Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any.

Parameters:
Returns:
Errors:
Examples:

External tokenize-nodes
declare function ft:tokenize-nodes (
            $includes as node()+,
            $excludes as node()*,
            $lang as xs:language
) as element(ft-schema:token)*

Tokenizes the set of nodes comprising $includes (and all of its descendants) but excluding $excludes (and all of its descendants), if any.

Parameters:
Returns:
Errors:
Examples:
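
To illustrate the $includes and $excludes parameters, the sketch below tokenizes the body of a small document while leaving out a footnote nested inside it. The <article> document and its element names are made up for the example.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Tokenize <body> and its descendants, but skip the <footnote>
   subtree that lives inside it. :)
let $doc :=
  <article>
    <title>Full text</title>
    <body>Searchable words <footnote>ignored aside</footnote></body>
  </article>
return
  ft:tokenize-nodes( $doc/body, $doc/body/footnote, $ft:lang-en )
```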

External tokenize-string
declare function ft:tokenize-string (
            $string as xs:string
) as xs:string*

Tokenizes the given string.

Parameters:
Returns:
Errors:
Examples:

External tokenize-string
declare function ft:tokenize-string (
            $string as xs:string,
            $lang as xs:language
) as xs:string*

Tokenizes the given string.

Parameters:
Returns:
Errors:
Examples:

External tokenizer-properties
declare function ft:tokenizer-properties (

) as element(ft-schema:tokenizer-properties)

Gets properties of the tokenizer for the language returned by ft:current-lang().

Returns:
Errors:

External tokenizer-properties
declare function ft:tokenizer-properties (
            $lang as xs:language
) as element(ft-schema:tokenizer-properties)

Gets properties of the tokenizer for the given language.

Parameters:
Returns:
Errors:
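
A minimal sketch of fetching tokenizer properties only for languages the tokenizer supports, which avoids asking for properties of an unsupported language. The two languages are arbitrary choices for the example, and the query returns the property elements as-is without assuming their internal structure.

```xquery
import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";

(: Return tokenizer properties for each language that the tokenizer
   reports as supported. :)
for $lang in ( $ft:lang-en, $ft:lang-fr )
where ft:is-tokenizer-lang-supported( $lang )
return ft:tokenizer-properties( $lang )
```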
