Technical Specifications
World Lexer Features
Like MULTI_LEXER, the WORLD_LEXER lexer enables you to index documents that contain different languages; however, it automatically detects the languages of a document and so does not require you to create a language column in the base table.
WORLD_LEXER processes most languages whose characters are defined as part of Unicode 4.0. For WORLD_LEXER to be effective, documents with multiple languages must use the AL32UTF8 or UTF8 Oracle character set encoding (including supplementary, or "surrogate-pair," characters).
Table D-2 and Table D-3 show the languages supported by WORLD_LEXER. Note: this list may change as the Unicode standard changes, and in any case should not be considered exhaustive. (Languages are grouped by Unicode writing system, not by natural language groupings.)
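Grouping by writing system rather than by natural language can be illustrated with a small sketch. The Python below is not how WORLD_LEXER works internally; it is a rough approximation that assumes the first word of each character's Unicode name, as reported by the standard `unicodedata` module, is a usable stand-in for its script. The function name `scripts_in` is hypothetical.

```python
import unicodedata


def scripts_in(text):
    """Return the set of script-like labels found in text.

    Assumption: the first word of a character's Unicode name
    (for example "GREEK" or "CYRILLIC") approximates its script.
    """
    labels = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, and punctuation
        name = unicodedata.name(ch, "")
        if name:
            labels.add(name.split()[0])
    return labels


sample = "index ευρετήριο индекс"
print(sorted(scripts_in(sample)))
```

A document mixing English, Greek, and Russian words yields three labels here, one per writing system, without any per-row language column.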
Table D-2 Languages Supported by the World Lexer (Space-separated)
Language Group | Languages Include |
Arabic | Arabic, Farsi, Kurdish, Pashto, Sindhi, Urdu |
Armenian | Armenian |
Bengali | Assamese, Bengali |
Bopomofo | Hakka Chinese, Minnan Chinese |
Cyrillic | Over 50 languages, including Belorussian, Bulgarian, Macedonian, Moldavian, Russian, Serbian, Serbo-Croatian, Ukrainian |
Devanagari | Bhojpuri, Bihari, Hindi, Kashmiri, Marathi, Nepali, Pali, Sanskrit |
Ethiopic | Amharic, Ge'ez, Tigrinya, Tigre |
Georgian | Georgian |
Greek | Greek |
Gujarati | Gujarati, Kacchi |
Gurmukhi | Punjabi |
Hebrew | Hebrew, Ladino, Yiddish |
Kaganga | Redjang |
Kannada | Kanarese, Kannada |
Korean | Korean, Hanja Hangul |
Latin | Afrikaans, Albanian, Basque, Breton, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Faeroese, Fijian, Finnish, Flemish, French, Frisian, German, Hawaiian, Hungarian, Icelandic, Indonesian, Irish, Italian, Lappish, Classic Latin, Latvian, Lithuanian, Malay, Maltese, Pinyin Mandarin, Maori, Norwegian, Polish, Portuguese, Provencal, Romanian, Rumanian, Samoan, Scottish Gaelic, Slovak, Slovene, Slovenian, Sorbian, Spanish, Swahili, Swedish, Tagalog, Turkish, Vietnamese, Welsh |
Malayalam | Malayalam |
Mongolian | Mongolian |
Oriya | Oriya |
Sinhalese, Sinhala | Pali, Sinhalese |
Syriac | Aramaic, Syriac |
Tamil | Tamil |
Telugu | Telugu |
Thaana | Dhivehi, Divehi, Maldivian |
Operators
ABOUT Operator
Use the ABOUT operator to query on concepts. The system looks up concept information in the theme component of the index.
This feature is supported for English and French with CONTEXT indexes only.
Fuzzy Operator
This operator enables you to search for words that have a similar spelling to the specified word. Text supports fuzzy matching for English, German, Italian, Dutch, Spanish, Japanese, optical character recognition (OCR) text, and automatic language detection.
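The idea of matching on similar spelling can be sketched in a few lines. The source does not describe the matching algorithm the fuzzy operator uses, so the Python below swaps in the similarity ratio from the standard `difflib` module purely for illustration; the vocabulary, the function name `fuzzy_candidates`, and the 0.8 cutoff are all assumptions.

```python
import difflib

# Hypothetical vocabulary standing in for the words held in an index.
indexed_words = ["government", "governor", "garment", "index"]


def fuzzy_candidates(term, vocabulary, cutoff=0.8):
    """Return vocabulary words whose spelling is close to term.

    Uses difflib's similarity ratio as a stand-in measure; this is
    not the algorithm of any particular text-search product.
    """
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)


# A misspelled query term still finds the correctly spelled word.
print(fuzzy_candidates("goverment", indexed_words))
```

Raising the cutoff narrows the expansion to near-identical spellings; lowering it admits looser matches at the cost of precision.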
Stem Operator
This operator enables you to search for words that have the same root as the specified term. For example, a stem of $sing expands into a query on the words sang, sung, sing. The Text stemmer supports the following languages: English, French, Spanish, Italian, German, Japanese, and Dutch.
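The $sing expansion can be pictured with a toy lookup. The Python below is only a sketch: a real stemmer derives word forms from linguistic rules or a dictionary, whereas this example assumes a hand-written table (`STEM_TABLE`) and a hypothetical helper `expand_stem`.

```python
# Hypothetical stem table: root form -> inflected forms sharing that root.
# A real stemmer derives these from linguistic rules or a dictionary.
STEM_TABLE = {
    "sing": ["sang", "sung", "sing"],
}


def expand_stem(query_term, stem_table=STEM_TABLE):
    """Expand a term written as $root into the words sharing that root."""
    if not query_term.startswith("$"):
        return [query_term]  # plain term: no expansion
    root = query_term[1:]
    # Fall back to the root itself when the table has no entry for it.
    return stem_table.get(root, [root])


print(" OR ".join(expand_stem("$sing")))  # sang OR sung OR sing
```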
Supplied Stop Lists
A stoplist is a list of words that do not get indexed. These are usually common words in a language such as this, that, and can in English.
Text provides a default stoplist for English, Chinese (traditional and simplified), Danish, Dutch, Finnish, French, German, Italian, Portuguese, Spanish, and Swedish.
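The effect of a stoplist on indexing can be shown with a minimal sketch. The Python below assumes whitespace tokenization and a three-word stoplist built from the English examples above; it is not the supplied default stoplist, and the name `indexable_tokens` is hypothetical.

```python
# Hypothetical three-word stoplist taken from the English examples above.
STOPLIST = {"this", "that", "can"}


def indexable_tokens(text, stoplist=STOPLIST):
    """Split text on whitespace and drop tokens that appear in the stoplist."""
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in stoplist]


print(indexable_tokens("This lexer can index that document"))
# ['lexer', 'index', 'document']
```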
Knowledge Base
A Text knowledge base is a hierarchical tree of concepts used for theme indexing, ABOUT queries, and deriving themes for document services.
Text supplies knowledge bases in English and French only.
Knowledge Base Extension
You can extend theme functionality to languages other than English or French by loading your own knowledge base for any single-byte, whitespace-delimited language, including Spanish.
Multi-Lingual Features Matrix
The following table summarizes the multilingual features for the supported languages.
Multilingual Features for Supported Languages
LANGUAGE | ALTERNATE SPELLING | FUZZY MATCHING | LANGUAGE SPECIFIC LEXER | DEFAULT STOP LIST | STEMMING |
ENGLISH | N/A | Yes | Yes | Yes | Yes |
GERMAN | Yes | Yes | Yes | Yes | Yes |
JAPANESE | N/A | Yes | Yes | No | Yes |
FRENCH | N/A | Yes | Yes | Yes | Yes |
SPANISH | N/A | Yes | Yes | Yes | Yes |
ITALIAN | N/A | Yes | Yes | Yes | Yes |
DUTCH | N/A | Yes | Yes | Yes | Yes |
PORTUGUESE | N/A | Yes | Yes | Yes | No |
KOREAN | N/A | No | Yes | No | No |
SIMPLIFIED CHINESE | N/A | No | Yes | Yes | No |
TRADITIONAL CHINESE | N/A | No | Yes | Yes | No |
DANISH | Yes | No | Yes | No | No |
SWEDISH | Yes | No | Yes | Yes | No |
FINNISH | N/A | No | Yes | No | No |
Supported File Types
Format | Version |
Adobe FrameMaker (MIF) | Versions 3.0, 4.0, 5.0, and 6.0 and Japanese 3.0, 4.0, 5.0, and 6.0 (text only) |
ANSI Text | 7 and 8 bit |
ASCII Text | 7 and 8 bit |
DEC WPS Plus (DX) | Versions through 3.1 |
DEC WPS Plus (WPL) | Versions through 4.1 |
DisplayWrite 2 and 3 (TXT) | All versions |
EBCDIC | All versions |
Enable | Versions 3.0, 4.0, and 4.5 |
First Choice | Versions through 3.0 |
Framework | Version 3.0 |
Hangul | Versions 97, 2002, and 2005 |
IBM FFT | All versions |
IBM Revisable Form Text | All versions |
IBM Writing Assistant | Version 1.01 |
Just System Ichitaro | Versions 4.x through 6.x, 8.x through 13.x and 2004 |
JustWrite | Versions through 3.0 |
Legacy | Versions 1.1 |
Lotus AMI/AMI Professional | Versions 3.1 |
Lotus Manuscript | Version 2.0 |
Lotus Word Pro (non-Windows) | Versions SmartSuite 97, Millennium, and Millennium 9.6 (text only) |
Lotus Word Pro (Windows) | Versions SmartSuite 96, 97, and Millennium and Millennium 9.6 |
MacWrite II | Version 1.1 |
MASS11 | Versions through 8.0 |
Microsoft Rich Text Format (RTF) | All versions |
Microsoft Word (DOS) | Versions through 6.0 |
Microsoft Word (Mac) | Versions 4.0 - 2004 |
Microsoft Word (Windows) | Versions through 2007 |
Microsoft WordPad | All versions |
Microsoft Works (DOS) | Versions through 2.0 |
Microsoft Works (Mac) | Versions through 2.0 |
Microsoft Works (Windows) | Versions through 4.0 |
Microsoft Windows Write | Versions through 3.0 |
MultiMate | Versions through 4.0 |
Navy DIF | All versions |
Nota Bene | Version 3.0 |
Novell Perfect Works | Version 2.0 |
Novell/Corel WordPerfect (DOS) | Versions through 6.1 |
Novell/Corel WordPerfect (Mac) | Versions 1.02 through 3.0 |
Novell/Corel WordPerfect (Windows) | Versions through 12.0 |
Office Writer | Versions 4.0 - 6.0 |
OpenOffice Writer (Windows and UNIX) | OpenOffice versions 1.1 and 2.0 |
PC-File Letter | Versions through 5.0 |
PC-File+ Letter | Versions through 3.0 |
PFS:Write | Versions A, B, and C |
Professional Write Plus (Windows) | Version 1.0 |
Q&A (DOS) | Version 2.0 |
Q&A Write (Windows) | Version 3.0 |
Samna Word | Versions through Samna Word IV+ |
Signature | Version 1.0 |
SmartWare II | Version 1.02 |
Sprint | Versions through 1.0 |
StarOffice Writer | Version 5.2 (text only) and 6.x through 8.x |
Total Word | Version 1.2 |
Unicode Text | All versions |
UTF-8 | All versions |
Volkswriter 3 and 4 | Versions through 1.0 |
Wang PC (IWP) | Versions through 2.6 |
WordMARC | Versions through Composer Plus |
WordStar (Windows) | Version 1.0 |
WordStar 2000 (DOS) | Versions through 3.0 |
XyWrite | Versions through III Plus |
Spreadsheet Formats
Format | Version |
Enable | Versions 3.0, 4.0, and 4.5 |
First Choice | Versions through 3.0 |
Framework | Version 3.0 |
Lotus 1-2-3 (DOS & Windows) | Versions through 5.0 |
Lotus 1-2-3 (OS/2) | Versions through 2.0 |
Lotus 1-2-3 Charts (DOS & Windows) | Versions through 5.0 |
Lotus 1-2-3 for SmartSuite | Versions 97 - Millennium 9.6 |
Lotus Symphony | Versions 1.0, 1.1, and 2.0 |
Mac Works | Version 2.0 |
Microsoft Excel Charts | Versions 2.x - 7.0 |
Microsoft Excel (Mac) | Versions 3.0 - 4.0, 98, 2001, 2002, 2004, and v.X |
Microsoft Excel (Windows) | Versions 2.2 through 2007 |
Microsoft Multiplan | Version 4.0 |
Microsoft Works (Windows) | Versions through 4.0 |
Microsoft Works (DOS) | Versions through 2.0 |
Microsoft Works (Mac) | Versions through 2.0 |
Mosaic Twin | Version 2.5 |
Novell Perfect Works | Version 2.0 |
PFS:Professional Plan | Version 1.0 |
Quattro Pro (DOS) | Versions through 5.0 (text only) |
Quattro Pro (Windows) | Versions through 12.0 (text only) |
SmartWare II | Version 1.02 |
StarOffice/OpenOffice Calc (Windows and UNIX) | StarOffice versions 5.2 (text only) through 8.x and OpenOffice versions 1.1 and 2.0 |
SuperCalc 5 | Version 4.0 |
VP Planner 3D | Version 1.0 |
Presentation Formats
Format | Version |
Corel/Novell Presentations | Versions through 12.0 |
Harvard Graphics (DOS) | Versions 2.x and 3.x |
Harvard Graphics (Windows) | Windows versions |
Freelance (Windows) | Versions through Millennium 9.6 |
Freelance (OS/2) | Versions through 2.0 |
Microsoft PowerPoint (Windows) | Versions 3.0 through 2007 |
Microsoft PowerPoint (Mac) | Versions 4.0 through v.X |
StarOffice/OpenOffice Impress (Windows and UNIX) | StarOffice versions 5.2 (text only) and 6.x through 8.x (full support) and OpenOffice versions 1.1 and 2.0 (text only) |
Database Formats
Format | Version |
Access | Versions through 2.0 |
dBASE | Versions through 5.0 |
DataEase | Version 4.x |
dBXL | Version 1.3 |
Enable | Versions 3.0, 4.0, and 4.5 |
First Choice | Versions through 3.0 |
FoxBase | Version 2.1 |
Framework | Version 3.0 |
Microsoft Works (Windows) | Versions through 4.0 |
Microsoft Works (DOS) | Versions through 2.0 |
Microsoft Works (Mac) | Versions through 2.0 |
Paradox (DOS) | Versions through 4.0 |
Paradox (Windows) | Versions through 1.0 |
Personal R:BASE | Version 1.0 |
R:BASE 5000 | Versions through 3.1 |
R:BASE System V | Version 1.0 |
Reflex | Version 2.0 |
Q&A | Versions through 2.0 |
SmartWare II | Version 1.02 |
Archive File Format
When an archive file is filtered, the contents of every file inside the archive are exported to a single output file. This includes the contents of all subfolders within the archive.
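The single-output behavior described here can be approximated with a short sketch. The Python below is not the filter's implementation: it assumes a ZIP archive whose members are UTF-8 text and uses the standard `zipfile` module; the function name `flatten_archive` and the sample file names are hypothetical.

```python
import io
import zipfile


def flatten_archive(zip_bytes):
    """Concatenate the text of every file in a ZIP archive into one string."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            data = archive.read(info.filename)
            parts.append(data.decode("utf-8", errors="replace"))
    return "\n".join(parts)


# Build a small in-memory archive with one file in a subfolder.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "top-level text")
    archive.writestr("docs/notes.txt", "nested text")

print(flatten_archive(buffer.getvalue()))
```

Both the top-level file and the file under `docs/` end up in the same output, which is why a query against the indexed archive can match text from any member.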
Supported Archive File Formats
Format | Version |
GZIP | |
Microsoft Binder | Versions 7.0 - 97 (conversion of files contained in the Binder File is supported only on Windows) |
UUEncode | |
UNIX Compress | |
UNIX Tar | |
ZIP | PKWARE versions through 2.04g |
LZA Self-Extracting Compress | |
LZH Compress | |
Email Formats
Format | Version |
Microsoft Outlook Folder (PST) | Microsoft Outlook Folder and Microsoft Outlook Offline Folder files versions 97, 98, 2000, 2002, 2003, and 2007 |
Microsoft Outlook Message (MSG) | Microsoft Outlook Message and Microsoft Outlook Form Template versions 97, 98, 2000, 2002, 2003, and 2007 |
MIME | MIME-encoded mail messages. |
MIME Support Notes
The following formats are supported:
- MIME formats, including
  - EML
  - MHT (Web Archive)
  - NWS (Newsgroup single-part and multi-part)
  - Simple Text Mail (defined in RFC 2822)
- TNEF format
- MIME encodings, including
  - base64 (defined in RFC 1521)
  - binary (defined in RFC 1521)
  - binhex (defined in RFC 1741)
  - btoa
  - quoted-printable (defined in RFC 1521)
  - utf-7 (defined in RFC 2152)
  - uue
  - xxe
  - yenc
In addition, the body of a message can be encoded in several ways. The following encodings are supported:
- HTML
- RTF
- TNEF
- Text/enriched (defined in RFC 1523)
- Text/richtext (defined in RFC 1341)
- Embedded mail message (defined in RFC 822) - this is handled as a link to a new message
The attachments of a MIME message can be stored in many formats.
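To make two of the listed transfer encodings concrete, the sketch below decodes a base64 part and a quoted-printable part of a MIME message. It uses Python's standard `email` package as a stand-in and says nothing about how the filter itself parses mail; the message text, the boundary name, and the helper `decoded_parts` are invented for the example.

```python
from email import message_from_string

# Hypothetical message: one base64 text part and one quoted-printable part.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="sep"

--sep
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

aGVsbG8gd29ybGQ=

--sep
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

caf=C3=A9
--sep--
"""


def decoded_parts(raw):
    """Return the decoded text of every non-multipart part of a message."""
    message = message_from_string(raw)
    texts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers hold parts but no text of their own
        # decode=True undoes the Content-Transfer-Encoding of the part.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        texts.append(payload.decode(charset).strip())
    return texts


print(decoded_parts(RAW))
```

Only after this decoding step is the text of each part available for tokenizing and indexing.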
Other Formats
Format | Version |
Executable (EXE, DLL) | |
HTML | Versions through 3.0, with some limitations |
Macromedia Flash | Macromedia Flash 6.x, Macromedia Flash 7.x, and Macromedia Flash Lite (text only) |
Microsoft Project | Versions 98 - 2003 (text only) |
MP3 | ID3 information |
vCard, vCalendar | Version 2.1 |
Windows Executable | |
WML | Version 5.2 |
XML | Text only |
Yahoo Instant | |
Graphic Format
The following table lists the graphic formats that the AUTO_FILTER filter recognizes. This means that indexing a text column that contains any of these formats produces no error. As such, it is safe for the column to contain any of these formats.
Formats are categorized as either embedded graphics or standalone graphics. Embedded graphics are inserted or referenced within a document.
Note:
The AUTO_FILTER filter cannot extract textual information from graphics.
Supported Graphic Formats
Format | Version |
Adobe Photoshop (PSD) | Version 4.0 |
Adobe Illustrator | Versions 7.0 and 9.0 |
Adobe FrameMaker graphics (FMV) | Vector/raster through 5.0 |
Adobe Acrobat (PDF) | Versions 1.0, 2.1, 3.0, 4.0, 5.0, 6.0, and 7.0 (including Japanese PDF) |
Ami Draw (SDW) | Ami Draw |
AutoCAD Interchange and Native Drawing formats (DXF and DWG) | AutoCAD Drawing Versions 2.5 - 2.6, 9.0-14.0, 2000i and 2002 |
AutoShade Rendering (RND) | Version 2.0 |
Binary Group 3 Fax | All versions |
Bitmap (BMP, RLE, ICO, CUR, OS/2 DIB, and WARP) | All versions |
CALS Raster (GP4) | Type I and Type II |
Corel Clipart format (CMX) | Versions 5 through 6 |
Corel Draw (CDR) | Versions 3.x - 8.x |
Corel Draw (CDR with TIFF header) | Versions 2.x - 9.x |
Computer Graphics Metafile (CGM) | ANSI, CALS NIST version 3.0 |
Encapsulated PostScript (EPS) | TIFF header only |
GEM Paint (IMG) | All versions |
Graphics Environment Mgr (GEM) | Bitmap and vector |
Graphics Interchange Format (GIF) | All versions |
Hewlett Packard Graphics Language (HPGL) | Version 2.0 |
IBM Graphics Data Format (GDF) | Version 1.0 |
IBM Picture Interchange Format (PIF) | Version 1.0 |
Initial Graphics Exchange Spec (IGES) | Version 5.1 |
JBIG2 | JBIG2 graphic embeddings in PDF files |
JFIF (JPEG not in TIFF format) | All versions |
JPEG (including EXIF) | All versions |
Kodak Flash Pix (FPX) | All versions |
Kodak Photo CD (PCD) | Version 1.0 |
Lotus PIC | All versions |
Lotus Snapshot | All versions |
Macintosh PICT1 and PICT2 | Bitmap only |
MacPaint (PNTG) | All versions |
Micrografx Draw (DRW) | Versions through 4.0 |
Micrografx Designer (DRW) | Versions through 3.1 |
Micrografx Designer (DFS) | Windows 95, version 6.0 |
Novell PerfectWorks (Draw) | Version 2.0 |
OS/2 PM Metafile (MET) | Version 3.0 |
Paint Shop Pro 6 (PSP) | Windows only, versions 5.0 - 6.0 |
PC Paintbrush (PCX and DCX) | All versions |
Portable Bitmap (PBM) | All versions |
Portable Graymap (PGM) | No specific version |
Portable Network Graphics (PNG) | Version 1.0 |
Portable Pixmap (PPM) | No specific version |
PostScript (PS) | Levels 1-2 |
Progressive JPEG | No specific version |
Sun Raster (SRS) | No specific version |
StarOffice/OpenOffice Draw for Windows and UNIX | StarOffice versions 5.2 (text only) through 8.x and OpenOffice versions 1.1 and 2.0 |
TIFF | Versions through 6 |
TIFF CCITT Group 3 and 4 | Versions through 6 |
Truevision TGA (TARGA) | Version 2 |
Visio (preview) | Version 4 |
Visio | Versions 5, 2000, 2002, and 2003 |
WBMP | No specific version |
Windows Enhanced Metafile (EMF) | No specific version |
Windows Metafile (WMF) | No specific version |
WordPerfect Graphics (WPG and WPG2) | Versions through 2.0 |
X-Windows Bitmap (XBM) | x10 compatible |
X-Windows Dump (XWD) | x10 compatible |
X-Windows Pixmap (XPM) | x10 compatible |
Graphics Formats Limitations
AutoCAD drawing files are not supported on IBM AIX.