Requirement: It should be possible to specify different emphases in the search query.
1. Object definition
A 'Product' in our case has the following attributes:
- Title
- Alternate title
- Feature
- Feature-Title
- Feature-Content
- Tags
- External links
- Link description
2. Table defintion
We created a help table in the database. All tags assigned to the product are written space-delimited into the PRODUCT_TAG column, the same for
- PRODUCT_ALTERNATE_TITLE
- PRODUCT_FEATURE
- FEATURE_CONTENT
- EXTERNAL_LINK
- LINK_DESCRIPTION
The CREATE TABLE statement looks like:
CREATE TABLE PRODUCT (
ID NUMBER(38,0) NOT NULL,
PRODUCT_TITLE VARCHAR2(256),
PRODUCT_ALTERNATE_TITLE clob,
PRODUCT_DESCRIPTION VARCHAR2(4000),
PRODUCT_TAG clob,
PRODUCT_FEATURE clob,
FEATURE_CONTENT clob,
EXTERNAL_LINK clob,
LINK_DESCRIPTION clob
);
3. Index creation
First of all we need to grant the ctxapp to our user:
grant ctxapp to MY_USER;
Then we need to create a 'Lexer' to use the extended possibilities Oracle Text delivers:
"base-letter conversion, composite word indexing, case-sensitive indexing and alternate spelling for whitespace-delimited languages that have extended character sets."
begin
ctx_ddl.create_preference(
'german_lexer','basic_lexer'
);
ctx_ddl.set_attribute(
'german_lexer','composite','german'
);
ctx_ddl.set_attribute (
'german_lexer', 'SKIPJOINS', '-'
);
ctx_ddl.create_preference (
'german_wordlist', 'BASIC_WORDLIST'
);
ctx_ddl.set_attribute (
'german_wordlist', 'STEMMER', 'GERMAN'
);
end;
/
We have to create a preference for the index so that we can create an index over more than one column (MULTI_COLUMN_DATASTORE).
begin
ctx_ddl.create_preference(
preference_name => 'PRODUCT_DATA_STORE',
object_name => 'MULTI_COLUMN_DATASTORE'
);
ctx_ddl.set_attribute(
preference_name => 'PRODUCT_DATA_STORE',
attribute_name => 'COLUMNS',
attribute_value => '
PRODUCT_TITLE,
PRODUCT_DESCRIPTION,
PRODUCT_ALTERNATE_TITLE,
PRODUCT_TAG,
PRODUCT_FEATURE,
FEATURE_CONTENT,
EXTERNAL_LINK,LINK_DESCRIPTION'
);
end;
/
The creation of the index looks like:
CREATE INDEX PRODUCT_FT_IDX ON PRODUCT(PRODUCT_TITLE)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('
datastore PRODUCT_DATA_STORE
section group CTXSYS.AUTO_SECTION_GROUP
LEXER german_lexer
WORDLIST german_wordlist
STOPLIST CTXSYS.EMPTY_STOPLIST
SYNC (ON COMMIT)');
The 'index column' you have to specify is the first column from our MULTI_COLUMN_DATASTORE preference (here: PRODUCT_TITLE). The 'german_lexer' we created before handles all the special character stuff for us.
The 'german_wordlist' enables some extra functionality provided by oracle text:
"Use the wordlist preference to enable the query options such as stemming, fuzzy matching for your language. You can also use the wordlist preference to enable substring and prefix indexing, which improves performance for wildcard queries with CONTAINS and CATSEARCH."The stoplist parameter specifies which words ahouldn't be indexed by Oracle; We don't want it so we specify a empty stoplist (CTXSYS.EMPTY_STOPLIST).
4. Search Query
Now the the index is ready and waits for better queries to use it. Our query looks like this:
SELECT
score(1) as SCORE_VALUE,
ID_PRODUCT ,
PRODUCT_TITLE,
PRODUCT_DESCRIPTION
FROM PRODUCT
WHERE CONTAINS(PRODUCT_TITLE,'
(((autos) within (PRODUCT_TITLE))*10
ACCUM (($autos) within (PRODUCT_TITLE))*9
ACCUM ((%autos%) within (PRODUCT_TITLE))*2)
ACCUM (((autos) within (PRODUCT_DESCRIPTION))*9
ACCUM (($autos) within (PRODUCT_DESCRIPTION))*8
ACCUM ((%autos%) within (PRODUCT_DESCRIPTION))*2)
ACCUM (((autos) within (PRODUCT_ALTERNATE_TITLE))*8
ACCUM (($autos) within (PRODUCT_ALTERNATE_TITLE))*7
ACCUM ((%autos%) within (PRODUCT_ALTERNATE_TITLE))*2)
ACCUM (((autos) within (PRODUCT_TAG))*7
ACCUM (($autos) within (PRODUCT_TAG))*6
ACCUM ((%autos%) within (PRODUCT_TAG))*2)
ACCUM (((autos) within (PRODUCT_FEATURE))*6
ACCUM (($autos) within (PRODUCT_FEATURE))*5
ACCUM ((%autos%) within (PRODUCT_FEATURE))*1)
ACCUM (((autos) within (FEATURE_CONTENT))*5
ACCUM (($autos) within (FEATURE_CONTENT))*4
ACCUM ((%autos%) within (FEATURE_CONTENT))*1)
ACCUM (((autos) within (EXTERNAL_LINK))*4
ACCUM (($autos) within (EXTERNAL_LINK))*3
ACCUM ((%autos%) within (EXTERNAL_LINK))*1)
ACCUM (((autos) within (LINK_DESCRIPTION))*3
ACCUM (($autos) within (LINK_DESCRIPTION))*2.5
ACCUM ((%autos%) within (LINK_DESCRIPTION))',1)>0
Now it is possible to adjust the criterias:
- Is the searchstring found as the 'complete phrase': (autos) within (COLUMN)
- Is the baseword of the searchstring found as the 'complete phrase': ($auto) within (COLUMN); [This means: Searching for 'cars' should also find all occurrences of 'car']
- Is the searchstring anywhere inside the text: (%autos%) within (COLUMN)
For all of this possiblities for every column we now can adjust the index score value: ((autos) within (PRODUCT_TITLE))*10. Oracle Text uses this value to calculate the score value (there are more criteria like: amount of occurrences).
Now we can sort the resultset after score(1) or SCORE_VALUE to get a better sort order for the search.
Keine Kommentare:
Kommentar veröffentlichen