FAST Search Server 2010 for SharePoint brings
enterprise search capabilities into SharePoint with much of the ease of
deployment, configuration, and management provided by SharePoint 2010.
1. Introducing FAST Search Server 2010
FAST Search Server 2010
for SharePoint can be deployed across multiple servers to satisfy
requirements for redundancy, performance, and capacity. Both SharePoint
Search and FAST Search use the same crawling infrastructure, since
SharePoint Search now incorporates the modularity of FAST Search. You can add
more servers to scale document volume, query volume, and
processing power for content, queries, and results. You can deploy,
configure, and manage FAST Search through the user interface, Windows
PowerShell cmdlets, XML configuration files, and command-line
operations.
FAST Search Server 2010
for SharePoint uses SharePoint 2010 for query servers and crawling
content, but adds additional servers in a FAST farm for processing
content, producing index partitions, and processing queries. To save
costs, a single FAST farm can be dedicated to search and shared across
multiple SharePoint farms. As you might guess, a wide range of farm
topologies is possible, covering both simple and demanding
requirements.
For larger environments, FAST
Search Server 2010 for SharePoint uses rows and columns of servers to
scale out to potentially unlimited content size and query volume,
with built-in fault tolerance. To increase content processing
capacity almost linearly, you add more columns of servers. To increase
the query processing ability, you add more rows of servers. Fault
tolerance is provided by deploying a minimum of two rows. The diagram in
Figure 1 illustrates a configuration of FAST servers that can process approximately 100 million items.
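The row-and-column arithmetic described above can be sketched in a few lines of Python. This is only a planning aid under stated assumptions: the function name `plan_fast_farm` and the `items_per_column` figure are hypothetical and are not product limits or part of any FAST tooling. The sketch shows that columns grow with content volume, rows grow with query load and fault tolerance, and the server count is their product.

```python
import math


def plan_fast_farm(total_items, items_per_column, rows=2):
    """Return (columns, rows, servers) for a rows-by-columns layout.

    items_per_column is a hypothetical planning figure chosen by the
    architect; it is NOT a documented product limit.
    """
    if total_items <= 0 or items_per_column <= 0:
        raise ValueError("item counts must be positive")
    if rows < 1:
        raise ValueError("at least one row is required")
    # Columns scale with content volume.
    columns = math.ceil(total_items / items_per_column)
    # Rows scale with query volume; two or more rows give fault tolerance.
    return columns, rows, columns * rows


# Example: 100 million items, assuming 25 million items per column.
columns, rows, servers = plan_fast_farm(100_000_000, 25_000_000, rows=2)
print(columns, rows, servers)  # prints: 4 2 8
```

Doubling the content volume in this model doubles the number of columns, while adding a row adds one server per existing column.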
The following is a list of some of the enhancements provided by FAST Search Server 2010 for SharePoint.
Can search in any language
Can detect 84 languages, although not all linguistic tools are provided for all languages
Lemmatization (variations) improves recall (query for better includes good)
Phrase search includes stop words (noise words), but keyword queries do not
Only nouns and adjectives are expanded for precision (book > books not booked)
Deeper refinements with exact counts of items, including duplicates; retrieves
metadata from the entire results list, not just the first 50 items
Ability to sort results on any metadata
Entity extraction
Architecturally, the FAST add-on products modify the content processing pipeline to be easily extensible with custom plug-ins.
With SharePoint Search, you create a single Search Service Application (SSA).
With FAST Search for SharePoint, you create two FAST Search
Service Applications: the FAST Search Connector (the Content SSA) and the FAST Search
Query (the Query SSA).
These SSAs create
the connections between the SharePoint Search components that crawl
content and construct queries and the FAST Search components that
process the content indexing and querying. These FAST SSAs essentially
modify the processing “pipeline” to divert the indexing and querying
processing from the SharePoint Search layer of the architecture to the
more powerful FAST layer of servers.
This FAST Content Processing
pipeline is constructed of a series of customizable plug-ins that can be
managed with the user interface, cmdlets, or config files. In many
stages of the pipeline, you will find plug-ins available to provide
standard SharePoint or enhanced FAST capabilities. Out-of-box (OOB)
plug-ins can be enabled or disabled as needed. The following plug-ins
are enabled by default.
Format conversion.
Language detection and encoding for 84 languages.
Lemmatizer:
Linguistic normalization, similar to but more capable than the stemming
performed by SharePoint Search. Lemmatizing uses a dictionary to reduce
words to their basic form, for example, ate to eat. FAST
Search Server 2010 for SharePoint performs linguistic processing on
items returned by the crawl process before those items are indexed and
on terms in the query before the query is processed.
Tokenizer: The word breakers for FAST are better than the SharePoint Search processes, particularly in non-English languages.
Entity
extraction: The entity extraction process detects various properties as
known entities (such as names, locations, and dates) from the retrieved
documents, and maps them as metadata into managed properties even if
they are not natively defined as metadata within the documents. Users
can now query these as properties or metadata, and they also can drill
down or refine results based on these properties. You can create custom
extractors using, for example, a dictionary (list) of product names,
projects, or organizational units relevant to your organization.
DateTimeNormalizer.
Vectorizer:
Creates a document signature that represents a document’s content in a
way that allows comparison between documents for similarity searching.
WebAnalyzer: Anchor text and link cardinality analysis.
PropertiesMapper: Maps metadata to crawled properties.
PropertiesReporter: Reports detected properties or metadata.
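The idea of a pipeline built from plug-ins that can be enabled or disabled can be illustrated with a small Python sketch. It is a toy model, not the FAST pipeline itself: the stage functions, the tiny lemma dictionary, the product-name dictionary, and the stage order are all assumptions made for illustration.

```python
# Toy lemma dictionary; a real lemmatizer uses a full language dictionary.
LEMMAS = {"ate": "eat", "books": "book", "better": "good"}

# Hypothetical custom entity dictionary (for example, product names).
PRODUCTS = {"contoso", "fabrikam"}


def tokenizer(item):
    # Naive whitespace word breaker, for illustration only.
    item["tokens"] = item["text"].lower().split()
    return item


def lemmatizer(item):
    # Reduce each token to its dictionary form when one is known.
    item["lemmas"] = [LEMMAS.get(token, token) for token in item["tokens"]]
    return item


def entity_extractor(item):
    # Surface dictionary matches as metadata on the item.
    item["products"] = sorted(set(item["tokens"]) & PRODUCTS)
    return item


# Each entry is (stage, enabled); disabled stages are skipped.
PIPELINE = [(tokenizer, True), (lemmatizer, True), (entity_extractor, True)]


def process(text, pipeline=PIPELINE):
    item = {"text": text}
    for stage, enabled in pipeline:
        if enabled:
            item = stage(item)
    return item


result = process("Contoso ate the books")
print(result["lemmas"])    # ['contoso', 'eat', 'the', 'book']
print(result["products"])  # ['contoso']
```

Passing a pipeline with a stage set to `False` skips that stage, which mirrors how an out-of-box plug-in can be switched off without touching the others.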
Other optional plug-ins that are provided but are disabled by default include the following.
2. Architecture and Topology
FAST Search Server 2010
for SharePoint extends the three-tiered architecture of SharePoint
Search 2010 into a multitiered, distributed farm architecture. The
following sections describe the tiers and their components.
2.1. SharePoint WFE Servers
The standard SharePoint
2010 WFE servers will continue to provide the Query and Federation
Object Model (OM), the Query Web service, and the Search Centers with
the accompanying Web Parts customized to accept queries from and present
results to users. The use of FAST Search should be transparent to users
except for enhanced results.
Site collection administrators will be able to manage FAST Search functionality
in Site Settings, just as they did for SharePoint Search. The number of
WFEs will scale as required. You should consider using one or more WFEs
dedicated as crawl targets if crawl levels impact performance for
users.
2.2. SharePoint Application Servers
The FAST Content SSA and
Query SSA will be provided by SharePoint 2010 application servers in
the parent farm. Depending on the workload, these servers may be able to
provide other SharePoint services. These services can scale for
performance and resiliency by adding query components to the Query SSA
and crawl components to the Content SSA.
The Query SSA provides the
connection to FAST query processing and both the query and the crawl
components for people search. The Content SSA provides the connection to
FAST crawl and index processing.
Both the parent
SharePoint farm and remote SharePoint farms will interface with FAST
using the SSAs on the parent farm. At this layer, FAST differs from
SharePoint Search in that the SharePoint SSA provided both the crawl and
query components, whereas FAST requires a separate SSA for each
component.
2.3. FAST Application Servers
This layer provides the
following components that can be hosted on one or more servers as
required for performance and resiliency.
Administration
Content Distributor
Item Processing
Web Analyzer
Indexing Dispatcher
Indexer
Query Matching
Query Processing
2.4. Database Servers
SQL Server 2008 Enterprise servers will provide the following search databases in clustered and/or mirrored configurations.
SharePoint Central and Site Administration
Search Admin (FAST)
Search Admin (Content and Query SSA including people search)
Property (Query SSA people search)
Search Crawl (Query SSA people search)
Note:
With FAST Search for SharePoint, the metadata is stored in optimized files on the file system, not in SQL databases.
2.5. Scaling FAST Application Layer (Cluster)
All the FAST
Search Server 2010 for SharePoint components can run on a single
server. However, depending on how you scale out to run the components on
one or more servers, the system can
Figure 2 shows an example of a scaled-out cluster of components for the purpose of this discussion.
Index column
The complete searchable index can be split into multiple disjoint index
columns (or partitions) when the complete index is too large to reside
on one server. Queries will be evaluated against all index columns
within the search cluster, and the results from each index column are
merged into the final results list. Unlike SharePoint Search 2010,
adding an index partition (column) here requires a complete re-indexing
of all content, so accuracy in your original design is very important.
Search row
A search row contains a set of search nodes (servers) hosting index
partitions that together contain all items indexed within the search
cluster. Adding search rows provides increased query performance and
fault tolerance.
Primary and backup indexer rows
An indexer row provides an indexer for each partition. When you add a row
of indexer nodes, they are configured as backup indexer nodes for fault
tolerance. Both rows of indexers produce the same set of indexes, but
only the primary indexer distributes the indexes to the query matching
nodes.
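The relationship between index columns and query evaluation can be sketched as follows. This is a simplified model under stated assumptions: items are assigned to columns by a hash of a hypothetical item ID, and relevance is a plain term count. Neither reflects how FAST actually distributes or ranks items; the point is only that every query is evaluated against all columns and the partial results are merged.

```python
import zlib


def column_for(item_id, num_columns):
    # Stable hash so the same item always lands in the same column
    # (an assumption for this sketch, not FAST's real distribution).
    return zlib.crc32(item_id.encode("utf-8")) % num_columns


def build_columns(items, num_columns):
    # Each column holds a disjoint slice of the complete index.
    columns = [dict() for _ in range(num_columns)]
    for item_id, text in items.items():
        tokens = text.lower().split()
        columns[column_for(item_id, num_columns)][item_id] = tokens
    return columns


def search(columns, term):
    # Evaluate the query against every column, then merge the hits.
    hits = []
    for column in columns:
        for item_id, tokens in column.items():
            count = tokens.count(term.lower())
            if count:
                hits.append((item_id, count))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


ITEMS = {
    "doc1": "FAST search scales with FAST columns",
    "doc2": "SharePoint search",
    "doc3": "FAST rows add query capacity",
}

three_columns = build_columns(ITEMS, 3)
print(search(three_columns, "fast"))  # [('doc1', 2), ('doc3', 1)]
```

In this toy model, changing `num_columns` changes which column an item hashes to, so the whole index has to be rebuilt. That is one way to picture why adding a column is expensive, although the real reason in the product may differ.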
2.6. Indexing Connectors
FAST Search Server 2010 for SharePoint uses the Connector
Framework for indexing content, just as SharePoint 2010 does. In fact,
most content sources can be crawled with the SharePoint 2010 connectors.
However, FAST Search Server 2010 for SharePoint does offer three
advanced SharePoint specific indexing
connectors. The choice of indexing connector is influenced by the kind
of content that you want to crawl, by the specific requirements of your
organization, and (sometimes) just by an administrator’s preference.
Even though these indexing
connectors are known collectively as the FAST Search connector, remember that it is a
collection of connectors, not one separate indexing connector. When the
FAST Search connector is associated with one or more content sources
through the FAST Search Content SSA, the appropriate individual indexing connector
is used for each content source.
This FAST Search connector offers the following connectors with options for Web, database, and Lotus Notes content.
2.6.1. Web Content
SharePoint sites: Always use the SharePoint connector.
File shares: Always use the File share connector.
Exchange: Always use the Exchange connector.
People profiles: Always use the People profiles connector. (Profiles are crawled through the FAST Search Query Search Service Application.)
Websites
If you have a limited number of websites without dynamic content, use
the website indexing connector. However, use the FAST Search Web crawler
when
You have many websites to crawl.
The website content contains dynamic data, including JavaScript.
The organization needs access to advanced Web crawling, configuration, and scheduling options.
You want to crawl RSS Web content.
The website content uses advanced logon options.
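The website guidance above amounts to a simple decision rule, sketched here in Python. The function and parameter names are hypothetical, and the sketch assumes that any one of the listed conditions is enough to prefer the FAST Search Web crawler.

```python
def choose_web_connector(many_sites=False, dynamic_content=False,
                         advanced_options=False, rss_content=False,
                         advanced_logon=False):
    """Suggest a connector for website content.

    Assumes any single condition is sufficient to choose the
    FAST Search Web crawler.
    """
    needs_fast_crawler = any([
        many_sites,        # many websites to crawl
        dynamic_content,   # dynamic data, including JavaScript
        advanced_options,  # advanced crawling, configuration, scheduling
        rss_content,       # RSS Web content
        advanced_logon,    # advanced logon options
    ])
    if needs_fast_crawler:
        return "FAST Search Web crawler"
    return "website indexing connector"


print(choose_web_connector())                  # website indexing connector
print(choose_web_connector(rss_content=True))  # FAST Search Web crawler
```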
2.6.2. Database Content
Use the Business Data Catalog-based indexing connectors if
The preferred configuration method is Microsoft SharePoint Designer 2010.
You want to use time stamp–based change detection for incremental database crawls.
The preferred management method is SharePoint 2010 Central Administration.
You
want to enable crawling based on the change log. This requires directly
modifying the connector model file and creating a stored procedure in
the database.
Use the FAST Search database connector when
The preferred configuration method is using SQL queries.
Advanced data joining operation options through SQL queries are required.
You
want to use advanced incremental update features. This connector uses
checksum-based change detection for incremental crawls if there is no
update information available. It also supports time stamp–based change
detection and change detection based on update and delete flags.
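Checksum-based change detection, mentioned above for the FAST Search database connector, can be illustrated with a short Python sketch. This shows the general technique only, not the connector's implementation: the row format, the `id` key column, and the use of SHA-256 are all assumptions.

```python
import hashlib


def row_checksum(row):
    # Build a canonical string so column order does not affect the hash.
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(previous, rows, key="id"):
    """Compare the current rows with checksums stored from the last crawl.

    previous maps each row key to its checksum from the last crawl.
    Returns (added, changed, deleted, current_checksums).
    """
    current = {row[key]: row_checksum(row) for row in rows}
    added = sorted(k for k in current if k not in previous)
    changed = sorted(k for k in current
                     if k in previous and current[k] != previous[k])
    deleted = sorted(k for k in previous if k not in current)
    return added, changed, deleted, current


# First crawl: nothing stored yet, so every row counts as added.
first = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
added, changed, deleted, stored = detect_changes({}, first)
print(added, changed, deleted)  # [1, 2] [] []

# Incremental crawl: row 1 was edited, row 2 removed, row 3 is new.
second = [{"id": 1, "name": "alpha v2"}, {"id": 3, "name": "gamma"}]
added, changed, deleted, stored = detect_changes(stored, second)
print(added, changed, deleted)  # [3] [1] [2]
```

Only the added and changed rows need to be sent for indexing and only the deleted keys need to be removed, which is what makes the crawl incremental even when the source offers no update information.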
2.6.3. Lotus Notes Content
Use the Lotus Notes connector when
Use the FAST Search Lotus Notes connector when
Full Lotus Notes security support is required, including support for Lotus Notes roles.
You want to crawl Lotus Notes databases as attachments.
2.6.4. Line-of-Business Data Content
Use Business Data
Catalog–based connectors when your content source contains
data in line-of-business applications and when you want to enable
crawling based on the change log. This requires directly modifying the
connector model file and creating a stored procedure in the database.