SharePoint 2010 : FAST Search Server 2010 for SharePoint

4/16/2012 3:51:49 PM

FAST Search Server 2010 for SharePoint brings enterprise search capabilities into SharePoint with much of the ease of deployment, configuration, and management provided by SharePoint 2010.

1. Introducing FAST Search Server 2010

FAST Search Server 2010 for SharePoint can be deployed across multiple servers to satisfy requirements for redundancy, performance, and capacity. Both SharePoint Search and FAST Search use the same crawling infrastructure, since SharePoint Search now incorporates the modularity of FAST Search. You can add servers to scale document volume, query volume, and processing power for content, queries, and results. You can deploy, configure, and manage FAST Search through the user interface, Windows PowerShell cmdlets, XML configuration files, and command-line operations.

FAST Search Server 2010 for SharePoint uses SharePoint 2010 for query servers and crawling content, but adds servers in a FAST farm for processing content, producing index partitions, and processing queries. To save costs, a single FAST farm can be dedicated to search and shared across multiple SharePoint farms. As you might guess, a wide range of farm topologies is possible, covering both simple and demanding requirements.

For larger environments, FAST Search Server 2010 for SharePoint uses rows and columns of servers to scale out to potentially unlimited content size and query volume, including built-in fault tolerance. To increase content processing capacity almost linearly, you add more columns of servers. To increase the query processing ability, you add more rows of servers. Fault tolerance is provided by deploying a minimum of two rows. The diagram in Figure 1 illustrates a configuration of FAST servers that can process approximately 100 million items.

Figure 1. FAST Search Server 2010 for SharePoint scaling diagram
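
The row-and-column arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the per-column item capacity and per-row query rate are hypothetical planning inputs, not figures from FAST documentation, and `SearchGrid` and `plan_grid` are names invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchGrid:
    """A rows-by-columns layout of search servers."""
    columns: int  # more columns -> more content capacity
    rows: int     # more rows -> more query capacity and fault tolerance

    @property
    def servers(self) -> int:
        return self.columns * self.rows

    @property
    def fault_tolerant(self) -> bool:
        # The text gives two rows as the minimum for fault tolerance.
        return self.rows >= 2


def plan_grid(total_items, items_per_column, peak_qps, qps_per_row):
    """Size a grid from assumed (hypothetical) per-column and per-row limits."""
    columns = max(1, math.ceil(total_items / items_per_column))
    rows = max(2, math.ceil(peak_qps / qps_per_row))
    return SearchGrid(columns, rows)


grid = plan_grid(100_000_000, 25_000_000, peak_qps=30, qps_per_row=10)
print(grid, grid.servers)
```

With these made-up inputs, 100 million items at 25 million items per column gives four columns, and 30 queries per second at 10 per row gives three rows, for twelve search servers in total.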

The following is a list of some of the enhancements provided by FAST Search Server 2010 for SharePoint.

Can search in any language
Can detect 84 languages, although not all linguistic tools are provided for all languages
Lemmatization (variations) improves recall (query for better includes good)
Phrase search includes stop words (noise words), but keyword queries do not
Only nouns and adjectives are expanded for precision (book > books not booked)
Deeper refinements with exact counts of items including duplicates; retrieves metadata from entire results list, not just first 50 items
Ability to sort results on any metadata
Entity extraction
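
To see why counting over the entire results list matters, the sketch below compares refiner counts taken from only the first 50 results with counts taken from every result. The result items, the `file_type` property, and the `refiner_counts` helper are made up for illustration; this is not how FAST computes refiners internally.

```python
from collections import Counter


def refiner_counts(results, prop, sample_size=None):
    """Count values of one metadata property across results.

    sample_size=None counts every result (a "deep" refiner);
    a number counts only the top-ranked results (a "shallow" one).
    """
    sample = results if sample_size is None else results[:sample_size]
    return Counter(item[prop] for item in sample if prop in item)


# Hypothetical ranked result list: 60 Word documents followed by 40 PDFs.
results = [{"file_type": "docx"}] * 60 + [{"file_type": "pdf"}] * 40

shallow = refiner_counts(results, "file_type", sample_size=50)
deep = refiner_counts(results, "file_type")

print(shallow)  # only docx appears; the PDFs are invisible to the refiner
print(deep)     # exact counts for both file types
```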

Architecturally, the FAST add-on products modify the content processing pipeline to be easily extensible with custom plug-ins. With SharePoint Search, you created a Search Service Application (SSA). With FAST Search for SharePoint, you will create two FAST Search Service Applications, the FAST Search Connector and the FAST Search Query.

These SSAs create the connections between the SharePoint Search components that crawl content and construct queries and the FAST Search components that process the content indexing and querying. These FAST SSAs essentially modify the processing “pipeline” to divert the indexing and querying processing from the SharePoint Search layer of the architecture to the more powerful FAST layer of servers.

This FAST Content Processing pipeline is constructed of a series of customizable plug-ins that can be managed with the user interface, cmdlets, or config files. In many stages of the pipeline, you will find plug-ins available to provide standard SharePoint or enhanced FAST capabilities. Out-of-box (OOB) plug-ins can be enabled or disabled as needed. The following plug-ins are enabled by default.

Format conversion.
Language detection and encoding for 84 languages.
Lemmatizer: Linguistic normalization similar but superior to the stemming performed by SharePoint Search. Lemmatizing uses a dictionary to reduce words to their base form, for example reducing "ate" to "eat". FAST Search Server 2010 for SharePoint performs linguistic processing on items returned by the crawl process before those items are indexed, and on the terms in a query before the query is processed.
Tokenizer: The word breakers for FAST are better than the SharePoint Search processes, particularly in non-English languages.
Entity extraction: The entity extraction process detects various properties as known entities (such as names, locations, and dates) from the retrieved documents, and maps them as metadata into managed properties even if they are not natively defined as metadata within the documents. Users can now query these as properties or metadata, and they also can drill down or refine results based on these properties. You can create custom extractors using, for example, a dictionary (list) of product names, projects, or organizational units relevant to your organization.
DateTimeNormalizer.
Vectorizer: Creates a document signature that represents a document’s content in a way that allows comparison between documents for similarity searching.
WebAnalyzer: Anchor text and link cardinality analysis.
PropertiesMapper: Maps metadata to crawled properties.
PropertiesReporter: Reports detected properties or metadata.
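
The lemmatizer entry in the list above can be illustrated with a toy dictionary lookup applied on both sides of the search: once to item text before indexing and once to the query. The tiny `LEMMAS` table and the helper names are hypothetical; a real lemmatizer relies on full language dictionaries.

```python
from collections import defaultdict

# Hypothetical mini-dictionary mapping inflected forms to a base form.
LEMMAS = {"ate": "eat", "eaten": "eat", "eats": "eat", "books": "book"}


def lemmatize(token):
    """Reduce a token to its dictionary base form, or leave it unchanged."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(items):
    """Map each lemma to the ids of the items that contain it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for token in text.split():
            index[lemmatize(token)].add(item_id)
    return index


def search(index, query):
    """Return ids of items that contain every lemmatized query term."""
    terms = [lemmatize(token) for token in query.split()]
    if not terms:
        return set()
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)


items = {"doc1": "She ate lunch", "doc2": "They eat late", "doc3": "Two books"}
index = build_index(items)
print(sorted(search(index, "eaten")))  # doc1 and doc2 both match
```

A suffix-stripping stemmer would not connect "ate" to "eat", which is where the dictionary approach helps recall for irregular forms.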

Other optional plug-ins that are provided but are disabled by default include the following.

XML Properties Mapper
Offensive Content Filter
Verbatim extractor (loads a dictionary for custom extraction, such as a list of product names, projects, or departments)
Field Collapsing
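
Dictionary-driven extraction, as used by custom extractors and the verbatim extractor, amounts to matching a known list of terms against item text and recording the hits as metadata. Below is a minimal sketch under that assumption; the dictionary entries, the `build_extractor` helper, and the matching rules (case-insensitive, whole words, longest term first) are illustrative choices, not the FAST implementation.

```python
import re


def build_extractor(terms):
    """Return a function that finds dictionary terms in free text."""
    if not terms:
        raise ValueError("the dictionary must contain at least one term")
    # Try longer terms first so a long name wins over a shorter prefix of it.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    canonical = {term.lower(): term for term in terms}

    def extract(text):
        found = []
        for match in pattern.finditer(text):
            name = canonical[match.group(0).lower()]
            if name not in found:  # report each entity once, in order seen
                found.append(name)
        return found

    return extract


# Hypothetical dictionary of product and project names.
extract = build_extractor(["Contoso Bikes", "Project Falcon"])
text = "Status of project falcon and Contoso Bikes; Project Falcon ships soon."
print(extract(text))
```

The returned list could then be stored against the item as a managed property, which is what makes the extracted names available for querying and refinement.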

2. Architecture and Topology

FAST Search Server 2010 for SharePoint extends the three-tiered architecture of SharePoint Search 2010 into a multitiered distributed farm architecture. The following sections describe the tiers and their components.

2.1. SharePoint WFE Servers

The standard SharePoint 2010 WFE servers will continue to provide the Query and Federation Object Model (OM), the Query Web service, and the Search Centers with the accompanying Web Parts customized to accept queries from and present results to users. The use of FAST Search should be transparent to users except for enhanced results.

Site collection administrators will be able to manage FAST Search functionality in Site Settings, just as they did for SharePoint Search. The number of WFEs will scale as required. You should consider dedicating one or more WFEs as crawl targets if crawl levels impact performance for users.

2.2. SharePoint Application Servers

The FAST Content SSA and Query SSA will be provided by SharePoint 2010 application servers in the parent farm. Depending on the workload, these servers may be able to provide other SharePoint services. These services can scale for performance and resiliency by adding query components to the Query SSA and crawl components to the Content SSA.

The Query SSA provides the connection to FAST query processing and both the query and the crawl components for people search. The Content SSA provides the connection to FAST crawl and index processing.

Both the parent SharePoint farm and remote SharePoint farms will interface with FAST using the SSAs on the parent farm. At this layer, FAST differs from SharePoint Search in that the SharePoint SSA provided both the crawl and query components, whereas FAST requires separate SSAs for each component.

2.3. FAST Application Servers

This layer provides the following components that can be hosted on one or more servers as required for performance and resiliency.

Administration
Content Distributor
Item Processing
Web Analyzer
Indexing Dispatcher
Indexer
Query Matching
Query Processing

2.4. Database Servers

SQL Server 2008 Enterprise servers will provide the following search databases in clustered and/or mirrored configurations.

SharePoint Central and Site Administration
Search Admin (FAST)
Search Admin (Content and Query SSA including people search)
Property (Query SSA people search)
Search Crawl (Query SSA people search)

Note:

With FAST Search for SharePoint, the metadata is stored in optimized files on the file system, not in SQL databases.

2.5. Scaling FAST Application Layer (Cluster)

All the FAST Search Server 2010 for SharePoint components can run on a single server. However, depending on how you scale out to run the components on one or more servers, the system can

Index a larger number of items
Handle more item updates
Reduce indexing latency
Respond to more queries per second

Figure 2 shows an example of a scaled-out cluster of components for the purpose of this discussion.

Figure 2. FAST Search components cluster

Index column: The complete searchable index can be split into multiple disjoint index columns (or partitions) when the complete index is too large to reside on one server. Queries are evaluated against all index columns within the search cluster, and the results from each index column are merged into the final results list. Unlike SharePoint Search 2010, adding an index partition (column) here requires a complete re-indexing of all content, so accuracy in your original design is very important.
Search row: A search row contains a set of search nodes (servers) hosting index partitions that together contain all items indexed within the search cluster. Adding search rows provides increased query performance and fault tolerance.
Primary and backup indexer rows: An indexer row provides an indexer for each partition. When you add a row of indexer nodes, they are configured as backup indexer nodes for fault tolerance. Both rows of indexers produce the same set of indexes, but only the primary indexer distributes the indexes to the query matching nodes.
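
The column and row roles can be sketched with plain dictionaries: each column holds a disjoint slice of the items, a row is one complete set of columns, and a query is evaluated against every column in one row before the hits are merged. The term-count scoring, the sample items, and the function names are assumptions made for this illustration only.

```python
import heapq
from itertools import cycle


def search_row(row, term, limit=10):
    """Query every column in one row and merge the hits into one ranked list."""
    term = term.lower()
    hits = []
    for column in row:  # each column is a dict of item id -> text
        for item_id, text in column.items():
            score = text.lower().split().count(term)
            if score:
                hits.append((score, item_id))
    # Merge step: keep the highest-scoring hits across all columns.
    return [item_id for _, item_id in heapq.nlargest(limit, hits)]


# Three index columns, each holding a disjoint slice of the items.
columns = [
    {"a": "fast search"},
    {"b": "fast fast index"},
    {"c": "sharepoint farm"},
]

# Two search rows: each row carries every column, so either can answer.
rows = [columns, [dict(column) for column in columns]]
next_row = cycle(rows)  # simple round-robin to spread query load

first = search_row(next(next_row), "fast")
second = search_row(next(next_row), "fast")
print(first, second)
```

Because every row contains all of the columns, both queries return the same ranked list, which is why extra rows add query capacity and fault tolerance without adding content capacity.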

2.6. Indexing Connectors

FAST Search Server 2010 for SharePoint uses the Connector Framework for indexing content, just as SharePoint 2010 does. In fact, most content sources can be crawled with the SharePoint 2010 connectors. However, FAST Search Server 2010 for SharePoint does offer three advanced SharePoint specific indexing connectors. The choice of indexing connector is influenced by the kind of content that you want to crawl, by the specific requirements of your organization, and (sometimes) just by an administrator’s preference.

Even though these indexing connectors are known collectively as the FAST Search connector, remember that this is a collection of connectors, not a single indexing connector. When the FAST Search connector is associated with one or more content sources through the FAST Search Content SSA, the individual indexing connectors are used.

This FAST Search connector offers the following connectors with options for Web, database, and Lotus Notes content.

2.6.1. Web Content

SharePoint sites: Always use the SharePoint connector.
File shares: Always use the File share connector.
Exchange: Always use the Exchange connector.
People profiles: Always use the People profiles connector. (Profiles are crawled through the FAST Search Query Search Service Application.)
Websites: If you have a limited number of websites without dynamic content, use the website indexing connector. However, use the FAST Search Web crawler when
- You have many websites to crawl.
- The website content contains dynamic data, including JavaScript.
- The organization needs access to advanced Web crawling, configuration, and scheduling options.
- You want to crawl RSS Web content.
- The website content uses advanced logon options.
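
The website guidance above reduces to a simple rule: any one of the listed conditions is enough to prefer the FAST Search Web crawler. Here is a small sketch of that rule; the `WebsiteSource` fields and the `many_sites` threshold are hypothetical, since the text says only "many" without giving a number.

```python
from dataclasses import dataclass


@dataclass
class WebsiteSource:
    site_count: int
    dynamic_content: bool = False         # for example, JavaScript-driven pages
    advanced_crawl_options: bool = False  # crawling, configuration, scheduling
    rss_content: bool = False
    advanced_logon: bool = False


def choose_web_connector(source, many_sites=50):
    """Pick a connector for website content.

    many_sites is an assumed cutoff, not a documented limit.
    """
    needs_fast_crawler = (
        source.site_count >= many_sites
        or source.dynamic_content
        or source.advanced_crawl_options
        or source.rss_content
        or source.advanced_logon
    )
    if needs_fast_crawler:
        return "FAST Search Web crawler"
    return "website indexing connector"


print(choose_web_connector(WebsiteSource(site_count=5)))
print(choose_web_connector(WebsiteSource(site_count=5, rss_content=True)))
```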

2.6.2. Database Content

Use the Business Data Catalog-based indexing connectors if

The preferred configuration method is Microsoft SharePoint Designer 2010.
You want to use time stamp–based change detection for incremental database crawls.
The preferred management method is SharePoint 2010 Central Administration.
You want to enable crawling based on the change log. This requires directly modifying the connector model file and creating a stored procedure in the database.

Use the FAST Search database connector when

The preferred configuration method is using SQL queries.
Advanced data joining operation options through SQL queries are required.
You want to use advanced incremental update features. This connector uses checksum-based change detection for incremental crawls if there is no update information available. It also supports time stamp–based change detection and change detection based on update and delete flags.
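
Checksum-based change detection can be sketched as follows: store a hash of each row from the previous crawl, hash the rows again on the next crawl, and compare. The SHA-256 hash, the row layout, and the function names are assumptions for illustration; the text does not say how the connector computes its checksums.

```python
import hashlib


def row_checksum(row):
    """Hash a row's columns in a stable order so equal rows hash equally."""
    canonical = "\x1f".join(f"{name}={row[name]}" for name in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(previous_checksums, current_rows):
    """Compare stored checksums with freshly crawled rows.

    Returns (added, updated, deleted, new_checksums), keyed by primary key.
    """
    new_checksums = {key: row_checksum(row) for key, row in current_rows.items()}
    added = sorted(k for k in new_checksums if k not in previous_checksums)
    updated = sorted(
        k for k in new_checksums
        if k in previous_checksums and new_checksums[k] != previous_checksums[k]
    )
    deleted = sorted(k for k in previous_checksums if k not in new_checksums)
    return added, updated, deleted, new_checksums


# First crawl: everything is new.
rows_v1 = {"1": {"name": "Widget", "price": 10}, "2": {"name": "Gadget", "price": 25}}
_, _, _, stored = detect_changes({}, rows_v1)

# Second crawl: row 2 changed, row 1 was removed, row 3 appeared.
rows_v2 = {"2": {"name": "Gadget", "price": 30}, "3": {"name": "Gizmo", "price": 5}}
added, updated, deleted, stored = detect_changes(stored, rows_v2)
print(added, updated, deleted)
```

This approach needs no update column in the source table, which fits the case described above where no update information is available, at the cost of reading every row on each crawl.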

2.6.3. Lotus Notes Content

Use the Lotus Notes connector when

The preferred management method is Central Administration.

Use the FAST Search Lotus Notes connector when

Full Lotus Notes security support is required, including support for Lotus Notes roles.
You want to crawl Lotus Notes databases as attachments.

2.6.4. Line-of-Business Data Content

Use Business Data Catalog–based connectors when your content source holds data in line-of-business applications and when you want to enable crawling based on the change log. Change log crawling requires directly modifying the connector model file and creating a stored procedure in the database.