SharePoint 2010 : Managing the Search Service Topology

4/16/2012 3:49:43 PM

A single SharePoint 2010 farm can provide multiple instances of search service applications. Each service instance can support multiple component instances in a topology that can be designed for performance, for resiliency, or to isolate information.

The default topology of a new search service application will have a single instance of each component, all components on one application server, and all databases on one database server. The four components are as follows.

Administration interface
Crawl component (the crawler)
Database set, broken into three databases: administration, crawl, and property
Index partition

This topology can be changed using the Modify Topology link located on the Farm-Wide Search Administration page or from the Search Service Administration page. The SharePoint Search topology cannot be changed in stand-alone installations.

The Search Application Topology Web Part presents the initial topology in the lower portion of the Search Administration page, shown in Figure 1.

Figure 1. Search Application Topology Web Part with initial topology

1. Scaling Considerations

The reason that the SharePoint 2010 product team broke out the search components was to provide robust scaling for the different aspects of the crawl and indexing process. Different implementations will find bottlenecks being created at different points in the overall indexing process, so by enabling the scaling of the different components, you can create a topology that fits your needs.

For example, if you need to reduce crawl times and increase index freshness, then consider performing the following actions.

If the crawl component is overwhelmed, add more crawl component servers.
If the crawl database is I/O bound on the SQL server, add crawl databases on the same SQL server.
If the SQL server bottleneck is memory or CPU, add SQL servers with additional crawl databases.

To reduce query latency, consider the following remedies.

If full-text query latency is high, partition the index into smaller partitions. Each partition can contain ~10M items. You will generally need enough memory to fit 33 percent of the index in RAM to meet sub-second latency responses.
If query latency is high because of the number of queries, add query components to mirror index partitions.
If property query latency is high because the property database is I/O bound, add property databases to the same SQL server.
If property query latency is high because of a memory or CPU bottleneck on SQL server, add property databases to additional SQL servers.
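The crawl and query remedies listed above can be kept as a simple lookup when reviewing a farm. The following Python sketch is only an illustration: the area and symptom labels are names invented here, not SharePoint identifiers or performance counters, and the remedy strings paraphrase the lists above.

```python
# Illustrative lookup table. The keys are hypothetical labels chosen for this
# sketch; the values paraphrase the remedies listed in the text.
REMEDIES = {
    ("crawl", "crawl component overwhelmed"): "add more crawl component servers",
    ("crawl", "sql io bound"): "add crawl databases on the same SQL server",
    ("crawl", "sql memory or cpu bound"): "add SQL servers with additional crawl databases",
    ("query", "full-text latency high"): "partition the index into smaller partitions",
    ("query", "too many queries"): "add query components to mirror index partitions",
    ("query", "property db io bound"): "add property databases to the same SQL server",
    ("query", "property db memory or cpu bound"): "add property databases to additional SQL servers",
}


def suggest_remedy(area, symptom):
    """Return the remedy for a known (area, symptom) pair, or a fallback note."""
    return REMEDIES.get((area, symptom), "no remedy listed; investigate further")


if __name__ == "__main__":
    print(suggest_remedy("query", "too many queries"))
```

Keeping I/O-bound and memory- or CPU-bound cases as separate keys mirrors the distinction the lists draw: the first is addressed on the same SQL server, the second by adding SQL servers.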

Note:

You can use the SharePoint Backend Query Latency report provided by SharePoint 2010 to help determine where scaling out is needed to improve query latency.

To increase availability for the query process, consider these suggestions.

Deploy multiple query components with multiple index partitions.
Deploy multiple mirrored query components and property store databases.
Use clustered or mirrored database servers to host search databases.

To increase availability for the crawling process, add crawl databases supported by multiple crawl components with redundant crawl servers. Normally, two crawl components can support a crawl database.

2. Modifying the Topology

To make any change to the topology, click the Modify button at the top of the page to open the topology management page partially shown in Figure 2. The view of components on this page can be organized by components or by server hosting the components. From the New menu, select the component that you want to create and add to the search application.

Figure 2. Manage Search Topology page menus

Changes to the topology are defined in the appropriate dialog boxes: New, Properties, or Delete. However, changes are not implemented until the Apply Topology Changes button is clicked. Clicking this button starts the SharePoint timer job, which accomplishes the actions required. You can make multiple changes to the search topology and then apply them all at once by clicking the Apply Topology Changes button. Because many changes can impact performance during their application, you might want to choose to define the changes in the management pages but use Windows PowerShell scripts to schedule their implementation.

To create a new component, click the New link in the upper-left corner of the page and select the appropriate component from the drop-down list. To edit the properties or delete a component, use the context menu of the component. There must be at least one instance of each component, so the Delete command will not be available in some instances. Also, for the query component, there is an extra menu item, Add Mirror, as shown in Figure 3.

Figure 3. Context menu for the query component

3. Crawl Databases

Crawl databases contain configurations and instructions required by the crawl component, tables used during crawls to queue items to be crawled, and log information used in crawl logs. Since a new crawl component must be associated with an existing or pending database, you should create the new database first.

The database can be hosted on a SQL server other than the default server. See the scaling considerations above to determine placement of the databases. Microsoft strongly recommends Windows authentication over the SQL authentication option. If you have mirrored SQL servers, you can associate the database with a failover database server. As discussed earlier, if you select the Dedicate This Crawl Store To Hosts As Specified In Host Distribution Rules option, the database will not participate in the automatic host distribution process. As long as there is more than one crawl database, any crawl database can be specified in a Host Distribution Rule.

Note:

Crawl databases that have had crawl components associated with them cannot be deleted until the crawl components have been associated with another database.

4. Crawl Components

Use the New menu to create a new crawl component. The configuration is very simple. Crawl component names are zero-based and generated as they are built, not while they are pending. Select a member of the farm as the server to host this crawl component. Only servers with a complete installation will be listed. Select the Associated Crawl Database. If this database already has a crawl component associated with it, then you are adding a crawl component for resiliency. Since all of the crawl instructions including state are stored in a SQL database, one crawl component can easily pick up where another left off. If you are adding one or more crawl components to a new database, then you are spreading the crawl workload for performance.

Finally, you must specify the location on the server file system where the index files will be created and stored before they are propagated to the query servers. Unlike previous versions, the space requirements for this location are relatively small and constant, because the entire index is not stored on the crawl component server.
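The distinction drawn above between adding a crawl component for resiliency and adding one for performance depends only on whether the chosen crawl database already has a component. A minimal Python sketch of that rule follows; the function name, the dictionary layout, and the database and component names are all hypothetical and exist only for this illustration.

```python
def purpose_of_new_crawl_component(database, components_by_database):
    """Classify a new crawl component by the state of its associated database.

    components_by_database maps a crawl database name to the list of crawl
    components already associated with it (a made-up structure for this sketch).
    """
    existing = components_by_database.get(database, [])
    # A database that already has a component gains a redundant crawler;
    # a database with none gains its first crawler, spreading the workload.
    return "resiliency" if existing else "performance"


if __name__ == "__main__":
    topology = {"CrawlDB_A": ["crawl-0"], "CrawlDB_B": []}  # hypothetical names
    print(purpose_of_new_crawl_component("CrawlDB_A", topology))
    print(purpose_of_new_crawl_component("CrawlDB_B", topology))
```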

Note:

You should exclude all index locations from antivirus scans.

To redistribute existing content to the new database, you must use the Auto Host Distribution tool, which is available by clicking the If You Would Like The System To Analyze Your Current Distribution And Make Recommendations For Redistribution, Click Here link on the Host Name view of the crawl logs. The tool will give an estimate of the time required to redistribute the content. Crawls will be paused during this activity, and the redistribution status will be displayed on the Host Distribution Rules page. On the Search Administration dashboard, you will see that the crawl status is Paused For: Refactoring. Crawl activity will resume automatically after the content is redistributed.

If you remove the crawl components from a database and delete the database, the content will automatically be redistributed to other existing databases. Crawls will be paused during this activity as well.

5. Property Databases

New property databases are created to improve query performance. They can also be mirrored on SQL servers for resiliency. When creating a new property database, you not only get to select the SQL server instance, but you also can use your own naming convention or accept the recommended name. Again, either Windows or SQL authentication will work; Windows authentication is recommended. If you have SQL mirroring established, you may enter the name of the failover database server.

The name generated for the original property database for our demo farm was the name of the search service application appended with “PropertyStoreDB” and the database GUID: Search_Service_Application_PropertyStoreDB_df97fb03501a4066a34f0ffffefbfc95. The generated name of the second database was SearchPropertyDB1. The names of the crawl databases followed a similar convention.

6. Index Partitions and Query Components

You can split the index into smaller partitions to speed up full-text queries. Each partition can contain approximately 10 million items, but you should plan on having enough memory to fit 33 percent of the index in RAM to meet sub-second latency responses. The index size and full-text queries must be monitored so that the topology design is kept current with the growth of search activities and performance standards.
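The two rules of thumb above (roughly 10 million items per partition, and RAM for about 33 percent of the index) can be turned into a quick estimate. The Python sketch below is a back-of-the-envelope aid, not an official sizing tool; the function names are invented here, and it assumes the 33 percent figure is applied to the size of the index held on a given query server.

```python
import math

ITEMS_PER_PARTITION = 10_000_000  # approximate per-partition guideline from the text
RAM_FRACTION = 0.33               # share of the index that should fit in RAM


def partitions_needed(total_items):
    """Estimate the number of index partitions for a given item count."""
    return max(1, math.ceil(total_items / ITEMS_PER_PARTITION))


def ram_needed_gb(index_size_gb):
    """Estimate the RAM (in GB) needed to hold 33 percent of an index."""
    return index_size_gb * RAM_FRACTION


if __name__ == "__main__":
    # Hypothetical farm: 25 million items and a 60 GB index on one query server.
    print(partitions_needed(25_000_000))   # 3
    print(f"{ram_needed_gb(60):.1f} GB")   # 19.8 GB
```

Rerunning the estimate as the corpus grows is one way to follow the advice above about keeping the topology design current.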

To split the index, select Index Partition And Query Component from the New menu (though note that the dialog box that opens is titled Add Query Component). Select the farm member to host the index partition and query component and then select the existing property database with which this index will be associated. Then specify the location for the index file. Unlike the crawl component, the index component will store a complete copy of this partition of the index, so sufficient space must be reserved to handle the current partition and estimated growth.

Finally, there is a Set This Query Component As Failover Only option that is available even for the first query server associated with this index.

The command to add mirrors is tucked away in the context menu for the query component, as you saw in Figure 3. The property box for the new query component is the same as before, except the Associated Property Database is preconfigured and inactive. The failover option is still available. You would add mirrors for the index partitions to reduce query latency due to the number of queries. Note that the term “mirror” is used even though these are index files, not SQL databases.
