Saturday, July 17, 2010

SharePoint Search Service not working

Recently, I faced an issue with the SharePoint search service in our organization.

In our server farm, we have two Web Front Ends (which also play the role of query servers) and one dedicated index server.

I spent a lot of time investigating the issue and was almost breaking my head by the end of it, and it finally turned out to be quite a small issue.

Let me start by explaining what the error was and how I proceeded.

I noticed this error because the users of our portal complained that some of the web parts that used SharePoint Search queries were no longer working and were not returning any results.

On investigation, I found that the SharePoint Search was indeed not working. I checked the following things:

  1. In SharePoint Central Admin -> Operations -> Services on the Server, I checked that the Search service was Started on the index and query servers. This was OK.
  2. From the same page in Central Admin, I checked that the Search Service was configured correctly on the index and query servers by clicking on Office SharePoint Server Search and reviewing the following properties: Farm Search Service Account, Default Index File Location, Index Propagation Location, etc. This was OK.
  3. I checked whether the Farm Search Service Account credentials were incorrect or expired. This was OK as well.
  4. I checked that the Default Index Location and Index Propagation Location were set up correctly. I checked that the index propagation location on both query servers was shared and that the Search Service account had been given Read/Write/Modify permissions on this share. I also checked that the Search Service account was a member of the Administrators group on the server and of the farm administrator group WSS_ADMIN_WPG. This was OK too.
  5. I checked the Event Log on all three servers (2 query and 1 index) and found that an error message was being logged every minute on both web front ends, which were also the query servers, whereas the index server logged no specific error message. This was the first error message indicating that something was wrong with the search service, but what did it mean??
  6. Later I saw that another Event Log error message was being logged periodically on the query servers, which stated something like "Query Server XXXX has been taken out of the rotation due to this error: The system cannot find the file specified...". Again, the error message indicated that something was wrong on the query servers, but I could not figure out what exactly it was...aaahhh
  7. I spent a lot of time googling for this error message and found that it was not new and that other people had faced it, but none of the proposed solutions worked in my scenario, since the settings they pointed to all seemed to be OK. Some talked about recreating the SSP to fix these types of errors. However, I didn't want to take such a big step, as it would mean a lot of work recreating the SSP, Audiences, User Profile Configuration, etc. Worst of all, it would mean downtime for the SharePoint users. Ours is an internet-facing site used by 40,000+ users across the globe, so the impact would be big.
  8. I then went to SharePoint Central Admin -> Shared Services Administration (clicked on my SSP) -> Search Settings and checked the Indexing Status, Propagation Status and Errors in the Log. I found that indexing was working well, but the Propagation Status showed "Propagating" perpetually.
  9. I checked the SharePoint ULS log and could not find any information about this error beyond what I had already found in the Event Logs.
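
To see at a glance which servers are logging the rotation error, something like the following minimal Python sketch could be run over an exported event log. It assumes the events have already been exported from Event Viewer and read into (server, message) pairs; the function name and that export format are my own assumptions for illustration, not something SharePoint provides.

```python
from collections import Counter

# Fragment of the Event Log message quoted in step 6 above.
ROTATION_MARKER = "has been taken out of the rotation"


def count_rotation_errors(events):
    """Count "taken out of the rotation" errors per server.

    `events` is an iterable of (server_name, message) tuples, for example
    rows read from an Event Viewer export (the export format is an assumption).
    """
    counts = Counter()
    for server, message in events:
        if ROTATION_MARKER in message:
            counts[server] += 1
    return counts
```

In a situation like ours, a result with entries only for the query servers and nothing for the index server would match what I saw by eye in the Event Logs.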

At this stage, though I had not figured out exactly what the problem was, the investigation pointed towards the query server role. So I decided to stop all the search services and set up the Search Service again from scratch, even though it meant that all my indexed data would have to be re-crawled. Please note that I decided to take this action only because it had the least impact on end users and because, if the re-crawl worked well, my whole index would be re-created as before.

I will write another post on the steps I followed to set up the Search Service again, as one has to be really careful not to miss anything and to follow a specific sequence when stopping and starting the services on the query and index servers.

After I had set up the search service again, I checked whether it was OK and, to my dismay, the same errors were being logged again in the query servers' Event Log. It was status quo.

I tried everything I could think of, checking and rechecking the index propagation shares, permissions and search service accounts. I even went to the SharePoint Search database that SharePoint creates by default on the SQL server to store metadata related to search, looked at the data in different tables and tried to work out where the problem might really be. More specifically, I was checking the entries in the MSSPropagationSearchServerTable. I could not find anything specific there either.

Finally, we decided to bring a Microsoft consultant on board to do a check. After a good amount of investigation by this consultant, we could definitely say that:

· Servers in Farm are patched equally

· Services on servers are running fine

· Office Search settings configured correctly

· SharePoint services search settings configured correctly

· Services accounts are OK

· Content sources are OK

Issue Found Finally - Read Below!!

What I had missed, and what the consultant found, was that the SharePoint Timer (SP Timer) Windows service was not started on the index server. This happened because the SP Timer service Startup Type was set to "Manual". As a result, when we installed an update on our farm and rebooted our servers, the service did not start automatically on the index server, and this went unnoticed until we faced this issue. What is really weird is that the SP Timer service Startup Type on the web front ends (query servers) was set to "Automatic", as in a default installation of SharePoint, but on the index server it was set to "Manual".

Indeed, after we set this service to Automatic and started it, index propagation began working normally after some time and the query servers were added back to the rotation.
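
As a rough illustration of the check that would have caught this much earlier, here is a minimal Python sketch that reads a Windows service's startup type and state from the output of the built-in `sc` command (`sc qc` for the configuration, `sc query` for the current state). The service name `SPTimerV3` is my assumption for the timer service on a SharePoint 2007 farm, so please confirm the name in services.msc on your own servers. The helper names are made up for this example.

```python
import re
import subprocess

# Assumption: "SPTimerV3" is the service name of the SharePoint 2007 timer
# service. Verify the real name on your farm before relying on it.
TIMER_SERVICE = "SPTimerV3"


def parse_sc_field(sc_output, field):
    """Return the symbolic value of a field from `sc qc` or `sc query` output.

    Example line: "        START_TYPE         : 2   AUTO_START"
    For field "START_TYPE" this returns "AUTO_START"; None if not found.
    """
    pattern = r"^\s*" + re.escape(field) + r"\s*:\s*\d+\s+(\S+)"
    match = re.search(pattern, sc_output, re.MULTILINE)
    return match.group(1) if match else None


def run_sc(server, action, service):
    """Run `sc` against a remote server and return its text output (Windows only)."""
    result = subprocess.run(
        ["sc", "\\\\" + server, action, service],
        capture_output=True, text=True,
    )
    return result.stdout


def timer_service_problems(qc_output, query_output):
    """List human-readable problems found in the two `sc` outputs."""
    problems = []
    start_type = parse_sc_field(qc_output, "START_TYPE")
    state = parse_sc_field(query_output, "STATE")
    if start_type != "AUTO_START":
        problems.append("startup type is %s, expected AUTO_START" % start_type)
    if state != "RUNNING":
        problems.append("state is %s, expected RUNNING" % state)
    return problems
```

For example, `timer_service_problems(run_sc("INDEX01", "qc", TIMER_SERVICE), run_sc("INDEX01", "query", TIMER_SERVICE))` should return an empty list when the service is set to Automatic and running. `INDEX01` is a placeholder server name, not one of ours.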

I learnt a lot of things during this investigation, but here are a few quick ones:

- Always check and ensure that the different Windows services working in the background of SharePoint's core features are set to the "Automatic" Startup Type and are actually running.
- SharePoint error messages are not always precise and do not always direct you towards the right problem. This is something I already knew from my pretty vast experience with SharePoint 2003 and 2007. I hope it is better in 2010.
- A second pair of eyes can help you look at the problem from a different angle and could help spot something you have missed. :)

I hope this post helps somebody else in a similar situation.

Cheers!!
