Monday, June 23, 2008

Recently I went through the process of indexing a Subversion source code repository with SharePoint. I thought I'd share the steps, as out-of-the-box (OOTB) SharePoint won't index .ps1, .cs or .vb files.

Setting up search to index these files works whether the files live in a document library or outside SharePoint. The process for indexing files from other source control systems will vary depending on how you can get at the source files. If you need to index SourceSafe, you can set up what's called a shadow folder, which automatically saves the latest files from your repositories to disk. I suspect you can index Team Foundation Server via its Web Access URLs, although I've not tried that.

The Subversion side of things is pretty easy: pick the repository you want and export the latest version using the svn client, e.g. svn export svn://devhosting/svn/webparts d:\SVNExport\webparts. Script the export of each repository and then schedule it.
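A minimal sketch of that scripted export, written in Python purely for illustration. The repository list, the helper names and the dry_run switch are my assumptions; only the svn URL and export folder come from the example above. It passes --force because svn export refuses to write into a folder that already exists, which is exactly what happens on the second scheduled run.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed values: root URL and export folder follow the example above,
# the repository list is hypothetical; add one name per repository.
SVN_ROOT = "svn://devhosting/svn"
EXPORT_ROOT = r"d:\SVNExport"
REPOSITORIES = ["webparts"]


def export_commands(repos, svn_root=SVN_ROOT, export_root=EXPORT_ROOT):
    """Build one `svn export` command line per repository."""
    commands = []
    for name in repos:
        url = f"{svn_root}/{name}"
        target = str(PureWindowsPath(export_root) / name)
        # --force lets a scheduled rerun overwrite the previous export;
        # without it svn export stops because the target folder exists.
        commands.append(["svn", "export", "--force", url, target])
    return commands


def run_exports(repos=REPOSITORIES, dry_run=True):
    """Print the commands (dry run) or execute them with the svn client."""
    for cmd in export_commands(repos):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if an export fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_exports(dry_run=True)
```

Once the printed commands look right, run it with dry_run=False from a scheduled task. Note that --force only overwrites files. Anything deleted from the repository stays in the export folder, so you may want to clear the target folder before each export.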

On the SharePoint side you set up a new content source to crawl the directories.

In this case the indexer is on a separate machine, so we enter the UNC path to the export folder. Make sure the content access account has read rights to the share. If needed, you can set up separate credentials for this source.

In the SSP, on the Search Settings page, click New Content Source under Content sources and crawl schedules.


The problem now is that if you start a full crawl, typically only the .txt files get indexed, because the SharePoint indexer has no idea what to do with file extensions it doesn't recognise.
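Before registering anything, it helps to know which extensions are actually sitting in the export. This sketch counts files by extension and lists the ones that aren't .txt. The KNOWN_EXTENSIONS set is an assumption based on the behaviour described above, and the helper names are hypothetical, so add whatever your install already indexes.

```python
from collections import Counter
from pathlib import Path, PurePath

# Assumption: as described above, only .txt is picked up out of the box here.
KNOWN_EXTENSIONS = {".txt"}


def count_extensions(file_names):
    """Count file names by lower-cased extension, including the dot."""
    counts = Counter()
    for name in file_names:
        suffix = PurePath(name).suffix.lower()
        if suffix:  # skip files with no extension, e.g. README
            counts[suffix] += 1
    return counts


def unrecognised_extensions(file_names, known=KNOWN_EXTENSIONS):
    """Sorted list of extensions that are not in the known set."""
    return sorted(ext for ext in count_extensions(file_names) if ext not in known)


def scan_export(root):
    """Walk an export folder on disk and report unrecognised extensions."""
    files = [p.name for p in Path(root).rglob("*") if p.is_file()]
    return unrecognised_extensions(files)
```

Pointing scan_export at the export folder gives you the list of extensions to work through in the next two steps.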

There are a couple of steps to getting new file extensions indexed. This assumes you are a Search Service administrator.

First, add the extension to File Types:

1. On the Search Administration page, click File Types under Crawling.


2. On the Manage File Types page, click New File Type.


3. On the Add File Type page, type the file name extension for the file type that you want to add in the File extension box. For example, to search PowerShell files, type ps1.
Do not include the period (.) character in front of the file name extension.

4. Click OK.


5. Rinse and repeat for each file type that you want to add.
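The two steps want the extension in slightly different forms. The Add File Type page takes it without the period, while the registry key name includes it. These hypothetical helpers simply normalise whatever you have into each form. Lower-casing is my own choice; registry key names are case-insensitive anyway.

```python
def file_type_entry(ext):
    """Form for the Add File Type page: no leading period, e.g. 'ps1'."""
    return ext.strip().lstrip(".").lower()


def registry_key_name(ext):
    """Form for the registry key name: leading period, e.g. '.ps1'."""
    return "." + file_type_entry(ext)


# Example extensions from this post, in mixed forms.
EXTENSIONS = ["ps1", ".CS", "aspx"]

if __name__ == "__main__":
    for ext in EXTENSIONS:
        print(file_type_entry(ext), registry_key_name(ext))
```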

The second step in getting the file extensions recognised is to add them to the registry entries that the Office SharePoint Server Search service reads when it starts up. The key is located at

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension

Add a new key named after the extension, including the dot, e.g. .ps1.

Set its default value to {4A3DD7AB-0A6B-43B0-8A90-0D8B0CC36AAB}. This tells the indexer to use the plain text IFilter (tquery.dll) for this extension.


Add a new key for each file extension you want indexed. In this case that's .cs, .ps1 and .aspx, but you can add .vb, .vbs or whatever other text files you need indexed.
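Rather than adding keys by hand on each index server, you could generate a .reg file. This is a sketch that maps every extension to the same text filter CLSID given above. build_reg_file and write_reg_file are hypothetical helper names. The file is written as UTF-16, which is the encoding regedit itself uses when exporting version 5.00 files.

```python
# Key path and CLSID as given in this post.
FILTER_EXTENSION_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0"
    r"\Search\Setup\ContentIndexCommon\Filters\Extension"
)
TEXT_FILTER_CLSID = "{4A3DD7AB-0A6B-43B0-8A90-0D8B0CC36AAB}"


def build_reg_file(extensions):
    """Return .reg text adding one subkey per extension, each with its
    (Default) value set to the text filter CLSID."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for ext in extensions:
        key_name = "." + ext.strip().lstrip(".").lower()
        lines.append(f"[{FILTER_EXTENSION_KEY}\\{key_name}]")
        lines.append(f'@="{TEXT_FILTER_CLSID}"')  # @ is the (Default) value
        lines.append("")
    return "\r\n".join(lines)


def write_reg_file(path, extensions):
    """Write the .reg file as UTF-16 (with BOM), as regedit does on export."""
    # newline="" keeps the \r\n line endings exactly as built above.
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(build_reg_file(extensions))
```

For example, write_reg_file("code-filters.reg", ["cs", "ps1", "aspx"]). Back up the Extension key first, then import the file on the index server with regedit or reg import.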

Then stop and start the search service with these commands:

net stop osearch

net start osearch
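If you're scripting the whole thing, you can wrap the restart too. This is a minimal sketch that runs the same two net commands from Python. The helper names are hypothetical. check=True makes the script stop if either command reports a failure, rather than carrying on to a crawl with stale settings.

```python
import subprocess

# Service name used by the net commands above.
SEARCH_SERVICE = "osearch"


def restart_commands(service=SEARCH_SERVICE):
    """The stop/start pair, in the order they must run."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_search_service(dry_run=False):
    """Run net stop then net start; check=True stops on the first failure."""
    for cmd in restart_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```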

Now do a full crawl of your content source and your files should be full-text indexed. The crawl log is useful for seeing whether the filtering barfed on any of your files.

Now you can go to the Search Center, enter your keyword and get a list of code files back.

Here I've set up a custom scope and search page, and added a custom search tab to separate the code results onto their own tab. I won't go into it here, but there is a good post that shows how to do this.

Even better, if you know you only want PowerShell files, you can add the fileextension keyword (e.g. fileextension:ps1) and search will filter out everything but PowerShell files.
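You can build the same filtered query in code, e.g. to link straight to PowerShell-only results. This sketch assumes a Search Center at a hypothetical http://intranet/SearchCenter with the usual Pages/results.aspx page taking the query in the k parameter. Adjust it for your own site. code_query and results_url are illustrative names.

```python
from urllib.parse import urlencode


def code_query(keyword, extension=None):
    """Append a fileextension restriction to a keyword query if one is given."""
    query = keyword.strip()
    if extension:
        ext = extension.strip().lstrip(".")
        query += f" fileextension:{ext}"
    return query


def results_url(search_center, query):
    """Results page URL; Pages/results.aspx and the k parameter are assumptions
    based on a default Search Center, so check them against your site."""
    base = search_center.rstrip("/")
    params = urlencode({"k": query})
    return f"{base}/Pages/results.aspx?{params}"
```

For example, results_url("http://intranet/SearchCenter", code_query("Get-Item", "ps1")) gives a link that runs the filtered search directly.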


Searching your entire code repository with subsecond query times is now pretty easy.