Adventure in SPWonderland

Take apart and put back together


Case Sensitive MindF**k Part 1: SharePoint not indexing TWiki

So we’re setting up a crawl of a TWiki site as one source in a suite of content sources.

So far so good. Once the authentication was sorted we noticed a problem: only the root URL of the site was getting crawled.

Various ideas were thrown around about nofollow and noindex attributes but we couldn’t find anything wrong with our configuration and nothing seemed to fit the problem.

I noticed by accident that this particular TWiki installation had case-sensitive URLs (I thought those days were gone, but it's configurable apparently), and that got me thinking.

By kicking off a crawl I noticed that SharePoint was requesting lower-case URLs from the site for every link on the home page, getting a 404 for each one and stopping.

OK, penny drops, but why is SharePoint sending a lower-case URL? Well… this is by design, as part of the crawler’s normalization of URLs (http://blogs.msdn.com/b/enterprisesearch/archive/2010/07/09/crawling-case-sensitive-repositories-using-sharepoint-server-2010.aspx)
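To see why lower-casing stops a crawl dead at the root, here is a rough Python sketch of the effect. It is not SharePoint's actual crawler logic: the host name, the TWiki-style paths and the `preserve_case` flag are all made up for illustration, and the "site" is just a dictionary that only answers to exact-case paths.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical case-sensitive site: path -> links found on that page.
# Any path not listed here (including wrong-case variants) is a 404.
SITE = {
    "/": ["/bin/view/Main/WebHome", "/bin/view/Main/TWikiUsers"],
    "/bin/view/Main/WebHome": ["/bin/view/Main/TWikiUsers"],
    "/bin/view/Main/TWikiUsers": [],
}


def normalize(url, preserve_case):
    """Lower-case the host always; lower-case the path unless preserve_case."""
    parts = urlsplit(url)
    path = parts.path if preserve_case else parts.path.lower()
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def crawl(start, preserve_case):
    """Breadth-first crawl; returns the paths that answered without a 404."""
    seen, queue, crawled = set(), [start], []
    while queue:
        url = normalize(queue.pop(0), preserve_case)
        if url in seen:
            continue
        seen.add(url)
        parts = urlsplit(url)
        path = parts.path or "/"
        if path not in SITE:
            # Simulated 404: the server does not know the lower-cased path.
            continue
        crawled.append(path)
        base = f"{parts.scheme}://{parts.netloc}"
        queue.extend(base + link for link in SITE[path])
    return crawled


if __name__ == "__main__":
    print(crawl("http://twiki.example/", preserve_case=False))
    print(crawl("http://twiki.example/", preserve_case=True))
```

With `preserve_case=False` only the root comes back, which is exactly the symptom we were seeing. With it set to `True` all three pages are reached.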

In 2010, if you’re setting up a crawl rule, that checkbox you’ve ignored called Match Case (badly named, surely Preserve Url Casing would get the point across better) just needs to be set and voilà, the crawler will preserve the case of the URLs it requests.

Some tests are really hard to ignore

Picture this: you're an IT bod working for Thomson Financial. You've got I don't know how many sites subscribing to your global financial news feed and you need to check that your latest update is working, so you key a test message into the system. Hmm, that's funny, nothing is coming out on the test feed reader.

Try a second one, and a third and you just keep on going...

Now, you did check it's the test system you were logged in to?

 

Test Please Ignore

 

It's even harder to ignore on some other sites... looks like they might have lost ignore 1 and ignore 2 though, maybe they meant these to go around the world :-)

 
