Submissions/CirrusSearch: How we've replaced a great search engine with an awesome search engine

This is an accepted submission for Wikimania 2014.

Submission no. 5010
Title of the submission
CirrusSearch: How we've replaced a great search engine with an awesome search engine
Type of submission
presentation + Q&A with audience
Author of the submission
Chad, Nik Everett, Dan Garry
E-mail address
chad@wikimedia.org, neverett@wikimedia.org, dgarry@wikimedia.org
Username
^demon, manybubbles, DGarry (WMF)
Country of origin
United States
Affiliation, if any (organisation, company etc.)
Wikimedia Foundation
Personal homepage or blog
https://mediawiki.org/wiki/Search
Abstract
Wikimedia sites have historically used a home-grown search engine written for MediaWiki by volunteers, based on Apache Lucene. While it served us well for many years, "Lucene-search" became a source of significant technical debt and operational nightmares over time. It was getting more and more difficult to maintain, and in 2013 we started to look for a replacement.
After evaluating alternatives like Solr, we started a larger discussion with the MediaWiki developer community that led to choosing Elasticsearch, a search engine written in Java and also based on Lucene. We set up the operational infrastructure and developed CirrusSearch, an extension to integrate Elasticsearch with MediaWiki sites. After extensive testing in collaboration with the Wikimedia community, we started to gradually roll it out to wikis that volunteered to help us test it further, first as a secondary search tool, then as the primary. As of March 2014, most issues have been resolved, and most Wikimedia sites are now using CirrusSearch as their secondary search engine as a Beta Feature.
In addition to being easier for us to maintain, CirrusSearch also brings new features over our legacy search tool, like better support for searching in different languages, faster updates to the search index (meaning changes to articles are reflected in search results much faster), and template expansion (meaning all content in an article that's in a template is now reflected in search results).
By Wikimania 2014, we should have several more exciting projects underway, including geographic data for mobile, work on revamping the search interface, and providing search data to Wikimedia Labs users for custom querying.
In this talk, we'll present Elasticsearch, the new infrastructure that we set up, and the cool features now available when searching Wikimedia sites using CirrusSearch. Finally, we'll take this opportunity to discuss it with Wikimedians, and open up the session for a Q&A with the audience, as we're very interested to hear what end users want to see from search in the coming year.
Track
Technology, Interface & Infrastructure
Length of session
40 minutes (30 minute talk + 10 minute Q&A)
Will you attend Wikimania if your submission is not accepted?
Yes
Slides
File:Wikimania_2014_-_CirrusSearch.pdf
Special requests
None that we know of.


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes (# ~~~~).

  1. NaBUru38 00:03, 26 March 2014 (UTC)
  2. --Mglaser (talk) 11:29, 30 March 2014 (UTC)
  3. Greg G (talk) 17:56, 31 March 2014 (UTC)
  4. JackHerrick (talk) 21:32, 31 March 2014 (UTC)
  5. Markus Krötzsch (talk) 06:44, 1 April 2014 (UTC)
  6. Quiddity (talk) 19:08, 12 April 2014 (UTC)
  7. the wub "?!" 23:23, 13 April 2014 (UTC)
  8. Micru (talk) 13:46, 15 April 2014 (UTC)
  9. JSahleen (WMF) (talk) 13:23, 5 August 2014 (UTC)
  10. Add your username here.