Category: Search

  • Combining Search Scores: Winning and Failing

    Trey Jones at the Wikimedia Foundation published some very interesting notes about how to think about combining scores for search ranking (particularly in Elasticsearch). I like this insight a lot: addition is looking for ways to win, multiplication is looking for ways to fail. This is pretty interesting to me when thinking about how I chose to implement the…

  • Scaling Elasticsearch Part 3: Queries

    See part 1 and part 2 for an overview of our system and how we scale our indexing. Originally I was planning separate posts for global queries and related posts queries, but the material was hard to split into two posts, which contributed to my taking forever to write them. Two types of queries run…

  • Scaling Elasticsearch Part 2: Indexing

    In part 1 I gave an overview of our cluster configuration. In this part we’ll dig into: how our data is partitioned into indices to scale over time; optimizing bulk indexing; scaling real-time indexing; and how we manage indexing failures and downtime. The details of our document mappings are mostly irrelevant for our indexing scaling…

  • Scaling Elasticsearch Part 1: Overview

    We recently launched Related Posts across WordPress.com, so it’s time to pop the hood and take a look at what ended up in our engine. There’s a lot of good information spread across the web on how to use Elasticsearch, but I haven’t seen too many detailed discussions of what it looks like to scale…

  • Managing Elasticsearch Cluster Restart Time

    While building a fairly large index (8TB total for 500 million docs), I ran into some very long restart times for the cluster. That prompted me to start a discussion about long restart times. There’s some good discussion in that thread, and I wanted to write a post to summarize what we are doing to…

  • Introducing Related Posts

    Do you ever wonder what happens when your readers reach the end of your posts? What do they click on? Where do they go next? What if you’ve piqued a reader’s interest and left them wanting more, but don’t give them the option to do so? Today, we’re so happy to announce Related Posts on…

  • Three Principles for Multilingual Indexing in Elasticsearch

    Recently I’ve been working on how to build Elasticsearch indices for WordPress blogs in a way that will work across multiple languages. Elasticsearch has a lot of built-in support for different languages, but there are a number of configuration options to wade through and there are a few plugins that improve on the built…

  • Mapping WordPress Posts to Elasticsearch

    I thought I’d share the Elasticsearch type mapping I am using for WordPress posts. We’ve refined it over a number of iterations and it combines dynamic templates and multi_field mappings along with a number of more standard mappings. So this is probably a good general example of how to index real data from a traditional…

  • Building Word Clouds with Faceted Search

    Elasticsearch’s faceted results are a great way to analyze the contents of a set of documents. For over a year now, Polldaddy has used Elasticsearch to create reports for the most popular answers and words given to free text survey responses. For more details take a look at the feature announcement. However, running faceted search on such…

  • Elasticsearch: Five Things I was Doing Wrong

    Update: Also check out my series on scaling Elasticsearch. I’ve been working with Elasticsearch off and on for over a year, but recently I attended Elasticsearch.com’s training class (well worth the time and money) and discovered a few significant things that I was doing just plain wrong. Before using Elasticsearch I used Lucene directly, and…
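The first post’s insight that addition looks for ways to win while multiplication looks for ways to fail can be illustrated with a minimal sketch. It assumes each document carries two hypothetical signals normalized to [0, 1] (think text relevance and popularity); the document names and values are illustrative only and are not taken from the Wikimedia notes or from any production ranking.

```python
import math

def additive_score(signals):
    """Sum of signals: one strong signal can carry a document ("ways to win")."""
    return sum(signals)

def multiplicative_score(signals):
    """Product of signals: one weak signal drags the whole score down ("ways to fail")."""
    return math.prod(signals)

def rank(docs, combine):
    """Return document names ordered from highest to lowest combined score."""
    return sorted(docs, key=lambda name: combine(docs[name]), reverse=True)

# Hypothetical documents with two normalized signals each.
docs = {
    "balanced": [0.5, 0.5],           # decent on both signals
    "one_strong_signal": [1.0, 0.2],  # excellent on one, weak on the other
}

if __name__ == "__main__":
    print("additive:", rank(docs, additive_score))
    print("multiplicative:", rank(docs, multiplicative_score))
```

With addition, the document that excels on a single signal ranks first (1.2 vs 1.0); with multiplication, its weak second signal is treated as a failure and the balanced document wins (0.25 vs 0.2). In Elasticsearch, the `function_score` query exposes a similar choice through its `score_mode` and `boost_mode` parameters (for example `sum` versus `multiply`).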