We use Apache Solr to handle search queries. If you haven't used Solr before, the principle is that Solr maintains a index of all of our searchable data. We can then retrieve a list of records matching a search query by requesting a url from the Solr server. In its most basic form, the url looks something like

http://our.solr.server/solr/our-index/select?q=Find%20this

Solr then returns a list of all the records in the index that match the query. Pretty straight-forward, no?

This is just the start though. Solr offers a whole bunch of different options for fine-tuning a query, filtering result sets and ranking results that can all be passed as part of the URL. Just to get an idea, here are the parameters that can be passed when using the eDisMax parser. Many of these we vary dependent on the context of the search. As a simple example, our index contains results relevant to multiple websites so we often need to filter out only the results for the site the search was performed on. A quick glance shows that in our system we have over 30 different factors that can affect the component's url.

We have a class that handles sending these requests to the server. This class contained a build_url method which in turn contained a small army of conditional statements that checked against each possible critera, appending the necessary components to the URL. The code looked something like this

class SolrSearch
  attr_accessor :query, :options

  def initialize query, options={}
    @query = query
	@options = options
  end

  def build_url
    url = "http://our.solr.server/solr/our-index/select?q=#{query}"
    url += "&start=#{options[:start]}" if options[:start]
    # ... repeated for each option
    return url
  end
end

Clearly this is no good and it would be pretty irresponsible for us to keep compounding the problem by adding more and more options to this list. Fortunately there is a quick and simple way we can refactor this method into something simpler, easier to understand and easier to maintain.

The basic premise is that we will extract this method into a new class whose sole job is to build this url. There are four steps to the process.

1) Check the existing tests

If a class has got this complicated there is a good chance that an un-tested line has snuck in here or there. It will pay dividends later to have a quick check that the tests are sufficient to cover all the requirements. The better the tests are now, the less chance there is that we break something in our refactoring. It can be handy to use something like SimpleCov to check which parts of the current code aren't being tested. In this case I discovered that a handful of the conditions weren't being tested and so added in the missing tests.

2) Create the new class and move the existing method into the class

We now need to create our new class based on the existing method. This class will accept an instance of our original class as an initialization parameter and contain one other method which is the method we wish to extract.

class SolrSearch
  class UrlBuilder
    def initialize(search)
      @search = search
    end

    def build_url
      # ... we copy our existing build_url method and paste it here
    end

  private
    attr_reader :search

  end
end

We can then update our original class to use our new UrlBuilder

# in our original SolrSearch class
def build_url
  UrlBuilder.new(self).build_url
end

You may have noticed that I chose to create the new class 'inside' the existing SolrSearch class. Whether or not this is appropriate is a design decision you need to make on a case-by-case basis. If in doubt, keeping the new class namespaced inside the old one can be a sensible choice as it avoids un-intended interactions with other parts of the system and makes it clear that this only exists for the benefit of the SolrSearch class. If you wish to use it elsewhere later on, it can always be extracted at that point. Don't forget that you will need to require the file containing your new class from the original SolrSearch class.

If we now run our tests, we will see that we have a problem. The build_url method uses a couple of other methods defined on the search class that don't exist in the scope of our new class, namely query and options. We could fix this by adding some parameters to our new build_url method and passing in these values however I prefer to use delegation to instead call the methods on the existing search instance. This effectively defines query and options methods on our new class that pass the messages on to our search instance. In order to do this we use Ruby's forwardable module.

# SolrSearch::UrlBuilder

require 'forwardable'

class SolrSearch
  class UrlBuilder
    extend Forwardable

	def_delegators :search, :query, :options

	def initialize(search)
	  @search = search
	end

	def build_url
	  url = "http://our.solr.server/solr/our-index/select?q=#{query}"
      url += "&start=#{options[:start]}" if options[:start]
	  # etc etc ...
      return url
    end

  private
    attr_accessor :search

  end
end

We should now be able to run our tests and have them pass! Before we get too excited though, we haven't really improved anything yet. You could even argue that we have made things more complicated. The important thing we have done is to create a place we can use to break up the functionality of our build_url method.

At this point I would add a commit. This means that if we have problems later in the refactoring, we can easily see a diff of what we have changed since we moved the working method. This will also greatly benefit those reviewing this code as it becomes much easier to follow.

3) Refactor the method into smaller simpler components

Now that have our new class working we can break up the build_url method into its different components. Typically these means extracting the steps into different methods.

# SolrSearch::UrlBuilder

require 'forwardable'

class SolrSearch
  class UrlBuilder
	extend Forwardable

	def_delegators :search, :query, :options

	def initialize(search)
	  @search = search
    end

    def build_url
      append_query
      apply_start_option
      # ... etc etc
      return url
    end

  private
    attr_reader :search
    attr_writer :url

    def url
      @url ||= "http://our.solr.server/solr/our-index/select?"
    end

    def append_query
      url += "q=#{query}"
    end

    def apply_start_option
      url += "&start=#{options[:start]}" if options[:start]
    end
  end
end

Abstracting each condition like this means that by looking at our build_url method we can see exactly what components are being used to make our URL without having to understand the details of the component and vice versa. The benefit of this becomes even clearer when dealing more complex components that are dependent on multiple options. It is also quite likely that other repeated code can be identified and abstracted at this point.

It is worth noting that I have added all of these new methods as private methods. This is because these methods are only for use internally by the UrlBuilder. Since the sole purpose of this class is to build the URL, the only public method necessary should be the method to return the URL, in this case build_url.

Throughout this process, we can keep running our tests and they should always pass.

4) Move the existing tests

We are still only implicitly testing this class through the tests for our original SolrSearch class so we now need to add tests for our new class. Exactly what needs to be done here depends on how your existing tests are written and the framework you are using. From part 1, we know that we have all the tests that we need so this should just be a case of extracting them into a new test suite and adjusting them so that the subject of the tests is the method on our new class rather than the original class.

It may be tempting to add additional tests for the new methods we have added to our SolrSearch::UrlBuilder class but I would advise against this. We deliberately added these as private methods. Therefore in order to test them directly we either need to make them public or we need to circumvent the private scope using Ruby's send method. The real point here though is that we don't really care about what these methods do or how they do it so long as our build_url method carries on working correctly. If we add tests for each of these private methods and then we choose to refactor the internals of this class at a later date, we need to go through and update all of these tests, even though as far as the rest of our application is concerned, the behaviour of our class has not changed as the behaviour of its only public method has not changed.

We do need to add a new test to our original class to check that when we call the original build_url method, our new class is instantiated and its build_url method called. There are a few ways of doing this. The way that I prefer is to add an option to the initializer to inject the type of builder class we want to use. That we we can easily swap out our builder class for a test double that we can set expectations on. In addition, if we ever need to support another type of builder we can easily inject it when we instantiate our SolrSearch class.

To do this, we first need to update the initializer of our SolrSearch class to store the urlbuilder option in an instance variable. We also want to add a private reader to reference it by and update our `buildurl` method to use this.

#in our original SolrSearch class

class SolrSearch
  attr_accessor :query, :options

  def initialize query, options={}
    @query = query
    @url_builder = options.fetch(:url_builder) { UrlBuilder }
	@options = options
  end

  def build_url
  	url_builder.build_url
  end

  # ... existing code

private
  attr_reader :url_builder

  # ... existing code

end

Then in our tests we can do the following

should 'build the url' do
  builder_stub = stub
  subject = SolrSearch.new('some-query', url_builder: builder_stub)
  builder_stub.expects(:build_url)

  subject.build_url
end

One important thing that this doesn't test is that we have correctly defaulted our url_builder to the UrlBuilder class. However this is the kind of thing we can test for in a higher level integration test since any test that involves performing a search will require this to be wired up correctly.

5) A final once over to polish

We should now have a working UrlBuilder class with its own complete set of tests. There may be one or two final bits and bobs to take care of in order to finish things off.

Firstly, we handled the problem of the missing query and options methods by delegating them back to our original class. In this case, this is fine as these things are still the responsibility of the SolrSearch class. Sometimes we find that this is not the case however. Our particular example method was doing pretty much the same job over and over again. Another place we may want to do this kind of refactoring is where we have lots of different jobs that have all been lumped together in a single large method. In this case, it is quite possible that our method, and hence our new class, has its fingers in lots of different pies and so our new class becomes more tightly coupled to the existing class than it needs to be. Perhaps we even need to break up our new class into several smaller classes. Another possibility is that methods we are delegating back to our original class are actually only used by our new class and so it would be appropriate to move those methods to our new class. Its worth having a look and double-checking that these messages are still being handled or delegated appropriately based on the responsibilities of those classes.

Another common thing that we may want to change is the name of our new class or method. The names we chose were dervied from the name of the original method we were extracting. So in order to use our new class, we use

SolrSearch::UrlBuilder.new(search).build_url

I decided that it would be a little nicer to instead write

SolrSearch::UrlBuilder.new(search).to_s

Since the only place we are using this class at the moment is in our original class (plus the tests for the original and new classes), it's pretty easy to change this.

It is important to note that this kind of refactoring is not necessarily a complete solution as it doesn't address any problems with the original design. Effectively it is just a way of breaking complicated functionality down into smaller components and so sometimes can just be a small step in a larger refactoring. Their main benefits are that they relatively quick and easy to do, particularly if you have a good set of tests to begin with, and can help bring clarity to an existing design making it easier to address any other issues that remain.