Testing page caching with SpiderTest
The website I’m currently working on is similar to an online brochure. The data on the site changes hourly, but every user sees the same thing. As a result, we decided to use page caching to dramatically speed up the site. Once a page is visited, the html is written out to disk and all subsequent requests are served by apache. The setup of this approach is detailed elsewhere (for example, Rails Envy: Ruby on Rails Caching Tutorial).
Setting up caching was easy, but we wanted to ensure that we did not make any mistakes. All pages should be cached, since any miss will result in a much higher load on our rails application. I’ve written previously about our internationalization test (Improved internationalization test) which spiders the site (using SpiderTest) looking for non localized text. Since we were already visiting every page, it seemed like a good place to add a check for page caching. Spidering the site again would make our test suite too long.
The consume page method is called for every page that is visited by the spider. We expanded the implementation by adding a call to assert_page_is_cached:
def consume_page(html, url)
html.gsub!("http://www.example.com", "")
unless redirect?(html) || asset?(url)
assert_page_has_been_moved_to_language_file(html, url)
assert_page_is_cached(url)
super
end
def assert_page_is_cached(url)
path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
page = path.ends_with?(".html") ? path : "#{path}.html"
assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})"
end
We also had to add new lines to our setup to turn on caching (since it is normally off in test mode):
def setup
FileUtils.rm_rf ActionController::Base.page_cache_directory
ActionController::Base.perform_caching = true
end
Since we run this test as its own suite, the test is totally isolated from other tests. There is no need to implement a teardown.
The full test, including the internationalization testing from before looks like:
require 'hpricot'
class InternationalizationText < ActionController::IntegrationTest
include Caboose::SpiderIntegrator
def setup
FileUtils.rm_rf ActionController::Base.page_cache_directory
ActionController::Base.perform_caching = true
blank_out_localization
blank_out_html_escape
end
def blank_out_localization
GLoc::InstanceMethods.class_eval do
alias :old_l :l
def l(symbol, *arguments)
""
end
end
end
def blank_out_html_escape
ERB::Util.class_eval do
alias :old_html_escape :html_escape
def html_escape(s)
""
end
alias :h :html_escape
end
end
def test_all_text_has_been_moved_to_language_file
get '/'
assert_response :success
spider(@response.body, '/', :verbose => true)
end
def consume_page(html, url)
html.gsub!("http://www.example.com", "")
unless redirect?(html) || asset?(url)
assert_page_has_been_moved_to_language_file(html, url)
assert_page_is_cached(url)
super
end
def redirect?(html)
html.include?("<body>You are being")
end
def asset?(url)
File.exist?(File.expand_path("#{RAILS_ROOT}/public/#{url}"))
end
def assert_page_has_been_moved_to_language_file(page_text, url)
doc = Hpricot.parse(page_text)
assert_does_not_contain_words doc.at("title").inner_text, url
body = doc.at('body')
(body.search("//script[@type='text/javascript']")).remove
assert_does_not_contain_words(body.inner_text, url)
assert_attribute_does_not_contain_words body, url, 'title'
assert_attribute_does_not_contain_words body, url, 'alt'
end
def assert_attribute_does_not_contain_words body, url, attribute
body.search("//*[@#{attribute}]") do |element|
assert_does_not_contain_words element.get_attribute(attribute), url
end
end
def assert_does_not_contain_words text, url
match = text.match(/[A-Za-z]([A-Za-z]| )*/)
fail "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
end
def assert_page_is_cached(url)
path = ActionController::Routing.normalize_paths([ActionController::Base.page_cache_directory + url])[0]
page = path.ends_with?(".html") ? path : "#{path}.html"
assert_true File.exists?(page), "Page NOT cached: #{url} (looking in #{page})"
end
end