Testing internationalization language files
Update (8/29/08): I wrote about a bunch of improvements to the test described below in Improved internationalization test.
On my current project, we want to localize our site for different languages. One way to do this is to use a plugin like GLoc: we externalize all of our labels and messages into language files (en.yml, fr.yml, etc.) and then switch languages based on the user's preference.
We started a site-wide effort to pull all of the content out into en.yml. However, we had two big questions:
- How would we know that we had not missed any content?
- How would we ensure, going forward, that nobody mistakenly puts text directly in a view (or helper)?
We came up with the following solution: create a new language file called blank.yml that maps every key to an empty string. When we switch to this language, all of the labels render as blank, so we can crawl the site and flag any text that is still visible.
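The idea can be illustrated without Rails or GLoc. The sketch below uses a hypothetical, in-memory stand-in for language files (`LANGUAGES`, `translate`, and `render_login_form` are invented for illustration and are not GLoc's API). Under the blank language every localized label renders as an empty string, so any word that survives in the output must have been hard-coded.

```ruby
# Hypothetical stand-in for language files; this is not GLoc's API.
LANGUAGES = {
  en:    { login: "Login", username: "Please enter your username:" },
  blank: { login: "",      username: "" }
}

# Look up a label in the given language (raises KeyError if missing).
def translate(language, key)
  LANGUAGES.fetch(language).fetch(key)
end

# A toy "view": two labels come from the language file, but
# "Forgot password?" was typed straight into the template by mistake.
def render_login_form(language)
  "#{translate(language, :username)} [ ] " \
    "#{translate(language, :login)} | Forgot password?"
end

# Every run of word characters left in the rendered text.
def leftover_words(text)
  text.scan(/\w+/)
end

p leftover_words(render_login_form(:en))
# => ["Please", "enter", "your", "username", "Login", "Forgot", "password"]
p leftover_words(render_login_form(:blank))
# => ["Forgot", "password"]
```

With English active, real labels and the mistake are indistinguishable; with the blank language, only the hard-coded words remain.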
The language files are YAML and look like this:
help: Help
login: Login
username: "Please enter your username:"
password: "Please enter your password:"
Our rake task reads this file and generates a new file with the same keys and blank values:
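require 'yaml'
desc "Generate lang/blank.yml with the keys of en.yml and empty values"
task :'translate:blank' do
  en = YAML.load_file("#{RAILS_ROOT}/lang/en.yml")
  File.open("#{RAILS_ROOT}/lang/blank.yml", "w") do |blank|
    # Write an explicit empty string; a bare "key: " would load as nil.
    en.each_key { |key| blank.puts("#{key}: \"\"") }
  end
end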
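For anyone who wants to try the generation step outside Rake, here is a plain-Ruby sketch using only the standard library. `blank_translations` is a hypothetical helper name. It also shows why the written value matters: with Psych (the YAML engine in current Ruby), a bare `help: ` loads as nil, while a dumped empty string loads back as "".

```ruby
require 'yaml'

# Hypothetical helper: map every key of a parsed language file to "".
def blank_translations(translations)
  translations.each_with_object({}) { |(key, _value), blank| blank[key] = "" }
end

en_yaml = <<~EN
  help: Help
  login: Login
  username: "Please enter your username:"
  password: "Please enter your password:"
EN

en = YAML.safe_load(en_yaml)
blank = blank_translations(en)

# YAML.dump quotes empty strings (help: ''), so they round-trip as "".
blank_yaml = YAML.dump(blank)
puts blank_yaml

p YAML.safe_load(blank_yaml)["help"]  # => ""
p YAML.safe_load("help: ")["help"]    # => nil, a bare key has no value
```

Dumping a Hash with `YAML.dump` avoids hand-quoting values, which is the more robust choice if keys ever need escaping.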
To crawl the site, we decided to use SpiderTest, which parses the HTML of each page and follows every link it finds. SpiderTest runs as an integration test, so a simple test looks like:
class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator
  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/')
  end
end
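Conceptually, a spider is a breadth-first walk over the link graph with a visited set. The sketch below is not SpiderTest's implementation: it crawls a hypothetical in-memory site (`SITE`, a hash of URL to HTML), extracts `href` values with a deliberately naive regular expression, calls a block once per page, and collects links that lead nowhere.

```ruby
require 'set'

# Hypothetical in-memory site: URL => HTML body.
SITE = {
  "/"      => '<a href="/help">Help</a> <a href="/login">Login</a>',
  "/help"  => '<a href="/">Home</a>',
  "/login" => '<a href="/help">Help</a> <a href="/missing">Oops</a>'
}

# Naive link extraction; a real crawler should use an HTML parser.
def links_in(html)
  html.scan(/href="([^"]+)"/).flatten
end

# Breadth-first crawl from start_url, yielding (html, url) once per page.
# Returns the URLs that were linked to but do not exist in SITE.
def crawl(start_url)
  queue = [start_url]
  visited = Set.new
  broken = []
  until queue.empty?
    url = queue.shift
    next if visited.include?(url)
    visited << url
    html = SITE[url]
    if html.nil?
      broken << url
      next
    end
    yield html, url
    links_in(html).each { |link| queue << link unless visited.include?(link) }
  end
  broken
end

seen = []
broken = crawl("/") { |_html, url| seen << url }
p seen    # => ["/", "/help", "/login"]
p broken  # => ["/missing"]
```

The block plays the role of the per-page callback; SpiderTest reports a failed link by raising rather than collecting it, as described later in this post.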
We want to do more than check that each link works, so we need a callback for each page. SpiderTest does not provide an official hook, but its source shows that it calls a consume_page method for every page it visits. Because SpiderIntegrator is included as a module in our test class, we can override that method, run our own checks, and then call super:
def consume_page(html, url)
  assert_page_has_been_moved_to_language_file(html, url)
  super
end
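The override works because Ruby places an included module above the class in the method lookup chain: the class's own method wins, and `super` reaches the module's version. A minimal sketch, with a hypothetical `MiniSpider` module standing in for `Caboose::SpiderIntegrator`:

```ruby
# Hypothetical stand-in for a library module such as Caboose::SpiderIntegrator.
module MiniSpider
  def consume_page(html, url)
    log << "spider consumed #{url}"
  end

  def log
    @log ||= []
  end
end

class PageCheckingTest
  include MiniSpider

  # Runs our check first, then hands control back to the module.
  # Bare `super` forwards the same arguments (html, url).
  def consume_page(html, url)
    log << "checked #{url}"
    super
  end
end

test = PageCheckingTest.new
test.consume_page("<html></html>", "/")
p test.log                            # => ["checked /", "spider consumed /"]
p PageCheckingTest.ancestors.take(2)  # => [PageCheckingTest, MiniSpider]
```

Calling `super` matters: skipping it would stop the module's own page handling, which in SpiderTest's case is presumably what queues further links.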
The assert_page_has_been_moved_to_language_file method uses Hpricot to parse the HTML and check for text. Many of our pages are dynamic and contain text that should not be localized; for example, we do not want to localize the address of a building. We decided to mark such text with a CSS class, nonlocalizable:
<span class="nonlocalizable"><%= @building.address %></span>
And our assert_page_has_been_moved_to_language_file method looks like:
def assert_page_has_been_moved_to_language_file(page_text, url)
  doc = Hpricot.parse(page_text)
  title = doc.at('title')
  assert_does_not_contain_words(title.inner_text, url) if title
  body = doc.at('body')
  # Text marked nonlocalizable (addresses, names, etc.) is allowed.
  body.search('.nonlocalizable').remove
  # inner_text would include JavaScript source, which users never see.
  body.search('script').remove
  assert_does_not_contain_words(body.inner_text, url)
end
def assert_does_not_contain_words(text, url)
  match = text.match(/\w+/)
  flunk "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
end
We check both the title and the body using the \w+ regular expression, which matches any run of word characters (letters, digits, and underscores). We have to strip out the script nodes first because inner_text includes JavaScript source, which is never shown to the user.
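To show the filtering logic without Hpricot, here is a stdlib-only approximation. It is a sketch, not a substitute for a real parser: it uses regular expressions and assumes nonlocalizable elements are not nested inside other elements with the same tag name. `leftover_text` and `first_hardcoded_word` are hypothetical names. Because the sketch works on raw markup, it also strips character entities such as `&nbsp;`, which would otherwise match `\w+` as the word `nbsp`.

```ruby
# Regex-based approximation of the Hpricot filtering; not a real HTML parser.
def leftover_text(html)
  text = html.dup
  # Drop <script> elements entirely, whatever their type attribute.
  text.gsub!(%r{<script\b[^>]*>.*?</script>}mi, " ")
  # Drop elements whose class list includes nonlocalizable
  # (assumes no nesting of the same tag inside them).
  text.gsub!(%r{<(\w+)\b[^>]*\bclass="[^"]*\bnonlocalizable\b[^"]*"[^>]*>.*?</\1>}mi, " ")
  # Remove the remaining tags and character entities such as &nbsp;.
  text.gsub!(/<[^>]*>/, " ")
  text.gsub!(/&#?\w+;/, " ")
  text
end

# The first run of word characters left on the page, or nil if none.
def first_hardcoded_word(html)
  leftover_text(html)[/\w+/]
end

page = <<~HTML
  <body>
    <h1></h1>
    <span class="nonlocalizable">221B Baker Street</span>&nbsp;
    <script type="text/javascript">var greeting = "hello";</script>
    <p>Contact us</p>
  </body>
HTML

p first_hardcoded_word(page)                              # => "Contact"
p first_hardcoded_word(page.sub("<p>Contact us</p>", "")) # => nil
```

The address and the script survive in the markup but not in the checked text, so only the hard-coded "Contact us" is reported.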
Here is the final test, including a setup that switches the language to blank and a teardown that switches it back to English:
require "#{File.dirname(__FILE__)}/../test_helper"
require 'hpricot'
class InternationalizationTest < ActionController::IntegrationTest
  include Caboose::SpiderIntegrator
  def setup
    GLoc.set_language :blank
  end
  def teardown
    GLoc.set_language :en
  end
  def test_all_text_has_been_moved_to_language_file
    get '/'
    assert_response :success
    spider(@response.body, '/', :verbose => true)
  end
  # SpiderIntegrator calls this for every page it crawls.
  def consume_page(html, url)
    assert_page_has_been_moved_to_language_file(html, url)
    super
  end
  def assert_page_has_been_moved_to_language_file(page_text, url)
    doc = Hpricot.parse(page_text)
    title = doc.at('title')
    assert_does_not_contain_words(title.inner_text, url) if title
    body = doc.at('body')
    # Text marked nonlocalizable is allowed; scripts are never shown to users.
    body.search('.nonlocalizable').remove
    body.search('script').remove
    assert_does_not_contain_words(body.inner_text, url)
  end
  def assert_does_not_contain_words(text, url)
    match = text.match(/\w+/)
    flunk "Found text that was not in the language file: #{match[0].inspect} on #{url}" if match
  end
end
A nice side effect of this test is that it also checks for broken links, since SpiderTest raises an exception if it cannot follow a link. If you do not want SpiderTest to crawl certain pages, you can skip them by passing the :ignore_urls option to the spider method:
spider(@response.body, '/', :verbose => true, :ignore_urls => [%r{/busted/.*}])
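How SpiderTest applies :ignore_urls internally is not covered here. Assuming each pattern is tested against a URL with `=~`, note that the example pattern is unanchored, so it matches `/busted/` anywhere in the URL, not only at the start. The sketch below (with a hypothetical `ignored?` helper) compares it with an anchored pattern.

```ruby
# Hypothetical helper: true if any pattern matches somewhere in the URL.
def ignored?(url, patterns)
  patterns.any? { |pattern| pattern =~ url }
end

unanchored = [%r{/busted/.*}]
anchored   = [%r{\A/busted/}]

urls = ["/busted/page", "/admin/busted/page", "/help"]

p urls.select { |url| ignored?(url, unanchored) }  # => ["/busted/page", "/admin/busted/page"]
p urls.select { |url| ignored?(url, anchored) }    # => ["/busted/page"]
```

If only a top-level section should be skipped, anchoring with `\A` keeps the pattern from silently excluding similarly named pages elsewhere.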