Internationalization and metadata on directoryΒΆ

Here, we will build a web site with a more complex structure and content in two languages: English and French. We will start from scratch with a new empty directory:

$ mkdir site2
$ cd site2
$ export SITE_ROOT=`pwd`

Again, the SITE_ROOT environment variable is not used by Soho but will be useful to keep track of directory changes below.

The web site will have two sections, one for each language:

$ mkdir src
$ cd src
$ mkdir en
$ mkdir fr

Once we have that, we may proceed and add content with two source files in each section. First, src/en/index.html:

This is the home page.

Then src/en/contact.html:

You may indeed contact me.

And a home page in French in src/fr/index.html:

Ceci est une page d'accueil.

Finally, the contact page in src/fr/contact.html:

Vous pouvez me contacter.

Until now, this is very similar to what we have done in the previous section of the tutorial. We have translated content in source files. We will now add some automatically translated content in the template (templates/layout.pt):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
<!DOCTYPE html>
<html>
<head>
  <title tal:content="md['title']"/>
  <link rel="stylesheet" type="text/css" href="/css/style.css"/>
</head>
<body i18n:domain="tutorial">
<div class="content" tal:content="structure body"/>
<div class="footer">
  <a href="contact.html" i18n:translate="">Contact me!</a>
</div>
</body>
</html>

This is the same template as in the previous section except for the highlighted lines. As you can see, we set the i18n domain at line 7. We also define a link whose text will be automatically translated at line 10.

The template uses a CSS file, so you may want to copy the CSS file of the previous section to a new assets/css directory.

Now we must set the translation somewhere. The default text (“Contact me”) is in English, so we only need to define the translation in French. This must be done in a file that follows the GNU gettext format. We could write this file from scratch (the format is simple enough), but we will use Python tools to help us instead:

  • Babel, which provides tools that will help us generate the translation files;
  • Lingua, which provides message extractors. In other words, Lingua detects which messages have to be translated.

Warning

As of this writing, Babel and Lingua support of Python 3 is unknown. See the note below for an alternative.

$ easy_install Babel lingua

(As usual, you may use pip install instead of easy_install.)

For Babel to work, we can use a dummy setup.py file in the root directory of our web site (at the same level as src). Such files are usually written to describe and create Python packages. Here, we will only use the portion that is needed by Babel.

from setuptools import setup


# Of course, this is not a real Python package. We use 'setup()' only
# to register message extractors.
setup(packages=(),
      message_extractors={'.': (
            ('src/**meta.py', 'lingua_python', None),
            ('templates/**.pt', 'lingua_xml', None),
            )}
      )

These few lines instruct Babel to use Lingua to extract messages from templates (*.pt files in templates) and metadata files (*.meta.py). As you can see, this file is very simple and can be reused for any Soho web site.

We also need to tell Babel where we want the translation files to be stored. We can do so in a setup.cfg file.

[init_catalog]
domain = tutorial
input_file = locale/tutorial.pot
output_dir = locale

[extract_messages]
output_file = locale/tutorial.pot
width = 80

[update_catalog]
domain = tutorial
input_file = locale/tutorial.pot
output_dir = locale
previous = true

[compile_catalog]
domain = tutorial
directory = locale
statistics = true

This file tells Babel about the i18n domain that we want to handle (tutorial), the path where files will be created (a locale directory, which is the standard name and the default in Soho) and a few other settings that you may read about in Babel documentation. We will create this directory and get ready to generate our translation files:

$ cd $SITE_ROOT
$ mkdir locale
$ ls
assets    locale    setup.cfg    setup.py    sohoconf.py    src    templates    www

First, we extract messages from our template with the extract_messages command:

$ python setup.py extract_messages
running extract_messages
extracting messages from templates/layout.pt
writing PO template file to locale/tutorial.pot
$ ls locale
tutorial.pot

This tutorial.pot file is a template. It contains a preamble and a list of messages to be translated:

$ tail -n 4 locale/tutorial.pot
#: templates/layout.pt:10
msgid "Contact me!"
msgstr ""

As indicated above, our message is already in English, so we need only a French translation that we will create with the init_catalog command:

$ python setup.py init_catalog -l fr
running init_catalog
creating catalog 'locale/fr/LC_MESSAGES/tutorial.po' based on 'locale/tutorial.pot'

A “catalog” has been created in the indicated path and it will look very much like the tutorial.pot template above. The preamble will be slightly different, but the last lines should be the same:

$ tail -n 4 locale/fr/LC_MESSAGES/tutorial.po
#: templates/layout.pt:10
msgid "Contact me!"
msgstr ""

You may edit this new tutorial.po file and add a translation (see the highlighted line below):

# French translations for UNKNOWN.
# Copyright (C) 2012 ORGANIZATION
# This file is distributed under the same license as the UNKNOWN project.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2012.
#
msgid ""
msgstr ""
"Project-Id-Version: UNKNOWN 0.0.0\n"
"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
"POT-Creation-Date: 2012-06-17 15:05+0200\n"
"PO-Revision-Date: 2012-06-17 15:09+0200\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: fr <LL@li.org>\n"
"Plural-Forms: nplurals=2; plural=(n > 1)\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 0.9.6\n"

#: templates/layout.pt:10
msgid "Contact me!"
msgstr "Contactez-moi !"

Once the tutorial.po file is ready, you have to compile it (into a tutorial.mo file) with the compile_catalog command:

$ python setup.py compile_catalog
running compile_catalog
1 of 1 messages (100%) translated in 'locale/fr/LC_MESSAGES/tutorial.po'
compiling catalog 'locale/fr/LC_MESSAGES/tutorial.po' to 'locale/fr/LC_MESSAGES/tutorial.mo'

If you change your template and add new messages or change existing messages, you will need to perform a similar set of commands:

$ # change template and add or modify messages to translate
$ python setup.py extract_messages
$ python setup.py update_catalog
$ # edit your updated '.po' file
$ python setup.py compile_catalog

Note that you need to run the init_catalog command only once. Afterwards, you will have to run the update_catalog command.

Note

As indicated above, you do not have to use Babel and Lingua to generate the translation files. You may very well create the .po by hand and generate the compiled .mo file with msgfmt. However, having tools (be it Babel and Lingua or others) extract messages from your templates and metadata files is very valuable and can save a lot of time.

This is all good, we have translated our message. But we still need to indicate to Soho that the en section has content in English and the fr section has content in French, otherwise the template will not know which language to translate the “Contact me!” message to. The language should be set in a metadata file, under the locale binding. Since all files of the en directory are indeed in English, we can set the metadata on the directory itself (as a file named en/.meta.py) and it will be inherited by all its files:

locale = 'en'

As well, we create a similar file in fr/.meta.py:

locale = 'fr'

The template still expects a title key in the metadata, so we need to indicate this for each source file. For example, in en/index.html.meta.py:

title = 'My home page'

Let’s check that we have all expected files:

$ ls -R
assets    locale    setup.cfg    setup.py    sohoconf.py    src    templates    www

./assets:
css

./assets/css:
style.css

./locale:
fr    tutorial.pot

./locale/fr:
LC_MESSAGES

./locale/fr/LC_MESSAGES:
tutorial.mo    tutorial.po

./src:
en    fr

./src/en:
contact.html    contact.html.meta.py    index.html    index.html.meta.py

./src/fr:
contact.html    contact.html.meta.py    index.html    index.html.meta.py

./templates:
layout.pt

All right, we can generate our web site:

$ soho-build
<date> - INFO - Copying assets...
<date> - INFO - Copying "/path/to/site2/assets/css/style.css" to "/path/to/site2/www/css/style.css"
<date> - INFO - Building HTML files...
<date> - INFO - Processing "/path/to/site2/src/en/contact.html" (writing in "/path/to/site2/www/en/contact.html").
<date> - INFO - Processing "/path/to/site2/src/en/index.html" (writing in "/path/to/site2/www/en/index.html").
<date> - INFO - Processing "/path/to/site2/src/fr/contact.html" (writing in "/path/to/site2/www/fr/contact.html").
<date> - INFO - Processing "/path/to/site2/src/fr/index.html" (writing in "/path/to/site2/www/fr/index.html").
<date> - INFO - Generating Sitemap...
<date> - INFO - Done.

We can see that the “Contact me” message appears in the proper language in each section of the site:

$ head -n 12 www/en/index.html | tail -n 3
<div class="footer">
 <a href="contact.html">Contact me!</a>
</div>
$ head -n 12 www/fr/index.html | tail -n 3
<div class="footer">
  <a href="contact.html">Contactez-moi !</a>
</div>

Note

If you open the HTML pages in your browser (as in file:///path/to/site2/www/en/index.html), you may find that the CSS is not loaded. This is because we indicated an absolute path (/css/style.css) instead of a relative path to cope with the multi-level structure of the site. You would need an HTTP server to have the CSS correctly loaded. Fortunately, Python comes with a very handy simple HTTP server that you can run with the following command:

$ cd $SITE_ROOT
$ cd www
$ python -m SimpleHTTPServer 8000

You can then see your site at http://localhost:8000/en/index.html.