New locale system


13 replies to this topic

#1 Jack

Jack

    Project Founder

  • Administrators
  • 673 posts
  • Location: Australia

Posted 11 February 2012 - 12:40 PM

The localisation system in Traq has always been simple. It's probably time to expand it so that it can properly handle all languages.

I'm currently looking at something like this:
Translation = Matched "x_something" to "n something"

Check if "$lang" contains key "_regex:n something"
   If so, regex it with the "$lang['_regex:n something']['regex']" value
   Translation = Build the new translation with "$lang['_regex:n something']['string']" and matches from the regular expression match.

Return Translation

But there may be a better way. Feel free to post some suggestions.

#2 wr2

wr2

    Newbie

  • Members
  • 4 posts

Posted 18 February 2012 - 12:26 PM

Hello!

Maybe it would be even more convenient to separate 'string values' and 'numerical values'?

For numerical values it should be possible to define several localized text alternatives. The index of the alternative depends on the value itself and could be calculated by a custom function or something similar.

Let me show you a typical example for English and Russian (to avoid Cyrillic letters here I'll transliterate the real words):

ENGLISH:


$lang['x_days'][0] = '{1} days';

// This example defines a single English variant, so the index is always 0.
function get_locale_numeric_index($val)
{
    return 0;
}


RUSSIAN:


$lang['x_days'][0] = '{1} dney';
$lang['x_days'][1] = '{1} den';
$lang['x_days'][2] = '{1} dnya';

// Returns the index of the variant to use for $val.
function get_locale_numeric_index($val)
{
    $retval = 0;
    $tmp = $val % 100;
    // 10-20 always take variant 0; otherwise the last digit decides.
    if (($tmp < 10) || ($tmp > 20)) {
        $tmp = $val % 10;
        if ($tmp == 1) {
            $retval = 1;
        } elseif (($tmp >= 2) && ($tmp <= 4)) {
            $retval = 2;
        }
    }
    return $retval;
}


Planned usage in this case looks like:


do_something ($lang['x_days'][get_locale_numeric_index ($value)]);


Same function should be used for other numeric values.
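A minimal sketch of how the pieces above could be wired together, assuming the Russian table and index function from this post. The helper name format_count() is hypothetical (it is not part of Traq); it picks the variant and fills in the {1} placeholder.

```php
<?php
// Transliterated Russian variants, as in the example above.
$lang = array();
$lang['x_days'][0] = '{1} dney';
$lang['x_days'][1] = '{1} den';
$lang['x_days'][2] = '{1} dnya';

// Index function for Russian, same rule as in the example above.
function get_locale_numeric_index($val)
{
    $retval = 0;
    $tmp = $val % 100;
    if (($tmp < 10) || ($tmp > 20)) {
        $tmp = $val % 10;
        if ($tmp == 1) {
            $retval = 1;
        } elseif (($tmp >= 2) && ($tmp <= 4)) {
            $retval = 2;
        }
    }
    return $retval;
}

// Hypothetical helper: choose the variant for $value and substitute {1}.
function format_count(array $variants, $value)
{
    $template = $variants[get_locale_numeric_index($value)];
    return str_replace('{1}', (string) $value, $template);
}

echo format_count($lang['x_days'], 1), "\n";   // 1 den
echo format_count($lang['x_days'], 3), "\n";   // 3 dnya
echo format_count($lang['x_days'], 11), "\n";  // 11 dney
echo format_count($lang['x_days'], 21), "\n";  // 21 den
```

Keeping the substitution in one helper means callers never have to index into $lang themselves.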

Also, don't forget about date/time representation. The PHP manual at http://php.net/manua...nction.date.php says:

"To format dates in other languages, you should use the setlocale() and strftime() functions instead of date()."



#3 Jack

Posted 18 February 2012 - 01:41 PM

I just gave this a few minutes of thought and came up with a way to let any language work simply through the l() function while still being extremely powerful.


/**
 * Traq Locale System
 * Copyright © 2009-2012 Traq.io
 * Released under the GNU GPL v3 license.
 */

// The classic L() function: forwards its arguments to the active locale class.
function l()
{
    return call_user_func_array(array('Locale_' . settings('locale'), 'translate'), func_get_args());
}

// The English locale class.
class Locale_en extends Locale {
    public static function translate()
    {
        $string = func_get_arg(0);
        $vars = array_slice(func_get_args(), 1);

        // do whatever is needed to translate
        // the string for this language.
    }

    public static function date($format, $timestamp)
    {
        // because this is the English locale, just use date()
        return date($format, $timestamp);
    }
}


#4 wr2

Posted 18 February 2012 - 02:49 PM

A very powerful approach. Nevertheless, I think it's a good idea to also provide some additional information in the string itself (to simplify the translate function).

For example, in MediaWiki they do it like this:

'category-article-count-limited' => 'The following {{PLURAL:$1|page is|$1 pages are}} in the current category.',


It's useful even for English. For Russian, for example, I'll provide 3 alternatives (not 2), and the code for the translate function you mentioned will be slightly different, of course. But I'll still be able to take your function (for English) as a base for mine (for Russian), so the basic string-parsing code will be the same.
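A rough sketch of what such shared parsing code could look like. This is not MediaWiki's or Traq's actual implementation: expand_plural() and the $english callback are hypothetical names, and the language-specific part is reduced to a callback that returns the index of the form to use.

```php
<?php
// Hypothetical helper: expands {{PLURAL:$n|form0|form1|...}} using a
// per-language callback that maps a number to a form index.
function expand_plural($string, array $args, callable $index)
{
    $expanded = preg_replace_callback(
        '/\{\{PLURAL:\$(\d+)\|([^{}]*)\}\}/',
        function ($m) use ($args, $index) {
            $value = $args[(int) $m[1] - 1];   // $1 is the first argument
            $forms = explode('|', $m[2]);
            // Fall back to the last form if the language asks for more
            // forms than the string provides.
            $i = min($index($value), count($forms) - 1);
            return $forms[$i];
        },
        $string
    );

    // Substitute the remaining $1, $2, ... placeholders.
    return preg_replace_callback('/\$(\d+)/', function ($m) use ($args) {
        return (string) $args[(int) $m[1] - 1];
    }, $expanded);
}

// English: first form for exactly 1, second form otherwise.
$english = function ($n) {
    return $n == 1 ? 0 : 1;
};

$msg = 'The following {{PLURAL:$1|page is|$1 pages are}} in the current category.';
echo expand_plural($msg, array(1), $english), "\n";
echo expand_plural($msg, array(7), $english), "\n";
```

Only the callback changes per language; a Russian version would return 0, 1 or 2 and the string would carry three alternatives.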


#5 arturo182

arturo182

    Advanced Member

  • Contributor
  • 151 posts

Posted 18 February 2012 - 09:09 PM

In Polish there are even more possibilities, for example, "x [tickets] opened" is:
0 otwartych
1 otwarty
2-4 otwarte
5-21 otwartych
22-24 otwarte
25-31 otwartych
32-34 otwarte
and so on...

#6 wr2

Posted 19 February 2012 - 12:26 AM

arturo182 wrote:

    In Polish there are even more possibilities, for example, "x [tickets] opened" is:
    0 otwartych
    1 otwarty
    2-4 otwarte
    5-21 otwartych
    22-24 otwarte
    25-31 otwartych
    32-34 otwarte
    and so on...


In Russian the situation is nearly the same (the only exception is that 21, 31, 41, ... use the same word as 1). So there are just 3 possible words, and I meant that all of them should be written in the string itself (more or less like the MediaWiki example). Some function must parse the already localized string, find the numeric variable parts in it, and choose and return the appropriate variant depending on the value of the numeric variable.
If numeric variables are marked (in some way) in the original strings and there is a standard syntax for listing several variants, it's easy enough to translate them (even for people without any programming skills).
Also, it's not too hard to write a unified function that chooses the appropriate variant for a particular language. It only has to be done once per language.

#7 Jack

Posted 19 February 2012 - 12:35 AM

Okay, I've started the new locale system. It works pretty much like the example I posted above, but with wr2's suggestion, kind of.

Let's say we want to allow "there are 4 posts" to be localised, but with the ability to have it say "1 post".

The string would be:

$locale = array(
    'there_are_x_posts' => 'there {plural:$1, {is $1 post|are $1 posts}}'
);


And then to display it:

l('there_are_x_posts', 4);


However, if a language has a different word(?) for 3 posts, it can be done like so:


'there_are_x_posts' => 'there {plural:$1, {$1 word for one post|$1 word for two posts|$1 word for three posts}}'


You can add as many as you want in the {x|y|z} "replacements" array. But note: these are fetched by their index using the "count" number. So if there are 10 replacements, a count of 5 will use the fifth, 8 will use the eighth and so on.

But what if there are only 10 replacements and the number is 400? In that case the last replacement is used.

If you need this to work differently, you can create your own string compiler in the language class. That is why I made each language a class with its own methods, so there is no limit to what can be done.
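For illustration, here is a minimal sketch of the lookup rule described above: the count n selects the n-th replacement, and anything past the end falls back to the last one. It is not the actual Traq compiler. The function name compile_plural() is hypothetical, the plural block is assumed to be closed with a matching "}}", and treating counts below 1 as the first replacement is my own assumption.

```php
<?php
// Hypothetical compiler for the {plural:$1, {a|b|c}} syntax described above.
function compile_plural($string, array $vars)
{
    $compiled = preg_replace_callback(
        '/\{plural:\$(\d+),\s*\{([^{}]*)\}\}/',
        function ($m) use ($vars) {
            $count = (int) $vars[(int) $m[1] - 1];
            $replacements = explode('|', $m[2]);
            // Count n picks the n-th replacement; larger counts use the
            // last one. Counts below 1 use the first (an assumption).
            $i = max(1, min($count, count($replacements))) - 1;
            return $replacements[$i];
        },
        $string
    );

    // Substitute the remaining $1, $2, ... placeholders.
    return preg_replace_callback('/\$(\d+)/', function ($m) use ($vars) {
        return (string) $vars[(int) $m[1] - 1];
    }, $compiled);
}

$locale = array(
    'there_are_x_posts' => 'there {plural:$1, {is $1 post|are $1 posts}}',
);

echo compile_plural($locale['there_are_x_posts'], array(1)), "\n";   // there is 1 post
echo compile_plural($locale['there_are_x_posts'], array(4)), "\n";   // there are 4 posts
echo compile_plural($locale['there_are_x_posts'], array(400)), "\n"; // there are 400 posts
```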

#8 wr2

Posted 19 February 2012 - 08:53 AM

By the way, do you intend to include localization files in future releases? I could try to maintain the Russian one so that a localized version works "out of the box".

#9 arturo182

Posted 19 February 2012 - 10:21 AM

I'm checking every commit for localization changes just so I can keep the Polish translation up to date so it can be added when Jack decides to include localizations in the release.

#10 Jack

Posted 20 February 2012 - 08:37 AM

wr2 wrote:

    By the way, do you intend to include localization files in future releases? I could try to maintain the Russian one so that a localized version works "out of the box".


Not sure yet.

I've been thinking of adding a language installer to the Traq Install script that will let the person installing choose what language to download once they click the final install button.

This, of course, will rely on an API that the new Traq site will have for downloading language files.

(The plan for the new Traq site is to allow people to submit plugins and languages to a "Traq Extension Database" type thing.)

#11 arturo182

Posted 24 February 2012 - 05:55 PM

I was curious and checked how Qt does translations. Here's what I found out: two pieces of information are used to determine which form to use.

1. For every language there's an array of the types of strings that are used for describing numerals.
For Polish it's three (singular, paucal, plural); for English it's two (singular, plural).

2. For every language there's a rule function that determines which type to use based on the number.
For Polish it's:
$type = ($n == 1 ? 0 : ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20) ? 1 : 2));
so for n = 1 I know to use the "0" type, which is singular.
For English it's:
$type = ($n == 1 ? 0 : 1);
so it's plural for all n except n = 1.

I'm not saying you have to use this method, I'm just posting it as a fun fact ;)
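The two pieces of information above can be sketched in PHP roughly as follows. The names $forms, $rules and plural_form() are illustrative only, not Qt's or Traq's. The Polish forms are the ones for "opened" from earlier in the thread; the English words are placeholders.

```php
<?php
// 1. Word forms per language; the number of forms differs by language.
$forms = array(
    'en' => array('ticket', 'tickets'),
    'pl' => array('otwarty', 'otwarte', 'otwartych'),
);

// 2. Rule functions returning an index into the forms array.
$rules = array(
    'en' => function ($n) {
        return $n == 1 ? 0 : 1;
    },
    'pl' => function ($n) {
        if ($n == 1) {
            return 0; // singular
        }
        $paucal = $n % 10 >= 2 && $n % 10 <= 4
            && ($n % 100 < 10 || $n % 100 >= 20);
        return $paucal ? 1 : 2; // paucal or plural
    },
);

// Hypothetical helper combining both tables.
function plural_form(array $forms, array $rules, $language, $n)
{
    $index = $rules[$language]($n);
    return $n . ' ' . $forms[$language][$index];
}

foreach (array(0, 1, 3, 12, 22, 25) as $n) {
    echo plural_form($forms, $rules, 'pl', $n), "\n";
}
// 0 otwartych, 1 otwarty, 3 otwarte, 12 otwartych, 22 otwarte, 25 otwartych
```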

#12 Jack

Posted 24 February 2012 - 10:15 PM

Perhaps I could make some enhancements to the locale system so that the plural replacements work like, for example, [0] = singular, [1] = paucal, [2] = plural.

I'll probably do this when I rework how the locale classes are loaded.

I'm thinking of moving them to a non-static form, so Locale::load('enus'); will return an instance of Locale_enUS instead of having the L() function try to figure out what class to use.
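A minimal sketch of what the non-static loading could look like, under a few assumptions. The classes are named LocaleSketch and LocaleSketch_enUS here (hypothetical names) so the example does not clash with the Locale class that PHP's intl extension defines, and pluralIndex() is an invented method, not part of Traq.

```php
<?php
// Hypothetical base class standing in for Locale.
abstract class LocaleSketch
{
    // Return an instance of the class for the given locale name.
    public static function load($name)
    {
        $class = 'LocaleSketch_' . $name;
        if (!class_exists($class)) {
            throw new InvalidArgumentException("Unknown locale: $name");
        }
        return new $class();
    }

    // Each language decides which plural form index to use for $n.
    abstract public function pluralIndex($n);
}

class LocaleSketch_enUS extends LocaleSketch
{
    public function pluralIndex($n)
    {
        return $n == 1 ? 0 : 1;
    }
}

// PHP class names are case-insensitive, so 'enus' resolves to ..._enUS.
$locale = LocaleSketch::load('enus');
echo get_class($locale), "\n";
echo $locale->pluralIndex(1), ' ', $locale->pluralIndex(5), "\n";
```

With an instance in hand, a global l() function would only need to forward its arguments to that object.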

#13 arturo182

Posted 24 February 2012 - 11:11 PM

If you're going to rework it, remember that the number of forms should not be fixed, as different languages have different sets of forms. For example:
Welsh: Nullar, Singular, Dual, Sexal, Plural
Arabic: Nullar, Singular, Dual, Minority Plural, Plural, Plural (100-102, ...)
Japanese: Universal

The l() function should probably take a second, optional argument n, which is passed to the locale, and the locale will know what to do with it.

#14 Jack

Posted 26 February 2012 - 06:55 PM

Okay, the plural system for locale strings is finished. Information about it is available on the Localization wiki page on GitHub.

