Dread writes: "Internationalization + Localization = Native Language Support
i18n (Internationalization)
Internationalization is the process of making a program ready to handle
languages other than English. In most programming languages in use today,
the language itself (keywords, function names) is written in English, and
PHP is no different. The input and output need not be English, but some
preparation is required to accept alternative character encodings. There
are hundreds of encoding standards; ISO 8859-1 is one of the most widely
used in *-nuke language files today.
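Why does the encoding need preparation at all? In the 8-bit ISO 8859 family, the same byte value stands for a different character in each part, so a program cannot interpret input until it knows which encoding was used. A minimal Python sketch (Python stands in for PHP here purely for illustration) decodes one byte under three ISO 8859 parts:

```python
# The same single byte, 0xE9, means different things in different 8-bit encodings.
RAW = b"\xe9"

def decode_as(raw: bytes, encoding: str) -> str:
    """Interpret raw bytes using an explicitly named character encoding."""
    return raw.decode(encoding)

for enc in ("iso-8859-1", "iso-8859-5", "iso-8859-7"):
    # Latin-1 gives e-acute, ISO 8859-5 a Cyrillic letter, ISO 8859-7 a Greek one.
    print(enc, decode_as(RAW, enc))
```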
Why UTF-8?
Before UTF-8 emerged, Linux users all over the world had to use various
different language-specific extensions of ASCII. Most popular were ISO
8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5
/ CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc. This
made the exchange of files difficult and application software had to worry
about various small differences between these encodings. Support for these
encodings was usually incomplete, untested, and unsatisfactory, because
the application developers rarely used all these encodings themselves.
Because of these difficulties, major Linux distributors and application
developers have now started to phase out these older legacy encodings
in favor of UTF-8. UTF-8 support has improved dramatically over the last
few years and ever more people now use UTF-8 on a daily basis in
* text files (source code, HTML files, email messages, etc.)
* file names
* standard input and standard output, pipes
* environment variables
* cut and paste selection buffers
* telnet, modem, and serial port connections to terminal emulators
and in any other places where byte sequences used to be interpreted in
ASCII. [1]
"Unicode is well on the way to replace ASCII, ISO 8859 and EUC at
all levels. It allows you to handle not only text in practically
any script and language used on this planet, it also provides
you with a comprehensive set of mathematical and technical symbols to
simplify scientific information exchange." - Markus Kuhn [1]
Furthermore, he states: "With the UTF-8 encoding, Unicode can be used
in a convenient and backwards compatible way in environments that, like
Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode
is used under Unix, Linux, and similar systems. It is now time to make
sure that you are well familiar with it and that your software supports
UTF-8 smoothly."
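Part of what makes UTF-8 fit ASCII-oriented environments is that every byte of a multi-byte UTF-8 sequence is 0x80 or above, so an ASCII byte such as "/" or a newline never appears inside a non-ASCII character. A small Python sketch, using a made-up path, splits UTF-8 bytes on "/" the way ASCII-only code would and still gets intact components:

```python
# A hypothetical path mixing Latin, Cyrillic and CJK characters.
PATH = "/home/jürgen/документы/文書.txt"

def split_path_bytes(path: str) -> list[str]:
    """Split on the ASCII '/' byte without decoding first, as byte-oriented tools do."""
    raw = path.encode("utf-8")
    parts = raw.split(b"/")
    # Each piece is still valid UTF-8 because no multi-byte sequence contains 0x2F.
    return [p.decode("utf-8") for p in parts if p]

def all_multibyte_bytes_high(text: str) -> bool:
    """True if every byte of every non-ASCII character's UTF-8 form is >= 0x80."""
    return all(b >= 0x80 for ch in text if ord(ch) >= 0x80 for b in ch.encode("utf-8"))

print(split_path_bytes(PATH))
```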
To use UTF-8 on the web, tell the browser which encoding the page uses:
through the HTTP Content-Type header, a meta tag, and/or the form's
accept-charset attribute. All browsers since Navigator 4 accept UTF-8,
although their default font size for UTF-8 fonts can be a bit big.
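A minimal Python sketch of the header and meta-tag declarations; the function name and page layout are illustrative only and not part of Nuke (in PHP the header would be sent with header("Content-Type: text/html; charset=UTF-8")). Both declarations only help if the bytes sent really are UTF-8:

```python
from html import escape

def utf8_page(title: str, body: str) -> tuple[dict[str, str], bytes]:
    """Build response headers and a UTF-8 encoded HTML page that declares its charset twice."""
    headers = {"Content-Type": "text/html; charset=UTF-8"}  # HTTP header declaration
    html = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'  # in-page declaration
        f"<title>{escape(title)}</title></head>"
        f"<body>{escape(body)}</body></html>"
    )
    # Encode with the same charset the headers announce.
    return headers, html.encode("utf-8")
```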
Two great attributes of UTF-8 are that you can use multiple languages on
one web page with the same encoding, and that it is backwards compatible
with ASCII.
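Both properties can be checked in a few lines of Python (a sketch, not Nuke code): one string mixing several scripts survives a UTF-8 round trip, plain ASCII text produces identical bytes in ASCII and UTF-8, and non-ASCII characters take two or more bytes:

```python
# One string mixing several scripts, stored in a single encoding.
MIXED = "Hello, Grüß Gott, Здравствуйте, Γειά σου, こんにちは"

def roundtrip(text: str) -> str:
    """Encode to UTF-8 bytes and decode back; nothing is lost."""
    return text.encode("utf-8").decode("utf-8")

def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# Pure ASCII text produces exactly the same bytes in ASCII and UTF-8.
assert "Hello".encode("ascii") == "Hello".encode("utf-8")
print([(ch, utf8_len(ch)) for ch in "Aé文"])
```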
l10n (Localization)
Localization is the process of translating output into individual language
files. Nuke scores a big plus here because multilingual support is built in:
all you need is the right files in the right place, and Nuke selects the
language via the user's cookie.
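A sketch of cookie-based language selection in Python. The cookie name `lang` and the `language/lang-<name>.php` file layout are assumptions modeled on typical Nuke installs, not details taken from this post. Checking the cookie value against a whitelist of installed languages keeps a forged cookie from pointing at an arbitrary file:

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical set of installed language files.
AVAILABLE = {"english", "french", "german", "russian"}
DEFAULT = "english"

def language_file(cookie_header: Optional[str]) -> str:
    """Return the language file to include, based on a (hypothetical) 'lang' cookie."""
    lang = DEFAULT
    if cookie_header:
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        # Only accept names of languages that are actually installed.
        if "lang" in cookie and cookie["lang"].value in AVAILABLE:
            lang = cookie["lang"].value
    return f"language/lang-{lang}.php"
```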
Numbers and currency must also be taken into account. Luckily, PHP has a
built-in function for this: setlocale().
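PHP's setlocale() wraps the C library function of the same name, and Python's locale module exposes the same mechanism, so a Python sketch shows the idea. Locale names such as "de_DE.UTF-8" depend on what is installed on the server; only "C" is guaranteed. Because the setting is process-wide (the PHP manual warns about this for threaded servers too), the sketch restores the previous value:

```python
import locale

def format_number(value: float, loc: str) -> str:
    """Format a number with the grouping and decimal point of the given locale."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, loc)  # raises locale.Error if loc is not installed
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore the process-wide setting
```

In the "C" locale there is no thousands separator, so 1234.5 becomes "1234.50"; a German locale, where installed, should use a period for grouping and a comma as the decimal point. Currency formatting works the same way through LC_MONETARY.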
We are also accepting input to help update the language files at
http://coppermine.findhere.org/modules.php?name=CPGlang, where a web-based
form helps get all language variables translated into their native tongues.
Many of the admin variables and others are still in English. The form even
helps translate itself. I will also be making this module part of the NLI
package so others can use it for their modules.
Native Language Support
When a program is properly internationalized (i18n) and localized (l10n),
it is said to provide Native Language Support (NLS). This should include
language detection, so the user sees the page in the right language the
first time; my NLI release will include this function.
This includes:
- Locale-specific and culture-specific conventions (dates, numbers, etc.)
- Messages in native languages
- Input method support
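The language detection mentioned above is usually based on the browser's Accept-Language header. A Python sketch, not the NLI implementation: it parses the header's quality values and maps the best acceptable tag, or its primary subtag, to an installed language. The tag-to-language table is hypothetical:

```python
from typing import Optional

def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, best first."""
    result = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        result.append((tag.strip().lower(), q))
    # sorted() is stable, so equally weighted tags keep their header order.
    return sorted(result, key=lambda pair: pair[1], reverse=True)

def detect_language(header: Optional[str], available: dict[str, str], default: str) -> str:
    """Map the best acceptable tag (or its primary subtag) to an installed language."""
    for tag, q in parse_accept_language(header or ""):
        if q <= 0:
            continue
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            if candidate in available:
                return available[candidate]
    return default

# Hypothetical mapping from language tags to installed language files.
TAGS = {"en": "english", "fr": "french", "de": "german"}
```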
So what does this mean to the programmer? Just add a couple of tags
and you're on your way? Not exactly. There are a couple of problems here.
Form submission
"Note. The "get" method restricts form
data set values to ASCII characters. Only the "post" method
(with enctype="multipart/form-data") is specified to cover the
entire [ISO10646] character set." [3]
GET METHOD
When we pass parameters in the URL in Nuke, we are using the GET method.
Any variable that comes from the user or from the language files needs to
be passed through rawurlencode(). This is not just for moving to UTF-8:
currently 17 Nuke languages use an 8-bit encoding, so your modules should
handle ANY character a user or language file may use in a parameter,
uploaded file name, article title, or wherever else these may appear.
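In Python, urllib.parse.quote() with safe="" behaves much like PHP's rawurlencode(): only the RFC 3986 unreserved characters stay literal, and a space becomes %20 rather than +. This sketch (the parameter names are hypothetical) also shows why the encoding matters: the same title yields different bytes from a UTF-8 page than from an 8-bit language file:

```python
from urllib.parse import quote

def build_url(base: str, params: dict[str, str], encoding: str = "utf-8") -> str:
    """Percent-encode every key and value, similar to PHP's rawurlencode()."""
    pairs = (
        # safe="" makes quote() encode "/" too; only A-Z a-z 0-9 - _ . ~ stay literal.
        f"{quote(k, safe='', encoding=encoding)}={quote(v, safe='', encoding=encoding)}"
        for k, v in params.items()
    )
    return base + "?" + "&".join(pairs)
```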
FORMS WITH GET METHOD
ASCII characters ONLY
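Following the rule above, a module could choose the form method from the values it expects to send. A tiny Python sketch with a hypothetical helper name:

```python
def form_method_for(values: list[str]) -> str:
    """Pick 'get' only when every value is plain ASCII; otherwise fall back to 'post'."""
    # str.isascii() (Python 3.7+) is True for the empty string as well.
    return "get" if all(v.isascii() for v in values) else "post"
```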
POST METHOD
To use UTF-8 or any encoding other than ISO 8859-1, you need the proper
form attributes, including enctype="multipart/form-data" and
accept-charset="UTF-8". [2]
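A Python sketch that renders such a form; the helper name and the action URL in the usage note are hypothetical. Attribute values are HTML-escaped, and the page containing the form should itself be served as UTF-8:

```python
from html import escape

def utf8_post_form(action: str, fields: dict[str, str]) -> str:
    """Render a form that asks the browser to submit its data as UTF-8."""
    inputs = "".join(
        f'<input type="text" name="{escape(name)}" value="{escape(value)}">'
        for name, value in fields.items()
    )
    return (
        f'<form action="{escape(action)}" method="post" '
        'enctype="multipart/form-data" accept-charset="UTF-8">'
        f'{inputs}<input type="submit"></form>'
    )
```

For example, utf8_post_form("modules.php?name=News&file=submit", {"subject": "Größe"}) escapes the "&" in the action URL and leaves the UTF-8 value intact.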
STRINGS
Character encodings that work with PHP:
ISO-8859-*, EUC-JP, UTF-8
Character encodings that do NOT work with PHP:
JIS, SJIS
Character encodings that do not work with PHP can be converted with
mbstring's HTTP input/output conversion feature. mbstring is an extension
module and may not be enabled in every configuration, and its functions
are considered experimental. [4]
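A commonly cited reason these two cause trouble: in Shift_JIS the second byte of a two-byte character can fall in the ASCII range (表 is 0x95 0x5C, and 0x5C is the backslash), and JIS (ISO-2022-JP) switches character sets with escape sequences made of ASCII bytes, so byte-oriented string code can misread either one. EUC-JP and UTF-8 use only bytes of 0x80 and above in multi-byte characters. This Python sketch checks for that property and converts legacy bytes to UTF-8; it uses Python's codecs as a stand-in and is not mbstring's API:

```python
def has_ascii_lookalike_bytes(text: str, encoding: str) -> bool:
    """True if any non-ASCII character encodes to a byte in the ASCII range (< 0x80)."""
    for ch in text:
        if ord(ch) >= 0x80 and any(b < 0x80 for b in ch.encode(encoding)):
            return True
    return False

def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert bytes from a legacy encoding to UTF-8 (a rough analogue of mbstring's conversion)."""
    return raw.decode(source_encoding).encode("utf-8")

print("表".encode("shift_jis"))  # the second byte is 0x5C, the ASCII backslash
```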
[1] UTF-8 and Unicode FAQ for Unix/Linux
[2] Form submission
[3] W3C: Form content types
[4] Multi-Byte String Functions"
Posted on Wednesday, January 28 @ 08:33:21 CET by Zhen-Xjell