Oracle8 Reference
Release 8.0

A58242-01


4
National Language Support

This chapter describes features that enable Oracle Server applications to operate with multiple languages using conventions specified by the application user. The following topics are included:

What Does National Language Support Provide?

Oracle Server National Language Support allows users to interact with the database in their native language. It also allows applications to run in different language environments.

To achieve these goals, NLS provides

The remainder of this chapter provides background on these issues and describes the mechanisms NLS provides to handle them.

Oracle Server NLS Architecture

The NLS architecture has two components: language-independent functions and language-dependent data. The former provides generic language-oriented features; the latter provides data required to operate these features for a specific language.

Because the language-dependent data is separate from the code, the operation of NLS functions is governed by data supplied at runtime. New languages can be added and language-specific application characteristics can be altered without requiring any code changes. This architecture also enables language-dependent features to be specified for each session.

Background Information

This section provides background information on the issues involved in multi-lingual applications, and shows how they are resolved by the National Language Support (NLS) features of the Oracle Server. The remaining sections of this chapter discuss the specific parameters that control NLS operation.

Character Encoding Schemes

To understand how Oracle Server deals with character data, it is important to understand the general features of character representation on computers. The appearance of a character on a terminal depends on the convention for character representation used by that terminal. When you press a character key on the keyboard, the terminal generates a numeric code specified by the character encoding scheme in use on that device. When the terminal receives a number representing a character, it displays the character shape specified by that encoding scheme.

Encoding schemes define the representation of alphabetic characters, numerals, and punctuation characters, together with codes that control terminal display and communication. A character encoding scheme (also known as a character set or code page) specifies numbers corresponding to each character that the terminal can display. Examples are 7-bit ASCII, EBCDIC Code Page 500, and Japanese Extended UNIX Code.
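As a small illustration of how one character maps to different numbers under different encoding schemes, the following Python sketch encodes a Latin letter and a Japanese character. It assumes that Python's ascii, cp500, and euc_jp codecs are reasonable stand-ins for 7-bit ASCII, EBCDIC Code Page 500, and Japanese Extended UNIX Code. These are Python codec names, not Oracle character set names, and the helper code_points is hypothetical.

```python
# Assumption: Python codec names stand in for the encoding schemes named above.
def code_points(text, codec):
    """Return the encoded bytes of text as a hexadecimal string."""
    return text.encode(codec).hex()

latin_a_ascii = code_points("A", "ascii")        # 7-bit ASCII
latin_a_ebcdic = code_points("A", "cp500")       # EBCDIC Code Page 500
hiragana_a_euc = code_points("\u3042", "euc_jp")  # HIRAGANA LETTER A in EUC-JP
```

The same letter A has a different numeric code in ASCII and in EBCDIC, and the Japanese character needs two bytes.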

Many encoding schemes are used by hardware manufacturers to support different languages. All support the 26 letters of the Latin alphabet, A to Z. In general, single-byte encoding schemes are used for European and Middle Eastern languages and multi-byte encoding schemes for Asian languages.

Restrictions on Character Sets Used to Express Names and Text

Table 4-1 lists the restrictions on the character sets that can be used to express names and other text in Oracle.

Table 4-1 Restrictions on Character Sets Used to Express Names and Text

Name                                                              1-Byte Fixed   Varying Width   Multi-Byte Fixed Width
comments                                                          Yes            Yes             Yes
database link names                                               Yes            No              No
database names                                                    Yes            No              No
filenames (datafile, logfile, controlfile, initialization file)   Yes            No              No
instance names                                                    Yes            No              No
directory names                                                   Yes            No              No
keywords (see Note below)                                         Yes            No              No
recovery manager filenames                                        Yes            No              No
rollback segment names (see Note below)                           Yes            No              No
stored script names                                               Yes            Yes             No
tablespace names (see Note below)                                 Yes            Yes*            No

Note: Keywords can be expressed only in English (single byte).

Note: The ROLLBACK_SEGMENTS parameter does not support NLS.

Note: Recovery Manager doesn't support varying width character set tablespace names.

For a list of supported string formats and character sets, including LOB data (LOB, BLOB, CLOB, NCLOB), see Table 4-3.

Single-Byte 7-Bit Encoding Schemes

Single-byte 7-bit encoding schemes can define up to 128 characters, and normally support just one language. The only alphabetic characters defined in 7-bit ASCII are the 26 letters of the Latin alphabet, A to Z. Various other 7-bit schemes are used where certain characters (normally punctuation) in 7-bit ASCII are replaced with additional alphanumeric characters required for a specific language.

Single-Byte 8-Bit Encoding Schemes

Single-byte 8-bit encoding schemes can define up to 256 characters, and normally support a group of languages. For example, ISO 8859/1 supports many Western European languages.

Varying-Width Multi-Byte Encoding Schemes

Multi-byte encoding schemes are needed for Asian languages because these languages use thousands of characters. Some multi-byte encoding schemes use the value of the most significant bit to indicate if a byte represents a single-byte character or is the first or second byte of a double-byte character. In other schemes, control codes differentiate single-byte from double-byte characters. A shift-out code indicates that the following bytes are double-byte characters until a shift-in code is encountered.
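The most-significant-bit convention can be sketched in a few lines of Python. The function below is a simplified illustration, not Oracle's implementation: it assumes an EUC-style scheme in which every byte with the high bit set is the first byte of a double-byte character, and it ignores the longer sequences that real EUC-JP also permits. The name count_characters is hypothetical.

```python
def count_characters(data: bytes) -> int:
    """Count characters where a set high bit marks a double-byte character.

    Simplified rule (an assumption): high bit set means the byte starts a
    two-byte character; otherwise the byte is a single-byte character.
    """
    count = 0
    i = 0
    while i < len(data):
        i += 2 if data[i] & 0x80 else 1  # skip both bytes of a double-byte character
        count += 1
    return count

text = "Oracle\u30c7\u30fc\u30bf"   # "Oracle" followed by three katakana characters
encoded = text.encode("euc_jp")     # Python's EUC-JP codec as a stand-in
```

Here the byte length and the character length of the same string differ, which is why lengths in varying-width character sets need care.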

There are two general groups of encoding schemes, those based on 7-bit ASCII and those based on IBM EBCDIC. Within each group, all schemes normally use the same encoding for the 26 Latin characters (A to Z), but use different encoding for other characters used in languages other than English. ASCII and EBCDIC use different encodings, even for the Latin characters.

Fixed-Width Multi-Byte Encoding Schemes

A fixed-width multi-byte character set is a subset of a corresponding varying-width multi-byte character set. A fixed-width multi-byte character set contains all of the characters of a certain width which belong to a corresponding varying-width multi-byte character set. In fixed-width multi-byte character sets, no shift-out or shift-in codes are used even if they are needed in their corresponding varying-width multi-byte character set.

A fixed-width multi-byte character set can be used as the national character set, but cannot be used as a database character set. This is because database character sets must have either EBCDIC or 7-bit ASCII as a subset in order to represent identifiers and to hold SQL and PL/SQL source code.

Pattern Matching Characters for Fixed-width Multi-byte Character Sets

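The LIKE operator is used in character string comparisons with pattern matching. Its syntax requires the use of two special pattern matching characters: the underscore (_) and the percent sign (%). Table 4-2 lists the code point values to use for these characters, and for the pad character, in each fixed-width multi-byte character set.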

Table 4-2 Encoding for the Underscore, Percent Sign, and Pad Character

                         Use these Code Point Values
For this Character Set   underscore   percent sign   pad character (space)
JA16SJISFIXED            0x8151       0x8193         0x8140
JA16EUCFIXED             0xa1b2       0xa1f3         0xa1a1
JA16DBCSFIXED            0x426d       0x426c         0x4040
ZHT32TRISFIXED           0x8eb1a1df   0x8eb1a1a5     0x8eb1a1a0
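The first two rows of Table 4-2 can be cross-checked with a short Python sketch. It assumes that JA16SJISFIXED and JA16EUCFIXED use the same double-byte code points as Python's shift_jis and euc_jp codecs, which the table values suggest but this chapter does not state. The names FULLWIDTH and code_point_values are hypothetical.

```python
# Fullwidth (double-byte) counterparts of the LIKE pattern characters and the pad character.
FULLWIDTH = {
    "underscore": "\uff3f",     # FULLWIDTH LOW LINE
    "percent sign": "\uff05",   # FULLWIDTH PERCENT SIGN
    "pad character": "\u3000",  # IDEOGRAPHIC SPACE
}

def code_point_values(codec):
    """Return each character's encoded value as a hexadecimal string."""
    return {name: "0x" + ch.encode(codec).hex() for name, ch in FULLWIDTH.items()}

sjis_values = code_point_values("shift_jis")
euc_values = code_point_values("euc_jp")
```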

 

UTF8 Encoding

Oracle8 supports UTF-8, a varying-width multi-byte encoding of Unicode. The character set name is AL24UTFFSS for Unicode Version 1.1 and UTF8 for Unicode Version 2.0.

Choosing Character Sets for Database Character Set and National Character Set

This section describes the uses for the database character set and the national character set. It also presents some general guidelines for choosing character sets to represent the database character set and the national character set.

Uses for the Database Character Set

Oracle uses the database character set for these items:

Since SQL and PL/SQL keywords are expected to appear in either 7-bit ASCII or in EBCDIC, whichever is native to the host, the database character set must be a superset of one of these encodings. Often, an Oracle character set is chosen based on a corresponding character encoding supported by the platform operating system, thereby allowing for interoperability. However, having the database character set be equivalent to the platform operating system is not a requirement.

Uses for the National Character Set

Oracle uses the national character set for these items:

As described in "Varying-Width Multi-Byte Encoding Schemes" and "Fixed-Width Multi-Byte Encoding Schemes" on page 4-5, a varying-width multi-byte character set can be used as a national character set or a database character set. A fixed-width multi-byte character set, on the other hand, can be used for a national character set but not a database character set.

Guidelines for Choosing Character Sets

There does not have to be a close relationship between the character sets which you choose for your database character set and your national character set. However, these guidelines are suggested:

Consider your need to use national character literals

Consider your need for character literals to represent national character set values when choosing the database character set and national character set to use together on a platform. Only characters in the repertoire of both the database character set and the national character set can be meaningfully used in a national character literal. Thus, you might want to choose a national character set and a database character set that are closely related. For example, many Japanese customers will probably choose JA16EUC as their database character set and JA16EUCFIXED as their national character set.

You might find that there are characters in your chosen national character set that do not occur in your chosen database character set. You can create the needed characters with the CHR(n USING NCHAR_CS) function, where n represents the codepoint value of the character.

Consider performance

Some string operations will be faster if you choose a fixed-width character set for the national character set. A separate performance issue is space efficiency (and thus speed) when using smaller-width character sets. These issues potentially trade off against each other when the choice is between a varying-width and a fixed-width character set.

Be careful when mixing fixed-width and varying-width character sets

Because fixed-width multi-byte character sets are measured in characters but varying-width character sets are measured in bytes, be careful if you use a fixed-width multi-byte character set as your national character set on one platform and a varying-width character set on another platform.
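To see why the unit of measurement matters, the following Python sketch checks whether the same string fits a limit of 10 when the limit is counted in characters and when it is counted in bytes. It uses Python's euc_jp codec as a stand-in for a varying-width character set; the function fits and its parameters are hypothetical and do not model Oracle's storage rules.

```python
def fits(text, limit, unit, codec="euc_jp"):
    """Return True if text fits within limit, measured in the given unit.

    unit is "characters" (fixed-width semantics) or "bytes"
    (varying-width semantics, using the assumed stand-in codec).
    """
    size = len(text) if unit == "characters" else len(text.encode(codec))
    return size <= limit

# Ten hiragana characters, each two bytes in EUC-JP.
ten_kana = "\u3042\u3044\u3046\u3048\u304a\u304b\u304d\u304f\u3051\u3053"

fits_as_characters = fits(ten_kana, 10, "characters")
fits_as_bytes = fits(ten_kana, 10, "bytes")
```

A value that fits a 10-character limit can need twice that many bytes.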

For example, if you use %TYPE or a named type to declare an item on one platform using the declaration information of an item from the other platform, you might receive a constraint limit too small to support the data. For example, "NCHAR (10)" on the platform using the fixed-width multi-byte set will allocate enough space for 10 characters, but if %TYPE or use of a named type creates a correspondingly typed item on the other platform, it will allocate only 10 bytes. Usually, this is not enough for 10 characters. To be safe, do one of the following:

Consider the shortcomings of converting between character sets

Character set conversions can be silently lossy. Characters not in the destination character set will convert to "?" or some other designated codepoint. If you have distributed environments, consider using character sets with similar repertoires as your database character sets on your various platforms. Also consider using character sets with similar repertoires as your national character sets, to avoid either undesirable loss of data or results which vary depending on which platform evaluates a particular expression.
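The silent loss described above can be reproduced in a few lines. This is a sketch using Python's codecs rather than Oracle's conversion routines; the "?" replacement is Python's behavior with errors="replace" and is only analogous to the replacement character mentioned in the text. The function convert is hypothetical.

```python
def convert(text, target_codec):
    """Convert text to target_codec, substituting "?" for unmappable characters."""
    return text.encode(target_codec, errors="replace")

original = "M\u00fcller"                 # contains u with diaeresis
converted = convert(original, "ascii")   # 7-bit ASCII has no such character
round_trip = converted.decode("ascii")
```

No error is raised, and the original character cannot be recovered from the converted value.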

Customizing Character Sets

In some cases, you may wish to tailor a character set to meet specific user needs. In Oracle8, users can extend an existing encoded character set definition to suit their needs. User Defined Characters (UDC) are often used to encode special characters representing proper names, historical terms, and vendor-specific characters.

For further information, see "General Concepts for Customized Character Sets" on page 4-51.

Specifying Language-Dependent Behavior

This section discusses the parameters that specify language-dependent operation. You can set language-dependent behavior defaults for the server and set language dependent behavior for the client that overrides these defaults.

Most NLS parameters can be used in three ways:

The following NLS parameters can be initialization parameters, environment variables, and ALTER SESSION parameters:

The following parameters can be specified as initialization parameters and ALTER SESSION parameters, but not as environment variables:

For more information on these parameters, see "NLS Parameters" on page 4-18.

The following NLS parameters can be set only as environment variables:

For more information on these parameters, see "NLS Parameters" on page 4-18. For additional information on NLS_LANG, see Specifying Language-Dependent Behavior for a Session below.

Specifying Language-Dependent Behavior for a Session

This section discusses the NLS parameters that specify language-dependent operation of applications.

NLS_LANG

Note: If the NLS_LANG parameter is not set, then the values assigned to other NLS parameters are ignored.

The NLS_LANG environment variable has three components (language, territory, and charset) in the form:

NLS_LANG = language_territory.charset

Each component controls the operation of a subset of NLS features.

language

Specifies conventions such as the language used for Oracle messages, day names, and month names. Each supported language has a unique name; for example, American, French, or German. The language argument specifies default values for the territory and character set arguments, so either (or both) territory or charset can be omitted. If language is not specified, the value defaults to American. For a complete list of languages, see "Supported Languages" on page 4-39.

territory

Specifies conventions such as the default date format and decimal character used for numbers. Each supported territory has a unique name; for example, America, France, or Canada. If territory is not specified, the value defaults to America. For a complete list of territories, see "Supported Territories" on page 4-41.

charset

Specifies the character set used by the client application (normally that of the user's terminal). Each supported character set has a unique acronym, for example, US7ASCII, WE8ISO8859P1, WE8DEC, WE8EBCDIC500, or JA16EUC. Each language has a default character set associated with it. Default values for the languages available on your system are listed in your installation or user's guide. For a complete list of character sets, see "Storage Character Sets" on page 4-44.

Note: All components of the NLS_LANG definition are optional; any item left out will default. If you specify territory or charset, you must include the preceding delimiter [underscore ( _ ) for territory, period ( . ) for charset], otherwise the value will be parsed as a language name.

The three arguments of NLS_LANG can be specified in any combination, as in the following examples:

NLS_LANG = AMERICAN_AMERICA.US7ASCII

or

NLS_LANG = FRENCH_FRANCE.WE8ISO8859P1

or

NLS_LANG = FRENCH_CANADA.WE8DEC

or

NLS_LANG = JAPANESE_JAPAN.JA16EUC
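The delimiter rules above can be sketched as a small parser. This is an illustration of the language_territory.charset syntax only, not Oracle's parsing code: the function parse_nls_lang is hypothetical, it takes just the value to the right of the equals sign, and it reports omitted components as None instead of applying defaults.

```python
def parse_nls_lang(value):
    """Split a language_territory.charset value into its optional components.

    The period introduces charset and the underscore introduces territory;
    a component that is left out is returned as None.
    """
    rest, _, charset = value.partition(".")
    language, _, territory = rest.partition("_")
    return (language or None, territory or None, charset or None)

full = parse_nls_lang("FRENCH_CANADA.WE8DEC")
territory_only = parse_nls_lang("_FRANCE")
charset_only = parse_nls_lang(".WE8DEC")
language_only = parse_nls_lang("FRENCH")
```

Without the leading delimiter, a lone value is taken as the language name, as the note above describes.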

Specifying NLS_LANG

NLS_LANG is defined for each session by means of an environment variable or equivalent platform-specific mechanism. Different sessions connected to the same database can specify different values for NLS_LANG.

For example, on VMS you could specify the value of NLS_LANG by entering the following line at the VMS prompt:

$ DEFINE NLS_LANG FRENCH_FRANCE.WE8DEC

If you do not specify a value for NLS_LANG, the language-dependent behavior defaults to the language specified by the NLS_LANGUAGE database initialization parameter and the territory specified by the NLS_TERRITORY database initialization parameter. Additionally, if you do not specify a value for NLS_LANG, other NLS environment variables you may have set are ignored.

If you do specify a value for NLS_LANG, the values set in initialization parameters are ignored.

For more information on how to set NLS_LANG on your system, see your operating system-specific Oracle documentation.

Client/Server Architecture

NLS_LANG sets the NLS language and territory environment used by the database for both the server session and for the client application. Using the one parameter ensures that the language environments of both database and client application are automatically the same.

Because NLS_LANG is an environment variable, it is read by the client application at startup time. The client communicates the information defined in NLS_LANG to the server when it connects.

Overriding Language and Territory Specifications

The default values for language and territory can be overridden for a session by using the ALTER SESSION statement. For example:

ALTER SESSION SET NLS_LANGUAGE = FRENCH NLS_TERRITORY = FRANCE

This feature implicitly determines the language environment of the database for each session. An ALTER SESSION statement is automatically executed when a session connects to a database to set the values of the database parameters NLS_LANGUAGE and NLS_TERRITORY to those specified by the language and territory arguments of NLS_LANG. If NLS_LANG is not defined, no ALTER SESSION statement is executed.

When NLS_LANG is defined, the implicit ALTER SESSION is executed for all instances to which the session connects, for both direct and indirect connections. If the values of NLS parameters are changed explicitly with ALTER SESSION during a session, the changes are propagated to all instances to which that user session is connected.

The NLS_NCHAR parameter specifies the character set used by the client application for national character set data. For more information on this parameter, see "NLS_NCHAR" on page 4-31.

Specifying Language-Dependent Application Behavior

Language-Dependent Functions

Setting the values of various NLS parameters allows applications to function in a language-dependent manner. The language-dependent functions controlled by NLS include

Messages and Text

All messages and text should be in the same language. For example, when running a Developer 2000 application, messages and boilerplate text seen by the user originate from three sources:

The application is responsible for meeting the last requirement. NLS takes care of the other two.

Number Format

The database must know the number-formatting convention used in each session to interpret numeric strings correctly. For example, the database needs to know whether numbers are entered with a period or a comma as the decimal character (234.00 or 234,00). In the same vein, the application needs to be able to display numeric information in the format expected at the client site.
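The ambiguity can be made concrete with a short Python sketch. The function parse_numeric_string and its parameters are hypothetical; it only illustrates why the decimal character and group separator must be known before a numeric string is interpreted.

```python
from decimal import Decimal

def parse_numeric_string(text, decimal_char=".", group_char=","):
    """Interpret text using the given decimal character and group separator."""
    plain = text.replace(group_char, "").replace(decimal_char, ".")
    return Decimal(plain)

# The same input under two conventions.
as_comma_decimal = parse_numeric_string("234,00", decimal_char=",", group_char=".")
as_period_decimal = parse_numeric_string("234,00", decimal_char=".", group_char=",")
```

Read with a comma as the decimal character the string means 234; read with a period as the decimal character the comma is a group separator and the value is 23400.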

Date Format, Currency Symbols, and First Day of the Week

Similarly, date and currency information need to be interpreted properly when they are input to the server, and formatted in the expected manner when output to the user's terminal. These functions are all controlled by the NLS parameters. For more information, see "NLS Parameters" on page 4-18.

Sorting Character Data

Conventionally, when character data is sorted, the sort sequence is based on the numeric values of the characters defined by the character encoding scheme. Such a sort is called a binary sort. It produces reasonable results for the English alphabet because the ASCII and EBCDIC standards define the letters A to Z in ascending numeric value.

Note however, that in the ASCII standard all uppercase letters appear before any lowercase letters. In the EBCDIC standard, the opposite is true: all lowercase letters appear before any uppercase letters.

Binary Sorts

When characters used in other languages are present, a binary sort generally does not produce reasonable results. For example, an ascending ORDER BY query would return the character strings ABC, ABZ, BCD, ÄBC, in that sequence, when the Ä has a higher numeric value than B in the character encoding scheme.
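A binary sort is easy to reproduce outside the database. The Python sketch below sorts strings by their encoded byte values; it assumes that Python's latin_1, ascii, and cp500 codecs stand in for ISO 8859/1, 7-bit ASCII, and EBCDIC Code Page 500. The function binary_sort is hypothetical.

```python
def binary_sort(strings, codec):
    """Sort strings by the numeric values of their encoded bytes."""
    return sorted(strings, key=lambda s: s.encode(codec))

# A with diaeresis has a higher code than B in ISO 8859/1, so it sorts last.
words = ["\u00c4BC", "BCD", "ABZ", "ABC"]
latin1_order = binary_sort(words, "latin_1")

# Uppercase versus lowercase order depends on the encoding scheme.
ascii_case_order = binary_sort(["a", "A"], "ascii")
ebcdic_case_order = binary_sort(["a", "A"], "cp500")
```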

Linguistic Sorts

To produce a sort sequence that matches the alphabetic sequence of characters for a particular language, another sort technique must be used that sorts characters independently of their numeric values in the character encoding scheme. This technique is called a linguistic sort. A linguistic sort operates by replacing characters with other binary values that reflect the character's proper linguistic order so that a binary sort returns the desired result.
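The replacement technique can be sketched as a sort key that maps each character to a weight and then compares the weights. The ordering used here, in which A with diaeresis sorts immediately after A, is a hypothetical example and not one of Oracle's linguistic sort definitions; ORDER, WEIGHT, and linguistic_key are illustrative names.

```python
# Hypothetical collation order: A with diaeresis follows A directly.
ORDER = "A\u00c4BCDEFGHIJKLMNOPQRSTUVWXYZ"
WEIGHT = {ch: position for position, ch in enumerate(ORDER)}

def linguistic_key(s):
    """Replace each character with its weight so a plain comparison sorts linguistically."""
    return [WEIGHT[ch] for ch in s]

words = ["ABC", "ABZ", "BCD", "\u00c4BC"]
linguistic_order = sorted(words, key=linguistic_key)
```

With these weights, the string that begins with A with diaeresis moves from the end of the list to just before BCD.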

Oracle Server provides both sort mechanisms. Linguistic sort sequences are defined as part of language-dependent data. Each linguistic sort sequence has a unique name. NLS parameters define the sort mechanism for ORDER BY queries. A default value can be specified, and this value can be overridden for each session with the NLS_SORT parameter. A complete list of linguistic definitions is provided in "Linguistic Definitions" on page 4-52.

Warning: Linguistic sorting is not supported on multi-byte character sets. If the database character set is multi-byte, you get binary sorting, which makes the sort sequence dependent on the character set specification.

Linguistic Special Cases

Linguistic special cases are character sequences that need to be treated as a single character when sorting. Such special cases are handled automatically when using a linguistic sort. For example, one of the linguistic sort sequences for Spanish specifies that the double characters ch and ll are sorted as single characters appearing between c and d and between l and m respectively.
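The Spanish example can be sketched by treating ch and ll as single units when building the sort key. The alphabet below is a simplified, lowercase-only assumption made for illustration and is not Oracle's Spanish sort definition; ALPHABET, WEIGHT, and spanish_key are hypothetical names.

```python
# Simplified traditional Spanish order: ch follows c, ll follows l.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
            "m", "n", "\u00f1", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
WEIGHT = {unit: position for position, unit in enumerate(ALPHABET)}

def spanish_key(word):
    """Build a sort key in which the pairs ch and ll count as one character."""
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(WEIGHT[pair])
            i += 2
        else:
            key.append(WEIGHT[word[i]])
            i += 1
    return key

words = ["chico", "cuna", "dama", "cama", "llama", "luna"]
binary_order = sorted(words)
linguistic_order = sorted(words, key=spanish_key)
```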

Another example is the German language sharp s (ß). The linguistic sort sequence German can sort this character as the two characters SS, while the linguistic sort sequence Austrian sorts it as SZ.

Special cases like these are also handled when converting uppercase characters to lowercase, and vice versa. For example, in German the uppercase of the sharp s is the two characters SS. Such case-conversion issues are handled by the NLS_UPPER, NLS_LOWER, and NLS_INITCAP functions, according to the conventions established by the linguistic sort sequence. (The standard functions UPPER, LOWER, and INITCAP do not handle these special cases.)
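As an analogy only, Python's string methods show the same distinction between a case conversion that expands the sharp s and one that maps characters one to one. The sketch does not call or model NLS_UPPER or UPPER; the function upper_one_to_one is hypothetical.

```python
def upper_one_to_one(s):
    """Uppercase each character only when its uppercase form is a single character."""
    return "".join(c.upper() if len(c.upper()) == 1 else c for c in s)

word = "stra\u00dfe"                     # contains the sharp s
expanded = word.upper()                  # sharp s becomes the two characters SS
one_to_one = upper_one_to_one(word)      # sharp s is left unchanged
```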

Specifying Default Language-Dependent Behavior

This section describes NLS_LANGUAGE and NLS_TERRITORY, the database initialization parameters that specify the default language-dependent behavior for a session.

NLS_LANGUAGE

NLS_LANGUAGE specifies the default conventions for the following session characteristics:

The value specified for NLS_LANGUAGE in the initialization file is the default for all sessions in that instance.

For more information on which language conventions are supported, see your operating system-specific Oracle documentation.

For example, to specify the default session language as French, the parameter should be set as follows:

NLS_LANGUAGE = FRENCH

In this case, the server message

ORA-00942: table or view does not exist

will appear as

ORA-00942: table ou vue inexistante

Messages used by the server are stored in binary-format files that are placed in the ORA_RDBMS directory, or the equivalent.

Multiple versions of these files can exist, one for each supported language, using the filename convention

<product_id><language_id>.MSB

For example, the file containing the server messages in French is called ORAF.MSB, "F" being the language abbreviation for French.

Messages are stored in these files in one specific character set, depending on the particular machine and operating system. If this is different from the database character set, message text is automatically converted to the database character set. If necessary, it will be further converted to the client character set if it is different from the database character set. Hence, messages will be displayed correctly at the user's terminal, subject to the limitations of character set conversion.

The default value of NLS_LANGUAGE may be operating system specific. You can alter the NLS_LANGUAGE parameter by changing the value in the initialization file and then restarting the instance. For more information on NLS_LANGUAGE as an initialization parameter, see "NLS_LANGUAGE" on page 1-81.

For more information on the default value, see your operating system-specific Oracle documentation.

NLS_TERRITORY

NLS_TERRITORY specifies the conventions for the following default date and numeric formatting characteristics:

The value specified for NLS_TERRITORY in the initialization file is the default for the instance. For example, to specify the default as France, the parameter should be set as follows:

NLS_TERRITORY = FRANCE

In this case, numbers would be formatted using a comma as the decimal character.

The default value of NLS_TERRITORY can be operating system specific.

You can alter the NLS_TERRITORY parameter by changing the value in the initialization file and then restarting the instance. For more information on NLS_TERRITORY as an initialization parameter, see "NLS_TERRITORY" on page 1-83.

For more information on the default value and to see which territory conventions are supported on your system, see your operating system-specific Oracle documentation.

Runtime Loadable NLS Data

Data Loading

Language-dependent data (NLSDATA) is loaded into memory at runtime; this data determines the behavior of an application in a given language environment. In conjunction with NLSDATA, a boot file is used to determine which NLS objects are available for loading.

On initialization, the boot file is loaded into memory, where it serves as the master list of available NLS objects, prior to loading NLSDATA files. Oracle supports both system and user boot files. A user boot file may only contain a subset of the system boot file. When loading, the user boot file takes precedence over the system boot file. If the user boot file is not present, the system boot file will be used; this way, all NLS data defined in the system boot file will be available for loading. If neither user nor system boot file is found, then a default linked-in boot file and some default linked-in data objects (language American, territory America, character set US7ASCII) will be loaded. NLS functionality, however, will be limited to what is provided by the linked-in data objects. After a boot file (either user or system) is loaded, the NLSDATA files are read into memory based on the availability of the NLS objects defined in the boot file.
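The precedence rules above amount to a simple fallback chain, sketched below in Python. Boot files are represented as plain dictionaries purely for illustration; the actual boot file format is not described in this chapter, and the names LINKED_IN_DEFAULT and available_nls_objects are hypothetical.

```python
# Linked-in defaults named in the text: American, America, US7ASCII.
LINKED_IN_DEFAULT = {
    "languages": {"AMERICAN"},
    "territories": {"AMERICA"},
    "charsets": {"US7ASCII"},
}

def available_nls_objects(user_boot=None, system_boot=None):
    """Return the master list of loadable NLS objects.

    The user boot file takes precedence, then the system boot file;
    if neither is found, only the linked-in defaults are available.
    """
    if user_boot is not None:
        return user_boot
    if system_boot is not None:
        return system_boot
    return LINKED_IN_DEFAULT

system = {"languages": {"AMERICAN", "FRENCH", "GERMAN"},
          "territories": {"AMERICA", "FRANCE", "GERMANY"},
          "charsets": {"US7ASCII", "WE8ISO8859P1"}}
user = {"languages": {"FRENCH"}, "territories": {"FRANCE"}, "charsets": {"WE8ISO8859P1"}}
```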

The idea behind a user boot file is to give an application further flexibility to tailor exactly which NLS objects it needs for its language environment, thus controlling the application's memory consumption.

Utilities

Oracle Server includes the following two utilities to assist you in maintaining NLS data:

NLS Data Installation Utility

Generate binary-format data objects from their text-format versions. Use this when you receive NLS data updates or if you create your own data objects.

NLS Configuration Utility (LXBCNF)

Create and edit user boot files.

For more information, see Oracle8 Utilities.

NLS Parameters

The NLS_LANGUAGE and NLS_TERRITORY parameters implicitly specify several aspects of language-dependent operation. Additional NLS parameters provide explicit control over these operations. Most of the parameters listed below can be specified in the initialization file; they can also be specified for each session with the ALTER SESSION command.

Parameter                Description
NLS_CALENDAR             Calendar system
NLS_CURRENCY             Local currency symbol
NLS_DATE_FORMAT          Default date format
NLS_DATE_LANGUAGE        Default language for dates
NLS_ISO_CURRENCY         ISO international currency symbol
NLS_LANGUAGE             Default language
NLS_NUMERIC_CHARACTERS   Decimal character and group separator
NLS_SORT                 Character sort sequence
NLS_TERRITORY            Default territory

For a complete description of ALTER SESSION, see Oracle8 SQL Reference.

NLS_CALENDAR

Parameter type:    string
Parameter class:   dynamic, scope = ALTER SESSION
Default value:     Gregorian
Range of values:   any valid calendar format name

Many different calendar systems are in use throughout the world. NLS_CALENDAR specifies which calendar system Oracle uses.

NLS_CALENDAR can have one of the following values:

For example, if NLS_CALENDAR is set to "Japanese Imperial", the date format is "E YY-MM-DD", and the date is May 15, 1997, then the SYSDATE is displayed as follows:

SELECT SYSDATE FROM DUAL;
SYSDATE
--------
H 09-05-15 
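The era arithmetic behind this output can be sketched as follows. The code covers only the Heisei era (8 January 1989 through 30 April 2019) and assumes that the E format element abbreviates that era as H, as the output above suggests; format_heisei is a hypothetical helper, not an Oracle function.

```python
from datetime import date

HEISEI_START = date(1989, 1, 8)
HEISEI_END = date(2019, 4, 30)

def format_heisei(d):
    """Format a date in the Heisei era using the pattern E YY-MM-DD."""
    if not HEISEI_START <= d <= HEISEI_END:
        raise ValueError("date is outside the Heisei era")
    era_year = d.year - HEISEI_START.year + 1   # 1989 is year 1 of the era
    return f"H {era_year:02d}-{d.month:02d}-{d.day:02d}"

formatted = format_heisei(date(1997, 5, 15))
```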

NLS_CURRENCY

This parameter specifies the character string returned by the number format mask L, the local currency symbol, overriding that defined implicitly by NLS_TERRITORY. For example, to set the local currency symbol to "Dfl" (including a space), the parameter should be set as follows:

NLS_CURRENCY = "Dfl "

In this case, the query

SELECT TO_CHAR(TOTAL, 'L099G999D99') "TOTAL"
     FROM ORDERS WHERE CUSTNO = 586

would return

TOTAL
-------------
Dfl 12.673,49
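The shape of this result can be reproduced with a short Python sketch that combines a currency string with a group separator and decimal character. It assumes a period as the group separator and a comma as the decimal character, matching the output shown; the function format_local_currency is hypothetical and does not implement Oracle's number format masks.

```python
from decimal import Decimal

def format_local_currency(value, currency="Dfl ", decimal_char=",", group_char="."):
    """Format value with two decimals, the given separators, and a currency prefix."""
    text = f"{value:,.2f}"  # uses "," for groups and "." for decimals
    # Swap both separators in one pass so they do not overwrite each other.
    text = text.translate(str.maketrans({",": group_char, ".": decimal_char}))
    return currency + text

total = format_local_currency(Decimal("12673.49"))
```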

You can alter the default value of NLS_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_CURRENCY command.

For more information on NLS_CURRENCY as an initialization parameter, see "NLS_CURRENCY" on page 1-79.

NLS_DATE_FORMAT

Defines the default date format to use with the TO_CHAR and TO_DATE functions. The default value of this parameter is determined by NLS_TERRITORY. The value of this parameter can be any valid date format mask, and the value must be surrounded by double quotes. For example:

NLS_DATE_FORMAT = "MM/DD/YYYY"

To add string literals to the date format, enclose the string literal with double quotes. Note that every special character (such as the double quote) must be preceded with an escape character. The entire expression must be surrounded with single quotes. For example:

NLS_DATE_FORMAT = '\"Today\'s date\" MM/DD/YYYY'

As another example, to set the default date format to display Roman numerals for months, you would include the following line in your initialization file:

NLS_DATE_FORMAT = "DD RM YYYY"

With such a default date format, the following SELECT statement would return the month using Roman numerals (assuming today's date is February 12, 1997):

SELECT TO_CHAR(SYSDATE) CURRDATE
     FROM DUAL;
CURRDATE
---------
12 II 1997

The value of this parameter is stored in the tokenized internal date format. Each format element occupies two bytes, and each string occupies the number of bytes in the string plus a terminator byte. Also, the entire format mask has a two-byte terminator. For example, "MM/DD/YY" occupies 12 bytes internally because there are three format elements, two one-byte strings (the two slashes), and the two-byte terminator for the format mask. The tokenized format for the value of this parameter cannot exceed 24 bytes.
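The size rule can be checked with a small calculation. The Python sketch below takes a format mask that has already been split into format elements and literal strings, because the tokenizing rules themselves are not described here; the token representation and the names tokenized_size and MAX_TOKENIZED_BYTES are hypothetical.

```python
ELEMENT_BYTES = 2            # each format element occupies two bytes
MASK_TERMINATOR_BYTES = 2    # two-byte terminator for the whole mask
MAX_TOKENIZED_BYTES = 24     # limit stated in the text

def tokenized_size(tokens, codec="ascii"):
    """Return the tokenized size in bytes.

    tokens is a list of ("element", name) or ("string", text) pairs.
    A string occupies its byte length plus one terminator byte.
    """
    size = MASK_TERMINATOR_BYTES
    for kind, value in tokens:
        if kind == "element":
            size += ELEMENT_BYTES
        else:
            size += len(value.encode(codec)) + 1
    return size

# "MM/DD/YY": three format elements and two one-byte strings.
mm_dd_yy = [("element", "MM"), ("string", "/"), ("element", "DD"),
            ("string", "/"), ("element", "YY")]
size = tokenized_size(mm_dd_yy)
within_limit = size <= MAX_TOKENIZED_BYTES
```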

Note: The applications you design may need to allow for a variable-length default date format. Also, the parameter value must be surrounded by double quotes: single quotes are interpreted as part of the format mask.

You can alter the default value of NLS_DATE_FORMAT by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_FORMAT command.

For more information on NLS_DATE_FORMAT as an initialization parameter, see "NLS_DATE_FORMAT" on page 1-79.

Date Formats and Partition Bound Expressions

Partition bound expressions for a date column must specify a date using a format which requires that the month, day, and 4-digit year are fully specified. For example, the date format MM-DD-YYYY requires that the month, day, and 4-digit year are fully specified. In contrast, the date format DD-MON-YY (11-jan-97, for example) is invalid because it relies on the current date for the century.

Use TO_DATE() to specify a date format which requires the full specification of month, day, and 4-digit year. For example:

TO_DATE('11-jan-1997', 'dd-mon-yyyy')

If the default date format, specified by NLS_DATE_FORMAT, of your session does not support specification of a date independent of current century (that is, if your default date format is MM-DD-YY), you must take one of the following actions:

For more information on using TO_DATE(), see Oracle8 SQL Reference.

NLS_DATE_LANGUAGE

This parameter specifies the language for the spelling of day and month names by the functions TO_CHAR and TO_DATE, overriding that specified implicitly by NLS_LANGUAGE. NLS_DATE_LANGUAGE has the same syntax as the NLS_LANGUAGE parameter, and all supported languages are valid values. For example, to specify the date language as French, the parameter should be set as follows:

NLS_DATE_LANGUAGE = FRENCH

In this case, the query

SELECT TO_CHAR(SYSDATE, 'Day:Dd Month yyyy')
     FROM DUAL;

would return

Mercredi:12 Février 1997

Month and day name abbreviations are also in the language specified, for example:

Me:12 Fév 1997

The default date format also uses the language-specific month name abbreviations. For example, if the default date format is DD-MON-YYYY, the above date would be inserted using:

INSERT INTO tablename VALUES ('12-Fév-1997');

The abbreviations for AM, PM, AD, and BC are also returned in the language specified by NLS_DATE_LANGUAGE. Note that numbers spelled using the TO_CHAR function always use English spellings; for example:

SELECT TO_CHAR(TO_DATE('12-Fév'),'Day: ddspth Month')
FROM DUAL;

would return:

Mercredi: twelfth Février

You can alter the default value of NLS_DATE_LANGUAGE by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_LANGUAGE command.

For more information on NLS_DATE_LANGUAGE as an initialization parameter, see "NLS_DATE_LANGUAGE" on page 1-80.

NLS_ISO_CURRENCY

This parameter specifies the character string returned by the number format mask C, the ISO currency symbol, overriding that defined implicitly by NLS_TERRITORY.

Local currency symbols can be ambiguous; for example, a dollar sign ($) can refer to US dollars or Australian dollars. ISO Specification 4217 1987-07-15 defines unique "international" currency symbols for the currencies of specific territories (or countries).

For example, the ISO currency symbol for the US Dollar is USD, for the Australian Dollar AUD. To specify the ISO currency symbol, the corresponding territory name is used.

NLS_ISO_CURRENCY has the same syntax as the NLS_TERRITORY parameter, and all supported territories are valid values. For example, to specify the ISO currency symbol for France, the parameter should be set as follows:

NLS_ISO_CURRENCY = FRANCE

In this case, the query

SELECT TO_CHAR(TOTAL, 'C099G999D99') "TOTAL"
FROM ORDERS WHERE CUSTNO = 586

would return

TOTAL
-------------
FRF12.673,49

You can alter the default value of NLS_ISO_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_ISO_CURRENCY command.

For more information on NLS_ISO_CURRENCY as an initialization parameter, see "NLS_ISO_CURRENCY" on page 1-80.

NLS_NUMERIC_CHARACTERS

This parameter specifies the decimal character and grouping separator, overriding those defined implicitly by NLS_TERRITORY. The group separator is the character that separates integer groups (that is, the thousands, millions, billions, and so on). The decimal character separates the integer and decimal parts of a number.

Any single-byte character can serve as the decimal character or group separator, provided that the two characters differ from each other. Neither character can be a numeric character or any of the following characters: plus (+), hyphen (-), less than sign (<), greater than sign (>).

The characters are specified in the following format:

NLS_NUMERIC_CHARACTERS = "<decimal_character><group_separator>"

The grouping separator is the character returned by the number format mask G. For example, to set the decimal character to a comma and the grouping separator to a period, the parameter should be set as follows:

NLS_NUMERIC_CHARACTERS = ",."

Either character can be a space.

Note: When the decimal character is not a period (.) or when a group separator is used, numbers appearing in SQL statements must be enclosed in quotes. For example, with the value of NLS_NUMERIC_CHARACTERS above, the following SQL statement requires quotation marks around the numeric literals:

INSERT INTO SIZES (ITEMID, WIDTH, QUANTITY)
     VALUES (618, '45,5', TO_NUMBER('1.234','9G999'));
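The effect of the decimal character and group separator can be sketched in Python. This is a hedged illustration of the rules stated in this section, not Oracle's implementation: the function names are invented, and "single-byte" is approximated here as "encodable in ISO 8859-1", which is an assumption.

```python
DIGITS = "0123456789"
FORBIDDEN = "+-<>"


def check_numeric_characters(chars):
    """Return (decimal, group) or raise ValueError if the pair is invalid."""
    if len(chars) != 2:
        raise ValueError("exactly two characters are required")
    decimal, group = chars
    if decimal == group:
        raise ValueError("the two characters must differ")
    for ch in chars:
        if ch in DIGITS or ch in FORBIDDEN:
            raise ValueError(f"character not allowed: {ch!r}")
        # Assumption: treat 'single-byte' as encodable in ISO 8859-1.
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            raise ValueError(f"not a single-byte character: {ch!r}") from None
    return decimal, group


def format_number(value, chars, places=2):
    """Format value using the given decimal character and group separator."""
    decimal, group = check_numeric_characters(chars)
    text = f"{value:,.{places}f}"  # for example '12,673.49'
    return text.translate(str.maketrans({",": group, ".": decimal}))


def parse_number(text, chars):
    """Parse a string written with the given decimal and group characters."""
    decimal, group = check_numeric_characters(chars)
    return float(text.replace(group, "").replace(decimal, "."))


print(format_number(12673.49, ",."))  # 12.673,49
print(parse_number("45,5", ",."))     # 45.5
print(parse_number("1.234", ",."))    # 1234.0
```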

You can alter the default value of NLS_NUMERIC_CHARACTERS in either of these ways:

For more information on NLS_NUMERIC_CHARACTERS as an initialization parameter, see "NLS_NUMERIC_CHARACTERS" on page 1-81.

NLS_SORT

This parameter specifies the type of sort for character data, overriding that defined implicitly by NLS_LANGUAGE.

The syntax of NLS_SORT is:

NLS_SORT = { BINARY | name }

BINARY specifies a binary sort and name specifies a particular linguistic sort sequence. For example, to specify the linguistic sort sequence called German, the parameter should be set as follows:

NLS_SORT = German

The name given to a linguistic sort sequence has no direct connection to language names. Usually, however, each supported language will have an appropriate linguistic sort sequence defined that uses the same name.

Note: When the NLS_SORT parameter is set to BINARY, the optimizer can in some cases satisfy the ORDER BY clause without doing a sort (by choosing an index scan). But when NLS_SORT is set to a linguistic sort, a sort is always needed to satisfy the ORDER BY clause.

You can alter the default value of NLS_SORT by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_SORT command.

For more information on NLS_SORT as an initialization parameter, see "NLS_SORT" on page 1-82.

A complete list of linguistic definitions is provided in Table 4-8, "Linguistic Definitions".

Specifying Character Sets

The character encoding scheme used by the database is defined at database creation as part of the CREATE DATABASE statement. All data columns of type CHAR, CLOB, VARCHAR2, and LONG, including columns in the data dictionary, have their data stored in the database character set. In addition, the choice of database character set determines which characters can name objects in the database. Data columns of type NCHAR, NCLOB, and NVARCHAR2 use the national character set.

Once the database is created, the character set choices cannot be changed without re-creating the database. Hence, it is important to consider carefully which character set(s) to use. The database character set should always be a superset or equivalent of the operating system's native character set. The character sets used by client applications that access the database will usually determine which superset is the best choice.

If all client applications use the same character set, then this is the normal choice for the database character set. When client applications use different character sets, the database character set should be a superset (or equivalent) of all the client character sets. This will ensure that every character is represented when converting from a client character set to the database character set.

When a client application operates with a terminal that uses a different character set, then the client application's characters must be converted to the database character set, and vice versa. This conversion is performed automatically, and is transparent to the client application. The character set used by the client application is defined by the NLS_LANG parameter. Similarly, the character set used for national character set data is defined by the NLS_NCHAR parameter. For more information on these parameters, see "NLS_LANG" on page 4-10 and "NLS_NCHAR" on page 4-31.

Supported Character Sets

Oracle Server National Language Support features solve the problems that result from the fact that different encoding schemes use different binary values to represent the same character. With NLS, data created with one encoding scheme can be correctly processed and displayed on a system that uses a different encoding scheme. Table 4-3 lists the supported string format and character sets.

Table 4-3 Supported Character String Functionality and Character Sets
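Type    1-Byte Fixed   Varying Width   Multi-byte Fixed Width Character Sets   Object Type and Collection Type Support
-----   ------------   -------------   -------------------------------------   ---------------------------------------
CHAR    Yes            Yes             No                                      Yes
NCHAR   Yes            Yes             Yes                                     No
BLOB    Yes            Yes             Yes                                     Yes
CLOB    Yes            No              No                                      Yes
NCLOB   Yes            No              Yes                                     No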

Note: CLOBs only support 1-byte fixed width database character sets. NCLOBs only support fixed-width NCHAR database character sets. BLOBs process characters as a series of byte sequences. The data is not subject to any NLS-sensitive operations.

Character Set Conversion

Where a character exists in both source and destination character sets, conversion presents no problem. However, data conversion has to accommodate characters that do not exist in the destination character set. In such cases, replacement characters are used. The source character is replaced by a character that does exist in the destination character set.

Replacement characters may be defined for specific characters as part of a character set definition. Where a specific replacement character is not defined, a default replacement character is used. To avoid the use of replacement characters when converting from client to database character set, the latter should be a superset (or equivalent) of all the client character sets.
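The substitution behavior can be illustrated with Python's own codecs. This is a sketch under stated assumptions: Python's "ascii" and "latin-1" codecs stand in for a destination character set, the function name convert is invented, and the "?" substituted here is Python's default replacement, not a replacement character defined by an Oracle character set.

```python
def convert(text, dest_charset):
    """Encode text, substituting a replacement for unmappable characters."""
    return text.encode(dest_charset, errors="replace")


# 'é' does not exist in 7-bit ASCII, so a replacement is used.
print(convert("Février", "ascii"))    # b'F?vrier'

# ISO 8859-1 contains every character of the source string, so no
# replacement is needed and the conversion can be reversed.
print(convert("Février", "latin-1"))  # b'F\xe9vrier'
```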

The Concatenation Operator

If the database character set replaces the vertical bar ("|") with a national character, then all SQL statements that use the concatenation operator (ASCII 124) will fail. For example, creating a procedure will fail because it generates a recursive SQL statement that uses concatenation. When you use a 7-bit replacement character set such as D7DEC, F7DEC, or SF7ASCII for the database character set, then the national character which replaces the vertical bar is not allowed in object names because the vertical bar is interpreted as the concatenation operator.

On the user side, a 7-bit replacement character set can be used if the database character set is the same or compatible, that is, if both character sets replace the vertical bar with the same national character.

Storing Data in Multi-Byte Character Sets

Width specifications of the character datatypes CHAR and VARCHAR2 refer to bytes, not characters. Hence, the specification CHAR(20) in a table definition allows 20 bytes for storing character data.

If the database character set is single byte, the number of characters and number of bytes will be the same. If the database character set is multi-byte, there will in general be no such correspondence. A character can consist of one or more bytes, depending on the specific multi-byte encoding scheme and whether shift-in/shift-out control codes are present. Hence, column widths must be chosen with care to allow for the maximum possible number of bytes for a given number of characters.

When using the NCHAR and NVARCHAR2 data types, the width specification refers to characters if the national character set is fixed-width multi-byte. Otherwise, the width specification refers to bytes.
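The difference between character counts and byte counts can be checked with a short Python sketch. The Python codec names used here ("latin-1", "utf-8", "shift_jis") are stand-ins chosen for illustration and are not Oracle character set names; the helper names are also invented.

```python
def byte_length(text, charset):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(charset))


def min_column_width(num_chars, max_bytes_per_char):
    """Byte width that is always sufficient for num_chars characters."""
    return num_chars * max_bytes_per_char


print(byte_length("Février", "latin-1"))   # 7: one byte per character
print(byte_length("Février", "utf-8"))     # 8: 'é' takes two bytes
print(byte_length("日本語", "shift_jis"))    # 6: two bytes per character
print(byte_length("日本語", "utf-8"))        # 9: three bytes per character

# A column meant to hold 20 characters of three-byte data needs 60 bytes.
print(min_column_width(20, 3))             # 60
```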

Loadable Character Sets

Oracle Server loads character sets upon first reference. Instead of linking all character sets as static data, each character set is read into dynamic memory upon first reference. The size of the executable is thus reduced by eliminating character set data not in use during execution.

Date and Number Formats

Several format masks are provided with the TO_CHAR, TO_DATE, and TO_NUMBER functions to format dates and numbers according to the relevant conventions.

Note: The TO_NUMBER function also accepts a format mask.

Date Formats

A format element RM (Roman Month) returns a month as a Roman numeral. Both uppercase and lowercase can be specified, using RM and rm respectively. For example, for the date 7 Sep 1998, "DD-rm-YYYY" will return "07-ix-1998" and "DD-RM-YYYY" will return "07-IX-1998".

Note that the MON and DY format masks explicitly support month and day abbreviations that may not be three characters in length. For example, the abbreviations "Lu" and "Ma" can be specified for the French "Lundi" and "Mardi", respectively.

Week and Day Number Conventions

The week numbers returned by the WW format mask are calculated according to the algorithm int((day-ijan1)/7). This week number algorithm does not follow the ISO standard (2015, 1992-06-15).

To support the ISO standard, a format element IW is provided that returns the ISO week number. In addition, format elements I, IY, IYY, and IYYY, equivalent in behavior to the format elements Y, YY, YYY, and YYYY, return the year relating to the ISO week number.

In the ISO standard, the year relating to an ISO week number can be different from the calendar year. For example, January 1, 1988, is in ISO week number 53 of 1987. A week always starts on a Monday and ends on a Sunday.

For example, January 1, 1991, is a Tuesday, so Monday, December 31, 1990, to Sunday, January 6, 1991, is week 1. Thus the ISO week number and year for December 31, 1990, is 1, 1991. To get the ISO week number, use the format mask "IW" for the week number and one of the "IY" formats for the year.
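The two ISO week examples above can be reproduced with Python's standard library, whose date.isocalendar() method follows the same ISO week rules. This sketch only checks the arithmetic; it does not call Oracle, and the function name iso_week_and_year is invented for the example.

```python
from datetime import date


def iso_week_and_year(d):
    """Return (ISO week number, ISO year) for a date."""
    iso_year, iso_week, _weekday = d.isocalendar()
    return iso_week, iso_year


print(iso_week_and_year(date(1988, 1, 1)))    # (53, 1987)
print(iso_week_and_year(date(1990, 12, 31)))  # (1, 1991)
```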

Number Formats

Several additional format elements are provided for formatting numbers:

For Roman numerals, both uppercase and lowercase can be specified, using RN and rn, respectively. The number to be converted must be an integer in the range 1 to 3999.
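The Roman numeral conversion, including the 1 to 3999 range limit and the uppercase and lowercase variants, can be sketched as follows. This is an illustrative Python version of the conventional algorithm, not the code behind the RN and rn format elements; the name to_roman is hypothetical.

```python
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n, lowercase=False):
    """Convert an integer in the range 1 to 3999 to a Roman numeral."""
    if not isinstance(n, int) or not 1 <= n <= 3999:
        raise ValueError("number must be an integer from 1 to 3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    result = "".join(parts)
    return result.lower() if lowercase else result


print(to_roman(2))                  # II
print(to_roman(9, lowercase=True))  # ix
print(to_roman(1997))               # MCMXCVII
```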

For complete information on using date and number masks, see Oracle8 SQL Reference.

Additional NLS Environment Variables

SQL commands such as ALTER SESSION SET NLS_parameter = value can be issued to alter the NLS settings for the current session. In addition, Oracle Server supports the following NLS parameters as environment variables to provide greater flexibility for multi-lingual applications:

These variables work in a similar fashion to NLS_LANG. The syntax for the environment variables listed above is the same as that for the ALTER SESSION command.

Note: If NLS_LANG is not set, the other NLS environment variables are ignored.

The following is an example for a UNIX environment:

setenv NLS_DATE_FORMAT "dd/mon/yyyy"

For more information, see the Oracle8 Administrator's Guide.

Client-Only Environment Variables

The following environment variables can be set in the client environment:

NLS_CREDIT

Default value: derived from NLS_TERRITORY
Range of values: any string, maximum of 9 bytes (not including null)

NLS_CREDIT sets the symbol that displays a credit in financial reports. The default value of this parameter is determined by NLS_TERRITORY.

NLS_DEBIT

Default value: derived from NLS_TERRITORY
Range of values: any string, maximum of 9 bytes (not including null)

NLS_DEBIT sets the symbol that displays a debit in financial reports. The default value of this parameter is determined by NLS_TERRITORY.

NLS_LIST_SEPARATOR

Default value: derived from NLS_TERRITORY
Range of values: any valid character

NLS_LIST_SEPARATOR specifies the character to use to separate values in a list of values.

The character specified must be single-byte and cannot be the same as either the numeric or monetary decimal character, any numeric character, or any of the following characters: plus (+), hyphen (-), less than sign (<), greater than sign (>), period (.).

NLS_MONETARY_CHARACTERS

Default value: derived from NLS_TERRITORY

NLS_MONETARY_CHARACTERS specifies the characters that indicate monetary units, such as the dollar sign ($) for U.S. Dollars, and the cent symbol (¢) for cents.

The two characters specified must be single-byte and cannot be the same as each other. They also cannot be any numeric character or any of the following characters: plus (+), hyphen (-), less than sign (<), greater than sign (>).

NLS_NCHAR

Default value: derived from NLS_LANG
Range of values: any valid character set name

NLS_NCHAR specifies the character set used by the client application for national character set data. If it is not specified, the client application uses the same character set which it uses for the database character set data.

Using NLS Parameters in SQL Functions

All character functions support both single-byte and multi-byte characters. Except where explicitly stated, character functions operate character-by-character, rather than byte-by-byte.

All SQL functions whose behavior depends on NLS conventions allow NLS parameters to be specified. These functions are

Explicitly specifying the optional NLS parameters for these functions allows the function evaluations to be independent of the NLS parameters in force for the session. This feature may be important for SQL statements that contain numbers and dates as string literals.

For example, the following query is evaluated correctly only if the language specified for dates is American:

SELECT ENAME FROM EMP
WHERE HIREDATE > '1-JAN-91'

Such a query can be made independent of the current date language by using a statement such as:

SELECT ENAME FROM EMP
WHERE HIREDATE > TO_DATE('1-JAN-91','DD-MON-YY',
   'NLS_DATE_LANGUAGE = AMERICAN')

In this way, language-independent SQL statements can be defined where necessary. For example, such statements might be necessary when string literals appear in SQL statements in views, CHECK constraints, or triggers.

Default Specifications

When evaluating views and triggers, default values for NLS function parameters are taken from the values currently in force for the session. When evaluating CHECK constraints, default values are set by the NLS parameters that were specified at database creation.

Specifying Parameters

The syntax that specifies NLS parameters in SQL functions is:

'parameter = value'

The following NLS parameters can be specified:

Only certain NLS parameters are valid for particular SQL functions, as follows:

SQL Function   NLS Parameter
------------   ----------------------
TO_DATE        NLS_DATE_LANGUAGE
               NLS_CALENDAR
TO_NUMBER      NLS_NUMERIC_CHARACTERS
               NLS_CURRENCY
               NLS_ISO_CURRENCY
TO_CHAR        NLS_DATE_LANGUAGE
               NLS_NUMERIC_CHARACTERS
               NLS_CURRENCY
               NLS_ISO_CURRENCY
               NLS_CALENDAR
NLS_UPPER      NLS_SORT
NLS_LOWER      NLS_SORT
NLS_INITCAP    NLS_SORT
NLSSORT        NLS_SORT

Examples of the use of NLS parameters are:

TO_DATE ('1-JAN-89', 'DD-MON-YY',
   'nls_date_language = American')

TO_CHAR (hiredate, 'DD/MON/YYYY',
   'nls_date_language = French')

TO_NUMBER ('13.000,00', '99G999D99',
   'nls_numeric_characters = ''.,''')

TO_CHAR (sal, '9G999D99L', 'nls_numeric_characters = ''.,''
   nls_currency = ''Dfl ''')

TO_CHAR (sal, '9G999D99C', 'nls_numeric_characters = '',.''
   nls_iso_currency = Japan')

NLS_UPPER (ename, 'nls_sort = Austrian')

NLSSORT (ename, 'nls_sort = German')

Note: For some languages, certain lowercase characters correspond to a sequence of uppercase characters, or vice versa. As a result, the length of the output from NLS_UPPER, NLS_LOWER, and NLS_INITCAP can differ from the length of the input.
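A well-known case of this kind is the German sharp s, which uppercases to two letters. The following Python sketch demonstrates the length change using Python's Unicode case mapping; it is an analogy only and does not show how NLS_UPPER is implemented. The helper name length_change is invented.

```python
def length_change(text):
    """Return (input length, length after uppercasing)."""
    return len(text), len(text.upper())


print("Straße".upper())         # STRASSE
print(length_change("Straße"))  # (6, 7)
```

Applications that copy the result into a fixed-size buffer should therefore size the buffer for the converted string, not for the input.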

Unacceptable Parameters

Note that NLS_LANGUAGE and NLS_TERRITORY are not accepted as parameters in SQL functions. Only NLS parameters that explicitly define the specific data items required for unambiguous interpretation of a format are accepted. NLS_DATE_FORMAT is also not accepted as a parameter for the reason described below.

If an NLS parameter is specified in TO_CHAR, TO_NUMBER, or TO_DATE, a format mask must also be specified as the second parameter. For example, the following specification is legal:

TO_CHAR (hiredate, 'DD/MON/YYYY', 'nls_date_language = French')

These are illegal:

TO_CHAR (hiredate, 'nls_date_language = French')
TO_CHAR (hiredate, 'nls_date_language = French',
   'DD/MON/YY')

This restriction means that a date format must always be specified if an NLS parameter is in a TO_CHAR or TO_DATE function. As a result, NLS_DATE_FORMAT is not a valid NLS parameter for these functions.

CONVERT Function

The SQL function CONVERT allows for conversion of character data between character sets.

For more information on CONVERT, see Oracle8 SQL Reference.

The CONVERT function converts the binary representation of a character string in one character set to another. It uses exactly the same technique as described previously for the conversion between database and client character sets. Hence, it uses replacement characters and has the same limitations.

If the CONVERT function is used in a stored procedure, the stored procedure will run independently of the client character set (that is, it will use the server's character set), which sometimes results in the last converted character being truncated. The syntax for CONVERT is:

where src_char_set is the source and dest_char_set is the destination character set.

In client/server environments using different character sets, use the TRANSLATE (...USING...) statement to perform conversions instead of CONVERT. The conversion to client character sets will then properly know the server character set of the result of the TRANSLATE statement.

NLSSORT Function

The NLSSORT function replaces a character string with the equivalent sort string used by the linguistic sort mechanism. For a binary sort, the sort string is the same as the input string. The linguistic sort technique operates by replacing each character string with some other binary values, chosen so that sorting the resulting string produces the desired sorting sequence. When a linguistic sort is being used, NLSSORT returns the binary values that replace the original string.

String Comparisons in a WHERE Clause

NLSSORT allows applications to perform string matching that follows alphabetic conventions. Normally, character strings in a WHERE clause are compared using the characters' binary values. A character is "greater than" another if it has a higher binary value in the database character set. Because the sequence of characters based on their binary values might not match the alphabetic sequence for a language, such comparisons often do not follow alphabetic conventions. For example, if a column (COL1) contains the values ABC, ABZ, BCD and ÄBC in the ISO 8859/1 8-bit character set, the following query:

SELECT COL1 FROM TAB1 WHERE COL1 > 'B'

returns both BCD and ÄBC because Ä has a higher numeric value than B. However, in German, an Ä is sorted alphabetically before B. Such conventions are language dependent even when the same character is used. In Swedish, an Ä is sorted after Z. Linguistic comparisons can be made using NLSSORT in the WHERE clause, as follows:

WHERE NLSSORT(col) comparison_operator NLSSORT(comparison_string)

Note that NLSSORT has to be on both sides of the comparison operator. For example:

SELECT COL1 FROM TAB1 WHERE NLSSORT(COL1) > NLSSORT('B')

If a German linguistic sort is being used, this does not return strings beginning with Ä because in the German alphabet Ä comes before B. If a Swedish linguistic sort is being used, such names are returned because in the Swedish alphabet Ä comes after Z.
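The difference between binary and linguistic comparison can be sketched in Python. The two alphabets below are toy orderings built only from the statements in this section (Ä before B in German, Ä after Z in Swedish); they are assumptions for illustration, not the contents of Oracle's linguistic definitions, and sort_key is a hypothetical stand-in for NLSSORT.

```python
VALUES = ["ABC", "ABZ", "BCD", "ÄBC"]

# Toy alphabets: each character's position is its sort weight.
GERMAN_ORDER = "AÄBCDEFGHIJKLMNOPQRSTUVWXYZ"
SWEDISH_ORDER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄ"


def binary_key(text):
    """Binary sort: compare the ISO 8859-1 byte values directly."""
    return text.encode("latin-1")


def sort_key(text, order):
    """Replace each character with its weight in the given alphabet."""
    return bytes(order.index(ch) for ch in text)


def greater_than(values, bound, key):
    """Emulate WHERE key(COL1) > key(bound)."""
    return [v for v in values if key(v) > key(bound)]


print(greater_than(VALUES, "B", binary_key))
# ['BCD', 'ÄBC']
print(greater_than(VALUES, "B", lambda s: sort_key(s, GERMAN_ORDER)))
# ['BCD']
print(greater_than(VALUES, "B", lambda s: sort_key(s, SWEDISH_ORDER)))
# ['BCD', 'ÄBC']
```

As in the SQL form, the same key function is applied to both sides of the comparison; mixing a linguistic key with a raw string would compare unrelated byte values.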

Other SQL Functions

Two SQL functions, NLS_CHARSET_NAME and NLS_CHARSET_ID, are provided to convert between character set ID numbers and character set names. They are used by programs which need to determine character set ID numbers for binding variables through OCI.

The NLS_CHARSET_DECL_LEN function returns the declaration length (in number of characters) for an NCHAR column.

For more information on these functions, see Oracle8 SQL Reference.

Converting from Character Set Number to Character Set Name

The NLS_CHARSET_NAME(n) function returns the name of the character set corresponding to ID number n. The function returns NULL if n is not a recognized character set ID value.

Converting from Character Set Name to Character Set Number

NLS_CHARSET_ID(TEXT) returns the character set ID corresponding to the name specified by TEXT. TEXT is defined as a run-time VARCHAR2 quantity, a character set name. Values for TEXT can be NLSRTL names that resolve to sets other than the database character set or the national character set.

If the value CHAR_CS is entered for TEXT, the function returns the ID of the server's database character set. If the value NCHAR_CS is entered for TEXT, the function returns the ID of the server's national character set. The function returns NULL if TEXT is not a recognized name. The value for TEXT must be entered in all uppercase.

Returning the Length of an NCHAR Column

NLS_CHARSET_DECL_LEN(BYTECNT, CSID) returns the declaration length (in number of characters) for an NCHAR column. The BYTECNT argument is the byte length of the column. The CSID argument is the character set ID of the column.

Partitioned Tables and Indexes

For partitioned tables and indexes, string comparisons against partition bounds (VALUES LESS THAN) always follow BINARY order, in both DDL and DML.

Controlling an ORDER BY Clause

If a linguistic sorting sequence is in use, then NLSSORT is used implicitly on each character item in the ORDER BY clause. As a result, the sort mechanism (linguistic or binary) for an ORDER BY is transparent to the application. However, if the NLSSORT function is explicitly specified for a character item in an ORDER BY item, then the implicit NLSSORT is not done.

In other words, the NLSSORT linguistic replacement is only applied once, not twice. The NLSSORT function is generally not needed in an ORDER BY clause when the default sort mechanism is a linguistic sort. However, when the default sort mechanism is BINARY, then a query such as:

SELECT ENAME FROM EMP
ORDER BY ENAME

will use a binary sort. A German linguistic sort can be obtained using:

SELECT ENAME FROM EMP
ORDER BY NLSSORT(ENAME, 'NLS_SORT = GERMAN')

Obsolete NLS Data

Prior to Oracle Server release 7.2, when a character set was renamed, the old name was usually supported along with the new name for several releases after the change. Beginning with release 7.2, the old names are no longer supported. Table 4-4 lists the affected character sets. If you reference any of these character sets in your code, replace each old name with its new name:

Table 4-4 New Names for Obsolete NLS Data Character Sets
Old Name          New Name
---------------   --------------
AR8MSAWIN         AR8MSWIN1256
JVMS              JA16VMS
JEUC              JA16EUC
SJIS              JA16SJIS
JDBCS             JA16DBCS
KSC5601           KO16KSC5601
KDBCS             KO16DBCS
CGB2312-80        ZHS16CGB231280
CNS 11643-86      ZHT32EUC
ZHT32CNS1164386   ZHT32EUC
TSTSET2           JA16TSTSET2
TSTSET            JA16TSTSET

Character set CL8MSWINDOW31 has been de-supported. The newer character set CL8MSWIN1251 is actually a duplicate of CL8MSWINDOW31 and includes some characters omitted from the earlier version. Change any usage of CL8MSWINDOW31 to CL8MSWIN1251 instead.
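When many scripts or configuration files refer to the old names, the renaming can be applied mechanically. The Python sketch below encodes the pairs from Table 4-4, plus the CL8MSWINDOW31 replacement described above, in a lookup table. The names OLD_TO_NEW and current_name are hypothetical, and the sketch only maps names; it does not inspect or change any database.

```python
# Old character set names and their replacements, taken from Table 4-4
# and from the note on CL8MSWINDOW31.
OLD_TO_NEW = {
    "AR8MSAWIN": "AR8MSWIN1256",
    "JVMS": "JA16VMS",
    "JEUC": "JA16EUC",
    "SJIS": "JA16SJIS",
    "JDBCS": "JA16DBCS",
    "KSC5601": "KO16KSC5601",
    "KDBCS": "KO16DBCS",
    "CGB2312-80": "ZHS16CGB231280",
    "CNS 11643-86": "ZHT32EUC",
    "ZHT32CNS1164386": "ZHT32EUC",
    "TSTSET2": "JA16TSTSET2",
    "TSTSET": "JA16TSTSET",
    "CL8MSWINDOW31": "CL8MSWIN1251",
}


def current_name(charset):
    """Return the supported name for a character set name."""
    return OLD_TO_NEW.get(charset, charset)


print(current_name("SJIS"))    # JA16SJIS
print(current_name("WE8DEC"))  # WE8DEC (unchanged)
```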

Unicode (UTF-8) Support

Unicode has two major encoding schemes: UCS-2 and UTF-8. UCS-2 is a two-byte fixed-width format; UTF-8 is a multi-byte format with variable width. Oracle8 provides support for the UTF-8 format because this enhancement is transparent to clients who already provide support for multi-byte character sets.
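The fixed-width and variable-width behavior of the two schemes can be observed with Python's codecs. In this sketch "utf-16-be" stands in for UCS-2, which is an assumption that holds only for characters in the Basic Multilingual Plane; the function name encoded_widths is invented.

```python
def encoded_widths(text):
    """Return (character, UTF-8 bytes, UCS-2 bytes) for each character."""
    return [
        (ch, len(ch.encode("utf-8")), len(ch.encode("utf-16-be")))
        for ch in text
    ]


# 'A', e-acute (U+00E9), and a CJK character (U+65E5).
for ch, utf8_bytes, ucs2_bytes in encoded_widths("A\u00e9\u65e5"):
    print(ch, utf8_bytes, ucs2_bytes)
# A 1 2
# é 2 2
# 日 3 2
```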

The character set name for UTF-8 is AL24UTFFSS for UNICODE Version 1.1 and UTF8 for UNICODE Version 2.0. Conversion between UTF-8 and other existing character sets is provided in this release of Oracle Server. Conversion between UTF-8 and single-byte character sets is performed through an internal number matching mechanism; conversion between UTF-8 and multi-byte character sets is performed with conversion functions and tables.

Clients should be aware that UTF8 is now officially supported as a new character set. Since UTF8 is the UTF-8 encoding for UNICODE Version 2.0, it is recommended for use in UNICODE support. The encoding scheme of UTF8 is very similar to some existing character sets, thus no major impact on existing products is expected.

Note: UNICODE no longer supports the encoding scheme UTF-2. UTF-8 replaces UTF-2.

NLS Data

This section lists supported languages, territories, storage character sets, linguistic definitions, and calendars.

You can also obtain information about supported character sets, languages, territories, and sorting orders by querying the dynamic data view V$NLS_VALID_VALUES. For more information on the data which can be returned by this view, see "V$NLS_VALID_VALUES" on page 3-60.

Supported Languages

Table 4-5 lists the 46 languages supported by the Oracle Server.

Table 4-5 Oracle Supported Languages
Abbreviation   Name
------------   ----------------------
us             AMERICAN
ar             ARABIC
bn             BENGALI
ptb            BRAZILIAN PORTUGUESE
bg             BULGARIAN
frc            CANADIAN FRENCH
ca             CATALAN
hr             CROATIAN
cs             CZECH
dk             DANISH
nl             DUTCH
eg             EGYPTIAN
gb             ENGLISH
et             ESTONIAN
sf             FINNISH
f              FRENCH
din            GERMAN DIN
d              GERMAN
el             GREEK
iw             HEBREW
hu             HUNGARIAN
is             ICELANDIC
in             INDONESIAN
i              ITALIAN
ja             JAPANESE
ko             KOREAN
esa            LATIN AMERICAN SPANISH
lv             LATVIAN
lt             LITHUANIAN
ms             MALAY
esm            MEXICAN SPANISH
n              NORWEGIAN
pl             POLISH
pt             PORTUGUESE
ro             ROMANIAN
ru             RUSSIAN
zhs            SIMPLIFIED CHINESE
sk             SLOVAK
sl             SLOVENIAN
e              SPANISH
s              SWEDISH
th             THAI
zht            TRADITIONAL CHINESE
tr             TURKISH
uk             UKRAINIAN
vn             VIETNAMESE

Supported Territories

Table 4-6 lists the 67 territories supported by the Oracle Server.

Table 4-6 Oracle Supported Territories
Abbreviation   Name
------------   --------------------
dz             ALGERIA
us             AMERICA
at             AUSTRIA
bh             BAHRAIN
bd             BANGLADESH
br             BRAZIL
bg             BULGARIA
ca             CANADA
cat            CATALONIA
cn             CHINA
cis            CIS
hr             CROATIA
cz             CZECH REPUBLIC
cs             CZECHOSLOVAKIA
dk             DENMARK
dj             DJIBOUTI
eg             EGYPT
ee             ESTONIA
fi             FINLAND
fr             FRANCE
de             GERMANY
gr             GREECE
hk             HONG KONG
hu             HUNGARY
is             ICELAND
id             INDONESIA
iq             IRAQ
il             ISRAEL
it             ITALY
jp             JAPAN
jo             JORDAN
kr             KOREA
kw             KUWAIT
lv             LATVIA
lb             LEBANON
ly             LIBYA
lit            LITHUANIA
my             MALAYSIA
mr             MAURITANIA
mx             MEXICO
ma             MOROCCO
no             NORWAY
om             OMAN
pl             POLAND
pt             PORTUGAL
qa             QATAR
ro             ROMANIA
sa             SAUDI ARABIA
sk             SLOVAKIA
si             SLOVENIA
so             SOMALIA
es             SPAIN
sd             SUDAN
se             SWEDEN
ch             SWITZERLAND
sy             SYRIA
tw             TAIWAN
th             THAILAND
nl             THE NETHERLANDS
tn             TUNISIA
tr             TURKEY
ua             UKRAINE
ae             UNITED ARAB EMIRATES
gb             UNITED KINGDOM
vn             VIETNAM
ye             YEMEN
cy             CYPRUS

Storage Character Sets

Table 4-7 lists the 180 storage character sets supported by the Oracle Server.

Table 4-7 Storage Character Sets
Name                Description
US7ASCII            ASCII 7-bit American
WE8DEC              DEC 8-bit West European
WE8HP               HP LaserJet 8-bit West European
US8PC437            IBM-PC Code Page 437 8-bit American
WE8EBCDIC37         EBCDIC Code Page 37 8-bit West European
WE8EBCDIC500        EBCDIC Code Page 500 8-bit West European
WE8EBCDIC285        EBCDIC Code Page 285 8-bit West European
WE8PC850            IBM-PC Code Page 850 8-bit West European
D7DEC               DEC VT100 7-bit German
F7DEC               DEC VT100 7-bit French
S7DEC               DEC VT100 7-bit Swedish
E7DEC               DEC VT100 7-bit Spanish
SF7ASCII            ASCII 7-bit Finnish
NDK7DEC             DEC VT100 7-bit Norwegian/Danish
I7DEC               DEC VT100 7-bit Italian
NL7DEC              DEC VT100 7-bit Dutch
CH7DEC              DEC VT100 7-bit Swiss (German/French)
YUG7ASCII           ASCII 7-bit Yugoslavian
SF7DEC              DEC VT100 7-bit Finnish
TR7DEC              DEC VT100 7-bit Turkish
IW7IS960            Israeli Standard 960 7-bit Latin/Hebrew
IN8ISCII            Multiple-Script Indian Standard 8-bit Latin/Indian Languages
WE8ISO8859P1        ISO 8859-1 West European
EE8ISO8859P2        ISO 8859-2 East European
SE8ISO8859P3        ISO 8859-3 South European
NEE8ISO8859P4       ISO 8859-4 North and North-East European
CL8ISO8859P5        ISO 8859-5 Latin/Cyrillic
AR8ISO8859P6        ISO 8859-6 Latin/Arabic
EL8ISO8859P7        ISO 8859-7 Latin/Greek
IW8ISO8859P8        ISO 8859-8 Latin/Hebrew
WE8ISO8859P9        ISO 8859-9 West European & Turkish
NE8ISO8859P10       ISO 8859-10 North European
TH8TISASCII         Thai Industrial Standard 620-2533 - ASCII 8-bit
TH8TISEBCDIC        Thai Industrial Standard 620-2533 - EBCDIC 8-bit
BN8BSCII            Bangladesh National Code 8-bit BSCII
VN8VN3              VN3 8-bit Vietnamese
WE8NEXTSTEP         NeXTSTEP PostScript 8-bit West European
AR8EBCDICX          EBCDIC XBASIC Server 8-bit Latin/Arabic
EL8DEC              DEC 8-bit Latin/Greek
TR8DEC              DEC 8-bit Turkish
WE8EBCDIC37C        EBCDIC Code Page 37 8-bit Oracle/c
WE8EBCDIC500C       EBCDIC Code Page 500 8-bit Oracle/c
IW8EBCDIC424        EBCDIC Code Page 424 8-bit Latin/Hebrew
TR8EBCDIC1026       EBCDIC Code Page 1026 8-bit Turkish
WE8EBCDIC871        EBCDIC Code Page 871 8-bit Icelandic
WE8EBCDIC284        EBCDIC Code Page 284 8-bit Latin American/Spanish
EEC8EUROASCI        EEC Targon 35 ASCI West European/Greek
EEC8EUROPA3         EEC EUROPA3 8-bit West European/Greek
LA8PASSPORT         German Government Printer 8-bit All-European Latin
BG8PC437S           IBM-PC Code Page 437 8-bit (Bulgarian Modification)
EE8PC852            IBM-PC Code Page 852 8-bit East European
RU8PC866            IBM-PC Code Page 866 8-bit Latin/Cyrillic
RU8BESTA            BESTA 8-bit Latin/Cyrillic
IW8PC1507           IBM-PC Code Page 1507/862 8-bit Latin/Hebrew
RU8PC855            IBM-PC Code Page 855 8-bit Latin/Cyrillic
TR8PC857            IBM-PC Code Page 857 8-bit Turkish
CL8MACCYRILLIC      Mac Client 8-bit Latin/Cyrillic
CL8MACCYRILLICS     Mac Server 8-bit Latin/Cyrillic
WE8PC860            IBM-PC Code Page 860 8-bit West European
IS8PC861            IBM-PC Code Page 861 8-bit Icelandic
EE8MACCES           Mac Server 8-bit Central European
EE8MACCROATIANS     Mac Server 8-bit Croatian
TR8MACTURKISHS      Mac Server 8-bit Turkish
IS8MACICELANDICS    Mac Server 8-bit Icelandic
EL8MACGREEKS        Mac Server 8-bit Greek
IW8MACHEBREWS       Mac Server 8-bit Hebrew
EE8MSWIN1250        MS Windows Code Page 1250 8-bit East European
CL8MSWIN1251        MS Windows Code Page 1251 8-bit Latin/Cyrillic
ET8MSWIN923         MS Windows Code Page 923 8-bit Estonian
BG8MSWIN            MS Windows 8-bit Bulgarian Cyrillic
EL8MSWIN1253        MS Windows Code Page 1253 8-bit Latin/Greek
IW8MSWIN1255        MS Windows Code Page 1255 8-bit Latin/Hebrew
LT8MSWIN921         MS Windows Code Page 921 8-bit Lithuanian
TR8MSWIN1254        MS Windows Code Page 1254 8-bit Turkish
WE8MSWIN1252        MS Windows Code Page 1252 8-bit West European
BLT8MSWIN1257       MS Windows Code Page 1257 8-bit Baltic
D8EBCDIC273         EBCDIC Code Page 273/1 8-bit Austrian German
I8EBCDIC280         EBCDIC Code Page 280/1 8-bit Italian
DK8EBCDIC277        EBCDIC Code Page 277/1 8-bit Danish
S8EBCDIC278         EBCDIC Code Page 278/1 8-bit Swedish
EE8EBCDIC870        EBCDIC Code Page 870 8-bit East European
CL8EBCDIC1025       EBCDIC Code Page 1025 8-bit Cyrillic
F8EBCDIC297         EBCDIC Code Page 297 8-bit French
IW8EBCDIC1086       EBCDIC Code Page 1086 8-bit Hebrew
CL8EBCDIC1025X      EBCDIC Code Page 1025 (Modified) 8-bit Cyrillic
N8PC865             IBM-PC Code Page 865 8-bit Norwegian
BLT8CP921           Latvian Standard LVS8-92(1) Windows/Unix 8-bit Baltic
LV8PC1117           IBM-PC Code Page 1117 8-bit Latvian
LV8PC8LR            Latvian Version IBM-PC Code Page 866 8-bit Latin/Cyrillic
BLT8EBCDIC1112      EBCDIC Code Page 1112 8-bit Baltic Multilingual
LV8RST104090        IBM-PC Alternative Code Page 8-bit Latvian (Latin/Cyrillic)
CL8KOI8R            RELCOM Internet Standard 8-bit Latin/Cyrillic
BLT8PC775           IBM-PC Code Page 775 8-bit Baltic
F7SIEMENS9780X      Siemens 97801/97808 7-bit French
E7SIEMENS9780X      Siemens 97801/97808 7-bit Spanish
S7SIEMENS9780X      Siemens 97801/97808 7-bit Swedish
DK7SIEMENS9780X     Siemens 97801/97808 7-bit Danish
N7SIEMENS9780X      Siemens 97801/97808 7-bit Norwegian
I7SIEMENS9780X      Siemens 97801/97808 7-bit Italian
D7SIEMENS9780X      Siemens 97801/97808 7-bit German
WE8GCOS7            Bull EBCDIC GCOS7 8-bit West European
EL8GCOS7            Bull EBCDIC GCOS7 8-bit Greek
US8BS2000           Siemens 9750-62 EBCDIC 8-bit American
D8BS2000            Siemens 9750-62 EBCDIC 8-bit German
F8BS2000            Siemens 9750-62 EBCDIC 8-bit French
E8BS2000            Siemens 9750-62 EBCDIC 8-bit Spanish
DK8BS2000           Siemens 9750-62 EBCDIC 8-bit Danish
S8BS2000            Siemens 9750-62 EBCDIC 8-bit Swedish
WE8BS2000           Siemens EBCDIC.DF.04 8-bit West European
CL8BS2000           Siemens EBCDIC.EHC.LC 8-bit Cyrillic
WE8BS2000L5         Siemens EBCDIC.DF.04.L5 8-bit West European/Turkish
WE8DG               DG 8-bit West European
WE8NCR4970          NCR 4970 8-bit West European
WE8ROMAN8           HP Roman8 8-bit West European
EE8MACCE            Mac Client 8-bit Central European
EE8MACCROATIAN      Mac Client 8-bit Croatian
TR8MACTURKISH       Mac Client 8-bit Turkish
IS8MACICELANDIC     Mac Client 8-bit Icelandic
EL8MACGREEK         Mac Client 8-bit Greek
IW8MACHEBREW        Mac Client 8-bit Hebrew
US8ICL              ICL EBCDIC 8-bit American
WE8ICL              ICL EBCDIC 8-bit West European
WE8ISOICLUK         ICL special version ISO8859-1
WE8MACROMAN8        Mac Client 8-bit Extended Roman8 West European
WE8MACROMAN8S       Mac Server 8-bit Extended Roman8 West European
TH8MACTHAI          Mac Client 8-bit Latin/Thai
TH8MACTHAIS         Mac Server 8-bit Latin/Thai
HU8CWI2             Hungarian 8-bit CWI-2
EL8PC437S           IBM-PC Code Page 437 8-bit (Greek modification)
EL8EBCDIC875        EBCDIC Code Page 875 8-bit Greek
EL8PC737            IBM-PC Code Page 737 8-bit Greek/Latin
LT8PC772            IBM-PC Code Page 772 8-bit Lithuanian (Latin/Cyrillic)
LT8PC774            IBM-PC Code Page 774 8-bit Lithuanian (Latin)
EL8PC869            IBM-PC Code Page 869 8-bit Greek/Latin
EL8PC851            IBM-PC Code Page 851 8-bit Greek/Latin
CDN8PC863           IBM-PC Code Page 863 8-bit Canadian French
HU8ABMOD            Hungarian 8-bit Special AB Mod
AR8ASMO8X           ASMO Extended 708 8-bit Latin/Arabic
AR8NAFITHA711       Nafitha Enhanced 711 Server 8-bit Latin/Arabic
AR8SAKHR707         SAKHR 707 Server 8-bit Latin/Arabic
AR8MUSSAD768        Mussa'd Alarabi/2 768 Server 8-bit Latin/Arabic
AR8ADOS710          Arabic MS-DOS 710 Server 8-bit Latin/Arabic
AR8ADOS720          Arabic MS-DOS 720 Server 8-bit Latin/Arabic
AR8APTEC715         APTEC 715 Server 8-bit Latin/Arabic
AR8MSAWIN           MS Windows Code Page 1256 8-bit Latin/Arabic
AR8MSWIN1256        MS Windows Code Page 1256 8-bit Latin/Arabic
AR8NAFITHA721       Nafitha International 721 Server 8-bit Latin/Arabic
AR8SAKHR706         SAKHR 706 Server 8-bit Latin/Arabic
LA8ISO6937          ISO 6937 8-bit Coded Character Set for Text Communication
US8NOOP             No-op character set prohibiting conversions
JA16VMS             JVMS 16-bit Japanese
JA16EUC             EUC 16-bit Japanese
JA16EUCYEN          EUC 16-bit Japanese with '\' mapped to the Japanese yen character
JA16SJIS            Shift-JIS 16-bit Japanese
JA16DBCS            IBM DBCS 16-bit Japanese
JA16SJISYEN         Shift-JIS 16-bit Japanese with '\' mapped to the Japanese yen character
JA16EBCDIC930       IBM DBCS Code Page 290 16-bit Japanese
JA16MACSJIS         Mac client Shift-JIS 16-bit Japanese
KO16KSC5601         KSC5601 16-bit Korean
KO16DBCS            IBM DBCS 16-bit Korean
KO16KSCCS           KSCCS 16-bit Korean
ZHS16CGB231280      CGB2312-80 16-bit Simplified Chinese
ZHS16MACCGB231280   Mac client CGB2312-80 16-bit Simplified Chinese
ZHS16GBK            Windows 95 16-bit PRC version Chinese character set
ZHS16DBCS           EBCDIC 16-bit Simplified Chinese character set
ZHT32EUC            EUC 32-bit Traditional Chinese
ZHT32SOPS           SOPS 32-bit Traditional Chinese
ZHT16DBT            Taiwan Taxation 16-bit Traditional Chinese
ZHT32TRIS           TRIS 32-bit Traditional Chinese
ZHT16DBCS           IBM DBCS 16-bit Traditional Chinese
ZHT16BIG5           BIG5 16-bit Traditional Chinese
ZHT16CCDC           HP CCDC 16-bit Traditional Chinese
AL24UTFFSS          Unicode 1.1 UTF-8 character set
UTF8                Unicode 2.0 UTF-8 character set
JA16EUCFIXED        16-bit Japanese. A fixed-width subset of JA16EUC (contains only the 2-byte characters of JA16EUC). Contains no 7- or 8-bit ASCII characters.
JA16SJISFIXED       SJIS 16-bit Japanese. A fixed-width subset of JA16SJIS (contains only the 2-byte characters of JA16SJIS). Contains no 7- or 8-bit ASCII characters.
JA16DBCSFIXED       16-bit only JA16DBCS. A fixed-width subset of JA16DBCS that has only 16-bit (double-byte character set, DBCS) characters. Contains no 7- or 8-bit ASCII characters.
ZHT32TRISFIXED      TRIS 32-bit Fixed-width Traditional Chinese
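
The difference between a variable-width character set and its fixed-width subset can be illustrated outside the Oracle Server. The following Python sketch is an illustration only: it assumes that Python's euc_jp codec is a reasonable stand-in for JA16EUC (the two are not guaranteed to be identical), and the function names are hypothetical. It shows that an ASCII character occupies one byte while a kanji occupies two, which is why a fixed-width subset such as JA16EUCFIXED contains only the 2-byte characters.

```python
# Illustration only: Python's "euc_jp" codec stands in for Oracle's JA16EUC
# (an assumption; the mappings are not guaranteed to match exactly).

def byte_widths(text: str, codec: str = "euc_jp") -> list[int]:
    """Return the encoded length, in bytes, of each character in text."""
    return [len(ch.encode(codec)) for ch in text]


def is_fixed_width_2byte(text: str, codec: str = "euc_jp") -> bool:
    """Return True if every character in text encodes to exactly two bytes."""
    return all(width == 2 for width in byte_widths(text, codec))


# "A" is ASCII; U+5E74, U+6708 and U+65E5 are the kanji for year, month, day.
print(byte_widths("A\u5e74"))
print(is_fixed_width_2byte("\u5e74\u6708\u65e5"))
print(is_fixed_width_2byte("A\u5e74"))
```

A string that mixes ASCII and kanji fails the fixed-width check, so under these assumptions it could not be represented in a set that has no 7- or 8-bit ASCII characters.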

 

General Concepts for Customized Character Sets

Each Oracle distribution set includes a default set of NLS data objects, some of which can be customized. In particular, you can extend Oracle's character set definition files to add user-defined characters to an existing Oracle character set.

Character set information and encoding are defined in text files. These character set definition files describe a character set in a form that lets a database administrator easily modify an existing character set or create a new one. All characters are defined in terms of Unicode 2.0 code points; that is, each character is defined as a Unicode 2.0 character code value. Conversion between character sets is done by using Unicode as the intermediate form.
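
The idea of converting through Unicode as an intermediate form can be sketched in a few lines of Python. This is not how the Oracle Server is implemented; it is a minimal illustration that assumes Python's cp437 and iso8859_1 codecs correspond roughly to US8PC437 and WE8ISO8859P1, and the convert function is a hypothetical helper.

```python
# Minimal sketch of pivot conversion: source bytes -> Unicode -> target bytes.
# The codec names are Python's, assumed here to approximate US8PC437 and
# WE8ISO8859P1; they are not Oracle identifiers.

def convert(data: bytes, source_codec: str, target_codec: str) -> bytes:
    """Convert bytes between character sets using Unicode as the pivot."""
    text = data.decode(source_codec)   # source bytes -> Unicode code points
    return text.encode(target_codec)   # Unicode code points -> target bytes


# e-acute is byte 0x82 in code page 437 and byte 0xE9 in ISO 8859-1.
converted = convert(b"caf\x82", "cp437", "iso8859_1")
print(converted)
```

Because every character set is described in terms of the same code points, each set needs only one mapping to and from Unicode rather than a separate table for every pair of character sets.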

Once a character set definition file is created, it must be compiled into platform-specific binary files that can be dynamically loaded into memory at runtime. The NLS Data Installation Utility (lxinst), described in Oracle8 Utilities, converts character set definition text files into binary format and merges them into an NLS data object set.

Be aware that this procedure does not ensure any of the following:

For details, see Oracle8 Utilities.

Linguistic Definitions

A linguistic definition specifies the linguistic conventions for a particular language. An extended linguistic definition also covers special linguistic cases for that language. Table 4-8 lists the 63 linguistic definitions supported by the Oracle Server.

Table 4-8 Linguistic Definitions
Basic Name         Extended Name
ARABIC             --
ASCII7             --
BENGALI            --
BULGARIAN          --
CANADIAN FRENCH    --
CATALAN            XCATALAN
CROATIAN           XCROATIAN
CZECH              XCZECH
DANISH             XDANISH
DUTCH              XDUTCH
EEC_EURO           --
ESTONIAN           --
FINNISH            --
FRENCH             XFRENCH
GERMAN             XGERMAN
GERMAN_DIN         XGERMAN_DIN
GREEK              --
HEBREW             --
HUNGARIAN          XHUNGARIAN
ICELANDIC          --
ITALIAN            --
JAPANESE           --
LATIN              --
LATVIAN            --
LITHUANIAN         --
MALAY              --
NORWEGIAN          --
POLISH             --
PUNCTUATION        XPUNCTUATION
ROMANIAN           --
RUSSIAN            --
SLOVAK             XSLOVAK
SLOVENIAN          XSLOVENIAN
SPANISH            XSPANISH
SWEDISH            --
SWISS              XSWISS
THAI_DICTIONARY    --
THAI_TELEPHONE     --
TURKISH            XTURKISH
UKRAINIAN          --
VIETNAMESE         --
WEST_EUROPEAN      XWEST_EUROPEAN
INDONESIAN         --
ARABIC_MATCH       --
ARABIC_ABJ_SORT    --
ARABIC_ABJ_MATCH   --
EEC_EUROPA3        --

Calendar Systems

Table 4-9 lists the calendar systems supported by the Oracle Server.

Table 4-9 NLS Supported Calendars
Name                Abbreviation   Character Set Texts   Default Format
Japanese Imperial   ji             JA16EUC               EEYY"\307\257"MM"\267\356"DD"\306\374"
ROC Official        co             ZHT32EUC              EEyy"\310\241 "mm"\305\314"dd"\305\312"
Thai Buddha         tb             TH8TISASCII           "\307\321\271\267\325\350" dd month EE yyyy
Persian             pg             AR8ASMO8X             DD Month YYYY
Arabic Hijrah       hl             AR8ISO8859P6          DD Month YYYY
English Hijrah      hl             AR8ISO8859P6          DD Month YYYY

Note: By default, the Gregorian system is used.
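
The quoted \nnn sequences in the Default Format column are octal byte values in the character set named in the same row. As a reading aid, the following Python sketch decodes them into the characters they represent. It is an illustration under stated assumptions: the decode_octal_escapes function is a hypothetical helper, and Python's euc_jp codec is assumed to approximate JA16EUC.

```python
import re

# Hypothetical helper: expand runs of \nnn octal escapes in a format string
# into the characters they encode in the given codec. Python's "euc_jp" is
# assumed to approximate Oracle's JA16EUC.

def decode_octal_escapes(fmt: str, codec: str) -> str:
    """Replace each run of \\nnn octal escapes with its decoded text."""

    def decode_run(match: re.Match) -> str:
        octal_values = re.findall(r"\\([0-7]{3})", match.group(0))
        raw = bytes(int(value, 8) for value in octal_values)
        return raw.decode(codec)

    return re.sub(r"(?:\\[0-7]{3})+", decode_run, fmt)


japanese_imperial = r'EEYY"\307\257"MM"\267\356"DD"\306\374"'
print(decode_octal_escapes(japanese_imperial, "euc_jp"))
```

For the Japanese Imperial format, the three quoted literals decode to the kanji for year, month, and day, which follow the era year, month, and day fields respectively.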




Copyright © 1997 Oracle Corporation.

All Rights Reserved.
