Expires 10/22/2001

International Language Bridge (ILB) For Implementing Language Free Services
Mark Felton
draft-felton-universal-language-01.txt

1.1 Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

1.2 Copyright Notice

Temporary Copyright (C) Mark Felton (2001). All Rights Reserved
// Future Copyright (C) The Internet Society (2001). All Rights Reserved.
Mark Felton
markf@scicom.alphacdc.com

1.3 Table Of Contents

   Abstract
   Terminology
   UL Components
   Unicode (Universal Code)
   Fixed Components (FC)
   Type Designator (TD) - Type Element (TE)
   Variable Meaning Function (VM_function)
   Language Forcing (LF)
   Linking Element (LE)
   User Defined Components (UDC)
   Basic Fixed Components
   Pictorial Representations (PR)
   Syntax versus Content
   Syntax Component
   UL Syntax
   Position Dependent Syntax (PDS)
   User Defined Syntax (UDS)
   Degree of Specificity
   Multiple Vocabularies
   Basic Translators (BT) and Filter Translators (FT)
   Some Initial Benefits of UL
   Steps in Creating UL
   Short and Long Term Goals For UL
   Fonts & Sounds
   UL Type Dictionaries (UTD)
   Host Language Interface
   Multi-lingual Vocabulary Issues
   /* New Sections */
   Neutrabet
   Phrase Templates
   Universal Language Parsing
   Java Code

1.4 Abstract

The existence of different languages and cultures creates an enormous barrier for the World Wide Web. While we communicate well internationally using pictures, our communication is limited when we attempt to build semantic bridges. In general, our approach has been to try to force the other guy to learn our language. This has had only limited success. Many people, quite rightly, resent a vision of American English as the only true universal language. In many ways we find science fiction images, like the Star Trek universal language decoder, much more appealing. This is because we know intuitively that language is a central part of culture and should be maintained, not destroyed.

Various solutions have been offered. Probably the most frequently discussed is the concept of translators. In this image of reality, a black box device takes one language and translates it into the other language.
This may include a variety of methods, including:

* text (L1) to text (L2)
* speech (L1) to text (L2)
* text (L1) to speech (L2)
* speech (L1) to speech (L2)

There are also intermediary plans:

* text (L1) to text (L2) to speech (L2)
* speech (L1) to text (L1) to text (L2)
* speech (L1) to text (L1) to text (L2) to speech (L2)

These plans all have one thing in common: the idea that bi-directional translation can occur between the various languages. Unfortunately, this is often not the case. Many concepts in one language do not translate easily to another language. In addition, the syntax of one language may result in poorly formulated sentences when translated across the language barrier.

The most success in this area results when a single human being is used as an intermediary between the two language participants. This translator possesses knowledge of both languages and uses natural linguistic skills to provide approximations of the meaning in the two languages.

Language translation also requires sophisticated knowledge of the two languages. The translator is often fluent in one language (their native language) and semi-fluent in the second language. This means that the translation in one direction will be of higher quality than in the reverse direction. For example, I have more vocabulary available when I translate to English than when I translate from English to one of my secondary languages.

Is there an alternative solution? In this proposal, it is suggested that an intermediary language is needed to support real success in international linguistic communications. The present plan requires our translators to go from English to German; English to Chinese; English to Japanese; English to Vietnamese; then German to English; Chinese to English; etc. If we take the number of languages in the world and assume each will talk to every other language, the formula for the number of required translators is:

   N * (N-1)

where N is the number of languages.
With ten languages, this comes to 90 translators. With 100 languages, this comes to 9900 translators.

How can this be avoided? The solution is to provide a universal language (UL) which acts as an intermediary between each language. Each language must provide a translation to and from the UL. This means that for each language there are two translators, one from UL and one to UL. With 100 languages the requirement is 200 translators rather than 9900.

The use of a UL has a number of immediate advantages. First, the UL can be constructed with a limited number of concepts. Rather than providing all the nuances of every language, the UL is restricted to a limited subset of concepts. These can grow as new needs emerge.

The UL also minimizes the need to know who is going to receive a message. An item can be posted in UL on the World Wide Web. The browser is handed the job of translating from UL to the user's Host Language (HL). The UL to HL translator can be provided as a plug-in, a Java applet or a built-in capability of the browser.

Email has similar advantages. A person wishing to send important information between multiple international facilities can do so without concern about the recipients' languages. Rather than one translation for each language that will receive the message, a single translation to UL will suffice.

1.5 TERMINOLOGY

1. BE - Base Element(s) - This is the Unicode along with the definitions that make up the UL. The Base Elements can include Position Dependent Syntax (PDS) in UL. The Standard Base Elements (SBE) are the UL used by all users. SBE does not include Meaning Components for User Defined Components. Only the UDC place holders are included.

2. FC - Fixed Component(s) - A Unicode component in the UL that has a fixed meaning across languages.

3. G-UL - Generic Universal Language - The base of the UL. The G-UL is distributed to all users along with translations to and from all known HL.

4.
HL - Host Language(s) - The native language which the UL will translate to and from. Host Languages are normally our native languages (English, French, Chinese, Japanese). However, there is nothing that restricts a Host Language. It would be possible to add a made-up Host Language, such as Pig Latin.

5. ILB - International Language Bridge - The name used to designate the collection of all concepts, requirements and other factors relating to the creation, dissemination and use of a Universal Language.

6. Jumbo - A collection of multiple UL Unicode that is used frequently enough by a group that it justifies a UDC as a short cut.

7. LE - Linking Element(s) - Used to force a connection of a TD to a FC. This is needed only where there may be confusion about which FC is being modified by the TD. LE should rarely be needed, because the UL syntax will normally identify the required logic.

8. LF - Language Forcing - A method to inline a specific HL. A LF will force the translator to use a specific language whether or not it is the user's HL.

9. MC - Meaning Component(s) - This is the global meaning that must be conveyed by a UL Group.

10. MV - Multiple Vocabularies - A method of providing a Core Vocabulary (CV) along with Secondary Vocabularies (SV) to support quicker downloading and end user specialization. The CV is distributed to all users. The SV are provided only as needed. MV may overlap, i.e. the same Unicode may appear in more than one MV. However, where there is overlap, the Meaning Component (MC) must be the same wherever the Unicode is the same. This differs from UDC.

11. PR - Pictorial Representation(s) - A method of creating visual clues to the meaning of a UL Unicode.

12. SBE - Standard Base Element(s) - UL without UDC Meaning Components.

13. SC - Syntax Component(s) - A UL-G that is used to convey syntactic meaning, e.g. this is a question.

14.
TD - Type Designator(s) - A Unicode component in the UL that is always followed by a variable field. TD are used for specific information with a range, e.g. tall or short; amount of money; distance.

15. TE - Type Element(s) - Associated with a TD. The TE is the variable quantity relative to the specific TD. Multiple TE can be associated with a single TD.

16. UDC - User Defined Component(s) - The UDC allows a group of users to customize the UL. This allows the core UL to be restricted. A Generic UL provides the base construct. The UDC adds the specialization. UDC are provided for each of the types available in UL.

17. UL - Universal Language - The general language used as an intermediary between all Host Languages.

18. UL-G - Universal Language Group(s) - A group of one or more UL Unicode that builds a Meaning Component. Many UL-Gs require only a single UL Unicode. A UL-G that is frequently used and which is created from multiple Unicode may become a Jumbo Group. When this happens, it is a prime candidate for grouping into a UDC.

19. UL-Phrase - A UL Phrase is a group of UL-G that combine to create enough meaning to support all HL translators. Depending on the context, a UL-Phrase may be as little as a single Unicode or may be made up of a string of UL-Gs.

20. Unicode - Universal Code - A double byte code that is already supported by languages such as Java. Unicode provides a universal method for representing any language, native or other, on the World Wide Web and in programs. Unicode is a recognized standard.

21. VM_function - Variable Meaning Function(s) - A VM function is used in conjunction with a Type Designator. It provides a HL specific way to create the variable meanings from the range of Unicode.

1.5A New In Release 0.1

22. Language Parsing - The rules used to go between UL and NL.

23. Neutrabet - An abstract alphabet developed for creating words in the Universal Language.

24.
Phrase Template - A method of determining what type of phrase is to be constructed from the UL when going to the NL or vice versa.

1.6 UL Components

Unicode (Universal Code)

The Unicode standard allows 16 bit binary code to represent a language. A language range in the Unicode standard is used to signify which language is being represented. There are numerous expansion ranges in the Unicode standard. UL can easily be fitted into this expansion.

For the purpose of this document Unicode references are shown as xx##, e.g. xx01 or xx07. The xx represents the language range identifier, which is not presently specified for UL. The ## represents the specific number used to identify a specific member of the UL Unicode.

While it is strongly suggested that UL eventually be added to Unicode, this is not a necessary condition for beginning. One can easily envision a MIME type or other method to identify UL in a document. If this were the case, UL could overlap Unicode. This means that the initial development can be done in any integer range.

The Unicode source is located at: http://www.unicode.org/

The required PDF files are at: http://www.unicode.org/charts/

1.7 Fixed Components (FC)

A fixed component is one which has the same Meaning Component (MC) in all contexts. A meaning component is not tied to a language. For example, "food" is a meaning component. It translates differently into different languages, but it has the same meaning across the languages.

While it may seem logical to equate Meaning Components with nouns in the English language, this is not necessarily the case. For example "send an email" might be a useful MC. Other examples are:

* call me by phone
* available for a conference call

An FC should be broad. Specificity must be defined through UDC. So "computer" might make sense as an FC; "Dell Computer" would not.

Certain parts of language are not needed in UL. For example, words like "the", "a" or "an" are not needed. These must be added by the HL translator.
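As a rough illustration of the idea that a Meaning Component is not tied to a language, the following Python sketch looks up one Fixed Component in several Host Languages. The code values (xx01, xx02), the language tags and the name translate_fc are assumptions made for this example only; this document does not assign any of them.

```python
# Hypothetical FC dictionary: UL code -> {host language tag -> HL word}.
# The codes and entries are illustrative and are not assigned by this memo.
FIXED_COMPONENTS = {
    "xx01": {"en": "food", "es": "comida", "fr": "nourriture"},
    "xx02": {"en": "computer", "es": "computadora", "fr": "ordinateur"},
}


def translate_fc(code, host_language):
    """Return the HL word for one Fixed Component code.

    Articles such as "the" or "a" are deliberately absent here; the
    HL translator is expected to add them when it builds a sentence.
    """
    try:
        return FIXED_COMPONENTS[code][host_language]
    except KeyError:
        raise KeyError(f"no {host_language!r} entry for FC {code!r}") from None


if __name__ == "__main__":
    # The same code yields a different word in each Host Language.
    for hl in ("en", "es", "fr"):
        print(hl, translate_fc("xx01", hl))
```

The table is keyed by the UL code rather than by any one language's word, which is the property that lets a single posted item be rendered in every Host Language.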
There are some FCs that are needed but should only be available in a single form. For example, negation uses multiple words in many languages. The rules for UL must be clear and universal with respect to negation. Questions are another area where languages may vary significantly. A question is created with a Syntax Component (SC).

1.8 Type Designator (TD) - Type Element (TE)

A type designator is used to signal that the next item is of a particular type. Typical type designators might include:

* Money/Quantity
* Size/Unit
* Emotional Component
* Proper Noun
* Address
* Phone Number
* Temporal/Past Present Future

A Type Designator is followed by a piece of information called the Type Element. The TE may vary across an extremely large range. For example, people's names may vary enormously. The UL to HL translator must provide a method of translating the Variable Information (VI) that follows the TD.

For some instances, this is extremely simple. For example, a unit conversion from HL dollars to UL monetary units could exist for each language and monetary unit. The same is true of phone numbers, which could translate from English numbers to UL numbers. A Chinese client would receive the UL numbers and translate them into Chinese numbers.

Each type designator will also include a NULL value option. This will be used to indicate that the value is either unavailable in the HL that created the content, or unspecified in this context. The UL to HL translator will provide a reasonable translation for the NULL value.

It is worth noting that UL provides a special TE to support verb tense or temporal relationship. When the temporal TE is used, there may be two or more TE used in series. This allows verb conjugation to be handled with minimal UL Unicode. A verb (TD) might have two (or more) TE associated with it. This allows "run, ran, will run, has run, etc." to all be done with a TD-TE.
In addition, a second TE (TD-TE-TE) will change the above to "walk, walked, has walked, will walk, etc.". The generic TD is the state of motion of a person. The conjugation TE changes the temporal aspects, while the relative TE changes the state from run to walk. With this strategy, numerous English language words are produced with only three Unicode. This should also be true for other languages. The critical part of UL becomes the identification of the fundamental Meaning Components (MC).

1.9 Variable Meaning Function (VM_function)

A Variable Meaning Function is a language based rule for adding the variable meaning associated with the TE part of the TD-TE pair. Typical examples of VM_function(s) are:

* Numbers - a method of producing a Host Language (HL) specific interpretation of the numeric values from the TE.
* Direction - a method of producing a HL specific interpretation of direction, e.g. north, south, up, down, etc.
* Emotion - a method of producing a HL specific interpretation of emotional state, e.g. good, bad, terrific, etc.
* Temperature - a method of producing a HL specific interpretation of temperature, e.g. hot, cold, freezing, etc.

2.0 Language Forcing (LF)

In some instances language overriding might be needed. For example, a monetary web site might want all transactions represented in dollars even when the surrounding information is translated.

2.1 Linking Element (LE)

A linking element is used to connect a type designator (TD) to a fixed component (FC). LE are used when confusion can result as to which FC is being modified. It is unclear at this time if Linking Elements are really needed. A properly designed language syntax may remove the need for LE.

2.2 User Defined Components (UDC)

User defined components are placed in a local vocabulary located on the client. For example, Gates Rubber might use the word "rubber" frequently in emails, while Coca Cola might use "beverage". The UDC are defined prior to communications.
In the case of a Web page, it would be the job of the Web page provider to produce a list of the UDC used on the Web page. These could be downloadable either prior to access or during access. The client would announce its language to the server so that the correct subset of UDC would be provided.

A UDC may be a full concept rather than a single word. Returning to Coca Cola as a potential user, the concept "sell Coca Cola" could be a single UDC. With several fixed components it would be possible to get:

* Do we have a good ad to sell Coca Cola?
* Can you sell Coca Cola in China?
* What do we need to do to increase our ability to sell Coca Cola?

2.3 Basic Fixed Components

The basic fixed components must be selected to provide the most common content used in email and Web pages. We begin by providing general categories that should be universal across languages.

* Objects - Stationary things in our world. These are in general what the English language calls nouns, but not all nouns are objects. For example, in the sentence "The love that I feel.", the word "love" is a noun, but it is not an object.

An example rule would be that all type designators must precede the basic element they modify. On a computer screen this would allow "large ball" to map out a space prior to placing the ball on the screen. The same is true for "right ball" (ball to the right). The difficulty arises with "red ball". In this instance the ball needs to be placed on the screen before the red is added. It is unclear whether link elements should be used here or store and wait logic in the language translators.

2.6 Syntax Component

In UL, a Syntax Component (SC) is a Unicode used to convey syntactical meaning in a statement. The most clearly defined SC is the question component. An SC->Question will make a UL statement into a question. The location of SCs will be a defined part of the UL language.

UL will not contain some of the normal syntax found in languages. For example, there will not be commas.
A period SC may be needed to identify completion of a UL sentence. This is an open issue.

Since UL can be embedded in other languages, e.g. HTML, it will allow for syntax transitions from outside the UL language. For example, an HTML table of UL information could be sent via email.

2.7 UL Syntax

The language of UL will have strict syntax rules. The basic unit is Unicode. A group of Unicode will produce a phrase. Each phrase creates a phrase meaning, i.e. it should provide enough information to create an acceptable statement in any HL. The set of all phrases creates a message. It is also possible to have multiple messages provided the messages are embedded in another language, e.g. multiple messages in UL embedded in HTML.

2.8 Position Dependent Syntax (PDS)

Position Dependent Syntax (PDS) allows UL to use the same Unicode to produce different Meaning Components. This occurs when the Unicode is a TD-TE combination. For example, the same Unicode can be used for degree of heat and cold as are used for amount of money. In the former case there is a temperature TD, while in the latter there is a money TD. Both are followed by a Unicode with a possible range of values.

TD-TE elements will be created to provide a reasonable range of values. Where more resolution is needed, the User Definable Components (UDC) will be used. Linking Elements can also provide support for these specialized applications.

2.9 User Defined Syntax (UDS)

The availability of User Definable Components (UDC) allows a specialized need to define a syntax independent of UL. These applications will be allowed but will not be supported by the UL development team.

3.0 Degree of Specificity

Languages are subtle by nature. They allow us to express things in a rich variety of ways. As stated earlier, the UL should be simple, forcing the translators to add richness to the translations. As a simple example, "run, walk and stand" are three different words in the English language.
But in UL they could be expressed as three states of a person in motion. The first is motion plus large magnitude. The second is motion plus minimum magnitude. The third is motion plus zero magnitude. If we add to this a Unicode for road, a number of sentences can be created:

* walking on the road
* running on the road
* standing on the road
* on the road
* moving down the road

Of course, if moving on roads was a common part of a group's communications, it would be possible to create a User Defined Component (UDC) to represent movement on a road.

This brings out an important point. A UDC can be created by transferring a language specific definition to the end users, or it can be created by combining identifiable UL components. In the latter case, the UDC is referred to as a Jumbo UDC.

   Case 1: UDC#1 => HL#1 -> "text in host language #1 here"
         : UDC#1 => HL#2 -> "text in host language #2 here"
   Case 2: UDC#2 => UL -> xx01 xx03 xx17 ... (sequence of UL codes produces a jumbo)

3.1 Multiple Vocabularies

UL should support Multiple Vocabularies (MV). Rather than having every UL to HL translator support every possible UL Unicode, it should be possible to define sub-vocabularies that are application specific. For example, there could be a set of highly common codes. Then there could be a second set of business codes and a third set of sports codes. A person wishing to send an email about business would load the common codes and the business codes. For sports discussions, load the common codes and the sports codes. For a business that provides sports equipment, load all three. When possible, the list of codes to load can be sent along with the message.

MV will consist of a Core Vocabulary (CV) and many Secondary Vocabularies (SV). There is nothing to stop more than one MV from containing the same UL-G. However, where there is overlap between MV, the Meaning Component (MC) must be the same for the overlapping elements. Only UDC (User Defined Components) can have a different Meaning Component for the same Unicode.
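The overlap rule for Multiple Vocabularies can be sketched in Python as a merge that refuses conflicting definitions. This is only a sketch under stated assumptions: the vocabulary contents, the code values and the names load_vocabularies and VocabularyConflict are hypothetical and are not defined by this document.

```python
class VocabularyConflict(ValueError):
    """Raised when two vocabularies give one UL code different meanings."""


def load_vocabularies(*vocabularies):
    """Merge a Core Vocabulary with any number of Secondary Vocabularies.

    Each vocabulary maps a UL code to its Meaning Component. Overlap is
    allowed only when the overlapping code carries the same meaning.
    """
    merged = {}
    for vocabulary in vocabularies:
        for code, meaning in vocabulary.items():
            if code in merged and merged[code] != meaning:
                raise VocabularyConflict(
                    f"{code}: {merged[code]!r} conflicts with {meaning!r}"
                )
            merged[code] = meaning
    return merged


# Hypothetical vocabularies; the codes are illustrative only.
CORE = {"xx01": "food", "xx02": "person"}
BUSINESS = {"xx01": "food", "xx40": "sell"}  # xx01 overlaps, same meaning
SPORTS = {"xx60": "ball"}

if __name__ == "__main__":
    # A business that provides sports equipment loads all three sets.
    print(load_vocabularies(CORE, BUSINESS, SPORTS))
```

A translator that loads vocabularies this way can accept any combination of CV and SV without checking which set a given code came from.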
UDC are not a part of the Multiple Vocabularies. They are an independent area of the UL that is fixed in size and available at all times to the users and translators.

3.2 Basic Translators (BT) and Filter Translators (FT)

A Basic Translator (BT) takes content to or from a HL (Host Language) and translates to or from UL. A Basic Translator will have no knowledge of embedding in other languages.

A Filter Translator (FT) will include knowledge of other languages in addition to UL. Some possible FT are:

* EMAIL FT - Leaves in place the email headers needed for transfer. Replaces other content, either UL-HL or HL-UL.
* HTML FT - Leaves in place HTML identifiers. Replaces other content, either UL-HL or HL-UL.
* VIDEO TEXT FT - Text output available along with TV is now common practice. It is used for deaf people, elderly people or to support noisy environments (e.g. aerobics classes). Using UL, it would be possible to support language independence. The Video Text would be available in multiple languages.
* MOVIE TEXT FT - Movies require translators to provide subtitles in different languages. A Movie Text FT could allow a single HL, e.g. English, to be translated through UL to other languages. This would allow wider movie distribution around the world.

With a properly created translator (TR), a person should be able to take an email in the HL, put it through the HL to UL email filter, then send it to a foreign recipient. The far end recipient would then use a UL-HL Filter Translator (FT) to go from UL to their native host language.

3.3 Some Initial Benefits of UL

* With UL a multi-language Web site can be provided with a single UL site.
* UL provides a more user friendly interface for international email.
* With UL it is possible to take an existing Web site in any HL and run it through an HL to UL filter for language #1. It can then be passed through a second filter, UL to HL for language #2. The Web page will then appear in the user's native language.
* With UL it is possible to take an existing email in any HL and run it through an HL to UL filter for language #1. It can then be passed through a second filter, UL to HL for language #2. The email will then appear in the user's native language.

3.4 Steps in Creating UL

1. Identification of high level objects - This is being worked on in the present document.

2. Identification of HL language expectations - It may not be possible to do all languages for a prototype. It should be possible to do a subset. This subset should include languages from each of the major continents. For example, a good subset might be English, Chinese, Japanese, Russian, Spanish, Arabic. It is important that the initial work not focus only on European languages, because they have a common syntactic structure, mostly use the Latin alphabet and have numerous other commonalities.

3. Architecture of the UL language. This will require a team of linguistic and computer experts. The goal will be to conceptualize the structure needed to provide a common international base. The architectural analysis will also try to identify the best locus for initial prototyping. For example, should it be browser based, server based, or integrated into HTTP? Should it be developed as a Java API or as a browser plug-in? Will it initially be a runtime component or less interactive? These and other questions can be worked by the software segment of the architecture team. The linguistic team will focus on aspects specific to the language. How will negation operate? Where will a question operator reside? What constitutes a Meaning Component across the various languages? How will base and filter translators operate? What is the core UL? What types of vocabularies should be supported?

4. Identification of the Base Elements (BE). The basic core language will need to be identified. Next there will be a need to prototype it and determine how well it translates between different languages.
While the original work can be done with a limited expectation, at some point there will be a need to determine how easily it expands. Finally, the specialization stages using UDC need to be tested.

3.5 Short and Long Term Goals For UL

Here are some of the possible short and long term uses for a Universal Language:

1. Email - Email is the most frequently used method of communication on the Internet. The number of emails sent daily has multiplied at an astronomical rate. As global markets continue to grow, a method of rapid translation of email content will greatly expand our global commitments.

2. Web Pages - HTML content has now become an enormous asset for providing information to the public. The ability to provide multilingual content through a single translation mechanism using UL will greatly expand the dissemination of content. This can have important advantages for global responsibility. It can mean that international programs in space exploration, telecommunications, medicine, etc. can share content without language constraints.

3. Firewall Content - Many multinational companies now provide information behind the safety of firewalls. Since they often use standard mechanisms available on the World Wide Web, the use of UL should be invaluable to their private international networking.

4. Other Document Formats - The basic mechanisms provided by UL should be expandable to other document formats. These can be public formats, e.g. PDF and Word, or more private formats such as FrameMaker. The UL translation mechanisms would work equally well with attachable documents as with standard interchange mechanisms.

5. Computer Languages - UL can provide significant gains for computer languages attempting to build language independence. UL is built on Unicode. This code has been developed to provide multiple language capability. Java is one example of a language that has done extensive work on Unicode usage.

6.
Intermediary For Text to Speech and Speech to Text - While UL is a Unicode textual language, it can be used as an intermediary for the transition from one language to another language in any format. This includes speech, Braille or any other media used to communicate language content. As the state of the art continues to grow, UL could potentially serve as the mechanism for real time audio translation. This means it could be used in numerous disciplines, including the replacement of movie subtitles, translation of public speeches, etc. While this is clearly a long term goal, the objective is not unreasonable. Setting the stage for future growth in language translation will clearly serve many potential growth areas.

3.6 Fonts & Sounds

The Universal Language has no external representation. It exists only as Unicode. Unlike other languages that require character representation, e.g. Kanji, UL is an intermediate language. UL has no sound representation. Again this differs from other languages, where pronunciation and confusion due to accents are a critical concern.

The fact that UL exists only as an intermediate language has significant advantages. Translation rules need only apply to meaning. By providing strict syntactic rules for UL, translations will carry minimal risk of misinterpretation. The primary difficulty will be in determining the linkage from UL to the Native Languages.

3.7 UL Type Dictionaries (UTD)

Each of the Unicode types in UL is grouped together into a UL Type Dictionary. The UTD required to support UL are:

* UTD.FC - The Fixed Component dictionary consisting of all Fixed Component objects in UL.
* UTD.TD - The Type Designator dictionary consisting of all Type Designator objects in UL.
* UTD.LE - The Linking Element dictionary consisting of all Linking Element objects in UL.
* UTD.UDC - The User Defined Component dictionary consisting of all User Defined Component objects in UL.
* UTD.UL-G - The Universal Language Group dictionary consisting of all Universal Language Group objects in UL.
* UTD.LF - The Language Forcing dictionary consisting of all Language Forcing objects in UL.
* UTD.SC - The Syntax Component dictionary consisting of all Syntax Component objects in UL.
* UTD.Jumbo - The Jumbo dictionary consisting of all Jumbo objects in UL.

3.8 Host Language Interface

The Unicode interpretation of Host Languages will be used to provide a common language base. For example, the English equivalent of "cow" is Unicode XX XX XX. The same meaning in Chinese is "xiahu", which is Unicode XX XX. Note that the English cow takes three Unicode and the Chinese requires only two, while UL requires only one. All three require that the Unicode fall in the range for the specified language, i.e. HL or UL.

3.9 Multi-lingual Vocabulary Issues

A serious issue for UL is the amount of vocabulary required and the nature of Host Language vocabulary. The number of words in the English language is enormous. Other languages have similarly large vocabularies. While much of the vocabulary is common across the languages, e.g. many nouns, there are places where vocabulary will result in high degrees of complexity. Some examples of problem areas are:

* Words which have multiple meanings in one language but only a single meaning in a second language. For example, "HOT" is an English word for a temperature state and, in food, for a spicy state. In Spanish, these are two separate words.
* Words which may have other nuances in one language that do not exist in another language. In Chinese, there are different words for "first son" versus "second or greater son". The "first son" has special responsibilities that do not exist in other cultures. Other relatives are also viewed differently than among European cultures.
* Words which are taken from other languages. The word "email" is used by other cultures. However, it is frequently pronounced using the characteristics of the host language.
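The first problem area above can be made concrete with a small Python sketch. It assumes two separate Meaning Components, one for the temperature state and one for the spicy state, with hypothetical codes xx10 and xx11; the table and the name candidate_codes are illustrative only. The Spanish words used are "caliente" (temperature) and "picante" (spicy).

```python
# Hypothetical Meaning Components: UL code -> {host language tag -> HL word}.
MEANING_COMPONENTS = {
    "xx10": {"en": "hot", "es": "caliente"},  # high temperature
    "xx11": {"en": "hot", "es": "picante"},   # spicy food
}


def candidate_codes(word, host_language):
    """Return every UL code whose entry for the given HL matches the word."""
    return sorted(
        code
        for code, words in MEANING_COMPONENTS.items()
        if words.get(host_language) == word
    )


if __name__ == "__main__":
    # English "hot" is ambiguous: the HL to UL translator must choose.
    print(candidate_codes("hot", "en"))
    # Each Spanish word selects exactly one Meaning Component.
    print(candidate_codes("picante", "es"))
```

In this sketch the UL to HL direction is always a single lookup, while the HL to UL direction may return several candidates that the translator has to resolve from context.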
A number of tools are available in UL to address these issues. These include:

* UDC - User Defined Components - Allowing areas where languages significantly diverge to be covered on a case by case basis.
* LF - the use of Language Forcing - Allowing an HL word to be forced into use, e.g. Sputnik.
* TD-TE - Allowing relative aspects of a language to be included, e.g. the various Chinese relationships could include TE values. A more complex Host Language output would be needed in English to fully explain what is meant.

4.0 Neutrabet

In order to form Meaning Components in the Universal Language, a Neutrabet similar to an alphabet will be developed. The Neutrabet will have the following characteristics:

1. A termination Unicode, temporarily designated as XX00. Usage of the termination will be explained shortly.

2. Ranges for different types of Meaning Codes. As an initial example, consider concrete objects. These may be represented by N Unicode. The value of N is calculated using the following formula.

   Let S_1 = the subset of Unicode used to create all concrete objects.
   Let N_1 = the number of concrete objects to be represented.
   Let n_1 = the number of Unicode required to represent N_1 + 1 (XX00).

   Then

      N_1 <= SUM for x = 1 to n_1 of (n_1)**x

   where (n_1)**x is the number of permutations of n_1 Unicode taken x at a time with replacement.

   Here is a table of progression:

      n_1    N_1
      1      1**1 = 1
      2      2**1 + 2**2 = 2 + 4 = 6
      3      3**1 + 3**2 + 3**3 = 3 + 9 + 27 = 39
      4      4**1 + 4**2 + 4**3 + 4**4 = 4 + 16 + 64 + 256 = 340
      10     11,111,111,110

Ten characters may well be adequate to represent all concrete objects that are of interest. The reason that these numbers go up so astronomically with the Neutrabet is the absence of any need to be aware of sounds or readability. The Neutrabet is purely abstract.
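The progression in the table can be reproduced with a short Java sketch. It assumes the reading of the formula used in the table, namely that n Unicode yield the sum of n**x for x from 1 to n. The class and method names are hypothetical and are not part of the prototype in section 7.

```java
import java.math.BigInteger;

// Illustrative sketch only: class and method names are hypothetical.
class NeutrabetCapacity {
    // Number of distinct strings of length 1..n over n Unicode,
    // i.e. the sum of n**x for x = 1..n.
    static BigInteger capacity(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        BigInteger base = BigInteger.valueOf(n);
        BigInteger power = BigInteger.ONE;
        BigInteger total = BigInteger.ZERO;
        for (int x = 1; x <= n; x++) {
            power = power.multiply(base); // power is now n**x
            total = total.add(power);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 1, 2, 3, 4, 10 }) {
            System.out.println(n + "  " + capacity(n));
        }
    }
}
```

BigInteger is used so that the same method keeps working for larger n, where the sum quickly exceeds the range of a long.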
If we think in terms of the first three letters of the English alphabet, remembering the use of the termination character, these would be the allowable Meaning Concepts:

   A B C
   AA AB AC BA BB BC CA CB CC
   AAA AAB AAC ABA ABB ABC ACA ACB ACC
   BAA BAB BAC BBA BBB BBC BCA BCB BCC
   CAA CAB CAC CBA CBB CBC CCA CCB CCC

The use of the End Character ( XX00 )

As has been pointed out by others, the Neutrabet can be formed in other ways, e.g. 10 things taken 11 at a time. The example is merely done to illustrate how few Unicode would be needed to create a full Universal Language.

5.0 Phrase Templates

In order to implement the Universal Language, the concept of a Phrase Template (PT) is introduced. The idea of a PT is to have strict linguistic rules for transformation from UL to HLs. The Phrase Template states the order in which the HL should be unwrapped by the Language Parser to form the HL or vice versa.

Here are some basic Phrase Templates. This discussion is English based; hopefully there are similar templates in other languages.

* Simple Multi-word Phrase
  examples: hello, goodbye, what, where
* Subject Verb Object (basic)
  examples: I go store
* Basic Question
  examples: what's that? who's hungry?

UL development requires that Phrase Templates be provided that can map from UL to NL and vice versa. Phrase Templates allow UL to avoid the concern of the infinite variety available in languages. The templates provide restrictions on the types of meaning transfers that can be done. Obviously a goal for UL is to have the Phrase Templates become all-inclusive at some point. But this does not have to be the case to get started.

6.0 Universal Language Parsing

The two primary types of parsing are UL to HL and HL to UL. These need to be examined separately.

6.1 UL To HL Parsing

This is an iterative process consisting of the following steps:

1. Retrieve the UL Phrase Template. This is the first part of the UL phrase.
2.
Determine, from the Host Language, the HL Phrase Template that will transform the UL into HL.
3. Retrieve the content UL Unicode.
4. Map it from the UL Phrase Template to the HL Phrase Template using the UL Phrase Template to HL Phrase Template mapping rules.

Here is a simple example:

1. UL Phrase Template is a simple multi-word phrase.
2. HL Phrase Template is a one-to-one translation from UL content to HL content.
3. UL content is a greeting.
4. HL content English is "Hello"
   HL content Spanish is "Hola"
   HL content Chinese is "Ni Hao Mah"
   HL content French is "bonjour"

Obviously this is a simple example, but it illustrates the basic principle.

7.0 Java Prototype

The following code is meant to be an initial version of the Universal Language. It is provided here to start the process of developing the Universal Language.

7.1 Ordering Of Meaning Components

In the Java code below, the vocabulary is ordered alphabetically based on English. This is an absolutely incorrect representation. It is done with major apologies. At some point a more meaningful order needs to be established. This would focus on universal characteristics across languages. Here are some possible ordering principles:

1. Level of concreteness over time, e.g. ball is more concrete than audience since the latter requires more context.
2. Visible size and mobility - Here ball becomes ordered before star because, while both are visible, the ball can be moved. Note that atom, which is extremely small, falls after both. So size does not mean larger or smaller.
3. Complexity of context, e.g. audience precedes botanist since the latter requires long-term context based on years of study of a specific discipline.
4. Similarity of meaning, e.g. ball and sphere might be grouped together.

Note that all these ordering principles will have validity across languages.
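Separately from the prototype files that follow, the four UL to HL steps of section 6.1 and its greeting example can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the draft's parser: the class name, the template code 1, the meaning code 100 and the string keys for the Host Languages are all hypothetical. Only the four greeting words are taken from the example in section 6.1.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: names and code values are hypothetical.
class UlToHlSketch {
    static final int PT_SIMPLE = 1;     // assumed code: simple phrase template
    static final int MC_GREETING = 100; // assumed meaning code for a greeting

    // Host Language name -> (meaning code -> HL word)
    static final Map<String, Map<Integer, String>> HOST = new HashMap<>();
    static {
        put("English", MC_GREETING, "Hello");
        put("Spanish", MC_GREETING, "Hola");
        put("Chinese", MC_GREETING, "Ni Hao Mah");
        put("French", MC_GREETING, "bonjour");
    }

    static void put(String language, int code, String word) {
        HOST.computeIfAbsent(language, k -> new HashMap<>()).put(code, word);
    }

    static String parse(int[] ul, String hostLanguage) {
        if (ul.length < 2) {
            throw new IllegalArgumentException("need a template and content");
        }
        int template = ul[0];                 // step 1: UL Phrase Template
        if (template != PT_SIMPLE) {          // step 2: only the one-to-one
            throw new UnsupportedOperationException("template " + template);
        }                                     //         HL template is sketched
        Map<Integer, String> words = HOST.get(hostLanguage);
        if (words == null) {
            throw new IllegalArgumentException("unknown HL " + hostLanguage);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < ul.length; i++) { // step 3: content UL Unicode
            String word = words.get(ul[i]);   // step 4: one-to-one mapping
            if (word == null) {
                throw new IllegalArgumentException("no word for " + ul[i]);
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] greeting = { PT_SIMPLE, MC_GREETING };
        for (String hl : new String[] { "English", "Spanish", "Chinese", "French" }) {
            System.out.println(hl + ": " + parse(greeting, hl));
        }
    }
}
```

Templates other than the simple one-to-one case would need real mapping rules in step 4, such as reordering subject, verb and object; the sketch deliberately rejects them instead of guessing.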
/************** compile using javac compiler ********/
/*
 * To test initial java code
 *   Separate files by searching for Filename: and End:
 *   compile all code using javac compiler
 *   run
 *     java phrase
 */

/* Filename: phrase.java
   javac phrase.java
*/
class phrase extends dictionary
{
    // The first element of ucode carries the Phrase Template code.
    // Only the basic template (1) is recognized so far.
    phrase(int[] ucode)
    {
        if (ucode.length > 0 && ucode[0] == phrase_templateINTF.basic) {
            phrase_template = phrase_templateINTF.basic;
        }
    }

    public int get_pt() { return phrase_template; }

    public static void main( String[] args )
    {
        int[] ucode_tst = { 1, 2, 3 };
        phrase p = new phrase( ucode_tst );
        System.out.println("PT is " + p.get_pt());
        System.out.println("word is " + p.get(0));
    }

    private int phrase_template = 0;   // 0 means no template recognized
}
/* End: phrase.java */

/* Filename: phrase_templateINTF.java
   javac phrase_templateINTF.java
*/
interface phrase_templateINTF
{
    final int basic = 1;
    final int subj_obj = 2;
}
/* End: phrase_templateINTF.java */

/* Filename: dictionary.java
   javac dictionary.java
*/
import java.util.Vector;
// The mc_*_INTF interfaces are in the same unnamed package and need no
// import; current javac rejects an import from the unnamed package.

class concrete_word
{
    void add(char[][] unicode) { cw = unicode; }

    // Return entry number "which" of the vocabulary as a String
    String get( int which )
    {
        return new String( cw[which], 0, cw[which].length );
    }

    char cw[][];
    int count;
}

class concrete_dict
{
    public void add(char[][] concrete) { dict.addElement( concrete ); }

    private Vector<char[][]> dict = new Vector<char[][]>();
}

class dictionary implements mc_chinese_INTF, mc_spanish_INTF, mc_english_INTF
{
    concrete_word cw;
    concrete_dict dw;

    dictionary()
    {
        cw = new concrete_word();
        dw = new concrete_dict();
        // w1 is declared by all three interfaces, so it must be qualified
        cw.add(mc_english_INTF.w1);
    }

    String get( int which ) { return cw.get(which); }

    public static void main( String[] args )
    {
        dictionary t = new dictionary();
        // Test
        System.out.println( "Unicode is " + t.get(0));
        System.out.println( "Unicode is " + t.get(1));
    }
}
/* End: dictionary.java */

/* filename: mc_chinese_INTF.java
   javac mc_chinese_INTF.java
   This file is only partially complete.
   Unicodes with zero have not yet been located.
Some of the others are only best approximations from the ones located in CJK 4E00-9FAF The unicode source is located at: http://www.unicode.org/ The required PDF files are at: http://www.unicode.org/charts/ */ interface mc_chinese_INTF { final char[][] w1 = { { '\u7a7a', '\u6c14' }, // air { '\u900b', '\u900d' }, // aisle { '\u500c', '\u5929', '\u7bc0' }, // albatros { '\u682e', '\u0069', '\u5405' }, // album { '\u6d12', '\u682f' }, // alcohol { '\u57fe', '\u0069', '\u0072', '\u0071' }, // alcove { '\u5515', '\u6e6d' }, // ale { '\u4ee3', '\u6573' }, // algebra { '\u0063', '\u0069', '\u0072', '\u0071' }, // alien { '\u0063', '\u0069', '\u0072', '\u0071' }, // alimony { '\u78e9' }, // alkali { '\u77ed', '\u543b' }, // aligator { '\u5408', '\u51ce' }, // alloy { '\u5386' }, // almanac { '\u674f', '\u6838' }, // almond { '\u9ad8', '\u5c71' }, // alp { '\u5b57', '\u0000', '\u5700' }, // alphabet { '\u0063', '\u575b' }, // altar { '\u0000' }, // aluminum { '\u0000', '\u0000', '\u0000', '\u0000' }, // amazon { '\u5927', '\u4f52' }, // ambasador { '\u0000', '\u62a4' }, // ambulance { '\u0000', '\u6676' }, // amethyst { '\u5b89', '\u57f9' }, // ammeter { '\u6c26', '\u6729' }, // ammonia { '\u519b', '\u706b' }, // ammunition { '\u0000', '\u0000' }, // amoeba { '\u5b89', '\u57f9' }, // ampere { '\u0000', '\u0000', '\u0000', '\u0000' }, // amphibian { '\u0000', '\u0000', '\u0000', '\u0000' }, // ampitheatre { '\u87d2', '\u867c' }, // anaconda { '\u0000', '\u0000', '\u0000', '\u0000' }, // anaesthesia { '\u5929' }, // angel { '\u752a' }, // angle { '\u52a8', '\u60da' }, // animal { '\u8e19' }, // ankle { '\u5e74', '\u91d2' }, // annuity { '\u8862', '\u868a' }, // ant { '\u0000', '\u0000', '\u0000', '\u0000' }, // antarctic { '\u0000', '\u0000', '\u0000', '\u0000' }, // antelope { '\u5929', '\u7ebf' }, // antenna { '\u0000', '\u0000', '\u0000', '\u0000' }, // anthem { '\u8bd7', '\u96c6' }, // anthology { '\u0063', '\u0069', '\u0072', '\u0071' }, // anthropoid { '\u4eba', 
'\u5940', '\u5b66' }, // anthropology { '\u62ad', '\u83cc' }, // antibiotic { '\u62ad', '\u4f53' }, // antibody { '\u62ad', '\u51bb', '\u5242' }, // antifreeze { '\u53e4', '\u5743' }, // antique { '\u961e', '\u5ed4', '\u5242' }, // antiseptic { '\u5ed8', '\u752a' }, // antler { '\u0000', '\u0000', '\u0000', '\u0000' }, // anus { '\u0063', '\u0069', '\u0072', '\u0071' }, // anvil { '\u738b', '\u0000', '\u6629' }, // aorta { '\u0000', '\u0000', '\u0000', '\u0000' }, // apartment { '\u0063', '\u0069', '\u0072', '\u0071' }, // ape { '\u5e75', '\u7555', '\u6d12' }, // apertif { '\u0063', '\u0069', '\u0072', '\u0071' }, // aperture { '\u4fa0', '\u5f92' }, // apostle { '\u0000', '\u0000' }, // apostrophe { '\u4eea', '\u5650' }, // apparatus { '\u9611', '\u5c3e', '\u708e' }, // appendicitus { '\u0063', '\u0069', '\u0072', '\u0071' }, // appendix { '\u0063', '\u0069', '\u0072', '\u0071' }, // appetite { '\u0063', '\u0069', '\u0072', '\u0071' }, // appetizer { '\u82f9', '\u0069' }, // apple { '\u5189', '\u5177' }, // appliance { '\u7533', '\u8bf7', '\u4eba' }, // applicant { '\u7533', '\u8bf7', '\u838d' }, // application { '\u5b66', '\u0000', '\u0000' }, // apprentice { '\u593b', '\u6811' }, // apricot { '\u56f4', '\u6870' }, // apron { '\u0000', '\u0000', '\u0000', '\u0000' }, // aqualung { '\u0000', '\u0000', '\u0000', '\u0000' }, // aquamarine { '\u0000', '\u0000', '\u0000', '\u0000' }, // aquarium { '\u8c21', '\u6a79' }, // aqueduct { '\u5f27' }, // arc { '\u0000', '\u0000', '\u0000', '\u0000' }, // arcade { '\u5927', '\u4e3b', '\u72e1' }, // archbishop { '\u5f13', '\u0069', '\u4e8e' }, // archer { '\u0000', '\u0000', '\u0000', '\u0000' }, // archipelago { '\u8bbe', '\u8ba1', '\u5202', '\u4e21' }, // architect { '\u0063', '\u0069', '\u0072', '\u0071' }, // archives { '\u62f1' }, // archway { '\u0000', '\u0000' }, // area { '\u0000', '\u6280', '\u573a' }, // arena { '\u548a', '\u53f9' }, // aria { '\u601d', '\u0069', '\u0072', '\u0071' }, // aristocracy { '\u601d', 
'\u0069', '\u0072', '\u0071' }, // aristocrat { '\u7bb3', '\u672f' }, // arithmetic { '\u4e07', '\u821f' }, // ark { '\u0000', '\u0000', '\u0000', '\u0000' }, // arm { '\u0000', '\u0000' }, // armada { '\u72b0', '\u72f3' }, // armadillo { '\u9ccc', '\u7532' }, // armour { '\u0000', '\u0000' }, // arms { '\u9646', '\u519b' }, // army { '\u7bad' }, // arrow { '\u706b', '\u836f', '\u538d' }, // arsenal { '\u0000', '\u0000' }, // arsenic { '\u65c0', '\u706b' }, // arson { '\u0063', '\u672f' }, // art { '\u0000' }, // artefact { '\u0063', '\u8112' }, // artery { '\u592b', '\u0069', '\u0072', '\u0071' }, // arthritis { '\u0063', '\u0069', '\u0072', '\u0071' }, // artichoke { '\u0063', '\u0069', '\u0072', '\u0071' }, // article { '\u4eba', '\u58ec', '\u54c1' }, // artifact { '\u5927', '\u70ae' }, // artillery { '\u4e8e', '\u0000', '\u4eba' }, // artisan { '\u827a', '\u672f' }, // artist { '\u53f3', '\u68c9' }, // asbestos { '\u4e0a', '\u5347' }, // ascension { '\u4e0a', '\u5347' }, // ascent { '\u6849', '\u6811' }, // ash { '\u0000', '\u0000' }, // asphalt { '\u82a6', '\u7b0b' }, // asparagus { '\u6837', '\u5b50' }, // aspect { '\u0000', '\u0000' }, // aspirin { '\u0000', '\u0000' }, // ass - animal { '\u0000', '\u0000', '\u0000', '\u0000' }, // assassin { '\u0000', '\u0000', '\u0000', '\u0000' }, // asterisk { '\u0000', '\u0000', '\u0000', '\u0000' }, // asteroid { '\u0000', '\u0000', '\u0000', '\u0000' }, // asthma { '\u0000', '\u0000', '\u0000', '\u0000' }, // astrology { '\u0000', '\u0000', '\u0000', '\u0000' }, // astronaut { '\u0000', '\u0000', '\u0000', '\u0000' }, // astronomy { '\u0000', '\u0000', '\u0000', '\u0000' }, // asylum { '\u4f53', '\u0000' }, // athlete { '\u5730', '\u56fe', '\u96ec' }, // atlas { '\u5927', '\u6c14' }, // atmosphere { '\u0000', '\u0000' }, // atol { '\u539f', '\u5b50' }, // atom { '\u0000', '\u0000' }, // attache { '\u0000', '\u0000' }, // attic { '\u5f8b', '\u0000' }, // attorney { '\u6fcf', '\u0000' }, // auction { '\u5927', '\u80c6' 
}, // audacity
    { '\u5426', '\u0000' }, // audience
    { '\u8ba1' }, // audit
    { '\u8bd5', '\u542c' }, // audition
    { '\u0000', '\u0000', '\u0000', '\u0000' }, // auditorium
    { '\u59d1', '\u0000' }, // aunt
    { '\u4f5c', '\u5bb6' }, // author
    { '\u6743', '\u529b' }, // authority
    { '\u81ea', '\u4f20' }, // autobiography
    { '\u0000', '\u0000', '\u0000', '\u0000' }, // autocrat
    { '\u58ec', '\u8ff9' }, // autograph
    { '\u8f66' }, // automobile
    { '\u79cb', '\u5b63' }, // autumn
    { '\u0000', '\u0000', '\u0000', '\u0000' }, // autopsy
    { '\u0000', '\u0000' }, // avalanche
    { '\u6797', '\u836b' }, // avenue
    { '\u822a', '\u5ba4' }, // aviation
    { '\u0000', '\u0000' }, // avocado
    { '\u6388', '\u4e88' }, // award
    { '\u65a7' }, // axe
    { '\u516c', '\u7406' }, // axiom
    { '\u8f74' }, // axis
    { '\u8f6e', '\u8f74' }, // axle
    };
}
/* end: mc_chinese_INTF.java */

/* Filename: mc_spanish_INTF.java
   javac mc_spanish_INTF.java
*/
interface mc_spanish_INTF
{
    final char[][] w1 = {
    { '\u0061', '\u0069', '\u0072', '\u0065' }, // air
    { '\u006e', '\u0061', '\u0076', '\u0065' }, // aisle
    { '\u0061', '\u006c', '\u0062', '\u0061', '\u0074', '\u0072', '\u006f', '\u0073' }, // albatros
    { '\u0061', '\u006c', '\u0062', '\u0075', '\u006d' }, // album
    { '\u0061', '\u006c', '\u0063', '\u006f', '\u0068', '\u006f', '\u006c' }, // alcohol
    { '\u006e', '\u0069', '\u0063', '\u0068', '\u006f' }, // alcove
    { '\u0000' }, // ale
    { '\u0000' }, // algebra
    { '\u0000' }, // alien
    { '\u0000' }, // alimony
    { '\u0000' }, // alkali
    { '\u0000' }, // aligator
    { '\u0000' }, // alloy
    { '\u0000' }, // almanac
    { '\u0000' }, // almond
    { '\u0000' }, // alp
    { '\u0000' }, // alphabet
    { '\u0000' }, // altar
    { '\u0000' }, // aluminum
    { '\u0000' }, // amazon
    { '\u0000' }, // ambasador
    { '\u0000' }, // ambulance
    { '\u0000' }, // amethyst
    { '\u0000' }, // ammeter
    { '\u0000' }, // ammonia
    { '\u0000' }, // ammunition
    { '\u0000' }, // amoeba
    { '\u0000' }, // ampere
    { '\u0000' }, // amphibian
    { '\u0000' }, // ampitheatre
    { '\u0000' }, // anaconda
    {
'\u0000' }, // anaesthesia { '\u0000' }, // angel { '\u0000' }, // angle { '\u0000' }, // animal { '\u0000' }, // ankle { '\u0000' }, // annuity { '\u0000' }, // ant { '\u0000' }, // antarctic { '\u0000' }, // antelope { '\u0000' }, // antenna { '\u0000' }, // anthem { '\u0000' }, // anthology { '\u0000' }, // anthropoid { '\u0000' }, // anthropology { '\u0000' }, // antibiotic { '\u0000' }, // antibody { '\u0000' }, // antifreeze { '\u0000' }, // antique { '\u0000' }, // antiseptic { '\u0000' }, // antler { '\u0000' }, // anus { '\u0000' }, // anvil { '\u0000' }, // aorta { '\u0000' }, // apartment { '\u0000' }, // ape { '\u0000' }, // apertif { '\u0000' }, // aperture { '\u0000' }, // apostle { '\u0000' }, // apostrophe { '\u0000' }, // apparatus { '\u0000' }, // appendicitus { '\u0000' }, // appendix { '\u0000' }, // appetite { '\u0000' }, // appetizer { '\u0000' }, // apple { '\u0000' }, // appliance { '\u0000' }, // applicant { '\u0000' }, // application { '\u0000' }, // apprentice { '\u0000' }, // apricot { '\u0000' }, // apron { '\u0000' }, // aqualung { '\u0000' }, // aquamarine { '\u0000' }, // aquarium { '\u0000' }, // aqueduct { '\u0000' }, // arc { '\u0000' }, // arcade { '\u0000' }, // arch { '\u0000' }, // archaelogy { '\u0000' }, // archangel { '\u0000' }, // archbishop { '\u0000' }, // archer { '\u0000' }, // archipelago { '\u0000' }, // architect { '\u0000' }, // archives { '\u0000' }, // archway { '\u0000' }, // actic { '\u0000' }, // area { '\u0000' }, // arena { '\u0000' }, // aria { '\u0000' }, // aristocracy { '\u0000' }, // aristocrat { '\u0000' }, // arithmetic { '\u0000' }, // ark { '\u0000' }, // arm { '\u0000' }, // armada { '\u0000' }, // armadillo { '\u0000' }, // armour { '\u0000' }, // arms { '\u0000' }, // army { '\u0000' }, // arrow { '\u0000' }, // arsenic { '\u0000' }, // arsenal { '\u0000' }, // arson { '\u0000' }, // art { '\u0000' }, // artefact { '\u0000' }, // artery { '\u0000' }, // arthritis { '\u0000' }, // artichoke { 
'\u0000' }, // article
    { '\u0000' }, // artifact
    { '\u0000' }, // artillery
    { '\u0000' }, // artisan
    { '\u0000' }, // artist
    { '\u0000' }, // asbestos
    { '\u0000' }, // ash
    { '\u0000' }, // asphalt
    { '\u0000' }, // aspirin
    { '\u0000' }, // ass - animal
    { '\u0000' }, // assassin
    { '\u0000' }, // asterisk
    { '\u0000' }, // asteroid
    { '\u0000' }, // asthma
    { '\u0000' }, // astrology
    { '\u0000' }, // astronaut
    { '\u0000' }, // astronomy
    { '\u0000' }, // asylum
    { '\u0000' }, // athlete
    { '\u0000' }, // atlas
    { '\u0000' }, // atmosphere
    { '\u0000' }, // atol
    { '\u0000' }, // atom
    { '\u0000' }, // attache
    { '\u0000' }, // attic
    { '\u0000' }, // attorney
    { '\u0000' }, // auction
    { '\u0000' }, // audience
    { '\u0000' }, // audit
    { '\u0000' }, // audition
    { '\u0000' }, // auditorium
    { '\u0000' }, // aunt
    { '\u0000' }, // author
    { '\u0000' }, // authority
    { '\u0000' }, // autobiography
    { '\u0000' }, // autocrat
    { '\u0000' }, // autograph
    { '\u0000' }, // automobile
    { '\u0000' }, // autopsy
    { '\u0000' }, // avalanche
    { '\u0000' }, // avenue
    { '\u0000' }, // aviation
    { '\u0000' }, // avocado
    { '\u0000' }, // award
    { '\u0000' }, // axe
    { '\u0000' }, // axiom
    { '\u0000' }, // axis
    { '\u0000' }, // axle
    };
}
/* End: mc_spanish_INTF.java */

/* Filename: mc_english_INTF.java
   javac mc_english_INTF.java
*/
interface mc_english_INTF
{
    final char[][] w1 = {
    { '\u0061', '\u0069', '\u0072' }, // air
    { '\u0061', '\u0069', '\u0073', '\u006c', '\u0065' }, // aisle
    { '\u0061', '\u006c', '\u0062', '\u0061', '\u0074', '\u0072', '\u006f', '\u0073' }, // albatros
    { '\u0061', '\u006c', '\u0062', '\u0075', '\u006d' }, // album
    { '\u0061', '\u006c', '\u0063', '\u006f', '\u0068', '\u006f', '\u006c' }, // alcohol
    { '\u0061', '\u006c', '\u0063', '\u006f', '\u0076', '\u0065' }, // alcove
    { '\u0061', '\u006c', '\u0065' }, // ale
    { '\u0061', '\u006c', '\u0067', '\u0065', '\u0062', '\u0072', '\u0061' }, // algebra
    { '\u0000' }, // alien
    { '\u0000' }, // alimony
    { '\u0000' }, // alkali
    { '\u0000' }, // aligator
{ '\u0000' }, // alloy { '\u0000' }, // almanac { '\u0000' }, // almond { '\u0000' }, // alp { '\u0000' }, // alphabet { '\u0000' }, // altar { '\u0000' }, // aluminum { '\u0000' }, // amazon { '\u0000' }, // ambasador { '\u0000' }, // ambulance { '\u0000' }, // amethyst { '\u0000' }, // ammeter { '\u0000' }, // ammonia { '\u0000' }, // ammunition { '\u0000' }, // amoeba { '\u0000' }, // ampere { '\u0000' }, // amphibian { '\u0000' }, // ampitheatre { '\u0000' }, // anaconda { '\u0000' }, // anaesthesia { '\u0000' }, // angel { '\u0000' }, // angle { '\u0000' }, // animal { '\u0000' }, // ankle { '\u0000' }, // annuity { '\u0000' }, // ant { '\u0000' }, // antarctic { '\u0000' }, // antelope { '\u0000' }, // antenna { '\u0000' }, // anthem { '\u0000' }, // anthology { '\u0000' }, // anthropoid { '\u0000' }, // anthropology { '\u0000' }, // antibiotic { '\u0000' }, // antibody { '\u0000' }, // antifreeze { '\u0000' }, // antique { '\u0000' }, // antiseptic { '\u0000' }, // antler { '\u0000' }, // anus { '\u0000' }, // anvil { '\u0000' }, // aorta { '\u0000' }, // apartment { '\u0000' }, // ape { '\u0000' }, // apertif { '\u0000' }, // aperture { '\u0000' }, // apostle { '\u0000' }, // apostrophe { '\u0000' }, // apparatus { '\u0000' }, // appendicitus { '\u0000' }, // appendix { '\u0000' }, // appetite { '\u0000' }, // appetizer { '\u0000' }, // apple { '\u0000' }, // appliance { '\u0000' }, // applicant { '\u0000' }, // application { '\u0000' }, // apprentice { '\u0000' }, // apricot { '\u0000' }, // apron { '\u0000' }, // aqualung { '\u0000' }, // aquamarine { '\u0000' }, // aquarium { '\u0000' }, // aqueduct { '\u0000' }, // arc { '\u0000' }, // arcade { '\u0000' }, // arch { '\u0000' }, // archaelogy { '\u0000' }, // archangel { '\u0000' }, // archbishop { '\u0000' }, // archer { '\u0000' }, // archipelago { '\u0000' }, // architect { '\u0000' }, // archives { '\u0000' }, // archway { '\u0000' }, // actic { '\u0000' }, // area { '\u0000' }, // arena { '\u0000' 
}, // aria { '\u0000' }, // aristocracy { '\u0000' }, // aristocrat { '\u0000' }, // arithmetic { '\u0000' }, // ark { '\u0000' }, // arm { '\u0000' }, // armada { '\u0000' }, // armadillo { '\u0000' }, // armour { '\u0000' }, // arms { '\u0000' }, // army { '\u0000' }, // arrow { '\u0000' }, // arsenic { '\u0000' }, // arsenal { '\u0000' }, // arson { '\u0000' }, // art { '\u0000' }, // artefact { '\u0000' }, // artery { '\u0000' }, // arthritis { '\u0000' }, // artichoke { '\u0000' }, // article { '\u0000' }, // artifact { '\u0000' }, // artillery { '\u0000' }, // artisan { '\u0000' }, // artist { '\u0000' }, // asbestos { '\u0000' }, // ash { '\u0000' }, // asphalt { '\u0000' }, // aspirin { '\u0000' }, // ass - animal { '\u0000' }, // assassin { '\u0000' }, // asterisk { '\u0000' }, // asteroid { '\u0000' }, // asthma { '\u0000' }, // astrology { '\u0000' }, // astronaut { '\u0000' }, // astronomy { '\u0000' }, // asylum { '\u0000' }, // athlete { '\u0000' }, // atlas { '\u0000' }, // atmosphere { '\u0000' }, // atol { '\u0000' }, // atom { '\u0000' }, // attache { '\u0000' }, // attic { '\u0000' }, // attorney { '\u0000' }, // auction { '\u0000' }, // audience { '\u0000' }, // audit { '\u0000' }, // audition { '\u0000' }, // auditorium { '\u0000' }, // aunt { '\u0000' }, // author { '\u0000' }, // authority { '\u0000' }, // autobiography { '\u0000' }, // autocrat { '\u0000' }, // autograph { '\u0000' }, // automobile { '\u0000' }, // autopsy { '\u0000' }, // avalanche { '\u0000' }, // avenue { '\u0000' }, // aviation { '\u0000' }, // avocado { '\u0000' }, // award { '\u0000' }, // axe { '\u0000' }, // axiom { '\u0000' }, // axis { '\u0000' }, // axle }; } /* End: mc_english_INTF.java */ /* Filename: mc_any_INTF.java This class is to be copied to other names in order to develop other languages. It should not be compiled in its present state. 
*/ interface mc_any_INTF { final char[][] w1 = { { '\u0000' }, // air { '\u0000' }, // aisle { '\u0000' }, // albatros { '\u0000' }, // album { '\u0000' }, // alcohol { '\u0000' }, // alcove { '\u0000' }, // ale { '\u0000' }, // algebra { '\u0000' }, // alien { '\u0000' }, // alimony { '\u0000' }, // alkali { '\u0000' }, // aligator { '\u0000' }, // alloy { '\u0000' }, // almanac { '\u0000' }, // almond { '\u0000' }, // alp { '\u0000' }, // alphabet { '\u0000' }, // altar { '\u0000' }, // aluminum { '\u0000' }, // amazon { '\u0000' }, // ambasador { '\u0000' }, // ambulance { '\u0000' }, // amethyst { '\u0000' }, // ammeter { '\u0000' }, // ammonia { '\u0000' }, // ammunition { '\u0000' }, // amoeba { '\u0000' }, // ampere { '\u0000' }, // amphibian { '\u0000' }, // ampitheatre { '\u0000' }, // anaconda { '\u0000' }, // anaesthesia { '\u0000' }, // angel { '\u0000' }, // angle { '\u0000' }, // animal { '\u0000' }, // ankle { '\u0000' }, // annuity { '\u0000' }, // ant { '\u0000' }, // antarctic { '\u0000' }, // antelope { '\u0000' }, // antenna { '\u0000' }, // anthem { '\u0000' }, // anthology { '\u0000' }, // anthropoid { '\u0000' }, // anthropology { '\u0000' }, // antibiotic { '\u0000' }, // antibody { '\u0000' }, // antifreeze { '\u0000' }, // antique { '\u0000' }, // antiseptic { '\u0000' }, // antler { '\u0000' }, // anus { '\u0000' }, // anvil { '\u0000' }, // aorta { '\u0000' }, // apartment { '\u0000' }, // ape { '\u0000' }, // apertif { '\u0000' }, // aperture { '\u0000' }, // apostle { '\u0000' }, // apostrophe { '\u0000' }, // apparatus { '\u0000' }, // appendicitus { '\u0000' }, // appendix { '\u0000' }, // appetite { '\u0000' }, // appetizer { '\u0000' }, // apple { '\u0000' }, // appliance { '\u0000' }, // applicant { '\u0000' }, // application { '\u0000' }, // apprentice { '\u0000' }, // apricot { '\u0000' }, // apron { '\u0000' }, // aqualung { '\u0000' }, // aquamarine { '\u0000' }, // aquarium { '\u0000' }, // aqueduct { '\u0000' }, // arc { 
'\u0000' }, // arcade { '\u0000' }, // arch { '\u0000' }, // archaelogy { '\u0000' }, // archangel { '\u0000' }, // archbishop { '\u0000' }, // archer { '\u0000' }, // archipelago { '\u0000' }, // architect { '\u0000' }, // archives { '\u0000' }, // archway { '\u0000' }, // actic { '\u0000' }, // area { '\u0000' }, // arena { '\u0000' }, // aria { '\u0000' }, // aristocracy { '\u0000' }, // aristocrat { '\u0000' }, // arithmetic { '\u0000' }, // ark { '\u0000' }, // arm { '\u0000' }, // armada { '\u0000' }, // armadillo { '\u0000' }, // armour { '\u0000' }, // arms { '\u0000' }, // army { '\u0000' }, // arrow { '\u0000' }, // arsenic { '\u0000' }, // arsenal { '\u0000' }, // arson { '\u0000' }, // art { '\u0000' }, // artefact { '\u0000' }, // artery { '\u0000' }, // arthritis { '\u0000' }, // artichoke { '\u0000' }, // article { '\u0000' }, // artifact { '\u0000' }, // artillery { '\u0000' }, // artisan { '\u0000' }, // artist { '\u0000' }, // asbestos { '\u0000' }, // ash { '\u0000' }, // asphalt { '\u0000' }, // aspirin { '\u0000' }, // ass - animal { '\u0000' }, // assassin { '\u0000' }, // asterisk { '\u0000' }, // asteroid { '\u0000' }, // asthma { '\u0000' }, // astrology { '\u0000' }, // astronaut { '\u0000' }, // astronomy { '\u0000' }, // asylum { '\u0000' }, // athlete { '\u0000' }, // atlas { '\u0000' }, // atmosphere { '\u0000' }, // atol { '\u0000' }, // atom { '\u0000' }, // attache { '\u0000' }, // attic { '\u0000' }, // attorney { '\u0000' }, // auction { '\u0000' }, // audience { '\u0000' }, // audit { '\u0000' }, // audition { '\u0000' }, // auditorium { '\u0000' }, // aunt { '\u0000' }, // author { '\u0000' }, // authority { '\u0000' }, // autobiography { '\u0000' }, // autocrat { '\u0000' }, // autograph { '\u0000' }, // automobile { '\u0000' }, // autopsy { '\u0000' }, // avalanche { '\u0000' }, // avenue { '\u0000' }, // aviation { '\u0000' }, // avocado { '\u0000' }, // award { '\u0000' }, // axe { '\u0000' }, // axiom { '\u0000' }, // axis 
{ '\u0000' }, // axle
    };
}
/* End: mc_any_INTF.java */

/* Filename: Host_Languages.java
   javac Host_Languages.java
*/
interface Host_Languages
{
    final byte English = 0;
    final byte French = 1;
    final byte Spanish = 2;
    final byte German = 3;
    final byte Chinese = 4;
    final byte Vietnamese = 5;
    final byte Japanese = 6;
    final byte Hindi = 7;
    final byte Russian = 8;
    final byte Cherokee = 9;
    final byte Arabic = 10;
    final byte Greek = 11;
    final byte Hebrew = 12;
    final byte Bengali = 13;
    final byte Canadian_aboriginal = 14;
    final byte Korean = 15;
    final byte Ethiopic = 16;
    final byte Gujarati = 17;
    final byte Gurmukhi = 18;
    final byte Cyrillic = 19;
    final byte Mongolian = 20;
    final byte Romanian = 21;
    final byte Serbian = 22;
    final byte Georgian = 23;
    final byte Thai = 24;
    final byte Tibetan = 25;
    final byte Ogonek = 26;
    // ...
}
/* End: Host_Languages.java */

Expires 10/22/2001