INTERNET-DRAFT                                              Soobok Lee
draft-ietf-idn-lsb-ace-02.txt
Expires 2002-Mar-12                                        2001-Sep-12

            Improving ACE using code point reordering v2.0

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

Distribution of this document is unlimited. Please send comments to the
author lsb@postel.co.kr or to the idn working group at
idn@ops.ietf.org.

Abstract

This document describes a method to improve ACE label efficiency by a
frequency-based, temporary reordering of code points during ACE
encoding and decoding. The reordering relocates frequent characters
that are scattered across a script block into a much more compact area
of a reordered code space. It can be implemented with simple character
mapping tables and adds little complexity to existing ACEs. When
applied to AMC-ACE-Z, the reordering produces up to 30% shorter ACE
labels for long IDN labels.

Contents

   Differences from version 1.0
   Overview
   Unified Han / Hangul blocks
   Major scripts blocks
   Other scripts blocks
   Implementation : partial permutation table
   Implementation : complexity/efficiency trade-off
   Security considerations
   References
   Author
   A1. REORDER.C: reordering library code/data version 2.0
   A2. LAMCZ.C: Example implementation into AMC-ACE-Z
   A3. Experiments Result

Differences from version 1.0

Version 2.0 differs from version 1.0 in four aspects:

1) In the reordering table for the Unified Han block, a group made of
   a Simplified Chinese letter, its Traditional Chinese counterpart
   and, optionally, a Japanese Kanji variant is ranked by the sum of
   the members' frequencies, and the members are placed side by side.

2) New support for the Japanese Hiragana, Hindi, Arabic, Hebrew,
   Cyrillic, Greek, Thai, Tamil and Ethiopic scripts in addition to
   Han and Hangul.

3) A new implementation of reordering with AMC-ACE-Z, the ACE chosen
   at the 51st IETF IDN WG meeting.

4) New partial permutation tables and lookup functions replace the
   older, buggy and inefficient linear mapping tables.

Overview

Shorter ACE labels help us to:

1. save memory resources
2. reduce internet traffic
3. reduce the required DNS UDP request/result packet size, which
   leaves less opportunity for packet fragmentation and more room for
   result resource sets
4. fit more text into domain labels

When an ACE algorithm encodes a sequence of code points from a single
script block into an ASCII-compatible string, the ratio of the length
of the output ACE label to that of the original native-script label
varies with the size of the script block and its character frequency
distribution. That is, ACE efficiency is script dependent.

For example, in AMC-ACE-Z, basic Latin characters (0-9, a-z, -) in IDN
labels are encoded in literal mode ("as is") and the ratio is 1:1. For
the Cyrillic, Katakana, Han and Hangul script blocks, however, the
ratios in non-literal mode are 1.41, 1.89, 3.15 and 3.24, respectively.

As these examples show, most ACE algorithms favor Latin and small
script blocks over very large blocks such as Han and Hangul. For CJK
users (in China, Hongkong, Macao, Japan, South/North Korea and Taiwan),
this disadvantage results in longer ACE labels and less room for
free-form long names. Some improvement to ACEs is needed to compensate
for this unfair disadvantage.
In general, ACE algorithms encode the successive XOR or arithmetic
distances between the code points of a label, using a base16 or
base36-variant (AMC-ACE-Z) encoding that uses only LDH characters.
Shorter successive code distances therefore yield shorter ACE labels.

For example, the non-literal mode of AMC-ACE-Z works conceptually in
these three steps:

1. sort the label's character sequence into increasing code point
   order to make a sorted label
2. compute the successive code point distances (always non-negative)
   of the characters in the sorted label
3. encode the original position of each character and its code
   distance from the previous one using the base36-variant
   (bootstring) variable-length encoding

To achieve shorter code distances in step 2, this draft proposes a
reordered unicode space. It contains the same set of characters as the
standard unicode space, but assigns each character a new code point
value according to its frequency of use. The reordered unicode space
relocates the most frequent characters into narrower code ranges. This
reduces the variance of the new code point values, so their mutual
code point distances become much smaller than in the original,
unreordered unicode space.
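As a rough illustration of steps 1 and 2 above, the following sketch
sorts a label and computes the distances between neighbouring code
points. It is not taken from AMC-ACE-Z or from reorder.c: the names
codep and sorted_deltas are hypothetical, and both the distance from
the encoder's initial value to the first code point and the bootstring
encoding of step 3 are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned long codep;          /* hypothetical code point type */

/* qsort comparator: increasing code point order */
static int cmp_codep(const void *a, const void *b)
{
    codep x = *(const codep *)a, y = *(const codep *)b;
    return (x > y) - (x < y);
}

/*
 * Step 1: sort label[0..n-1] in place.
 * Step 2: store the n-1 distances between neighbours in delta[].
 * Returns the spread (largest minus smallest code point).
 */
codep sorted_deltas(codep *label, size_t n, codep *delta)
{
    size_t i;

    assert(n > 0);
    qsort(label, n, sizeof label[0], cmp_codep);
    for (i = 1; i < n; i++)
        delta[i - 1] = label[i] - label[i - 1];
    return label[n - 1] - label[0];
}
```

A caller could run this once on a label as written and once on the same
label after reordering, and compare the two spreads and delta lists; the
smaller the deltas, the fewer digits step 3 needs.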
For example, the following Hangul label

   1) (U+D55C U+AD6D U+C778 U+D130 U+B137 U+C815 U+BCF4 U+C13C U+D130)

is mapped by the reordering in [A1] into

   2) (U+AFF8 U+AFF0 U+AFFA U+AFE8 U+AFBA U+AFEB U+AFEE U+AF76 U+AFE8)

The sorted labels in AMC-ACE-Z step 1 are:

   1) (U+AD6D U+B137 U+BCF4 U+C13C U+C778 U+C815 U+D130 U+D130 U+D55C)
   2) (U+AF76 U+AFBA U+AFE8 U+AFE8 U+AFEB U+AFEE U+AFF0 U+AFF8 U+AFFA)

The magnitudes of the maximum variation of code point values are:

   1) U+AD6D ~ U+D55C = 0x27EF ( 10223 )
   2) U+AF76 ~ U+AFFA = 0x0084 ( 132 )

The successive code distances in AMC-ACE-Z step 2 are:

   1) (AD25 3CA BBD 448 63C 9D 91B 0 42C)
   2) (AEF6 44 2E 0 3 3 2 8 2)

The base36-variant (bootstring) encodings (ACE labels) of the two
distance sequences in AMC-ACE-Z step 3 are:

   1) 5d0bx5euxnjje69i70af08beA817g ( 29 chars )
   2) 1s0b1e7ecnsk7dta              ( 16 chars, 13 chars saved )

The reordered code points produce a 45% (=13/29) shorter label.

ACE decoding takes the reverse steps and correctly restores the
original (pre-reordering) code points, so reordering is transparent to
end users. The reordered unicode space is used only internally, in the
ACE encoding and decoding algorithms.

This draft describes the principles and the method for constructing
the reordered unicode space for small and large script blocks from
character frequency data. It suggests that the reordering can be
implemented with nothing more than character mapping tables and lookup
functions. Reordering can be plugged into any ACE algorithm by
inserting a few lines of reordering/restoring function calls into the
ACE routines, as exemplified by the AMC-ACE-Z implementation in [A2].

ACE Label Efficiency and Scripts

In experiments with sample labels 10 characters long for each script
block in the Unicode script repertoire, the label length improvement
ratios vary with the size of the frequent set of basic alphabets in
each script block.
Reordering on sample labels 10 characters long:

+------------------+--------------------------------+--------------------------------+
| Script block     | Basic alphabets /              | AMC-ACE-Z label length         |
|                  | frequent characters            | constant                       |
+----------+-------+---------+----------------------+------------+---------+---------+
| Name     | Size  | Number  | Distribution         | no reorder | reorder | improve |
+----------+-------+---------+----------------------+------------+---------+---------+
| Han      | 20992 |   20992 | 4096 (99%)/scattered |       3.09 |    2.24 |  27.47% |
| Hangul   | 11172 |   11172 | 1024 (99%)/scattered |       3.04 |    2.10 |  31.07% |
| Greek    |   144 |   29/35 | sequential/compact   |       1.33 |    1.25 |   5.72% |
| Arabic   |   256 |   28/65 | sequential/compact   |       1.47 |    1.28 |  12.96% |
| Cyrillic |   304 |   33/47 | sequential/compact   |       1.34 |    1.27 |   5.08% |
| Hebrew   |   102 |   22/29 | sequential/compact   |       1.35 |    1.25 |   7.06% |
| Hindi    |   128 |   44/64 | sequential/compact   |       1.82 |    1.57 |  14.00% |
| Hiragana |    96 |   73/86 | sequential/compact   |       1.70 |    1.58 |   6.96% |
| Katakana |    96 |   73/90 | sequential/compact   |       1.77 |    1.56 |  11.97% |
| Tamil    |   128 |   35/42 | sequential/compact   |       1.67 |    1.44 |  13.70% |
| Ethiopic |   416 | 182/116 | sequential/scattered |       2.28 |    1.96 |  14.15% |
+----------+-------+---------+----------------------+------------+---------+---------+

If N denotes the number of code points in a label, the AMC-ACE-Z label
length L for each reordered script block can be expressed, based on
the above table, as follows:

Group 1) improvement: 27% ~ 31%
   Han     : L = N * 2.20  ( close to that for UCS-2 )
   Hangul  : L = N * 2.08  ( close to that for UCS-2 )

Group 2) improvement: 5% ~ 7%, |basic alphabets| < 35
   Greek   : L = N * 1.25  ( much less than that for UCS-2 )
   Arabic  : L = N * 1.28  ( 12.96% )
   Cyrillic: L = N * 1.27
   Hebrew  : L = N * 1.25

Group 3) improvement: 11% ~ 14%, |basic alphabets| >= 35
   Hindi   : L = N * 1.57  ( much less than that for UCS-2 )
   Hiragana: L = N * 1.57  ( 6.96% )
   Katakana: L = N * 1.56
   Thai    : L = N * 1.44
   Tamil   : L = N * 1.44
   Ethiopic: L = N * 1.96

The amount of improvement in ACE label length appears to vary with the
frequency distribution of the basic alphabets of each script block.

Unified Han and Hangul

11172 Hangul syllables and 20912 CJK Unified Han ideographs occupy
roughly two thirds of the currently assigned unicode code points.
Their lexicographical ordering makes the various ACE compression
algorithms work poorly for them, because the frequently used
characters are spread evenly throughout those wide code blocks.

According to one set of usage frequency statistics on Hangul syllables
in general Hangul texts, the 256 most frequent Hangul syllables have a
cumulative frequency of 88.2%, and the top 512 reach 99.9%. That means
the maximum variation of code point values (11172) can be shrunk to
512 in the reordered Hangul block with a probability of 99.9%.

Likewise, the 256 most frequent Han letters have a cumulative
frequency of 58.2%, and the top 512, 1024, 2048 and 4096 reach 72.8%,
85.9%, 95.4% and 99.4%, respectively.
That means the maximum variation of code point values (20912) can be
shrunk to 2048 with a probability of 95.4%.

The Han/Hangul frequency mapping tables are constructed from
nameprepped ML.com domains from the VGRS MultiLingual testbeds. The
frequent characters in the tables are arranged in increasing order of
frequency. This minimizes the AMC-ACE-Z bootstring delta values, which
are smaller when the larger code distances occur at the lower
positions of the sorted label in AMC-ACE-Z step 2.

In general, the character frequency distribution of any script block
may shift within the frequent set over time, but characters rarely
enter or leave the frequent set. The impact of such shifts is
therefore expected to be marginal and negligible, as the following
comparison of experiment results shows.

Reordering tables based on the 1024, 2048, 3072 and 4096 most frequent
Han letters, in increasing frequency order, produced only marginal
differences in improvement. N is the length of the sample labels; the
other values are the improvement ratios (in percent) for each
combination of N and the four reordering tables.
|   N | HAN-4096 | HAN-3072 | HAN-2048 | HAN-1024 |
|   1 |     7.07 |     5.49 |     3.58 |     1.64 |
|   2 |    13.61 |    13.22 |    11.57 |     8.06 |
|   3 |    16.26 |    16.05 |    15.10 |    12.26 |
|   4 |    20.80 |    20.71 |    20.19 |    18.11 |
|   5 |    22.17 |    22.03 |    21.47 |    19.41 |
|   6 |    24.85 |    24.77 |    24.41 |    22.48 |
|   7 |    25.52 |    25.40 |    24.99 |    23.17 |
|   8 |    26.47 |    26.36 |    26.00 |    24.15 |
|   9 |    26.54 |    26.46 |    26.04 |    24.26 |
|  10 |    27.47 |    27.40 |    27.01 |    25.09 |
|  11 |    27.30 |    27.26 |    26.85 |    25.12 |
|  12 |    27.74 |    27.64 |    27.41 |    25.60 |
|  13 |    27.27 |    27.17 |    26.78 |    25.28 |
|  14 |    27.48 |    27.35 |    27.08 |    24.94 |
|  15 |    28.60 |    28.43 |    28.56 |    26.54 |
|  16 |    27.70 |    27.84 |    27.70 |    25.51 |
|  17 |    25.68 |    25.68 |    25.43 |    23.70 |
| ALL |    20.30 |    20.14 |    19.43 |    17.09 |

Experiments with reordering tables in increasing and in decreasing
frequency order, for the 2048 and 4096 most frequent Han letters, also
produced only marginal differences in improvement (4096D means that
the reordering table is in decreasing frequency order):

|   N | HAN-4096 | HAN-4096D | HAN-2048 | HAN-2048D |
|   1 |     7.07 |      7.01 |     3.58 |      3.51 |
|   2 |    13.61 |     13.44 |    11.57 |     11.27 |
|   3 |    16.26 |     16.35 |    15.10 |     14.93 |
|   4 |    20.80 |     20.56 |    20.19 |     19.90 |
|   5 |    22.17 |     21.80 |    21.47 |     21.12 |
|   6 |    24.85 |     24.21 |    24.41 |     23.82 |
|   7 |    25.52 |     24.59 |    24.99 |     24.14 |
|   8 |    26.47 |     25.68 |    26.00 |     25.36 |
|   9 |    26.54 |     25.55 |    26.04 |     25.18 |
|  10 |    27.47 |     26.79 |    27.01 |     26.42 |
|  11 |    27.30 |     26.82 |    26.85 |     26.36 |
|  12 |    27.74 |     27.46 |    27.41 |     27.13 |
|  13 |    27.27 |     26.97 |    26.78 |     26.59 |
|  14 |    27.48 |     27.31 |    27.08 |     26.99 |
|  15 |    28.60 |     28.60 |    28.56 |     28.56 |
|  16 |    27.70 |     27.55 |    27.70 |     27.20 |
|  17 |    25.68 |     25.93 |    25.43 |     25.68 |
| ALL |    20.30 |     20.00 |    19.43 |     19.07 |

These experiments show that fluctuations in the character frequency
distribution within the frequent set of a script are unlikely to be
large enough to invalidate or outdate this reordering approach in the
foreseeable future.
However, to be as neutral and fair as possible toward the different
usage patterns in China, Japan, Korea and Taiwan, some provisions are
made for grouping country-specific variants of certain Han letters.
In particular, a group made of a Simplified Chinese letter (SC), a
Traditional Chinese letter (TC) and a Kanji-specific letter (KC) is
ranked by the sum of the members' frequencies, and the members are
placed side by side in the reordering table for the Unified Han block.
For example, the reordering table looks like:

   (TC1) (TC2 SC2) (TC3 KC3) (TC4) (TC5 SC5 KC5) (TC6) .....

This grouping prevents the frequency order from being skewed toward
one of those country-specific usage patterns. Experiment results 27
and 28 in [A3] show that this reordering scheme improves SC and TC
labels by 21.95% and 18.50%, respectively.

According to experiments with large Han/Hangul domain samples, for
Han/Hangul domains of 15 or more letters AMC-ACE-Z with reordering
produced the shortest ACE labels. Their length is approximately
2.0*n ~ 2.2*n (n = number of Han/Hangul code points in a label), which
is 33.3% more efficient than bare AMC-ACE-Z without reordering. This
efficiency is close to that of UCS-2 ( 2.0*n ) and much better than
that of UTF8 ( 3.0*n ).

Appendix [A3] also contains some tuning experiments on the skew and
damp parameters of ACE-Z. With skew==48 and damp==75, a further +1.3%
in compression ratio was achieved for Han domains, at a marginal loss
of efficiency for non-CJK scripts.

Other Major scripts

For the Latin, Greek, Arabic, Cyrillic, Katakana, Hiragana, Thai,
Hindi, Tamil and Ethiopic scripts, the character usage frequency data
are constructed from registered, nameprepped domain samples on the
Verisign ML.com testbeds.
As a consequence, the tables usually contain the basic lower-case
alphabet characters of each script block. They are arranged in
increasing order of frequency to minimize the AMC-ACE-Z bootstring
delta values, which are smaller when the larger code distances occur
at the lower positions of the sorted label in AMC-ACE-Z step 2.

These script blocks are no larger than 256 code points, except for the
Cyrillic (304) and Ethiopic (416) blocks. The mean of the successive
code distances of labels in these script blocks is expected to be much
less than 255 even in the unreordered unicode space, and so needs 1 or
2 digits in the bootstring notation of AMC-ACE-Z. This explains why
the label length stays between 1.4*n ~ 1.8*n for bare AMC-ACE-Z and
between 1.2*n ~ 1.6*n for reordered AMC-ACE-Z. Both are more efficient
than UCS-2 encoding.

Reordered AMC-ACE-Z produces 11% ~ 14% shorter labels than bare
AMC-ACE-Z for Arabic, Hindi, Thai, Tamil, Ethiopic and Katakana, and
5% ~ 8% shorter labels for Greek, Hebrew, Cyrillic and Hiragana. These
differences in the amount of improvement seem to come from differences
in the frequency distributions of the basic alphabets of each script
block.

The Basic Latin, Latin Supplement and Latin Extended A and B blocks
occupy more than 512 code points, but they are shared by many language
groups in western Europe, eastern Europe, Africa and Asia. It is
therefore difficult to decide which weights to apply when mixing the
character frequency data of the shared Latin characters. Moreover, the
most frequent characters, 'a' to 'z', are already encoded in literal
mode in AMC-ACE-Z, where reordering cannot improve anything. The mean
number of extended Latin characters in a label is between 1 and 2, so
the reordering table for Latin produces only marginal improvements
(<1%).
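The table layout described in this section (characters arranged in
increasing order of frequency, so that the most frequent character
comes last) could be produced from per-character sample counts roughly
as follows. This is only a sketch: struct char_freq and
build_freq_table are hypothetical names, the tie-break on code point
is an assumption, and nothing here is taken from the tooling actually
used to build the tables in [A1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* hypothetical record: one character and its count in the samples */
struct char_freq {
    unsigned long code;
    unsigned long count;
};

/* qsort comparator: least frequent first; ties broken by code point */
static int by_increasing_freq(const void *a, const void *b)
{
    const struct char_freq *x = a, *y = b;

    if (x->count != y->count)
        return (x->count > y->count) - (x->count < y->count);
    return (x->code > y->code) - (x->code < y->code);
}

/*
 * Sort stats[0..n-1] in place and copy the code points into table[],
 * so that table[n-1] holds the most frequent character.
 */
void build_freq_table(struct char_freq *stats, size_t n,
                      unsigned long *table)
{
    size_t i;

    assert(stats != NULL && table != NULL);
    qsort(stats, n, sizeof stats[0], by_increasing_freq);
    for (i = 0; i < n; i++)
        table[i] = stats[i].code;
}
```

The resulting array has the same shape as the per-script tables in
[A1], where the last entry of each table is its most frequent
character.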
Other language scripts

Other language script blocks can be classified into three groups
according to their frequency of use or the size of the population
using the script:

Group 4) populations of 5 millions or more
   Bengali, Ethiopic, Gujarati, Gurmukhi, Kannada, Khmer, Lao,
   Malayalam, Mongolian, Myanmar, Oriya, Sinhala, Tamil, Telugu,
   Thaana, Tibetan

Group 5) populations of less than 5 millions, or extinct/archaic
   scripts
   Armenian, Cherokee, Georgian, Ogham, Runic, Syriac, Yi syllables,
   Gothic, Deseret, and Unified Canadian Aboriginal Syllabics (UCAS)

Group 6) supplementary script blocks or special character blocks
   IPA Extensions; Combining Diacritical Marks; Greek Extended;
   CJK Radicals (Supplement), CJK Symbols and Punctuation,
   CJK Unified Ideographs Extension A and B,
   CJK Compatibility Ideographs (Supplement), Kangxi Radicals,
   Bopomofo (Extended), Kanbun, Hangul Compatibility Jamo,
   Hangul Jamo, special character blocks

Factsheet on Group 4 scripts ( > 5 millions ):

+------------------+-----------------------------+--------------------------------------------+
| Script block     | Basic/frequent alphabets    | Populations/usage                          |
+-----------+------+--------+--------------------+-----------------+--------------------------+
| Name      | Size | Number | Distribution       | Native speakers | Regions                  |
+-----------+------+--------+--------------------+-----------------+--------------------------+
| Bengali   |  128 |     43 | sequential/compact | 180 millions    | Bangladesh, India        |
| Ethiopic  |  416 |    182 | sequential/compact | 25 millions     | Ethiopia, Kenya, Somalia |
| Gujarati  |  128 |     43 | sequential/compact | 45 millions     | Gujarat, India           |
| Gurmukhi  |  128 |     46 | sequential/compact | 60 millions     | Panjab, India, Pakistan  |
| Kannada   |  128 |     48 | sequential/compact | 27 millions     | Karnataka, India         |
| Khmer     |  128 |     60 | sequential/compact | 8 millions      | Cambodia                 |
| Lao       |  128 |     32 | sequential/compact | 15 millions     | Laos, Thailand           |
| Malayalam |  128 |     45 | sequential/compact | 34 millions     | Kerala, India            |
| Mongolian |  176 |     35 | sequential/compact | 5 millions      | Mongolia                 |
| Myanmar   |  160 |     44 | sequential/compact | 25 millions     | Myanmar (Burma)          |
| Oriya     |  128 |     56 | sequential/compact | 22 millions     | Orissa, India            |
| Sinhala   |  128 |     54 | sequential/compact | 12 millions     | Sri Lanka                |
| Tamil     |  128 |     35 | sequential/compact | 52 millions     | Tamil Nadu, India,       |
|           |      |        |                    |                 | Sri Lanka, etc.          |
| Telugu    |  128 |     48 | sequential/compact | 45 millions     | Andhra Pradesh, India    |
| Tibetan   |  128 |     45 | sequential/compact | 6.5 millions    | Tibet, Bhutan            |
+-----------+------+--------+--------------------+-----------------+--------------------------+

Factsheet on Group 5 scripts ( < 5 millions ):

+---------------------+-----------------------------+--------------------------------------------+
| Script block        | Basic/frequent alphabets    | Populations/usage                          |
+--------------+------+--------+--------------------+-----------------+--------------------------+
| Name         | Size | Number | Distribution       | Native speakers | Regions                  |
+--------------+------+--------+--------------------+-----------------+--------------------------+
| Armenian     |   96 |     38 | sequential/compact | 2 millions      | Armenia, Iran            |
| UCAS         |  640 |    640 | sequential/compact | 200,000         | America                  |
| Cherokee     |   96 |     77 | sequential/compact | 22,500          | America                  |
| Georgian     |   96 |     33 | sequential/compact | 3.5 millions    | Georgia                  |
| Ogham        |   32 |     29 | sequential/compact | 0/extinct       | old Ireland              |
| Runic        |  224 |    224 | sequential/compact | 0/extinct       | old Europe               |
| Syriac       |  128 |    128 | sequential/compact | ?/liturgical    | Syria, Lebanon, Iraq     |
| Yi syllables | 1168 |   1168 | sequential/compact | 2~5 millions    | Yunnan, Sichuan (China)  |
+--------------+------+--------+--------------------+-----------------+--------------------------+

Implementation : partial permutation table

In mathematical terms, the frequency-based reordering of a script
block is a permutation of the characters of that block; this draft
refers to it as a "transposition". There are three viable alternatives
for implementing the transposition that reorders each script block.
The first alternative in the following table, "linear search", has the
smallest space (data entries) requirement, at the cost of some
execution time overhead. For brevity, this I-D contains an
implementation of the first alternative only, but implementors can
choose whichever alternative fits their space and time requirements.

Implementation alternatives of transposition

   |S| : the size of an entire script block
   |n| : the size of the frequent set of a script block

+-----+---------------+----------------------------------+-------------------+-----------------+
| No. | Search method | Table                            | # of data entries | Time complexity |
+-----+---------------+----------------------------------+-------------------+-----------------+
| 1.  | linear search | one partial permutation table    | |n|               | O(n)            |
+-----+---------------+----------------------------------+-------------------+-----------------+
| 2.  | binary search | two 2-column tables for          | |n|*4             | O(log n)        |
|     |               | forward/backward transpositions  |                   |                 |
+-----+---------------+----------------------------------+-------------------+-----------------+
| 3.  | random access | two entire-script-block-wide     | |S|*2             | O(1)            |
|     |               | tables for forward/backward      |                   |                 |
|     |               | transpositions                   |                   |                 |
+-----+---------------+----------------------------------+-------------------+-----------------+

A transposition (permutation) consists of disjoint cycles, and each
cycle decomposes into a set of 1:1 mappings, each between two of the
character elements of the cycle. For space efficiency, the reordering
mapping table of each script block is implemented as a partial
permutation table.

To illustrate the difference between full and partial permutation
tables, assume a script block S with 8 characters: 1 ~ 8. A mapping is
written as (lhs -> rhs), meaning that lhs is mapped into rhs.
If the full permutation table that reflects the new frequency order
for the 4 most frequent characters is

   (4->1) (8->2) (7->3) (3->4) (5->5) (6->6) (1->7) (2->8),

we can define the partial permutation table with only the first 4
mappings:

   (4->1) (8->2) (7->3) (3->4).

We call (?->5) (?->6) (?->7) (?->8) the implicit permutations.

The following assumptions are needed to make the partial permutation
table mathematically well defined.

First, 5 and 6 are neither a lhs nor a rhs in the partial table. For
such characters we assume the identity mappings (5->5) and (6->6).

Second, apart from these identity mappings for 5 and 6, the remaining
implicit permutations (?->7) and (?->8) must not involve any value
that does not appear in the partial table. Each of them must be the
mapping that closes a cycle begun by the partial permutations
(4->1) (8->2) (7->3) (3->4).

Under these assumptions, the full permutation table can be restored
from the partial permutation table (4->1) (8->2) (7->3) (3->4) with
the following algorithm:

   let P be the partial table
   for each char R which is in S and is not a rhs in P
      if R is not a lhs in P
         let ( R -> R )             // identity, by assumption
         next for loop
      end if
      set Q so that ( R -> Q ) is in P
      if Q is a lhs in P
         set R2 so that ( Q -> R2 ) is in P
         // inner loop: follow the cycle to its last element
         while ( R2 is a lhs in P )
            find Q2 so that ( R2 -> Q2 ) is in P
            set R2 = Q2
         end while
         let ( R2 -> R )
         next for loop
      end if
      let ( Q -> R )
      next for loop
   end for

For the example table this algorithm runs as follows:

   for R=5, no ( 5 -> ? ) in P                           : (5 -> 5)
   for R=6, no ( 6 -> ? ) in P                           : (6 -> 6)
   for R=7, (7 -> 3),(3 -> 4),(4 -> 1), no ( 1 -> ? ) in P : (1 -> 7)
   for R=8, (8 -> 2), no ( 2 -> ? ) in P                 : (2 -> 8)

The result is (5->5) (6->6) (1->7) (2->8).

The algorithm fills in the "missing link", that is, the last mapping
of each cycle that the partial permutation table leaves open.
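The same idea can also be sketched as on-the-fly lookups that never
materialise the full table. This is an illustrative sketch under
stated assumptions, not the code in [A1]: the array layout (tbl[i] is
the lhs that maps to rhs base + i, so the example table becomes
{4, 8, 7, 3} with base 1) and the names find_lhs, pp_forward and
pp_inverse are made up for this example, and all table entries are
assumed to be distinct.

```c
#include <assert.h>
#include <stddef.h>

/* Linear search: index i with tbl[i] == c, or -1 if c is not a lhs. */
static long find_lhs(const unsigned *tbl, size_t n, unsigned c)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i] == c)
            return (long)i;
    return -1;
}

/* Forward mapping (original -> reordered). */
unsigned pp_forward(const unsigned *tbl, size_t n, unsigned base,
                    unsigned c)
{
    long i = find_lhs(tbl, n, c);

    assert(n > 0);
    if (i >= 0)
        return base + (unsigned)i;        /* explicit (lhs -> rhs) */
    /* c is not a lhs. While it is a rhs, walk the chain backwards
     * to its start; the start is the "missing link" target.
     * If c was never a rhs, the loop is skipped: identity. */
    while (c >= base && c - base < n)
        c = tbl[c - base];
    return c;
}

/* Inverse mapping (reordered -> original). */
unsigned pp_inverse(const unsigned *tbl, size_t n, unsigned base,
                    unsigned c)
{
    long i;

    if (c >= base && c - base < n)
        return tbl[c - base];             /* explicit (lhs -> rhs) */
    /* c is not a rhs. While it is a lhs, walk the chain forwards
     * to its end. If c was never a lhs, this is the identity. */
    while ((i = find_lhs(tbl, n, c)) >= 0)
        c = base + (unsigned)i;
    return c;
}
```

Only characters outside the frequent set reach the two while loops, so
the extra chain walking is paid by rarely used characters only.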
In cycle notation, the case R=7 gives

   (7->3)*(3->4)*(4->1)*(?->7) = (7->3->4->1)(?->7) = (7->3->4->1->7)

and, likewise, the case R=8 gives

   (8->2)*(?->8) = (8->2->8)

The inner loop of this algorithm is executed only for rarely used
characters, which do not appear as lhs values in the partial table, so
it does not impose much execution time overhead in real deployments.

The partial permutation algorithm reduces the set of required
reordering mappings to roughly the most frequent 5% ~ 10% of the
characters of a script block; the identity mappings for rarely used
characters are omitted. The same algorithm can of course be used to
prepare the full permutation tables for alternatives 2 and 3.

Implementation : complexity/efficiency trade-offs

As for the code size of reordering, all the mapping tables and code
together take roughly 869 lines (SUM3 below). This is much less than
[NAMEPREP], which needs at least 6800 lines of mapping tables for the
compatibility/canonical mappings of NFKC in JPNIC's MDNkit2.x, plus
more than 55100 lines of legacy-to-unicode mapping tables and
functions in ICONV library version 1.61.

   NAMEPREP tables and codes:
      NFKC tables    :  6806 lines
      codes          :   459 lines
      --------------------------
      SUM1           :  7265 lines
      legacy-to-UCS  : 55100 lines
      --------------------------
      SUM2           : 62365 lines

   Reordering tables and codes:
      Group 1 table  :   664 lines ( han-4096/hangul-1024 code points )
      Group 2,3      :   105 lines ( 528 code points )
      mapping codes  :   100 lines
      --------------------------
      SUM3           :   869 lines

   Other options:
      Group 4        :  +200 lines (  816 code points, estimated )
      Group 5        :  +600 lines ( 2337 code points, estimated )

Group 6 blocks are rarely used, so they are left out of this analysis.
Since Group 5 scripts are also rarely used, this draft recommends
leaving them out, too. For Group 4, the author has not yet collected
frequency data.
For Groups 2 and 3, the member scripts are from the most
industrialized countries, and the author could obtain sufficient
character usage frequency data for them. However, since bare AMC-ACE-Z
already produces labels much shorter than UCS-2 for these scripts, the
tables for Groups 2 and 3 may be dropped if the objective of
reordering is to provide near-to-or-better-than-UCS-2 efficiency for
every script block. In that case only the Group 1 tables for the Han
and Hangul scripts are needed.

Even for Group 1, there is the additional option of
han-2048/hangul-512 ( -320 lines ), at the cost of a marginal 1% loss
in the improvement ratio of ACE label length for Han and Hangul
domains.

SUM3 is less than 1/8 of SUM1 and less than 1/70 of SUM2. The
code/data overhead of introducing this reordering is therefore minimal
and negligible, while it gives a large (30%) improvement for
Han/Hangul labels.

As for the time efficiency of reordering with linear search,
reordering + AMC-ACE-Z on 615937 ML.com samples took 50% more time
than bare AMC-ACE-Z. If the binary search or random access alternative
is chosen for the transposition implementation, the time overhead of
reordering could be made much smaller, or close to zero.

   Platform: Linux 5.2 on Intel Pentium II 266 MHz, with gcc -O
   Samples : VGRS ML.com domain samples : 615937
   Time    : reordering - 33 seconds
             bare       - 22 seconds
             ( this may include disk I/O time, about 1~2 seconds )
   Average : reordering - 0.053 milliseconds per domain
             bare       - 0.037 milliseconds per domain
             ( this includes disk I/O time, so "on-the-fly" encoding
               would be only slightly faster than that )

When a legacy-coded IDN enters an IDN-aware application, intensive
table lookup and mapping operations are performed in this order:

   1. legacy code to unicode translation
   2. nameprep
      2.1 mapping
      2.2 KC normalization
          2.2.1 compatibility decomposition mapping
          2.2.2 canonical composition mapping
      2.3 prohibition
   3. ACE encoding
      3.1 reordering
      3.2 encode successive code distance/difference in base??
   4. ACE decoding
      4.1 restore reordering
      4.2 decode base??-encoded labels into unicode code points
   5. unicode to legacy code translation for display

This list helps to estimate the relative importance of code size and
code simplicity versus ACE label efficiency when determining the best
ACE algorithm for IDNA.

Security considerations

ACE-encoded reordered code points are restored by the reverse ACE
translation, and this improvement does not introduce any new security
problems into ACE.

Acknowledgements

I appreciate the contribution of the VeriSign GRS MLTBD teams, who
provided valuable ML.com testbed registration samples. Thanks to
Kilnam Chon, Kyungjae Park and other KRNIC members, Ives Arouye and
Adam M. Costello for their advice and interest, which helped to make
this draft more mature. In particular, Adam's suggestions on tuning,
table construction and experiments made this draft more concrete and
more suitable for AMC-ACE-Z.

References

[AMCACEZ]  Adam Costello, "AMC-ACE-z version 0.3.1", 2001-Sep-07,
           draft-ietf-idn-amc-ace-z-00, latest version at
           http://www.cs.berkeley.edu/~amc/charset/amc-ace-z.gz

[UNICODE]  The Unicode Consortium, "The Unicode Standard",
           http://www.unicode.org/unicode/standard/standard.html.

[IDNA]     Patrik Faltstrom, Paul Hoffman, "Internationalizing Host
           Names In Applications (IDNA)", draft-ietf-idn-idna-01

[NAMEPREP] Paul Hoffman, Marc Blanchet, "Preparation of
           Internationalized Host Names", Feb 2001,
           draft-ietf-idn-nameprep-03

Author

   Soobok Lee
   Postel Services, Inc.
   http://www.postel.co.kr
   Tel: +82-11-9774-2737

A1. REORDER.C: reordering library codes and data version 2.0

Save this example source code as reorder.c before saving and compiling
lamcz.c in the same directory.
/* begin of reorder.c version 2.0 */
/* reorder.c version 2.0, han-4096/hangeul-1024 */
/* Soobok Lee, 2001/9/8 */

#include <limits.h>   /* UINT_MAX, tested below */

/* u_codep must be able to hold any code point (at least 21 bits) */
#if UINT_MAX >= 0x1FFFFF
typedef unsigned int u_codep;
#else
typedef unsigned long u_codep;
#endif

/* true if code lies in [R_BASE, R_BASE + R_SIZE) for script block R */
#define IS_RANGE(R,code) ((code) >= R ## _BASE && \
                          (code) <  R ## _BASE + R ## _SIZE)

#define ARABIC_BASE 0x0600
#define ARABIC_SIZE 0x0100
#define ARABIC_FREQ 0x41
u_codep ARABIC_TABLE[ ARABIC_FREQ ] = {
0x066a,0x0664,0x0665,0x0666,0x0667,0x0668,0x0669,0x066b,
0x066d,0x06b0,0x061f,0x06af,0x0671,0x06de,0x0663,0x06cc,
0x061b,0x0662,0x064c,0x0661,0x0652,0x064b,0x0660,0x0650,
0x064f,0x0624,0x064d,0x0640,0x064e,0x0622,0x0630,0x0651,
0x0638,0x0626,0x062b,0x0649,0x0621,0x063a,0x0636,0x0625,
0x062e,0x0637,0x0635,0x0632,0x0634,0x0623,0x0647,0x062d,
0x0642,0x062c,0x0641,0x0643,0x0639,0x062f,0x0629,0x0633,
0x062a,0x0628,0x0646,0x0648,0x0645,0x0631,0x064a,0x0644,
0x0627};

#define CYRILLIC_BASE 0x0400
#define CYRILLIC_SIZE 0x0130
#define CYRILLIC_FREQ 0x2f
u_codep CYRILLIC_TABLE[ CYRILLIC_FREQ ] = {
0x04b9,0x0459,0x045e,0x045a,0x0457,0x0455,0x0454,0x045b,
0x045f,0x045c,0x0458,0x0453,0x0449,0x0456,0x0452,0x044a,
0x0451,0x0436,0x044e,0x044d,0x0448,0x0447,0x0446,0x0445,
0x0439,0x044c,0x044b,0x044f,0x0444,0x0437,0x0433,0x0431,
0x0443,0x0434,0x043f,0x0432,0x043c,0x043b,0x043a,0x0441,
0x043d,0x0442,0x0435,0x0440,0x0438,0x043e,0x0430};

#define ETHIOPIC_BASE 0x1200
#define ETHIOPIC_SIZE 0x01A0
#define ETHIOPIC_FREQ 0x74
u_codep ETHIOPIC_TABLE[ ETHIOPIC_FREQ ] = {
0x12ca,0x123a,0x12cc,0x123b,0x130c,0x1212,0x1320,0x1323,
0x130b,0x12d3,0x123c,0x134a,0x1290,0x121a,0x1303,0x1300,
0x134b,0x121c,0x134e,0x1201,0x12f6,0x121e,0x127d,0x1296,
0x1240,0x12db,0x12dc,0x12c8,0x12de,0x1354,0x120a,0x1355,
0x120c,0x12a4,0x122e,0x12a6,0x12a8,0x12f0,0x1309,0x1308,
0x1266,0x12e8,0x12aa,0x1306,0x1305,0x1231,0x12cd,0x1219,
0x12d5,0x1276,0x130a,0x1274,0x1369,0x12da,0x12dd,0x1272,
0x1236,0x129b,0x12a3,0x1230,0x130e,0x133d,0x1234,0x1246,
0x1352,0x123d,0x1264,0x1233,0x1228,0x12a5,0x1243,0x1260,
0x12ed,0x126a,0x12f1,0x1292,0x121b,0x1356,0x1206,0x120e, 0x1245,0x1261,0x12ae,0x12f2,0x120b,0x1294,0x1293,0x122a, 0x1232,0x12ab,0x1262,0x12f5,0x1273,0x12cb,0x1263,0x1335, 0x1265,0x1210,0x130d,0x1270,0x121d,0x1238,0x12f3,0x1218, 0x120d,0x134d,0x12ad,0x122b,0x12eb,0x12a2,0x122d,0x1235, 0x12ee,0x1295,0x12a0,0x1275}; #define GREEK_BASE 0x0370 #define GREEK_SIZE 0x0090 #define GREEK_FREQ 0x23 u_codep GREEK_TABLE[ GREEK_FREQ ] = { 0x03b0,0x0390,0x03cb,0x03c8,0x03ca,0x03ce,0x03cd,0x03be, 0x03b6,0x03ae,0x03ad,0x03b2,0x03b8,0x03cc,0x03c7,0x03ac, 0x03c6,0x03af,0x03c9,0x03b4,0x03b3,0x03c5,0x03bc,0x03c0, 0x03b7,0x03ba,0x03bb,0x03bd,0x03c4,0x03c1,0x03b5,0x03c2, 0x03b9,0x03bf,0x03b1}; #define HANGUL_BASE 0xAC00 #define HANGUL_SIZE 0x2BA4 #define HANGUL_FREQ 0x400 u_codep HANGUL_TABLE[HANGUL_FREQ]={ 0xc216,0xbc0f,0xb2ff,0xd47c,0xaf3c,0xad7d,0xc5cc,0xb618, 0xb349,0xc370,0xb4a4,0xb985,0xd301,0xae4d,0xacb8,0xbc43, 0xc149,0xc298,0xb291,0xbc99,0xd131,0xd78c,0xb6f0,0xbe64, 0xc783,0xc9da,0xbb49,0xbb3b,0xb5a8,0xb3d7,0xcb48,0xc648, 0xbcb3,0xb975,0xcc3b,0xb0b5,0xc258,0xc0f4,0xd0a5,0xb0c4, 0xd5db,0xd754,0xce61,0xb968,0xc573,0xccc4,0xba70,0xce89, 0xbd90,0xb2e0,0xbfd4,0xc068,0xc496,0xb193,0xafcd,0xad97, 0xc0f9,0xb8f0,0xb78f,0xc228,0xaee8,0xc27c,0xba2f,0xd749, 0xac12,0xcc4c,0xd325,0xcc3f,0xbbc0,0xb057,0xb2cc,0xc974, 0xc7ad,0xaf2d,0xac31,0xb9ce,0xbf08,0xb4e3,0xb315,0xb380, 0xc5c7,0xc539,0xbd93,0xae5c,0xba84,0xcc10,0xd700,0xc880, 0xc170,0xb3d5,0xb310,0xb818,0xbe57,0xb048,0xc77d,0xcf8c, 0xc1f1,0xd31f,0xd750,0xc58f,0xaf42,0xbed0,0xbafc,0xb0ab, 0xc4f8,0xbc40,0xb369,0xd770,0xcabc,0xbcf6,0xb86d,0xb11b, 0xc5c9,0xd401,0xc65c,0xc82f,0xcef7,0xd0b7,0xb4ed,0xd5e8, 0xbd99,0xacb9,0xbbac,0xcf55,0xd69f,0xae61,0xb2ac,0xce85, 0xd48b,0xc448,0xbdd4,0xb78d,0xb58e,0xccbc,0xbc27,0xd799, 0xd0ed,0xc816,0xcffc,0xcc57,0xc465,0xbca1,0xbc11,0xbe54, 0xc3e0,0xc557,0xc290,0xd280,0xc7bc,0xb527,0xd140,0xb611, 0xae4a,0xb534,0xc530,0xce98,0xc8c4,0xac2f,0xbf55,0xce59, 0xd590,0xb385,0xb4ec,0xb96d,0xb35c,0xac9f,0xd0ac,0xb7ff, 
0xcc60,0xc38c,0xd15d,0xb08c,0xcd78,0xc0db,0xb04c,0xcc54, 0xd03c,0xc610,0xd5d0,0xcc0d,0xc22b,0xc2f9,0xaf41,0xc19c, 0xc385,0xce35,0xc0b6,0xbc85,0xb6b1,0xb987,0xc787,0xd33d, 0xd759,0xb0d0,0xcf08,0xb5a0,0xd0e4,0xd018,0xacf1,0xc80a, 0xd6a1,0xc5e3,0xcfc4,0xc1a5,0xc500,0xc0cc,0xc0ec,0xd234, 0xade4,0xb192,0xd575,0xc9f8,0xd384,0xc5ff,0xbb50,0xd230, 0xb531,0xc7c8,0xb69d,0xacd7,0xc82c,0xd30d,0xb9f5,0xc634, 0xce6d,0xc5e1,0xc998,0xb301,0xb044,0xccab,0xc575,0xbc1b, 0xd798,0xbbff,0xc27d,0xac19,0xc918,0xcc64,0xbabb,0xd584, 0xb524,0xb418,0xb2db,0xc232,0xc999,0xcfe1,0xb128,0xb82c, 0xb515,0xc719,0xae54,0xb214,0xd720,0xc61b,0xbfcc,0xc989, 0xad0c,0xc058,0xb7b5,0xb2d0,0xc9d3,0xba4d,0xb518,0xb801, 0xb05d,0xac90,0xb8b0,0xc8e0,0xaf34,0xb959,0xd761,0xb810, 0xac94,0xc74d,0xc0fe,0xc0bd,0xb738,0xbc1d,0xad73,0xc78e, 0xd3a0,0xcd2c,0xba78,0xc8e4,0xd038,0xb118,0xb0b8,0xb625, 0xb155,0xc148,0xac13,0xd3ab,0xafbc,0xba64,0xad34,0xb9bf, 0xb561,0xc950,0xd1b0,0xc0ad,0xc55e,0xb538,0xb9d8,0xb54c, 0xbe7c,0xd587,0xb458,0xc5c6,0xbc97,0xce84,0xb2f5,0xcbd4, 0xc2f6,0xbf40,0xc090,0xd0f1,0xc90c,0xbd10,0xc15c,0xb9d1, 0xd399,0xd0e0,0xc639,0xcf04,0xc798,0xbb35,0xba5c,0xcca9, 0xacf0,0xcca8,0xce69,0xc878,0xb454,0xaebc,0xd321,0xc2b4, 0xcabd,0xbd04,0xbd95,0xacf6,0xcc1c,0xc3d8,0xc0f7,0xbab8, 0xc813,0xb97c,0xbc2d,0xc81d,0xc22f,0xb69c,0xc70c,0xb784, 0xb4dd,0xb465,0xacfd,0xc570,0xc250,0xc53d,0xb41c,0xae34, 0xd655,0xc0bf,0xac9c,0xc9dd,0xc308,0xbd07,0xb0ad,0xb7ad, 0xd300,0xd3fc,0xba55,0xb3fc,0xb730,0xcea1,0xc580,0xd1f4, 0xc728,0xafc0,0xcc29,0xc904,0xd508,0xd751,0xce94,0xb290, 0xd1b1,0xbc0d,0xb188,0xc751,0xb358,0xb5bc,0xb0e5,0xbc24, 0xc22d,0xb371,0xd0b4,0xb9b4,0xcf10,0xc635,0xacf3,0xb12c, 0xce04,0xc14b,0xba58,0xb364,0xb51c,0xd1a8,0xc98c,0xba4b, 0xd2bc,0xae68,0xd035,0xb760,0xac83,0xc568,0xc6b1,0xcd09, 0xbe10,0xb7a9,0xd608,0xc584,0xd56b,0xb0a9,0xd070,0xc2fc, 0xd540,0xce78,0xd1a1,0xb95c,0xc384,0xb86c,0xd790,0xb728, 0xaed8,0xc5b5,0xb451,0xc990,0xcc0c,0xb799,0xd718,0xd6fc, 0xb150,0xc12f,0xc388,0xb1cc,0xac24,0xb610,0xd31d,0xc12d, 
0xb7f4,0xd2f4,0xaca8,0xd2f8,0xb8f8,0xd38c,0xc05c,0xb304, 0xb3d4,0xace4,0xc88c,0xc838,0xb2ed,0xc7c1,0xb864,0xc9d0, 0xafb8,0xd380,0xbabd,0xbb18,0xc7a0,0xb7a8,0xc6f0,0xb5a1, 0xade0,0xc36c,0xc83c,0xceec,0xc5fd,0xd3ed,0xb284,0xac1d, 0xb1e8,0xbc34,0xb7fc,0xb7f0,0xb9f9,0xc788,0xb205,0xbcbc, 0xba39,0xccd0,0xb8e1,0xd601,0xd034,0xb07c,0xc824,0xc6cd, 0xc9d5,0xc6c3,0xc794,0xcf69,0xcc99,0xd39c,0xd31c,0xd134, 0xafc8,0xad74,0xb9de,0xd329,0xb81b,0xc820,0xbe48,0xbcbd, 0xb374,0xb8e9,0xbbf9,0xb871,0xb208,0xbe68,0xbc25,0xb099, 0xd1a4,0xb125,0xac11,0xc96c,0xcef5,0xc20d,0xd5cc,0xd0d0, 0xc789,0xd3d0,0xce21,0xc561,0xbe45,0xb7ed,0xbe75,0xc8fd, 0xd578,0xba40,0xc30d,0xd0d5,0xc811,0xcda4,0xc6c0,0xb530, 0xc154,0xc695,0xbe60,0xb989,0xc5fc,0xc2a8,0xc637,0xc140, 0xb840,0xbcfc,0xc0e4,0xad81,0xd2c0,0xc555,0xb4c0,0xb545, 0xac78,0xb828,0xd480,0xbe59,0xce60,0xc2ed,0xb09a,0xbdf0, 0xc1c4,0xd2f1,0xc73c,0xc5c4,0xd788,0xc708,0xaf2c,0xbc38, 0xd640,0xc0d8,0xb9c9,0xd0a8,0xc2ac,0xb17c,0xd0c4,0xd3f4, 0xb179,0xad7f,0xc2f1,0xb0c9,0xd150,0xb2dd,0xd29c,0xc9f1, 0xcc3e,0xadc0,0xc368,0xc6c5,0xb140,0xb7c9,0xc12c,0xc4f0, 0xd6a8,0xb960,0xc300,0xc1e0,0xb9db,0xd53d,0xbcc4,0xd61c, 0xc808,0xbaac,0xb809,0xc9dc,0xd5e4,0xacac,0xd68d,0xb2d8, 0xd5ec,0xbca8,0xbe5b,0xc219,0xb180,0xd0d1,0xd32c,0xae4c, 0xba3c,0xb8f9,0xcea0,0xc7a1,0xcd98,0xc644,0xb9d0,0xaddc, 0xd0b9,0xc26c,0xcfe0,0xcc45,0xb9e8,0xac08,0xce68,0xb298, 0xcd1d,0xb839,0xbc94,0xb378,0xc5d1,0xadfc,0xb86f,0xc0b4, 0xbc8c,0xd0c1,0xb978,0xace1,0xc874,0xcde8,0xac01,0xd3b8, 0xc735,0xaca9,0xc9c8,0xb144,0xb9dd,0xd6c8,0xc2b5,0xd568, 0xcf13,0xce5c,0xcc30,0xc0f5,0xb2a5,0xb4e0,0xadf9,0xb85d, 0xb9bd,0xb3c8,0xcc2c,0xb3cc,0xae09,0xd050,0xb09c,0xae38, 0xb9e5,0xb0a0,0xcd0c,0xb824,0xcc38,0xc288,0xc190,0xc5b8, 0xc900,0xb780,0xc5bc,0xb860,0xc62c,0xb77d,0xd734,0xcfe8, 0xc871,0xbba4,0xd48d,0xce7c,0xb2e5,0xc88b,0xc0c9,0xd560, 0xd478,0xbcc0,0xb825,0xbe4c,0xc724,0xb204,0xcf64,0xc625, 0xc801,0xd45c,0xd544,0xcf5c,0xc2b9,0xd765,0xb110,0xcda9, 0xb3c5,0xc554,0xbc00,0xc21c,0xac80,0xbd09,0xba74,0xcd94, 
0xc194,0xd669,0xd5a5,0xd6c4,0xd3f0,0xc2f8,0xb9ad,0xb355, 0xbc88,0xd5c8,0xacb0,0xb984,0xc775,0xc559,0xcf58,0xd65c, 0xc655,0xcd5c,0xc0c8,0xc791,0xd63c,0xd76c,0xc2ec,0xd314, 0xad70,0xba38,0xb2c9,0xd15c,0xb958,0xb4e4,0xb9c1,0xc564, 0xc139,0xcc98,0xbc45,0xb80c,0xbe14,0xbca4,0xb294,0xc5f4, 0xbaa9,0xd64d,0xb354,0xce90,0xd328,0xd2b9,0xac10,0xd37c, 0xbd84,0xb78c,0xb108,0xcc44,0xcee8,0xb2f4,0xbcf8,0xbd88, 0xc545,0xcee4,0xd138,0xae00,0xb529,0xd648,0xcd08,0xd56d, 0xb4f1,0xbc18,0xd14d,0xd658,0xcca0,0xac04,0xc800,0xc785, 0xc18d,0xb2f7,0xd5d8,0xc6f9,0xc744,0xb450,0xcd9c,0xc704, 0xd4e8,0xc6e8,0xd0c8,0xd398,0xb8e8,0xc9c1,0xccb4,0xc99d, 0xd3c9,0xb7fd,0xc5d8,0xc694,0xb974,0xc11d,0xb79c,0xc54c, 0xd615,0xd0dd,0xb274,0xbc95,0xbab0,0xc5e0,0xce20,0xd305, 0xc640,0xd0a4,0xbd81,0xb2f9,0xc628,0xc560,0xbc1c,0xc158, 0xbca0,0xb2e8,0xcd95,0xb9cc,0xc721,0xb8cc,0xc57c,0xac70, 0xd0dc,0xb9b0,0xc5ed,0xc9d1,0xc6d4,0xace8,0xbcf5,0xc6cc, 0xb2ec,0xb370,0xc57d,0xb9bc,0xc74c,0xcc3d,0xc678,0xd154, 0xd22c,0xb0b4,0xc5d4,0xad8c,0xbc31,0xd50c,0xd611,0xd569, 0xbbfc,0xc988,0xaf43,0xb798,0xbcd1,0xc220,0xb18d,0xd310, 0xac8c,0xcf00,0xc2e4,0xd604,0xc1fc,0xd551,0xc13c,0xc6b8, 0xadf8,0xc810,0xbc15,0xc885,0xb124,0xc0dd,0xd14c,0xac1c, 0xc608,0xd074,0xc784,0xba85,0xccad,0xb7ec,0xc124,0xb791, 0xc6b4,0xc1a1,0xc528,0xb178,0xae08,0xd574,0xb0a8,0xbc30, 0xbe0c,0xd1a0,0xae40,0xc740,0xac74,0xacc4,0xc7ac,0xc5ec, 0xba54,0xcc28,0xd1b5,0xc120,0xc0bc,0xbc84,0xad00,0xac15, 0xce58,0xc591,0xbc14,0xc6a9,0xcef4,0xd589,0xc5f0,0xbb3c, 0xbaa8,0xd488,0xd0c0,0xd638,0xb808,0xc2dd,0xd30c,0xd53c, 0xc911,0xc870,0xcc9c,0xad11,0xbb34,0xc9c4,0xb2c8,0xacfc, 0xc5c5,0xce74,0xb137,0xc548,0xd558,0xb2e4,0xd2f0,0xc5d0, 0xd3ec,0xc758,0xb9e4,0xbc29,0xbb38,0xc601,0xd06c,0xc18c, 0xc720,0xc6b0,0xc138,0xb098,0xc11c,0xcf54,0xd504,0xc624, 0xc81c,0xad50,0xacbd,0xacf5,0xd68c,0xb85c,0xc0c1,0xb514, 0xc77c,0xbbf8,0xc218,0xbd80,0xc2e0,0xc5b4,0xace0,0xb9c8, 0xb3c4,0xad6c,0xbe44,0xac00,0xd654,0xc7a5,0xc131,0xd559, 0xd130,0xc8fc,0xb4dc,0xc815,0xc804,0xb77c,0xbcf4,0xc2dc, 
0xad6d,0xc0b0,0xc790,0xc9c0,0xc6d0,0xb3d9,0xd2b8,0xb300, 0xd55c,0xae30,0xc778,0xc544,0xb9ac,0xc0ac,0xc2a4,0xc774 }; #define HEBREW_BASE 0x0590 #define HEBREW_SIZE 0x0070 #define HEBREW_FREQ 0x1d u_codep HEBREW_TABLE[ HEBREW_FREQ ] = { 0x05f0,0x05f3,0x05e3,0x05e5,0x05da,0x05e6,0x05d6,0x05d2, 0x05df,0x05db,0x05d8,0x05e4,0x05d7,0x05e2,0x05e1,0x05dd, 0x05e7,0x05d3,0x05e9,0x05e0,0x05d1,0x05d0,0x05dc,0x05ea, 0x05de,0x05d4,0x05e8,0x05d5,0x05d9}; #define DEVANAGARI_BASE 0x0900 #define DEVANAGARI_SIZE 0x0080 #define DEVANAGARI_FREQ 0x40 u_codep DEVANAGARI_TABLE[ DEVANAGARI_FREQ ] = { 0x0933,0x0911,0x0922,0x0950,0x0967,0x0964,0x0946,0x091d, 0x090b,0x090e,0x0910,0x0931,0x090a,0x0945,0x0937,0x0909, 0x0901,0x090f,0x0908,0x0943,0x0918,0x0906,0x0949,0x091e, 0x094c,0x0920,0x093c,0x0907,0x092d,0x0923,0x0948,0x0925, 0x0927,0x0905,0x0942,0x092b,0x0916,0x091a,0x0921,0x0941, 0x0939,0x0902,0x091f,0x0936,0x0926,0x0917,0x094b,0x092c, 0x091c,0x092a,0x092f,0x0924,0x0947,0x0932,0x0938,0x092e, 0x0940,0x093f,0x0915,0x0935,0x0928,0x094d,0x0930,0x093e }; #define HIRAGANA_BASE 0x3040 #define HIRAGANA_SIZE 0x0060 #define HIRAGANA_FREQ 0x56 u_codep HIRAGANA_TABLE[ HIRAGANA_FREQ ] = { 0x309d,0x309a,0x308e,0x3090,0x3091,0x309e,0x3045,0x3062, 0x3049,0x3047,0x3041,0x307a,0x305c,0x305e,0x3065,0x3078, 0x306c,0x3043,0x3074,0x307d,0x3077,0x3092,0x3071,0x307c, 0x3056,0x3079,0x3052,0x3086,0x305d,0x3085,0x305a,0x307b, 0x3054,0x304e,0x3050,0x3076,0x3087,0x3088,0x3080,0x306d, 0x3073,0x305b,0x3072,0x308c,0x3075,0x3070,0x3060,0x3069, 0x308d,0x3067,0x3081,0x3083,0x3066,0x306f,0x306b,0x3048, 0x3051,0x304c,0x3058,0x3064,0x308f,0x3082,0x3063,0x3042, 0x3084,0x307f,0x3061,0x3053,0x3059,0x306a,0x3089,0x3055, 0x305f,0x3068,0x304d,0x308b,0x307e,0x304f,0x304a,0x304b, 0x3046,0x308a,0x3057,0x306e,0x3093,0x3044}; #define KATAKANA_BASE 0x30A0 #define KATAKANA_SIZE 0x0060 #define KATAKANA_FREQ 0x5a u_codep KATAKANA_TABLE[ KATAKANA_FREQ ] = { 0x30fd,0x30fe,0x30f5,0x30ee,0x30f1,0x30f2,0x30f0,0x30c5, 
0x30c2,0x30f6,0x30a5,0x30be,0x30cc,0x30bc,0x30f4,0x30d8, 0x30e8,0x30b2,0x30b4,0x30d2,0x30ae,0x30e6,0x30a9,0x30da, 0x30e4,0x30dc,0x30b6,0x30a1,0x30d9,0x30fb,0x30ce,0x30bd, 0x30d4,0x30dd,0x30c4,0x30e2,0x30db,0x30ac,0x30b1,0x30cf, 0x30a7,0x30c0,0x30ba,0x30ef,0x30bb,0x30df,0x30e7,0x30e3, 0x30e5,0x30c1,0x30ad,0x30ca,0x30e1,0x30d6,0x30d1,0x30d0, 0x30cb,0x30a6,0x30aa,0x30b0,0x30c7,0x30ed,0x30cd,0x30d3, 0x30b5,0x30d7,0x30a3,0x30ab,0x30e0,0x30de,0x30ec,0x30b3, 0x30a8,0x30c9,0x30b8,0x30d5,0x30c6,0x30bf,0x30b7,0x30e9, 0x30ea,0x30a2,0x30af,0x30eb,0x30c3,0x30c8,0x30a4,0x30b9, 0x30f3,0x30fc}; #define LATIN_BASE 0x00A0 #define LATIN_SIZE 0x02D0 #define LATIN_FREQ 0x70 u_codep LATIN_TABLE[ LATIN_FREQ ] = { 0x02c9,0x02bc,0x0121,0x00ab,0x0303,0x0263,0x0304,0x025b, 0x014d,0x0307,0x0199,0x0259,0x02c6,0x00bb,0x01fd,0x030a, 0x0109,0x0127,0x00ac,0x00a6,0x013a,0x0138,0x0135,0x014f, 0x0157,0x013e,0x0117,0x016b,0x00f7,0x0171,0x010f,0x0115, 0x00b6,0x00b1,0x01a1,0x0129,0x0148,0x01b0,0x0125,0x0146, 0x013c,0x00b0,0x00d7,0x017a,0x0192,0x0113,0x016f,0x0155, 0x012b,0x00a1,0x0123,0x0165,0x00a4,0x0144,0x0159,0x00b7, 0x00ff,0x00a7,0x0105,0x0101,0x011b,0x0151,0x0153,0x00a2, 0x0103,0x00a5,0x00bf,0x015b,0x00f9,0x00ec,0x0107,0x017e, 0x0119,0x017c,0x00fb,0x00f0,0x00fe,0x0111,0x00a3,0x00f5, 0x0161,0x0142,0x00eb,0x0163,0x00f2,0x010d,0x00ee,0x00ef, 0x00a9,0x00e2,0x00e3,0x00fd,0x00ea,0x011f,0x00f4,0x00ae, 0x00e0,0x00fa,0x015f,0x00e8,0x00e6,0x00e7,0x00e1,0x00e5, 0x00ed,0x00f3,0x00f8,0x00fc,0x00f1,0x00f6,0x00e4,0x00e9 }; #define LATINEA_BASE 0x1E00 #define LATINEA_SIZE 0x0100 #define LATINEA_FREQ 0x18 u_codep LATINEA_TABLE[ LATINEA_FREQ ] = { 0x1ec1,0x1ed1,0x1ea5,0x1ea7,0x1ed5,0x1ed7,0x1ead,0x1eb5, 0x1eb7,0x1ef9,0x1ec3,0x1edd,0x1ebf,0x1ee5,0x1ee9,0x1ec5, 0x1ea3,0x1ecd,0x1ed3,0x1ed9,0x1edb,0x1ecb,0x1ec7,0x1ea1 }; #define TAMIL_BASE 0x0B80 #define TAMIL_SIZE 0x0080 #define TAMIL_FREQ 0x2a u_codep TAMIL_TABLE[ TAMIL_FREQ ] = { 0x0b90,0x0b9c,0x0b9e,0x0b8a,0x0b86,0x0b8f,0x0bb7,0x0b92, 
0x0b89,0x0bb1,0x0b82,0x0bc0,0x0bca,0x0bc7,0x0b93,0x0b88, 0x0bc6,0x0bc2,0x0b87,0x0bcb,0x0b9f,0x0b99,0x0b85,0x0baf, 0x0bb3,0x0ba3,0x0ba8,0x0baa,0x0bb5,0x0bc1,0x0bc8,0x0bb4, 0x0ba9,0x0bb0,0x0bb2,0x0b9a,0x0bbe,0x0ba4,0x0b95,0x0bae, 0x0bbf,0x0bcd}; #define THAI_BASE 0x0E00 #define THAI_SIZE 0x0080 #define THAI_FREQ 0x4b u_codep THAI_TABLE[ THAI_FREQ ] = { 0x0e52,0x0e53,0x0e54,0x0e55,0x0e56,0x0e57,0x0e51,0x0e58, 0x0e12,0x0e06,0x0e0f,0x0e09,0x0e11,0x0e0e,0x0e24,0x0e2f, 0x0e46,0x0e3f,0x0e59,0x0e1d,0x0e2c,0x0e4b,0x0e10,0x0e36, 0x0e4a,0x0e43,0x0e2e,0x0e13,0x0e0d,0x0e1c,0x0e29,0x0e20, 0x0e18,0x0e4d,0x0e16,0x0e28,0x0e1f,0x0e02,0x0e47,0x0e37, 0x0e39,0x0e08,0x0e30,0x0e41,0x0e0b,0x0e42,0x0e1b,0x0e0a, 0x0e38,0x0e1e,0x0e2b,0x0e44,0x0e1a,0x0e49,0x0e04,0x0e14, 0x0e48,0x0e15,0x0e25,0x0e27,0x0e4c,0x0e35,0x0e2a,0x0e07, 0x0e31,0x0e34,0x0e21,0x0e17,0x0e01,0x0e22,0x0e40,0x0e2d, 0x0e19,0x0e23,0x0e32}; #define UNIHAN_BASE 0x4E00 #define UNIHAN_SIZE 0x5200 #define UNIHAN_FREQ 0x1000 u_codep UNIHAN_TABLE[UNIHAN_FREQ]={ 0x8332,0x9433,0x956d,0x57cb,0x8dcc,0x54ea,0x849c,0x73cb, 0x7460,0x73bf,0x707a,0x99c1,0x9a73,0x7825,0x8a4e,0x8bb5, 0x87ec,0x8749,0x8e2b,0x78b0,0x6e25,0x9149,0x62d8,0x618e, 0x679a,0x5ad6,0x62b9,0x4e11,0x93fd,0x5980,0x69b4,0x74a7, 0x760d,0x75a1,0x9b27,0x95f9,0x6583,0x6bd9,0x6f7c,0x79d2, 0x79c9,0x5665,0x54dd,0x5984,0x629e,0x67b8,0x8046,0x6aac, 0x6bc6,0x6bb4,0x93e2,0x9556,0x6380,0x9598,0x95f8,0x62f1, 0x5141,0x96a3,0x6714,0x8e44,0x6a44,0x74df,0x7c64,0x8cb3, 0x8d30,0x6442,0x80fa,0x84fc,0x90f4,0x8c8a,0x80ba,0x6b3d, 0x94a6,0x63aa,0x745a,0x8b21,0x9112,0x90b9,0x559a,0x5524, 0x8912,0x5077,0x8a6c,0x8bdf,0x7be9,0x7b5b,0x7722,0x6cc4, 0x599e,0x6f8e,0x7ee9,0x859b,0x7119,0x9df2,0x9e6b,0x6f6f, 0x6d54,0x52a3,0x5d97,0x5d02,0x6614,0x82b9,0x7d5e,0x7ede, 0x5b9b,0x5157,0x5156,0x903c,0x5bb0,0x6271,0x9e93,0x583a, 0x4e15,0x7d18,0x6de8,0x9d1b,0x9e33,0x72f8,0x901b,0x76d4, 0x7165,0x7115,0x6251,0x701f,0x6f47,0x5398,0x561f,0x6ef8, 0x6d52,0x53e6,0x7e5e,0x7ed5,0x6e4a,0x51d1,0x7672,0x766b, 
0x92c7,0x94a1,0x7fc5,0x5fd9,0x968d,0x6eef,0x6ede,0x8af7, 0x8bbd,0x857a,0x95c7,0x6252,0x6436,0x62a2,0x641e,0x6435, 0x9761,0x9a6a,0x9a8a,0x6e09,0x7d43,0x854e,0x835e,0x6f5e, 0x57ae,0x84d3,0x6b53,0x7adf,0x8b20,0x8c23,0x83b7,0x4e56, 0x5cef,0x6070,0x8543,0x50ad,0x7e55,0x7f2e,0x574e,0x5fa0, 0x5f95,0x6065,0x803b,0x62dd,0x60ca,0x8b5f,0x987b,0x7e1b, 0x7f1a,0x6c9b,0x852d,0x836b,0x652c,0x63fd,0x7e10,0x7ec9, 0x5bee,0x7f70,0x7f5a,0x9cf6,0x9e22,0x6350,0x54b3,0x600e, 0x8986,0x7cb9,0x6bde,0x798e,0x796f,0x9685,0x93b3,0x954d, 0x6cbd,0x7246,0x6db5,0x8218,0x745b,0x8ce4,0x8d31,0x5e45, 0x6fd5,0x6dc7,0x8782,0x9945,0x9992,0x5440,0x7262,0x8d5e, 0x9ed4,0x76f2,0x6f70,0x6e83,0x89f8,0x66c6,0x81c9,0x8138, 0x9daf,0x83ba,0x62d4,0x65a4,0x6fc1,0x6d4a,0x9704,0x7d2f, 0x81fb,0x810f,0x64c1,0x62e5,0x8cdb,0x7aba,0x7aa5,0x634c, 0x934a,0x932c,0x85e9,0x621a,0x52aa,0x5384,0x5d14,0x4f36, 0x96a7,0x53ad,0x538c,0x65bc,0x7d7b,0x7152,0x709c,0x752b, 0x8fa8,0x6109,0x6fd8,0x809b,0x69fb,0x7e4d,0x8713,0x508d, 0x515c,0x9075,0x79e4,0x985b,0x98a0,0x9192,0x65ec,0x8870, 0x6c4e,0x69cc,0x936e,0x5a6d,0x5a05,0x5944,0x6ed4,0x5955, 0x92ef,0x9506,0x582a,0x7f69,0x83c1,0x5544,0x6524,0x644a, 0x9a28,0x90b8,0x5137,0x4fea,0x5a1f,0x66c7,0x6619,0x7a3c, 0x63c3,0x7145,0x76e1,0x68e0,0x5504,0x5457,0x4ffa,0x4f3a, 0x7815,0x66f0,0x5eb8,0x81fc,0x502a,0x9010,0x60e3,0x721b, 0x70c2,0x52f8,0x529d,0x81a8,0x561b,0x4fec,0x70fd,0x4e52, 0x7621,0x75ae,0x6631,0x9089,0x8e10,0x8df5,0x92f3,0x86a4, 0x501a,0x586b,0x5861,0x8acf,0x8bf9,0x5862,0x575e,0x8ddf, 0x8446,0x98b3,0x522e,0x79e9,0x6dae,0x8a93,0x60df,0x4f10, 0x53ed,0x65e2,0x82a5,0x6e2d,0x53ee,0x65e8,0x6a0b,0x778e, 0x5fcc,0x6454,0x6a2b,0x6452,0x8f29,0x8f88,0x7e8c,0x7eed, 0x5bc2,0x7941,0x5bb5,0x5e17,0x66a8,0x792a,0x783a,0x883b, 0x86ee,0x8525,0x8471,0x8a1f,0x8bbc,0x8ce0,0x8d54,0x9709, 0x633a,0x90af,0x52f3,0x52f2,0x8caa,0x8d2a,0x5f4e,0x5f2f, 0x8ff0,0x4e4e,0x6cd3,0x50d5,0x66e6,0x7bc7,0x7bf7,0x6fef, 0x80a9,0x59e6,0x5df7,0x7c8b,0x7280,0x5df2,0x6e9c,0x6ce3, 0x7490,0x64fa,0x6446,0x9130,0x90bb,0x8cca,0x8d3c,0x88d9, 
0x6e26,0x6da1,0x8774,0x6158,0x60e8,0x4ff4,0x6f20,0x6813, 0x9a37,0x9a9a,0x4edf,0x7435,0x73c2,0x873b,0x814a,0x754f, 0x9cf8,0x96c7,0x58d8,0x5792,0x5841,0x6636,0x9211,0x94a3, 0x96cd,0x55bb,0x900d,0x92e4,0x9504,0x8017,0x5830,0x885d, 0x6ab3,0x69df,0x5c46,0x5c4a,0x6556,0x9b25,0x751a,0x5243, 0x8292,0x83c7,0x5993,0x8ae7,0x8c10,0x597b,0x98ea,0x996a, 0x75e9,0x8ecb,0x8f67,0x9654,0x6daa,0x71f4,0x70e9,0x8fe6, 0x632a,0x916a,0x96ef,0x6687,0x5f04,0x60b2,0x9209,0x94a0, 0x62ab,0x6c90,0x8718,0x6016,0x7b8f,0x7b5d,0x865e,0x5ef7, 0x6fe0,0x687f,0x7363,0x947c,0x9523,0x7aff,0x7336,0x72b9, 0x79bd,0x9d8f,0x5b43,0x5b22,0x617e,0x54bd,0x5475,0x6094, 0x978f,0x5de9,0x8a6e,0x8be0,0x525d,0x5265,0x758f,0x7a4c, 0x7a23,0x6977,0x6137,0x607a,0x8ff9,0x62fe,0x831c,0x5c60, 0x59ac,0x59b3,0x675e,0x74e3,0x5f01,0x4f46,0x5824,0x67af, 0x582f,0x5c27,0x4e3c,0x860b,0x4f6c,0x6ef4,0x8396,0x830e, 0x7a9f,0x7fe1,0x8faf,0x8fa9,0x7bc4,0x6854,0x75b2,0x8650, 0x7575,0x6893,0x8087,0x7436,0x9047,0x937c,0x744b,0x73ae, 0x9470,0x94a5,0x90a2,0x7b28,0x9475,0x7f36,0x5321,0x9d72, 0x9e4a,0x86db,0x6795,0x62ec,0x6a0a,0x5df3,0x5f93,0x7d8f, 0x7ee5,0x5cb1,0x9c0d,0x9cc5,0x9c0c,0x6089,0x6a9c,0x6867, 0x5be1,0x68cd,0x6170,0x96db,0x96cf,0x8862,0x4eff,0x698e, 0x9f90,0x5e9e,0x9bca,0x9ca8,0x9a12,0x5594,0x5d69,0x96b4, 0x9647,0x6a1f,0x5acc,0x5507,0x700f,0x6d4f,0x5806,0x6ed5, 0x60a9,0x6a3d,0x540f,0x5ac4,0x533f,0x7656,0x6756,0x67da, 0x5598,0x5c4d,0x98fd,0x9971,0x7626,0x8a23,0x8bc0,0x7aaa, 0x5e16,0x8b0e,0x8c1c,0x66fd,0x701b,0x63ee,0x6325,0x6ec7, 0x7766,0x6e23,0x6816,0x7686,0x5d60,0x6e13,0x6995,0x8e0a, 0x6b6a,0x5256,0x5580,0x9162,0x543b,0x851a,0x6182,0x9713, 0x7de9,0x7f13,0x6df8,0x67f5,0x6805,0x5eec,0x5e90,0x524e, 0x5239,0x55aa,0x4e27,0x80d6,0x7696,0x91a4,0x8ca7,0x8d2b, 0x6912,0x8ddd,0x53d4,0x8170,0x7891,0x5208,0x85ab,0x6c57, 0x4f75,0x971c,0x703e,0x6f9c,0x5f64,0x69cd,0x5410,0x62b1, 0x745f,0x559d,0x714e,0x62e1,0x863f,0x841d,0x963b,0x74cf, 0x73d1,0x6b16,0x6984,0x5426,0x72ed,0x9b41,0x6c6a,0x9215, 0x94ae,0x7c97,0x5dbc,0x5c7f,0x83e9,0x91aa,0x5bf8,0x91cb, 
0x91ca,0x6986,0x7aae,0x7a77,0x76fc,0x95ca,0x9614,0x6dcb, 0x6615,0x61cb,0x727d,0x7275,0x5f1b,0x73c0,0x6639,0x624e, 0x665d,0x663c,0x6c76,0x9c3b,0x9cd7,0x5ed6,0x9591,0x868a, 0x64f4,0x6269,0x99db,0x9a76,0x574f,0x99f1,0x9a86,0x6a80, 0x5750,0x5446,0x6488,0x635e,0x6558,0x53d9,0x4f0e,0x8695, 0x596a,0x593a,0x926c,0x94bc,0x9214,0x949e,0x8549,0x70b8, 0x6d12,0x9a5a,0x8350,0x75f4,0x9042,0x8309,0x5cfb,0x846f, 0x7194,0x6c28,0x7a7f,0x683d,0x7a46,0x7682,0x7681,0x85af, 0x541e,0x5451,0x6846,0x8354,0x6478,0x8403,0x8431,0x8ce6, 0x8d4b,0x6c40,0x8ab0,0x8c01,0x6c81,0x82d4,0x8881,0x87a2, 0x8424,0x86cd,0x583f,0x78b1,0x55e3,0x932f,0x9519,0x8feb, 0x4fb5,0x6ccc,0x68b6,0x923a,0x94b0,0x8155,0x8594,0x8537, 0x5996,0x98f4,0x9974,0x88e1,0x817f,0x9283,0x94f3,0x722a, 0x7573,0x837b,0x693f,0x76bf,0x817a,0x788e,0x541f,0x75d4, 0x6cb8,0x594e,0x8766,0x867e,0x9ce9,0x9e20,0x70ae,0x628a, 0x68d5,0x7372,0x68df,0x680b,0x7948,0x6baf,0x6ba1,0x60a3, 0x8944,0x9f62,0x8b19,0x8c26,0x8adb,0x8c00,0x7c9f,0x90ca, 0x9019,0x8fd9,0x9102,0x68e7,0x6808,0x7832,0x6a58,0x68b5, 0x6d78,0x8e30,0x64b2,0x69fd,0x67c4,0x8e64,0x8e2a,0x5bc3, 0x9b06,0x980c,0x9882,0x725f,0x4fcf,0x77fd,0x846b,0x7919, 0x788d,0x704c,0x6c1f,0x8e48,0x828b,0x50be,0x503e,0x7652, 0x69a8,0x9226,0x949b,0x9154,0x50bb,0x70d8,0x9091,0x6c2f, 0x8277,0x8276,0x760b,0x75af,0x6734,0x81fe,0x9ba8,0x72c4, 0x5074,0x4fa7,0x4ea1,0x7344,0x72f1,0x6e20,0x54c7,0x6591, 0x78f7,0x53c9,0x7426,0x4e33,0x4e32,0x58e9,0x575d,0x53e2, 0x4e1b,0x90b5,0x8588,0x835f,0x4ff3,0x60dc,0x9382,0x9541, 0x91c8,0x62fc,0x529d,0x52e7,0x775b,0x7de0,0x7f14,0x6876, 0x9b4f,0x6b64,0x9bc9,0x9ca4,0x9320,0x952d,0x57e0,0x797a, 0x5e76,0x7192,0x8367,0x5102,0x4fac,0x6803,0x90e1,0x6669, 0x5ba1,0x879e,0x8682,0x896f,0x886c,0x9855,0x5203,0x6ae5, 0x6a71,0x8896,0x731c,0x91d8,0x9489,0x6ac3,0x6311,0x6241, 0x5deb,0x7f72,0x92ad,0x7d79,0x7ee2,0x7e69,0x7ef3,0x7624, 0x9a45,0x9a71,0x6c92,0x6ca1,0x6735,0x9c77,0x9cc4,0x59dc, 0x6e15,0x701d,0x6ca5,0x56f0,0x6dd1,0x4e19,0x594f,0x892a, 0x71e6,0x707f,0x70c8,0x504f,0x6ff0,0x6f4d,0x8523,0x848b, 
0x88f3,0x9077,0x8fc1,0x68ad,0x6d3d,0x6355,0x5983,0x5b75, 0x58de,0x58ca,0x7693,0x6f0f,0x4f3d,0x51f9,0x5ce8,0x4e48, 0x5faa,0x77ac,0x921e,0x94a7,0x8d6b,0x73ca,0x88c2,0x5351, 0x56d7,0x56f2,0x554a,0x8cc8,0x8d3e,0x64e6,0x731b,0x86d9, 0x8061,0x6589,0x5f31,0x99c6,0x7f38,0x598a,0x6557,0x8d25, 0x7a40,0x6c5c,0x6398,0x6590,0x9700,0x52f5,0x52b1,0x9a55, 0x9a84,0x7e3e,0x5029,0x970d,0x7591,0x6dd8,0x59ca,0x59c9, 0x9952,0x9976,0x6302,0x7d1b,0x7eb7,0x93a2,0x94a8,0x690e, 0x6c7e,0x9716,0x5c3d,0x820c,0x85aa,0x55ab,0x5587,0x8d1b, 0x8d63,0x8a70,0x8bd8,0x5be8,0x6e6e,0x6388,0x6012,0x6cd7, 0x5f26,0x8c8c,0x63ed,0x7aa9,0x7a9d,0x7a17,0x5ec3,0x6c08, 0x6be1,0x8ecc,0x8f68,0x6dc0,0x9b91,0x9c8d,0x90aa,0x9b45, 0x87f9,0x572d,0x609f,0x6930,0x6691,0x7c4c,0x7b79,0x749c, 0x68c4,0x5f03,0x75ab,0x7a3f,0x58be,0x57a6,0x7b4d,0x7b0b, 0x52c3,0x5c48,0x840c,0x63cf,0x5b64,0x9dd7,0x9e25,0x7cca, 0x5f91,0x5f84,0x816b,0x80bf,0x750c,0x74ef,0x4e08,0x61f7, 0x61d0,0x914e,0x5553,0x9245,0x949c,0x5112,0x4f3c,0x6492, 0x8721,0x596e,0x594b,0x7897,0x613f,0x5c3b,0x5fb9,0x5f7b, 0x91b8,0x540a,0x54c0,0x5589,0x8335,0x50e7,0x58e4,0x58cc, 0x87f2,0x51f8,0x70ef,0x7210,0x8cdc,0x8d50,0x658c,0x99b1, 0x9a6e,0x99c4,0x81bd,0x80c6,0x6d8c,0x5c6f,0x7538,0x752c, 0x8e0f,0x9583,0x95ea,0x63e1,0x5085,0x622f,0x616e,0x8651, 0x5c3a,0x5366,0x82ad,0x7be0,0x7a3b,0x9903,0x997a,0x7c60, 0x7b3c,0x8475,0x524a,0x777f,0x53e1,0x6daf,0x8a98,0x8bf1, 0x8aee,0x8c18,0x5371,0x8a0e,0x8ba8,0x5399,0x538d,0x5180, 0x7761,0x7cd5,0x90b1,0x4fb6,0x4fa3,0x4faf,0x8f3f,0x8206, 0x9edb,0x8429,0x8cab,0x8d2f,0x92c5,0x950c,0x6f2c,0x6e0d, 0x8b5a,0x8c2d,0x9a5b,0x9a7f,0x511f,0x507f,0x584a,0x5757, 0x7f3a,0x6b20,0x7238,0x576a,0x904d,0x6212,0x7b94,0x8273, 0x916f,0x9b26,0x95d8,0x71ed,0x70db,0x8aa4,0x8bef,0x5211, 0x7199,0x7469,0x83b9,0x63d2,0x5f17,0x99dd,0x9a7c,0x5d50, 0x5c9a,0x7b1b,0x81e5,0x5367,0x8907,0x937e,0x953a,0x8518, 0x8b93,0x8ba9,0x537b,0x5374,0x822c,0x8c9e,0x8d1e,0x50ac, 0x6cbf,0x5fb5,0x5fb4,0x8caf,0x8d2e,0x8846,0x5352,0x7aaf, 0x7a91,0x6df5,0x6e0a,0x5937,0x4f0d,0x5c65,0x7f9a,0x68da, 
0x6155,0x99ae,0x51af,0x9055,0x8fdd,0x53eb,0x542b,0x6255, 0x7e8e,0x7e4a,0x8179,0x5a9a,0x8107,0x6e72,0x614e,0x840d, 0x64a5,0x62e8,0x588a,0x57ab,0x7dba,0x7eee,0x675f,0x5f6c, 0x9811,0x987d,0x6ecc,0x6da4,0x53ea,0x7db8,0x7eb6,0x5f88, 0x64d4,0x62c5,0x593e,0x5939,0x7070,0x5b55,0x8de8,0x8338, 0x5179,0x7027,0x6cf7,0x9ee8,0x5fcd,0x905c,0x900a,0x6ec5, 0x706d,0x629c,0x8b00,0x8c0b,0x7c27,0x68d7,0x67a3,0x7576, 0x7d09,0x7eab,0x9274,0x8f2f,0x8f91,0x5c24,0x9223,0x9499, 0x51fd,0x55ce,0x5417,0x611a,0x6b04,0x680f,0x9e92,0x6b78, 0x5f52,0x5f80,0x9eb5,0x4f0f,0x8317,0x9175,0x9751,0x5b5f, 0x790e,0x7840,0x856a,0x829c,0x7149,0x5e18,0x9189,0x5899, 0x4ef0,0x67ff,0x6ca7,0x54b8,0x90c1,0x51a8,0x6e24,0x7a00, 0x9be8,0x9cb8,0x785d,0x76d2,0x59da,0x57f7,0x6267,0x8f1b, 0x8f86,0x82bd,0x70ac,0x5f3e,0x945b,0x9271,0x4f69,0x7092, 0x99d5,0x9a7e,0x654e,0x606d,0x5f6d,0x9640,0x6905,0x84bc, 0x82cd,0x9824,0x9890,0x50a2,0x503c,0x535c,0x5f6b,0x5875, 0x5c18,0x5885,0x67aa,0x6960,0x53ec,0x54ad,0x5de1,0x84b2, 0x6162,0x4f19,0x786b,0x938c,0x8154,0x5bb4,0x5a29,0x52c9, 0x6368,0x9589,0x95ed,0x8fd4,0x907f,0x9ed8,0x8f03,0x8f83, 0x62b5,0x672e,0x88ab,0x5634,0x81ed,0x92b9,0x9508,0x8f4e, 0x8f7f,0x59d1,0x6163,0x60ef,0x60d1,0x814e,0x80be,0x677b, 0x5fd8,0x5951,0x64ae,0x8da8,0x8d8b,0x6eb6,0x6f6d,0x8b72, 0x5348,0x8096,0x934d,0x9540,0x505c,0x6416,0x6447,0x70e4, 0x6575,0x654c,0x5915,0x8ca0,0x8d1f,0x87fb,0x8681,0x918b, 0x5192,0x665f,0x8c5a,0x53f3,0x6746,0x633d,0x6500,0x4f51, 0x505a,0x4f70,0x9694,0x92f8,0x952f,0x9663,0x9635,0x76c6, 0x807d,0x8304,0x6ea2,0x62d6,0x81b3,0x7dbe,0x7eeb,0x7dd2, 0x7eea,0x72e9,0x838a,0x56ca,0x5434,0x5449,0x737b,0x732e, 0x7b26,0x85fb,0x54aa,0x6f02,0x6f62,0x6309,0x733f,0x7334, 0x856d,0x8427,0x8df3,0x723a,0x7237,0x54c9,0x9127,0x9093, 0x6276,0x8932,0x88e4,0x7562,0x6bd5,0x7a1a,0x53e5,0x5ac1, 0x8e42,0x9002,0x92ea,0x94fa,0x7c92,0x77b3,0x640f,0x7a4e, 0x9896,0x4ec0,0x5566,0x66fe,0x5f0a,0x91dc,0x4ecc,0x6c37, 0x7d76,0x89a7,0x66dc,0x5a46,0x8015,0x797f,0x7984,0x5f9e, 0x4ece,0x6de1,0x62dc,0x6efe,0x6eda,0x5782,0x84ec,0x9084, 
0x8fd8,0x5fe7,0x59cb,0x6c82,0x62bd,0x7709,0x84b8,0x5e8f, 0x6050,0x5f70,0x5b8b,0x642c,0x7464,0x7476,0x6383,0x626b, 0x80de,0x5c16,0x6d44,0x5be9,0x7ff0,0x8074,0x58fa,0x58f6, 0x58f7,0x4ed1,0x716e,0x6643,0x7b39,0x9dfa,0x9e6d,0x53c3, 0x58ef,0x58ee,0x6f58,0x599d,0x74ca,0x743c,0x6216,0x6b72, 0x5c81,0x8a60,0x548f,0x7d99,0x8de1,0x9187,0x67f1,0x701a, 0x7f6a,0x99a8,0x9592,0x5a77,0x5012,0x50b2,0x8700,0x6e1d, 0x83c5,0x82f9,0x7ffc,0x75be,0x51dd,0x6f3f,0x6d46,0x653b, 0x6e7f,0x73ab,0x771e,0x6ff1,0x6208,0x82af,0x6a3a,0x6866, 0x5f13,0x4fc4,0x6bc5,0x96e2,0x4e26,0x9022,0x5ee2,0x5e9f, 0x5ec3,0x8cf4,0x8d56,0x983c,0x64c7,0x62e9,0x7432,0x5be2, 0x5bdd,0x9f61,0x9f84,0x9eba,0x7720,0x723d,0x5e15,0x80c3, 0x6000,0x5d57,0x6b73,0x7950,0x78b3,0x7a57,0x9756,0x7a81, 0x66f9,0x6cc1,0x51b5,0x7b52,0x8a2a,0x8bbf,0x5e7d,0x5098, 0x4f1e,0x8f9b,0x7236,0x98c4,0x98d8,0x72af,0x8ca2,0x8d21, 0x6b32,0x5019,0x6deb,0x80f8,0x757f,0x5e61,0x6577,0x8b7d, 0x8a89,0x7470,0x7f50,0x4e8e,0x5375,0x7a96,0x73c8,0x6edd, 0x9e9f,0x5283,0x97f6,0x8cfa,0x8d5a,0x925b,0x94c5,0x517c, 0x8266,0x8230,0x764c,0x81df,0x81d3,0x8fbb,0x570d,0x56f4, 0x5f52,0x5e30,0x886b,0x6982,0x964d,0x5f7c,0x99d2,0x9a79, 0x4f8b,0x834a,0x8346,0x96c1,0x6bbc,0x58f3,0x695a,0x78ba, 0x7fc1,0x76e7,0x5362,0x5de2,0x5de3,0x5320,0x8dd1,0x78da, 0x7816,0x78ca,0x6dee,0x66b4,0x809d,0x6dc4,0x50cd,0x751c, 0x7078,0x6df3,0x6cf3,0x6602,0x6bbf,0x6a23,0x6837,0x78a9, 0x7855,0x63a1,0x57d4,0x639b,0x526a,0x7a69,0x7a33,0x7a4f, 0x9375,0x952e,0x7763,0x7a42,0x86c7,0x4fe0,0x4fa0,0x899a, 0x9006,0x7dca,0x7d27,0x78d0,0x97cb,0x97e6,0x5f25,0x935b, 0x953b,0x9ecf,0x7c98,0x9059,0x9065,0x56b4,0x4e25,0x53b3, 0x9000,0x5076,0x8305,0x7adc,0x9854,0x80cc,0x984d,0x989d, 0x78ef,0x77f6,0x68ca,0x7881,0x89e6,0x7409,0x96a8,0x968f, 0x7b22,0x59fb,0x66a2,0x7545,0x82e6,0x79b9,0x6b8a,0x5949, 0x6dbc,0x51c9,0x5e79,0x89ba,0x89c9,0x5de6,0x5339,0x7433, 0x7384,0x8236,0x64ab,0x629a,0x8247,0x852c,0x84c4,0x83ab, 0x6e5b,0x66d9,0x8587,0x5766,0x9f20,0x8cbc,0x8d34,0x9858, 0x53c8,0x8521,0x7b54,0x5f02,0x6c41,0x95b1,0x9605,0x8a34, 
0x8bc9,0x8fa3,0x7bad,0x57a3,0x77ef,0x77eb,0x63b2,0x4fc2, 0x684c,0x543e,0x4ee4,0x7b23,0x4e82,0x4e71,0x91c0,0x917f, 0x7db1,0x7eb2,0x660a,0x7d0b,0x7eb9,0x51cd,0x51bb,0x61f8, 0x60ac,0x857e,0x52d8,0x60a0,0x6247,0x7d9a,0x968e,0x9636, 0x7d33,0x7ec5,0x4fc3,0x79aa,0x7985,0x79bb,0x6263,0x8af8, 0x8bf8,0x6689,0x6656,0x4eab,0x896a,0x889c,0x6bcf,0x5247, 0x5219,0x5dee,0x87ba,0x552f,0x6b98,0x6b8b,0x61b2,0x5baa, 0x6ecb,0x5dba,0x9644,0x5bd2,0x5c4f,0x5c5b,0x8328,0x5a9b, 0x7551,0x4e59,0x8036,0x9336,0x76fe,0x5ec9,0x5011,0x4eec, 0x5764,0x64da,0x62e0,0x7378,0x517d,0x6df7,0x5893,0x70f9, 0x7e6a,0x7ed8,0x9727,0x96fe,0x8058,0x5b0c,0x5a07,0x5dfe, 0x7b4b,0x6697,0x5f92,0x6028,0x7e7c,0x7ee7,0x8fce,0x8c79, 0x6f33,0x6f54,0x5e25,0x5e05,0x99d0,0x9a7b,0x5378,0x9a0e, 0x9a91,0x74f6,0x73b2,0x74dc,0x50b7,0x4f24,0x5075,0x4fa6, 0x57fc,0x4f74,0x6bce,0x62f3,0x83cc,0x5bf0,0x971e,0x9774, 0x8a95,0x8bde,0x6bb5,0x65e6,0x658e,0x8ffd,0x8299,0x6e9d, 0x6c9f,0x674f,0x7058,0x6ee9,0x8584,0x9d3f,0x9e3d,0x507d, 0x4f2a,0x5edf,0x5e99,0x9f13,0x91ac,0x9171,0x5e81,0x7a74, 0x6b68,0x6b69,0x9069,0x7dbf,0x7ef5,0x821c,0x5c0a,0x59bb, 0x7387,0x7c3d,0x7b7e,0x526f,0x656c,0x9f3b,0x8150,0x60a8, 0x6850,0x5974,0x8d08,0x8d60,0x84c9,0x7981,0x5eb5,0x8207, 0x5319,0x7d42,0x7ec8,0x5be7,0x6f5f,0x5145,0x79a7,0x9b42, 0x742a,0x66f4,0x70bc,0x9063,0x7b20,0x84cb,0x76d6,0x808c, 0x7e31,0x7eb5,0x65cb,0x5d17,0x5c97,0x8482,0x7dd1,0x5d07, 0x5404,0x6b96,0x6f84,0x5355,0x5358,0x8070,0x806a,0x7fe0, 0x72d0,0x51c0,0x5176,0x5e95,0x8178,0x80a0,0x5761,0x67dc, 0x7e61,0x7ee3,0x961c,0x6674,0x7e2e,0x7f29,0x5b8c,0x88cf, 0x755c,0x64cd,0x5091,0x5e2d,0x59d3,0x8173,0x811a,0x6f5b, 0x6f5c,0x7261,0x62bc,0x984f,0x989c,0x6c27,0x6e96,0x82ac, 0x99c5,0x6297,0x5224,0x80af,0x7def,0x7eac,0x7206,0x51e1, 0x7018,0x6cf8,0x51ac,0x59d0,0x5154,0x96b1,0x9690,0x96a0, 0x59ec,0x59eb,0x5f81,0x500d,0x6e0b,0x57c3,0x5439,0x9014, 0x6838,0x88c1,0x66ae,0x72fc,0x900f,0x542c,0x5bc4,0x679d, 0x7d55,0x7edd,0x839e,0x7845,0x540e,0x6c61,0x6c5a,0x88f8, 0x501f,0x5805,0x575a,0x8358,0x6458,0x6d6e,0x795d,0x5f85, 
0x61b6,0x5fc6,0x55ac,0x4e54,0x8a5e,0x8bcd,0x64ce,0x564c, 0x5c64,0x699c,0x5e7b,0x4f30,0x5b54,0x51b2,0x96c0,0x82d7, 0x975a,0x9753,0x51c6,0x9f4b,0x658b,0x59ff,0x7949,0x8e8d, 0x8dc3,0x539a,0x7375,0x730e,0x6bef,0x819c,0x5df1,0x5de7, 0x5349,0x714c,0x6cca,0x970a,0x6ffe,0x6ee4,0x6b77,0x6b74, 0x5e55,0x818f,0x871c,0x722d,0x4e89,0x8352,0x6ce1,0x9f52, 0x9f7f,0x8b1d,0x8c22,0x8aac,0x5b5a,0x76dc,0x76d7,0x5144, 0x52c1,0x52b2,0x640d,0x635f,0x8776,0x81f3,0x602a,0x6566, 0x4f59,0x638c,0x6c55,0x90ed,0x97fb,0x97f5,0x4f4e,0x9031, 0x5800,0x6234,0x58c7,0x6731,0x9808,0x5561,0x8404,0x6b62, 0x72c2,0x6817,0x6607,0x7c43,0x7bee,0x92b3,0x9510,0x5ef3, 0x5385,0x6d1e,0x5403,0x7956,0x9cf4,0x9e23,0x7e41,0x4ed6, 0x6eec,0x6caa,0x86cb,0x75db,0x5c90,0x908a,0x8fb9,0x845b, 0x5438,0x71e5,0x6cdb,0x670b,0x812b,0x8131,0x5f1f,0x986f, 0x663e,0x6fc3,0x6d53,0x8c6c,0x732a,0x9664,0x8f14,0x8f85, 0x6953,0x67ab,0x5e3d,0x5674,0x55b7,0x675c,0x5024,0x66c9, 0x6653,0x7834,0x5386,0x66a6,0x8209,0x4e3e,0x6319,0x5954, 0x6bba,0x6740,0x5fa9,0x8607,0x5230,0x969c,0x5857,0x67b6, 0x6563,0x5bd3,0x800c,0x4e43,0x8da3,0x5708,0x570f,0x515a, 0x69cb,0x5f46,0x522b,0x67d4,0x7d75,0x7a31,0x79f0,0x538b, 0x5727,0x5e6b,0x5e2e,0x52c7,0x719f,0x6a94,0x6863,0x6863, 0x6c83,0x5999,0x8461,0x53bb,0x5b5d,0x7d17,0x7eb1,0x66ff, 0x68d2,0x81e3,0x80e1,0x76e4,0x8ab2,0x8bfe,0x5373,0x8702, 0x51f0,0x8606,0x82a6,0x4e18,0x67ef,0x8463,0x627e,0x6dfb, 0x5904,0x51e6,0x8efd,0x4f38,0x58f9,0x58f1,0x58c1,0x8108, 0x8109,0x9d5d,0x9e45,0x6279,0x4e86,0x9081,0x8fc8,0x4f34, 0x5496,0x6696,0x95a9,0x95fd,0x7c4d,0x8acb,0x8bf7,0x978d, 0x5442,0x5415,0x96e3,0x96be,0x5747,0x9d28,0x9e2d,0x5ba3, 0x4ee5,0x5d8b,0x6d2a,0x8212,0x6046,0x5c3f,0x5986,0x8ce2, 0x8d24,0x8389,0x5b99,0x6b7b,0x5bf9,0x5bfe,0x843d,0x95f2, 0x6085,0x60a6,0x9451,0x6fe4,0x6d9b,0x5c01,0x50b5,0x503a, 0x4e58,0x4e57,0x52d2,0x79e6,0x6545,0x53ca,0x8a33,0x71d2, 0x70e7,0x5b6b,0x5b59,0x8d99,0x8d75,0x6bd2,0x6cbc,0x63a8, 0x6148,0x5b88,0x7aef,0x5eca,0x8d70,0x7a2e,0x676f,0x54f2, 0x710a,0x7d66,0x7ed9,0x6e80,0x789f,0x68af,0x5e33,0x5e10, 
0x5146,0x67f4,0x69d8,0x72c0,0x72b6,0x7570,0x7235,0x5cfd, 0x5ce1,0x6551,0x521d,0x898f,0x89c4,0x71c3,0x8d0f,0x8d62, 0x5e63,0x5e01,0x6784,0x6649,0x664b,0x5a03,0x819a,0x80a4, 0x9f9c,0x9f9f,0x4e80,0x734e,0x5956,0x5968,0x68cb,0x72d7, 0x672b,0x6dfa,0x6d45,0x81f4,0x8033,0x59ae,0x8861,0x5957, 0x5f48,0x5f39,0x5cad,0x7e2b,0x7f1d,0x5821,0x7c3f,0x7126, 0x7a32,0x6881,0x78e8,0x9aee,0x9aea,0x5931,0x5ef6,0x93c8, 0x94fe,0x6597,0x58a8,0x75c7,0x7cb5,0x7ca4,0x708e,0x8aad, 0x6eff,0x6ee1,0x9805,0x9879,0x5ee0,0x5a1c,0x83ca,0x523a, 0x585e,0x5abd,0x5988,0x6a29,0x96d5,0x59c6,0x78a7,0x662f, 0x52df,0x5427,0x5e06,0x80ce,0x6301,0x9038,0x8abc,0x8c0a, 0x8f49,0x8f6c,0x8000,0x6548,0x52b9,0x6298,0x9670,0x9634, 0x5b30,0x5a74,0x523b,0x77ed,0x5f90,0x820d,0x820e,0x77e2, 0x9b3c,0x5a18,0x4e7e,0x5bb3,0x99ff,0x9a8f,0x820a,0x65e7, 0x5217,0x5bdf,0x6b61,0x6b22,0x65fa,0x7d68,0x7ed2,0x786c, 0x99b3,0x9a70,0x74e6,0x72ac,0x6790,0x6aa2,0x68c0,0x95a3, 0x9601,0x53cd,0x555f,0x542f,0x62ff,0x5360,0x8a17,0x7d9c, 0x7efc,0x79be,0x665a,0x9ece,0x9298,0x94ed,0x8cac,0x8d23, 0x68a8,0x59d4,0x60e1,0x60aa,0x6539,0x866b,0x5531,0x6d74, 0x7c21,0x7b80,0x8ed2,0x8f69,0x82e5,0x7d10,0x7ebd,0x85a9, 0x8428,0x8fad,0x8f9e,0x5bf5,0x5ba0,0x9707,0x8fba,0x947d, 0x94bb,0x8b80,0x8bfb,0x6851,0x8085,0x8083,0x8216,0x8217, 0x5e7c,0x707d,0x707e,0x589e,0x5897,0x838e,0x7ae0,0x532f, 0x5976,0x64ca,0x51fb,0x6483,0x6c7a,0x51b3,0x5b69,0x90a3, 0x59b9,0x5cb3,0x4e2a,0x5cb8,0x7e04,0x5212,0x796d,0x4eae, 0x6392,0x8fc5,0x587e,0x70ad,0x6a61,0x8fa7,0x5f01,0x5fbd, 0x6f01,0x6e14,0x7de8,0x7f16,0x53d7,0x8cde,0x8d4f,0x6e1b, 0x51cf,0x9a57,0x9a8c,0x68c9,0x96d1,0x5f65,0x5f66,0x8377, 0x8b5c,0x8c31,0x5b63,0x500b,0x6b21,0x4ed4,0x8303,0x804a, 0x888b,0x8b6f,0x8bd1,0x4f55,0x9905,0x997c,0x7434,0x5b58, 0x585a,0x56fa,0x8b1b,0x8bb2,0x8f09,0x8f7d,0x50d1,0x4fa8, 0x5c04,0x83f2,0x8fbc,0x8ff7,0x846c,0x6842,0x7802,0x51b6, 0x96f2,0x821f,0x81fa,0x5fc5,0x7565,0x5f8c,0x912d,0x90d1, 0x5eda,0x53a8,0x8102,0x4ea8,0x5c0d,0x5bf9,0x4ead,0x4fca, 0x6bcd,0x6226,0x9802,0x9876,0x5982,0x81e8,0x4e34,0x904e, 
0x8fc7,0x9810,0x9884,0x654f,0x590d,0x5289,0x5218,0x6b65, 0x9b54,0x5c07,0x5c06,0x76d8,0x79cd,0x9418,0x949f,0x5169, 0x4e24,0x4e21,0x9332,0x4fd7,0x7164,0x5b98,0x651c,0x643a, 0x9444,0x94f8,0x965d,0x9655,0x7089,0x7ffb,0x98fc,0x9972, 0x5740,0x4e5f,0x5a24,0x7ca7,0x73ed,0x7df4,0x7ec3,0x5f79, 0x9178,0x53c2,0x865b,0x865a,0x91e3,0x9493,0x8aaa,0x8bf4, 0x662d,0x5fe0,0x5272,0x6f06,0x6f14,0x64ec,0x62df,0x8c50, 0x70cf,0x4e4c,0x5200,0x770c,0x8a3b,0x7da2,0x905e,0x9012, 0x8a8c,0x5f37,0x528d,0x5251,0x5263,0x5225,0x71d5,0x6075, 0x6668,0x60b3,0x5fb3,0x92d2,0x950b,0x6ce5,0x4f2f,0x96de, 0x9e21,0x6b3e,0x8ee2,0x70ba,0x4e3a,0x984c,0x9898,0x7d14, 0x7eaf,0x9e7d,0x76d0,0x5869,0x6230,0x6218,0x8010,0x690d, 0x7028,0x6fd1,0x702c,0x6ee8,0x9177,0x8c4a,0x55ae,0x5355, 0x689d,0x6761,0x9738,0x8987,0x6e58,0x5c31,0x518d,0x6eaa, 0x5353,0x8a69,0x8bd7,0x63d0,0x88dc,0x8865,0x82b3,0x5584, 0x8133,0x76e3,0x76d1,0x975c,0x9759,0x9d6c,0x9e4f,0x907a, 0x9057,0x96f6,0x6717,0x4f9d,0x73cd,0x9285,0x94dc,0x7b11, 0x54b2,0x65bd,0x6d82,0x5e4c,0x5433,0x5434,0x6749,0x7d30, 0x7ec6,0x96e8,0x5e8a,0x67fb,0x636e,0x6021,0x5b9e,0x5b9f, 0x67e5,0x8056,0x5e2f,0x50c5,0x4ec5,0x52e4,0x84ee,0x83b2, 0x52e2,0x52bf,0x79d8,0x8fb0,0x9675,0x4ef2,0x9aa8,0x6cf5, 0x6b0a,0x6743,0x7368,0x72ec,0x525b,0x521a,0x793a,0x67f3, 0x9732,0x65b7,0x65ad,0x7a4d,0x79ef,0x7345,0x72ee,0x9f4a, 0x9f50,0x7f85,0x6843,0x66fc,0x8535,0x51b0,0x8b66,0x51cc, 0x8840,0x547c,0x63db,0x6362,0x6e08,0x63f4,0x9a13,0x8a02, 0x8ba2,0x7b56,0x7483,0x5132,0x50a8,0x8457,0x6975,0x4e01, 0x6afb,0x6a31,0x685c,0x7518,0x4f18,0x805a,0x4efb,0x770b, 0x6c96,0x6d1b,0x574a,0x7b49,0x5377,0x5dfb,0x6b23,0x57f9, 0x65e9,0x7532,0x8655,0x5904,0x8d77,0x62d3,0x5175,0x934b, 0x9505,0x529f,0x95a5,0x9600,0x5ff5,0x97ff,0x54cd,0x7f8a, 0x93ae,0x9547,0x829d,0x7740,0x63a5,0x7947,0x5b9c,0x4f11, 0x6f2b,0x694a,0x6768,0x5c0b,0x5bfb,0x54e5,0x7259,0x65d7, 0x7e9c,0x7f06,0x7ce7,0x7cae,0x89aa,0x4eb2,0x7fbd,0x652f, 0x5411,0x9813,0x987f,0x5287,0x5267,0x8a8d,0x8ba4,0x9818, 0x9886,0x827e,0x7cf0,0x56e3,0x5b97,0x92c1,0x94dd,0x4ef7, 
0x73bb,0x6167,0x6ed1,0x89d2,0x5f18,0x9910,0x9769,0x541b, 0x672a,0x6c23,0x4f7f,0x8ef8,0x8f74,0x80a5,0x6574,0x746a, 0x739b,0x65ed,0x67d3,0x9ed2,0x968a,0x961f,0x67cf,0x6d17, 0x7fd4,0x575b,0x6781,0x6f6e,0x6236,0x6237,0x6fc0,0x96f7, 0x907c,0x8fbd,0x6069,0x7926,0x77ff,0x5742,0x8c46,0x56de, 0x4f60,0x7d2b,0x512a,0x7e96,0x7ea4,0x6e21,0x9df9,0x9e70, 0x52de,0x52b3,0x52b4,0x5fa1,0x6d01,0x9cf3,0x51e4,0x9e7f, 0x91c7,0x5c3e,0x672d,0x8a13,0x8bad,0x5207,0x9304,0x5f55, 0x8fd1,0x6620,0x8981,0x8089,0x89e3,0x7de3,0x7f18,0x7e01, 0x9db4,0x9e64,0x5347,0x8f15,0x8f7b,0x591c,0x6db2,0x8003, 0x9ebb,0x6d69,0x8b8a,0x53d8,0x5909,0x5e72,0x7e23,0x53bf, 0x7fa4,0x5a66,0x5987,0x8b58,0x8bc6,0x841f,0x82b8,0x5e0c, 0x91dd,0x9488,0x627f,0x975e,0x651d,0x6444,0x8a3a,0x8bca, 0x7267,0x9bae,0x9c9c,0x5ea6,0x5948,0x6d3e,0x51b7,0x534a, 0x4f9b,0x7d39,0x7ecd,0x8cc3,0x8d41,0x7cd6,0x718a,0x8a55, 0x8bc4,0x671b,0x6c88,0x518a,0x518c,0x56e0,0x7bc9,0x8cea, 0x8d28,0x6606,0x821e,0x8a31,0x8bb8,0x9a30,0x817e,0x6536, 0x53ce,0x4ed8,0x8c61,0x8ced,0x8d4c,0x502b,0x4f26,0x4f4d, 0x6885,0x4ed9,0x8fa6,0x529e,0x78c1,0x7b46,0x7b14,0x85cf, 0x6804,0x5510,0x932b,0x9521,0x5f3a,0x8499,0x5564,0x5e84, 0x8c93,0x732b,0x6253,0x9396,0x9501,0x79c0,0x5679,0x5f53, 0x611f,0x842c,0x89c0,0x89c2,0x53f2,0x5bec,0x5bbd,0x5bdb, 0x5c1a,0x7c89,0x52dd,0x713c,0x8d64,0x7531,0x552e,0x9109, 0x4e61,0x90f7,0x82d1,0x91cf,0x9234,0x94c3,0x61c9,0x5e94, 0x5fdc,0x5104,0x4ebf,0x76c8,0x8302,0x5e1d,0x79df,0x8cd3, 0x5bbe,0x7586,0x7533,0x5858,0x5854,0x4e03,0x865f,0x53f7, 0x8b70,0x8bae,0x8679,0x79cb,0x9ea5,0x9ea6,0x9f0e,0x632f, 0x6770,0x6e6f,0x6c64,0x5047,0x4eee,0x592e,0x89b3,0x88d5, 0x8db3,0x4fbf,0x840a,0x83b1,0x6848,0x641c,0x635c,0x63da, 0x626c,0x5de8,0x5e9c,0x9322,0x94b1,0x7965,0x6839,0x9996, 0x89bd,0x89c8,0x8cc0,0x8d3a,0x80dc,0x6676,0x52a9,0x9244, 0x7fd2,0x4e60,0x9b6f,0x9c81,0x58d3,0x538b,0x98f2,0x996e, 0x7f6e,0x5fae,0x7ae5,0x96dc,0x6742,0x66f2,0x6c42,0x6ce8, 0x62cd,0x8eab,0x9078,0x9009,0x6ff5,0x6d5c,0x5e78,0x9867, 0x987e,0x54a8,0x7bb1,0x93e1,0x955c,0x5883,0x5ca9,0x5a92, 
0x9673,0x9648,0x5c11,0x7d1a,0x7ea7,0x62db,0x79c1,0x8072, 0x58f0,0x5bbf,0x6d6a,0x7af9,0x864e,0x7167,0x674e,0x4f50, 0x5bfa,0x4e73,0x6025,0x74f7,0x753a,0x5bc6,0x614b,0x6001, 0x51a0,0x4e0e,0x578b,0x8a00,0x6ca2,0x7687,0x513f,0x516d, 0x6e2c,0x6d4b,0x7d50,0x7ed3,0x5049,0x4f1f,0x7b51,0x6258, 0x63a2,0x5291,0x5242,0x5264,0x7af6,0x7ade,0x50f9,0x4fa1, 0x5beb,0x5199,0x990a,0x517b,0x4ec1,0x985e,0x7c7b,0x5009, 0x4ed3,0x520a,0x5713,0x5706,0x5186,0x83dc,0x60f3,0x5ead, 0x6377,0x9ad4,0x4ed5,0x4f5b,0x4ecf,0x8f1d,0x8f89,0x5468, 0x8afe,0x8bfa,0x946b,0x8ac7,0x8c08,0x671f,0x6b4c,0x5723, 0x6237,0x6238,0x50cf,0x6200,0x604b,0x9326,0x9526,0x9ce5, 0x9e1f,0x5e36,0x5e26,0x7f57,0x7136,0x7a05,0x7a0e,0x7559, 0x7701,0x5c0e,0x5bfc,0x8a66,0x8bd5,0x73a9,0x5ea7,0x5c6c, 0x5c5e,0x7159,0x70df,0x6d66,0x978b,0x6bd4,0x962a,0x8ad6, 0x8bba,0x514d,0x6fa4,0x6cfd,0x9678,0x9646,0x5a1b,0x5a31, 0x5733,0x8a62,0x8be2,0x6a13,0x697c,0x7bc0,0x8282,0x756a, 0x5df4,0x6a21,0x5c3c,0x7d0d,0x7eb3,0x54e1,0x5458,0x559c, 0x773e,0x4f17,0x7d20,0x6700,0x96ea,0x5152,0x5150,0x8f38, 0x8f93,0x92b7,0x9500,0x8c6a,0x6bdb,0x63a7,0x8c6b,0x4e88, 0x8c9d,0x8d1d,0x8cb8,0x8d37,0x767b,0x653e,0x76df,0x6d88, 0x6fb3,0x9ed1,0x53d6,0x7b97,0x5f62,0x5cf0,0x6a39,0x6811, 0x9762,0x98ef,0x996d,0x7dcf,0x9676,0x9748,0x7075,0x54c8, 0x5ba2,0x6c60,0x8239,0x676d,0x96c4,0x65cf,0x7537,0x76ca, 0x7fa9,0x4e49,0x83f1,0x5178,0x592b,0x7d44,0x7ec4,0x9d3b, 0x9e3f,0x773c,0x69ae,0x8363,0x4eca,0x5473,0x76f4,0x7e3d, 0x603b,0x7d04,0x7ea6,0x8ecd,0x519b,0x6c47,0x6a6b,0x6a2a, 0x64ad,0x4fee,0x8857,0x5c45,0x6c99,0x5f35,0x5f20,0x6cbb, 0x671d,0x8d8a,0x8aa0,0x8bda,0x7cfb,0x53ef,0x76ee,0x8cb4, 0x8d35,0x5237,0x5e03,0x554f,0x95ee,0x624d,0x9b5a,0x9c7c, 0x725b,0x7d4c,0x51f1,0x51ef,0x5e38,0x81a0,0x80f6,0x79fb, 0x73fe,0x73b0,0x4e92,0x691c,0x7d72,0x4e1d,0x7cf8,0x60e0, 0x6232,0x620f,0x5148,0x4e3b,0x90ce,0x8349,0x76ae,0x79ae, 0x793c,0x4e38,0x547d,0x967a,0x5341,0x983b,0x9891,0x71c8, 0x706f,0x666f,0x5d0e,0x6f64,0x6da6,0x5171,0x6b50,0x6b27, 0x7da0,0x7eff,0x5a5a,0x8077,0x804c,0x4e39,0x5305,0x6307, 
0x8f2a,0x8f6e,0x4e45,0x9001,0x8863,0x58fd,0x5bff,0x5f15, 0x5ec8,0x53a6,0x85cd,0x84dd,0x963f,0x679c,0x83d3,0x898b, 0x89c1,0x524d,0x982d,0x5934,0x9806,0x987a,0x8cbb,0x8d39, 0x5bae,0x5bab,0x62c9,0x8b77,0x62a4,0x826f,0x610f,0x5947, 0x6469,0x5bb9,0x5b9a,0x660c,0x6b6f,0x9686,0x7b2c,0x77e5, 0x878d,0x73e0,0x6a19,0x6807,0x8868,0x885b,0x536b,0x571f, 0x653f,0x90a6,0x4ecb,0x9632,0x677f,0x5165,0x9801,0x9875, 0x75c5,0x71df,0x8425,0x55b6,0x590f,0x4ff1,0x5036,0x5feb, 0x8fea,0x8449,0x53f6,0x7d71,0x7edf,0x6fdf,0x6d4e,0x9593, 0x95f4,0x8a18,0x8bb0,0x601d,0x5100,0x4eea,0x724c,0x96d9, 0x53cc,0x901f,0x706b,0x91cc,0x7968,0x904a,0x7248,0x666e, 0x5922,0x68a6,0x7247,0x8abf,0x8c03,0x6708,0x90f5,0x90ae, 0x8001,0x6e05,0x6c14,0x5ca1,0x5188,0x7389,0x5eab,0x5e93, 0x7642,0x7597,0x7d22,0x5b57,0x5f97,0x667a,0x5f69,0x6c0f, 0x4f0a,0x5343,0x7d21,0x7eba,0x914d,0x8853,0x6052,0x5851, 0x7d19,0x7eb8,0x5fd7,0x9ede,0x70b9,0x5b85,0x5206,0x5361, 0x7c73,0x8ca9,0x8d29,0x6625,0x6d59,0x5340,0x533a,0x4f86, 0x6765,0x7a97,0x7a93,0x8bc1,0x8cfd,0x8d5b,0x76db,0x8a71, 0x8bdd,0x5609,0x78bc,0x7801,0x57df,0x8d85,0x6eab,0x6e29, 0x68ee,0x6d3b,0x7a76,0x516b,0x5c40,0x76f8,0x4e30,0x4e91, 0x5382,0x6d25,0x745e,0x5718,0x56e3,0x6df1,0x5b8f,0x5ba4, 0x8cb7,0x4e70,0x7081,0x6c17,0x5967,0x5965,0x88fd,0x6f22, 0x6c49,0x9020,0x5c55,0x6211,0x9435,0x94c1,0x9e97,0x4e3d, 0x6ce2,0x50b3,0x4f20,0x4f1d,0x95dc,0x5173,0x95a2,0x96aa, 0x9669,0x4efd,0x5f8b,0x6a4b,0x6865,0x80fd,0x8cfc,0x8d2d, 0x4e5d,0x8c37,0x6cc9,0x7e54,0x7ec7,0x6e56,0x683c,0x8336, 0x4e8c,0x6821,0x97d3,0x97e9,0x9032,0x8fdb,0x9060,0x8fdc, 0x591a,0x5b81,0x5b87,0x767c,0x767a,0x5e74,0x8272,0x71b1, 0x70ed,0x5354,0x534f,0x96c5,0x506a,0x5099,0x7063,0x6e7e, 0x5143,0x771f,0x82cf,0x5a01,0x7a0b,0x540c,0x6176,0x5e86, 0x85e4,0x6b66,0x862d,0x5170,0x98fe,0x9970,0x52a0,0x8ca8, 0x8d27,0x5716,0x56fe,0x56f3,0x5c08,0x4e13,0x5c02,0x57fa, 0x514b,0x98db,0x98de,0x92fc,0x94a2,0x4e0b,0x4e94,0x7d00, 0x7eaa,0x6c38,0x544a,0x8fb2,0x519c,0x4f73,0x7dad,0x7ef4, 0x8ca1,0x8d22,0x767d,0x8b49,0x8a3c,0x5370,0x805e,0x95fb, 
0x8005,0x4f4f,0x5802,0x91cd,0x5be6,0x5b9e,0x68b0,0x8edf, 0x8f6f,0x56db,0x597d,0x4f5c,0x677e,0x53cb,0x672f,0x6c7d, 0x7ad9,0x7d05,0x7ea2,0x9ec3,0x9ec4,0x7cbe,0x822a,0x756b, 0x753b,0x5409,0x5e73,0x5275,0x521b,0x4f53,0x958b,0x5f00, 0x9023,0x8fde,0x6d41,0x6751,0x7ba1,0x7acb,0x6027,0x5f71, 0x6cb3,0x53e4,0x65af,0x8a9e,0x8bed,0x539f,0x5167,0x5185, 0x9928,0x9986,0x624b,0x7121,0x65e0,0x6b63,0x9752,0x4e95, 0x6c11,0x6cb9,0x7d93,0x7ecf,0x7814,0x5e2b,0x5e08,0x4e0d, 0x53e3,0x74b0,0x73af,0x7684,0x53d1,0x6295,0x6e90,0x6d32, 0x6372,0x5065,0x80b2,0x4e07,0x6d0b,0x535a,0x81ea,0x98a8, 0x98ce,0x82f1,0x85dd,0x827a,0x5238,0x8166,0x8111,0x8cbf, 0x8d38,0x5168,0x90fd,0x4ea4,0x5236,0x5973,0x80a1,0x8def, 0x5916,0x4f01,0x91ce,0x767e,0x5bcc,0x6cd5,0x795e,0x5fb7, 0x4ef6,0x6578,0x6570,0x6e38,0x6750,0x8ce3,0x5356,0x58f2, 0x97f3,0x592a,0x9650,0x4ee3,0x7403,0x9580,0x95e8,0x6728, 0x723e,0x5c14,0x8208,0x5174,0x738b,0x6642,0x65f6,0x7a7a, 0x51fa,0x682a,0x6cf0,0x99ac,0x9a6c,0x6797,0x904b,0x8fd0, 0x66f8,0x4e66,0x9999,0x9053,0x537f,0x6559,0x8a0a,0x8baf, 0x5cf6,0x5c9b,0x58eb,0x8996,0x89c6,0x7523,0x5f0f,0x6210, 0x754c,0x7d61,0x7edc,0x98df,0x798f,0x7528,0x6a5f,0x6e2f, 0x967d,0x9633,0x90e8,0x4e4b,0x611b,0x7231,0x88dd,0x88c5, 0x8a08,0x8ba1,0x5408,0x606f,0x5eb7,0x6599,0x670d,0x5728, 0x82b1,0x65b9,0x53f0,0x5177,0x660e,0x6240,0x5ddd,0x9662, 0x77f3,0x661f,0x4e8b,0x5fc3,0x9577,0x957f,0x529b,0x969b, 0x9645,0x9280,0x94f6,0x9152,0x6709,0x548c,0x5bf6,0x5b9d, 0x623f,0x5712,0x56ed,0x5229,0x8a2d,0x8bbe,0x7279,0x6c34, 0x540d,0x806f,0x8054,0x4e9e,0x4e9a,0x4e9c,0x5c0f,0x7522, 0x4ea7,0x6613,0x7406,0x6587,0x57ce,0x8cc7,0x8d44,0x65c5, 0x4fdd,0x9f8d,0x9f99,0x91ab,0x533b,0x9ad8,0x5c4b,0x5668, 0x5efa,0x4e16,0x60c5,0x5834,0x573a,0x9054,0x8fbe,0x5149, 0x5b89,0x52d5,0x52a8,0x5dde,0x51e0,0x673a,0x7530,0x6c5f, 0x6280,0x8eca,0x8f66,0x5e97,0x4e00,0x5730,0x897f,0x52d9, 0x52a1,0x793e,0x4e09,0x85e5,0x836f,0x85ac,0x5ee3,0x5e7f, 0x5e83,0x7269,0x884c,0x4eac,0x5316,0x7dda,0x7ebf,0x5317, 0x751f,0x5357,0x6a02,0x4e50,0x697d,0x5831,0x62a5,0x56e2, 
0x7db2,0x5bb6,0x53f8,0x672c,0x5c71,0x5e02,0x7f8e,0x54c1, 0x4e0a,0x901a,0x516c,0x6703,0x4f1a,0x4eba,0x5929,0x79d1, 0x96c6,0x5b50,0x4fe1,0x5546,0x65e5,0x6d77,0x5de5,0x91d1, 0x65b0,0x6771,0x4e1c,0x5b78,0x5b66,0x83ef,0x534e,0x696d, 0x4e1a,0x5927,0x96fb,0x7535,0x7f51,0x570b,0x56fd,0x4e2d };

#define REORDER(RANGE,code) reorder_by_freq(code,RANGE ## _BASE,\
                RANGE ## _FREQ, RANGE ## _TABLE)
#define RESTORE(RANGE,code) restore_by_freq(code,RANGE ## _BASE,\
                RANGE ## _FREQ, RANGE ## _TABLE)

u_codep reorder_by_freq(u_codep s, u_codep BASE, int TABSIZE, u_codep *TABLE)
{
  u_codep i=BASE;
  int k=0;
  for(k=0; k=0 && k=0 && q=0 && k=0 && q output_file
% ./lamcz -d < output_file

An input file should contain u+????-form code points delimited
with spaces or newlines.

/* begin of lamcz.c version 2.0 */
/******************************************/
/* lamcz 2.0 (2001-Sep-02) by LSB */
/******************************************/

/********************************************/
/* amc-ace-z.c 0.3.1 (2001-Sep-01-Sat) */
/* http://www.cs.berkeley.edu/~amc/charset/ */
/* Adam M. Costello */
/* http://www.cs.berkeley.edu/~amc/ */
/********************************************/

/* This is ANSI C code (C89) implementing AMC-ACE-Z version 0.3.x. */

/************************************************************/
/* Public interface (would normally go in its own .h file): */

#include <limits.h>
#include "reorder.c"   // ************ ADDED BY LSB

enum amc_ace_status {
  amc_ace_success,
  amc_ace_bad_input,   /* Input is invalid. */
  amc_ace_big_output,  /* Output would exceed the space provided. */
  amc_ace_overflow     /* Input requires wider integers to process.
*/
};

#if UINT_MAX >= (1 << 26) - 1
typedef unsigned int amc_ace_z_uint;
#else
typedef unsigned long amc_ace_z_uint;
#endif

enum amc_ace_status amc_ace_z_encode(
  amc_ace_z_uint input_length,
  const amc_ace_z_uint input[],
  const unsigned char uppercase_flags[],
  amc_ace_z_uint *output_length,
  char output[] );

/* amc_ace_z_encode() converts Unicode to AMC-ACE-Z (without */
/* any signature). The input must be represented as an array */
/* of Unicode code points (not code units; surrogate pairs */
/* are not allowed), and the output will be represented as an */
/* array of ASCII code points. The output string is *not* */
/* null-terminated; it will contain zeros if and only if the */
/* input contains zeros. (Of course the caller can leave room */
/* for a terminator and add one if needed.) The input_length is */
/* the number of code points in the input. The output_length is */
/* an in/out argument: the caller must pass in the maximum number */
/* of code points that may be output, and on successful return it */
/* will contain the number of code points actually output. The */
/* uppercase_flags array must hold input_length boolean values, */
/* where nonzero means the corresponding Unicode character should */
/* be forced to uppercase after being decoded, and zero means it */
/* is caseless or should be forced to lowercase. Alternatively, */
/* uppercase_flags may be a null pointer, which is equivalent to */
/* all zeros. ASCII code points are always encoded literally, */
/* regardless of the corresponding flags. The return value may */
/* be any of the amc_ace_status values defined above except */
/* amc_ace_bad_input; if not amc_ace_success, then output_length */
/* and output may contain garbage. */

enum amc_ace_status amc_ace_z_decode(
  amc_ace_z_uint input_length,
  const char input[],
  amc_ace_z_uint *output_length,
  amc_ace_z_uint output[],
  unsigned char uppercase_flags[] );

/* amc_ace_z_decode() converts AMC-ACE-Z (without any signature) */
/* to Unicode.
The input must be represented as an array of */
/* ASCII code points, and the output will be represented as */
/* an array of Unicode code points. The input_length is the */
/* number of code points in the input. The output_length is */
/* an in/out argument: the caller must pass in the maximum */
/* number of code points that may be output, and on successful */
/* return it will contain the actual number of code points */
/* output. The uppercase_flags array must have room for at */
/* least output_length values, or it may be a null pointer if */
/* the case information is not needed. A nonzero flag indicates */
/* that the corresponding Unicode character should be forced to */
/* uppercase by the caller, while zero means it is caseless or */
/* should be forced to lowercase. ASCII code points are output */
/* already in the proper case, but their flags will be set */
/* appropriately so that applying the flags would be harmless. */
/* The return value may be any of the amc_ace_status values */
/* defined above; if not amc_ace_success, then output_length, */
/* output, and uppercase_flags may contain garbage. On success, */
/* the decoder will never need to write an output_length greater */
/* than input_length, because of how the encoding is defined. */

/**********************************************************/
/* Implementation (would normally go in its own .c file): */

#include <string.h>

/*** Bootstring parameters for AMC-ACE-Z ***/

enum { base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700,
       initial_bias = 72, initial_n = 0x80, delimiter = 0x2D };

/* basic(cp) tests whether cp is a basic code point: */
#define basic(cp) ((amc_ace_z_uint)(cp) < 0x80)

/* delim(cp) tests whether cp is a delimiter: */
#define delim(cp) ((cp) == delimiter)

/* decode_digit(cp) returns the numeric value of a basic code */
/* point (for use in representing integers) in the range 0 to */
/* base-1, or base if cp does not represent a value.
*/

static amc_ace_z_uint decode_digit(amc_ace_z_uint cp)
{
  return cp - 48 < 10 ? cp - 22 : cp - 65 < 26 ? cp - 65 :
         cp - 97 < 26 ? cp - 97 : base;
}

/* encode_digit(d,flag) returns the basic code point whose value */
/* (when used for representing integers) is d, which must be in the */
/* range 0 to base-1. The lowercase form is used unless flag is */
/* nonzero, in which case the uppercase form is used. The behavior */
/* is undefined if flag is nonzero and digit d has no uppercase form. */

static char encode_digit(amc_ace_z_uint d, int flag)
{
  return d + 22 + 75 * (d < 26) - ((flag != 0) << 5);
  /* 0..25 map to ASCII a..z or A..Z */
  /* 26..35 map to ASCII 0..9 */
}

/* flagged(bcp) tests whether a basic code point is flagged */
/* (uppercase). The behavior is undefined if bcp is not a */
/* basic code point. */

#define flagged(bcp) ((amc_ace_z_uint)(bcp) - 65 < 26)

/*** Platform-specific constants ***/

/* maxint is the maximum value of an amc_ace_z_uint variable: */
static const amc_ace_z_uint maxint = -1;
/* Because maxint is unsigned, -1 becomes the maximum value. */

/*** Bias adaptation function ***/

static amc_ace_z_uint adapt(
  amc_ace_z_uint delta, amc_ace_z_uint numpoints, int firsttime )
{
  amc_ace_z_uint k;

  delta = firsttime ?
delta / damp : delta >> 1;
  /* delta >> 1 is a faster way of doing delta / 2 */
  delta += delta / numpoints;

  for (k = 0; delta > ((base - tmin) * tmax) / 2; k += base) {
    delta /= base - tmin;
  }

  return k + (base - tmin + 1) * delta / (delta + skew);
}

/*** Main encode function ***/

enum amc_ace_status amc_ace_z_encode(
  amc_ace_z_uint input_length,
  const amc_ace_z_uint input[],
  const unsigned char uppercase_flags[],
  amc_ace_z_uint *output_length,
  char output[] )
{
  amc_ace_z_uint n, delta, h, b, out, max_out, bias, j, m, q, k, t;

  /* Initialize the state: */

  n = initial_n;
  delta = out = 0;
  max_out = *output_length;
  bias = initial_bias;

  /* Handle the basic code points: */

  for (j = 0; j < input_length; ++j) {
    if (basic(input[j])) {
      if (max_out - out < 2) return amc_ace_big_output;
      output[out++] = input[j];
    }
    /* else if (input[j] < n) return amc_ace_bad_input; */
    /* (not needed for AMC-ACE-Z with unsigned code points) */
    /* input[] is const-qualified, so a cast is needed to store the */
    /* reordered code point in place (the caller's array is modified): */
    else ((amc_ace_z_uint *) input)[j] =
           reorder(input[j],YES_REORDER);   // ***** ADDED BY LSB
  }

  h = b = out;

  /* h is the number of code points that have been handled, b is the */
  /* number of basic code points, and out is the number of characters */
  /* that have been output. */

  if (b > 0) output[out++] = delimiter;

  /* Main encoding loop: */

  while (h < input_length) {
    /* All non-basic code points < n have been */
    /* handled already.
Find the next larger one: */

    for (m = maxint, j = 0; j < input_length; ++j) {
      /* if (basic(input[j])) continue; */
      /* (not needed for AMC-ACE-Z) */
      if (input[j] >= n && input[j] < m) m = input[j];
    }

    /* Increase delta enough to advance the decoder's */
    /* <n,i> state to <m,0>, but guard against overflow: */

    if (m - n > (maxint - delta) / (h + 1)) return amc_ace_overflow;
    delta += (m - n) * (h + 1);
    n = m;

    for (j = 0; j < input_length; ++j) {
      /* AMC-ACE-Z does not need to check whether input[j] is basic: */
      if (input[j] < n /* || basic(input[j]) */ ) {
        if (++delta == 0) return amc_ace_overflow;
      }

      if (input[j] == n) {
        /* Represent delta as a generalized variable-length integer: */

        for (q = delta, k = base; ; k += base) {
          if (out >= max_out) return amc_ace_big_output;
          t = k <= bias ? tmin : k - bias >= tmax ? tmax : k - bias;
          if (q < t) break;
          output[out++] = encode_digit(t + (q - t) % (base - t), 0);
          q = (q - t) / (base - t);
        }

        output[out++] = encode_digit(q, uppercase_flags && uppercase_flags[j]);
        bias = adapt(delta, h + 1, h == b);
        delta = 0;
        ++h;
      }
    }

    ++delta, ++n;
  }

  *output_length = out;
  return amc_ace_success;
}

/*** Main decode function ***/

enum amc_ace_status amc_ace_z_decode(
  amc_ace_z_uint input_length,
  const char input[],
  amc_ace_z_uint *output_length,
  amc_ace_z_uint output[],
  unsigned char uppercase_flags[] )
{
  amc_ace_z_uint n, out, i, max_out, bias,
                 b, j, in, oldi, w, k, digit, t;

  /* Initialize the state: */

  n = initial_n;
  out = i = 0;
  max_out = *output_length;
  bias = initial_bias;

  /* Handle the basic code points: Let b be the number of input code */
  /* points before the last delimiter, or 0 if there is none, then */
  /* copy the first b code points to the output.
*/

  for (b = j = 0; j < input_length; ++j) if (delim(input[j])) b = j;
  if (b > max_out) return amc_ace_big_output;

  for (j = 0; j < b; ++j) {
    if (uppercase_flags) uppercase_flags[out] = flagged(input[j]);
    if (!basic(input[j])) return amc_ace_bad_input;
    output[out++] = input[j];
  }

  /* Main decoding loop: Start just after the last delimiter if any */
  /* basic code points were copied; start at the beginning otherwise. */

  for (in = b > 0 ? b + 1 : 0; in < input_length; ++out) {

    /* in is the index of the next character to be consumed, and */
    /* out is the number of code points in the output array. */

    /* Decode a generalized variable-length integer into delta, */
    /* which gets added to i. The overflow checking is easier */
    /* if we increase i as we go, then subtract off its starting */
    /* value at the end to obtain delta. */

    for (oldi = i, w = 1, k = base; ; k += base) {
      if (in >= input_length) return amc_ace_bad_input;
      digit = decode_digit(input[in++]);
      if (digit >= base) return amc_ace_bad_input;
      if (digit > (maxint - i) / w) return amc_ace_overflow;
      i += digit * w;
      t = k <= bias ? tmin : k - bias >= tmax ?
tmax : k - bias;
      if (digit < t) break;
      if (w > maxint / (base - t)) return amc_ace_overflow;
      w *= (base - t);
    }

    bias = adapt(i - oldi, out + 1, oldi == 0);

    /* i was supposed to wrap around from out+1 to 0, */
    /* incrementing n each time, so we'll fix that now: */

    if (i / (out + 1) > maxint - n) return amc_ace_overflow;
    n += i / (out + 1);
    i %= (out + 1);

    /* Insert n at position i of the output: */

    /* not needed for AMC-ACE-Z: */
    /* if (decode_digit(n) <= base) return amc_ace_invalid_input; */
    if (out >= max_out) return amc_ace_big_output;

    if (uppercase_flags) {
      memmove(uppercase_flags + i + 1, uppercase_flags + i, out - i);
      /* Case of last character determines uppercase flag: */
      uppercase_flags[i] = flagged(input[in - 1]);
    }

    memmove(output + i + 1, output + i, (out - i) * sizeof *output);
    //output[i++] = n;
    output[i++]=restore_order(n,YES_REORDER);   // **** ADDED BY LSB
  }

  *output_length = out;
  return amc_ace_success;
}

/******************************************************************/
/* Wrapper for testing (would normally go in a separate .c file): */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* For testing, we'll just set some compile-time limits rather than */
/* use malloc(), and set a compile-time option rather than using a */
/* command-line option.
*/

enum {
  unicode_max_length = 256,
  ace_max_length = 256
};

static void usage(char **argv)
{
  fprintf(stderr,
    "\n"
    "%s -e reads code points and writes an AMC-ACE-Z string.\n"
    "%s -d reads an AMC-ACE-Z string and writes code points.\n"
    "\n"
    "Input and output are plain text in the native character set.\n"
    "Code points are in the form u+hex separated by whitespace.\n"
    "The AMC-ACE-Z strings do not include any signatures.\n"
    "Although the specification allows AMC-ACE-Z strings to contain\n"
    "any characters from the ASCII repertoire, this test code\n"
    "supports only the printable characters, and requires the\n"
    "AMC-ACE-Z string to be followed by a newline.\n"
    "The case of the u in u+hex is the force-to-uppercase flag.\n"
    , argv[0], argv[0]);
  exit(EXIT_FAILURE);
}

static void fail(const char *msg)
{
  fputs(msg,stderr);
  exit(EXIT_FAILURE);
}

static const char too_big[] =
  "input or output is too large, recompile with larger limits\n";
static const char invalid_input[] = "invalid input\n";
static const char overflow[] = "arithmetic overflow\n";
static const char io_error[] = "I/O error\n";

/* The following string is used to convert printable */
/* characters between ASCII and the native charset: */

static const char print_ascii[] =
  "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  " !\"#$%&'()*+,-./"
  "0123456789:;<=>?"
  "@ABCDEFGHIJKLMNO"
  "PQRSTUVWXYZ[\\]^_"
  "`abcdefghijklmno"
  "pqrstuvwxyz{|}~\n";

int main(int argc, char **argv)
{
  enum amc_ace_status status;
  int r;
  unsigned int input_length, output_length, j;
  unsigned char uppercase_flags[unicode_max_length];

  if (argc != 2) usage(argv);
  if (argv[1][0] != '-') usage(argv);
  if (argv[1][2] != 0) usage(argv);

  if (argv[1][1] == 'e') {
    amc_ace_z_uint input[unicode_max_length];
    unsigned long codept;
    char output[ace_max_length+1], uplus[3];
    int c;

    /* Read the input code points: */

    input_length = 0;

    for (;;) {
      r = scanf("%2s%lx", uplus, &codept);
      if (ferror(stdin)) fail(io_error);
      if (r == EOF || r == 0) break;

      if (r != 2 || uplus[1] != '+' || codept > (amc_ace_z_uint)-1) {
        fail(invalid_input);
      }

      if (input_length == unicode_max_length) fail(too_big);

      if (uplus[0] == 'u') uppercase_flags[input_length] = 0;
      else if (uplus[0] == 'U') uppercase_flags[input_length] = 1;
      else fail(invalid_input);

      input[input_length++] = codept;
    }

    /* Encode: */

    output_length = ace_max_length;
    status = amc_ace_z_encode(input_length, input, uppercase_flags,
                              &output_length, output);
    if (status == amc_ace_bad_input) fail(invalid_input);
    if (status == amc_ace_big_output) fail(too_big);
    if (status == amc_ace_overflow) fail(overflow);
    assert(status == amc_ace_success);

    /* Convert to native charset and output: */

    for (j = 0; j < output_length; ++j) {
      c = output[j];
      assert(c >= 0 && c <= 127);
      if (print_ascii[c] == 0) fail(invalid_input);
      output[j] = print_ascii[c];
    }

    output[j] = 0;
    r = puts(output);
    if (r == EOF) fail(io_error);
    return EXIT_SUCCESS;
  }

  if (argv[1][1] == 'd') {
    char input[ace_max_length+2], *p, *pp;
    amc_ace_z_uint output[unicode_max_length];

    /* Read the AMC-ACE-Z input string and convert to ASCII: */

    fgets(input, ace_max_length+2, stdin);
    if (ferror(stdin)) fail(io_error);
    if (feof(stdin)) fail(invalid_input);
    input_length = strlen(input) - 1;
    if (input[input_length] != '\n') fail(too_big);
    input[input_length] = 0;
    for (p = input; *p != 0; ++p) {
      pp =
strchr(print_ascii, *p);
      if (pp == 0) fail(invalid_input);
      *p = pp - print_ascii;
    }

    /* Decode: */

    output_length = unicode_max_length;
    status = amc_ace_z_decode(input_length, input, &output_length,
                              output, uppercase_flags);
    if (status == amc_ace_bad_input) fail(invalid_input);
    if (status == amc_ace_big_output) fail(too_big);
    if (status == amc_ace_overflow) fail(overflow);
    assert(status == amc_ace_success);

    /* Output the result: */

    for (j = 0; j < output_length; ++j) {
      r = printf("%s+%04lX\n",
                 uppercase_flags[j] ? "U" : "u",
                 (unsigned long) output[j] );
      if (r < 0) fail(io_error);
    }

    return EXIT_SUCCESS;
  }

  usage(argv);
  return EXIT_SUCCESS;  /* not reached, but quiets compiler warning */
}
/* end of lamcz.c */


A3. Experiment Results

(How to read the tables)

  N:            length of a domain label (# of code points)
  FREQ:         number of domains of length N
  N*FREQ:       total # of code points in the domains of length N
  SUM OF AMCZ:  sum of the lengths of the AMCZ labels
  X:            SUM OF AMCZ / (N*FREQ)
  SUM OF LAMCZ: sum of the lengths of the LAMCZ labels
  Y:            SUM OF LAMCZ / (N*FREQ)
  COMP:         (SUM OF AMCZ - SUM OF LAMCZ) / SUM OF AMCZ * 100

A positive COMP means that the LAMCZ labels are shorter than the AMCZ
labels. For example, in the arabic table at N=2, X = 258 / 118 = 2.19
and COMP = (258 - 249) / 258 * 100 = 3.49.

1.
arabic

|  N|   FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|     42|     42|      126(3.00)|       126(3.00)| 0.00|
|  2|     59|    118|      258(2.19)|       249(2.11)| 3.49|
|  3|    363|   1089|     2121(1.95)|      1992(1.83)| 6.08|
|  4|    888|   3552|     6359(1.79)|      5811(1.64)| 8.62|
|  5|   1122|   5610|     9550(1.70)|      8529(1.52)|10.69|
|  6|   1009|   6054|     9890(1.63)|      8620(1.42)|12.84|
|  7|    845|   5915|     9309(1.57)|      8134(1.38)|12.62|
|  8|    378|   3024|     4590(1.52)|      3992(1.32)|13.03|
|  9|    263|   2367|     3523(1.49)|      3063(1.29)|13.06|
| 10|    152|   1520|     2230(1.47)|      1941(1.28)|12.96|
| 11|    130|   1430|     2058(1.44)|      1787(1.25)|13.17|
| 12|    110|   1320|     1873(1.42)|      1614(1.22)|13.83|
| 13|     67|    871|     1230(1.41)|      1040(1.19)|15.45|
| 14|     61|    854|     1211(1.42)|      1015(1.19)|16.18|
| 15|     52|    780|     1085(1.39)|       924(1.18)|14.84|
| 16|     34|    544|      743(1.37)|       630(1.16)|15.21|
| 17|     11|    187|      256(1.37)|       218(1.17)|14.84|
| 18|     19|    342|      465(1.36)|       392(1.15)|15.70|
| 19|      8|    152|      201(1.32)|       175(1.15)|12.94|
| 20|     10|    200|      268(1.34)|       235(1.18)|12.31|
| 21|      3|     63|       85(1.35)|        75(1.19)|11.76|
| 22|      4|     88|      116(1.32)|        99(1.12)|14.66|
| 23|      3|     69|       89(1.29)|        76(1.10)|14.61|
| 24|      2|     48|       62(1.29)|        55(1.15)|11.29|
| 25|      5|    125|      165(1.32)|       143(1.14)|13.33|
| 26|      2|     52|       67(1.29)|        56(1.08)|16.42|
| 27|      2|     54|       73(1.35)|        61(1.13)|16.44|
| 33|      1|     33|       41(1.24)|        37(1.12)| 9.76|
| 34|      1|     34|       45(1.32)|        36(1.06)|20.00|
|All|   5646|  36537|    58089(1.59)|     51125(1.40)|11.99|

2.
arabic-DAMP075-SKEW48

|  N|   FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|     42|     42|      126(3.00)|       126(3.00)| 0.00|
|  2|     59|    118|      260(2.20)|       256(2.17)| 1.54|
|  3|    363|   1089|     2152(1.98)|      2022(1.86)| 6.04|
|  4|    888|   3552|     6393(1.80)|      5877(1.65)| 8.07|
|  5|   1122|   5610|     9580(1.71)|      8613(1.54)|10.09|
|  6|   1009|   6054|     9889(1.63)|      8680(1.43)|12.23|
|  7|    845|   5915|     9281(1.57)|      8186(1.38)|11.80|
|  8|    378|   3024|     4583(1.52)|      4024(1.33)|12.20|
|  9|    263|   2367|     3512(1.48)|      3083(1.30)|12.22|
| 10|    152|   1520|     2223(1.46)|      1946(1.28)|12.46|
| 11|    130|   1430|     2051(1.43)|      1796(1.26)|12.43|
| 12|    110|   1320|     1865(1.41)|      1623(1.23)|12.98|
| 13|     67|    871|     1226(1.41)|      1046(1.20)|14.68|
| 14|     61|    854|     1209(1.42)|      1021(1.20)|15.55|
| 15|     52|    780|     1079(1.38)|       930(1.19)|13.81|
| 16|     34|    544|      742(1.36)|       635(1.17)|14.42|
| 17|     11|    187|      256(1.37)|       218(1.17)|14.84|
| 18|     19|    342|      464(1.36)|       392(1.15)|15.52|
| 19|      8|    152|      199(1.31)|       175(1.15)|12.06|
| 20|     10|    200|      268(1.34)|       235(1.18)|12.31|
| 21|      3|     63|       84(1.33)|        75(1.19)|10.71|
| 22|      4|     88|      115(1.31)|        99(1.12)|13.91|
| 23|      3|     69|       88(1.28)|        76(1.10)|13.64|
| 24|      2|     48|       61(1.27)|        55(1.15)| 9.84|
| 25|      5|    125|      165(1.32)|       143(1.14)|13.33|
| 26|      2|     52|       67(1.29)|        56(1.08)|16.42|
| 27|      2|     54|       73(1.35)|        61(1.13)|16.44|
| 33|      1|     33|       41(1.24)|        37(1.12)| 9.76|
| 34|      1|     34|       45(1.32)|        36(1.06)|20.00|
|All|   5646|  36537|    58097(1.59)|     51522(1.41)|11.32|

3.
cyrillic

|  N|   FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|     46|     46|      138(3.00)|       138(3.00)| 0.00|
|  2|    107|    214|      466(2.18)|       456(2.13)| 2.15|
|  3|    433|   1299|     2332(1.80)|      2340(1.80)|-0.34|
|  4|    702|   2808|     4623(1.65)|      4620(1.65)| 0.06|
|  5|   1003|   5015|     7726(1.54)|      7563(1.51)| 2.11|
|  6|   1071|   6426|     9422(1.47)|      9126(1.42)| 3.14|
|  7|   1081|   7567|    10714(1.42)|     10286(1.36)| 3.99|
|  8|    813|   6504|     8969(1.38)|      8566(1.32)| 4.49|
|  9|    586|   5274|     7108(1.35)|      6769(1.28)| 4.77|
| 10|    418|   4180|     5595(1.34)|      5311(1.27)| 5.08|
| 11|    311|   3421|     4489(1.31)|      4235(1.24)| 5.66|
| 12|    195|   2340|     3024(1.29)|      2856(1.22)| 5.56|
| 13|    122|   1586|     2042(1.29)|      1910(1.20)| 6.46|
| 14|     90|   1260|     1609(1.28)|      1521(1.21)| 5.47|
| 15|     59|    885|     1126(1.27)|      1060(1.20)| 5.86|
| 16|     45|    720|      914(1.27)|       856(1.19)| 6.35|
| 17|     29|    493|      620(1.26)|       582(1.18)| 6.13|
| 18|     11|    198|      254(1.28)|       229(1.16)| 9.84|
| 19|     15|    285|      358(1.26)|       326(1.14)| 8.94|
| 20|     15|    300|      368(1.23)|       345(1.15)| 6.25|
| 21|      5|    105|      129(1.23)|       119(1.13)| 7.75|
| 22|      3|     66|       81(1.23)|        73(1.11)| 9.88|
| 24|      1|     24|       31(1.29)|        26(1.08)|16.13|
| 25|      3|     75|       93(1.24)|        86(1.15)| 7.53|
| 26|      1|     26|       31(1.19)|        29(1.12)| 6.45|
| 28|      1|     28|       33(1.18)|        31(1.11)| 6.06|
| 30|      1|     30|       37(1.23)|        32(1.07)|13.51|
| 31|      1|     31|       40(1.29)|        34(1.10)|15.00|
|All|   7168|  51206|    72372(1.41)|     69525(1.36)| 3.93|

4. ethiopic

|  N|   FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|      2|      2|        6(3.00)|         6(3.00)| 0.00|
|  2|     24|     48|      126(2.62)|       111(2.31)|11.90|
|  3|     38|    114|      272(2.39)|       244(2.14)|10.29|
|  4|     23|     92|      206(2.24)|       187(2.03)| 9.22|
|  5|     11|     55|      123(2.24)|       102(1.85)|17.07|
|  6|     11|     66|      138(2.09)|       107(1.62)|22.46|
|  7|      3|     21|       45(2.14)|        34(1.62)|24.44|
|  8|      2|     16|       31(1.94)|        24(1.50)|22.58|
| 10|      1|     10|       21(2.10)|        16(1.60)|23.81|
|All|    115|    424|      968(2.28)|       831(1.96)|14.15|

5.
greek

|  N|   FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|     34|     34|      102(3.00)|       102(3.00)| 0.00|
|  2|     18|     36|       78(2.17)|        77(2.14)| 1.28|
|  3|     50|    150|      268(1.79)|       273(1.82)|-1.87|
|  4|     79|    316|      540(1.71)|       530(1.68)| 1.85|
|  5|    174|    870|     1335(1.53)|      1309(1.50)| 1.95|
|  6|    203|   1218|     1793(1.47)|      1739(1.43)| 3.01|
|  7|    183|   1281|     1828(1.43)|      1745(1.36)| 4.54|
|  8|    212|   1696|     2358(1.39)|      2207(1.30)| 6.40|
|  9|    179|   1611|     2198(1.36)|      2063(1.28)| 6.14|
| 10|    117|   1170|     1557(1.33)|      1468(1.25)| 5.72|
| 11|     76|    836|     1088(1.30)|      1032(1.23)| 5.15|
| 12|     54|    648|      839(1.29)|       785(1.21)| 6.44|
| 13|     34|    442|      561(1.27)|       528(1.19)| 5.88|
| 14|     23|    322|      411(1.28)|       382(1.19)| 7.06|
| 15|     14|    210|      270(1.29)|       253(1.20)| 6.30|
| 16|     13|    208|      265(1.27)|       247(1.19)| 6.79|
| 17|      4|     68|       86(1.26)|        79(1.16)| 8.14|
| 18|      2|     36|       46(1.28)|        44(1.22)| 4.35|
| 19|      3|     57|       74(1.30)|        64(1.12)|13.51|
| 21|      1|     21|       27(1.29)|        24(1.14)|11.11|
| 22|      1|     22|       26(1.18)|        24(1.09)| 7.69|
|All|   1474|  11252|    15750(1.40)|     14975(1.33)| 4.92|

6. hangul-0512

|  N|   FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|   1953|   1953|     7812(4.00)|      7812(4.00)| 0.00|
|  2|  17149|  34298|   124782(3.64)|    108420(3.16)|13.11|
|  3|  39643| 118929|   403205(3.39)|    329089(2.77)|18.38|
|  4|  62285| 249140|   816093(3.28)|    629606(2.53)|22.85|
|  5|  39675| 198375|   636102(3.21)|    476097(2.40)|25.15|
|  6|  23891| 143346|   452483(3.16)|    330086(2.30)|27.05|
|  7|  12448|  87136|   271953(3.12)|    194275(2.23)|28.56|
|  8|   5441|  43528|   134600(3.09)|     95389(2.19)|29.13|
|  9|   2264|  20376|    62405(3.06)|     43741(2.15)|29.91|
| 10|    895|   8950|    27223(3.04)|     19010(2.12)|30.17|
| 11|    373|   4103|    12420(3.03)|      8634(2.10)|30.48|
| 12|    141|   1692|     5080(3.00)|      3566(2.11)|29.80|
| 13|     77|   1001|     2986(2.98)|      2071(2.07)|30.64|
| 14|     32|    448|     1331(2.97)|       930(2.08)|30.13|
| 15|     20|    300|      884(2.95)|       616(2.05)|30.32|
| 16|     10|    160|      460(2.88)|       354(2.21)|23.04|
| 17|      7|    119|      354(2.97)|       249(2.09)|29.66|
|All| 206304| 913854|  2960173(3.24)|   2249945(2.46)|23.99|

7.
hangul-0768 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 1953| 1953| 7812(4.00)| 7812(4.00)| 0.00| | 2| 17149| 34298| 124782(3.64)| 106831(3.11)|14.39| | 3| 39643| 118929| 403205(3.39)| 324757(2.73)|19.46| | 4| 62285| 249140| 816093(3.28)| 623279(2.50)|23.63| | 5| 39675| 198375| 636102(3.21)| 471014(2.37)|25.95| | 6| 23891| 143346| 452483(3.16)| 326781(2.28)|27.78| | 7| 12448| 87136| 271953(3.12)| 192448(2.21)|29.23| | 8| 5441| 43528| 134600(3.09)| 94451(2.17)|29.83| | 9| 2264| 20376| 62405(3.06)| 43336(2.13)|30.56| | 10| 895| 8950| 27223(3.04)| 18800(2.10)|30.94| | 11| 373| 4103| 12420(3.03)| 8531(2.08)|31.31| | 12| 141| 1692| 5080(3.00)| 3508(2.07)|30.94| | 13| 77| 1001| 2986(2.98)| 2042(2.04)|31.61| | 14| 32| 448| 1331(2.97)| 915(2.04)|31.25| | 15| 20| 300| 884(2.95)| 607(2.02)|31.33| | 16| 10| 160| 460(2.88)| 349(2.18)|24.13| | 17| 7| 119| 354(2.97)| 243(2.04)|31.36| |All| 206304| 913854| 2960173(3.24)| 2225704(2.44)|24.81| 8. hangul-1024 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 1953| 1953| 7812(4.00)| 7812(4.00)| 0.00| | 2| 17149| 34298| 124782(3.64)| 106238(3.10)|14.86| | 3| 39643| 118929| 403205(3.39)| 323801(2.72)|19.69| | 4| 62285| 249140| 816093(3.28)| 622067(2.50)|23.77| | 5| 39675| 198375| 636102(3.21)| 470174(2.37)|26.09| | 6| 23891| 143346| 452483(3.16)| 326242(2.28)|27.90| | 7| 12448| 87136| 271953(3.12)| 192139(2.21)|29.35| | 8| 5441| 43528| 134600(3.09)| 94322(2.17)|29.92| | 9| 2264| 20376| 62405(3.06)| 43266(2.12)|30.67| | 10| 895| 8950| 27223(3.04)| 18764(2.10)|31.07| | 11| 373| 4103| 12420(3.03)| 8511(2.07)|31.47| | 12| 141| 1692| 5080(3.00)| 3505(2.07)|31.00| | 13| 77| 1001| 2986(2.98)| 2039(2.04)|31.71| | 14| 32| 448| 1331(2.97)| 911(2.03)|31.56| | 15| 20| 300| 884(2.95)| 603(2.01)|31.79| | 16| 10| 160| 460(2.88)| 337(2.11)|26.74| | 17| 7| 119| 354(2.97)| 243(2.04)|31.36| |All| 206304| 913854| 2960173(3.24)| 2220974(2.43)|24.97| 9. 
hangul-1024-DAMP075-SKEW48 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 1953| 1953| 7812(4.00)| 7812(4.00)| 0.00| | 2| 17149| 34298| 120115(3.50)| 106026(3.09)|11.73| | 3| 39643| 118929| 394731(3.32)| 324005(2.72)|17.92| | 4| 62285| 249140| 804771(3.23)| 622982(2.50)|22.59| | 5| 39675| 198375| 631434(3.18)| 470527(2.37)|25.48| | 6| 23891| 143346| 450586(3.14)| 326316(2.28)|27.58| | 7| 12448| 87136| 271291(3.11)| 191964(2.20)|29.24| | 8| 5441| 43528| 134372(3.09)| 94343(2.17)|29.79| | 9| 2264| 20376| 62361(3.06)| 43187(2.12)|30.75| | 10| 895| 8950| 27207(3.04)| 18715(2.09)|31.21| | 11| 373| 4103| 12438(3.03)| 8492(2.07)|31.73| | 12| 141| 1692| 5085(3.01)| 3492(2.06)|31.33| | 13| 77| 1001| 2993(2.99)| 2040(2.04)|31.84| | 14| 32| 448| 1329(2.97)| 905(2.02)|31.90| | 15| 20| 300| 884(2.95)| 602(2.01)|31.90| | 16| 10| 160| 461(2.88)| 336(2.10)|27.11| | 17| 7| 119| 354(2.97)| 244(2.05)|31.07| |All| 206304| 913854| 2928224(3.20)| 2221988(2.43)|24.12| 10. hebrew | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 28| 28| 84(3.00)| 84(3.00)| 0.00| | 2| 44| 88| 189(2.15)| 181(2.06)| 4.23| | 3| 231| 693| 1258(1.82)| 1241(1.79)| 1.35| | 4| 455| 1820| 3027(1.66)| 2907(1.60)| 3.96| | 5| 545| 2725| 4297(1.58)| 4082(1.50)| 5.00| | 6| 366| 2196| 3284(1.50)| 3123(1.42)| 4.90| | 7| 227| 1589| 2312(1.46)| 2171(1.37)| 6.10| | 8| 156| 1248| 1781(1.43)| 1652(1.32)| 7.24| | 9| 105| 945| 1303(1.38)| 1213(1.28)| 6.91| | 10| 82| 820| 1105(1.35)| 1027(1.25)| 7.06| | 11| 36| 396| 536(1.35)| 492(1.24)| 8.21| | 12| 16| 192| 256(1.33)| 239(1.24)| 6.64| | 13| 13| 169| 223(1.32)| 202(1.20)| 9.42| | 14| 5| 70| 92(1.31)| 85(1.21)| 7.61| | 15| 6| 90| 118(1.31)| 106(1.18)|10.17| | 16| 1| 16| 19(1.19)| 18(1.12)| 5.26| | 17| 1| 17| 22(1.29)| 20(1.18)| 9.09| |All| 2317| 13102| 19906(1.52)| 18843(1.44)| 5.34| 11. 
hindi | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 24| 24| 72(3.00)| 72(3.00)| 0.00| | 2| 17| 34| 77(2.26)| 74(2.18)| 3.90| | 3| 36| 108| 226(2.09)| 202(1.87)|10.62| | 4| 63| 252| 494(1.96)| 449(1.78)| 9.11| | 5| 58| 290| 556(1.92)| 468(1.61)|15.83| | 6| 41| 246| 445(1.81)| 376(1.53)|15.51| | 7| 41| 287| 505(1.76)| 435(1.52)|13.86| | 8| 17| 136| 246(1.81)| 192(1.41)|21.95| | 9| 18| 162| 268(1.65)| 217(1.34)|19.03| | 10| 3| 30| 53(1.77)| 41(1.37)|22.64| | 11| 7| 77| 117(1.52)| 99(1.29)|15.38| | 12| 4| 48| 72(1.50)| 64(1.33)|11.11| | 13| 4| 52| 83(1.60)| 72(1.38)|13.25| | 14| 1| 14| 19(1.36)| 17(1.21)|10.53| | 16| 1| 16| 24(1.50)| 21(1.31)|12.50| | 17| 1| 17| 25(1.47)| 22(1.29)|12.00| | 18| 2| 36| 47(1.31)| 42(1.17)|10.64| |All| 338| 1829| 3329(1.82)| 2863(1.57)|14.00| 12. hindi-DAMP075-SKEW48 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 24| 24| 72(3.00)| 72(3.00)| 0.00| | 2| 17| 34| 80(2.35)| 76(2.24)| 5.00| | 3| 36| 108| 229(2.12)| 208(1.93)| 9.17| | 4| 63| 252| 506(2.01)| 454(1.80)|10.28| | 5| 58| 290| 559(1.93)| 479(1.65)|14.31| | 6| 41| 246| 448(1.82)| 383(1.56)|14.51| | 7| 41| 287| 512(1.78)| 438(1.53)|14.45| | 8| 17| 136| 246(1.81)| 193(1.42)|21.54| | 9| 18| 162| 271(1.67)| 222(1.37)|18.08| | 10| 3| 30| 53(1.77)| 42(1.40)|20.75| | 11| 7| 77| 118(1.53)| 100(1.30)|15.25| | 12| 4| 48| 73(1.52)| 64(1.33)|12.33| | 13| 4| 52| 82(1.58)| 74(1.42)| 9.76| | 14| 1| 14| 19(1.36)| 17(1.21)|10.53| | 16| 1| 16| 24(1.50)| 22(1.38)| 8.33| | 17| 1| 17| 25(1.47)| 22(1.29)|12.00| | 18| 2| 36| 47(1.31)| 42(1.17)|10.64| |All| 338| 1829| 3364(1.84)| 2908(1.59)|13.56| 13. 
hiragana | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 85| 85| 255(3.00)| 255(3.00)| 0.00| | 2| 904| 1808| 4331(2.40)| 4240(2.35)| 2.10| | 3| 3383| 10149| 22050(2.17)| 21043(2.07)| 4.57| | 4| 4963| 19852| 40161(2.02)| 37725(1.90)| 6.07| | 5| 1602| 8010| 15735(1.96)| 14727(1.84)| 6.41| | 6| 905| 5430| 10058(1.85)| 9484(1.75)| 5.71| | 7| 535| 3745| 6805(1.82)| 6407(1.71)| 5.85| | 8| 269| 2152| 3822(1.78)| 3543(1.65)| 7.30| | 9| 129| 1161| 2015(1.74)| 1861(1.60)| 7.64| | 10| 60| 600| 1020(1.70)| 949(1.58)| 6.96| | 11| 23| 253| 425(1.68)| 407(1.61)| 4.24| | 12| 9| 108| 180(1.67)| 162(1.50)|10.00| | 13| 10| 130| 219(1.68)| 205(1.58)| 6.39| | 14| 1| 14| 23(1.64)| 20(1.43)|13.04| | 15| 2| 30| 48(1.60)| 46(1.53)| 4.17| | 16| 2| 32| 54(1.69)| 44(1.38)|18.52| | 17| 2| 34| 55(1.62)| 49(1.44)|10.91| | 19| 1| 19| 30(1.58)| 29(1.53)| 3.33| | 20| 1| 20| 30(1.50)| 29(1.45)| 3.33| |All| 12886| 53632| 107316(2.00)| 101225(1.89)| 5.68| 14. katakana | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 90| 90| 270(3.00)| 270(3.00)| 0.00| | 2| 814| 1628| 3932(2.42)| 3864(2.37)| 1.73| | 3| 3916| 11748| 25881(2.20)| 24973(2.13)| 3.51| | 4| 6613| 26452| 54867(2.07)| 51357(1.94)| 6.40| | 5| 5635| 28175| 56404(2.00)| 51321(1.82)| 9.01| | 6| 6342| 38052| 73191(1.92)| 65666(1.73)|10.28| | 7| 5838| 40866| 76788(1.88)| 68129(1.67)|11.28| | 8| 4235| 33880| 62064(1.83)| 54847(1.62)|11.63| | 9| 2656| 23904| 43034(1.80)| 38086(1.59)|11.50| | 10| 1746| 17460| 30977(1.77)| 27269(1.56)|11.97| | 11| 1146| 12606| 21976(1.74)| 19332(1.53)|12.03| | 12| 692| 8304| 14251(1.72)| 12591(1.52)|11.65| | 13| 409| 5317| 9000(1.69)| 7956(1.50)|11.60| | 14| 263| 3682| 6174(1.68)| 5420(1.47)|12.21| | 15| 155| 2325| 3887(1.67)| 3417(1.47)|12.09| | 16| 109| 1744| 2876(1.65)| 2531(1.45)|12.00| | 17| 54| 918| 1491(1.62)| 1328(1.45)|10.93| | 18| 53| 954| 1513(1.59)| 1354(1.42)|10.51| | 19| 31| 589| 944(1.60)| 848(1.44)|10.17| | 20| 34| 680| 1070(1.57)| 978(1.44)| 8.60| | 21| 10| 210| 330(1.57)| 
300(1.43)| 9.09| | 22| 6| 132| 210(1.59)| 193(1.46)| 8.10| | 23| 18| 414| 645(1.56)| 584(1.41)| 9.46| | 24| 5| 120| 190(1.58)| 168(1.40)|11.58| | 25| 11| 275| 419(1.52)| 383(1.39)| 8.59| | 26| 7| 182| 271(1.49)| 249(1.37)| 8.12| | 27| 6| 162| 245(1.51)| 227(1.40)| 7.35| | 28| 5| 140| 203(1.45)| 193(1.38)| 4.93| | 29| 3| 87| 127(1.46)| 122(1.40)| 3.94| | 30| 6| 180| 255(1.42)| 257(1.43)|-0.78| | 31| 5| 155| 226(1.46)| 214(1.38)| 5.31| | 32| 6| 192| 284(1.48)| 265(1.38)| 6.69| | 34| 1| 34| 50(1.47)| 48(1.41)| 4.00| | 35| 2| 70| 101(1.44)| 93(1.33)| 7.92| |All| 40922| 261727| 494146(1.89)| 444833(1.70)| 9.98| 15. katakana-4096-DAMP075-SKEW48 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 90| 90| 270(3.00)| 270(3.00)| 0.00| | 2| 814| 1628| 4019(2.47)| 4004(2.46)| 0.37| | 3| 3916| 11748| 26566(2.26)| 25723(2.19)| 3.17| | 4| 6613| 26452| 56232(2.13)| 52769(1.99)| 6.16| | 5| 5635| 28175| 57741(2.05)| 52605(1.87)| 8.89| | 6| 6342| 38052| 74687(1.96)| 67137(1.76)|10.11| | 7| 5838| 40866| 78296(1.92)| 69411(1.70)|11.35| | 8| 4235| 33880| 63147(1.86)| 55835(1.65)|11.58| | 9| 2656| 23904| 43838(1.83)| 38697(1.62)|11.73| | 10| 1746| 17460| 31505(1.80)| 27672(1.58)|12.17| | 11| 1146| 12606| 22314(1.77)| 19576(1.55)|12.27| | 12| 692| 8304| 14449(1.74)| 12759(1.54)|11.70| | 13| 409| 5317| 9116(1.71)| 8054(1.51)|11.65| | 14| 263| 3682| 6244(1.70)| 5482(1.49)|12.20| | 15| 155| 2325| 3909(1.68)| 3454(1.49)|11.64| | 16| 109| 1744| 2898(1.66)| 2542(1.46)|12.28| | 17| 54| 918| 1492(1.63)| 1342(1.46)|10.05| | 18| 53| 954| 1521(1.59)| 1367(1.43)|10.12| | 19| 31| 589| 951(1.61)| 853(1.45)|10.30| | 20| 34| 680| 1075(1.58)| 975(1.43)| 9.30| | 21| 10| 210| 330(1.57)| 299(1.42)| 9.39| | 22| 6| 132| 212(1.61)| 193(1.46)| 8.96| | 23| 18| 414| 651(1.57)| 584(1.41)|10.29| | 24| 5| 120| 193(1.61)| 168(1.40)|12.95| | 25| 11| 275| 419(1.52)| 382(1.39)| 8.83| | 26| 7| 182| 273(1.50)| 247(1.36)| 9.52| | 27| 6| 162| 243(1.50)| 224(1.38)| 7.82| | 28| 5| 140| 206(1.47)| 192(1.37)| 6.80| | 
29| 3| 87| 124(1.43)| 123(1.41)| 0.81| | 30| 6| 180| 255(1.42)| 256(1.42)|-0.39| | 31| 5| 155| 226(1.46)| 213(1.37)| 5.75| | 32| 6| 192| 282(1.47)| 266(1.39)| 5.67| | 34| 1| 34| 50(1.47)| 48(1.41)| 4.00| | 35| 2| 70| 101(1.44)| 92(1.31)| 8.91| |All| 40922| 261727| 503835(1.93)| 453814(1.73)| 9.93| 16. latin | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 87| 87| 260(2.99)| 259(2.98)| 0.38| | 2| 1043| 2086| 5140(2.46)| 5134(2.46)| 0.12| | 3| 1046| 3138| 6274(2.00)| 6241(1.99)| 0.53| | 4| 1812| 7248| 12750(1.76)| 12715(1.75)| 0.27| | 5| 3238| 16190| 26129(1.61)| 26047(1.61)| 0.31| | 6| 3956| 23736| 35894(1.51)| 35802(1.51)| 0.26| | 7| 4340| 30380| 43756(1.44)| 43633(1.44)| 0.28| | 8| 4639| 37112| 51351(1.38)| 51286(1.38)| 0.13| | 9| 4551| 40959| 54994(1.34)| 54873(1.34)| 0.22| | 10| 4289| 42890| 56159(1.31)| 56058(1.31)| 0.18| | 11| 3778| 41558| 53227(1.28)| 53157(1.28)| 0.13| | 12| 2967| 35604| 44820(1.26)| 44754(1.26)| 0.15| | 13| 2501| 32513| 40264(1.24)| 40197(1.24)| 0.17| | 14| 2058| 28812| 35212(1.22)| 35174(1.22)| 0.11| | 15| 1653| 24795| 29947(1.21)| 29918(1.21)| 0.10| | 16| 1372| 21952| 26264(1.20)| 26224(1.19)| 0.15| | 17| 1094| 18598| 22053(1.19)| 21994(1.18)| 0.27| | 18| 839| 15102| 17782(1.18)| 17722(1.17)| 0.34| | 19| 632| 12008| 14045(1.17)| 13988(1.16)| 0.41| | 20| 464| 9280| 10778(1.16)| 10721(1.16)| 0.53| | 21| 312| 6552| 7539(1.15)| 7516(1.15)| 0.31| | 22| 194| 4268| 4905(1.15)| 4876(1.14)| 0.59| | 23| 124| 2852| 3242(1.14)| 3234(1.13)| 0.25| | 24| 71| 1704| 1935(1.14)| 1925(1.13)| 0.52| | 25| 71| 1775| 2011(1.13)| 2002(1.13)| 0.45| | 26| 37| 962| 1083(1.13)| 1080(1.12)| 0.28| | 27| 33| 891| 1004(1.13)| 996(1.12)| 0.80| | 28| 17| 476| 535(1.12)| 529(1.11)| 1.12| | 29| 13| 377| 422(1.12)| 420(1.11)| 0.47| | 30| 9| 270| 298(1.10)| 299(1.11)|-0.34| | 31| 7| 217| 243(1.12)| 238(1.10)| 2.06| | 32| 9| 288| 321(1.11)| 316(1.10)| 1.56| | 33| 4| 132| 146(1.11)| 144(1.09)| 1.37| | 34| 2| 68| 76(1.12)| 74(1.09)| 2.63| | 35| 1| 35| 38(1.09)| 
38(1.09)| 0.00| |All| 47263| 464915| 610897(1.31)| 609584(1.31)| 0.21| 17. tamil | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 3| 3| 9(3.00)| 9(3.00)| 0.00| | 2| 1| 2| 4(2.00)| 4(2.00)| 0.00| | 3| 6| 18| 41(2.28)| 38(2.11)| 7.32| | 4| 5| 20| 39(1.95)| 34(1.70)|12.82| | 5| 13| 65| 124(1.91)| 99(1.52)|20.16| | 6| 20| 120| 218(1.82)| 174(1.45)|20.18| | 7| 17| 119| 210(1.76)| 163(1.37)|22.38| | 8| 9| 72| 120(1.67)| 99(1.38)|17.50| | 9| 6| 54| 90(1.67)| 71(1.31)|21.11| | 10| 4| 40| 64(1.60)| 50(1.25)|21.88| | 11| 5| 55| 86(1.56)| 68(1.24)|20.93| | 12| 3| 36| 53(1.47)| 43(1.19)|18.87| | 21| 1| 21| 34(1.62)| 24(1.14)|29.41| |All| 93| 625| 1092(1.75)| 876(1.40)|19.78| 18. thai | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 21| 21| 63(3.00)| 63(3.00)| 0.00| | 2| 23| 46| 100(2.17)| 103(2.24)|-3.00| | 3| 96| 288| 605(2.10)| 571(1.98)| 5.62| | 4| 97| 388| 770(1.98)| 691(1.78)|10.26| | 5| 122| 610| 1155(1.89)| 1030(1.69)|10.82| | 6| 194| 1164| 2117(1.82)| 1867(1.60)|11.81| | 7| 165| 1155| 2091(1.81)| 1765(1.53)|15.59| | 8| 135| 1080| 1883(1.74)| 1623(1.50)|13.81| | 9| 91| 819| 1396(1.70)| 1176(1.44)|15.76| | 10| 63| 630| 1049(1.67)| 902(1.43)|14.01| | 11| 48| 528| 866(1.64)| 733(1.39)|15.36| | 12| 34| 408| 661(1.62)| 563(1.38)|14.83| | 13| 20| 260| 428(1.65)| 356(1.37)|16.82| | 14| 16| 224| 366(1.63)| 306(1.37)|16.39| | 15| 12| 180| 283(1.57)| 238(1.32)|15.90| | 16| 11| 176| 283(1.61)| 222(1.26)|21.55| | 17| 5| 85| 136(1.60)| 113(1.33)|16.91| | 18| 2| 36| 54(1.50)| 44(1.22)|18.52| | 19| 5| 95| 145(1.53)| 123(1.29)|15.17| | 20| 3| 60| 89(1.48)| 73(1.22)|17.98| | 21| 2| 42| 64(1.52)| 53(1.26)|17.19| | 22| 1| 22| 33(1.50)| 27(1.23)|18.18| | 23| 1| 23| 36(1.57)| 29(1.26)|19.44| | 24| 1| 24| 36(1.50)| 31(1.29)|13.89| | 31| 1| 31| 43(1.39)| 35(1.13)|18.60| |All| 1169| 8395| 14752(1.76)| 12737(1.52)|13.66| 19. 
unihan-1024 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 4427| 4427| 14957(3.38)| 14711(3.32)| 1.64| | 2| 57418| 114836| 384468(3.35)| 353466(3.08)| 8.06| | 3| 41335| 124005| 401283(3.24)| 352095(2.84)|12.26| | 4| 89296| 357184| 1139404(3.19)| 933070(2.61)|18.11| | 5| 21091| 105455| 332420(3.15)| 267893(2.54)|19.41| | 6| 15128| 90768| 284134(3.13)| 220263(2.43)|22.48| | 7| 5181| 36267| 112576(3.10)| 86487(2.38)|23.17| | 8| 3082| 24656| 76272(3.09)| 57854(2.35)|24.15| | 9| 1417| 12753| 39319(3.08)| 29779(2.34)|24.26| | 10| 1203| 12030| 37136(3.09)| 27817(2.31)|25.09| | 11| 474| 5214| 16072(3.08)| 12035(2.31)|25.12| | 12| 398| 4776| 14714(3.08)| 10947(2.29)|25.60| | 13| 164| 2132| 6532(3.06)| 4881(2.29)|25.28| | 14| 122| 1708| 5232(3.06)| 3927(2.30)|24.94| | 15| 50| 750| 2283(3.04)| 1677(2.24)|26.54| | 16| 29| 464| 1419(3.06)| 1057(2.28)|25.51| | 17| 8| 136| 405(2.98)| 309(2.27)|23.70| |All| 240823| 897561| 2868626(3.20)| 2378268(2.65)|17.09| 20. unihan-2048 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 4427| 4427| 14957(3.38)| 14422(3.26)| 3.58| | 2| 57418| 114836| 384468(3.35)| 339996(2.96)|11.57| | 3| 41335| 124005| 401283(3.24)| 340675(2.75)|15.10| | 4| 89296| 357184| 1139404(3.19)| 909323(2.55)|20.19| | 5| 21091| 105455| 332420(3.15)| 261039(2.48)|21.47| | 6| 15128| 90768| 284134(3.13)| 214781(2.37)|24.41| | 7| 5181| 36267| 112576(3.10)| 84440(2.33)|24.99| | 8| 3082| 24656| 76272(3.09)| 56439(2.29)|26.00| | 9| 1417| 12753| 39319(3.08)| 29082(2.28)|26.04| | 10| 1203| 12030| 37136(3.09)| 27106(2.25)|27.01| | 11| 474| 5214| 16072(3.08)| 11756(2.25)|26.85| | 12| 398| 4776| 14714(3.08)| 10681(2.24)|27.41| | 13| 164| 2132| 6532(3.06)| 4783(2.24)|26.78| | 14| 122| 1708| 5232(3.06)| 3815(2.23)|27.08| | 15| 50| 750| 2283(3.04)| 1631(2.17)|28.56| | 16| 29| 464| 1419(3.06)| 1026(2.21)|27.70| | 17| 8| 136| 405(2.98)| 302(2.22)|25.43| |All| 240823| 897561| 2868626(3.20)| 2311297(2.58)|19.43| 21. 
unihan-2048-D ( the reordering in decreasing frequency order) | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 4427| 4427| 14957(3.38)| 14432(3.26)| 3.51| | 2| 57418| 114836| 384468(3.35)| 341134(2.97)|11.27| | 3| 41335| 124005| 401283(3.24)| 341362(2.75)|14.93| | 4| 89296| 357184| 1139404(3.19)| 912694(2.56)|19.90| | 5| 21091| 105455| 332420(3.15)| 262224(2.49)|21.12| | 6| 15128| 90768| 284134(3.13)| 216465(2.38)|23.82| | 7| 5181| 36267| 112576(3.10)| 85401(2.35)|24.14| | 8| 3082| 24656| 76272(3.09)| 56931(2.31)|25.36| | 9| 1417| 12753| 39319(3.08)| 29420(2.31)|25.18| | 10| 1203| 12030| 37136(3.09)| 27324(2.27)|26.42| | 11| 474| 5214| 16072(3.08)| 11835(2.27)|26.36| | 12| 398| 4776| 14714(3.08)| 10722(2.24)|27.13| | 13| 164| 2132| 6532(3.06)| 4795(2.25)|26.59| | 14| 122| 1708| 5232(3.06)| 3820(2.24)|26.99| | 15| 50| 750| 2283(3.04)| 1631(2.17)|28.56| | 16| 29| 464| 1419(3.06)| 1033(2.23)|27.20| | 17| 8| 136| 405(2.98)| 301(2.21)|25.68| |All| 240823| 897561| 2868626(3.20)| 2321524(2.59)|19.07| 22. 
unihan-3072 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 4427| 4427| 14957(3.38)| 14136(3.19)| 5.49| | 2| 57418| 114836| 384468(3.35)| 333660(2.91)|13.22| | 3| 41335| 124005| 401283(3.24)| 336865(2.72)|16.05| | 4| 89296| 357184| 1139404(3.19)| 903458(2.53)|20.71| | 5| 21091| 105455| 332420(3.15)| 259189(2.46)|22.03| | 6| 15128| 90768| 284134(3.13)| 213746(2.35)|24.77| | 7| 5181| 36267| 112576(3.10)| 83977(2.32)|25.40| | 8| 3082| 24656| 76272(3.09)| 56168(2.28)|26.36| | 9| 1417| 12753| 39319(3.08)| 28917(2.27)|26.46| | 10| 1203| 12030| 37136(3.09)| 26962(2.24)|27.40| | 11| 474| 5214| 16072(3.08)| 11690(2.24)|27.26| | 12| 398| 4776| 14714(3.08)| 10647(2.23)|27.64| | 13| 164| 2132| 6532(3.06)| 4757(2.23)|27.17| | 14| 122| 1708| 5232(3.06)| 3801(2.23)|27.35| | 15| 50| 750| 2283(3.04)| 1634(2.18)|28.43| | 16| 29| 464| 1419(3.06)| 1024(2.21)|27.84| | 17| 8| 136| 405(2.98)| 301(2.21)|25.68| |All| 240823| 897561| 2868626(3.20)| 2290932(2.55)|20.14| 23. unihan-4096 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 4427| 4427| 14957(3.38)| 13899(3.14)| 7.07| | 2| 57418| 114836| 384468(3.35)| 332156(2.89)|13.61| | 3| 41335| 124005| 401283(3.24)| 336045(2.71)|16.26| | 4| 89296| 357184| 1139404(3.19)| 902406(2.53)|20.80| | 5| 21091| 105455| 332420(3.15)| 258709(2.45)|22.17| | 6| 15128| 90768| 284134(3.13)| 213522(2.35)|24.85| | 7| 5181| 36267| 112576(3.10)| 83844(2.31)|25.52| | 8| 3082| 24656| 76272(3.09)| 56083(2.27)|26.47| | 9| 1417| 12753| 39319(3.08)| 28883(2.26)|26.54| | 10| 1203| 12030| 37136(3.09)| 26935(2.24)|27.47| | 11| 474| 5214| 16072(3.08)| 11684(2.24)|27.30| | 12| 398| 4776| 14714(3.08)| 10632(2.23)|27.74| | 13| 164| 2132| 6532(3.06)| 4751(2.23)|27.27| | 14| 122| 1708| 5232(3.06)| 3794(2.22)|27.48| | 15| 50| 750| 2283(3.04)| 1630(2.17)|28.60| | 16| 29| 464| 1419(3.06)| 1026(2.21)|27.70| | 17| 8| 136| 405(2.98)| 301(2.21)|25.68| |All| 240823| 897561| 2868626(3.20)| 2286300(2.55)|20.30| 24. 
unihan-4096-D | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 4427| 4427| 14957(3.38)| 13909(3.14)| 7.01| | 2| 57418| 114836| 384468(3.35)| 332799(2.90)|13.44| | 3| 41335| 124005| 401283(3.24)| 335682(2.71)|16.35| | 4| 89296| 357184| 1139404(3.19)| 905086(2.53)|20.56| | 5| 21091| 105455| 332420(3.15)| 259944(2.46)|21.80| | 6| 15128| 90768| 284134(3.13)| 215353(2.37)|24.21| | 7| 5181| 36267| 112576(3.10)| 84893(2.34)|24.59| | 8| 3082| 24656| 76272(3.09)| 56682(2.30)|25.68| | 9| 1417| 12753| 39319(3.08)| 29273(2.30)|25.55| | 10| 1203| 12030| 37136(3.09)| 27189(2.26)|26.79| | 11| 474| 5214| 16072(3.08)| 11762(2.26)|26.82| | 12| 398| 4776| 14714(3.08)| 10674(2.23)|27.46| | 13| 164| 2132| 6532(3.06)| 4770(2.24)|26.97| | 14| 122| 1708| 5232(3.06)| 3803(2.23)|27.31| | 15| 50| 750| 2283(3.04)| 1630(2.17)|28.60| | 16| 29| 464| 1419(3.06)| 1028(2.22)|27.55| | 17| 8| 136| 405(2.98)| 300(2.21)|25.93| |All| 240823| 897561| 2868626(3.20)| 2294777(2.56)|20.00| 25. unihan-4096-DAMP075-SKEW48 | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 4427| 4427| 14957(3.38)| 13899(3.14)| 7.07| | 2| 57418| 114836| 375901(3.27)| 324587(2.83)|13.65| | 3| 41335| 124005| 394416(3.18)| 330550(2.67)|16.19| | 4| 89296| 357184| 1126357(3.15)| 890277(2.49)|20.96| | 5| 21091| 105455| 329783(3.13)| 255913(2.43)|22.40| | 6| 15128| 90768| 282751(3.12)| 211339(2.33)|25.26| | 7| 5181| 36267| 112181(3.09)| 83126(2.29)|25.90| | 8| 3082| 24656| 76111(3.09)| 55712(2.26)|26.80| | 9| 1417| 12753| 39285(3.08)| 28699(2.25)|26.95| | 10| 1203| 12030| 37150(3.09)| 26767(2.23)|27.95| | 11| 474| 5214| 16028(3.07)| 11603(2.23)|27.61| | 12| 398| 4776| 14712(3.08)| 10567(2.21)|28.17| | 13| 164| 2132| 6528(3.06)| 4735(2.22)|27.47| | 14| 122| 1708| 5248(3.07)| 3762(2.20)|28.32| | 15| 50| 750| 2281(3.04)| 1628(2.17)|28.63| | 16| 29| 464| 1425(3.07)| 1017(2.19)|28.63| | 17| 8| 136| 404(2.97)| 301(2.21)|25.50| |All| 240823| 897561| 2835518(3.16)| 2254482(2.51)|20.49| 26. 
unihan-4096-DUDE | N| FREQ| N*FREQ| SUM OF DUDE(X)| SUM OF LDUDE(Y)| COMP| | 1| 4427| 4427| 17708(4.00)| 17708(4.00)| 0.00| | 2| 57418| 114836| 443874(3.87)| 409657(3.57)| 7.71| | 3| 41335| 124005| 474117(3.82)| 408039(3.29)|13.94| | 4| 89296| 357184| 1361917(3.81)| 1074237(3.01)|21.12| | 5| 21091| 105455| 401146(3.80)| 308378(2.92)|23.13| | 6| 15128| 90768| 344208(3.79)| 250925(2.76)|27.10| | 7| 5181| 36267| 137275(3.79)| 99475(2.74)|27.54| | 8| 3082| 24656| 93013(3.77)| 65889(2.67)|29.16| | 9| 1417| 12753| 48000(3.76)| 34230(2.68)|28.69| | 10| 1203| 12030| 45427(3.78)| 31663(2.63)|30.30| | 11| 474| 5214| 19564(3.75)| 13708(2.63)|29.93| | 12| 398| 4776| 18013(3.77)| 12468(2.61)|30.78| | 13| 164| 2132| 7969(3.74)| 5590(2.62)|29.85| | 14| 122| 1708| 6377(3.73)| 4476(2.62)|29.81| | 15| 50| 750| 2811(3.75)| 1926(2.57)|31.48| | 16| 29| 464| 1749(3.77)| 1213(2.61)|30.65| | 17| 8| 136| 508(3.74)| 355(2.61)|30.12| |All| 240823| 897561| 3423676(3.81)| 2739937(3.05)|19.97| 27. unihan-SC-4096 ( SC only or SC+TC mixed ) | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 769| 769| 2717(3.53)| 2378(3.09)|12.48| | 2| 16065| 32130| 108598(3.38)| 92597(2.88)|14.73| | 3| 14315| 42945| 139693(3.25)| 116054(2.70)|16.92| | 4| 48871| 195484| 623650(3.19)| 491073(2.51)|21.26| | 5| 12135| 60675| 190928(3.15)| 147721(2.43)|22.63| | 6| 10463| 62778| 196038(3.12)| 146516(2.33)|25.26| | 7| 3594| 25158| 77931(3.10)| 57412(2.28)|26.33| | 8| 2373| 18984| 58686(3.09)| 42907(2.26)|26.89| | 9| 1078| 9702| 29875(3.08)| 21736(2.24)|27.24| | 10| 934| 9340| 28786(3.08)| 20855(2.23)|27.55| | 11| 392| 4312| 13279(3.08)| 9612(2.23)|27.62| | 12| 314| 3768| 11579(3.07)| 8376(2.22)|27.66| | 13| 144| 1872| 5724(3.06)| 4158(2.22)|27.36| | 14| 104| 1456| 4455(3.06)| 3226(2.22)|27.59| | 15| 41| 615| 1868(3.04)| 1348(2.19)|27.84| | 16| 25| 400| 1219(3.05)| 887(2.22)|27.24| | 17| 7| 119| 353(2.97)| 264(2.22)|25.21| |All| 111624| 470507| 1495379(3.18)| 1167120(2.48)|21.95| 28. 
unihan-TC-4096 ( TC only ) | N| FREQ| N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP| | 1| 3658| 3658| 12240(3.35)| 11521(3.15)| 5.87| | 2| 41353| 82706| 275870(3.34)| 239559(2.90)|13.16| | 3| 27020| 81060| 261590(3.23)| 219991(2.71)|15.90| | 4| 40425| 161700| 515754(3.19)| 411333(2.54)|20.25| | 5| 8956| 44780| 141492(3.16)| 110988(2.48)|21.56| | 6| 4665| 27990| 88096(3.15)| 67006(2.39)|23.94| | 7| 1587| 11109| 34645(3.12)| 26432(2.38)|23.71| | 8| 709| 5672| 17586(3.10)| 13176(2.32)|25.08| | 9| 339| 3051| 9444(3.10)| 7147(2.34)|24.32| | 10| 269| 2690| 8350(3.10)| 6080(2.26)|27.19| | 11| 82| 902| 2793(3.10)| 2072(2.30)|25.81| | 12| 84| 1008| 3135(3.11)| 2256(2.24)|28.04| | 13| 20| 260| 808(3.11)| 593(2.28)|26.61| | 14| 18| 252| 777(3.08)| 568(2.25)|26.90| | 15| 9| 135| 415(3.07)| 282(2.09)|32.05| | 16| 4| 64| 200(3.12)| 139(2.17)|30.50| | 17| 1| 17| 52(3.06)| 37(2.18)|28.85| |All| 129199| 427054| 1373247(3.22)| 1119180(2.62)|18.50| END OF DRAFT