[Yulcat-l] FWD: LC makes CJK Compatibility Database available

Manon Theroux manon.theroux at yale.edu
Fri Oct 28 11:36:38 EDT 2005


Forwarding the following LC message. Yale gets a mention :)

-Manon

=================================
Date: Thu, 27 Oct 2005 21:01:02 -0400
From: "Ann Della Porta" <adel at loc.gov>
Subject: LC makes CJK Compatibility Database available

      The Library of Congress has made available a CJK Compatibility Database to
help CJK catalogers quickly and conveniently replace non-MARC characters with
their MARC equivalents.  The database is available at
http://www.loc.gov/ils/cjk_search/cjk_cpso.html


Non-MARC Characters and Their MARC Equivalents

      The Library of Congress database will soon be upgraded to the Voyager with
Unicode release.  RLG's Union Catalog and OCLC's WorldCat databases are now
also Unicode compatible.  Chinese, Japanese and Korean (CJK) scripts are input
into these systems using Microsoft input method editors (IMEs).

      The Unicode standard includes several hundred duplicate CJK characters, for
example, 路 (F937) and 路* (8DEF), as well as many others that represent close
variants, for example, 步* (6B65) and 歩 (6B69).  Generally, one of these variants
is a valid MARC character while the other is not.
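The difference between a true duplicate and a close variant can be checked
with Python's standard unicodedata module.  This is only an illustrative
sketch: the helper name canonical_target is made up for this example, and
Unicode normalization is not how MARC validity is decided.  It only shows that
F937 is a canonical duplicate of 8DEF, while 6B69 and 6B65 are separate
characters that normalization leaves alone.

```python
import unicodedata


def canonical_target(ch):
    """Return the character NFC normalization maps ch to, or None if unchanged.

    Hypothetical helper for illustration; not part of the LC database.
    """
    normalized = unicodedata.normalize("NFC", ch)
    return normalized if normalized != ch else None


# Code points quoted in the message above.
for code in (0xF937, 0x8DEF, 0x6B69, 0x6B65):
    target = canonical_target(chr(code))
    if target is not None:
        note = f"canonical duplicate of {ord(target):04X}"
    else:
        note = "no canonical duplicate"
    print(f"{code:04X}: {note}")
```

Because variant pairs such as 6B69 and 6B65 are not linked by normalization,
a lookup table is still needed to find the MARC equivalent.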

      Only characters in the MARC repertoire can be used in MARC records exported
in MARC-8.  However, sometimes the most logical way to create a character using
a Microsoft IME produces a character that is not in the MARC repertoire.  For
example, if one creates the common character 李 by keying in and converting  in
the Korean IME, the result (F9E1) is not a valid MARC character.  One must key
in and convert  to create the valid MARC character, 李* (674E).  Another example
is the character 歩 (6B69), which is created with the Japanese IME.  However, the
Japanese form of this character is not a valid MARC form.  The valid MARC
equivalent, 步* (6B65), can only be created by using the Korean or Chinese IME.

      Only characters in the MARC repertoire can be used in records that are
intended to be used in a MARC-8 (non-Unicode) system.  Therefore characters
outside that repertoire should be replaced by their MARC equivalent.


The CJK Compatibility Database

      The CJK Compatibility Database includes more than 450 Chinese, Japanese and
Korean characters, Hangul syllables and diacritic marks, matched with their MARC
equivalents.  The list of characters in the database was initially identified by
LC staff, and was supplemented by entries in a similar database at Yale
University.  Characters that do not have a MARC equivalent are matched with the
missing character symbol.

      The database is intended to enable catalogers to quickly and conveniently
replace an invalid character with its valid MARC equivalent.  Users paste an
invalid character into the search form to search the database.  The result
displays the character, its MARC equivalent, and the Unicode encoding, as well
as other information that may be helpful in identifying the character and
describing how the MARC character can be input.
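
For readers who want to try the same idea locally, the following is a minimal
sketch of such a lookup, not the LC implementation.  The table holds only the
three pairs quoted in this message (the real database has more than 450
entries), and the names MARC_EQUIVALENTS, lookup and replace_non_marc are
assumptions made for this example.

```python
# Hypothetical miniature table: non-MARC character -> MARC equivalent.
# Only the three pairs mentioned in this message are included.
MARC_EQUIVALENTS = {
    "\uF937": "\u8DEF",
    "\uF9E1": "\u674E",
    "\u6B69": "\u6B65",
}


def lookup(char):
    """Return a result record for one pasted character, or None if unlisted."""
    if char not in MARC_EQUIVALENTS:
        return None
    marc = MARC_EQUIVALENTS[char]
    return {
        "character": char,
        "unicode": f"{ord(char):04X}",
        "marc_equivalent": marc,
        "marc_unicode": f"{ord(marc):04X}",
    }


def replace_non_marc(text):
    """Replace every listed non-MARC character in text with its MARC equivalent."""
    return text.translate(str.maketrans(MARC_EQUIVALENTS))


print(lookup("\uF9E1"))
print(replace_non_marc("\u6B69") == "\u6B65")
```

Characters that are not in the table pass through replace_non_marc unchanged,
so an unlisted non-MARC character would still need to be reported and added.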


Updating the Database

      The database is a cooperative undertaking.  If, in the course of your work,
you encounter a non-MARC character that is not listed in this database, please
report it to us so that it can be added to the database.  Notify:

   Young Ki Lee, Senior Cataloging Specialist
   Korean/Chinese Team
   Library of Congress
   ylee at loc.gov.


posted by:
Ann Della Porta
Acting Coordinator
Integrated Systems Office
Library of Congress
Washington, DC  20540-4010
adel at loc.gov

_______________________________________________

Manon Théroux
Authority Control Librarian & NACO Coordinator
Catalog Department, SML, 2-8376
_______________________________________________



