Database issues

Michael Hirohama kamesan
Sun Nov 16 14:28:19 EST 1997


At 01:38 -0500 11/17/97, Abe-Nornes wrote:
>[...]
>Our database is capable of inputting Japanese characters, but there has
>been some previous discussion about the need for romaji. It might be
>helpful for those without Japanese capabilities, but then anyone who wants
>these articles presumably reads Japanese.
>[...]

When handling names and keywords for searching, there is no need to force
an exact match if the system is designed to support liberal indexing.  I
would recommend that you imagine having a virtual data field which stores
a canonical representation for each searchable name field.  Initially, the
virtual canonical representation can be identical to the name field itself.
However, once functions are developed to convert romaji->canon(s),
Japanese->canon(s), name->canon(s), etc., users can easily search for
bibliographic entries by specifying names in *any* convenient form.  I
don't know if your CGI software has hooks to support this type of liberal
indexing, but if the CGI code is distributed in source code form, it would
be trivial to write a few lines of Perl code to handle this.
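
As a rough illustration of the canonical-field idea, here is a minimal
sketch in Python (chosen only for brevity; the same thing could be written
in Perl).  Everything in it is an assumption made for the example: the
canon() function, the "alt_names" field holding hand-entered romaji, and
the two sample records.  The canon() shown only lowercases, strips macrons
from Latin letters, and drops spaces and punctuation; it does not reconcile
variant romanizations such as "Oshima" and "Ohshima", and it leaves
Japanese text as is.

```python
import unicodedata
from collections import defaultdict


def canon(name):
    """Hypothetical canonical form of a name, for indexing only."""
    out = []
    for ch in unicodedata.normalize("NFKC", name):
        # If the character is a Latin letter with a diacritic (e.g. a
        # macron), keep only the base letter.  Kana and kanji are untouched.
        base = unicodedata.normalize("NFD", ch)[0]
        if base.isascii():
            ch = base
        # Drop spaces and punctuation; keep letters, digits, kana, kanji.
        if ch.isalnum():
            out.append(ch.lower())
    return "".join(out)


def build_index(records):
    """Map each canonical form to the ids of the records that carry it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        # Index the name field itself plus any hand-entered alternates.
        for form in [rec["name"], *rec.get("alt_names", [])]:
            index[canon(form)].add(rec_id)
    return index


def search(index, query):
    """Look up a query in whatever form the user typed it."""
    return sorted(index.get(canon(query), set()))


# Made-up sample records; "alt_names" is an assumed auxiliary field.
records = {
    1: {"name": "大島渚", "alt_names": ["Ōshima Nagisa"]},
    2: {"name": "Ozu Yasujirō"},
}
index = build_index(records)
```

With this index, search(index, "oshima nagisa") and search(index, "大島渚")
both find record 1, and search(index, "OZU, Yasujiro") finds record 2, even
though none of the queries matches a stored name field exactly.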

The only difficulty may be in procuring a function to produce canonical
forms.  Until automatic language translation services (which need not be
highly accurate, as long as the one chosen is consistent in its mistakes)
become affordable, it may be desirable to translate by hand and store the
translations in additional field(s) for each name field.  As I have not
followed automated translation technologies, I don't have a good sense of
what is readily available today.

>We are going to have to live with the style we decide right now; does
>anyone dislike the way entries are formatted as is?
>
>amn

The storage of information and the style of its presentation need not be
linked so permanently as seems to be implied by your comment.  Multiple
views of underlying data are not difficult to produce.  Similarly, database
restructuring is not a herculean task if the initial data schema is a
sound, flexible one.  It should not be difficult to add an auxiliary field
or two if later needs require such an addition.  However, it is vital that
the primary and key fields be well designed, because changes to those
fields are very difficult to perform later; for this reason, selecting the
range of indexable types of bibliographic entries with keen understanding
is crucial.  Can you point me to the latest design?  I can take a glance at
it and give you my impressions.
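
To illustrate the separation of storage from presentation, here is a small
sketch in Python.  The field names ("author", "title", "journal", "year",
"title_romaji"), the placeholder values, and both output formats are
assumptions made up for the example; they are not taken from the actual
database design.

```python
# One stored entry; the values are placeholders, not real data.
entry = {
    "author": "Author Name",
    "title": "Article Title",
    "journal": "Journal Name",
    "year": 1997,
}


def view_citation(e):
    """One presentation of the entry: a citation-style line."""
    return f'{e["author"]}. "{e["title"]}." {e["journal"]} ({e["year"]}).'


def view_listing(e):
    """Another presentation of the same entry: a compact listing line."""
    title = e["title"]
    # An auxiliary field added later is shown only when it is present.
    if e.get("title_romaji"):
        title += f' [{e["title_romaji"]}]'
    return f'{e["year"]}  {title} / {e["author"]}'


# Adding an auxiliary field later leaves the stored core fields alone.
extended = {**entry, "title_romaji": "Romanized Title"}
```

Both views read the same stored fields, so changing how entries are
displayed does not require touching the data, and the added "title_romaji"
field does not disturb the citation view at all.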


-- Michael Hirohama <kamesan at ricochet.net> --
For information on the PSYCHOHISTORY (historical motivation) forum,
send a request to <psychohistory-request at home.ease.lsoft.com>.






More information about the KineJapan mailing list