Cjklib is a library providing Han character related methods for CJKV languages
(Chinese, Japanese, Korean and Vietnamese).

Introduction
============
Cjklib provides language routines related to Han characters (characters based
on Chinese characters, named Hanzi, Kanji, Hanja and chu Han respectively),
which are used in writing Chinese, Japanese, infrequently Korean and formerly
Vietnamese. Functionality is included for character pronunciations, radicals,
glyph components, stroke decomposition and variant information.

Installing
==========
If you are installing from the source package you need to deploy the library on
your system:

$ python setup.py install

Documentation
=============
The API reference at http://cburgmer.nfshost.com/cjklib/ includes extensive
documentation. Also see the project page.

Usage
=====
The main components of this package are accessible through the Python library.
There is also a small command line tool, 'cjknife', that offers some of the
library's functions. See "cjknife --help" for an overview.

This tool also offers simple access to dictionaries, which have to be built
before use. Currently supported are EDICT, CEDICT, HanDeDict, CFDICT and
CEDICTGR.

To create the needed tables and indices for, e.g., the CEDICT dictionary, run:
$ buildcjkdb build fullCEDICT --dataPath=PathToCedictFile

Database
========
Packaged versions of the library will ship with a pre-built SQLite database
file. If you want to, you can rebuild the database.

First download the newest Unihan file:
$ wget ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip

Then start the build process:
$ buildcjkdb build cjklibData

Alternatively use e.g. 'fullMandarin' for data needed by the library's functions
for the Mandarin Chinese language.

SQLite
------
Currently only characters from the Basic Multilingual Plane (BMP) of Unicode
are supported, due to missing support in MySQL (see below). To enable full
support set wideBuild to True in cjklib.conf.

SQLite offers full-text search through the FTS3 extension, which needs to be
compiled in to be used. Cjknife can use the full-text capabilities for the
dictionary search; if the extension is not available, it performs a full table
scan in fuzzy search. To enable it, set enableFTS3 to True in cjklib.conf. No
full-text support is currently provided for MySQL.
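
Whether FTS3 is compiled in can be probed before setting enableFTS3. The
following is a minimal sketch, not part of cjklib: the helper name
sqlite_has_fts3 is hypothetical, and it only tests the SQLite build behind
Python's standard sqlite3 module, which is assumed (not guaranteed) to be the
same build cjklib ends up using.

```python
import sqlite3


def sqlite_has_fts3():
    """Return True if this SQLite build can create FTS3 virtual tables.

    Hypothetical helper for illustration; it is not a cjklib function.
    """
    connection = sqlite3.connect(":memory:")
    try:
        # Creating an FTS3 virtual table raises OperationalError
        # ("no such module: fts3") when the extension is not compiled in.
        connection.execute(
            "CREATE VIRTUAL TABLE fts_probe USING fts3(content)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        connection.close()


if __name__ == "__main__":
    print("FTS3 available:", sqlite_has_fts3())
```

If the probe reports False, leaving enableFTS3 unset keeps the full table scan
behaviour described above.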

MySQL
-----
With MySQL 5 the following CREATE command creates a database with utf8 as
character set using the binary collation utf8_bin:

CREATE DATABASE cjklib DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;

You might need to set access rights, too (substitute user_name and
host_name):

GRANT ALL ON cjklib.* TO 'user_name'@'host_name';

Now you need to change cjklib.conf to tell cjklib to use MySQL.

MySQL < 6 doesn't support true UTF-8 and uses a variant with at most 3 bytes
per character, so characters outside the Basic Multilingual Plane (BMP) can't
be encoded. Building the Unihan database might thus result in warnings, and
characters above 0x20000 can't be built at all.
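
The three-byte limit can be illustrated with plain Python. This is a sketch
assuming Python 3 string semantics; the helper names utf8_length and
is_outside_bmp are hypothetical and not part of cjklib. Characters up to
U+FFFF need at most three bytes in UTF-8, while anything beyond the BMP needs
four and therefore does not fit into a three-byte utf8 column.

```python
def utf8_length(char):
    """Return the number of bytes the character occupies in UTF-8."""
    return len(char.encode("utf-8"))


def is_outside_bmp(char):
    """Return True if the single character lies beyond U+FFFF."""
    return ord(char) > 0xFFFF


if __name__ == "__main__":
    # U+4E2D is a BMP Han character, U+20000 is the first code point
    # of CJK Unified Ideographs Extension B.
    for char in ("\u4e2d", "\U00020000"):
        print("U+%04X: %d bytes, outside BMP: %s"
              % (ord(char), utf8_length(char), is_outside_bmp(char)))
```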

Contributing
============
If you are interested in contributing to cjklib, join
cjklib-devel@googlegroups.com (http://groups.google.com/group/cjklib-devel).

Please report bugs to http://code.google.com/p/cjklib/issues/list.
