Prince Ka Naung
Project description: This is a call for proposals for the "Prince Ka Naung" project: a converter engine that provides backward compatibility with Myanmar text data stored in various font-dependent encodings. We shall discuss the following scopes.
ASCII coded text ==> UTN11-2
Partial coded text ==> UTN11-2
UTN11-1 coded text ==> UTN11-2
Scope of Font Aspect and Handling Typography
- UTN11
- True Type Open
This scope is where we all need to learn together. I'll try to share the knowledge that I've learned so far. As a preliminary, you are advised to read at least the following documents.
Major Documents
* Myanmar code chart for Unicode 5.0 - u1000.pdf (latest)
* To align with the upcoming version, you also have to read the documents mentioned in this article (n3043.pdf and n3044.pdf at minimum).
* Unicode Technical Note 11 version 1 - Representing Myanmar in Unicode: Details and Examples - myanmar_uni.pdf
* Unicode Technical Note 11 version 2 - Representing Myanmar in Unicode: Details and Examples - myanmar_uni-v2.pdf
* TrueType Open Specification Chapters published by Microsoft
- chapter 1 - welcome to TrueType Open [0.07Mb Word]
- chapter 2 - common table formats [0.14Mb Word]
- chapter 3 - glyph substitution table (GSUB) [0.15Mb Word]
- chapter 4 - glyph positioning table (GPOS) [0.3Mb Word]
- chapter 5 - baseline table (BASE) [0.14Mb Word]
- chapter 6 - justification table (JSTF) [0.1Mb Word]
- chapter 7 - glyph definition table (GDEF) [0.1Mb Word]
- chapter 8 - TrueType Open tag registry [HTML]
- chapter 8 - TrueType Open tag registry [0.04Mb Word]
- index [0.03Mb Word]
* OpenType Development published by Microsoft
* More will be added later.
For this scope, I encourage you to install the FontForge program through Cygwin; the how-to article can be found at this topic. Sooner or later you will have to open a font with FontForge and look inside it. That will help you a lot in understanding the fields, even though our main purpose is writing the converter engine.
Scope of Existing Font API
Existing Burmese fonts are ASCII coded, Partial coded, or UTN11 coded. Due to the nature of this project, we need to ask the key font developers/creators for proper permission to use their font code map APIs. I will try to communicate with the major font developers about this. To demonstrate, this is the kind of API document that we need.
UniBurma API => http://www.unimm.org/uniburma-api/
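As a rough illustration of how such a code map could be consumed, here is a minimal Python sketch. The names LEGACY_CODE_MAP and map_code_points are made up for this example, and the two table entries are placeholders, not values taken from any real font's code map.

```python
# Hypothetical code map: legacy ASCII character -> Unicode character.
# These two entries are placeholders for illustration only; real values
# must come from the font developer's published code map.
LEGACY_CODE_MAP = {
    "e": "\u1014",  # assumed to display as Na in the legacy font
    "a": "\u1031",  # assumed to display as Ta Way Htoe in the legacy font
}


def map_code_points(text, code_map=LEGACY_CODE_MAP):
    """Replace each mapped legacy character; leave unmapped ones unchanged."""
    table = {ord(src): dst for src, dst in code_map.items()}
    return text.translate(table)
```

Note that this step only swaps code points one for one. The result keeps the typed (visual) order, so reordering into the Unicode-defined order has to be a separate step.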
Scope of Handling Text Encoding/Decoding
[Major]
- ANSI
- UTF-8
The engine needs to handle text data in these formats.
ASCII coded text data will be stored in ANSI format.
All partial and compliant text data will be stored as UTF-8.
To demo this, open the Notepad editor. Change the font to Win Innwa via Format > Font > Win Innwa. Type a few sentences and save the file. You will not be prompted for anything.

Now change the font to some Unicode font, for example UniBurma, type some text and save it.

You will be prompted with an encoding message.

To get past this, you have to save the file in UTF-8 format.

The above example demonstrates how the client data will be encoded and stored. The engine needs to handle such text data and its conversion.
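The Notepad behaviour above can be imitated in Python with a minimal sketch. It assumes that "ANSI" means the Windows-1252 code page (Python codec "cp1252"), which is only the default on Western-locale Windows; encode_for_storage is a name made up for this example.

```python
def encode_for_storage(text):
    """Return (encoding_name, encoded_bytes) for the given text.

    Assumption: "ANSI" is Windows-1252 (cp1252). Legacy ASCII coded text
    fits in that code page, while text containing Myanmar Unicode
    characters does not and falls back to UTF-8.
    """
    try:
        return "cp1252", text.encode("cp1252")
    except UnicodeEncodeError:
        # Same situation in which Notepad shows its encoding prompt.
        return "utf-8", text.encode("utf-8")
```

Calling encode_for_storage on plain ASCII text returns the "cp1252" branch, while any string containing characters from the Myanmar block (U+1000 onwards) returns the "utf-8" branch.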
When the converter makes a conversion, it still needs to follow the encoding order defined by the Unicode standard. In other words, the converter has to handle the syllable structure and the proper encoding order behind the scenes. For example, in the latest Unicode definition the word "Sun" (Nay) is encoded as "Na + Ta Way Htoe".

The text data that our clients stored is "Ta Way Htoe + Na" in the backend for all ASCII and Partial coded text. Thus, the engine needs to handle this kind of issue in the backend and output the order defined by the Unicode standard. This is one example of the issues we may encounter, and we will probably meet similar or other issues once we start. This leads into the natural language processing field: the converter also has to understand word breaking, segmentation, Burmese syllable structure, etc. when making a conversion. Another example: "Ya Yit + Ma => Ma + Ya Yit".

Therefore, this scope is directly aligned with the Scope of Font Aspect and Handling Typography section above.
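The two reordering examples above ("Ta Way Htoe + Na" and "Ya Yit + Ma") can be sketched in Python as follows. This is only a sketch under assumptions: the code points are the Unicode 5.1 values (U+1031 for Ta Way Htoe, U+103C for Ya Yit, U+1000 to U+1021 for consonants), the input has already been mapped to Unicode code points but is still entirely in typed (visual) order, and to_logical_order is a name made up here. Running it on text that is already in the correct order would corrupt it.

```python
import re

E_VOWEL = "\u1031"           # MYANMAR VOWEL SIGN E ("Ta Way Htoe")
MEDIAL_RA = "\u103C"         # MYANMAR CONSONANT SIGN MEDIAL RA ("Ya Yit")
CONSONANT = "[\u1000-\u1021]"  # Myanmar consonant letters Ka .. A

# Visual order: optional E vowel, optional medial Ra, then the consonant.
_VISUAL_CLUSTER = re.compile(
    "(%s?)(%s?)(%s)" % (E_VOWEL, MEDIAL_RA, CONSONANT)
)


def to_logical_order(text):
    """Move a leading E vowel / medial Ra to after their consonant.

    Assumes the whole input is in visual order. The output order is
    consonant, then medial Ra, then E vowel.
    """
    return _VISUAL_CLUSTER.sub(
        lambda m: m.group(3) + m.group(2) + m.group(1), text
    )
```

A real engine needs full syllable analysis (other medials, stacked consonants, kinzi and so on); this sketch covers only the two cases mentioned in this post.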
The reference articles for this scope are:
* http://en.wikipedia.org/wiki/ASCII
* http://en.wikipedia.org/wiki/American_National_Standards_Institute
* http://en.wikipedia.org/wiki/Utf-8
* http://www.unimm.org/cms/node/14
[Minor]
A minor thing that we need to learn/read in this scope is to understand the text rendering engines as much as possible.
* Microsoft Uniscribe
* GNOME Pango
* KDE Qt
* SIL Graphite
Since we are aiming at backward compatibility, understanding the nature of text rendering engines may be just a bonus for us in this project. We are trying to solve the problem of old data that clients have already stored, not of compliant text data that people produce today with the proper standard.
Scope of Programming Use
- html entity, javascript
- python, perl, java, c/c++
For this scope, we will discuss and agree on which programming language to use for this project. I propose Python. Java may be another good choice, since I have heard that Unicode handling is better in Java. Thus we need to look into the following topics with respect to the chosen programming language.
* Unicode handling in python(or respective lang)
* UTF-8 and python(or respective lang)
* memory management/optimization in python(or respective lang)
* GUI widget kit for python(or respective lang)
* and other Python (or respective lang) toolkits that we can use/depend on
As an example for Python, see how Python handles Unicode strings in its documentation. That will relate to the font API, for example the UniBurma API view. HTML will be handled with decimal NCRs, which are listed under the unicode and html columns of the API table.
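For the HTML side, a minimal Python sketch of producing decimal NCRs is shown below. It relies on the standard "xmlcharrefreplace" error handler of str.encode; the function name to_decimal_ncr is made up for this example.

```python
def to_decimal_ncr(text):
    """Return text with every non-ASCII character written as a decimal NCR.

    ASCII characters are kept as they are; any other character becomes
    "&#N;" where N is its decimal Unicode code point.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, to_decimal_ncr("\u1014\u1031") gives "&#4116;&#4145;", since U+1014 is 4116 and U+1031 is 4145 in decimal.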
A few reference links
* http://evanjones.ca/python-utf8.html
* http://www.amk.ca/python/howto/unicode
* http://www.reportlab.com/i18n/python_unicode_tutorial.html
* http://www.awaretek.com/toolkits.html (this article list the available py GUI toolkits)
Scope of Source Nature and Commit
- Google SVN
We will use Google Code hosting for the Prince Ka Naung project. Committing code is restricted and closely monitored by the main assigned code committers. The end product will be distributed as open source. I propose to use the GPL, but we can discuss this. There will be two levels of project membership: code committer and code observer.
Code committers need to know the programming language used in the project (refer to the programming use section) and will be assigned to their respective core engine modules.
Code observers can access SVN anonymously, join the main discussion and advise on/suggest the logic flow of the engine.
The guideline for this scope is to read this article.
Initial Assignment
To initialize this project, the two converter guys that I know of so far, Ko Soe Min and AKHtet, are already called up by force-assignment.
As Ko Soe Min already has a JavaScript font converter, one of the approaches is already defined; we need to expand it and take a serious, in-depth approach to UTN11.
Of course, the project is open for every interested party/person to join. The areas that you can expect to learn from this project are:
- natural language processing (complex text rendering)
- learning the project's programming language, e.g. Python
- GUI design
- open-source open project collaboration workshop
- ___wutever defined___
Other notes
I guess there may be some similar efforts out there that do this. I haven't researched enough yet, but I think we should try building one even if there are. We are committed to making this source public and free for anyone to use, learn from, or alter for their own good purpose.
One thing that could make this engine obsolete is the official text rendering engines doing this themselves, meaning those engines support what I stated above and can perform the conversion on their own. Then this engine may not be useful enough. Somehow, I don't think they will.
Why Prince Ka Naung
BTW, why Prince Ka Naung? He is one of our idols and heroes from history. To learn who Prince Ka Naung is, pls read these articles.
http://www.myanmars.net/myanmar-history/prince-kanaung.htm
http://en.wikipedia.org/wiki/Crown_Prince_Ka_Naung
Many, to this day, believe that Burmese history would have been very different if Crown Prince Ka Naung were to survive and succeed to the Burmese throne.
Come, join us for the Prince Ka Naung incarnation today!!
General Reference Links
http://www.ldc.upenn.edu/
http://www.unicode.org/
