Romanization of Japanese
From DramaWiki
The heart of DramaWiki is its search engine. For the search engine to function at its best for the majority of its readers, who cannot read Japanese, all text must be searchable using the common roman characters found on any keyboard.
DramaWiki policy
Creed
DramaWiki has a stance on the romanization of Japanese text that differs from that of other wikis, such as Wikipedia. Wikipedia, for example, focuses on the precise accuracy of its information, which is important when producing an encyclopedia-like service. DramaWiki, however, places its focus on the ease of searching for information. The guidelines DramaWiki uses when romanizing Japanese text are:
- that the words and phrases, including proper nouns, be entered using a spelling arrangement that is most common among Japanese drama fans.
- that the Latin-based characters used can be entered using any computer keyboard, including non-PC keyboards and operating systems.
- that the words and phrases can be queried using a variety of Internet-based search engines using their default interfaces and settings.
General guidelines
It is therefore a policy that all romanized Japanese text entered into DramaWiki use the following criteria. They are provided below in order of importance. When using the criteria, start with the first criterion and then work your way down until you reach a resolution:
- The most popular romanized spelling should be used, regardless of the romanization system that produced it. A top-rated search engine like Google or Yahoo should be used to determine the most popular romanized spelling. More specifically, the spelling that generates the greatest number of results should be used on DramaWiki unless a majority cannot be determined. A variation with 55.1 percent or more of the results counts as the majority. When searching, always place the entire name or phrase in quotation marks. For example, a search for Sakai Noriko should be entered as "Sakai Noriko" (including the quotes). Macrons may be used in the search process to improve your searches, but they should not be used when adding the romaji text to DramaWiki.
- If the artist or TV show's official web site provides the hiragana and/or the romaji of the name, it should also be used on DramaWiki. If the romaji is available on the web site, use it regardless of the romanization system used to create it. This applies when no consensus has developed on the Internet, which usually occurs when a new artist or TV show is introduced or when the search results are too close to call. "Too close to call" under DramaWiki means that two variations are divided 50-50, or are within 5 percent of one another.
- If an official web site does not provide the hiragana or romaji, use the hiragana as provided on Japanese Wikipedia, and then develop the romaji from the hiragana using the Hepburn romanization system as a basis, along with the DramaWiki-specific changes that reflect how Hepburn has come to be written in practice across the Internet.
- If the artist or TV show name is not on Japanese Wikipedia, then the DramaWiki-modified Hepburn romanization system should be applied on the kana.
- In all instances, the macrons must be removed, thus creating an even more modified Hepburn system excluding the use of macrons. See the historical section below for more information.
- All TV show and artist names must be properly capitalized. See the capitalization section below for more information.
- All katakana must be translated. See the katakana section below for more information.
- The article name must reflect the TV show or artist name, with extra content (macrons, taglines, etc.) removed. See the article names section below for more information.
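The first criterion, and the point at which it gives way to the later ones, can be sketched in Python. This is only an illustration and not DramaWiki tooling: the function name pick_spelling and the hit counts are hypothetical, and the counts are assumed to have been read by hand from quoted searches. The 5-percent "too close to call" test is not modeled separately, because any split in which no spelling reaches a 55.1 percent share already falls through to the next criterion.

```python
from typing import Dict, Optional


def pick_spelling(hit_counts: Dict[str, int]) -> Optional[str]:
    """Return the majority spelling, or None if no spelling reaches 55.1 percent."""
    total = sum(hit_counts.values())
    if total == 0:
        return None  # no search results at all: move on to the next criterion
    best = max(hit_counts, key=lambda spelling: hit_counts[spelling])
    # Integer form of "share >= 55.1 percent", which avoids float rounding.
    if hit_counts[best] * 1000 >= 551 * total:
        return best
    return None  # no majority: consult the official web site next


# Hypothetical result counts for two quoted searches.
print(pick_spelling({"Sakai Noriko": 900, "Noriko Sakai": 100}))  # Sakai Noriko
print(pick_spelling({"Variant A": 52, "Variant B": 48}))          # None
```

A return value of None means the editor should continue with the official web site, then Japanese Wikipedia, as described in the list above.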
Briefly, the reason for tasks like the removal of macrons is to be consistent with how romanization has occurred in the past. Details can be found in the historical section below.
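For editors who prepare text with a script, macron removal can be sketched as follows. This is a minimal example under one assumption: that the macrons come from ordinary Unicode letters such as ō and ū, which decompose into a base letter plus the combining macron U+0304. The function name strip_macrons is hypothetical.

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def strip_macrons(text: str) -> str:
    """Remove macrons while leaving every other character unchanged."""
    # NFD splits a letter such as "ō" into "o" followed by U+0304.
    decomposed = unicodedata.normalize("NFD", text)
    without_macrons = decomposed.replace(COMBINING_MACRON, "")
    # Recompose so that any other accented letters are stored in composed form.
    return unicodedata.normalize("NFC", without_macrons)


print(strip_macrons("kōhaku"))  # kohaku
print(strip_macrons("Tōkyō"))   # Tokyo
```

Only the macron is removed; other diacritics are deliberately left alone, since the policy above addresses macrons specifically.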
Romanizing particles and other parts of words
Briefly, particles are used in much the same way as adpositions and similar components (prepositions, postpositions, circumpositions, conjunctions, etc.) in the English language. Examples in the Japanese language include の or no, と or to, and に or ni, among others.
It is a general rule that all particles are written using their correct roman equivalents. However, there are some exceptions:
- を, when used as a particle, should be written as "wo".
- は, when used as a particle, should be written as "wa". When は is used as a part of a word, it is no longer a particle, therefore it becomes "ha".
- へ, when used as a particle, should be written as "e".
- Words ending in 絵 should use "-e", such as yamato-e (大和絵) and ukiyo-e (浮世絵). Other words ending with the え ("e") sound should leave out the hyphen.
- Regarding people, titles in Japanese are attached to the end of a name, such as "san", "senpai", "sensei" and so on. All titles must be preceded by a hyphen, such as Sakamoto-sensei or Omiya-san.
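The particle readings and the title hyphen above can be written down as a small lookup. This is a sketch only: the names PARTICLE_READING, IN_WORD_READING, romanize_kana and with_title are hypothetical, the in-word readings "ha" and "he" are the standard kana values, and deciding whether a kana is acting as a particle is left to the editor.

```python
# Readings for kana whose romanization depends on grammatical role.
PARTICLE_READING = {"を": "wo", "は": "wa", "へ": "e"}
IN_WORD_READING = {"を": "wo", "は": "ha", "へ": "he"}


def romanize_kana(kana: str, as_particle: bool) -> str:
    """Look up one of the three role-dependent kana; raises KeyError for others."""
    table = PARTICLE_READING if as_particle else IN_WORD_READING
    return table[kana]


def with_title(name: str, title: str) -> str:
    """Attach a title such as "san" or "sensei" to a name with a hyphen."""
    return f"{name}-{title}"


print(romanize_kana("は", as_particle=True))   # wa
print(romanize_kana("は", as_particle=False))  # ha
print(with_title("Sakamoto", "sensei"))        # Sakamoto-sensei
```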
Capitalization of romanized artist and TV show names
It is a general rule on DramaWiki that we apply the principles of capitalization to romanized artist and TV show names, using the same principles English writers use. Capitalization is the process of capitalizing the first letter of a word.
It is therefore a DramaWiki policy that the rules of capitalization apply, in that all words, except for internal articles, prepositions and conjunctions, must be capitalized. All particles should be written in lowercase.
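The capitalization rule can be sketched as a short function. It is an illustration under stated assumptions: the set of lowercase words holds only the particles named on this page (no, to, ni, wo, wa, e), it would have to be extended by hand for other particles and for English articles, prepositions and conjunctions, and the function name capitalize_title is hypothetical.

```python
# Particles mentioned on this page; an illustrative, incomplete set.
LOWERCASE_WORDS = {"no", "to", "ni", "wo", "wa", "e"}


def capitalize_title(title: str) -> str:
    """Capitalize each word, keeping internal particles in lowercase."""
    result = []
    for position, word in enumerate(title.split()):
        if position > 0 and word.lower() in LOWERCASE_WORDS:
            result.append(word.lower())  # internal particle stays lowercase
        else:
            # Upper-case only the first letter; the rest is left as typed.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(capitalize_title("namida no sprinter"))  # Namida no Sprinter
print(capitalize_title("start line"))          # Start Line
```

The first word is always capitalized, since the exception covers only internal words.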
Romanizing katakana
The purpose of katakana is to allow the Japanese to write gairaigo (loanwords) as close to the word's native pronunciation and spelling as possible while staying within Japanese phonology.
When romanizing katakana, most people who do not speak Japanese will apply the same rules of romanization they use on hiragana and kanji (given that the pronunciation of kanji characters can be represented by either hiragana or katakana). This practice is impractical for two reasons:
- It does not help those who can read katakana and are able to generate the gairaigo from it.
- It does not help the reader who cannot read Japanese, since although the conversion of katakana to romaji is technically correct, the resulting romanized loanword has no meaning to that reader.
It is therefore a DramaWiki policy that all gairaigo written in katakana must be translated to its original native form. For example, although プロポーズ romanizes to puropozu, DramaWiki requires that the katakana word be translated to its English form, proposal. The same rule applies to loanwords from all other languages written in the roman alphabet.
The exception to this rule is when katakana is used to write a word that originates from a non-roman writing system, such as simplified and traditional Chinese, hangul, etc. Gairaigo written in katakana to represent words from these writing systems should then be romanized using the Hepburn system, with the macrons removed.
Romanization of article names
Another reason for removing macrons from romanized names is to stay compatible with Internet web standards. Although a URL (Uniform Resource Locator) can handle macrons (the character is percent-encoded as the bytes of its UTF-8 form), they should not be used in an article name under DramaWiki. Use of macrons in a URL further complicates searching with the popular search engines, and the URL becomes unreadable by a person when too many percent-encoded sequences are used.
The article name should be based on a TV show or artist's romanized name. Since the TV show and artist names fall under the same modified Hepburn restrictions, the article names should not pose any problem with staying within DramaWiki policy.
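The effect of a macron on a URL can be seen with Python's standard library. This is a small demonstration rather than a description of how DramaWiki builds its links; the function name article_url_path is hypothetical.

```python
from urllib.parse import quote


def article_url_path(title: str) -> str:
    """Percent-encode a title for use in a URL path."""
    # quote() encodes each non-ASCII character as its UTF-8 bytes in %XX form.
    return quote(title)


print(article_url_path("Kohaku"))  # Kohaku
print(article_url_path("Kōhaku"))  # K%C5%8Dhaku
print(article_url_path("Tōkyō"))   # T%C5%8Dky%C5%8D
```

Each macron letter becomes six characters in the URL, which is why a name with several long vowels quickly becomes hard to read.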
Use of taglines in a TV show title
The DramaWiki policy is that taglines should not be included in a Japanese TV show article's name.
It is common for Japanese TV shows to use taglines in addition to the main title. A tagline is a sub-title used to further market a TV show, giving the reader a more accurate description of the TV show. For example, the American TV show Star Trek uses the tagline "To boldly go where no man has gone before." However, when indexed in any TV show database, such as IMDB, taglines are never included as part of the main title.
In Japanese TV shows, taglines can usually be identified because they are placed within tilde characters (~). For example, the Japanese drama スタートライン (Start Line) uses the tagline ~涙のスプリンター~ (Namida no Sprinter). For DramaWiki purposes, the article name should be Start Line and not Start Line ~Namida no Sprinter~. The purpose of leaving out the tagline is to stay compatible with other databases like IMDB and JDorama.com.
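Removing a tilde-delimited tagline from the end of a title can be sketched with a regular expression. Two things here are assumptions rather than policy: the function name strip_tagline is hypothetical, and besides the ASCII tilde the pattern also accepts the wave dash (〜) and the fullwidth tilde (～), which Japanese text often uses in the same position.

```python
import re

# ASCII tilde, wave dash (U+301C) and fullwidth tilde (U+FF5E).
_TILDES = "~\u301c\uff5e"
# A tagline is a tilde, any run of non-tilde text, and a closing tilde at the end.
_TAGLINE = re.compile(rf"\s*[{_TILDES}][^{_TILDES}]*[{_TILDES}]\s*$")


def strip_tagline(title: str) -> str:
    """Drop a trailing ~tagline~ from a title, if one is present."""
    return _TAGLINE.sub("", title).strip()


print(strip_tagline("Start Line ~Namida no Sprinter~"))  # Start Line
print(strip_tagline("Start Line"))                       # Start Line
```

Only a tagline at the end of the title is handled; a title that uses tildes elsewhere would need to be checked by hand.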
History of romanization in modern time
Romanization did not start recently. It has been practiced since the invention of the typewriter back in the 19th century. Back then, only the 26 letters of the Latin alphabet and a few punctuation characters were available on the keyboard (QWERTY or Dvorak styles).
When the modern digital computer was first invented 50-plus years ago, the only characters available on the keyboard were those that made up the ASCII character set. Aside from the "special" characters (control codes, etc.), there are 95 displayable characters in the ASCII set, counting the space. They are:
!"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdefghijklmnopqrstuvwxyz{|}~
One of the earlier and more popular romanization systems is Hepburn. However, Hepburn, which relies on macrons, was designed for romanizing Japanese text by hand, and the typewriters and teletype machines of the time were never addressed in its specification. Because the typewriter could not generate macrons, most recorders (people who copied handwritten text to typewritten text) replaced the macron letters with characters that were available on the typewriter keyboard. In other words, ō was replaced with o, ū was replaced with u, and so on.
Other romanization systems, such as wapuro romaji, fixed the problem with macrons by replacing them with other characters that fit the ASCII set. For example, ō was replaced with ou, ū with uu, and so on. The problem here is that most recorders were not familiar with wapuro romaji and were never instructed to use it. Also, wapuro romaji was never an official implementation of romaji at any time, as wapuro is considered a work-around.
For romanized text, searching becomes complicated when different characters are in use for the same letter. For example, the words kōhaku (紅白 or red and white) and kohaku (琥珀 or amber) do not match each other in a simple query made with database software that hasn't been tweaked to convert non-ASCII to ASCII.
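The mismatch can be reproduced in a few lines of Python. This sketch compares a plain string search with one that folds text to ASCII first; the function name ascii_key is hypothetical, and the folding shown (Unicode decomposition followed by dropping every non-ASCII code point) is one common approach, not necessarily what any particular search engine does.

```python
import unicodedata


def ascii_key(text: str) -> str:
    """Fold text to a lowercase ASCII-only search key."""
    # NFKD separates base letters from combining marks such as the macron.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops everything outside ASCII.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.casefold()


entries = ["kōhaku", "kohaku"]
query = "kohaku"

plain_matches = [entry for entry in entries if query in entry]
folded_matches = [entry for entry in entries if ascii_key(query) in ascii_key(entry)]

print(plain_matches)   # ['kohaku']
print(folded_matches)  # ['kōhaku', 'kohaku']
```

The plain search misses the entry written with a macron. Folding finds both, at the cost of no longer distinguishing the two words, which is the same trade-off that text stored without macrons makes.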
A great amount of the information you'll find on the Internet today was recorded from hardcopy documentation produced using these legacy devices. But rather than having their romanized text corrected, most if not virtually all hardcopy documents were recorded onto the Internet in their exact form. As a result, data dating from as early as the early 1990s was entered using Hepburn as a guideline for romanization, but with the macron characters replaced with ASCII.
In summary, technology has hindered the effectiveness of systems like Hepburn in their exact academic form. The end result is that the modified Hepburn - Hepburn without the macrons - is the most common form of romanization you will find on the Internet.