A site devoted mostly to everything related to Information Technology under the sun - among other things.

Monday, December 2, 2024

Using Arabic Calligraphy in CAPTCHA Schemes


Subject Matter & Problem

Nowadays, many daily human activities such as education, trade, and communication are carried out over the Internet. For activities such as registering on Web sites and posting comments, hackers can develop programs that automatically register fictitious or fraudulent users or post spam by abusing the comments feature, and thus interfere with the proper functioning of publicly available Web sites.

A common countermeasure has been the deployment of CAPTCHA on Web screens that require human interactivity. A CAPTCHA is a type of challenge-response test used in computing as an attempt to ensure that the response is generated by a human being. The process usually involves a computer asking a user to complete a simple test which the computer is able to grade. These tests are designed to be easy for a computer to generate and easy for a human to solve, but difficult for a computer to solve. If a correct solution is received, it can be presumed to have been entered by a human.

A common type of CAPTCHA requires the user to type letters and/or digits from a distorted image that appears on the screen. Such tests are commonly used to prevent unwanted Internet bots from accessing Web sites, since a human can usually read the distorted text easily, while a bot cannot recognize the letters in the image and therefore cannot answer correctly, or at all.

The aim of this idea is the development of a CAPTCHA for users familiar with the Arabic alphabet and script, using Arabic calligraphic styles for languages such as Arabic, Kurdish, Pashtu, Persian, and Urdu.

Solution

The gist of this solution is to use the calligraphic styles of Arabic script in a dual text CAPTCHA scheme.

Arabic calligraphy, also known as Islamic calligraphy, is the artistic practice of handwriting in the lands sharing a common Islamic cultural heritage. Broadly speaking, there are two styles of writing in Arabic script: geometric scripts and cursive scripts. They are enumerated in the sections below, together with sample writings of each.

This naturally leads one to propose a dual CAPTCHA scheme, with one box displaying text in a geometric style and the other displaying text in a cursive style. This gives the possibility of creating 2 × 12 = 24 distinct CAPTCHA pairs based on Arabic calligraphic styles.
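The pairing arithmetic can be sketched in Python as follows. This is only an illustration, not part of the original proposal: the names `GEOMETRIC_STYLES`, `CURSIVE_STYLES`, `STYLE_PAIRS`, and `pick_style_pair` are hypothetical, and treating Kufic and Maghribi as the two geometric styles is an assumption made so that the count matches 2 × 12. The cursive names are ASCII transliterations of the twelve styles listed in the cursive section of this post.

```python
import itertools
import secrets

# Assumption: the post does not say which two geometric styles are counted.
GEOMETRIC_STYLES = ["Kufic", "Maghribi"]

# ASCII transliterations of the twelve cursive styles listed in this post.
CURSIVE_STYLES = [
    "Naskh", "Thuluth", "Tawqi", "Riqaa", "Muhaqqaq", "Rayhani",
    "Nastaliq", "Shikasteh Nastaliq", "Diwani", "Bihari", "Ruqah", "Sini",
]

# Every (geometric, cursive) combination: 2 x 12 = 24 pairs.
STYLE_PAIRS = list(itertools.product(GEOMETRIC_STYLES, CURSIVE_STYLES))


def pick_style_pair():
    """Return one (geometric, cursive) pair chosen uniformly at random."""
    return secrets.choice(STYLE_PAIRS)
```

Calling `pick_style_pair()` once per challenge would give each of the two CAPTCHA boxes its style.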

In order to supply the text for each of these two CAPTCHA boxes, this idea relies on the methods and techniques of references 1 and 2, as follows:

The algorithm for distinguishing Arabic/Kurdish/Persian/Pashtu/Urdu language users from computer programs is based on Arabic calligraphic styles of text. The initial algorithm, discussed in reference 1, shows the user the image of a meaningless, randomly generated Arabic/Kurdish/Pashtu/Persian/Urdu word on an added background and asks the user to type it. This method relies on the difficulty of automatically separating the background from Arabic script text, due to the presence of many diacritical dots and signs. Here the algorithm is modified so that the Arabic text is written in a particular Arabic calligraphic style.

Furthermore, this idea relies on and extends the approach of reference 2 for the generation of the CAPTCHA test. In that reference, instead of using a large database of prepared word images written in the Nastaliq calligraphic style, a random meaningless Arabic/Kurdish/Pashtu/Persian/Urdu word is generated each time and shown to the user. Therefore, there is no need to maintain a local database of images. This reduces the possibility of hackers recording the texts and, over time, reconstructing the database through massive search. In this disclosure, the above method is extended so that the CAPTCHA word is generated in a specific calligraphic style.
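Generating a fresh meaningless word for every challenge, rather than drawing from a stored image database, might look like the sketch below. It is not the algorithm of references 1 or 2; the names `ALPHABETS` and `random_word` are hypothetical, and the Persian entry is a simplification that only adds four letters to the basic Arabic set. Rendering the word as an image in a given calligraphic style is outside the scope of this sketch.

```python
import secrets

# The 28 letters of the basic Arabic alphabet.
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
# Simplifying assumption: Persian is modeled as Arabic plus these four letters.
PERSIAN_EXTRA = "پچژگ"

ALPHABETS = {
    "arabic": ARABIC_LETTERS,
    "persian": ARABIC_LETTERS + PERSIAN_EXTRA,
}


def random_word(language="arabic", length=5):
    """Build a meaningless word by drawing letters independently at random."""
    if length < 1:
        raise ValueError("length must be at least 1")
    alphabet = ALPHABETS[language]
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Because each word is created on demand, there is no fixed set of images for an attacker to collect over time.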

 This idea uses algorithms and techniques similar to those found in reference 3 for the generation of CAPTCHA text.  The algorithms are altered and extended to include Arabic calligraphic styles. 

(It should be noted that an Arabic calligraphic style is not the same thing as a font. There can be multiple fonts within the same Arabic calligraphic style. For example, the German Gothic script and the modern German script are not just distinguished by 2 different fonts; rather, they are different calligraphic styles within the Latin script.)

Considering that the presently available OCR programs cannot identify these Arabic/Kurdish/Persian/Pashtu/Urdu words, the word can be identified only by a user familiar with Arabic script as well as with the specific Arabic calligraphic styles.

Geometric scripts (basically Kufic styles)

Kufic is a cleaner, more geometric style, with a very visible rhythm and a stress on horizontal lines. Vowels are sometimes noted as red dots; consonants are distinguished with small dashes to make the texts more readable.

 




The Maghribi script and its Andalusi variant are less rigid versions of Kufic, with more curves.



In "Flowering Kufi", slender geometric lettering is associated with stylized vegetal elements. In "Geometric Kufi", the letters are arranged in complex, two-dimensional geometric patterns, for example filling a square. This aims at decoration rather than readability.



Cursive styles (basically Naskh styles)

The cursive script styles are:

1.       Naskh is a simple cursive writing that was used in correspondence before the calligraphers started using it for Qur'an writing. It is slender and supple, without any particular emphasis, and highly readable. It remains among the most widespread styles. The most famous calligrapher of this genre was Hâfiz Osman, an Ottoman calligrapher who lived during the 17th century. It is the basis of modern Arabic print.



2.       Ṯuluṯ is a more monumental and energetic writing style, with elongated verticals.



3.       Tawqīʿ appeared under the Abbasid caliphate, when it was used to sign official acts. With elongated verticals and wide curves under the writing line, it remained a little-used script.





4.       Riqaa' was a miniature version of tawqi'. It has nothing to do with ruq`ah, a much later style the Ottomans developed for secular handwriting, and which is still used at the present day in the Arab countries that fell within the Ottoman cultural sphere.



5.       Muḥaqqaq is an ample, alert script. Letter endings are elongated, and their curves underline the text.



6.       Rayḥānī is a miniature version of Muḥaqqaq.




7.       Nasta'liq is a cursive style whose name combines naskh and ta'liq ("suspended"), which is a good description of the way each letter in a word is suspended from the previous one, i.e. lower rather than on the same level.



8.       Shikasteh Nasta'liq (Broken Nasta'liq) is a variant of Nasta'liq used in Persian script and in more informal contexts.



9.       Diwani script is a cursive style of Arabic calligraphy that is distinguished by the complexity of the line within the letter and the close juxtaposition of the letters within the word.





10.       Bihari script was used in India during the 15th century.



11.       Ruq'ah (also known as Riq'a) is the most common script for everyday use. It is simple and easy to write, and its movements are small, without much amplitude.




12.       Sini is a calligraphic form developed in China. It shows evident influences from Chinese calligraphy, using a horsehair brush instead of the standard reed pen.



Use Case 1 – Normal User

The user navigates to a Web page that requires validation. The Arabic Calligraphic CAPTCHA Scheme will work with the Web server and display two Arabic text strings to the user, one in each of two CAPTCHA boxes. Each box will display randomly generated text in Arabic script, each in a different calligraphic style. The two boxes will also have different backgrounds.

 

The user, relying on familiarity with Arabic script as well as Arabic calligraphic styles, will type the CAPTCHA texts in a text box. If the answers are correct, the system will validate/authenticate the user, who can then proceed. If incorrect, the system will present the user with another set of CAPTCHAs.
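The answer check in this use case could be sketched as below. This is a minimal illustration under stated assumptions: the function name `check_dual_captcha` is hypothetical, and trimming whitespace and applying Unicode NFC normalization before comparing are my own choices, not something the proposal specifies.

```python
import hmac
import unicodedata


def _normalize(text):
    """Trim surrounding whitespace and apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", text.strip())


def check_dual_captcha(expected, typed):
    """Return True only if both typed answers match the expected words.

    `expected` and `typed` are (first_box, second_box) tuples of strings.
    """
    if len(expected) != 2 or len(typed) != 2:
        return False
    # compare_digest accepts non-ASCII input only as bytes, so encode first.
    results = [
        hmac.compare_digest(
            _normalize(e).encode("utf-8"), _normalize(t).encode("utf-8")
        )
        for e, t in zip(expected, typed)
    ]
    return all(results)
```

When the function returns `False`, the server would discard the current pair and issue a new one, as the use case describes.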

Use Case 2 – Administrative User

The system will be equipped with an administrative interface and features.  These administrative features will enable the administrator to perform the following tasks:

 

1.       Add a new font to a style

2.       Remove font from a style

3.       Set the background of the CAPTCHA

4.       Set the length of the CAPTCHA text

5.       Set the size and color of the font to be used for a specific style

6.       Specialize the Arabic script to those of Arabic, Pashtu, Persian, or Urdu

7.       Set the Arabic calligraphic styles of the dual CAPTCHA scheme, including the ability to fix the styles or have them chosen at random.
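One way to model the administrative settings listed above is a small configuration object. The sketch below is hypothetical throughout: the class name `CaptchaConfig`, its field names, and its default values are assumptions for illustration, not part of the proposed system.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class CaptchaConfig:
    """Hypothetical settings mirroring the administrative tasks above."""

    fonts_by_style: dict = field(default_factory=dict)       # style -> set of font names
    appearance_by_style: dict = field(default_factory=dict)  # style -> (size, color)
    background: str = "plain"
    text_length: int = 5
    language: str = "arabic"
    # (geometric, cursive) to fix the styles, or None to choose them at random.
    fixed_styles: Optional[Tuple[str, str]] = None

    def add_font(self, style, font_name):
        """Register a font under a calligraphic style."""
        self.fonts_by_style.setdefault(style, set()).add(font_name)

    def remove_font(self, style, font_name):
        """Remove a font from a style; unknown names are ignored."""
        self.fonts_by_style.get(style, set()).discard(font_name)

    def set_appearance(self, style, size, color):
        """Set the font size and color used for one style."""
        self.appearance_by_style[style] = (size, color)
```

For example, `CaptchaConfig(text_length=6, fixed_styles=("Kufic", "Naskh"))` would pin both boxes to given styles, while leaving `fixed_styles` as `None` would let the system pick them at random.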

Other Similar Solutions

This idea is based on the prior work cited below:

1.       Mohammad Hassan Shirali-Shahreza, Mohammad Shirali-Shahreza: Persian/Arabic Baffletext CAPTCHA. Journal of Universal Computer Science, vol. 12, no. 12 (2006), 1783-1796. (see http://www.jucs.org/jucs_12_12/persian_arabic_baffletext_captcha/jucs_12_12_1783_1796_shahreza.pdf )

2.       M. H. Shirali-Shahreza and M. Shirali-Shahreza: Advanced Nastaliq CAPTCHA. Cybernetic Intelligent Systems, 2008. CIS 2008. 7th IEEE International Conference on 9-10 Sept. 2008. (see http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4798962&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F4786971%2F4798915%2F04798962.pdf%3Farnumber%3D4798962 )

3.       Bilal Khan, Khaled Alghathbar, Muhammad Khurram Khan, Abdullah M. AlKelabi, Abdulaziz AlAjaji: Cyber security using Arabic CAPTCHA scheme. Int. Arab J. Inf. Technol. 10(1): 76-84 (2013) (see http://www.iajit.org/PDF/vol.10,no.1/3046-1.pdf )

This idea extends the above works by applying Arabic calligraphic styles to the CAPTCHA scheme.

The various styles may be realized in specific fonts designed for that style. Within each Arabic calligraphic style, there could be numerous fonts that conform to the stylistic rules of that style.

System Architecture & Design

The architecturally significant components of the system are illustrated below:

 


 

Possible Modifications

This scheme may be extended to other Arabic calligraphic artifacts for use in a CAPTCHA scheme:

 

1.       Bismillah Calligraphy

 


2.       Tuqrah



3.       Zoomorphic Calligraphy

 




About Me

I had been a senior software developer working for HP and GM. I am interested in intelligent and scientific computing. I am passionate about computers as enablers for human imagination. The contents of this site are not in any way, shape, or form endorsed, approved, or otherwise authorized by HP, its subsidiaries, or its officers and shareholders.