dorsal/arxiv
How Difficult is it to Develop a Perfect Spell-checker? A Cross-linguistic Analysis through Complex Network Approach
| Authors | Monojit Choudhury, Markose Thomas, Animesh Mukherjee, Anupam Basu, Niloy Ganguly |
|---|---|
| Categories | physics.soc-ph |
| ArXiv ID | physics/0703198 |
| URL | https://arxiv.org/abs/physics/0703198 |
Abstract
The difficulties involved in spelling error detection and correction in a language have been investigated in this work through the conceptualization of SpellNet - the weighted network of words, where edges indicate orthographic proximity between two words. We construct SpellNets for three languages - Bengali, English and Hindi. Through appropriate mathematical analysis and/or intuitive justification, we interpret the different topological metrics of SpellNet from the perspective of the issues related to spell-checking. We make many interesting observations, the most significant among them being that the probability of making a real word error in a language is proportionate to the average weighted degree of SpellNet, which is found to be highest for Hindi, followed by Bengali and English.
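To make the idea of a word network and its average weighted degree concrete, here is a minimal sketch on a toy lexicon. The abstract does not state how orthographic proximity is measured or how edges are weighted, so everything specific here is an assumption made for illustration: Levenshtein edit distance as the proximity measure, a cutoff `max_distance`, an edge weight of `1 / distance`, and the helper names `edit_distance`, `build_spellnet`, and `average_weighted_degree`. It should not be read as the authors' construction.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_spellnet(words, max_distance=1):
    """Return an adjacency map {word: {neighbor: weight}}.

    Assumption (not taken from the abstract): two words are linked when
    their edit distance is at most `max_distance`, and the edge weight is
    1 / distance, so closer spellings get heavier edges.
    """
    vocab = sorted(set(words))
    net = {w: {} for w in vocab}
    for u, v in combinations(vocab, 2):
        d = edit_distance(u, v)
        if 0 < d <= max_distance:
            net[u][v] = net[v][u] = 1.0 / d
    return net


def average_weighted_degree(net):
    """Mean over all nodes of the sum of incident edge weights."""
    if not net:
        return 0.0
    return sum(sum(nbrs.values()) for nbrs in net.values()) / len(net)


# Toy lexicon, invented purely for illustration.
toy_lexicon = ["cat", "cot", "cut", "coat", "dog"]
net = build_spellnet(toy_lexicon, max_distance=1)
print(average_weighted_degree(net))
```

Under these assumptions, a lexicon with many near-identical spellings yields a higher average weighted degree, which is the quantity the abstract relates to the probability of real word errors. The all-pairs comparison is quadratic in vocabulary size, so a real lexicon would need an indexed neighbor search instead.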
{
"annotation_id": "5bffa107-3931-45a9-98d2-4de80c5d6323",
"date_created": "2026-03-02T18:01:18.022000Z",
"date_modified": "2026-03-02T18:01:18.022000Z",
"file_hash": "93b7a4682a5caf2aaccc4ad5492f3b9292eca5691f4fd86a37e6c005d7ba7201",
"private": false,
"record": {
"abstract": "The difficulties involved in spelling error detection and correction in a\nlanguage have been investigated in this work through the conceptualization of\nSpellNet - the weighted network of words, where edges indicate orthographic\nproximity between two words. We construct SpellNets for three languages -\nBengali, English and Hindi. Through appropriate mathematical analysis and/or\nintuitive justification, we interpret the different topological metrics of\nSpellNet from the perspective of the issues related to spell-checking. We make\nmany interesting observations, the most significant among them being that the\nprobability of making a real word error in a language is propotionate to the\naverage weighted degree of SpellNet, which is found to be highest for Hindi,\nfollowed by Bengali and English.",
"arxiv_id": "physics/0703198",
"authors": [
"Monojit Choudhury",
"Markose Thomas",
"Animesh Mukherjee",
"Anupam Basu",
"Niloy Ganguly"
],
"categories": [
"physics.soc-ph"
],
"title": "How Difficult is it to Develop a Perfect Spell-checker? A Cross-linguistic Analysis through Complex Network Approach",
"url": "https://arxiv.org/abs/physics/0703198"
},
"schema_id": "dorsal/arxiv",
"source": {
"execution_id": "dc215531-0e22-467a-b350-bca3ebc51fc9",
"id": "arXiv Dataset IDs",
"type": "Model",
"variant": "snapshot-2026-03-01",
"version": "0.1.0"
},
"user_id": 1000002
}