Sanskrit signs and Pāṇinian scripts

Abstract
We discuss ways of understanding the Pāṇinian grammatical tradition of Sanskrit in computationally tractable ways. We propose to dissociate the formal expression of the locutor's communicative intention (expressed as a composition of sign combinators called a script) from its justification (using Pāṇinian rules and meta-rules). Computation then consists in evaluating a Pāṇinian script to its final sign, delivering both the correct enunciation and its meaning expressed as a non-ambiguous paraphrase.

1 Computational linguistics and the Aṣṭādhyāyī

It is now recognized as an undisputed fact that Pāṇini was a genius linguist 25 centuries before linguistics was established as a scientific discipline in Europe by de Saussure, and that his Aṣṭādhyāyī is a very complete and precise grammar of Sanskrit. This scholarly consensus must be distinguished from opinions stated in various social media claiming that Pāṇini's Aṣṭādhyāyī is a faultless computer program, and that Sanskrit is the perfect programming language of the future. Usually such hyperbolic assertions (atiśayokti) are not backed up by any argumentative justification. It has also been claimed that Pāṇini invented the Backus-Naur form of context-free grammars; this claim originates from a 1967 note in a computer journal by Peter Ingerman (Ingerman, 1967), without any precise evidence. Such uninformed anachronistic judgements are misleading, and merely add confusion to the debate around Pāṇini's actual contribution to formal computation and information theory, besides linguistic modeling.

Actually, even if it is far-fetched to recognize a context-free grammar description in Pāṇini's grammar, it is a fact that many formal description mechanisms are explicit in the Aṣṭādhyāyī. For instance, external sandhi operations are defined by sūtras of a standardized form which may be unambiguously decoded as algebraic rewrite rules of the form [x]u|v → w, with x, u, v, w ∈ Σ*, where Σ denotes the set of phonemes (varṇa) of Sanskrit. The encoding uses Sanskrit morphology (vibhakti) to discriminate the fields of a record encoding the 4-tuple of strings x, u, v and w that are the parameters of the rewrite rule (Cardona, 1974; Bhate and Kak, 1993). The rule may be read as a computational procedure rewriting a juxtaposition of u and v in the input string as the string w in a left context x: XxuvY may be rewritten as XxwY for any strings X and Y. If we further specify that rewriting is done uniformly in a left-to-right fashion, we get indeed a vikāra algorithm (vidhikalpa) that applies (external) sandhi to strings of phonemes in order to transform a list of isolated words (padapāṭha) into a continuous enunciation (saṃhitāpāṭha).

It is easy to relate such rules to contemporary morpho-phonetic rules in computational linguistics, building on the theory of regular relations in formal language theory (Kaplan and Kay, 1994; Koskenniemi, 1984). Indeed, such Pāṇinian rules may be directly fed into the finite-state toolkits implementing this paradigm (Huet, 2005; Hyman, 2009).
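To make the rewrite-rule reading concrete, here is a minimal OCaml sketch, our own illustration rather than Pāṇini's encoding or any of the cited toolkits. A rule [x]u|v → w is a record of four strings; the word boundary of the padapāṭha is simplistically modeled as a space character, and the romanized example rule a|i → e is assumed for illustration only.

```ocaml
(* A rule [x]u|v -> w: rewrite the juxtaposition of u and v as w,
   keeping the left context x. Phonemes are modeled as plain characters
   in a simplified romanization. *)
type rule = { x : string; u : string; v : string; w : string }

(* Does [pat] occur in [s] starting at index [i]? *)
let matches s i pat =
  let n = String.length pat in
  i + n <= String.length s && String.sub s i n = pat

(* One uniform left-to-right pass: every occurrence of x.u.v is
   rewritten to x.w, i.e. XxuvY becomes XxwY. *)
let apply_rule r s =
  let buf = Buffer.create (String.length s) in
  let lx = String.length r.x and lu = String.length r.u in
  let i = ref 0 in
  while !i < String.length s do
    if matches s !i r.x
       && matches s (!i + lx) r.u
       && matches s (!i + lx + lu) r.v
    then begin
      Buffer.add_string buf r.x;
      Buffer.add_string buf r.w;
      i := !i + lx + lu + String.length r.v
    end
    else begin
      Buffer.add_char buf s.[!i];
      incr i
    end
  done;
  Buffer.contents buf

(* Illustrative sandhi rule a|i -> e across a word boundary (the space
   is folded into u): the word list "ca iha" becomes "ceha". *)
let () =
  assert (apply_rule { x = ""; u = "a "; v = "i"; w = "e" } "ca iha" = "ceha")
```

The uniform left-to-right strategy is what makes the pass deterministic, mirroring the vidhikalpa reading given above.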
This sort of mechanism may be applied as well to vowel grade shift (guṇa, vṛddhi), vowel harmony, etc. The situation is more complex for generative morphology, where word construction from morphemes and affixes uses retroflexion, whose specification requires a non-regular operation in which the left context must be inspected over an unbounded, although generally small, suffix. Indeed, many Pāṇinian rules are of a more complex nature, involving context-free and even context-sensitive formulations. Furthermore, the "flow of control" of Pāṇinian rules, including rules of a meta-linguistic nature, is a complex affair, and it is not possible to regard the Aṣṭādhyāyī directly as a computer program whose instructions would be the sūtras. Part of the problem is the conciseness (lāghava) of its description, a very important concern since the grammar had to be exactly memorized by traditional students. We may rather think of the Aṣṭādhyāyī as a high-level program compiled into low-level machine code, where compaction techniques such as sharing have been applied to obtain a small memory footprint, at the expense of control complexity. Indeed, the advent of printing allowed equivalent reformulations of the grammar in more hierarchical ways, presumably easier for students to use, but at the expense of duplication of rules (Dīkṣita et al., 1905).

It remains that Pāṇini is the ultimate authority, and that the perfection of his description gave the grammar a prescriptive character, making it the gold standard of Sanskrit, following Patañjali's magisterial commentary (Joshi and Roodbergen, 1990; Filliozat, 1975). This explains the stability of the language, since it could evolve only within the constraints of the grammar. Further commentaries were thus reduced to settling matters of detail and elucidating the flow of control of grammar usage (Sharma et al., 2008; Joshi and Roodbergen, 2004; Sharma, 1987). Pāṇini's Aṣṭādhyāyī is therefore often (justly) referred to as a generative grammar for Sanskrit. Actually, when challenged, a competent (śiṣṭa) Sanskrit locutor should be able to exhibit the sequence of Pāṇinian sūtras (prakriyā) validating his linguistic productions. Indeed, such systematic sequences have been worked out for the various examples discussed in traditional grammars (Grimal et al., 2006). Thus it would seem possible in principle to write a simulator of Pāṇinian derivations which would take sūtras as instructions and derive Sanskrit strings guaranteed by construction to be correct Sanskrit.

2 Using the Aṣṭādhyāyī in generation

There have indeed been attempts to write such a simulator as a computer program that progressively elaborates a target Sanskrit utterance as a sequence of operations on a string of phonemes, some of which end up as phonetic material, while others are meta-linguistic markers (anubandha) that are progressively eliminated once the operation they trigger has been effected. See for instance the work of Anand Mishra (Mishra, 2009; Mishra, 2010), of Peter Scharf (Scharf, 2009), and of Pawan Goyal et al. (Goyal et al., 2009).
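The following OCaml fragment is a hedged sketch, with invented names, of the kind of derivation state such a simulator maintains; it is not the data structure of any of the systems cited above. The state interleaves phonetic material with markers, and each marker is erased once the operation it triggers has been effected.

```ocaml
(* A derivation state: a sequence of segments, each either phonetic
   material or a meta-linguistic marker (anubandha). *)
type segment =
  | Phon of string      (* audible phonetic material *)
  | Marker of string    (* anubandha controlling the derivation *)

type state = segment list

(* Apply the operation [op] triggered by marker [m], then erase [m]. *)
let step (m : string) (op : state -> state) (st : state) : state =
  if List.mem (Marker m) st
  then List.filter (fun seg -> seg <> Marker m) (op st)
  else st

(* Once every marker has triggered its operation, read off the
   remaining phonetic material as the target enunciation. *)
let read_off (st : state) : string =
  st
  |> List.filter_map (function Phon p -> Some p | Marker _ -> None)
  |> String.concat ""
```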
The first remark to be made is that the Aṣṭādhyāyī is not self-sufficient. It must be used together with specialized lexicons: one giving roots with derivational markers (dhātupāṭha), another giving lists of words sharing morphological characteristics (gaṇapāṭha), and still others listing attested genders of substantives (liṅgānuśāsana) (Cardona, 1976). Access to these resources is triggered by root or stem selection. One practical problem is deciding which version of these resources to use, since the lexical lists are open-ended and have been amended or reorganised since Pāṇini's time. Another difficulty is that one must check that a rule application is indeed permitted at the time of its invocation. This requires maintaining complex data structures storing the derivation history, verifying context conditions implicitly carried over from one sūtra to the next (anuvṛtti), and also analysing complex priority relations between sūtras (siddha, asiddhavat) which are not always consensual among experts. Moreover, certain sūtras are subject to semantic conditions (rule X is valid for root R "in the sense of ...") which are not directly amenable to computation. Aspects of this control problem, and their relation to computational devices, have been discussed in (Goyal et al., 2009). Finally, many rules specifying optional operations are non-deterministic in nature (with a long history of discussion of the "optionally" versus "preferably" interpretations (Kiparsky, 1980)).

These difficulties lead one to believe that the Aṣṭādhyāyī can be used to generate an enunciation S only if not only S is known in advance, but its intended meaning is known too. And there might still be choices in the application of rules which must be made explicit if one wants to obtain a deterministic simulation. The rules discuss both forms and meanings. However, the grammar cannot be construed to generate meaning from correct enunciations (think of śleṣa ambiguity), nor correct enunciations from meaning (since there are many ways to say the same thing, especially in a language with flexible word order). Rules have conditions both on the surface realisation (phonemic strings) of the considered enunciation and on its intended meaning. Any attempt to explain generativity in a unidirectional way runs into circularities (itaretarāśrayadoṣa). As Peter Scharf puts it: "The rules do not actually generate the speech forms in certain meanings; they instruct one that it is correct to use certain speech forms in certain meanings" (Scharf, 2009).

The solution to these difficulties is to make explicit the oracle decisions fixing all these choices, and to consider that the derivation process operates not just on surface material (strings of phonemes and markers) but on signs in the sense of de Saussure, that is, pairs of enunciations and their meanings. This will be possible if we identify precisely the semantic combinators implicit in the derivational process. The derivational process ought to derive not just the target final enunciation, but also a formal expression representing its sense, or some disjunction of possible senses when ambiguity remains.
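As a closing illustration, here is a minimal OCaml sketch, with invented names and structure, of what signs and scripts could look like as datatypes; it does not reproduce the paper's own formal definitions.

```ocaml
(* A sign in the Saussurean sense: an enunciation paired with a formal
   meaning term that can be printed as an unambiguous paraphrase. *)
type meaning =
  | Atom of string                  (* a lexical sense *)
  | App of string * meaning list    (* a semantic combinator applied to senses *)

type sign = { enunciation : string; sense : meaning }

(* A script node records the combinator used, so that a derivation can
   be justified a posteriori by a sequence of Pāṇinian sūtras (prakriyā). *)
type script =
  | Leaf of sign                            (* a lexical sign *)
  | Node of string * (sign list -> sign) * script list

(* Evaluate a script bottom-up to its final sign. *)
let rec eval : script -> sign = function
  | Leaf s -> s
  | Node (_, combine, args) -> combine (List.map eval args)
```

Evaluating a script thus delivers at once the correct enunciation and a formal representation of its sense, as proposed in the abstract.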