Categorical Vector Space Semantics for Lambek Calculus with a Relevant Modality

We develop a categorical compositional distributional semantics for Lambek Calculus with a Relevant Modality, !L*, which has restricted versions of the contraction and permutation rules. The categorical part of the semantics is a monoidal biclosed category with a coalgebra modality, very similar to the structure of a Differential Category. We instantiate this category to finite-dimensional vector spaces and linear maps via "quantisation" functors and work with three concrete interpretations of the coalgebra modality. We apply the model to construct categorical and concrete semantic interpretations for the motivating example of !L*: the derivation of a phrase with a parasitic gap. The effectiveness of the concrete interpretations is evaluated via a disambiguation task, on an extension of a sentence disambiguation dataset to parasitic gap phrases, using BERT, Word2Vec, and FastText vectors and Relational tensors.


Introduction
Distributional Semantics of natural language are semantics which model the Distributional Hypothesis, due to Firth [20] and Harris [25], which assumes a word is characterized by the company it keeps. Research in Natural Language Processing (NLP) has turned to Vector Space Models (VSMs) of natural language to accurately model the distributional hypothesis. Such models date as far back as Rubinstein and Goodenough's co-occurrence matrices [52] in 1965, and extend to today's neural machine learning methods, leading to embeddings such as Word2Vec [39], GloVe [48], FastText [13] or BERT [18], to name a few. VSMs were used even earlier by Salton [58] for information retrieval. These models have plenty of applications, for instance thesaurus extraction tasks [17,24], automated essay marking [33] and semantically guided information retrieval [35]. However, they lack grammatical compositionality, making it difficult to sensibly reason about the semantics of portions of language larger than words, such as phrases and sentences.
Somewhat orthogonally, Type Logical Grammars (TLGs) form highly compositional models of language by accurately modelling grammar; however, they lack distributionality, in that such models do not describe the distributional semantics of a word, only its grammatical role. Applications of type-logics with limited contraction and permutation to phenomena that witness discontinuities, such as unbounded dependencies, form a line of research initiated in [10,47], with a later boost in [43,44,46], and also more recently in [63]. Distributional Compositional Categorical Semantics (DisCoCat) [15] and the models of Preller [50,51] combine these two approaches using category-theoretic methods originally developed to model quantum protocols. DisCoCat has proven its efficacy empirically [22,23,30,40,56,64] and has the added utility of being a modular framework which is open to additions and extensions, such as modelling relative pronouns [54,55].
Volume 5 Issue 2 ISSN 2631-4444

These approaches differ from ours in that their base logic is Displacement Calculus, an extension of Lambek Calculus with a set of binary operations. These approaches all rely on syntactic copying. What our work has added to these treatments is the semantic part, in particular equipping the logic of Kanovich et al. with a vector space semantics and thus relating traditional work on type logical grammars with a more modern approach that uses embeddings and machine learning. Syntactic copying is by no means the only type logical approach used for analysing parasitic gaps; soon after the work of Engdahl, the phenomenon was analysed by Steedman in Combinatory Categorial Grammar (CCG) [59] using a new operation called substitution, shorthanded as S. More recently, Moortgat et al. [57] showed that lexical polymorphism can be used as an alternative to syntactic copying to treat parasitic gaps. All of these approaches are prone to overgeneration. The !L* and other syntactic copying approaches overcome this by restricting the lexicon, i.e. by assigning copyable and permutable types only to words that require them. The approach offered by CCG overcomes it by subjecting the S operator to rule features such as directional consistency and directional inheritance. The polymorphic type approach both uses syntactic modalities in the lexicon and creates auxiliary semantic types that need to be learnt later via modern techniques such as machine learning.
In view of further work, Engdahl observed parallels between the acceptability of phrases with parasitic gaps in them and the coreference of pronouns and bounded anaphors. In another work [37], we have shown that indeed the logic of this paper and its categorical vector space semantics can be applied to anaphora and other coreference phenomena known as VP-ellipsis. The Achilles heel of using !L* as a base logic for analysing complex linguistic phenomena, such as the discontinuities discussed in this paper and the coreferencing discussed in [37], is that the base logic is undecidable. This does not pose a problem for the examples discussed in these papers, but in the context of large-scale wide-coverage applications one needs to automate the parsing algorithm and would face infinite loops and nontermination. This criticism is not new and is the reason for the existence of bounded modality versions of linear logic such as [21,32]. In current work, we are exploring how variants of !L* based on these logics can be used for similar applications.
In this paper, we first form a sound categorical semantics of !L*, which we call C(!L*). This boils down to interpreting the logical contraction of !L* using endofunctors which we call !-functors, inspired largely by the coalgebra modalities of differential categories defined in [12]. In order to facilitate the categorical computations, we use the clasp-string calculus of [9], developed for depicting the computations of a monoidal biclosed category. To this monoidal diagrammatic calculus, we add the necessary new constructions for the coalgebra modality and its operations. Although not proven complete, this graphical language is sound, and makes visualising our work orders of magnitude easier. Next, we define three candidate !-functors on the category of finite-dimensional real vector spaces in order to form a sound VSM of !L* in terms of structure-preserving functors C(!L*) → FdVect. We conclude this paper with an experiment to test the accuracy of the different !-functors on FdVect. The experiment is performed using different neural word embeddings, on a disambiguation task over a version of the dataset of [23], extended from transitive sentences to phrases with parasitic gaps.

Acknowledgements
This article is the full edition of the extended abstract [36].


!L*: Lambek Calculus with a Relevant Modality

Following [29], we assume that the formulae of Lambek calculus L are generated by a set of atomic types At, and three connectives, \, / and the comma, via the following grammar.
We refer to the formulae of L as types, Typ_L, an element of which is thus either atomic, or is made up of two types joined by a comma or a slash. We will use uppercase Roman letters to denote arbitrary types of L, and uppercase Greek letters to denote a list of types, for example, Γ = A_1, A_2, ..., A_n. It is assumed that the comma is associative, allowing us to omit brackets in expressions like A_1, A_2, ..., A_n.
A sequent of L is a pair consisting of an ordered list of types and a type, denoted Γ → A. The derivations of L are generated by the set of axioms and rules presented in Table 1.
The logic !L* extends L by endowing it with a modality, denoted by !, inspired by the exponential modality ! of Linear Logic, to enable the structural rule of contraction in a controlled way, albeit here it is introduced on a non-commutative base and has an extra property that allows the !-types to be permuted. This logic was introduced in [29]. The types of !L* are generated via the following grammar. We refer to the types of !L* as Typ_!L*; here, ∅ denotes the empty type. The set of rules of !L* consists of the rules of L and the five rules of Table 2.
It is common to distinguish the comma from another connective • along with (•L) and (•R) rules; however, we present this calculus in the same style as in [29], where no such distinction is made. In our applications there is no practical difference between including a • connective and the presentation in Table 1, since we interpret the comma as a monoidal product (see Section 3). Along the same lines, ∅ is the unit of the monoidal product, hence the sequents {∅, A} and {A, ∅} are abbreviations for the formula {A}. Also note that !L* does admit the cut rule, see [29].
Categorical Semantics for !L*

Following the formalisms set out in [38], we provide a passage from logic to category theory by interpreting !L* as a category C(!L*). It is conventional to denote such an interpretation using semantic brackets, allowing us the shorthand ⟦−⟧ : !L* → C(!L*), meaning a categorical semantics of !L*. We will go on to show that C(!L*) has a particular categorical structure, and we show how to define sound semantics of !L* in any suitable category D as a functor C(!L*) → D. This is essentially a standard construction of [28], but we have included far greater detail in the style of [38], as we will make practical use of the semantics and are not concerned with the model theory of !L* in this paper. We will, however, need a notion of a class of models, which we call !L*-categories, defined in the following.
Definition 1 A !L*-category is a monoidal biclosed category C = (C, ⊗, I, ⇒, ⇐) equipped with a lax monoidal endofunctor F = (F, m), equipped with, for lack of a better name, a precomonadic structure (F, δ, ε), where δ : F → F² and ε : F → 1_C are natural transformations and ε is monoidal. We finally require F to have a C-indexed family of copying maps (∆_A : FA → FA ⊗ FA)_{A ∈ C} (not necessarily natural). We also require that F commutes with 1_C, that is, we have a natural isomorphism σ : F ⊗ 1_C ≅ 1_C ⊗ F.

We note that the definition of the !-functor F in Definition 1 is reminiscent of the coalgebra modalities of differential categories [12]. Although a differential category is symmetric monoidal, that structure is recovered by the fact that F commutes with 1_C. This suggests that one may be able to find further categorical models of !L* in terms of differential categories, although this has not yet been studied.
An important example of a !L*-category is the category of finite-dimensional real vector spaces, FdVect. The monoidal structure of FdVect comes from the tensor product, and the monoidal biclosure V ⇒ W is defined to be the set of linear maps from V to W (which in turn is isomorphic to V* ⊗ W). Note that V ⇒ W ≅ W ⇐ V in FdVect, since the tensor product is symmetric. There are several choices of !-functor for FdVect, which we discuss at length in Section 4. Next, we define an important !L*-category, namely the syntactic category of !L*.
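A minimal numerical sketch of this structure, with hypothetical toy dimensions, shows why the internal hom in FdVect is so convenient to compute with: an element of V ⇒ W is just a matrix, and evaluation is matrix-vector multiplication.

```python
import numpy as np

# Sketch (hypothetical dimensions): in FdVect the internal hom V => W is the
# space of linear maps V -> W, concretely a dim(W) x dim(V) matrix, and the
# evaluation morphism ev : V (x) (V => W) -> W is matrix-vector multiplication.
rng = np.random.default_rng(0)
dim_V, dim_W = 3, 2

f = rng.standard_normal((dim_W, dim_V))   # an element of V => W
v = rng.standard_normal(dim_V)            # an element of V

w = f @ v                                 # ev applied to v and f
assert w.shape == (dim_W,)

# Since the tensor is symmetric, V => W and W <= V are isomorphic: the same
# matrix represents both, acting on the other side.
```

This is the computational content of the isomorphism V ⇒ W ≅ V* ⊗ W used throughout the vector space semantics.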
Definition 2 We define the category C(!L*) as a syntactic category of !L*; roughly speaking, this means that C(!L*) has formulas of !L* as objects, and derivations of !L* as morphisms. We know that C(!L*) is a monoidal biclosed category, as !L* contains L, whose categorical semantics is exactly a monoidal biclosed category, as shown e.g. in [16]. The operations of this category are ⊗ for the monoidal tensor, with I its unit, and ⇒ and ⇐ for its two closures. More precisely, we define a translation from !L* to a category C(!L*) inductively on formulas and derivations as follows. We make use of the symbols ⟦φ⟧_C, as opposed to just φ, for pedagogical reasons, to distinguish between !L*-formulas as objects of a category and as formulas of the logic. Formally, there is no distinction.
Next, we translate derivations of !L* to morphisms of C(!L*) inductively, in the style of [38]. Given a derivation π of a sequent Γ → A, we translate this to a morphism ⟦π⟧ : ⟦Γ⟧ → ⟦A⟧. The structure of ⟦π⟧ comes from the following inductive definition.

1. The axiom of !L*, A → A (ax), is interpreted as the identity arrow 1_⟦A⟧ : ⟦A⟧ → ⟦A⟧.

2. The (\L) and (/L)-rules of !L* are interpreted using the evaluation maps internal to C(!L*).
Consider the (\L)-rule where the two sequents Γ → A and ∆_1, B, ∆_2 → C have derivations π and π' respectively, which in turn are interpreted as morphisms f : ⟦Γ⟧ → ⟦A⟧ and g : ⟦∆_1⟧ ⊗ ⟦B⟧ ⊗ ⟦∆_2⟧ → ⟦C⟧. We interpret the resulting derivation as a morphism of the form ⟦∆_1⟧ ⊗ ⟦Γ⟧ ⊗ (⟦A⟧ ⇒ ⟦B⟧) ⊗ ⟦∆_2⟧ → ⟦C⟧, built from f and g using the evaluation map.

3. The interpretations of the (\R) and (/R)-rules are given by the tensor-hom adjunction. That is, given a derivation π of the antecedent A, Γ → B with categorical semantics f : ⟦A⟧ ⊗ ⟦Γ⟧ → ⟦B⟧, we interpret the derivation of the conclusion as the transpose of f under the tensor-left hom adjunction (i.e. via η_Γ, the unit of the tensor-left hom adjunction). The interpretation of the (/R)-rule is very similar.
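In the notation of this case, a sketch of the composite interpreting the (\L)-rule (our reconstruction, writing ev^l for the left evaluation map internal to C(!L*)) is:

```latex
g \circ \Big( 1_{\llbracket \Delta_1 \rrbracket} \otimes
  \big( \mathrm{ev}^{l}_{\llbracket A \rrbracket, \llbracket B \rrbracket} \circ
        ( f \otimes 1_{\llbracket A \rrbracket \Rightarrow \llbracket B \rrbracket} ) \big)
  \otimes 1_{\llbracket \Delta_2 \rrbracket} \Big)
\;:\;
\llbracket \Delta_1 \rrbracket \otimes \llbracket \Gamma \rrbracket \otimes
(\llbracket A \rrbracket \Rightarrow \llbracket B \rrbracket) \otimes
\llbracket \Delta_2 \rrbracket
\longrightarrow \llbracket C \rrbracket
```

Here f first produces an element of ⟦A⟧ from ⟦Γ⟧, the evaluation map consumes it together with the function type ⟦A⟧ ⇒ ⟦B⟧ to yield ⟦B⟧, and g finishes the job.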
4. We interpret the (!L)-rule using the counit ε of F, applied to the !-ed type in the antecedent sequent.

5. The (!R)-rule is interpreted using the lax monoidality of F (which we call m here) and the comonadic comultiplication δ, assuming that the sequent !A_1, ..., !A_n → B has derivation π and categorical semantics f : F⟦A_1⟧ ⊗ ... ⊗ F⟦A_n⟧ → ⟦B⟧.

6. The (perm_1) and (perm_2)-rules are interpreted using the natural isomorphism σ : F ⊗ 1_C ≅ 1_C ⊗ F. The interpretation of (perm_2) is very similar, and we leave it to the reader to write it out for themselves.
7. The (contr)-rule is interpreted using the ∆-maps of Definition 1, assuming the antecedent sequent ∆_1, !A, !A, ∆_2 → B has derivation π and categorical semantics f : ⟦∆_1⟧ ⊗ F⟦A⟧ ⊗ F⟦A⟧ ⊗ ⟦∆_2⟧ → ⟦B⟧.

We can abstract the structures of C(!L*) as defined above to obtain the type of categories defined in Definition 1. Using this fact, we can define a functorial categorical semantics for !L*.
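Returning to the (contr) case above, a sketch of the interpretation of the rule's conclusion (our reconstruction, with ∆ the copying map of Definition 1) is:

```latex
f \circ \big( 1_{\llbracket \Delta_1 \rrbracket} \otimes
              \Delta_{\llbracket A \rrbracket} \otimes
              1_{\llbracket \Delta_2 \rrbracket} \big)
\;:\;
\llbracket \Delta_1 \rrbracket \otimes F\llbracket A \rrbracket \otimes
\llbracket \Delta_2 \rrbracket
\longrightarrow \llbracket B \rrbracket
```

The copying map ∆ duplicates the single F⟦A⟧ in the conclusion into the two copies that f expects, mirroring how (contr) merges two occurrences of !A into one.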
A categorical model of !L* in a !L*-category D is a mapping of formulas and derivations of !L* to objects and morphisms of D. We denote an interpretation as an arrow (|−|) : !L* → D, and require the interpretations of complex types of !L* to satisfy the following equations. We also require the (|−|)-interpretations of derivations to be defined as for those of ⟦−⟧ in Definition 2. In other words, we require what we call a categorical model to be a sound model of !L*.
A more condensed, but less practical, way of describing C(!L*) is to define it as the free category over a !L*-signature. This definition follows that in Chapter D of [28], and lets us classify categorical models of !L* in D canonically as functors C(!L*) → D. Although this is a known result, again found in Chapter D of [28], we briefly sketch the proof in this particular case, namely that C(!L*) is (isomorphic to) the free category over !L* in the following sense.
Proposition 1 Functorial models of !L* are in bijection with categorical models of !L*. That is, given a !L*-category D and an interpretation of !L* in D, one obtains a functor C(!L*) → D, and conversely.

The very particular nature of the structures surrounding !-functors in Definition 1 comes from first mistakenly identifying such functors as coalgebra modalities. Pacaud Lemay showed in [34] that one of the intended models of !-functors in Section 5 was in fact not a comonad, but our model still worked as intended, suggesting that requiring !-functors to be comonads was too strong a requirement. After carefully translating the sequents corresponding to the structure diagrams of a monoidal comonad, we realised that the structure of !-functors is not necessarily monoidal comonadic, but weaker, as laid out in Definition 1. Clearly one may consider monoidal comonads/coalgebra modalities as models of !-functors, but these will necessarily be more expressive than !L*, making those models incomplete. Of course, we are not too concerned with incompleteness in terms of vector space semantics, but when defining syntactic categories this is a great concern.
To summarise, we now have a simple way to define semantics of !L * in terms of structurepreserving functors C(!L * ) → D. The inductive interpretation in Definition 2 will serve as a practical guide to defining vector space semantics in the following section.
Vector Space Semantics for C(!L*)

Following [16], we develop vector space semantics for !L* by defining a functorial !L*-model in FdVect. Such a functor is known to some as a quantisation functor, first introduced by Atiyah in Topological Quantum Field Theory as a functor from the category of manifolds and cobordisms to the category of vector spaces and linear maps. Since the category of cobordisms is monoidal, quantisation was later generalised to refer to a functor that 'quantises' any monoidal category into FdVect.
To define a vector space semantics, we first need to provide the !L*-category structures on FdVect; most importantly, we have to define a !-functor. As we noted earlier, FdVect is already a monoidal biclosed category, and it is in fact symmetric, so for any choice of !-functor F on FdVect we already know that F ⊗ 1_FdVect ≅ 1_FdVect ⊗ F. Hence defining a vector space semantics really boils down to specifying what our !-functors should be, and in doing so, determining what their diagonal maps should look like. Although quantisations are simply instances of functorial models, we still explicitly show what a quantisation looks like below, to allow us to use it practically in examples later.
Definition 5 A quantisation is a closed monoidal functor Q : C(!L*) → (FdVect, F), defined on the objects of C(!L*) using the structure of the formulae of !L*, as follows. Here, V_φ is the vector space in which vectors of words with an atomic type live, and the other vector spaces are obtained from it by induction on the structure of the formulae they correspond to.
The quantisation functor is defined on morphisms as follows. Recall that given two vector spaces V, W ∈ FdVect, the set V ⇒ W is the space of linear maps from V to W, which in turn is isomorphic to V* ⊗ W. Since the monoidal product in FdVect is symmetric, there is formally no need to distinguish between the images of (A ⇒ B) and (B ⇐ A). However, it may be practical to do so when doing calculations by hand, for example when retracing derivations in the semantics. We should also make clear that the freeness of C(!L*) makes Q a strict monoidal closed functor, meaning that V_{A⊗B} = V_A ⊗ V_B and, similarly, V_{A⇒B} = V_A ⇒ V_B, etc. Further, since we are working with finite-dimensional vector spaces, we know that V_φ^⊥ ≅ V_φ; thus our internal homs have the even simpler structure V_A ⇒ V_B ≅ V_A ⊗ V_B, which we exploit when computing. Next, we define three different !-functors on FdVect, providing three functorial models of !L* in FdVect.

Concrete Constructions
In this section we introduce the three candidate !-functors on FdVect that we have identified. The first construction follows a more classical approach, whose goal is to find models of full linear logic [11]. The latter constructions use the identity functor as the underlying endofunctor, on top of which we define diagonal maps.

It is also worth noting that, at a high level, the goal of the diagonal maps of a !-functor, ∆ : F → F ⊗ F, is to copy. Although copying is clearly a nonlinear operation, we try to approximate nonlinear copying with these diagonal maps.

! as the Dual of an Algebra
Following [11], we interpret ! using the Fermionic Fock space functor F : FdVect → Alg_R. In order to define F, we first introduce the simpler free algebra construction, typically studied in the theory of representations of Lie algebras [26]. This construction is applicable to all vector spaces, not necessarily finite-dimensional ones. The choice of the symbol F comes from "Fermionic Fock space" (as opposed to "Bosonic"); this functor is also known as the exterior algebra functor, or the Grassmannian algebra functor [26]. In the following, by "algebra" we mean associative algebra, that is, a vector space with a ring structure and appropriate relations between scalars and multiplication. These are the objects of the category Alg_R. The morphisms of this category are linear maps that moreover preserve the ring multiplication.

Definition 6
The free algebra functor T : Vect_R → Alg_R is defined on objects as

T(V) = R ⊕ V ⊕ (V ⊗ V) ⊕ (V ⊗ V ⊗ V) ⊕ ...

and for morphisms f : V → W, we get the algebra homomorphism T(f) : T(V) → T(W) defined layer-wise as f ⊗ ... ⊗ f on the n-th layer. The algebra structure on T(V) is given by concatenation; that is, given elements v_1 ⊗ ... ⊗ v_n and w_1 ⊗ ... ⊗ w_m, their product is v_1 ⊗ ... ⊗ v_n ⊗ w_1 ⊗ ... ⊗ w_m. T is free in the sense that it is left adjoint to the forgetful functor U : Alg_R → Vect_R, thus giving us a monad UT on Vect_R with a monoidal algebra modality structure, i.e. the dual of what we are looking for.
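The concatenation product is easy to sketch computationally. In the following minimal sketch (our own illustration, not code from the paper), a pure tensor v_{i1} ⊗ ... ⊗ v_{in} of basis vectors is represented as the tuple (i1, ..., in) of basis indices, and a general element of T(V) as a dict mapping such tuples to coefficients; multiplication is tuple concatenation, extended bilinearly.

```python
# Minimal sketch of the free (tensor) algebra multiplication by concatenation.
# A pure tensor of basis vectors is a tuple of indices; a general element is a
# dict {tuple_of_indices: coefficient}.

def mult(x, y):
    out = {}
    for t1, c1 in x.items():
        for t2, c2 in y.items():
            t = t1 + t2                       # concatenate tensor factors
            out[t] = out.get(t, 0.0) + c1 * c2
    return out

# (v0 + 2 v1) * v2 = v0 (x) v2 + 2 v1 (x) v2
x = {(0,): 1.0, (1,): 2.0}
y = {(2,): 1.0}
print(mult(x, y))  # {(0, 2): 1.0, (1, 2): 2.0}
```

Note that the degrees simply add under multiplication, which is why UT(V) is infinite-dimensional even for finite-dimensional V.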
However, note that even when restricting T to finite-dimensional vector spaces V ∈ FdVect, the resulting UT(V) and UT(V^⊥)^⊥ are infinite-dimensional. The necessity of working in FdVect motivates us to use F, defined below, rather than T.

Definition 7
The Fermionic Fock space functor F : Vect_R → Alg_R is defined on objects as

F(V) = R ⊕ V ⊕ V^∧2 ⊕ V^∧3 ⊕ ...

where V^∧n is the coequaliser of the family of maps (−τ_σ)_{σ ∈ S_n}, with each τ_σ : V^⊗n → V^⊗n permuting the tensor factors according to σ and multiplying by its sign. F applied to linear maps gives an algebra homomorphism analogous to that of Definition 6.
One may equivalently define V^∧n as the n-fold tensor product of V where we quotient by the equivalence relation v_1 ⊗ ... ⊗ v_n ∼ sgn(σ) · v_{σ(1)} ⊗ ... ⊗ v_{σ(n)}. In fact, we have a multiplicative structure on F(V) very similar to that of T(V), again given by concatenation. As mentioned in our above reference on the representation theory of Lie algebras [26], the difference is that F(V) is a free graded alternating algebra over V. We explain what this means by means of examples below; note that knowing that F(V) is graded alternating is not too important for the purposes of this paper.
For some concrete examples of what elements of F(V) look like, consider a 3-dimensional real vector space V spanned by the basis {u, v, w}. In F(V) we have vectors like v, u ∧ v, 4v + u, w ∧ v + u, etc. We also have that u ∧ v = −(v ∧ u). Vectors like v ∧ v or u ∧ w ∧ w, containing repeated factors, are always equal to 0, since v ∧ v = −(v ∧ v). More generally, in a space with basis {v_1, ..., v_n}, if v_i = v_j for some 1 ≤ i, j ≤ n with i ≠ j, the permutation (ij) ∈ S_n has odd sign, and so the corresponding wedge product is zero.

From the preceding remark about vectors in F(V) with repeated factors being zero, it follows that F(V) is in fact finite-dimensional whenever V is finite-dimensional. This can be seen by a simple application of the pigeonhole principle. Suppose that V has basis {v_1, ..., v_n}. Then, if we consider any basis vector in F(V) with more than n factors, it must repeat at least one of the basis vectors. By the preceding remark, the whole vector is zero. Thus we have that V^∧m = 0 for all m > n, meaning that F(V) is a finite direct sum of finite-dimensional spaces.

Now that we know F, we consider two ways to obtain a diagonal structure ∆_V on UF(V), thus defining two !-functors (UF, δ, ε, ∆_V) on FdVect, as desired. For both of these, we use the same δ and ε maps, which are inclusion and projection respectively. Next, we define the two diagonal maps on F. One, referred to as the Cogebra construction, is given, for a basis {e_i}_i of V and thus a basis {1, e_{i_1}, e_{i_2} ∧ e_{i_3}, e_{i_4} ∧ e_{i_5} ∧ e_{i_6}, ...}_{i_j} of UF(V), by sending each such basis element b to b ⊗ b, extended linearly. The other diagonal map is a more classical comultiplication, which we may denote as µ^{-1} for now. This map sends a vector v to the sum of all possible pairs of factors which, when concatenated, produce v.
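The finite-dimensionality argument above can be checked mechanically: a basis of V^∧k is given by the wedges e_{i_1} ∧ ... ∧ e_{i_k} with strictly increasing indices, so dim V^∧k = C(n, k) and dim F(V) = Σ_k C(n, k) = 2^n. A small sketch (our own illustration):

```python
from itertools import combinations

# Basis of the Fermionic Fock space F(V) for dim(V) = n: one basis element per
# strictly increasing index tuple (i1 < ... < ik), for each degree k = 0..n.
# Repeated indices are excluded because wedges with repeated factors vanish.
def fock_basis(n):
    basis = []
    for k in range(n + 1):
        basis.extend(combinations(range(n), k))
    return basis

n = 3
basis = fock_basis(n)
print(len(basis))  # 8, i.e. 2**3
```

The degree-k layer has C(n, k) elements, and summing the binomial coefficients recovers the 2^n total dimension that later makes the comultiplication matrices so large.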
Consider the 3-dimensional example from earlier. Here, for instance, µ^{-1}(u ∧ v) = 1 ⊗ (u ∧ v) + u ⊗ v − v ⊗ u + (u ∧ v) ⊗ 1. There are other ways to obtain this simpler construction, but one has to work with infinite-dimensional vector spaces to take advantage of the isomorphism between polynomial rings K[X] and symmetric algebras over vector spaces with basis X. In either case, the ∆ structure is the dual of the algebra multiplication, and its counit ε is obtained via the inclusion of constant polynomials. Another option is to change category and move to the category of suplattices, where the dual of the additive algebra functor provides an example [27]. In this setting, the comultiplication is the map that sends a monomial to the join of all pairs of monomials whose multiplication gives the input monomial.
However, comultiplications on F(V) quickly increase in complexity and become difficult to instantiate in applications: since the dimensions of the domain and codomain are 2^n and (2^n)^2 respectively, we would have matrices representing the comultiplications of size 2^n × (2^n)^2, where n is taken to be at the very least 100, and usually 300.

! as the Identity Functor
The Cogebra construction can be simplified when one works with free vector spaces. Given a set S, consider the functions S → R mapping all but finitely many elements s ∈ S to 0. It is easy to see that scalar multiplication and addition can be defined on this set pointwise, using the structure of R. The set of such functions forms a vector space, known as the free vector space over S, denoted by R^S. An element s ∈ S can be identified with the function f_s : S → R defined by f_s(s) = 1 and f_s(s') = 0 for all s' ∈ S with s' ≠ s. On such vector spaces, one can define a coalgebra structure by setting ∆(s) = s ⊗ s and ε(s) = 1. Because the functions f_s are defined on S, the basis of R^S, and send all but finitely many elements of S to 0, the above comonoidal structure can be extended to all of R^S linearly. As one can see, this construction is not limited to finite-dimensional spaces, but when working with the finite case, the condition that all but finitely many elements need to be sent to 0 can be dropped and the construction becomes simpler. This comonoid structure defines a !-functor (indeed, even a coalgebra modality) over the identity comonad on FdVect.
This version of the construction clearly resembles half of a bialgebra over FdVect, known as a Special Frobenius bialgebra, which was used in [42,54,57] to model relative pronouns in English and Dutch. As argued in [62], however, the copying map resulting from this comonoid structure only copies the basis vectors and does not seem adequate for a full copying operation. A quick computation shows that this ∆ copies only half of the input vector. In order to see this, consider a vector v = Σ_i C_i s_i. Extending the comultiplication ∆ linearly provides us with ∆(v) = Σ_i C_i s_i ⊗ s_i. The result is one copy of v, i.e. the Σ_i C_i s_i part of the above, and a sum of its bases, i.e. the Σ_i s_i part of the above. In the second term, we have lost the C_i weights; in other words, we have replaced the second copy with a vector of 1's.
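The "half a copy" observation is easy to see numerically. In the following sketch (our own illustration), the linearly extended ∆ produces a diagonal matrix carrying the weights of v once, whereas genuine (nonlinear) copying would produce the full outer product v ⊗ v.

```python
import numpy as np

# Basis-copying Delta of the free vector space construction:
# Delta(s_i) = s_i (x) s_i, extended linearly, so Delta(v) is the diagonal
# matrix with the weights of v on its diagonal -- one copy of the weights.
v = np.array([3.0, 5.0])           # v = 3 s_1 + 5 s_2
delta_v = np.diag(v)               # sum_i C_i (s_i (x) s_i)

# Genuine (nonlinear) copying would instead give the full outer product.
true_copy = np.outer(v, v)

assert np.allclose(np.diag(delta_v), v)     # weights appear once
assert not np.allclose(delta_v, true_copy)  # not a real copy of v
```

Reading the diagonal matrix as a sum of rank-one terms recovers exactly the Σ_i C_i s_i ⊗ s_i expression above.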
The above problem can be partially overcome by observing that there are infinitely many linear copying structures on FdVect, one for each element k ∈ R. Formally speaking, over the vector space V_φ with a basis (v_i)_i, a Cofree-inspired comonoid (V_φ, ∆, e) is defined by ∆(v) = v ⊗ k + k ⊗ v, where v is as before and k stands for the element of V padded with the number k, that is, the vector all of whose coordinates are k. In the simplest case, when k = 1, we obtain two copies of the weights of v and also of its basis vectors, as the following calculation demonstrates. Consider a two-dimensional vector space and the vector a e_1 + b e_2 in it. The 1-vector is the 2-dimensional vector e_1 + e_2 in V. Applying ∆ results in 2a e_1 ⊗ e_1 + (a+b) e_1 ⊗ e_2 + (a+b) e_2 ⊗ e_1 + 2b e_2 ⊗ e_2, where we have two copies of the weights on the diagonal, and the basis vectors have been multiplied together. This construction is inspired by the graded algebra construction on vector spaces, whose dual construction is referred to as a Cofree coalgebra. The Cofree-inspired coalgebra over a vector space defines a second !-functor (and again also a coalgebra modality) structure over the identity comonad on FdVect, which provides another !L*-model, or rather, another quantisation C(!L*) → FdVect.
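The k = 1 calculation can be sketched numerically (our reconstruction of the map as ∆(v) = v ⊗ k + k ⊗ v, with k the all-k vector):

```python
import numpy as np

# Cofree-inspired copying map with padding value k (reconstruction):
# Delta(v) = v (x) k_vec + k_vec (x) v, where k_vec has every coordinate k.
def delta(v, k=1.0):
    pad = np.full_like(v, k)
    return np.outer(v, pad) + np.outer(pad, v)

v = np.array([2.0, 3.0])          # a e_1 + b e_2 with a = 2, b = 3
m = delta(v)

assert np.allclose(np.diag(m), 2 * v)   # diagonal carries 2a and 2b
assert m[0, 1] == v[0] + v[1]           # off-diagonal carries a + b
```

Unlike the basis-copying ∆, the diagonal here carries the weights twice, which is the sense in which this map approximates two copies.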

Clasp Diagrams
In order to display the semantic computations for the parasitic gap, we introduce a diagrammatic notation. The derivation of the parasitic gap is involved, and its categorical and vector space interpretations become far more legible using a diagrammatic language. In what follows, we first introduce notation for the clasp diagrams, then extend them with the extras necessary to model the !-coalgebra modality.
We adopt the usual notation of objects as labelled strings and morphisms as boxes.
Composition of morphisms is simply vertical juxtaposition. The monoidal product and its internal left and right homs are depicted below. This is where the clasps of [9] are introduced, to depict the left and right internal hom-sets of the category.
There are special diagrams for the biclosed structure of the category, i.e. its left and right tensor-hom adjunctions, or Currying, which are depicted below.
Finally, the left and right evaluation morphisms of the category, also coming from the biclosed structure, are depicted graphically as directed cups, as follows. To these, we first add a diagram for !-ed objects. We then equip the notation with the necessary diagrams for the !-functor, that is, the diagonal maps ∆, the counit ε and the comultiplication δ. These are respectively depicted by a triangle node, a filled black circle, and a white circle labelled with δ, as follows.

Linguistic Examples
We now provide the interpretations of the motivating example of [29], which was the parasitic gap "the paper that John signed without reading". We first briefly illustrate how to do this with two simpler related examples, and then take the reader through the same calculations on the full, considerably more complex example.

"John signed the papers"
First off is the declarative sentence "John signed the papers", which uses the following lexicon: {(John, NP), (signed, (NP\S)/NP), (the, NP/N), (papers, N)}.
Showing that "John signed the papers" is a declarative sentence is now a question of whether the corresponding sequent, NP, (NP\S)/NP, NP/N, N → S, is derivable in !L*; in this case just L is enough, since none of the types are modal. We derive the sequent below. Note here that the use of vertical dots is to denote the equality of a string labelled with a complex type and a clasp with simpler types, i.e. the second equality in Section 6.
We end this example by computing the interpretation of the sequent for "John signed the papers". We do this in a general setting, without choosing a specific vector space semantics. This interpretation is of course a linear map i : ⟦NP⟧ ⊗ ⟦(NP\S)/NP⟧ ⊗ ⟦NP/N⟧ ⊗ ⟦N⟧ → ⟦S⟧, or equivalently, upon expanding the domain, a linear map on the expanded tensor of spaces. By inspecting the diagram, we see how i evaluates its arguments. We often use the names of the types, i.e. the words, as variable names to remind us what type the variables have. The parentheses and hyphens denote functional types. One could demonstrate the effect of this interpretation morphism using more short-hand variable names, but for longer examples this becomes difficult to keep track of. However, later, when using more complex types, we will have to use more explicit variable names, which involve the various bases of the vector spaces they live in.
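Concretely, under the expanded internal homs, the interpretation evaluates by tensor contraction. The following sketch uses hypothetical toy dimensions (our own illustration, not the paper's data): the transitive verb lives in ⟦NP⟧ ⊗ ⟦S⟧ ⊗ ⟦NP⟧, the determiner is a matrix, and the sentence vector is the contraction of the verb tensor with its subject and object.

```python
import numpy as np

# Toy vector space semantics for "John signed the papers" (hypothetical
# dimensions): signed : NP (x) S (x) NP, the : N -> NP as a matrix.
dNP, dS = 4, 2
rng = np.random.default_rng(1)

john = rng.standard_normal(dNP)
signed = rng.standard_normal((dNP, dS, dNP))  # subject, sentence, object axes
the = rng.standard_normal((dNP, dNP))         # matrix for the determiner
papers = rng.standard_normal(dNP)

obj = the @ papers                             # "the papers" : NP
sentence = np.einsum('i,isj,j->s', john, signed, obj)
assert sentence.shape == (dS,)
```

The einsum contraction is exactly the pair of evaluation maps (cups) in the diagram: the subject wire plugs into the first verb axis and the object wire into the last.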
(2) We apply the same method to calculate the interpretation of D_r: taking (m_j)_j as a basis of N gives the arbitrary element Σ_j C_j m_j^⊥ ⊗ m_j ∈ N ⇒ N, yielding the corresponding linear map. (3) We combine the previous two interpretations using the (/L) rule. To make the application of the rule clear, we mark where the rule is applied using the same symbols as in the rule presentation in Table 1. The interpretation of the full derivation is then the resulting composite map.

Experimental Validation
The reader might rightly have been wondering which one of these interpretations, the Cogebra or the Cofree-inspired coalgebra model, produces the correct semantic representation. We provide an answer by performing an experimental evaluation of these copying maps. In a nutshell, this is done by implementing the resulting vector representations on large corpora of data and experimenting with a task that provides insights. We chose an extension of a disambiguation task often used in DisCoCat. For the implementation, we use different state-of-the-art neural word embeddings for nouns and follow a well-known procedure to turn them into verb matrices. We instantiate our copying maps on these vectors and matrices, create phrase vectors, and use these in the disambiguation task.

The Parasitic Gap Phrase Disambiguation Task
The disambiguation task is the one originally proposed in [22], but we work with the advanced dataset of [31], which contains verbs deemed genuinely ambiguous by [49], that is, verbs whose meanings are not related to each other. We extended this latter dataset with a second verb and a preposition, which provided enough data to turn it from a set of pairs of transitive sentences into a set of pairs of parasitic gap phrases. As an example, consider the verb file, with meanings register and smooth. The original dataset has the sentences:

S: accounts that the local government filed
S1: accounts that the local government registered
S2: accounts that the local government smoothed
S′: nails that the young woman filed
S′1: nails that the young woman registered
S′2: nails that the young woman smoothed

We extend these to parasitic gap phrases:

P: accounts that the local government filed after inspecting
P1: accounts that the local government registered after inspecting
P2: accounts that the local government smoothed after inspecting
P′: nails that the young woman filed after cutting
P′1: nails that the young woman registered after cutting
P′2: nails that the young woman smoothed after cutting

The extension process was as follows. For each entry of [31], we needed an extra verb and a preposition to turn it into a parasitic gap phrase. The verb was chosen from a list of verbs that most frequently collocated with the verb of the original sentence. The preposition was chosen either from examples of phrases with these verbs in the context of the first verb, or decided based on the meaning of the second verb. We annotated each entry with a binary label, where 0 indicated a bad disambiguation pair and 1 a good one. For instance, in the above example, the pairs that got a 1 label were (P, P1) and (P′, P′2), whereas the pairs that got a 0 label were (P, P2) and (P′, P′1). The full dataset is available online [53].
We then followed the same procedure as in [31] to disambiguate the phrases containing the ambiguous verb: (1) build vectors for the phrases P, P1, and P2, and also P′, P′1, and P′2; (2) check whether the vector of P is closer to the vector of P1 or to that of P2, and whether the vector of P′ is closer to that of P′2 or that of P′1; if so, we count two correct outputs; (3) compute a mean average precision (MAP) by counting in how many of the pairs the vector of the phrase with the ambiguous verb in it is closer to the vector of the phrase with its appropriate meaning.
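Steps (1)-(3) can be sketched as follows. This is a minimal illustration, not the actual experiment code: the toy vectors are hypothetical stand-ins for phrase embeddings, and cosine similarity is assumed as the closeness measure, as is customary in this line of work.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two phrase vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def disambiguation_score(triples):
    # Each triple is (ambiguous, good, bad): the vector of the phrase with
    # the ambiguous verb, the vector of the phrase with its appropriate
    # meaning, and the vector of the inappropriate one.  Return the
    # fraction of triples where the ambiguous phrase is closer to the
    # appropriate paraphrase, i.e. the count described in step (3).
    correct = sum(cosine(a, g) > cosine(a, b) for a, g, b in triples)
    return correct / len(triples)

# Toy phrase vectors (illustrative only).
P  = np.array([1.0, 0.2, 0.1])   # "... filed after inspecting"
P1 = np.array([0.9, 0.3, 0.1])   # "... registered after inspecting"
P2 = np.array([0.1, 0.1, 0.9])   # "... smoothed after inspecting"
print(disambiguation_score([(P, P1, P2)]))  # → 1.0
```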

Word Vectors and Verb Matrices
We work with the parasitic gap phrases that have the general form "A's the B C'ed Prep D'ing", where C and D are verbs whose vector representations are multilinear maps: C is a bilinear map that takes A and B as input, and D is a linear map that takes A as input. We represent the preposition Prep by the trilinear map Prep. For C and D multilinear maps and A and B vectors, the vector representation of the parasitic gap phrase with a proper copying operator is denoted as follows. The multilinear maps that implement the verbs are built using a method referred to as Relational in DisCoCat [22]. Given a verb V, this map is built following the formula given below, where the s_i's are the vector representations of the subjects of the verb V across the corpus and the o_i's are the vector representations of its objects. The algorithm that implements this formula works as follows:

1: set sum to 0
2: go over a part-of-speech tagged corpus, sentence by sentence
3: for each sentence whose main verb is V, extract its nominal subject and object
4: retrieve the vector representations of the subject and the object
5: take their Kronecker tensor
6: add the Kronecker tensor to sum

The implementation details were as follows:
1. Each Kronecker tensor is cumulatively added to the Kronecker tensors obtained from the other sentences of the corpus that have V as their main verb.
2. In cases where subjects/objects are noun phrases, the main noun of the phrase is extracted and used as the subject/object. This is what is referred to as nominal in the pseudocode.
3. For the vector representations of the nominal subjects and objects of the sentences, we experimented with three neural embedding architectures: BERT [18], FastText (FT) [13], and Word2Vec CBOW (W2V) [39]. For BERT, the architecture itself provides phrase embeddings, read from an internal hidden layer of the network. For FT and W2V, as is customary, we added the word embeddings to obtain phrase embeddings. Prep was taken to be addition in all cases.
5. For each verb, we collected sentences from the ukWaC corpus [8] using the NoSketch web interface [3], filtering for sentences having the input verb as their main verb. For each phrase of the dataset, we also built phrase vectors using the neural architectures.
6. For the actual implementations, we used the NumPy library [5] for Python 3. NumPy is the fundamental package for scientific computing in Python, providing a multidimensional array object and an assortment of routines for fast operations on arrays, including basic linear algebra. The Python scripts were run using Google's Colab tool, which provides access to extra GPUs and RAM. The notebooks are available on a Google Drive [4], along with the code for the BERT embeddings and the tensors we built from them. The W2V and FT embeddings and their associated tensors [61] were developed for the experiments of previous work [64].
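The Relational construction above can be sketched in a few lines of NumPy. This is a minimal illustration of the algorithm, not the actual experiment code: `relational_verb_matrix`, the `embed` lookup, and the toy 2-dimensional vectors are all hypothetical stand-ins for the real corpus extraction and embeddings.

```python
import numpy as np

def relational_verb_matrix(subj_obj_pairs, embed):
    # Sum, over the corpus sentences whose main verb is V, of the
    # Kronecker (outer) product of the subject and object vectors,
    # as in the pseudocode above.
    dim = len(next(iter(embed.values())))
    total = np.zeros((dim, dim))
    for s, o in subj_obj_pairs:
        total += np.outer(embed[s], embed[o])
    return total

# Toy 2-dimensional embeddings (illustrative only).
embed = {"government": np.array([1.0, 0.0]),
         "woman":      np.array([0.0, 1.0]),
         "accounts":   np.array([1.0, 1.0]),
         "nails":      np.array([0.0, 2.0])}

# Two toy "sentences" with the same main verb: (subject, object) pairs.
V = relational_verb_matrix([("government", "accounts"), ("woman", "nails")], embed)
print(V)
```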
On a historical note, the Relational method was originally introduced in [22] to provide an experimental validation for the DisCoCat model, and later studied further in [23].This method was used in experimental validations of applications of DisCoCat, e.g. in [30,31], its comparison with neural word embeddings [40] and extensions of DisCoCat, e.g. to entailment [56] and to larger fragments of language containing relative clauses [54].

Copying Maps and Phrase Vectors
The different types of copying maps developed in this paper provide us with the following options for the vector representation of each parasitic gap phrase.

Cogebra copying
When the Relational multilinear map of each verb is applied to the vectors of the subject A and object B of each parasitic gap phrase, the above expressions simplify. In this case, the copying maps reduce to the following vector representations. For comparison, we also implemented a model with a Full copying operation ∆(v) = v ⊗ v. Note that this copying is non-linear and thus cannot be an instance of our FdVect categorical semantics; we include it only to study how the other copying models do in relation to it.
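To illustrate how the copying operators differ, the following sketch computes the two copies of the head noun vector that each operator hands to the two verbs. It assumes the simplified behaviour discussed in the Results section: Cogebra (a) hands the full vector to the main verb and a vector of 1's to the secondary verb, Cogebra (b) does the reverse, and the Cofree-inspired copying sums the two; the function names and the 2-dimensional vector are illustrative only.

```python
import numpy as np

def full(v):
    # Non-linear Full copying: two identical copies of v.
    return v, v

def cogebra_a(v):
    # Full copy for the main verb C, vector of 1's for the secondary verb D.
    return v, np.ones_like(v)

def cogebra_b(v):
    # Vector of 1's for the main verb C, full copy for the secondary verb D.
    return np.ones_like(v), v

def cofree(v):
    # Cofree-inspired copying: the sum of the two Cogebra possibilities
    # (assuming the sum is taken copy-wise).
    a1, a2 = cogebra_a(v)
    b1, b2 = cogebra_b(v)
    return a1 + b1, a2 + b2

A = np.array([2.0, 3.0])   # toy vector for the head noun A
print(full(A))
print(cofree(A))
```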

Results
The results of experimenting with these models are presented in Table 3. Uniformly, across all the neural architectures, the Full model provided better disambiguation than the other, linear, copying models. This performance was closely followed by the Cofree-inspired model: in BERT, the Full model obtained a MAP of 0.48 and the Cofree-inspired model a MAP of 0.47; in FT, we have 0.57 for Full and 0.56 for Cofree-inspired; and in W2V we have 0.54 for both models. Also uniformly across the neural architectures, Cogebra (a) did better than Cogebra (b). It is not surprising that the Full copying did better than the other two copyings, since this is the model that provides two identical copies of the head noun A, a kind of copying that can only be obtained via the application of a non-linear ∆. The fact that our linear Cofree-inspired copying closely followed the Full model shows that, in the absence of Full copying, we can always use the Cofree-inspired copying as a reliable approximation. It was also not surprising that the Cofree-inspired model did better than either of the Cogebra models, as it uses the sum of the two possibilities, each encoded in one of Cogebra (a) or (b). That Cogebra (a) performed better than Cogebra (b) shows that it is more important to give a full copy of the object to the main verb than to the secondary verb of a parasitic gap phrase. In other words, the verb C, which got a full copy of its object A, played a more important role in disambiguation than the verb D, which only got a vector of 1's as a copy of A. Again, this is natural, as the secondary verb only provides subsidiary information.
The most effective disambiguation of the new dataset was obtained via the BERT phrase vectors, followed by the Full model. BERT is a contextual neural network architecture that provides different meanings for words in different contexts, using a large set of parameters tuned on large corpora of data. There is evidence that BERT's phrase vectors encode some grammatical information, so it is not surprising that these embeddings provided the best disambiguation result. With the other neural embeddings, W2V and FT, however, the Full model and its Cofree-inspired approximation provided better results. Recall that in these models phrase embeddings are obtained by simply adding the word embeddings, and addition forgets the grammatical structure. That the categorical models, which are type-driven and work along the grammatical structure, outperformed these models is a very promising result.
We note that the MAPs of all models were quite high in comparison with the results on the original dataset of [22], which consists of Subject-Verb-Object (SVO) sentences. That our results are better means that turning the SVO sentences into parasitic gap phrases, which are longer and provide more context for disambiguation, helped disambiguate the verbs better. Finally, although our goal has not been to outperform holistic neural phrase embeddings, which do not perform copying or any other explicit grammatical operation, in two out of three cases (W2V and FT) our best model had better accuracy.

Conclusions and Further Work
We have provided sound categorical and vector space semantics for the Lambek calculus with a relevant modality, and have introduced a candidate diagrammatic semantics. We provided three different vector space interpretations of the relevant modality and experimented with them in a disambiguation task. In order to do so, we extended the dataset of [31] to parasitic gap phrases with a main ambiguous verb. We implemented the models using three different neural network architectures. One of our interpretations performed very similarly to, and in one case the same as, a full but non-linear copying model. The best categorical phrase models performed better than the additive neural phrase embeddings. The state-of-the-art neural phrase embedding (BERT), however, provided the overall best disambiguation result. This is not surprising, since these models encode both contextual and structural phrase information. The results of the categorical models can be improved by building better quality bilinear maps: a direction we are pursuing at the moment.
Proving coherence of the diagrammatic semantics using the proof nets of Modal Lambek Calculus [41], developed for clasp-string diagrams in [65], constitutes work in progress. Proving coherence would allow us to carry out all our derivations diagrammatically, making the sequent calculus labour superfluous. However, we suspect there are better notations for the diagrammatic semantics, perhaps more closely related to the proof nets of linear logic. Another path to explore is that of differential categories. The structure of a differential category as laid out in [12] seems an appropriate setting for our work, yet we do not make full use of the actual differential structure in this paper, only of the coalgebra modality. Perhaps there is more structure available with useful linguistic applications.
Another avenue to explore is to alter the underlying syntax, i.e. !L*. There appears to be a way to obtain a model of contraction which in practice is exactly the full copying, but whose underlying theory yields it as a linear map, namely as a projection from a bounded tensor algebra. This could be achieved by bounding the !-functor in a style similar to that of Bounded Linear Logic [21] or Soft Linear Logic [32].
which was accepted as a keynote presentation at the Applied Category Theory Conference (ACT) 2020, organised online at MIT, 6-10 July 2020. Part of the motivation behind this work came from the Dialogue and Discourse Challenge project of the Applied Category Theory adjoint school during the week of 22-26 July 2019. We would like to thank the organisers of the school. We would also like to thank Adriana Correia, Alexis Toumi, and Dan Shiebler for discussions. McPheat acknowledges support from the UKRI EPSRC Doctoral Training Programme scholarship, and Sadrzadeh from the Royal Academy of Engineering Industrial Fellowship IF-192058.

Proof 1 (Sketch) Given the interpretation (| |) : !L* → D, we define a functor T_(| |) : C(!L*) → D as T_(| |)([A]) := (|A|). To show that T_(| |) is unique, we simply suppose there is some other functor T : C(!L*) → D which makes (1) commute. Then T([A]) = (|A|) = T_(| |)([A]), so T and T_(| |) are identical on objects. We also see that T and T_(| |) agree on morphisms, since we require (| |) to be sound in Definition 4. We recommend the motivated reader to confirm that T and T_(| |) agree on morphisms, and omit this part of the proof here.

Appendix A
Accepted in Compositionality on 2022-09-16.
N → N
NP → NP
NP → NP
S → S
NP, NP\S → S (\L)
NP, (NP\S)/NP, NP → S (/L)
NP, (NP\S)/NP, NP/N, N → S (/L)

Next, we introduce the diagrammatic interpretation of the derivation. This diagram is drawn by first drawing the types as (loose) strings, forming the top of the diagram, and then tracing up the derivation and connecting the strings according to the definition in Section 6.

Table 3: Parasitic Gap Phrase Disambiguation Results