Compositionality

The open-access journal for the mathematics of composition

Language Modeling with Reduced Densities

Tai-Danae Bradley1 and Yiannis Vlassopoulos2

1Sandbox@Alphabet, Mountain View, CA 94043, USA
2Tunnel, New York, NY 10021, USA

ABSTRACT

This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance, but also---and quite crucially---because they are built entirely from correlations in unstructured text data. The latter observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter category leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.
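The two ingredients of the abstract can be illustrated with a small sketch. The code below is an illustrative toy, not the paper's exact construction: it restricts containment to prefixes, takes `hom(x, y)` to be the conditional probability of extending the expression `x` to `y` (the enrichment over the unit interval with multiplication), and stands in for the reduced density operators with diagonal support projections, on which the Loewner order recovers the entailment direction (more specific expressions entail less specific ones). All names here are hypothetical.

```python
import numpy as np
from collections import Counter

# Toy corpus of sequences over a small alphabet.
corpus = ["ab", "ab", "ac", "abc", "abc", "abc"]
probs = {s: n / len(corpus) for s, n in Counter(corpus).items()}
seqs = sorted(probs)

def pi(x):
    # Probability that a corpus sample begins with the expression x.
    return sum(p for s, p in probs.items() if s.startswith(x))

def hom(x, y):
    # Enrichment over the unit interval ([0,1], *): the conditional
    # probability of extending the prefix x to the prefix y.
    if not y.startswith(x) or pi(x) == 0:
        return 0.0
    return pi(y) / pi(x)

# Composition law of a [0,1]-enriched category:
# hom(x, y) * hom(y, z) <= hom(x, z).
assert hom("a", "ab") * hom("ab", "abc") <= hom("a", "abc") + 1e-12

def support_projection(x):
    # Diagonal projection onto the corpus sequences extending x; a
    # commuting stand-in for the support of a reduced density operator.
    return np.diag([1.0 if s.startswith(x) else 0.0 for s in seqs])

def loewner_leq(A, B, tol=1e-10):
    # A <= B in the Loewner order iff B - A is positive semidefinite.
    return bool(np.all(np.linalg.eigvalsh(B - A) >= -tol))

# Every sequence extending "ab" also extends "a", so the projections
# are Loewner-ordered: the more specific expression entails the less
# specific one, but not conversely.
assert loewner_leq(support_projection("ab"), support_projection("a"))
assert not loewner_leq(support_projection("a"), support_projection("ab"))
```

In this toy the composition inequality holds with equality for nested prefixes, since conditional probabilities compose multiplicatively; the genuinely non-commuting reduced densities of the paper refine this picture.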


Cited by

[1] Mohammad Ali Javidian, Vaneet Aggarwal, and Zubin Jacob, "Quantum causal inference in the presence of hidden common causes: An entropic approach", Physical Review A 106 6, 062425 (2022).

[2] Bojan Žunkovič, "Deep tensor networks with matrix product operators", Quantum Machine Intelligence 4 2, 21 (2022).

[3] Tai-Danae Bradley, John Terilla, and Yiannis Vlassopoulos, "An Enriched Category Theory of Language: From Syntax to Semantics", La Matematica 1 2, 551 (2022).

The above citations are from Crossref's cited-by service (last updated successfully 2024-04-28 04:42:55). The list may be incomplete as not all publishers provide suitable and complete citation data.
