Compositionality

The open-access journal for the mathematics of composition

Categorical Stochastic Processes and Likelihood

Dan Shiebler

Department for Continuing Education and Department of Computer Science, University of Oxford, Oxford, United Kingdom

Updated version: The author has uploaded version v5 of this work to the arXiv, which may contain updates or corrections not contained in the published version v4.

ABSTRACT

We take a category-theoretic perspective on the relationship between probabilistic modeling and gradient-based optimization. We define two extensions of function composition to stochastic process subordination: one based on a co-Kleisli category and one based on the parameterization of a category with a Lawvere theory. We show how these extensions relate to the category of Markov kernels $\mathbf{Stoch}$ through a pushforward procedure.
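As a concrete finite, discrete illustration of composition and pushforward in $\mathbf{Stoch}$ (a minimal sketch, not the constructions developed in the paper): a Markov kernel between finite sets can be represented as a row-stochastic matrix, composition is then matrix multiplication, and pushing a distribution forward along a kernel is vector-matrix multiplication. The helper names `compose` and `pushforward` below are ours.

```python
# Minimal sketch: Markov kernels between finite sets as row-stochastic matrices.
# Composition is matrix multiplication; pushforward is vector-matrix multiplication.
import numpy as np

def compose(k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """Compose Markov kernels k1 : X -> Y and k2 : Y -> Z (rows sum to 1)."""
    return k1 @ k2

def pushforward(p: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Push a distribution p on X forward along the kernel k : X -> Y."""
    return p @ k

k1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])   # kernel X -> Y
k2 = np.array([[0.5, 0.5],
               [0.3, 0.7]])   # kernel Y -> Z
p = np.array([0.6, 0.4])      # distribution on X

# The composite is again a Markov kernel ...
assert np.allclose(compose(k1, k2).sum(axis=1), 1.0)
# ... and pushing forward along the composite agrees with pushing forward in stages.
assert np.allclose(pushforward(p, compose(k1, k2)),
                   pushforward(pushforward(p, k1), k2))
```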

We extend stochastic processes to parametric statistical models and define a way to compose the likelihood functions of these models. We demonstrate how the maximum likelihood estimation procedure defines a family of identity-on-objects functors from categories of statistical models to the category of supervised learning algorithms $\mathbf{Learn}$.
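To make the connection between likelihoods and learning concrete for a single example (a hedged sketch, not the paper's definitions): for a parametric model with Gaussian observation noise, taking gradient steps on the negative log-likelihood yields exactly the kind of parameter-update map that a learner in $\mathbf{Learn}$ carries. The `predict`/`neg_log_likelihood`/`mle_step` decomposition and the finite-difference gradient below are illustrative choices of ours.

```python
# A hedged sketch: a parametric model with Gaussian observation noise, its
# negative log-likelihood (up to an additive constant), and a gradient step
# on that likelihood. The resulting parameter-update map is the kind of
# "update" component a learner in Learn carries.
import numpy as np

def predict(theta, x):
    """A simple parametric map R -> R (an affine model with parameters theta)."""
    w, b = theta
    return w * x + b

def neg_log_likelihood(theta, x, y, sigma=1.0):
    """Negative log-likelihood of y under N(predict(theta, x), sigma^2),
    dropping the additive constant that does not depend on theta."""
    residual = y - predict(theta, x)
    return 0.5 * np.sum((residual / sigma) ** 2)

def mle_step(theta, x, y, lr=5e-3, eps=1e-6):
    """One gradient-descent step on the negative log-likelihood, with the
    gradient estimated by central finite differences."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        grad[i] = (neg_log_likelihood(theta + d, x, y)
                   - neg_log_likelihood(theta - d, x, y)) / (2 * eps)
    return theta - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)   # data generated with w=2, b=1

theta = np.array([0.0, 0.0])
for _ in range(2000):
    theta = mle_step(theta, x, y)
print(theta)  # close to the maximum-likelihood estimate, roughly [2.0, 1.0]
```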

Code to accompany this paper can be found on GitHub (https://github.com/dshieble/Categorical_Stochastic_Processes_and_Likelihood).

Cited by

[1] Geoffrey S. H. Cruttwell, Bruno Gavranović, Neil Ghani, Paul Wilson, and Fabio Zanasi, Lecture Notes in Computer Science 13240, 1 (2022) ISBN:978-3-030-99335-1.

[2] Georgios Bakirtzis and Ufuk Topcu, 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS) 308 (2022) ISBN:978-1-6654-0967-4.
