Priors for Semantic Variables
Some aspects of the world around us are captured in natural language and refer to high-level semantic variables, which often play a causal role (referring to agents, objects, and actions or intentions). These high-level variables also seem to satisfy very particular characteristics that low-level data (like images or sounds) do not share, and it would be useful to state these characteristics as priors that can guide the design of machine learning systems benefiting from these assumptions. Since these priors concern not just the joint distribution over the semantic variables (e.g., that it has a sparse factor graph corresponding to a modular decomposition of knowledge) but also how that distribution changes (typically through causal interventions), this analysis may also help to build machine learning systems that generalize better out-of-distribution. Introducing such assumptions is necessary to even begin formulating a theory of out-of-distribution generalization. There are also fascinating connections between these priors and what is hypothesized about conscious processing in the brain, with conscious processing allowing us to reason (i.e., perform chains of inferences about the past and the future, as well as credit assignment) at the level of these high-level variables. This involves attention mechanisms and short-term memory, which form a bottleneck on the information broadcast between different parts of the brain as we focus on different high-level variables and some of their interactions. The presentation summarizes a few recent results that use some of these ideas to discover causal structure and to modularize recurrent neural networks with attention mechanisms, in order to obtain better out-of-distribution generalization and move deep learning towards capturing some of the functions associated with conscious processing over high-level semantic variables.