Defining a Model

In formal methods and other areas of computer science and software engineering (e.g., model-driven engineering), models are a common occurrence. But despite their widespread use, models are hard to define. Many people have sought to define and characterize what constitutes a model, and this post will review some of these characterizations.

The first notions of a model that will be considered come from Thomas Kühne in 2005. He starts by suggesting that in the past, models have generally been artifacts developed in a modelling language, but more recently, even Java programs have been considered models [1]. After reviewing the ideas of a few other discussions on the topic (namely, those put forth by Stachowiak and Steinmüller), Kühne suggests that models should be a projection (which is a reduction) of an original and that they must be exact, but not complete. Models must not copy the object being modeled, and Kühne dedicates a section to arguing that a copy is not a model: a copy allows for “test runs” rather than model simulations. However, models must still be exact with respect to the properties that are being modeled.
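To make the projection and exactness criteria concrete, here is a minimal Python sketch. It is my own illustration, not something taken from [1]: the function names and the car example are hypothetical, and an "original" is simplified to a flat dictionary of properties.

```python
def project(original: dict, properties: set) -> dict:
    """Build a model by keeping only the chosen properties (a reduction)."""
    return {name: value for name, value in original.items() if name in properties}


def is_exact(model: dict, original: dict) -> bool:
    """A model is exact if every property it keeps agrees with the original."""
    return all(name in original and original[name] == value
               for name, value in model.items())


def is_copy(model: dict, original: dict) -> bool:
    """A copy keeps everything, so nothing has been reduced."""
    return model == original


# Hypothetical original: a car with four properties.
car = {"mass_kg": 1200, "wheels": 4, "colour": "red", "owner": "unknown"}

# A model for, say, a braking simulation only needs two of them.
braking_model = project(car, {"mass_kg", "wheels"})
```

Under these assumptions, braking_model is exact (its values match the car's) but not complete (colour and owner are gone), while project(car, set(car)) would merely be a copy.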

Interestingly, and perhaps counter-intuitively to many, Kühne suggests that models are not (necessarily) a description of the source object. Indeed, some models fail to describe the system they are modeling, as they only model a few interesting properties of it. Though I do agree, I think Kühne should have noted explicitly that models often do describe the systems they represent. Kühne does, however, go on to distinguish two types of models: those used in the design phase of software and those used in the analysis phase. In doing this, Kühne admits that some models describe problems. He also notes that the source object being modeled may be theoretical, as in the case of models used in the design phases of software development.

The remainder of Kühne’s paper discusses metamodels. The discussion arises from the fact that there are two fundamentally different types of models, which can cause the term metamodel (and others) to be misunderstood. The two types are token models and type models [1].

Token models are models that capture a “single configuration” and represent single object instances with their elements [1]. Kühne notes that the relation “representation of” is transitive for token models. A representation of a representation of an object is still a representation of the original; the paper uses a map example with varying map scales to clarify. Whether the relation between two models is transitive should determine whether the prefix meta- applies, and the prefix does not make sense for models related by a transitive relation such as “representation of”. Thus two token models representing instances of the same object are two models; neither of them is a metamodel. A model of a token model is a metamodel if it models the token model’s properties, rather than its contents.

Type models capture “universal aspects” of an original object. Furthermore, type models show only types of interest. A UML class diagram is an example of a type model.
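The difference between the two kinds of models, and the transitivity of “representation of” for token models, can be sketched in a few lines of Python. This is only an illustration loosely inspired by the map example; the place names, the population thresholds, and the helper functions are all hypothetical and do not come from [1].

```python
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    kind: str
    population: int


# Hypothetical "original": the places that exist in some territory.
TERRITORY = [
    Place("Avalon", "city", 900_000),
    Place("Brookfield", "town", 40_000),
    Place("Cedar", "village", 800),
]


def token_model(places, min_population):
    """Token model: one element per retained instance (like a map at some scale)."""
    return [p for p in places if p.population >= min_population]


def type_model(places):
    """Type model: only the kinds of things, not the individual instances."""
    return {p.kind for p in places}


fine_map = token_model(TERRITORY, 1_000)       # a model of the territory
coarse_map = token_model(fine_map, 100_000)    # a model of the fine map

# Transitivity: every element of the coarse map still stands for a place
# in the territory, so the coarse map is a model, not a metamodel.
still_represents_territory = all(p in TERRITORY for p in coarse_map)

kinds = type_model(TERRITORY)
```

In this sketch coarse_map is just a smaller-scale map of the same territory, whereas kinds says nothing about individual places and only captures which types of places there are, much as a class diagram would.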

Lastly, Kühne clarifies the notion that a true metamodel can always be understood as a language definition, and that metamodels should still be considered models. Metamodels are token models with reduction features of the language that they model, and are potentially type models that can describe all models that would be represented by the metamodel.

In a paper by Pierre-Alain Muller et al., “Modeling Modeling,” the authors suggest that software engineers have been using models without knowing it. In particular, there is a hidden relation that modelers rely on, which the authors make explicit. The paper focuses primarily on this relation after a relatively small section on model definitions.

The definitions considered, however, reflect varying understandings and are treated briefly in subsections. The first such subsection lists features that models should have, which are represented fairly well by the notions described above by Kühne (projection and exactness), though in greater detail. The ideas stem from Stachowiak (again) and Bran Selic, though Selic interestingly also lists understandability and inexpensiveness, while Stachowiak makes the notion of a mapping explicit. However, as models are not supposed to be copies, I suspect that many people would presume that they are cheaper and that inexpensiveness is implicit. The next subsection describes kinds of models, listing ideas from Seidewitz and Favre that largely overlap: a distinction between descriptive and specification models, which was touched on briefly by Kühne. It ends with Kühne’s distinction between type and token models.

Next, a model of modelling is developed. The authors aim to “designate a representation of what we manipulate when we use modeling techniques” [2]. Abstracting away the details of models, the authors consider “things” and “arrows,” which are roughly objects and relations, citing a foundation in category theory. The arrow graphics are varied in shape (straight to wavy, for example) to express different types of relations between two objects and their intentions. An object’s intention is what the term implies: the given purpose it exists to fulfill. Using labels on the arrows of the “representation of” relation, notions such as descriptive and specification models can be expressed. The term causality is also introduced, which communicates when a descriptive model is correct and a specification model is valid. To cover transitivity in modeling models, a framework is established that allows combinations to have multiple outputs depending on the input models and their intentions.

The remainder of this paper deals with examples that primarily serve to display the new notation. These examples will not be covered here, and interested readers should seek them in [2].

Lastly, we consider the work of Jochen Ludewig, who provides an introduction to models in software engineering [3]. Again, Stachowiak’s criteria come up as a basis for model definitions. This time, however, they are examined in greater depth than in the papers previously considered. In particular, there is an emphasis on models that are practical when the original is not, and on the fact that although models are reductions of the original, they often include additional attributes, which Ludewig calls abundant attributes, that are present in the model but not in the original. These attributes often allow the model to be useful, and Ludewig gives Z syntax as an example of possible abundant attributes.

Unlike the other papers, Ludewig also considers terms that are related to “model”, namely tool, icon, name, metaphor, and theory. Ludewig argues that tool and name should not be used like model; tools lose their relationship to any original as they evolve, and names do not provide information about an object. Icons (or symbols) are similarly not models: they do not provide information, except in a few cases, and even then the amount of information provided is not substantial. Metaphors and theories, on the other hand, are models. Metaphors are models because the comparison (usually) reveals some information about the original, and they are particularly useful for new entities. Theories are abstract models that emphasize “results and conclusions rather than obviousness” [3].

Ludewig also mentions the notions of prescriptive (specification) and descriptive models; these are not new terms by now. However, Ludewig introduces the term transient models for models that are both, at different times. Furthermore, Ludewig emphasizes that descriptive models do not need to come after the original. Using the example of a model that describes the weather, Ludewig shows that such a descriptive model can describe the behaviour of upcoming weather before it happens. Note that this is not a specification model: the weather was not created due to the model.

Next, Ludewig goes on to describe purposes of models, saying that such purposes can be used to classify models. Models can be used as documentation or to provide instructions, as exploratory entities used to simulate changes, as practical replacements for originals in an educational manner, or as formal “mathematical” models (e.g. the formulas used in physics and chemistry) [3].

Ludewig then considers the role of some of these classifications in software engineering. In particular, he claims that models in software engineering are primarily prescriptive; models in software engineering generally have the end goal of code in mind. One exception that Ludewig points out is a requirements document: while it is true that it is a prescriptive model for other development documents, it is also a descriptive model of the user’s needs. Additionally, since models often prescribe other models in software engineering, Ludewig reminds the reader that at each prescription, some information is lost, and in order to keep systems consistent, tracing should be possible. Tracing is the “identification” of modifications in different models when the models that describe them change [3]. Backward tracing is also possible.

Beyond this, Ludewig argues that since the systems used in software engineering only mimic the existing world, software engineering is limited. In particular, systems that “depart” from traditional models are some of the greatest engineering achievements, and this is not common in software systems. A few exceptions are noted, such as quicksort, which is quite fast on a system but not commonly performed by hand. Ludewig has hope that eventually models of software development will be sufficiently well understood that such breakthroughs will be more common, but believes there is much work to be done before that occurs. Ludewig also notes that there are risks from using models in software engineering (and elsewhere): people often look for faults in reality, rather than faults in their models. In order to advance the state of software engineering, Ludewig and his research group have developed a system called Software Engineering Simulation using Animated Models (SESAM). The system is briefly described in the paper and aims to simulate software engineering in a manner similar to simulating flight for a pilot. In this case, however, the “pilot” is a project manager for a software project. After some use, the system revealed that descriptive models are difficult to work with; for example, realism has to be compromised for simplicity.

Switching back to a focus on prescriptive models, Ludewig says that while they are easy to develop, they are hard to test. Notably, Ludewig says that most research projects do not go far enough after conception to demonstrate their superiority or applicability. He concludes by saying research on comparing existing ideas should be more common in order to counter this problem.

In conclusion, there are various notions that a good model should satisfy. Some of the more common ones have been visited above, namely the need for a reduction, mapping, and practicality. Models should also be accurate with respect to the properties they model. Models can further be classified as specification, descriptive, or transient, and by the purpose they serve. Models can also be modeled by metamodels, provided these model the properties and not the content of the original model. Hopefully the definitions considered have clarified (or confirmed) your own notions.


[1] T. Kühne, What is a Model? in Language Engineering for Model-Driven Software Development, Dagstuhl Seminar 04101, 2005.
[2] P.-A. Muller, F. Fondement, and B. Baudry, Modeling Modeling, in Proc. of the 12th International Conference on Model Driven Engineering Languages and Systems (MODELS), LNCS 5795, Springer-Verlag, pp. 2-16, 2009.
[3] J. Ludewig, Models in Software Engineering – An Introduction, in Software and Systems Modeling, 2(1): pp. 5-14, March 2003.

