
The following shows that the generalized information theory rests on more general mathematical concepts than Shannon’s information theory.

Let us use predictions about rain as an example to explain the distinctions among three types of probability.

1) Assume that an observatory has made the prediction “Tomorrow will be rainy” many times. From the statistics of objective facts we can derive the frequency (or percentage) of rain, such as 70%. This frequency is called objective probability. The probability that a prediction is selected is also derived from the statistics of objective facts and hence is also called objective probability. In the following, an objective probability is denoted by P( ).

2) The probability of future rain that an audience forecasts on the basis of a weather forecast is called subjective probability, in the sense used by the Bayesian school (Chen, 1982). Some researchers also call it possibility. In the following, Q(x_i) and Q(x_i|A_j) denote subjective probabilities.

3) For a certain weather or rainfall (x_i), the probability that a proposition such as “It is very rainy” is judged true is a logical probability, as discussed by Reichenbach (1982) and other logical empiricists. Researchers often call it degree of belief. In this paper, Q(A_j|x_i) and Q(A_j) denote logical probabilities.

The membership grade in fuzzy mathematics is simply a different expression of the logical probability of a proposition. Their relationship is examined as follows.

Let a set of objects be A={x₁, x₂, ..., x_m} and a set of predicates be B={y₁, y₂, ..., y_n}, and let X and Y be random variables taking values in A and B respectively. For example, A is a set of different ages and B is a set of predicates such as “X is about twenty” (X here is a placeholder to be filled with a person’s age, denoted by x_i, i=1, 2, ..., m). We use y_j or y_j(X) to stand for a predicate, and y_j(x_i) for a proposition. If X is given, such as X=x₁ or X=Tom’s age, then the predicate becomes a proposition: “Tom’s age is about twenty”. If a proposition is a judgment or conjecture about an uncertain event, it is also called a prediction.

Following the random-set projection theory presented by Wang (1987), we may treat a fuzzy set as the statistical result of a random set (see Figure 1); thus, the membership grade is defined as follows:

$$Q(A_j|x_i) = \sum_k P(S_j = s_k)\, Q(s_k|x_i)$$

where Q(A_j|x_i) ∈ [0,1], S_j is a random set with respect to the fuzzy subset A_j of A, s_k is a given set taken as a value by S_j, and Q(s_k|x_i) ∈ {0,1} is the feature function of the set s_k.

In this paper, the logical probability of the proposition y_j(x_i), or the logical conditional probability of the predicate y_j, is defined to be the same as the membership grade, i.e.

For example, suppose Tom’s age (x_i) is 16. The membership grade of his age in the fuzzy set A_j={ages that are about x_j=20} is 0.45, as shown in Figure 1. The logical probability of the proposition “Tom’s age is about twenty” is then also 0.45. In this case, the clear (crisp) sets s_k, k=1, 2, ..., are the judgments that different persons make, in different cases, about the meaning of “about twenty years old”.
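The set-valued statistics behind the membership grade can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each crisp set s_k is modelled as a closed age interval, every sampled judgment carries equal weight, and the five intervals are hypothetical data (they are not the data of Figure 1, so the result differs from 0.45). The function names are likewise invented for this sketch.

```python
from typing import Iterable, Tuple

# Assumption: a crisp set s_k is modelled as a closed interval of ages.
Interval = Tuple[float, float]


def feature(s_k: Interval, x: float) -> int:
    """Feature function Q(s_k|x): 1 if x lies in the crisp set s_k, else 0."""
    low, high = s_k
    return 1 if low <= x <= high else 0


def membership_grade(samples: Iterable[Interval], x: float) -> float:
    """Fraction of sampled crisp sets that contain x (equal weight per sample)."""
    samples = list(samples)
    return sum(feature(s, x) for s in samples) / len(samples)


# Hypothetical judgments of "about twenty" collected from five people.
about_twenty = [(18, 22), (17, 23), (19, 21), (16, 24), (15, 25)]

grade_20 = membership_grade(about_twenty, 20)  # every interval contains 20
grade_16 = membership_grade(about_twenty, 16)  # two of the five intervals contain 16
print(grade_20, grade_16)
```

With these hypothetical intervals the grade is 1.0 at age 20 and 0.4 at age 16; a larger and more varied sample would give a smoother membership curve.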

It should be remembered that a predicate y_j has both an objective or selective probability P(y_j) and a logical probability Q(y_j is true)=Q(A_j). They are generally unequal. For instance, if a meteorological observatory always tells us “There will be no rain” (y_j), then the selective probability is P(y_j)=1; yet Q(A_j) depends on the meaning of the sentence and usually has nothing to do with the probability with which the sentence is selected. From statistical experience we might infer that the logical probability of “There will be no rain” is about 0.8. Similarly, P(y_j|x_i) is the conditional probability with which y_j is selected for given x_i, whereas Q(A_j|x_i) is the logical probability of the proposition y_j(x_i). They are very different. In brief, for objective and subjective probabilities, there are

2.2 Set-Bayesian Formula

Assume that A’ is a clear subset of A, that the feature function of A’ is Q(A’|x_i) ∈ {0,1}, and that the forecasted probability of the event X ∈ A’ is Q(A’). Then

$$Q(A') = \sum_i Q(x_i)\, Q(A'|x_i)$$

where Q(x_i) is the subjective or forecasted probability of x_i. Let us define Q(x_i|A’)=Q(X=x_i|X ∈ A’); hence the conditional probability of x_i in A’ is

$$Q(x_i|A') = \frac{Q(x_i)\, Q(A'|x_i)}{Q(A')}$$

The above formula is the Bayesian formula with a set as the condition, and hence is called the set-Bayesian formula. Let the clear set A’ be replaced with a fuzzy set A_j. Then we have the set-Bayesian formula for a fuzzy set:

$$Q(x_i|A_j) = \frac{Q(x_i)\, Q(A_j|x_i)}{Q(A_j)}, \qquad Q(A_j) = \sum_i Q(x_i)\, Q(A_j|x_i)$$
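A small numerical sketch may help make the set-Bayesian formula concrete. It assumes the formula has the usual Bayesian form, with the logical probability Q(A_j) obtained by summing Q(x_i)Q(A_j|x_i) over i and the posterior Q(x_i|A_j) obtained by dividing each term by that sum. The function name `set_bayes`, the age grid, the uniform prior, and the truth values are all hypothetical choices made for this illustration.

```python
def set_bayes(prior, truth):
    """Return (Q(A_j), [Q(x_i|A_j)]) from a prior Q(x_i) and truth values Q(A_j|x_i).

    Assumed form: Q(A_j) = sum_i Q(x_i) Q(A_j|x_i) and
    Q(x_i|A_j) = Q(x_i) Q(A_j|x_i) / Q(A_j).
    """
    q_a = sum(p * t for p, t in zip(prior, truth))
    if q_a == 0:
        raise ValueError("Q(A_j) is zero, so the posterior is undefined")
    posterior = [p * t / q_a for p, t in zip(prior, truth)]
    return q_a, posterior


ages = [16, 18, 20, 22, 24]          # hypothetical value set A
prior = [0.2, 0.2, 0.2, 0.2, 0.2]    # hypothetical forecasted probabilities Q(x_i)

# Fuzzy set "about twenty": hypothetical truth values in [0, 1].
fuzzy_truth = [0.4, 0.8, 1.0, 0.8, 0.4]
q_fuzzy, post_fuzzy = set_bayes(prior, fuzzy_truth)

# Clear (crisp) set {18, 20, 22}: the feature function takes values in {0, 1}.
crisp_truth = [0, 1, 1, 1, 0]
q_crisp, post_crisp = set_bayes(prior, crisp_truth)

print(q_fuzzy, post_fuzzy)
print(q_crisp, post_crisp)
```

In both cases the posterior sums to 1 even though the truth values themselves are not normalized; the crisp case simply spreads the probability evenly over the three ages inside the set.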

2.3 Generalized Mapping and Generalized Entropy

To measure Shannon’s entropy of a set, such as a set of rainfalls, ages, or colors, we need a partition of the set. To measure its generalized entropy, we need only a covering of the set. Two groups of mathematical concepts related to the two types of entropy are shown in Table I. These concepts are very elementary in modern mathematics. For each pair of concepts in Table I, the left one is a special case of the right one.

Let A₁, A₂, ..., A_n be subsets of A. The set A_J={A₁, A₂, ..., A_n} is called a covering of A if

$$\bigcup_{j=1}^{n} A_j = A$$

Further, A_J is called a partition of A if A_j ∩ A_j’ = ∅ (the empty set) for all A_j, A_j’ ∈ A_J with A_j ≠ A_j’ (which means that no sets overlap).

We call a subset R of the product set A×B a relation from A to B, and further call R a generalized mapping if for each x_i ∈ A there is at least one y_j such that (x_i, y_j) ∈ R (see Figure 2). If, further, there is one and only one y_j for each x_i such that (x_i, y_j) ∈ R, then the generalized mapping becomes a mapping or regular mapping. For each pair (x_i, y_j) ∈ R, we call y_j the image of x_i and call x_i the inverse image of y_j. The difference between a mapping and a generalized mapping is that each x_i can have only one image in a mapping, whereas an x_i may have more than one image in a generalized mapping.

[Figure 2. Three panels, each drawn from set A to set B: relation, generalized mapping, mapping.]

It can be proven that a generalized mapping from A to B determines a covering of A, and a regular mapping from A to B determines a partition of A; and vice versa (see Figure 2). Similarly, a generalized mapping from A to B determines a similarity relation on A×A, and a regular mapping from A to B determines an equivalence relation on A×A; and vice versa.
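The correspondence between mappings and coverings or partitions can be checked mechanically on a small case. The sketch below is an illustration only: a relation is represented as a set of (x, y) pairs, the helper names are invented here, and the two example relations over three kinds of weather are hypothetical. Predicates with no inverse image are dropped before the partition test, which is an assumption of this sketch.

```python
from itertools import combinations


def is_generalized_mapping(relation, A):
    """True if every x in A has at least one image under the relation."""
    return {x for x, _ in relation} >= set(A)


def is_regular_mapping(relation, A):
    """True if every x in A has exactly one image under the relation."""
    return all(len({y for x2, y in relation if x2 == x}) == 1 for x in A)


def inverse_image_sets(relation):
    """Group the relation by y: A_j = {x : (x, y_j) in R}; unused y are absent."""
    sets = {}
    for x, y in relation:
        sets.setdefault(y, set()).add(x)
    return list(sets.values())


def is_covering(subsets, A):
    """True if the union of the subsets equals A."""
    return set().union(*subsets) == set(A)


def is_partition(subsets, A):
    """True if the subsets cover A and are pairwise disjoint."""
    disjoint = all(s.isdisjoint(t) for s, t in combinations(subsets, 2))
    return disjoint and is_covering(subsets, A)


A = ["no-rain", "light-rain", "heavy-rain"]

# Hypothetical generalized mapping: "y4" (rain) overlaps with "y2" and "y3".
generalized = {("no-rain", "y1"), ("light-rain", "y2"), ("light-rain", "y4"),
               ("heavy-rain", "y3"), ("heavy-rain", "y4")}
# Hypothetical regular mapping: each kind of weather has exactly one image.
regular = {("no-rain", "y1"), ("light-rain", "y4"), ("heavy-rain", "y4")}

gen_sets = inverse_image_sets(generalized)
reg_sets = inverse_image_sets(regular)
print(is_covering(gen_sets, A), is_partition(gen_sets, A))  # True False
print(is_covering(reg_sets, A), is_partition(reg_sets, A))  # True True
```

The generalized mapping yields overlapping inverse-image sets, hence a covering that is not a partition, whereas the regular mapping yields disjoint sets.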

Definition 2.3.1 The generalized entropy determined by the generalized mapping from A to B is defined as

$$H(Y) = -\sum_{j=1}^{n} P(y_j) \log Q(A_j)$$

where P(y_j) is the objective probability with which the predicate y_j is selected; A_j is the set formed by all inverse images of y_j; and Q(A_j), defined by Equation (3), is the logical probability of y_j.

If the generalized mapping becomes a regular mapping, then Q(A_j)=P(y_j) for j=1, 2, ..., n, and hence the generalized entropy becomes Shannon’s entropy:

$$H(Y) = -\sum_{j=1}^{n} P(y_j) \log P(y_j)$$
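The reduction to Shannon’s entropy can be illustrated numerically. The sketch below assumes that the generalized entropy is the negative sum over j of P(y_j) log Q(A_j), uses base-2 logarithms (the base is an assumption of this sketch), and takes hypothetical probability values; the function names are invented for the illustration.

```python
import math


def generalized_entropy(p_select, q_logical):
    """Assumed form: -sum_j P(y_j) * log2 Q(A_j).

    Sentences that are never selected (P(y_j) = 0) contribute nothing.
    A selected sentence with Q(A_j) = 0 makes math.log2 raise ValueError.
    """
    return -sum(p * math.log2(q) for p, q in zip(p_select, q_logical) if p > 0)


def shannon_entropy(p_select):
    """Special case Q(A_j) = P(y_j), i.e. a regular mapping."""
    return generalized_entropy(p_select, p_select)


p_select = [0.5, 0.25, 0.25]   # hypothetical selective probabilities P(y_j)
q_logical = [0.5, 0.5, 0.5]    # hypothetical logical probabilities Q(A_j) of overlapping sets

h_shannon = shannon_entropy(p_select)
h_general = generalized_entropy(p_select, q_logical)
print(h_shannon, h_general)
```

Here the logical probabilities sum to more than 1 because the sets overlap, and the generalized entropy (1.0 bit) is lower than Shannon’s entropy of the same selection (1.5 bits).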

2.4 Fuzzy mapping, Generalized Condition Entropy, and Generalized Mutual Entropy

A generalized mapping R from A to B can be determined by an m×n (m=|A|, n=|B|) matrix with elements r(x_i, y_j) ∈ {0,1} for i=1, 2, ..., m; j=1, 2, ..., n, where r(x_i, y_j)=1 means (x_i, y_j) ∈ R. If r(x_i, y_j) ∈ [0,1], then we call R a fuzzy relation.

Definition 2.4.1 A fuzzy relation is called a fuzzy mapping (or fuzzy generalized mapping) if

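Definition 2.4.2 Assume R is a fuzzy mapping and Q(A_j|x_i)=r(x_i, y_j) for i=1, 2, ..., m; j=1, 2, ..., n. We call

$$H(Y|X) = -\sum_{i=1}^{m} \sum_{j=1}^{n} P(x_i)\, P(y_j|x_i) \log Q(A_j|x_i)$$

the generalized condition entropy.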

In Equation (11), the conditional probability P(y_j|x_i) for i=1, 2, ..., m indicates the rule for selecting the sentence y_j, whereas the logical probability Q(A_j|x_i) for i=1, 2, ..., m embodies the meaning of the sentence y_j. The former changes from case to case and from person to person; the latter is determined by linguistic culture. The former must be normalized, whereas the latter need not be. The reason is that set B may contain two sentences, such as “Tomorrow will be rainy” and “Tomorrow will be very rainy”, that are equally true at the same time.

Let us call H(X;Y)=H(Y)-H(Y|X) the generalized mutual entropy. Section 3.2 will explain that H(X;Y) indicates the generalized mutual information, i.e. H(X;Y)=I(X;Y). An example of calculating the generalized mutual entropy is shown as follows.
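As a rough numerical companion to these definitions, the sketch below computes H(Y), H(Y|X), and H(X;Y) for a three-weather, four-sentence setting. Several things are assumptions of this sketch rather than statements of the paper: the generalized entropy is taken as the negative sum of P(y_j) log Q(A_j), the generalized condition entropy as the negative sum of P(x_i)P(y_j|x_i) log Q(A_j|x_i), logarithms are base 2, and both matrices are hypothetical illustrative values (the logical matrix is crisp, with "There is rain" true for both light and heavy rain).

```python
import math


def generalized_mutual_entropy(p_x, p_y_given_x, q_a_given_x, q_x=None):
    """Return (H(Y), H(Y|X), H(X;Y)) under the assumed definitions above.

    p_y_given_x[i][j] is the selective probability P(y_j|x_i) (rows sum to 1);
    q_a_given_x[i][j] is the logical probability Q(A_j|x_i) (rows need not).
    """
    q_x = p_x if q_x is None else q_x
    m, n = len(p_x), len(p_y_given_x[0])
    # Selective probability P(y_j) and logical probability Q(A_j).
    p_y = [sum(p_x[i] * p_y_given_x[i][j] for i in range(m)) for j in range(n)]
    q_a = [sum(q_x[i] * q_a_given_x[i][j] for i in range(m)) for j in range(n)]
    h_y = -sum(p_y[j] * math.log2(q_a[j]) for j in range(n) if p_y[j] > 0)
    h_y_given_x = -sum(
        p_x[i] * p_y_given_x[i][j] * math.log2(q_a_given_x[i][j])
        for i in range(m)
        for j in range(n)
        if p_x[i] * p_y_given_x[i][j] > 0  # never-selected pairs contribute nothing
    )
    return h_y, h_y_given_x, h_y - h_y_given_x


p_x = [1 / 3, 1 / 3, 1 / 3]  # no-rain, light-rain, heavy-rain

# Hypothetical selection rule P(y_j|x_i); columns are y1..y4.
p_y_given_x = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
# Hypothetical crisp meanings Q(A_j|x_i); the rows are not normalized.
q_a_given_x = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

h_y, h_y_given_x, h_xy = generalized_mutual_entropy(p_x, p_y_given_x, q_a_given_x)
print(round(h_xy, 4))
```

Because every selected sentence is fully true in this crisp illustration, H(Y|X) is zero and H(X;Y) equals H(Y) = log2 3 − 1/3, about 1.25 bits; with fuzzy truth values below 1 the condition entropy would be positive.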

Example 2.4.1 Three possible kinds of weather are x₁=no-rain, x₂=light-rain, x₃=heavy-rain; A={x₁, x₂, x₃}; P(x₁)=P(x₂)=P(x₃)=Q(x₁)=Q(x₂)=Q(x₃)=1/3. There are four sentences, y₁=“There is no rain”, y₂=“There is light rain”, y₃=“There is heavy rain”, y₄=“There is rain”; B={y₁, y₂, y₃, y₄}. The selective conditional-probability matrix and the logical conditional-probability matrix of the sentences are shown as follows:

Definition 2.4.3 The subset R of the product set A×A is called a fuzzy similarity relation if

And a fuzzy similarity relation is called a fuzzy equivalence relation if r_ij=1.

Assume A is a set of different gray levels or colors. The confusion probability with which x_i and x_j are confused by sight (or by a detector) can serve as a similarity function r(x_i, x_j). Let y_j represent the prediction “X is likely x_j”, and let A_j be the fuzzy set of all X ∈ A that are confused with x_j. Then Q(A_j|x_i)=r(x_i, x_j). In this case, we also call Q(A_j|x_i)=r(x_i, x_j) a fuzzy discrimination function. Treating r(x_i, x_j) as a logical conditional probability, we can calculate sensory information, which is conveyed by sense organs or detectors, in the same way as semantic information. In practice, both the logical conditional probability function Q(“y_j is true”|X) and the confusion probability function r(X, x_j) can be obtained by set-valued statistics (refer to Figure 1 and Equation (1)).

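Table I. Mathematical concepts related to the two types of entropy

Shannon’s entropy              generalized entropy
-----------------------------  --------------------
set partition                  set covering
mapping or regular mapping     generalized mapping
equivalence relation           similarity relation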

2. MATHEMATICAL FOUNDATIONS
