3.2.2. The Calculating Measure of Semantic Similarity in Hierarchical Structure

Hierarchical structure is a common characteristic of knowledge representation, such as the hypernym/hyponym relations in WordNet or the topic coverage in ODP. We use two features of the hierarchy to measure semantic similarity: node depth and node distance. Intuitively, the deeper the subsumer of two nodes, the greater their similarity, and a node pair separated by a shorter distance is more similar than a pair separated by a longer one. Assume

$$\mathrm{Sim}(C_i, C_j) = \left((1-\alpha)\, e^{-\beta l_1} + \alpha\, e^{-\beta l_2}\right) \cdot \frac{e^{\gamma h} - e^{-\gamma h}}{e^{\gamma h} + e^{-\gamma h}}, \tag{2}$$

where l1 and l2 are the shortest path lengths from the nodes Ci and Cj, respectively, to their subsumer; h is the depth of the subsumer; and the parameters α, β, and γ are in [0, 1].
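The behaviour of formula (2) can be checked with a minimal Python sketch. It assumes the formula is a weighted sum of two exponentially decaying path lengths, multiplied by the hyperbolic tangent of the scaled subsumer depth. The function name `node_similarity`, the parameter names `alpha`, `beta`, and `gamma`, and their default values are illustrative choices, not values given in the text.

```python
import math


def node_similarity(l1, l2, h, alpha=0.5, beta=0.2, gamma=0.6):
    """Similarity of two nodes from their path lengths and subsumer depth.

    l1, l2 : shortest path lengths from each node to the common subsumer
    h      : depth of the subsumer in the hierarchy
    alpha, beta, gamma : parameters in [0, 1] (defaults are illustrative only)
    """
    # Distance term: shorter paths to the subsumer give a larger value.
    distance_term = (1 - alpha) * math.exp(-beta * l1) + alpha * math.exp(-beta * l2)
    # Depth term: (e^{gh} - e^{-gh}) / (e^{gh} + e^{-gh}) is tanh(g * h),
    # which grows with the depth of the subsumer and saturates at 1.
    depth_term = math.tanh(gamma * h)
    return distance_term * depth_term


# Same subsumer depth, different distances: the closer pair scores higher.
close_pair = node_similarity(1, 1, 3)
far_pair = node_similarity(3, 3, 3)
```

With non-negative inputs both factors lie in [0, 1], so the result does too, and a subsumer at depth 0 (the root) yields a similarity of 0 under this form.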
For instance, given two topic chains TCi and TCj in ODP, their relatedness is measured as the average pairwise similarity of their terms, using formula (3):

$$\mathrm{Sim}(TC_i, TC_j) = \frac{1}{|TC_i|\,|TC_j|} \sum_{t_a \in TC_i} \sum_{t_b \in TC_j} \mathrm{Sim}(t_a, t_b), \tag{3}$$

where ta and tb denote terms in topic chains TCi and TCj, respectively.

3.2.3. The Topic Identification Algorithm

To implement topic identification for a given document, we assume the following. (1) The reoccurrence of topic discriminative terms in a given document indicates the presence of a certain topic. (2) The topic discriminative terms in a topic similarity set that occur in the same text fragment share an identical topic and a similar semantic context. Intuitively, longer reoccurrence spans of the TDTs are preferred over shorter ones.
The more TDTs a text fragment contains, the more likely they are to relate to similar topic content.

Formally, a document D is represented as a sequence of n sentences Si (1 ≤ i ≤ n), which are the basic structural units. The K candidate topic discriminative terms are distributed across these sentences and generate K reoccurrence topic span intervals. A topical graph G = (V, E) is an undirected graph that may consist of m topical subgraphs. The vertices represent the topic span intervals (TSIs) of the TDTs; each TDT is associated with its topic semantic profiles, which include disambiguation contexts and topic chains. The edges are determined by the overlap relationship between the topic span intervals of the TDTs and by the similarity relationship between the TDTs' topic semantic profiles.
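The graph definition above can be illustrated with a small Python sketch. Several points are assumptions made only for illustration: a topic span interval is taken to run from the first to the last sentence in which a TDT occurs, an edge requires both overlapping intervals and a chain similarity above a hypothetical `threshold`, and the term-level similarity is passed in as a function `term_sim`. Chain similarity is computed as the mean pairwise term similarity, as in formula (3).

```python
from itertools import combinations


def topic_span_interval(sentence_indices):
    """Assumed span: from the first to the last sentence containing the TDT."""
    return (min(sentence_indices), max(sentence_indices))


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one sentence."""
    return a[0] <= b[1] and b[0] <= a[1]


def chain_similarity(chain_i, chain_j, term_sim):
    """Formula (3): average similarity over all term pairs of two chains."""
    total = sum(term_sim(ta, tb) for ta in chain_i for tb in chain_j)
    return total / (len(chain_i) * len(chain_j))


def build_topical_graph(occurrences, chains, term_sim, threshold=0.5):
    """Return (spans, edges) for an undirected topical graph.

    occurrences : dict mapping each TDT to the sentence indices where it occurs
    chains      : dict mapping each TDT to its topic chain (list of terms)
    threshold   : hypothetical cut-off on chain similarity (not from the text)
    """
    spans = {tdt: topic_span_interval(idx) for tdt, idx in occurrences.items()}
    edges = set()
    for u, v in combinations(sorted(spans), 2):
        # Assumed rule: connect two TDTs only if their spans overlap AND
        # their topic chains are similar enough.
        if intervals_overlap(spans[u], spans[v]) and \
                chain_similarity(chains[u], chains[v], term_sim) >= threshold:
            edges.add((u, v))
    return spans, edges


# Toy data (invented for illustration): exact-match term similarity.
occurrences = {"jaguar": [1, 4], "engine": [3, 6], "forest": [8, 9]}
chains = {"jaguar": ["cars", "autos"], "engine": ["cars", "parts"], "forest": ["nature"]}
exact_match = lambda a, b: 1.0 if a == b else 0.0
spans, edges = build_topical_graph(occurrences, chains, exact_match, threshold=0.2)
```

In this toy input, "jaguar" and "engine" overlap in sentences 3 to 4 and share one of four term pairs, so they are connected, while "forest" stays an isolated vertex and forms its own subgraph.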
In generating the topical graph, each subgraph is first constructed from the immediate overlap of topic span intervals. The subgraphs are then connected into the whole topical graph through the immediate adjoining relationship of the TDTs' topic span intervals that do not overlap.

Next, we need to determine the unique sense of each candidate TDT that has more than one topic semantic profile.