Co-occurrence relationships in PubMed between terms accelerate the recognition of term associations. The lack of manually curated relationships in vocabularies and the rapid increase of biomedical literatures highlight the importance of co-occurrence relationships. Here we proposed a framework to explore term associations based on a standard procedure that comprises multiple tools of text mining and relationship degree calculation methods. The text of PubMed were segmented into sentences by Apache OpenNLP first, and then terms of sentences were recognized by MGREP. After that two terms occurring in a common sentence were identified as a co-occurrence relationship. The relationship degree is then calculated using Normalized MEDLINE Distance (NMD) or relationship-scaled score (RSS) method. The framework was utilized in exploring associations between terms of Gene Ontology (GO) and Disease Ontology (DO) based on co-occurrence relationship. Results show that pairs of terms with more co-occurrence relationships indicate shared more semantic relationships of ontology and genes. The identified association terms based on co-occurrence relationships were applied in constructing a disease association network (DAN). The small giant component confirms with the observation that diseases in the same class have more linkage than diseases in different classes.
Keywords: co-occurrence relationship; framework; term association; text mining.