The Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) reports the successful completion of a pilot project in the area of Chinese Text Mining, conducted at Merck Serono, a division of Merck KGaA, Darmstadt, Germany. The pilot project was initiated as a feasibility study to evaluate how far current text mining technology is able to support automated information extraction from Chinese text sources such as scientific publications and the patent literature.
In the course of this project, ProMiner, the named entity recognition software developed at Fraunhofer SCAI, has been adapted to the specific requirements of text mining in Chinese scientific biomedical and pharmaceutical literature. Most commercial text mining technology is able to analyse English text, and some solutions provide functionality for the analysis of German or French text. However, given the steep increase in Chinese scientific output and the ever-growing importance and attractiveness of the Chinese market to Western companies, the ability to automatically analyse unstructured Chinese information sources is of utmost importance for scientific and competitive intelligence that aims to closely follow what happens in China.
A joint evaluation of the pilot system's performance demonstrates that Chinese literature can be mined for biomedical terms with performance similar to that achieved on English literature. However, “the challenge of Chinese Text Mining cannot be regarded as being solved”, Dr. Juliane Fluck, Head of the Text Mining Team at Fraunhofer SCAI, makes clear: “we have just demonstrated that we are able to mine the Chinese biomedical scientific literature automatically. The real work – which is aiming at providing all functionalities needed for true knowledge discovery from Chinese unstructured text sources – starts now, after the proof-of-principle”. Prof. Martin Hofmann-Apitius, Head of the Department of Bioinformatics at Fraunhofer SCAI, sheds some light on another, rather “academic” aspect of this work: “we were in the favourable situation that we have Chinese students doing their Master degree in Life Science Informatics at Bonn-Aachen International Center for Information Technology (B-IT).”
The next steps in this collaboration will see an extension to another Fraunhofer Institute: the Fraunhofer Institute for Systems and Innovation Research (ISI). ISI in Karlsruhe has strong ties to China and specializes in monitoring Chinese research, innovation and markets. Through its collaboration with the Chinese Institute of Policy and Management, an institute of the Chinese Academy of Sciences (CAS), ISI is a premier partner when it comes to understanding science and innovation in China.
About Fraunhofer:
Fraunhofer is Europe’s largest application-oriented research organization. Research of practical utility lies at the heart of all activities pursued by the Fraunhofer-Gesellschaft. Founded in 1949, the research organization undertakes applied research that drives economic development and serves the wider benefit of society. At present, the Fraunhofer-Gesellschaft maintains more than 80 research units in Germany, including 60 Fraunhofer Institutes. The majority of the more than 18,000 staff are qualified scientists and engineers, who work with an annual research budget of EUR 1.65 billion.
The Fraunhofer Institute for Algorithms and Scientific Computing SCAI conducts research in the field of computer simulations for product and process development. SCAI designs and optimizes industrial applications, implements custom solutions for production and logistics, and offers HPC and Cloud solutions. Services are based on industrial engineering and methods from applied mathematics and information technology.
Contact
Prof. Dr. Martin Hofmann-Apitius
Head of the Department of Bioinformatics
Fraunhofer Institute for Algorithms and Scientific Computing SCAI
53754 Sankt Augustin
Tel.: +49 2241 - 14 - 2802
E-Mail: martin.hofmann-apitus(at)scai.fraunhofer.de