Cloud Collection Platform for Internet Open Data


PFG independently researched and developed the CCP (cloud data acquisition) platform and holds 14 software copyrights for independently developed software. These cover technical innovations in data collection, data cleaning, Chinese semantic recognition, databases, and other areas. This strong in-house R&D capability underpins PFG's "Internet Plus" research strategy.

The CCP (cloud data acquisition) platform uses a multithreaded, distributed architecture. It can run on dozens of computer terminals at once, with each terminal running 30 threads concurrently. Together the terminals form a large collection network that can capture a large volume of page data in a short time.

The platform also provides a buffer pool. Each terminal stores the data it collects locally and independently, and uploads it to the server only after acquisition is complete. This prevents the server from being overwhelmed by a large volume of writes in a short period.

To meet CCP's special requirements, the platform includes an IP switching function. By routing requests through an IP converter, it can reach target websites and get around their access restrictions.

For dynamic pages, CCP offers two acquisition modes: background data acquisition without page refresh, and simulated-browser acquisition. The former handles large numbers of static pages and enables fast acquisition; the latter handles the growing number of dynamic pages.

For text embedded in images, CCP includes an OCR recognition module that extracts the text from images, so data can be captured from all kinds of pages.
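The per-terminal buffer pool described above can be sketched in Python for a single terminal. This is a minimal illustration under stated assumptions, not CCP's actual code: `BufferPool`, `run_terminal`, and the injected `fetch` and `upload` callables are hypothetical names, and a stub fetcher stands in for real HTTP requests so the example runs offline. Only the figure of 30 threads per terminal comes from the description.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

# Thread count per terminal, as stated in the description.
THREADS_PER_TERMINAL = 30


class BufferPool:
    """Hypothetical thread-safe local store that holds pages until upload."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Dict[str, str] = {}

    def put(self, url: str, page: str) -> None:
        # Many worker threads write concurrently, so guard the dict.
        with self._lock:
            self._items[url] = page

    def drain(self) -> Dict[str, str]:
        # Hand back everything collected so far and reset the pool.
        with self._lock:
            items, self._items = self._items, {}
            return items


def run_terminal(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    upload: Callable[[Dict[str, str]], None],
    threads: int = THREADS_PER_TERMINAL,
) -> int:
    """Fetch all URLs with a thread pool, buffer locally, then upload once."""
    pool = BufferPool()

    def worker(url: str) -> None:
        pool.put(url, fetch(url))

    with ThreadPoolExecutor(max_workers=threads) as executor:
        # list() waits for every task and re-raises any worker exception.
        list(executor.map(worker, urls))

    batch = pool.drain()
    upload(batch)  # a single upload, only after acquisition has finished
    return len(batch)


if __name__ == "__main__":
    uploaded = []
    count = run_terminal(
        [f"http://example.com/page/{i}" for i in range(100)],
        fetch=lambda url: f"<html>{url}</html>",  # stub: no network access
        upload=uploaded.append,
    )
    print(count)  # 100
```

Calling `upload` once per terminal, rather than once per page, is what spreads the load on the server. A real deployment would probably spill the buffer to disk instead of holding everything in memory.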
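The description does not say how the IP converter works. One common approach, assumed here purely for illustration, is to rotate outgoing requests across a pool of proxy servers in round-robin order. `ProxyRotator` is a hypothetical name and the proxy addresses are placeholders; `urllib.request.ProxyHandler` and `urllib.request.build_opener` are standard-library calls.

```python
import itertools
import threading
import urllib.request
from typing import List


class ProxyRotator:
    """Hypothetical helper: hands out proxies round-robin, shared by threads."""

    def __init__(self, proxies: List[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def next_proxy(self) -> str:
        # The lock keeps next() on the shared iterator safe across threads.
        with self._lock:
            return next(self._cycle)

    def opener(self) -> urllib.request.OpenerDirector:
        """Build a urllib opener that routes HTTP(S) through the next proxy."""
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder addresses; substitute real proxy endpoints.
    rotator = ProxyRotator(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
    print([rotator.next_proxy() for _ in range(3)])
```

Each worker thread would call `opener()` before a request, so consecutive requests leave through different addresses. Building the opener makes no network call; the proxy is used only when `opener.open(url)` is invoked.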