2/22/2013

About PubSearch

Most of this post was written in 2013, but I updated it in 2018.

History

This program was my university thesis and the first project I ever published on the web. I created a SourceForge project so that I could use SVN and write wiki pages for the planning.

A year after I started developing the program, and months after v1.0, at the beginning of 2013, Softpedia emailed me to say that they had included my program in their public software database.

It was downloaded 14 times within 4 hours, which gave me a little motivation to continue developing the project. I roughly planned PubSearch 2, but with all my other tasks I sadly had no time to implement it.

A few years later the program disappeared from Softpedia, maybe because it had become useless without updates from me.

What’s this?

This is a Java tool that can search multiple publication databases (such as Google Scholar, CiteSeerX, ACM, SpringerLink). You type an author’s name and PubSearch grabs the basic information about her/his publications. It can transitively crawl the “cited-by” lists, so a researcher can use this tool to calculate her/his impact factor.
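The transitive “cited-by” crawl can be pictured as a breadth-first traversal. The following is only a minimal sketch of that idea, not the code PubSearch actually uses: the class `CitedByCrawler`, the `citedBy` lookup function and the `maxDepth` limit are all hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of a transitive "cited-by" crawl; not PubSearch's real code. */
final class CitedByCrawler {
    private CitedByCrawler() {}

    /**
     * Breadth-first traversal over "cited-by" lists.
     *
     * @param start    IDs of the author's own publications
     * @param citedBy  assumed lookup: returns the IDs of publications citing the given one
     * @param maxDepth number of "cited-by" hops to follow (0 returns only the start set)
     * @return every publication reached, in discovery order
     */
    static Set<String> crawl(List<String> start,
                             Function<String, List<String>> citedBy,
                             int maxDepth) {
        Set<String> seen = new LinkedHashSet<>(start);
        Deque<String> current = new ArrayDeque<>(seen);
        for (int depth = 0; depth < maxDepth && !current.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String id : current) {
                for (String citing : citedBy.apply(id)) {
                    // The seen set keeps citation cycles from looping forever.
                    if (seen.add(citing)) {
                        next.add(citing);
                    }
                }
            }
            current = next;
        }
        return seen;
    }
}
```

In a real crawler the `citedBy` function would issue a network request per publication, so a depth limit like `maxDepth` matters for keeping the traffic bounded.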

It reaches those sites through a proxy list, to avoid being banned because of the heavy network traffic. The program uses definition files to crawl the databases; you can edit these with any simple text editor or add your own definitions. You can export publication data in citation formats.
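To illustrate the proxy list idea, here is a small sketch of round-robin proxy rotation. It is not taken from PubSearch: the class `ProxyRotation` is hypothetical, and the one-entry-per-line `host:port` format is an assumption about what such a list might look like.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that cycles through a proxy list; not PubSearch's real code. */
final class ProxyRotation {
    private final List<Proxy> proxies = new ArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();

    /** Assumes one "host:port" entry per line; blank or malformed lines are skipped. */
    ProxyRotation(List<String> lines) {
        for (String line : lines) {
            String entry = line.trim();
            int colon = entry.lastIndexOf(':');
            if (colon <= 0 || colon == entry.length() - 1) {
                continue; // no host or no port
            }
            try {
                int port = Integer.parseInt(entry.substring(colon + 1));
                if (port < 0 || port > 65535) {
                    continue; // not a valid TCP port
                }
                // createUnresolved avoids a DNS lookup while the list is being read.
                proxies.add(new Proxy(Proxy.Type.HTTP,
                        InetSocketAddress.createUnresolved(entry.substring(0, colon), port)));
            } catch (NumberFormatException ignored) {
                // port was not a number; skip the line
            }
        }
    }

    int size() {
        return proxies.size();
    }

    /** Returns the next proxy in round-robin order. */
    Proxy next() {
        if (proxies.isEmpty()) {
            throw new IllegalStateException("proxy list is empty");
        }
        return proxies.get(Math.floorMod(cursor.getAndIncrement(), proxies.size()));
    }
}
```

Each request could then go out through a different proxy, for example with `url.openConnection(rotation.next())`, which spreads the traffic across the list.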

A JRE, MySQL and a proxy list are required to run the program.

Features

The websites of the publication databases have changed since I last updated this project, so the program may list only a few results or none.

searches in multiple publication databases (such as Google Scholar, CiteSeerX, ACM, SpringerLink)
you can edit/add publication database definitions
automatic proxy list downloading
crawls “cited-by” publications transitively (where possible)
publication data stored in a MySQL database
export results table in CSV or citation format
export individual publication data in citation format
you can edit/add citation format templates
Hungarian and English GUI

Ideas for further development

At the beginning of 2013, I roughly planned PubSearch 2 with modularity in mind. The goal is to make it more universal. The websites of publication databases are continuously changing, and although PubSearch 1.x can easily be updated, some features of these websites cannot be reached by its built-in uniform algorithm. So modularity should be provided through a Java interface. This way specialized crawlers can be added as JAR files, which are loaded when the program starts. And of course, PubSearch 1.x would still be there as a built-in crawler.
It would be nice to add more settings, like selecting publication databases or crawlers, and managing proxy lists.
HTML parsing should be much more elegant. Back then, for lack of better ideas, I used regular expressions, which as we know is not a proper approach.
Merging publications would also be great, automated as much as possible.
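
The modular crawler idea could look roughly like the sketch below. PubSearch 2 was never implemented, so everything here is an assumption: the `PublicationCrawler` interface, its methods, and the `CrawlerPlugins` loader are hypothetical names. The sketch relies on the standard `java.util.ServiceLoader` mechanism, where each plugin JAR names its implementation class in a `META-INF/services/PublicationCrawler` file.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical plugin contract; the name and methods are illustrative assumptions. */
interface PublicationCrawler {
    /** Name of the publication database this crawler handles. */
    String databaseName();

    /** Returns the titles of the publications found for the given author. */
    List<String> findTitlesByAuthor(String author) throws IOException;
}

/** Hypothetical loader that discovers crawler plugins in a directory of JAR files. */
final class CrawlerPlugins {
    private CrawlerPlugins() {}

    static List<PublicationCrawler> loadFrom(Path pluginDir) throws IOException {
        List<URL> jars = new ArrayList<>();
        if (Files.isDirectory(pluginDir)) {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(pluginDir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar.toUri().toURL());
                }
            }
        }
        // The class loader is left open on purpose: the loaded plugins keep using it.
        ClassLoader loader = new URLClassLoader(
                jars.toArray(new URL[0]), PublicationCrawler.class.getClassLoader());
        List<PublicationCrawler> crawlers = new ArrayList<>();
        for (PublicationCrawler crawler : ServiceLoader.load(PublicationCrawler.class, loader)) {
            crawlers.add(crawler);
        }
        return crawlers;
    }
}
```

With a design like this, the PubSearch 1.x algorithm could simply be one more `PublicationCrawler` implementation that ships with the program, next to whatever the plugin directory provides.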
