For programmers, it features a very flexible object model and is easily extensible - expect new modules in the future! It is implemented in Java, and the source code is available. If you want to implement your own web spider, the WebRobot class is a good starting point. JoBo is also the right tool if you want to use it not for downloading but for indexing, link checking or some other purpose. Retrieving documents and handling them are completely separated, so you can easily plug in your own module.
- command line and graphical versions (the command line version needs a major update; the GUI version currently has many more features)
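The separation of retrieval and handling can be illustrated with a minimal sketch. This is not JoBo's actual API: the names `LinkSource`, `DocumentHandler` and `SimpleCrawler` are hypothetical, and the link source stands in for real HTTP retrieval and HTML parsing. The crawler only decides which documents to visit and up to which depth; what happens with each document is left to the plugged-in handler.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical: returns the links found in a document (stands in for retrieval and parsing). */
interface LinkSource {
    List<String> linksOf(String url);
}

/** Hypothetical: receives every visited document; indexing or link checking would live here. */
interface DocumentHandler {
    void handle(String url, int depth);
}

/** Hypothetical breadth-first crawler with a maximal search depth; not a JoBo class. */
final class SimpleCrawler {
    private final LinkSource source;
    private final DocumentHandler handler;
    private final int maxDepth;

    SimpleCrawler(LinkSource source, DocumentHandler handler, int maxDepth) {
        this.source = source;
        this.handler = handler;
        this.maxDepth = maxDepth;
    }

    /** Visits the start document (depth 0) and everything reachable within maxDepth links. */
    void crawl(String startUrl) {
        Set<String> seen = new HashSet<>();
        List<String> level = new ArrayList<>();
        level.add(startUrl);
        seen.add(startUrl);
        for (int depth = 0; depth <= maxDepth && !level.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : level) {
                handler.handle(url, depth);
                for (String link : source.linksOf(url)) {
                    // Set.add returns false for URLs already queued, so each document is visited once.
                    if (seen.add(link)) {
                        next.add(link);
                    }
                }
            }
            level = next;
        }
    }
}
```

Swapping the handler changes the purpose of the crawl (download, index, link check) without touching the traversal code.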
- recursive search of all documents starting from a given start document
- support of <A> <AREA> <IMG> <FRAME> tags (with fault tolerance)
- support of the robot exclusion protocol
- user controlled maximal search depth
- user agent name can be defined
- support of referrer headers
- support of automated form handling (JoBo can fill fields with predefined values)
- cookie support
- XML configuration
- used bandwidth can be limited
- allow/deny downloads by mime type and document size (e.g. ignore all image/* files)
- allow/deny downloads by regular expressions (e.g. don't download /cgi-bin)
- can convert absolute links to relative
- download only files newer than a given age
- resume job
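The allow/deny rules listed above (by regular expression, MIME type and document size) could be combined roughly as follows. This is a sketch under assumptions, not JoBo's implementation: the class `DownloadFilter` and its method names are hypothetical, and it only models deny rules plus a size limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical download filter; illustrates the idea only, not a JoBo class. */
final class DownloadFilter {
    private final List<Pattern> deniedUrlPatterns = new ArrayList<>();
    private final List<String> deniedMimePrefixes;
    private final long maxSizeBytes;

    DownloadFilter(List<String> deniedUrlRegexes, List<String> deniedMimePrefixes, long maxSizeBytes) {
        // Compile each regular expression once instead of on every check.
        for (String regex : deniedUrlRegexes) {
            deniedUrlPatterns.add(Pattern.compile(regex));
        }
        this.deniedMimePrefixes = new ArrayList<>(deniedMimePrefixes);
        this.maxSizeBytes = maxSizeBytes;
    }

    /** Returns true only if no deny rule matches and the document is not too large. */
    boolean isAllowed(String url, String mimeType, long sizeBytes) {
        for (Pattern pattern : deniedUrlPatterns) {
            // find() matches anywhere in the URL, e.g. "/cgi-bin" inside a longer path.
            if (pattern.matcher(url).find()) {
                return false;
            }
        }
        for (String prefix : deniedMimePrefixes) {
            // A prefix such as "image/" plays the role of the wildcard image/*.
            if (mimeType.startsWith(prefix)) {
                return false;
            }
        }
        return sizeBytes <= maxSizeBytes;
    }
}
```

With the deny regex `/cgi-bin`, the MIME prefix `image/` and a size limit of 1,000,000 bytes, such a filter would reject `http://example.com/cgi-bin/form` and any `image/png` document, but accept a small `text/html` page.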
16.12.2006: JoBo 1.4 released
After more than two years of beta tests and only some minor changes, I created a new JoBo version. The new version contains several bug fixes, but no new functionality. Also, lots of deprecated methods have been removed, and untyped collections have been replaced by generics.
JoBo will run under Java 1.5 or higher (only 1.5 tested).
This page has been archived, i.e. it is no longer actively maintained and the information may no longer be up to date.