3/30/15

Pentaho ETL

Pentaho ETL is a fully graphical ETL tool that supports many databases (as input and output) and complex transformation processes over that data.


If you do not have time to play around with the product yourself, here are some comments on Kettle (remember, it is free and open source!). Kettle uses JDBC drivers to connect to the various databases. It can cache data, and it can run multiple transformations in parallel (see the paragraph on slowness). Kettle works on production databases with millions of records.
The interface and usability

First, what catches the eye is the interface itself (written in Java). At first it seems more functional, and therefore more pleasing to a beginner's eye, than its competitor. Managing transformations and jobs is made easy. However, some find the interface less attractive than Talend's because of its closeness to the database. Some also criticize Kettle for being slow in certain operations.
Remember that sorting, for example, must be configured carefully (number of rows per sorting pass, use of RAM versus temporary files, and so on). With a large number of temporary files, the observed slowness increases greatly; increasing the number of rows kept in memory speeds processing up significantly. There is also a temporary-file compression option, but it adds processing time while reducing the disk space required (you have to know what you want!).
All this to say that Kettle's slowness is not a foregone conclusion.

Two different modes

Kettle can be used in two distinct modes, local or repository:
Local: transformations and jobs are stored and managed locally on the filesystem.
Repository: all of the developer's work is stored in the PDI/Kettle repository, based on a dedicated schema created on the database of your choice (on the target system, for example).
This is ideal: no more files to manage, everything is centralized. If a setting of your database changes, simply update the corresponding connection once in the repository; the change will be reflected in all the transformations/jobs that use that connection. Among the tools provided, Kettle ships with sample scripts to launch jobs/transformations in batch mode.
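As a sketch of such a batch launch (the install path, repository name, job name and credentials below are hypothetical placeholders, not values from this article), a job stored in a repository can be started through the bundled kitchen.sh script:

```shell
#!/bin/sh
# Hypothetical batch launch of a repository-stored job via kitchen.sh.
# PDI_HOME, etl_repo, the credentials and load_sales are all placeholders.
PDI_HOME=/opt/pdi
JOB_NAME=load_sales
CMD="$PDI_HOME/kitchen.sh -rep=etl_repo -user=admin -pass=secret -dir=/ -job=$JOB_NAME -level=Basic"
# Echoed instead of executed here, since running it requires a PDI install:
echo "$CMD"
```

Transformations are launched the same way with the companion pan.sh script, using -trans instead of -job.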
Very useful in my current environment (Solaris): you can develop a pool of scripts that handle launching jobs (some in parallel!), monitoring them, and recovering from anomalies. An anomaly surfaces quickly with a simple parser, or with Ctrl-F "ano" during development. Note that launching a job/transformation interactively (via the interface) is of course still possible.
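The script-pool idea can be sketched as follows; the jobs and the "ano" log marker are simulated here (a real pool would call kitchen.sh instead), so the example runs on its own:

```shell
#!/bin/sh
# Sketch: launch several "jobs" in parallel, wait for them, then recover
# anomalies by grepping the logs for the "ano" marker. Job names and log
# contents are simulated placeholders, not real Kettle output.
LOGDIR=$(mktemp -d)

run_job() {
    # $1 = job name, $2 = simulated final log line
    { echo "job $1 started"; echo "$2"; } > "$LOGDIR/$1.log"
}

run_job load_customers "job finished OK" &
run_job load_orders    "ano: 12 rows rejected" &
wait    # control step: block until all parallel jobs are done

# Anomaly recovery: list the logs containing the "ano" marker.
grep -l "ano" "$LOGDIR"/*.log
```

The same grep-based parser can be pointed at real Kettle log files, provided the jobs write their anomalies with a fixed marker.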

Overall, performance is generally there; naturally, the more powerful and well-structured your database, the better Kettle behaves.

Shortcoming: a missing management module

However, Kettle lacks a module that is sorely needed when working with multiple environments: a transformation/job manager. Currently, to migrate a transformation or a job from one environment to another, you have to do a local backup/export and then reconnect to the new repository to publish it. A transformation and job management module would make the move-to-production phase faster to manage, along the lines of a "CVS light" or a "SourceSafe".
