My employer Denodo has released an "Express" version of its
data virtualization platform. The Virtual DataPort component lets you do cool things like joining tables that reside in completely different databases (say, one table in Oracle and another in MySQL). And it comes with a boatload of connectors for a wide range of data sources.
I'm interested in accessing all that data goodness from Haskell. There are a couple of ways of doing it.
One would be to use Virtual DataPort's REST interface. That's a valid option and maybe the subject of another post.
Another is to connect through the ODBC interface on port 9996. As it happens, the wire protocol aims to be compatible with that of PostgreSQL, so maybe I could just use Haskell's popular
postgresql-simple library. Let's try it out.
Musicbrainz
First, we need an underlying database instance that we can "wrap" with Virtual DataPort. Let's use the
Musicbrainz database, a perennial favorite for testing. We can
download it as a virtual machine and run it on
Virtual Box. I usually set the
guest networking to NAT and only forward PostgreSQL's port, which is 5432.
Virtual DataPort
Then we download and install Denodo Express, start the Virtual DataPort server, and launch the graphical administration tool.
From the administration tool's GUI, we have to import a few tables from the Musicbrainz database. First we create a JDBC data source:
(The Musicbrainz database is called
musicbrainz_db, the user/password is
musicbrainz/
musicbrainz.)
And then import the
artist_name and
release tables:
Haskell
Ok, now for the Haskell part. postgresql-simple depends on native libraries. In CentOS 7 for example, we'll have to install the package
postgresql-devel using yum, before invoking
cabal install postgresql-simple.
Once we have the required dependencies, we can dabble with the following snippet (notice that we are connecting to Virtual DataPort, not directly to Musicbrainz):
...and it works! A sweet stream of data flows from the Musicbrainz database, passes through Virtual DataPort and arrives at our Haskell client.
And thanks to the virtualization capabilities of Virtual DataPort, that stream could potentially represent the combined flow of a number of affluents.