Heatmaps, Point Clouds and Big Data in Processing
Jim Blackhurst has a nice writeup on his blog about working with large SQL-backed datasets in Processing and OpenGL to create big point clouds and heatmaps.
I’ve been using Processing to create tools that render the heatmaps, but while the logical structure of the program is fairly simple, there are significant challenges in working with large datasets. The primary challenge is loading the data into memory. The data is all held in a SQL database, and while I could connect to the DB directly from Processing, the DB is optimised for data-in operations, not data-out, so you don’t want to be pulling the data out too often. Instead, I dump the raw spatial data (X, Y, Z coordinates) into a CSV file, one record per row.

I usually create heatmaps from datasets in excess of 1 million rows, and most of them are between 5 and 20 million rows (I have one that is 22 million rows!). A CSV file containing 10 million rows of spatial data is about 364 MiB in size (the 22.3-million-row CSV is 802 MiB!). To create the in-memory data structures that hold sets this large, I have to run in 64-bit mode to get past the Windows 32-bit memory restrictions.
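The post doesn't show the loading code, but the approach it describes — streaming a large X,Y,Z CSV into memory — can be sketched in plain Java (the language underlying Processing). This is a minimal illustration, not the author's actual tool: the class and method names are invented, and it packs coordinates into a flat `float[]` because primitive arrays are far cheaper per point than millions of boxed objects, which matters at the 10–20 million row scale described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class PointCloudLoader {
    // Stream x,y,z records from a CSV reader into one flat float array:
    // point i occupies indices 3i, 3i+1, 3i+2. At 10M points this is
    // roughly 3 * 4 bytes * 10M ≈ 120 MB of raw data, with no per-row
    // object overhead.
    static float[] load(BufferedReader in, int expectedRows) throws IOException {
        float[] pts = new float[Math.max(3, expectedRows * 3)];
        int n = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue;          // skip blank lines
            String[] f = line.split(",");
            if (n + 3 > pts.length) {              // grow if the estimate was low
                float[] bigger = new float[pts.length * 2];
                System.arraycopy(pts, 0, bigger, 0, pts.length);
                pts = bigger;
            }
            pts[n++] = Float.parseFloat(f[0]);
            pts[n++] = Float.parseFloat(f[1]);
            pts[n++] = Float.parseFloat(f[2]);
        }
        float[] out = new float[n];                // trim to the actual size
        System.arraycopy(pts, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory stand-in for a multi-gigabyte file on disk.
        String csv = "1.0,2.0,3.0\n4.5,5.5,6.5\n";
        float[] pts = load(new BufferedReader(new StringReader(csv)), 2);
        System.out.println(pts.length);  // 6 values = 2 points
    }
}
```

Even with a compact layout like this, a 20-million-point cloud needs a heap larger than a 32-bit JVM on Windows can address, which is why the author runs in 64-bit mode.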
The dataset is a collection of 11.3 million player deaths from the game “Just Cause 2”, hopefully showing the most dangerous areas of the game. The project is still in development, and he hopes to move on to the new Deus Ex game for his next effort.