A dataset of
- ~1 TB of data,
- ~20 billion rows

is to be used in the blog exercises.
The data were resampled, replicated tenfold, and the date-time stamps altered. As a result, the data became big enough, and any match with the real data is highly unlikely.
As a cost-reduction exercise, the ~1 TB of raw CSV files was compressed to ~250 GB with the gzip Linux utility.
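A compression pass like the one described could be sketched as follows (the `data/` directory and sample file are assumptions for illustration; the original file layout is not specified):

```shell
# Illustrative sketch: compress every CSV under data/ in place with gzip.
# gzip replaces each file.csv with file.csv.gz, so no extra copy is kept.
mkdir -p data
printf 'id,value\n1,foo\n' > data/sample.csv   # tiny stand-in for a raw CSV

# -9 favors compression ratio over speed; batch files into one gzip call.
find data -name '*.csv' -exec gzip -9 {} +

# The compressed files remain directly readable with zcat/zgrep.
zcat data/sample.csv.gz
```

For ~1 TB of input, a parallel implementation such as pigz would compress considerably faster on multi-core machines while producing gzip-compatible output.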