Data Files

DTDataFile

This is the primary interface for communication between DataTank and a Python-based helper program, assuming you use files for communication. It allows you to read, write, and append to a DataTank binary file (extension .dtbin).

class datatank_py.DTDataFile.DTDataFile(file_path, truncate=False, readonly=False)

This class roughly corresponds to the C++ DTDataFile class.

Higher-level access is provided for some objects (e.g., PIL image), but the class is primarily for writing arrays and strings. In fact, DataTank stores only arrays and strings, with naming conventions that define how they’re interpreted.

You should not have multiple DTDataFile instances open for the same file on disk, or your file’s state will get trashed.

Reading values is fairly easy, and DTDataFile provides a dictionary-style interface to the variables. For example, assuming that a variable named “Array_One” exists:

>>> f = DTDataFile("a.dtbin")
>>> v = f["Array_One"]

Setting is similar, but values go through the same type conversion as in the write() method, and you can’t specify a time:

>>> import numpy as np
>>> f = DTDataFile("a.dtbin")
>>> f["My array"] = np.zeros((2, 2))
>>> f["My array"]
array([[ 0.,  0.],
       [ 0.,  0.]])

Note that if you try to set the same variable name again, an exception will be thrown. Don’t do that.

You can also iterate a DTDataFile directly, and each iteration returns a variable name. Variable names are unordered, as in hashing collections.

>>> f = DTDataFile("a.dtbin")
>>> for name in f:
...     print name
>>> if "Var" in f:
...     print f["Var"]

DTDataFile supports the with statement in Python 2.5 and 2.6 (Python 2.5 requires from __future__ import with_statement first), so you can use this idiom to ensure resources are cleaned up:

>>> with DTDataFile("foo.dtbin", truncate=True) as df:
...     df.write_2dmesh_one(mesh, 0, 0, dx, dy, "FooBar")

__init__(file_path, truncate=False, readonly=False)
Parameters:
  • file_path – absolute or relative path
  • truncate – whether to truncate the file if it exists (default is False)
  • readonly – open the file for read-only access (default is False)

The default mode is to append to a file, creating it if it doesn’t already exist. Passing True for truncate will entirely clear the file’s content.

close()

Close the underlying file object.

Further access to variables and names is not possible at this point and will raise an exception.

dt_object_named(key)
Returns: a high-level DT object, if possible, by introspection

ordered_variable_names()
Returns: list of variable names ordered as in the file

path()
Returns: the file path

resolve_name(name)

Resolve a name in case of shared variables.

Parameters: name – A potentially shared variable name
Returns: The underlying variable name, with all references resolved

DataTank mesh objects can be written with a shared grid, for example, and just save a string pointing to the underlying grid name. This saves a lot of disk space, but means that you can end up with a string instead of the object you’re expecting.

This method is pretty efficient in the common case of no redirect, as it only reads the header. Other cases are a bit more expensive, but too tricky to be worth rewriting at the moment.

Example from datatank_py.DTStructuredGrid2D.DTStructuredGrid2D:

name = datafile.resolve_name(name)
gridx = datafile[name + "_X"]
gridy = datafile[name + "_Y"]
mask = DTMask.from_data_file(datafile, name + "_dom")
return DTStructuredGrid2D(gridx, gridy, mask=mask)

For an example of writing a shared grid see datatank_py.DTTriangularMesh2D.DTTriangularMesh2D.write_with_shared_grid()

variable_named(name, use_modules=False)

Procedural API for getting a value from disk.

Parameters:
  • name – the variable name as user-visible in the file (without the trailing nul)
  • use_modules – try to convert to abstract type by introspection of available modules
Returns:

a string, scalar, or numpy array

This returns values as strings, scalars, or numpy arrays. By default, no attempt is made to convert a given array to its abstract type (so you can retrieve each plane of a 2D Bitmap object by name, but not as a PIL image).

variable_names()
Returns: unsorted list of variable names

write(obj, name, dt_type=None, time=None)

Write a single value to a file object by name.

Parameters:
  • obj – string, numpy array, list, tuple, or scalar value
  • name – user-visible name of the variable
  • dt_type – string type used by DataTank
  • time – time value if this variable is time-varying

Handles various object types, and adds appropriate names so they’re visible in DataTank. String, scalar, ndarray, tuple, and list objects are supported, although ndarray gives the most specific interface for precision and avoids type conversions.

This method saves a 0D array (scalar) as a “Real Number”, a 1D array as a “List of Numbers” and other shapes as “Array” by default. Use the dt_type parameter if you want something specific, such as “2D Point” for a point (although the caller has to ensure the shape is correct).
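The default type selection described above can be sketched without DataTank at all. The helper names below (nesting_depth, default_dt_type) are hypothetical and are not part of datatank_py; the sketch only models numeric input given as scalars or nested lists/tuples, and it is not the library's implementation:

```python
def nesting_depth(value):
    """Return 0 for a scalar, 1 for a flat sequence, 2 for a nested one, etc."""
    depth = 0
    while isinstance(value, (list, tuple)):
        depth += 1
        if not value:
            # An empty sequence has no first element to descend into.
            break
        value = value[0]
    return depth


def default_dt_type(value):
    """Pick the default DataTank type name from the number of dimensions.

    Hypothetical helper: mirrors the documented defaults only
    (0D -> "Real Number", 1D -> "List of Numbers", otherwise "Array").
    """
    depth = nesting_depth(value)
    if depth == 0:
        return "Real Number"
    if depth == 1:
        return "List of Numbers"
    return "Array"


scalar_type = default_dt_type(3.0)
list_type = default_dt_type([1.0, 2.0, 3.0])
array_type = default_dt_type([[1.0, 2.0], [3.0, 4.0]])
```

Anything more specific, such as “2D Point”, still has to be requested explicitly through dt_type.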

In addition, any object that implements __dt_type__() and __dt_write__() methods can be passed, which allows saving compound types such as 2D Mesh or 2D Bitmap, without bloating up DTDataFile with all of those types.

The __dt_type__() method must return a DataTank type name:

def __dt_type__(self):
    return "2D Mesh"

The __dt_write__() method should use write_anonymous to save all variables as required for the object. The datafile argument is this DTDataFile instance. Note that __dt_write__() must not expose the variable by adding a “Seq” name, as that is the responsibility of DTDataFile as the higher-level object:

def __dt_write__(self, datafile, name):
    ...
    datafile.write_anonymous( ... , name)

write_anonymous(obj, name)

Write an object that will not be visible in DataTank.

Parameters:
  • obj – a string, numpy array, list, or tuple
  • name – name of the variable

This is used for writing additional arrays and strings used by compound types, such as a 2D Mesh, which has an additional grid array.

write_array(array, name, dt_type=None, time=None)

Write an array with optional time dependence.

Parameters:
  • array – a numpy array, list, or tuple
  • name – user-visible name of the array variable
  • dt_type – string type used by DataTank
  • time – time value if this variable is time-varying

This will add a string to expose it in DataTank using the dt_type parameter, which is a DataTank type such as “Array” or “NumberList.” The time parameter is a double-precision floating point value, relative to DataTank’s time slider.

Note that if time dependence is used, the caller is responsible for appending “_N” to the variable, where N is an integer >= 0 and strictly increasing with time. A contrived example follows:

>>> import numpy as np
>>> f = DTDataFile("foo.dtbin")
>>> for idx in xrange(10):
...     point_test = np.array(range(idx, idx + 10), np.double)
...     point_test = point_test.reshape((point_test.size // 2, 2))
...     tp = "2D Point Collection"
...     tm = idx * 2.
...     f.write_array(point_test, "Points_%d" % (idx), dt_type=tp, time=tm)

Note that the actual variable type is “2D Point Collection,” and the caller is responsible for setting the array shape correctly. This should work for any array-based object in DataTank.
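The “_N” naming rule can be checked before anything is written. The following is a minimal sketch, assuming the caller wants one name per time step and that time values must be strictly increasing (the second point is an interpretation of the rule above, not something datatank_py is known to enforce). The function time_series_names is hypothetical:

```python
def time_series_names(base_name, times):
    """Pair each time value with a "base_N" variable name, N = 0, 1, 2, ...

    Hypothetical helper: raises ValueError if the times are not strictly
    increasing, so that N increases with time as the naming rule requires.
    """
    pairs = []
    previous = None
    for index, t in enumerate(times):
        t = float(t)
        if previous is not None and t <= previous:
            raise ValueError("time values must be strictly increasing")
        pairs.append(("%s_%d" % (base_name, index), t))
        previous = t
    return pairs


# Each pair could then be passed as the name and time arguments
# of write_array() or write_string().
pairs = time_series_names("Points", [0.0, 2.0, 4.0])
```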

write_string(string, name, time=None)

Write a string with optional time dependence.

Parameters:
  • string – the value to save
  • name – the user-visible name of the string variable
  • time – time value if this variable is time-varying

If this is the first time this string has been written, this method will add a string to expose it in DataTank. The time parameter is a double-precision floating point value, relative to DataTank’s time slider.

Note that if time dependence is used, the caller is responsible for appending “_N” to the variable, where N is an integer >= 0 and strictly increasing with time. A call might look like this:

>>> from datetime import datetime
>>> f = DTDataFile("foo.dtbin")
>>> for idx in xrange(10):
...     s = datetime.now().isoformat()
...     f.write_string(s, "PointTest_%d" % (idx), time=idx * 2.)

DTPyWrite

class datatank_py.DTPyWrite.DTPyWrite

Class documenting methods that must be implemented for DTDataFile to load complex types by name.

This is never instantiated directly. datatank_py.DTDataFile.DTDataFile checks to ensure that an object implements all of the required methods, but you are not required to use DTPyWrite as a base class. It’s mainly provided as a convenience and formal documentation.

__dt_type__()

The variable type as required by DataTank.

Returns: variable type as a string

This is a string description of the variable, which can be found in the DataTank manual PDF or in DTSource. It’s easiest to look in DTSource, since you’ll need to look there for the __dt_write__() implementation anyway. You can find the type in the WriteOne() function for a particular class, such as:

// this is taken from DTPath2D.cpp
void WriteOne(DTDataStorage &output,const string &name,const DTPath2D &toWrite)
{
    Write(output,name,toWrite);
    Write(output,"Seq_"+name,"2D Path");
    output.Flush();
}

where the type is the string “2D Path”. In some cases, it seems that multiple type names are recognized; e.g., “StringList” is written by DataTank, but “List of Strings” is used in DTSource. Regardless, this is trivial; the datatank_py.DTPath2D.DTPath2D.__dt_type__() method looks like this:

def __dt_type__(self):
    return "2D Path"

__dt_write__(datafile, name)

Write all associated values to a file.

Parameters:
  • datafile – the DTDataFile instance to write to
  • name – base name for the object’s variables

This method collects the necessary components of the compound object and writes them to the datafile. The name is generally used as a base for associated variable names, since only one of the components can have the “primary” name. Again, the DataTank manual PDF or DTSource must be used here as a reference (DTSource is more complete). In particular, you need to look at the Write() function implemented in the C++ class:

// this is taken from DTPath2D.cpp
void Write(DTDataStorage &output,const string &name,const DTPath2D &thePath)
{
    Write(output,name+"_bbox2D",BoundingBox(thePath));
    Write(output,name,thePath.Data());
}

Here the bounding box is written as name_bbox2D; this is just a 4 element double-precision array. Next, the actual path array is saved under the name as passed in to the function. The equivalent Python implementation is:

def __dt_write__(self, datafile, name):
    datafile.write_anonymous(self.bounding_box(), name + "_bbox2D")
    datafile.write_anonymous(np.dstack((self._xvalues, self._yvalues)), name)

Note that datatank_py.DTDataFile.DTDataFile.write_anonymous() should be used in order to avoid any variable name munging (prepending “Seq_” in order to make the variable visible in DataTank).

classmethod from_data_file(datafile, name)

Instantiate a datatank_py high-level object from a file.

Parameters:
  • datafile – an open DTDataFile instance
  • name – name of the variable to read
Returns:

a properly initialized instance of the calling class

This class method can be implemented to read necessary components of an object from a datafile. For example:

from datatank_py.DTPath2D import DTPath2D
from datatank_py.DTDataFile import DTDataFile

with DTDataFile("Input.dtbin") as df:
    path = DTPath2D.from_data_file(df, "My Path")

will try to create a datatank_py.DTPath2D.DTPath2D from variables named “My Path” in the given data file. In general, this is the inverse of the __dt_write__() method, but may be slightly more tricky due to naming conventions in DataTank and optional data that DataTank may or may not include.

datatank_py.DTPyWrite.dt_writer(obj)

Check to ensure conformance to dt_writer protocol.

Returns: True if the object implements the required methods
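A conformance check of this kind is ordinary duck typing. The sketch below is not the datatank_py implementation: looks_like_dt_writer, RecordingDataFile, and Point2D are hypothetical names, the check assumes that only __dt_type__() and __dt_write__() are required, and the point is stored as a plain 2-element tuple purely for illustration:

```python
def looks_like_dt_writer(obj):
    """Return True if obj has callable __dt_type__ and __dt_write__ methods."""
    required = ("__dt_type__", "__dt_write__")
    return all(callable(getattr(obj, method, None)) for method in required)


class RecordingDataFile(object):
    """Stand-in for DTDataFile that only records write_anonymous() calls."""

    def __init__(self):
        self.written = {}

    def write_anonymous(self, obj, name):
        self.written[name] = obj


class Point2D(object):
    """Hypothetical compound type that follows the protocol."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __dt_type__(self):
        # "2D Point" is the type name used as an example in write().
        return "2D Point"

    def __dt_write__(self, datafile, name):
        # Only anonymous writes here; exposing the variable is left
        # to the data file object, as the protocol requires.
        datafile.write_anonymous((self.x, self.y), name)


point = Point2D(1.0, 2.0)
recorder = RecordingDataFile()
if looks_like_dt_writer(point):
    point.__dt_write__(recorder, "My Point")
```

Because the check only looks for the methods, a class does not have to inherit from DTPyWrite to pass it.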

DTSeries

class datatank_py.DTSeries.DTSeries(datafile, series_name, series_type)

Base class for series support.

In general, you shouldn’t need to use this class; it’s only provided for symmetry with DTSource, and to be used by DTSeriesGroup. However, it may also be useful for non-group objects in the future.

__init__(datafile, series_name, series_type)
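Parameters:
  • datafile – the datatank_py.DTDataFile.DTDataFile instance used for storage
  • series_name – name of the series
  • series_type – the base type stored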

The name will typically be “Var”, and the type will be whatever is the base type stored, such as “Group” for a group object.

basename()
Returns: name of the form ‘name_N’ where N is the result of savecount()

datafile()
Returns: the datatank_py.DTDataFile.DTDataFile instance used for storage

last_time()
Returns: last time value stored or None if no values are stored

savecount()
Returns: the number of time values stored

shared_save(time)
Parameters: time – time value to store to disk

Saves the current time value and an appropriate variable name to disk.

time_values()
Returns: vector of time values stored
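The bookkeeping behind these accessors can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the DTSeries implementation: SeriesModel is a hypothetical class, nothing is written to disk, and whether the real basename() is meant to be read before or after shared_save() is not specified here:

```python
class SeriesModel(object):
    """In-memory stand-in that tracks saved time values for one series."""

    def __init__(self, series_name):
        self._name = series_name
        self._times = []

    def savecount(self):
        # Number of time values stored so far.
        return len(self._times)

    def basename(self):
        # Name of the form 'name_N', with N taken from savecount().
        return "%s_%d" % (self._name, self.savecount())

    def last_time(self):
        return self._times[-1] if self._times else None

    def time_values(self):
        return list(self._times)

    def shared_save(self, time):
        # The real method also writes to disk; this model only records.
        self._times.append(float(time))


series = SeriesModel("Var")
first_name = series.basename()
series.shared_save(0.0)
series.shared_save(0.5)
```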

DTSeriesGroup

This class allows custom objects to be created and returned, and is particularly useful when you want to compute a full time series of variables all at once.

[Image: mesh_group_example_ss.png]

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from datatank_py.DTDataFile import DTDataFile
from datatank_py.DTProgress import DTProgress
from datatank_py.DTSeries import DTSeriesGroup
from datatank_py.DTMesh2D import DTMesh2D
from time import time
import numpy as np

if __name__ == '__main__':

    # dummy input file, just as boilerplate
    input_file = DTDataFile("Input.dtbin")
    COUNT = int(input_file["count"])
    input_file.close()

    start_time = time()

    with DTDataFile("Output.dtbin", truncate=True) as df:

        # Task groups use DTProgress for the progress bar
        progress = DTProgress()

        # Define the group structure using a dictionary, using the variable name as key,
        # and the DataTank type as the value.  This is used to create the file header.
        name_to_type = { "Output Mesh":"2D Mesh", "Output Index":"Real Number" }

        # Create a new DTSeriesGroup instance using that type mapping.  For a task group
        # to be run in DT, we want to use the "Var" name.
        group = DTSeriesGroup(df, "Var", name_to_type)

        # return the step to avoid getting fouled up in computing it
        (x, dx) = np.linspace(-10, 10, 50, retstep=True)
        (y, dy) = np.linspace(-10, 10, 100, retstep=True)
        grid = (np.min(x), np.min(y), dx, dy)
        xx, yy = np.meshgrid(x, y)

        def mesh_function(x, y, p):
            return np.cos(x + p) + np.cos(y + p)

        for idx in xrange(COUNT):
            mesh = mesh_function(xx, yy, idx / 10.)
            group.add(idx / 10., { "Output Mesh":DTMesh2D(mesh, grid=grid), "Output Index":idx })
            progress.update_percentage(float(idx) / COUNT)

        # save execution time, and errors as a string list
        df.write_anonymous([""], "ExecutionErrors")
        df.write_anonymous(time() - start_time, "ExecutionTime")

class datatank_py.DTSeries.DTSeriesGroup(datafile, name, name_to_type)

Base series group class.

__init__(datafile, name, name_to_type)
Parameters:
  • datafile – an empty datatank_py.DTDataFile.DTDataFile instance
  • name – the name of the group
  • name_to_type – a dictionary mapping variable names to DataTank types

This name_to_type dictionary defines the structure of the group:

{ "My Output Array":"Array", "My Scalar Value":"Real Number" }

You can look up the DataTank type names in its PDF help manual, or for compound objects supported in datatank_py, you can use something like:

from datatank_py.DTMesh2D import DTMesh2D
from datatank_py.DTPointCollection2D import DTPointCollection2D
{ "My 2D Mesh":DTMesh2D.dt_type[0], "My Points":DTPointCollection2D.dt_type[0] }

add(time, values)

Add a dictionary of values.

Parameters:
  • time – the time value represented by these values
  • values – dictionary mapping variable name to value

When adding to the group, all variables must be present, or an exception will be raised. The caller is responsible for ensuring that value types are consistent with the expected data. Compound types (e.g., 2D Mesh) are supported via wrapper objects that implement the dt_write protocol. See DTDataFile documentation for more details.

Example:

group.add(idx / 10., { "Output Mesh":DTMesh2D(mesh, grid=grid), "Output Index":idx })
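Since a missing variable raises an exception, it can be convenient to validate a values dictionary against the group structure before calling add(). The helper below is a hypothetical sketch and not part of datatank_py; the exception type (ValueError) is an assumption, and extra keys are deliberately left unchecked because their handling is not documented here:

```python
def check_group_values(name_to_type, values):
    """Raise ValueError if any variable declared in name_to_type is missing.

    Hypothetical pre-flight check; it does not verify value types.
    """
    missing = sorted(set(name_to_type) - set(values))
    if missing:
        raise ValueError("missing group variables: %s" % ", ".join(missing))


name_to_type = {"Output Mesh": "2D Mesh", "Output Index": "Real Number"}

# A complete dictionary passes silently (the mesh is a placeholder here).
check_group_values(name_to_type, {"Output Mesh": None, "Output Index": 3})

# An incomplete one is reported before anything is written.
try:
    check_group_values(name_to_type, {"Output Index": 3})
    error_text = None
except ValueError as exc:
    error_text = str(exc)
```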

DTError

datatank_py.DTError.dt_set_log_identifier(ctxt)

Sets the default logging identifier to something useful.

Parameters: ctxt – a string that will usefully identify this log message

Call this before using DTErrorMessage() or other mechanisms; just pass in the basename of your Python script. Since DataTank changes executable names in modules (and perhaps other times), the default of sys.argv[0] isn’t effective in tracking down error messages, as they’re all attributed to “runme” or something similar.

datatank_py.DTError.dt_use_syslog(should_use)

Allows you to copy all messages to syslog.

Parameters: should_use – pass True to use syslog

This can be useful in case your program croaks before results get handed back to DataTank. Currently disabled by default.

datatank_py.DTError.DTErrorMessage(fcn, msg)

Accumulate a message and echo to standard error.

Parameters:
  • fcn – typically a function or module name (pass None to use argv[0])
  • msg – an error or warning message

Typically you call this each time an error or warning should be presented, then call DTSaveError() before exiting. This is aimed exclusively at DataTank module/external program usage.

datatank_py.DTError.DTSaveError(datafile, name='ExecutionErrors')

Save accumulated messages to a file.

Parameters:
  • datafile – a DTDataFile instance, open for writing
  • name – defaults to “ExecutionErrors” for DataTank

This will cause all messages accumulated with DTErrorMessage() to be displayed in DataTank’s Messages panel. This is aimed exclusively at DataTank module/external program usage.
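The accumulate-then-save pattern behind DTErrorMessage() and DTSaveError() can be modeled in a few lines. This is a stand-in, not the datatank_py code: ErrorLog is a hypothetical class, the "fcn: msg" formatting is an assumption, and a plain dictionary takes the place of the DTDataFile instance:

```python
import sys


class ErrorLog(object):
    """Collects messages, echoes them to stderr, and saves them by name."""

    def __init__(self):
        self.messages = []

    def error(self, fcn, msg):
        # Fall back to the program name when no function name is given.
        source = fcn if fcn is not None else sys.argv[0]
        text = "%s: %s" % (source, msg)
        self.messages.append(text)
        sys.stderr.write(text + "\n")

    def save(self, storage, name="ExecutionErrors"):
        # storage is any dict-like object in this sketch.
        storage[name] = list(self.messages)


log = ErrorLog()
log.error("load_input", "missing variable 'count'")
log.error("compute", "grid spacing is zero")

output = {}
log.save(output)  # call once, just before exiting
```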

datatank_py.DTError.DTWarningMessage(fcn, msg)

Calls DTErrorMessage(), which is what C++ DTSource does.

DTProgress

class datatank_py.DTProgress.DTProgress

Drive progress indicator for DataTank.

Call update_percentage periodically to get correct progress bar timing during a long-running external program. The implementation creates a file called DTProgress in the current working directory, and DataTank reads that file to update its progress bar.

http://www.visualdatatools.com/phpBB2/viewtopic.php?t=158

update_percentage(percent)

Updates the progress indicator if needed.

Parameters: percent – percentage as floating point value (0 ≤ percent ≤ 1.0)

Only updates the progress file on integral percentage points, so can be called as frequently as needed.
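The throttling idea can be sketched independently of the progress file. PercentThrottle below is a hypothetical class, not the DTProgress implementation: it records the whole-percent values that would trigger an update instead of writing a file, and the file format itself is not modeled:

```python
class PercentThrottle(object):
    """Accept frequent updates but act only when the whole percent changes."""

    def __init__(self):
        self._last_percent = -1
        self.emitted = []  # whole-percent values that triggered an update

    def update_percentage(self, fraction):
        """fraction is between 0 and 1.0; returns True if an update occurred."""
        whole = int(fraction * 100)
        if whole > self._last_percent:
            self._last_percent = whole
            self.emitted.append(whole)
            return True
        return False


throttle = PercentThrottle()
for step in range(1000):
    # 1000 calls, but far fewer updates are actually recorded.
    throttle.update_percentage(float(step) / 1000)
update_count = len(throttle.emitted)
```

This is why calling update_percentage() inside a tight loop is cheap: most calls return without touching the file.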