tbp.monty.frameworks.environment_utils#

tbp.monty.frameworks.environment_utils.graph_utils#

get_edge_index(graph, previous_node, new_node)[source]#

Get the edge index between two nodes in a graph.

TODO: There must be an easier way to do this!

Parameters:
  • graph – torch_geometric.data graph

  • previous_node – node ID of the first node in the graph

  • new_node – node ID of the second node in the graph

Returns:

edge ID between the two nodes
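
For reference, the lookup amounts to scanning the columns of the graph's edge_index tensor. The sketch below is a hypothetical re-implementation (function name and error handling are assumptions), not necessarily what the function does internally:

import torch
from torch_geometric.data import Data

def find_edge_index(graph: Data, previous_node: int, new_node: int) -> int:
    # edge_index has shape [2, num_edges]; each column is (source, target).
    mask = (graph.edge_index[0] == previous_node) & (graph.edge_index[1] == new_node)
    matches = torch.nonzero(mask, as_tuple=False)
    if matches.numel() == 0:
        raise ValueError("No edge between the given nodes")
    return int(matches[0])  # index of the first matching edge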

tbp.monty.frameworks.environment_utils.habitat_utils#

get_bounding_corners(object_ref)[source]#

Determine and return the bounding box of a Habitat object.

Determines and returns the bounding box (defined by a “max” and “min” corner) of a Habitat object (such as a mug), given in world coordinates.

Specifically uses the “axis-aligned bounding box” (aabb) available in Habitat; this is a bounding box aligned with the axes of the coordinate system, which tends to be computationally efficient to retrieve.

Parameters:

object_ref – the Habitat object instance

Returns:

min_corner and max_corner, the defining corners of the bounding box

Return type:

Two np.arrays
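
A minimal sketch of the retrieval, assuming the habitat-sim object exposes its axis-aligned bounding box via root_scene_node.cumulative_bb (a Range3D with min and max corners); these attribute names are assumptions, and converting the corners to world coordinates would additionally require the node's absolute transform:

import numpy as np

def bounding_corners_sketch(object_ref):
    # Assumed habitat-sim attribute: a Range3D describing the aabb.
    aabb = object_ref.root_scene_node.cumulative_bb
    min_corner = np.array(aabb.min)
    max_corner = np.array(aabb.max)
    return min_corner, max_corner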

tbp.monty.frameworks.environment_utils.server#

class MontyRequestHandler(*args, directory=None, **kwargs)[source]#

Bases: SimpleHTTPRequestHandler

do_PUT()[source]#
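
Because the handler subclasses the standard-library SimpleHTTPRequestHandler, it can be served with http.server directly. The host and port below are illustrative, not values prescribed by the module:

from http.server import HTTPServer

from tbp.monty.frameworks.environment_utils.server import MontyRequestHandler

# Serve on an arbitrary local port; GET is inherited, PUT is handled by do_PUT().
server = HTTPServer(("localhost", 8000), MontyRequestHandler)
server.serve_forever()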

tbp.monty.frameworks.environment_utils.transforms#

class AddNoiseToRawDepthImage(agent_id, sigma)[source]#

Bases: object

Add gaussian noise to raw sensory input.
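
The underlying operation can be sketched as adding zero-mean Gaussian noise with standard deviation sigma to the depth image (an illustration, not the class's exact code):

import numpy as np

def add_gaussian_noise(depth: np.ndarray, sigma: float) -> np.ndarray:
    # Zero-mean Gaussian noise, same shape as the depth image.
    noise = np.random.normal(loc=0.0, scale=sigma, size=depth.shape)
    return depth + noise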

class DepthTo3DLocations(agent_id, sensor_ids, resolutions, zooms=1.0, hfov=90.0, clip_value=0.05, depth_clip_sensors=(), world_coord=True, get_all_points=False, use_semantic_sensor=True)[source]#

Bases: object

Transform semantic and depth observations from 2D into 3D.

Transform semantic and depth observations from camera coordinates (2D) into agent (or world) coordinates (3D).

This transform adds the transformed result as a new observation called “semantic_3d”, which contains the semantic ID and 3D location, relative to the agent (or world), of every observed point:

"semantic_3d" : [
#    x-pos      , y-pos     , z-pos      , semantic_id
    [-0.06000001, 1.56666668, -0.30000007, 25.],
    [ 0.06000001, 1.56666668, -0.30000007, 25.],
    [-0.06000001, 1.43333332, -0.30000007, 25.],
    [ 0.06000001, 1.43333332, -0.30000007, 25.]])
]
agent_id#

Agent ID to get observations from

resolution#

Camera resolution (H, W)

zoom#

Camera zoom factor. Default 1.0 (no zoom)

hfov#

Camera HFOV, default 90 degrees

semantic_sensor#

Semantic sensor id. Default “semantic”

depth_sensor#

Depth sensor id. Default “depth”

world_coord#

Whether to return 3D locations in world coordinates. If enabled, then __call__() must be called with the agent and sensor states in addition to observations. Default True.

get_all_points#

Whether to return all 3D coordinates or only the ones that land on an object.

depth_clip_sensors#

Tuple of sensor indices to which to apply a clipping transform, where all values > clip_value are set to clip_value. An empty tuple applies the clipping to none of the sensors.

clip_value#

Depth threshold used by the clipping transform

Warning

This transformation is only valid for pinhole cameras
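
The core of the transform is the pinhole unprojection from pixel coordinates and depth to camera-frame 3D points. The sketch below is illustrative (function name and details are assumptions): it assumes a square image, the given hfov in degrees, and a camera looking down its -z axis. With world_coord enabled, the class would further rotate and translate these points by the sensor and agent poses.

import numpy as np

def unproject_depth(depth: np.ndarray, hfov: float = 90.0) -> np.ndarray:
    h, w = depth.shape
    f = 1.0 / np.tan(np.deg2rad(hfov) / 2.0)  # focal length in NDC units
    # Pixel centers mapped to normalized device coordinates in [-1, 1].
    x, y = np.meshgrid(np.linspace(-1.0, 1.0, w), np.linspace(1.0, -1.0, h))
    # Scale each ray by its depth to get camera-frame coordinates.
    x3d = x * depth / f
    y3d = y * depth / f
    z3d = -depth
    return np.stack([x3d, y3d, z3d], axis=-1)  # (H, W, 3)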

clip(agent_obs)[source]#

Clip the depth and semantic data that lie beyond a certain depth threshold.

Set the values of 0 (infinite depth) to the clip value.
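
A hedged sketch of the clipping rule on the depth data (the handling of the semantic data is omitted; the helper name is hypothetical):

import numpy as np

def clip_depth(depth: np.ndarray, clip_value: float) -> np.ndarray:
    out = depth.copy()
    out[out == 0] = clip_value          # 0 encodes infinite depth
    return np.minimum(out, clip_value)  # values beyond the threshold are clipped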

get_on_surface_th(depth_patch, min_depth_range)[source]#

Return a depth threshold if we have a bimodal depth distribution.

If the depth values are in a large enough range (> min_depth_range) we may be looking at more than one surface within our patch. This could either be two disjoint surfaces of the object or the object and the background.

To figure out if we have two disjoint sets of depth values we look at the histogram and check for empty bins in the middle. The center of the empty part of the histogram is used as the threshold.

Next, we want to check whether to use the depth values above or below the threshold. Currently this is done by checking which side of the distribution is larger (occupies more space in the patch). Alternatively, we could check which side the depth at the center of the patch falls on; it is not clear which would be better.

Lastly, if we do decide to use the depth points that are further away, we need to make sure they are not the points that are off the object. Currently this is just done with a simple heuristic (depth difference < 0.1) but in the future we will probably have to find a better solution for this.

Parameters:
  • depth_patch – sensor patch observations of depth

  • min_depth_range – minimum range of depth values to even be considered

Returns:

threshold and whether we want to use values above or below threshold
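
The heuristic described above can be sketched as follows (the bin count and return convention are illustrative assumptions, not the exact implementation):

import numpy as np

def bimodal_depth_threshold(depth_patch, min_depth_range, bins=8):
    flat = depth_patch.flatten()
    if flat.max() - flat.min() <= min_depth_range:
        return None, None  # range too small: assume a single surface
    counts, edges = np.histogram(flat, bins=bins)
    empty = np.flatnonzero(counts == 0)
    if empty.size == 0:
        return None, None  # no gap in the distribution
    # Threshold at the center of the first empty bin.
    th = (edges[empty[0]] + edges[empty[0] + 1]) / 2.0
    # Keep the side of the distribution that occupies more of the patch.
    use_above = (flat > th).sum() > (flat <= th).sum()
    return th, use_above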

get_semantic_from_depth(depth_patch)[source]#

Return semantic patch information from heuristics on depth patch.

Parameters:

depth_patch – sensor patch observations of depth

Returns:

sensor patch shaped info about whether each pixel is on the surface or not
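
Conceptually this reduces to a boolean mask over the patch, for example using a threshold like the one returned by get_on_surface_th (the helper below is hypothetical):

import numpy as np

def on_surface_mask(depth_patch, threshold, use_values_above):
    # True where the pixel is considered to lie on the object surface.
    if use_values_above:
        return depth_patch > threshold
    return depth_patch <= threshold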

class GaussianSmoothing(agent_id, sigma=2, kernel_width=3)[source]#

Bases: object

Deals with gaussian noise on the raw depth image.

This transform is designed to deal with gaussian noise on the raw depth image. It remains to be tested whether it will also help with the kind of noise in a real-world depth camera.

conv2d(img, kernel_renorm=False)[source]#

Apply a 2D convolution to the image.

Parameters:
  • img – 2D image to be filtered.

  • kernel_renorm – flag that specifies whether kernel values should be renormalized (based on the number of non-NaN values in the image window).

Returns:

filtered version of the input image.

create_kernel()[source]#

Create a normalized gaussian kernel.

Returns:

normalized gaussian kernel. Array of size (kernel_width, kernel_width).
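
A minimal sketch of such a normalized Gaussian kernel, using the class's sigma and kernel_width parameters (illustrative, not the class's exact code):

import numpy as np

def gaussian_kernel(kernel_width: int = 3, sigma: float = 2.0) -> np.ndarray:
    half = (kernel_width - 1) / 2.0
    coords = np.arange(kernel_width) - half
    x, y = np.meshgrid(coords, coords)
    kernel = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()  # weights sum to 1

With kernel renormalization in conv2d, the idea is that weights over the non-NaN pixels of each window are rescaled so they still sum to 1.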

get_padded_img(img, pad_type='edge')[source]#

class MissingToMaxDepth(agent_id, max_depth, threshold=0)[source]#

Bases: object

Return max depth when no mesh is present at a location.

Habitat depth sensors return 0 when no mesh is present at a location. Instead, return max_depth. See: facebookresearch/habitat-sim#1157 for discussion.
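
The replacement rule can be sketched as follows, assuming depth readings at or below threshold indicate missing mesh (helper name is illustrative):

import numpy as np

def fill_missing_depth(depth: np.ndarray, max_depth: float, threshold: float = 0.0) -> np.ndarray:
    out = depth.copy()
    out[out <= threshold] = max_depth  # missing mesh reads as max_depth
    return out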