tbp.monty.frameworks.environment_utils#

tbp.monty.frameworks.environment_utils.graph_utils#

get_edge_index(graph, previous_node, new_node)[source]#

Get the edge index between two nodes in a graph.

TODO: There must be an easier way to do this!

Parameters:
  • graph – torch_geometric.data graph

  • previous_node – node ID of the first node in the graph

  • new_node – node ID of the second node in the graph

Returns:

edge ID between the two nodes
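
As an illustrative sketch (not necessarily how get_edge_index is implemented), the same lookup can be done directly on a torch_geometric edge_index tensor; the graph below is made-up example data:

import torch
from torch_geometric.data import Data

# Tiny example graph with directed edges 0->1, 1->2, 0->2.
edge_index = torch.tensor([[0, 1, 0],
                           [1, 2, 2]], dtype=torch.long)
graph = Data(edge_index=edge_index, num_nodes=3)

previous_node, new_node = 1, 2
# Columns of edge_index whose source is previous_node and whose target is new_node.
mask = (graph.edge_index[0] == previous_node) & (graph.edge_index[1] == new_node)
edge_id = int(mask.nonzero(as_tuple=True)[0][0])  # edge ID 1 in this example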

tbp.monty.frameworks.environment_utils.server#

class MontyRequestHandler(*args, directory=None, **kwargs)[source]#

Bases: SimpleHTTPRequestHandler

do_PUT()[source]#
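
A minimal serving sketch, assuming the handler can be passed to the standard library's HTTP server like any other SimpleHTTPRequestHandler subclass; the host and port are illustrative:

from http.server import ThreadingHTTPServer

from tbp.monty.frameworks.environment_utils.server import MontyRequestHandler

if __name__ == "__main__":
    # Serves files like SimpleHTTPRequestHandler and also handles PUT requests via do_PUT().
    server = ThreadingHTTPServer(("127.0.0.1", 8000), MontyRequestHandler)
    server.serve_forever()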

tbp.monty.frameworks.environment_utils.transforms#

class AddNoiseToRawDepthImage(agent_id, sigma)[source]#

Bases: object

Add Gaussian noise to raw sensory input.
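
The underlying idea is simple additive Gaussian noise on the depth values; a minimal NumPy sketch (not the class's exact __call__ signature) looks like this:

import numpy as np

rng = np.random.default_rng(0)
depth = np.full((64, 64), 0.3, dtype=np.float32)  # illustrative raw depth patch
sigma = 0.001                                     # illustrative noise level
noisy_depth = depth + rng.normal(0.0, sigma, size=depth.shape).astype(np.float32)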

class DepthTo3DLocations(agent_id, sensor_ids, resolutions, zooms=1.0, hfov=90.0, clip_value=0.05, depth_clip_sensors=(), world_coord=True, get_all_points=False, use_semantic_sensor=False)[source]#

Bases: object

Transform semantic and depth observations from 2D into 3D.

Transform semantic and depth observations from camera coordinate (2D) into agent (or world) coordinate (3D).

This transform will add the transformed results as a new observation called “semantic_3d”, which will contain the 3D coordinates relative to the agent (or world) along with the semantic ID of every point observed:

"semantic_3d" : [
#    x-pos      , y-pos     , z-pos      , semantic_id
    [-0.06000001, 1.56666668, -0.30000007, 25.],
    [ 0.06000001, 1.56666668, -0.30000007, 25.],
    [-0.06000001, 1.43333332, -0.30000007, 25.],
    [ 0.06000001, 1.43333332, -0.30000007, 25.]])
]
agent_id#

Agent ID to get observations from

resolution#

Camera resolution (H, W)

zoom#

Camera zoom factor. Default 1.0 (no zoom)

hfov#

Camera HFOV, default 90 degrees

semantic_sensor#

Semantic sensor id. Default “semantic”

depth_sensor#

Depth sensor id. Default “depth”

world_coord#

Whether to return 3D locations in world coordinates. If enabled, then __call__() must be called with the agent and sensor states in addition to observations. Default True.

get_all_points#

Whether to return all 3D coordinates or only the ones that land on an object.

depth_clip_sensors#

Tuple of sensor indices to which a clipping transform is applied, where all values > clip_value are set to clip_value. An empty tuple applies the clipping to none of the sensors.

clip_value#

Depth value used by the clipping transform applied to depth_clip_sensors

Warning

This transformation is only valid for pinhole cameras.
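
A hedged construction sketch based on the signature above; the agent and sensor IDs are illustrative, and passing one resolution per sensor is an assumption:

from tbp.monty.frameworks.environment_utils.transforms import DepthTo3DLocations

transform = DepthTo3DLocations(
    agent_id="agent_id_0",
    sensor_ids=["patch"],
    resolutions=[(64, 64)],
    zooms=1.0,
    hfov=90.0,
    clip_value=0.05,
    depth_clip_sensors=(),
    world_coord=True,        # requires agent and sensor states when the transform is called
    get_all_points=False,
    use_semantic_sensor=False,
)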

clip(depth_patch: ndarray, semantic_patch: ndarray) None[source]#

Clip the depth and semantic data that lie beyond a certain depth threshold.

This is currently used for surface agent observations to limit the “reach” of the agent. Pixels beyond the clip value are set to 0 in the semantic patch, and the depth values beyond the clip value (or equal to 0) are set to the clip value.

This function modifies depth_patch and semantic_patch in-place.

Parameters:
  • depth_patch – depth observations

  • semantic_patch – binary mask indicating on-object locations
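
A minimal sketch of the clipping behavior described above, written directly in NumPy with made-up patch values:

import numpy as np

clip_value = 0.05
depth_patch = np.array([0.01, 0.0, 0.2], dtype=np.float32)
semantic_patch = np.array([1, 1, 1])

# Pixels beyond the clip value are removed from the semantic patch.
semantic_patch[depth_patch > clip_value] = 0
# Depth values beyond the clip value (or equal to 0) become the clip value.
depth_patch[(depth_patch > clip_value) | (depth_patch == 0)] = clip_value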

get_on_surface_th(depth_patch: ndarray, semantic_patch: ndarray, min_depth_range: Number, default_on_surface_th: Number) Tuple[Number, bool][source]#

Return a depth threshold if we have a bimodal depth distribution.

If the depth values are in a large enough range (> min_depth_range) we may be looking at more than one surface within our patch. This could either be two disjoint surfaces of the object or the object and the background.

To figure out if we have two disjoint sets of depth values, we look at the histogram and check for empty bins in the middle. The center of the empty part of the histogram is defined as the threshold.

If we do have a bimodal depth distribution, we effectively have two surfaces. This could be the mug’s handle vs the mug’s body, or the front lip of a mug vs the rear of the mug. We will use the depth threshold defined above to mask out the surface that is not in the center of the patch.

Parameters:
  • depth_patch – sensor patch observations of depth

  • semantic_patch – binary mask indicating on-object locations

  • min_depth_range – minimum range of depth values to even be considered

  • default_on_surface_th – default threshold to use if no bimodal distribution is found

Returns:

threshold and whether we want to use values above or below the threshold
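
An illustrative sketch of the histogram-gap idea described above (not the exact implementation), with made-up depth values for two surfaces:

import numpy as np

depth = np.concatenate([np.full(50, 0.05), np.full(50, 0.12)])  # two depth clusters
default_on_surface_th = 0.1

counts, edges = np.histogram(depth, bins=8)
empty_bins = np.flatnonzero(counts == 0)
if empty_bins.size > 0:
    # Use the center of the empty part of the histogram as the threshold.
    mid = empty_bins[len(empty_bins) // 2]
    threshold = (edges[mid] + edges[mid + 1]) / 2
else:
    threshold = default_on_surface_th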

get_surface_from_depth(depth_patch: ndarray, semantic_patch: ndarray, default_on_surface_th: Number) ndarray[source]#

Return surface patch information from heuristics on depth patch.

This function returns a binary mask indicating whether each pixel is “on-surface” using heuristics on the depth patch and the semantic mask. When we say “on-surface”, we mean “on the part of the object that the sensor is centered on”. For example, the sensor may be looking directly at the front lip of a mug, but the back lip of the mug is also in view. While both parts of the mug are on-object, we only want to use the front lip of the mug for later computation (such as point-normal extraction and curvature estimation), and so we want to mask out the back lip of the mug.

Continuing with the mug front/back lip example, we separate the front and back lips by their depth data. When we generate a histogram of pixel depths, we should see two distinct peaks (or three peaks if part of the field of view is off-object entirely). We then compute which depth threshold separates the two peaks that correspond to the two surfaces; this is performed by get_on_surface_th. This function uses that value to mask out the pixels that are beyond that threshold. Finally, we element-wise multiply this surface mask by the semantic mask (i.e., the object mask), which ensures that off-object observations are also masked out.

Note that we most often don’t have multiple surfaces in view. For example, when exploring a mug, we are most often looking directly at one locally connected part of the mug, such as a patch on the mug’s cylindrical body. In this case, we don’t attempt to find a surface-separating threshold, and we instead use the default threshold default_on_surface_th. As with a computed threshold, anything beyond it is zeroed out. For surface agents, this threshold is usually quite small (e.g., 5 cm). For distant agents, the threshold is large (e.g., 1000 meters), which is functionally infinite.

Parameters:
  • depth_patch – sensor patch observations of depth

  • semantic_patch – binary mask indicating on-object locations

  • default_on_surface_th – default threshold to use if no bimodal distribution is found

Returns:

sensor-patch-shaped array indicating whether each pixel is on the surface or not
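
A sketch of the final masking step described above; the threshold, the meaning of the returned boolean, and the patch values are all assumptions for illustration:

import numpy as np

depth_patch = np.array([[0.05, 0.05], [0.12, 0.12]])
semantic_patch = np.array([[1, 1], [1, 0]])   # binary on-object mask
threshold, use_smaller = 0.08, True           # e.g. as returned by get_on_surface_th

if use_smaller:
    surface = depth_patch < threshold
else:
    surface = depth_patch > threshold
# Element-wise multiply by the semantic mask so off-object pixels are also zeroed out.
surface_patch = surface.astype(int) * semantic_patch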

class GaussianSmoothing(agent_id, sigma=2, kernel_width=3)[source]#

Bases: object

Deals with Gaussian noise on the raw depth image.

This transform is designed to deal with Gaussian noise on the raw depth image. It remains to be tested whether it will also help with the kind of noise in a real-world depth camera.
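
A hedged construction sketch using the signature above; the agent ID is illustrative and the other values are the documented defaults:

from tbp.monty.frameworks.environment_utils.transforms import GaussianSmoothing

smoothing = GaussianSmoothing(agent_id="agent_id_0", sigma=2, kernel_width=3)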

conv2d(img, kernel_renorm=False)[source]#

Apply a 2D convolution to the image.

Parameters:
  • img – 2D image to be filtered.

  • kernel_renorm – flag that specifies whether kernel values should be renormalized (based on the number of non-NaN values in the image window).

Returns:

filtered version of the input image.
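
One plausible interpretation of the kernel renormalization described above, sketched in NumPy for a single image window containing a NaN:

import numpy as np

kernel = np.full((3, 3), 1 / 9)                    # uniform weights for illustration
window = np.array([[0.1, 0.1, np.nan],
                   [0.1, 0.1, 0.1],
                   [0.1, 0.1, 0.1]])

valid = ~np.isnan(window)
weights = kernel * valid
weights /= weights.sum()                           # renormalize over non-NaN pixels
filtered_center = np.nansum(weights * window)      # filtered value at the window center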

create_kernel()[source]#

Create a normalized Gaussian kernel.

Returns:

normalized Gaussian kernel. Array of size (kernel_width, kernel_width).
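
A sketch of what a normalized Gaussian kernel of this shape looks like; sigma and kernel_width mirror the constructor defaults above:

import numpy as np

sigma, kernel_width = 2.0, 3
ax = np.arange(kernel_width) - (kernel_width - 1) / 2
xx, yy = np.meshgrid(ax, ax)
kernel = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
kernel /= kernel.sum()  # weights sum to 1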

get_padded_img(img, pad_type='edge')[source]#

class MissingToMaxDepth(agent_id, max_depth, threshold=0)[source]#

Bases: object

Return max depth when no mesh is present at a location.

Habitat depth sensors return 0 when no mesh is present at a location. Instead, return max_depth. See: facebookresearch/habitat-sim#1157 for discussion.
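
A minimal sketch of the substitution described above, assuming values at or below the threshold (default 0) mark missing depth:

import numpy as np

max_depth, threshold = 10.0, 0
depth = np.array([0.0, 0.3, 0.0, 1.2], dtype=np.float32)
depth[depth <= threshold] = max_depth  # "no mesh" readings become max_depth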