tbp.monty.frameworks.environment_utils#
tbp.monty.frameworks.environment_utils.graph_utils#
- get_edge_index(graph, previous_node, new_node)[source]#
Get the edge index between two nodes in a graph.
TODO: There must be an easier way to do this!
- Parameters:
graph – torch_geometric.data graph
previous_node – node ID of the first node in the graph
new_node – node ID of the second node in the graph
- Returns:
edge ID between the two nodes
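A minimal sketch, assuming a torch_geometric Data graph whose edge_index tensor has shape (2, num_edges), of how such a lookup could be done; find_edge_index is an illustrative helper, not the source implementation:
import torch
from torch_geometric.data import Data

def find_edge_index(graph, previous_node, new_node):
    # Columns of edge_index whose source row matches previous_node and whose
    # target row matches new_node.
    mask = (graph.edge_index[0] == previous_node) & (graph.edge_index[1] == new_node)
    matches = torch.nonzero(mask, as_tuple=False)
    return int(matches[0]) if len(matches) > 0 else -1

# Tiny two-edge graph (0 -> 1 and 1 -> 2); the edge 1 -> 2 has edge ID 1.
g = Data(edge_index=torch.tensor([[0, 1], [1, 2]]))
print(find_edge_index(g, 1, 2))  # -> 1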
tbp.monty.frameworks.environment_utils.habitat_utils#
- get_bounding_corners(object_ref)[source]#
Determine and return the bounding box of a Habitat object.
Determines and returns the bounding box (defined by a “max” and “min” corner) of a Habitat object (such as a mug), given in world coordinates.
Specifically uses the “axis-aligned bounding box” (aabb) available in Habitat; this is a bounding box aligned with the axes of the coordinate system, which tends to be computationally efficient to retrieve.
- Parameters:
object_ref – the Habitat object instance
- Returns:
min_corner and max_corner, the defining corners of the bounding box
- Return type:
Two np.arrays
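Illustrative only: given corners like those returned above (the values here are made up), the object's extent and center follow directly:
import numpy as np

# Hypothetical corners for a small object, in world coordinates.
min_corner = np.array([-0.05, 0.00, -0.05])
max_corner = np.array([0.05, 0.12, 0.05])

extent = max_corner - min_corner        # size of the box along each axis
center = (min_corner + max_corner) / 2  # world-frame center of the box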
tbp.monty.frameworks.environment_utils.server#
- class MontyRequestHandler(*args, directory=None, **kwargs)[source]#
Bases:
SimpleHTTPRequestHandler
tbp.monty.frameworks.environment_utils.transforms#
- class AddNoiseToRawDepthImage(agent_id, sigma)[source]#
Bases:
object
Add gaussian noise to raw sensory input.
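A minimal sketch of the effect of this transform, assuming the raw depth arrives as a 2D numpy array; the patch size and sigma below are arbitrary example values:
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.001                                      # example noise level
depth = np.full((64, 64), 0.3, dtype=np.float64)   # hypothetical raw depth patch
noisy_depth = depth + rng.normal(0.0, sigma, size=depth.shape)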
- class DepthTo3DLocations(agent_id, sensor_ids, resolutions, zooms=1.0, hfov=90.0, clip_value=0.05, depth_clip_sensors=(), world_coord=True, get_all_points=False, use_semantic_sensor=True)[source]#
Bases:
object
Transform semantic and depth observations from 2D into 3D.
Transform semantic and depth observations from camera coordinates (2D) into agent (or world) coordinates (3D).
This transform will add the transformed results as a new observation called “semantic_3d” which will contain the 3d coordinates relative to the agent (or world) with the semantic ID and 3D location of every object observed:
"semantic_3d" : [ # x-pos , y-pos , z-pos , semantic_id [-0.06000001, 1.56666668, -0.30000007, 25.], [ 0.06000001, 1.56666668, -0.30000007, 25.], [-0.06000001, 1.43333332, -0.30000007, 25.], [ 0.06000001, 1.43333332, -0.30000007, 25.]]) ]
- agent_id#
Agent ID to get observations from
- resolution#
Camera resolution (H, W)
- zoom#
Camera zoom factor. Default 1.0 (no zoom)
- hfov#
Camera HFOV, default 90 degrees
- semantic_sensor#
Semantic sensor id. Default “semantic”
- depth_sensor#
Depth sensor id. Default “depth”
- world_coord#
Whether to return 3D locations in world coordinates. If enabled, then __call__() must be called with the agent and sensor states in addition to observations. Default True.
- get_all_points#
Whether to return all 3D coordinates or only the ones that land on an object.
- depth_clip_sensors#
Tuple of sensor indices to which a clipping transform is applied, setting all values greater than clip_value to clip_value. An empty tuple applies the clipping to none of the sensors.
- clip_value#
Depth threshold used by the clipping transform
Warning
This transformation is only valid for pinhole cameras
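An illustrative configuration of the transform using the parameters documented above; the sensor id, resolution, and zoom values are example assumptions, not defaults of any particular experiment:
from tbp.monty.frameworks.environment_utils.transforms import DepthTo3DLocations

# Example transform for a single 64x64 "patch" sensor; world_coord=True means
# __call__() needs agent and sensor states along with the observations.
transform = DepthTo3DLocations(
    agent_id="agent_id_0",    # assumed agent name
    sensor_ids=["patch"],     # assumed sensor id
    resolutions=[[64, 64]],   # one (H, W) resolution per sensor
    zooms=10.0,               # example zoom factor (default is 1.0)
    hfov=90.0,
    world_coord=True,
    use_semantic_sensor=False,
)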
- clip(agent_obs)[source]#
Clip the depth and semantic data that lie beyond a certain depth threshold.
Values of 0 (infinite depth) are also set to the clip value.
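A rough sketch of the behavior described above, assuming the depth data is a numpy array; both readings beyond clip_value and the 0-valued “infinite depth” readings end up at clip_value:
import numpy as np

clip_value = 0.05
depth = np.array([0.0, 0.02, 0.04, 0.3])  # hypothetical depth readings
depth[depth > clip_value] = clip_value    # clip far-away readings
depth[depth == 0.0] = clip_value          # treat 0 (infinite depth) as clipped
# depth is now [0.05, 0.02, 0.04, 0.05]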
- get_on_surface_th(depth_patch, min_depth_range)[source]#
Return a depth threshold if we have a bimodal depth distribution.
If the depth values are in a large enough range (> min_depth_range) we may be looking at more than one surface within our patch. This could either be two disjoint surfaces of the object or the object and the background.
To figure out if we have two disjoint sets of depth values we look at the histogram and check for empty bins in the middle. The center of the empty part of the histogram is then used as the threshold.
Next, we want to check whether we should use the depth values above or below the threshold. Currently this is done by checking which side of the distribution is larger (occupies more space in the patch). Alternatively we could check which side the depth at the center of the patch falls on; it is not yet clear which approach would be better.
Lastly, if we do decide to use the depth points that are further away, we need to make sure they are not the points that are off the object. Currently this is just done with a simple heuristic (depth difference < 0.1) but in the future we will probably have to find a better solution for this.
- Parameters:
depth_patch – sensor patch observations of depth
min_depth_range – minimum range of depth values to even be considered
- Returns:
threshold and whether we want to use values above or below threshold
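A simplified sketch of the histogram-gap idea described above (not the source implementation); bimodal_depth_threshold and its bin count are illustrative choices:
import numpy as np

def bimodal_depth_threshold(depth_patch, min_depth_range=0.01, n_bins=8):
    depths = np.asarray(depth_patch).ravel()
    if depths.max() - depths.min() <= min_depth_range:
        return None  # depth range too small; likely a single surface
    counts, edges = np.histogram(depths, bins=n_bins)
    empty = np.nonzero(counts == 0)[0]
    if len(empty) == 0:
        return None  # no clear gap between two depth clusters
    # Use the center of the empty region as the threshold.
    mid_bin = empty[len(empty) // 2]
    threshold = (edges[mid_bin] + edges[mid_bin + 1]) / 2
    # Keep whichever side of the split occupies more of the patch.
    use_above = np.sum(depths > threshold) > np.sum(depths <= threshold)
    return threshold, use_above

patch = np.concatenate([np.full(80, 0.03), np.full(20, 0.09)])  # toy bimodal patch
print(bimodal_depth_threshold(patch))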
- class GaussianSmoothing(agent_id, sigma=2, kernel_width=3)[source]#
Bases:
object
Deals with gaussian noise on the raw depth image.
This transform is designed to deal with gaussian noise on the raw depth image. It remains to be tested whether it will also help with the kind of noise in a real-world depth camera.
- conv2d(img, kernel_renorm=False)[source]#
Apply a 2D convolution to the image.
- Parameters:
img – 2D image to be filtered.
kernel_renorm – flag that specifies whether kernel values should be renormalized (based on the number of non-NaN values in the image window).
- Returns:
filtered version of the input image.
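A rough sketch of NaN-aware convolution with kernel renormalization, assuming a small square kernel and a 2D numpy image; nan_aware_conv2d is an illustrative stand-in, not the class's actual implementation:
import numpy as np

def nan_aware_conv2d(img, kernel, kernel_renorm=True):
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(img, ((pad_h, pad_h), (pad_w, pad_w)), constant_values=np.nan)
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            valid = ~np.isnan(window)
            weights = kernel * valid
            # Renormalize so only the non-NaN pixels contribute to the average.
            norm = weights.sum() if kernel_renorm else kernel.sum()
            out[i, j] = np.nansum(window * weights) / norm if norm > 0 else np.nan
    return out

kernel = np.ones((3, 3)) / 9.0  # simple box kernel stand-in for a Gaussian
img = np.array([[0.1, 0.1, np.nan], [0.1, 0.2, 0.1], [0.1, 0.1, 0.1]])
print(nan_aware_conv2d(img, kernel))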
- class MissingToMaxDepth(agent_id, max_depth, threshold=0)[source]#
Bases:
object
Return max depth when no mesh is present at a location.
Habitat depth sensors return 0 when no mesh is present at a location. Instead, return max_depth. See: facebookresearch/habitat-sim#1157 for discussion.
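A minimal sketch of the substitution described above, assuming the depth observation is a 2D numpy array in which 0 marks “no mesh at this pixel”:
import numpy as np

max_depth = 1.0
depth = np.array([[0.0, 0.31], [0.28, 0.0]])  # 0 means no mesh was hit
depth[depth == 0.0] = max_depth               # substitute max_depth instead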