Graphr

United States, Stanford University

Members

Apoorva Dornadula

ADMIN

United States

Project Overview

Understanding an image through scene graphs is an active area of research. A scene graph is a graph structure that captures the interactions and relationships between the objects in an image: objects are labeled as nodes, and relationships between objects as edges. Scene graph generation remains an unsolved problem, and even recent state-of-the-art methods achieve poor recall. The most popular dataset used to generate scene graphs, Visual Genome, has incomplete annotations, so the ground truth used to train current models is inaccurate. Another challenge is that scene graph generation involves a large, dynamic state and action space.

We implement from scratch, and experiment with, a novel method of scene graph generation originally presented in "Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection" by X. Liang et al. The code can be found at www.github.com/nexusapoorvacus/DeepVariationStructuredRL.

The variation-structured reinforcement learning (VRL) approach first requires a directed semantic action graph (SAG) to be built from the Visual Genome dataset. This graph serves as the action space: its nodes encode objects, predicates, and attributes, and its edges represent relationships between these nodes. For each image, the state vector consists of feature representations of the image, the current subject, and the current object, together with a history embedding of past relationships between the subject and object. The action space for each image is the entire SAG; to reduce it, a variation-structured traversal scheme constructs smaller, relevant adaptive action sets for each image. The state and the adaptive action sets are fed into DQNs, one each to predict the relationship, the attribute, and the next object to explore. This process repeats until all entities in the image have been sufficiently explored. We use recall as the metric to compare the performance of different models.
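
The variation-structured traversal described above can be illustrated with a minimal Python sketch. It assumes a simplified reading of the method: the SAG is built from hypothetical (subject, predicate, object) and (object, attribute) annotation tuples rather than from the real Visual Genome JSON files, and the adaptive action sets are simply the attributes, predicates, and next-object categories that the SAG links to the current subject and object. The class `SemanticActionGraph` and its methods are illustrative names, not taken from the repository; the traversal in Liang et al. and in the linked code may differ in detail.

```python
from collections import defaultdict


class SemanticActionGraph:
    """Simplified directed SAG built from annotation tuples (illustrative only)."""

    def __init__(self):
        self.attributes = defaultdict(set)  # object category -> attribute labels
        self.predicates = defaultdict(set)  # (subject cat, object cat) -> predicate labels
        self.successors = defaultdict(set)  # subject cat -> object cats it points to

    def add_relationship(self, subj, pred, obj):
        # Directed edge subj -> obj labeled with a predicate
        self.predicates[(subj, obj)].add(pred)
        self.successors[subj].add(obj)

    def add_attribute(self, obj, attr):
        self.attributes[obj].add(attr)

    def adaptive_actions(self, subject, obj, image_objects):
        """Shrink the full action space to actions the SAG supports here.

        Assumption: next-object candidates are categories present in this
        image that the subject points to in the SAG.
        """
        attr_actions = sorted(self.attributes.get(subject, set()))
        pred_actions = sorted(self.predicates.get((subject, obj), set()))
        linked = self.successors.get(subject, set())
        next_obj_actions = sorted(c for c in set(image_objects) if c in linked)
        return attr_actions, pred_actions, next_obj_actions


# Toy annotations (hypothetical, not real Visual Genome data)
sag = SemanticActionGraph()
for s, p, o in [("man", "riding", "horse"), ("man", "wearing", "hat"), ("horse", "on", "grass")]:
    sag.add_relationship(s, p, o)
for o, a in [("man", "smiling"), ("horse", "brown"), ("hat", "black")]:
    sag.add_attribute(o, a)

attrs, preds, nxt = sag.adaptive_actions("man", "horse", ["man", "horse", "grass", "hat"])
print(attrs)  # ['smiling']
print(preds)  # ['riding']
print(nxt)    # ['hat', 'horse']
```

The point of the adaptive sets is that each DQN only has to score a handful of SAG-supported candidates instead of every object, predicate, and attribute in the vocabulary.
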
The metric used in the literature is Recall@50, which computes the fraction of times that the ground-truth attribute/relationship exists among the model's top 50 predictions, ranked by the Q-values of the predicted attributes/relationships. The metric we use is essentially Recall@1, which is a stricter basis for comparison. Since we start with ground-truth objects and object box proposals, our model has a recall of 1.0 for object detection. For attribute and relationship prediction, our best models achieve recalls of 2.40% and 2.37%, respectively. Using skip-thought vectors (embeddings of the two previous relationships predicted by the model) in our state vector greatly improves the model's performance at predicting relationships. Even though our recall scores are very low, our visualizations show that the model actually produces more comprehensive scene graphs that capture the relationships between different objects in the picture. A comparison of the generated and ground-truth scene graphs can be seen in our poster (https://github.com/nexusapoorvacus/DeepVariationStructuredRL/blob/master/poster.pdf) and our technical report (https://drive.google.com/file/d/10y1mYCvm7Q6Y4HLyBAmX2neYFcGwUl9x/view?usp=sharing). Additional visualizations can be seen here: https://docs.google.com/presentation/d/1u3iCKvt7HfOl0jZbz-xczLf1r7KfCx4CmcDwxBRn8_Q/edit?usp=sharing.
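
Recall@K as described above can be sketched in a few lines of Python. This is a simplified per-prediction version, under the assumption that each query comes with a dictionary mapping candidate labels to Q-values; `recall_at_k` and the toy data are hypothetical. Published Recall@50 numbers are typically computed over relationship triplets per image, so this sketch is not directly comparable to them.

```python
def recall_at_k(q_values, ground_truth, k):
    """Fraction of queries whose ground-truth label is among the k highest-Q labels.

    q_values: list of dicts mapping candidate label -> Q-value, one per query
    ground_truth: list of ground-truth labels aligned with q_values
    """
    if len(q_values) != len(ground_truth):
        raise ValueError("q_values and ground_truth must have the same length")
    if not q_values:
        return 0.0
    hits = 0
    for scores, gt in zip(q_values, ground_truth):
        # Rank candidate labels by Q-value, highest first, and keep the top k
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += gt in top_k
    return hits / len(ground_truth)


# Toy predicate Q-values for three subject-object pairs (hypothetical)
queries = [
    {"riding": 0.9, "on": 0.7, "near": 0.1},     # ground truth ranked 1st
    {"wearing": 0.4, "holding": 0.6, "has": 0.2},  # ground truth ranked 2nd
    {"behind": 0.3, "under": 0.8, "on": 0.5},    # ground truth ranked 3rd
]
gt = ["riding", "wearing", "behind"]

for k in (1, 2, 3):
    print(k, recall_at_k(queries, gt, k))
```

Setting `k=1` gives the stricter Recall@1 used in this project: a prediction only counts if the single highest-Q label is correct, whereas larger `k` forgives near misses.
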

About the Team

Apoorva is a first-year Master's student at Stanford University studying Computer Science with a specialization in Artificial Intelligence. She is a researcher in the Stanford Vision Lab, working in Prof. Fei-Fei Li's research group. She is currently working on image understanding (scene graph generation) and on improving the Visual Genome dataset. Her academic interests include applied computer vision, applied natural language processing, and AI ethics & safety. She will be a Machine Intelligence and Research intern at Google this summer. In the past, she interned at Microsoft and Sandia National Labs on projects involving cyber security and machine learning, such as threat intelligence correlation. Before coming to Stanford, Apoorva completed her Bachelor of Science in Electrical Engineering and Computer Science at the University of California, Berkeley, where she worked in Prof. David Wagner's spear-phishing research group. She also held leadership roles in the UC Berkeley chapter of the Society of Women Engineers and in BERKE1337, UC Berkeley's cyber security club.

Aarti is pursuing a master's degree in Computer Science at Stanford University, with a focus on artificial intelligence. She is working in Prof. Andrew Ng's lab on developing machine learning algorithms to solve high-impact problems in medicine. She is also working at one of his start-ups, landing.ai, applying machine learning to problems in manufacturing. Prior to Stanford, she received bachelor's degrees in Computer Science and Computer Engineering from New York University, where she led multiple computer science clubs and the ACM NY Meetup Group. She worked with Prof. David Sontag at NYU (now at MIT) on applications of machine learning to clinical medicine. She spent a summer as a research intern at Microsoft Research, where she worked with John Langford and contributed to Vowpal Wabbit.
