Masters Thesis Defenses

Past Defenses

Learning Perception to Action Mapping for Functional Imitation
Monday, November 21, 2016
Bhupender Singh

Abstract: Imitation learning is the acquisition of advanced behavior whereby an agent acquires a skill by observing another's behavior while performing the same skill. The main objective of imitation learning is to make robots usable for a variety of tasks without programming them, but by simply demonstrating new tasks. The power of this approach arises because end users of such robots will frequently not know how to program the robot, might not understand the dynamics and behavioral capabilities of the system, and might not know how to program these robots to get different or new tasks done. Several challenges in achieving imitation capabilities exist, including the difference in state space: the robot observes demonstrations of a task in terms of features different from the ones describing the space in which it acts. The proposed approach to imitation learning in this thesis allows a robot to learn new tasks just by observing someone doing that task. To achieve this, the robot system uses two models. The first is an internal model, which represents all behavioral capabilities of the robot and consists of all possible states, actions, and the effects of executing the actions. The second is a demonstration model, which represents the perception of the task demonstration and is a continuous-time, discrete-event model consisting of a stream of state behavior sequences. Examples of perceived behavior include a rolling behavior or a falling behavior of objects. The approach proposed here then learns the similarity between states of the internal model and states of the demonstration model using a neural network function approximator and reinforcement learning, with a reward feedback signal provided by the demonstrator. Using this similarity function, a heuristic search algorithm finds the action sequence that leads to the execution state sequence most similar to the observed task demonstrations. In this way, a robot learns to map its internal states to the sequence of observed states, yielding a policy for performing the corresponding task.
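
For illustration only, a minimal sketch of the search step described above, assuming a hypothetical internal model API (`initial_state`, `successors`) and a learned `similarity(internal_state, demo_state)` function; this is not the thesis implementation:

```python
import heapq
import itertools

def best_matching_action_sequence(internal_model, demo_states, similarity, max_depth=20):
    """Search the robot's internal model for the action sequence whose resulting
    state sequence is most similar to the observed demonstration states.
    All object/method names here are illustrative placeholders."""
    counter = itertools.count()                     # tie-breaker so states are never compared
    start = internal_model.initial_state()
    frontier = [(0.0, 0, next(counter), start, [])]  # (-cumulative similarity, step, id, state, actions)
    best_actions, best_score = [], float("-inf")

    while frontier:
        neg_score, step, _, state, actions = heapq.heappop(frontier)
        if step == len(demo_states) or step == max_depth:
            if -neg_score > best_score:
                best_actions, best_score = actions, -neg_score
            continue
        for action, next_state in internal_model.successors(state):
            score = -neg_score + similarity(next_state, demo_states[step])
            heapq.heappush(frontier, (-score, step + 1, next(counter),
                                      next_state, actions + [action]))
    return best_actions
```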

Predicting Human Behavior Based on Survey Response Patterns Using Markov and Hidden Markov Model
Monday, November 21, 2016
Arun Kumar Pokharna

Abstract: With technological advancements, reaching out to people for information gathering has become trivial. Among several approaches, surveys are one of the most commonly used ways of collecting information from people. Given a specific objective, multiple surveys are conducted to collect various pieces of information. The collected survey responses can be categorical values or descriptive text that conveys information regarding the survey question. If additional details regarding behavior, events, or outcomes are available, machine learning and prediction modeling can be used to predict these events from the survey data, potentially permitting the automatic triggering of interventions or preventive actions that keep detrimental events or outcomes from occurring.

The approach proposed in this research predicts human behavior based on responses to various surveys that are administered automatically using an interactive computer system. The approach is applied to a typical classroom scenario where students are asked to periodically fill out a questionnaire about their performance before and after class milestones such as exams, projects, and homework assignments. Data collection for this experiment is performed using Teleherence, a web-phone-computer based survey application. Data collected through Teleherence is then used to learn a predictive model. The approach developed in this research uses clustering to find similarities between different students' responses and builds a prediction model for their behavior based on Markov and Hidden Markov models.
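
As a small illustrative sketch (not the thesis code), assuming responses have already been clustered into integer state labels, a first-order Markov transition matrix of the kind such a prediction model builds on can be estimated like this:

```python
import numpy as np

def fit_markov_chain(state_sequences, n_states):
    """Estimate a first-order Markov transition matrix from sequences of
    cluster labels (one sequence of survey-response states per student)."""
    counts = np.zeros((n_states, n_states))
    for seq in state_sequences:
        for prev, curr in zip(seq[:-1], seq[1:]):
            counts[prev, curr] += 1
    counts += 1e-3                    # smoothing so unseen transitions keep small probability
    return counts / counts.sum(axis=1, keepdims=True)

# Example: predict the most likely next state for a student currently in state 2
transitions = fit_markov_chain([[0, 1, 2, 2, 1], [1, 2, 0, 1]], n_states=3)
next_state = int(np.argmax(transitions[2]))
```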

CELL SEGMENTATION IN CANCER HISTOPATHOLOGY IMAGES USING CONVOLUTIONAL NEURAL NETWORKS
Friday, November 18, 2016
Viswanathan Kavassery Rajalingam

Abstract: Cancer, the second most dreadful disease causing large-scale deaths in humans, is characterized by uncontrolled growth of cells in the human body and the ability of those cells to migrate from the original site and spread to distant sites. A major proportion of cancer deaths is due to improper primary diagnosis, which raises the need for Computer Aided Diagnosis (CAD). Digital pathology, a CAD technique, acts as a second set of eyes for radiologists in delivering expert-level preliminary diagnosis for cancer patients. With the advent of imaging technology, the data acquisition step in digital pathology yields high-fidelity, high-throughput Whole Slide Images (WSI) using advanced scanners, with increased patient safety. Cell segmentation is a challenging step in digital pathology that identifies cell regions in micro-slide images and is fundamental for further processes such as classifying tumor sub-types or predicting survival. Current cell segmentation techniques rely on hand-crafted features that depend on factors like image intensity, shape features, etc. Such computer vision based approaches have two main drawbacks: 1) these techniques might require several manual parameters to be set for accurate segmentation, which places a burden on the radiologists; and 2) techniques based on shape or morphological features cannot be generalized, as different types of cancer cells are highly asymmetric and irregular.

In this thesis, convolutional networks, a supervised learning technique recently gaining attention in machine learning for vision perception tasks, are investigated to perform end-to-end automated cell segmentation. Three popular convolutional network models, namely U-NET, SEG-NET, and FCN, are chosen and adapted to perform cell segmentation, and the results are analyzed. A predicament in applying supervised learning models to cell segmentation is the requirement of large labeled datasets for training the network models. To overcome the absence of a labeled dataset for the cancer cell segmentation task, a simple labeling tool called SMILE-Annotate was developed to easily mark and label multiple cells in image patches from lung cancer histopathology images. In addition, an open-source, crowd-sourced labeled dataset for cell segmentation from Beck Lab, Harvard University, is used for the empirical evaluation of automated cell segmentation with convolutional network models. The experimental results indicate that SEG-NET is the most effective architecture for cell segmentation and suggest that it can generalize between different datasets with minimal effort.
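
As a rough illustration (not the thesis code), a toy fully-convolutional binary segmentation network in PyTorch; U-NET, SEG-NET, and FCN follow this encoder-decoder pattern with more depth and with skip or unpooling connections:

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder that maps an RGB patch to a per-pixel cell mask."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # downsample by 2
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),      # one cell/background logit per pixel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Pixel-wise training loss against a binary cell mask
model = TinySegNet()
logits = model(torch.randn(1, 3, 64, 64))
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(1, 1, 64, 64))
```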

CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS FOR PEDESTRIAN DETECTION
Friday, November 18, 2016
Vivek Arvind Balaji

Abstract: Pedestrian detection in real time has lately become an interesting and challenging problem. With the advent of autonomous vehicles and intelligent traffic monitoring systems, more time and money are being invested in detecting and locating pedestrians, both for their safety and toward achieving complete autonomy in vehicles. For the task of pedestrian detection, Convolutional Neural Networks (ConvNets) have been very promising over the past decade. ConvNets have a typical feed-forward structure and share many properties with the visual system of the human brain. On the other hand, Recurrent Neural Networks (RNNs) are emerging as an important technique for image-based detection problems, and they are more closely related to the visual system due to their recurrent connections. Detecting pedestrians in a real-time environment is a task where sequence is very important, and it is intriguing to see how ConvNets and RNNs handle it. This thesis makes a detailed comparison between ConvNets and RNNs for pedestrian detection: how both techniques perform on sequential pedestrian data, their scope for further research, and their advantages and disadvantages. The comparison is done on two benchmark datasets, the TUD-Brussels and ETH pedestrian datasets, and a comprehensive evaluation is presented to show how research on these topics can be taken forward.

MAVROOMIE: AN END-TO-END ARCHITECTURE FOR FINDING COMPATIBLE ROOMMATES BASED ON USER PREFERENCES
Friday, November 18, 2016
Vijendra Kumar Bhogadi

Abstract: Team formation is widely studied in the literature as a method for forming teams or groups under certain constraints. However, very few works address the aspect of collaboration while forming such groups. Motivated by collaborative team formation, we extend the team formation problem to a general real-world scenario: finding compatible roommates to share a place. There are numerous applications, such as "roommates.com", "roomiematch.com", "Roomi", and "rumi.io", which try to find roommates based on geographical and cost factors but ignore the important human factors that can play a substantial role in finding a potential roommate or roommates. We introduce "MavRoomie", an Android application for finding potential roommates by leveraging techniques from collaborative team formation, in order to provide a dedicated platform for finding suitable roommates and apartments. Given a set of users with detailed profile information, preferences, and geographical and budget constraints, our goal is to present an end-to-end system for finding a cohesive group of roommates from the perspective of both the renters and the homeowner. MavRoomie allows users to specify their preferences and budgets, which are incorporated into our algorithms to provide a meaningful set of roommates. The strategy followed here is similar to collaborative crowdsourcing's strategy of finding a group of workers with maximized affinity while satisfying the cost and skill constraints of a task.

Searching and Classifying Mobile Application Screenshots
Friday, November 18, 2016
Adis Kovacevic

Abstract: This paper proposes a technique for searching and classifying mobile application screenshots based on the layout of the content, the category of the application, and the text in the image. It was originally conceived to support REMAUI (Reverse Engineering Mobile Application User Interfaces), an active research project headed by Dr. Csallner. REMAUI can automatically reverse engineer the user interface layer of an application from input images. The long-term goal of this work is to create a full search framework for any UI image. In this paper, we introduce the first steps toward this framework by focusing on mobile UI screenshots, various techniques for classifying the layout of the image, classifying the content, and creating the first API using an Apache Solr search server and a MySQL database. We discuss three techniques for classifying the layout of the UI image and evaluate the results. We then discuss a method to classify the category of the application and put all the information together in a single REST API. The input images are searchable by image content and can be filtered by type and layout. The results are ranked by Solr for relevance and returned as JSON by the API.

How to Extract and Model Useful Information from Videos for Supporting Continuous Queries
Thursday, November 17, 2016
Manish Kumar Annappa

Abstract: Automating video stream processing to infer situations of interest from video contents has been an ongoing challenge. This problem is currently exacerbated by the volume of surveillance/monitoring videos generated. Currently, manual or context-based customized techniques are used for this purpose. To the best of our knowledge, earlier work in this area uses a custom query language to extract data and infer simple situations from video streams, thus adding the burden of learning the query language. Therefore, the long-term objective of this work is to develop a framework that extracts data from video streams and generates a data representation that can be queried using an extended non-procedural language such as SQL or CQL. Taking a step in that direction, this thesis focuses on pre-processing videos to extract the needed information from each frame. It elaborates on algorithms and experimental results for extracting objects, their features (location, bounding box, and feature vectors), and their identification across frames, along with converting all that information into an expressive data model. Pre-processing video streams to extract a queryable representation involves tuning a number of context-based parameters which depend on the type of video stream and the type of objects present in it. In the absence of proper starting values, an exhaustive set of experiments to determine optimal values for these parameters is unavoidable. Additionally, this thesis introduces techniques for choosing the starting values of these parameters to reduce exhaustive experimentation.
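
For concreteness, a sketch of the kind of per-frame record such a pre-processing stage could emit for a query engine to consume (field names are illustrative, not the thesis's data model):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DetectedObject:
    frame_no: int                     # frame index in the video stream
    object_id: int                    # identity maintained across frames
    bbox: Tuple[int, int, int, int]   # x, y, width, height in pixels
    centroid: Tuple[float, float]     # location within the frame
    feature_vector: List[float]       # appearance descriptor used for re-identification

# One queryable row per object per frame, e.g.:
row = DetectedObject(frame_no=120, object_id=7, bbox=(34, 50, 80, 120),
                     centroid=(74.0, 110.0), feature_vector=[0.12, 0.80, 0.05])
```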

EVALUATION OF HTML TAG SUSCEPTIBILITY TO STATISTICAL FINGERPRINTING FOR USE IN CENSORSHIP EVASION
Thursday, November 17, 2016
Kelly Scott French

Abstract: The ability to speak freely has always been a source of conflict between rulers and the people over whom they exert power. This conflict usually takes the form of state-sponsored censorship, with occasional instances of commercial efforts, typically to silence criticism or squelch dissent, and of people's efforts to evade such censorship. This is even more evident in the current environment, with its ever-growing number of communication technologies and platforms available to individuals around the world. In the face of efforts to control communication before it is posted, or to prevent the discovery of information that exists outside the control of the authorities, users attempt to slip their messages past the censor's gaze by using keyword replacement. These methods are effective, but only as long as those synonyms are not identified. Once the new usage is discovered, it is a simple matter to add the new term to the list of black-listed words. While various methods can be used to create mappings between blocked words and their replacements, the difficulty is doing so in a way that makes it clear to a human reader how to perform the mapping in reverse while maintaining readability, but without attracting undue attention from systems enforcing the censor's rules and policies. One technique, presented in a related article, considers the use of HTML tags as a way to provide such a replacement method. By using HTML tags related to how text is displayed on the page, it can both indicate that the replacement is happening and provide a legend for mapping the term in the page to the one intended by the author. It is a given that a human reader will easily detect this scheme: if a malicious reader is shown a page generated using this method, the attempt at evading the censor's rules will be obvious. A potential weakness in this approach is if the tool that generates the replacement uses a small set of HTML tags to effect the censorship evasion, but in doing so changes the frequency with which those tags appear on the page, so that the page stands out and can be flagged by software algorithms for human examination. In this paper we examine the feasibility of using tag frequency as a way to distinguish blog posts needing more attention, examining the means of data collection, the scale of processing required, and the quality of the resulting analysis for detecting deviation from average tag-usage patterns of pages.
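
A minimal sketch of the measurement itself (an assumed workflow, not the paper's tool): count tag frequencies for a page with Python's standard html.parser and score its deviation from an average tag-usage profile:

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def tag_profile(html_text):
    """Return each tag's share of all tags on the page."""
    parser = TagCounter()
    parser.feed(html_text)
    total = sum(parser.counts.values()) or 1
    return {tag: n / total for tag, n in parser.counts.items()}

def deviation(page_profile, average_profile):
    """L1 distance from the average tag-usage pattern; large values get flagged."""
    tags = set(page_profile) | set(average_profile)
    return sum(abs(page_profile.get(t, 0) - average_profile.get(t, 0)) for t in tags)
```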

A Unified Cloud Solution to Manage Heterogeneous Clouds
Tuesday, November 15, 2016
Shraddha Jain

Abstract: Cloud environments are built on virtualization platforms, which offer scalability, on-demand pricing, high performance, elasticity, easy accessibility of resources, and cost-efficient services. Most small and large businesses use cloud computing to take advantage of these features. The usage of cloud resources depends on the requirements of the organization. With the advent of cloud computing, the traditional way of handling machines by IT professionals has diminished to some extent. However, it leads to wastage of resources due to inadequate monitoring and improper management of resources. It often happens that cloud resources, once deployed, are forgotten and stay running until someone manually intervenes to shut them down. This results in continuous consumption of resources and incurs costs, a problem known as cloud sprawl. Many organizations use resources provided by multiple cloud providers and maintain multiple accounts with them. The problem of cloud sprawl worsens when multiple accounts on different cloud providers are not managed properly.

In this thesis, a solution to the problem of cloud sprawl is presented. A unified console to monitor and manage all the resources, such as compute instances, storage, etc., deployed on multiple cloud providers is provided. This console shows the details of the resources in use and offers the ability to manage them without logging into the different accounts they belong to. Moreover, a task scheduling panel is provided to handle multiple tasks at a time. This way, resources can be queued to run at a specific time and can also be torn down at a scheduled time, so they are not left unattended. Before termination, a facility to archive files and directories on virtual machines is also provided, which works across the storage services offered by both IaaS and SaaS providers. A notification system informs the user about the activities of the active resources, helping enterprises save on costs.
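
For illustration only (assuming AWS and the boto3 SDK; this is not the console's actual implementation), a scheduled teardown of tagged instances could look like the following sketch:

```python
import boto3

def stop_tagged_instances(region, tag_key="auto-stop", tag_value="true"):
    """Stop all running EC2 instances carrying the given tag, so scheduled
    resources are not left running unattended (tag names are hypothetical)."""
    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]},
                 {"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids
```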

Heterogeneous Cloud Application Migration using PaaS
Tuesday, November 15, 2016
Mayank Jain

Abstract: With the evolution of cloud service providers offering numerous services such as SaaS, IaaS, and PaaS, the options for enterprises to choose the best set of services at optimal cost have also increased. The migration of web applications across these heterogeneous platforms comes with ample options to choose from, providing users the flexibility to pick the options best suiting their requirements. This migration process must be automated, ensuring security, performance, and availability while keeping the cost optimal as the application moves from one platform to another. A multi-tier web application has many dependencies, such as the application environment, data storage, and platform configurations, which may or may not be supported by all cloud providers.

Through this research, an automated cloud-based framework to migrate single or multi-tier web applications across heterogeneous cloud platforms is presented. This research discusses the migration of applications between two public cloud providers, namely Heroku and AWS (Amazon Web Services). Observations on the various configurations required by a web application to run on the Heroku and AWS cloud platforms are discussed. This research shows how, using these configurations, a generic web application can be developed that works seamlessly across multiple cloud platforms.

Finally, this paper presents the experiments conducted on the migrated applications, considering factors such as scalability, availability, elasticity, and data migration. Application performance was tested on both the AWS and Heroku platforms, measuring application creation, deployment, database creation, migration, and mapping times.

MACHINE LEARNING BASED DATACENTER MONITORING FRAMEWORK
Friday, November 11, 2016
Ravneet Singh Sidhu

Abstract: Monitoring the health of large data centers is a major concern with the ever-increasing demand for grid/cloud computing and the growing need for computational power. In a High Performance Computing (HPC) environment, the need to maintain high availability makes monitoring tasks and hardware more daunting and demanding. As data centers grow, it becomes hard to manage the complex interactions between different systems. Many open source systems have been implemented that report the specific state of any individual machine using monitoring software such as Nagios, Ganglia, or Torque.

In this work we focus on the detection and prediction of data center anomalies using a machine learning based approach. We present the idea of using monitoring data from multiple monitoring solutions and formulating a single high-dimensional vector based model, which is then fed into a machine learning algorithm. With this approach we can find patterns and associations among the different attributes of a data center that remain hidden in a single-system context. Using disparate monitoring systems in conjunction gives a holistic view of the cluster, increases the probability of finding critical issues before they occur, and alerts the system administrator.
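
A minimal sketch of this idea under stated assumptions (per-node feature vectors already assembled from the separate monitoring sources; scikit-learn available; the anomaly detector shown here is illustrative, not necessarily the one used in the thesis):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def build_feature_vector(nagios_metrics, ganglia_metrics, torque_metrics):
    """Concatenate per-node metrics from the separate monitoring systems into
    one high-dimensional vector (metric ordering assumed fixed)."""
    return np.concatenate([nagios_metrics, ganglia_metrics, torque_metrics])

# Train on historical per-node vectors, then flag anomalous nodes (-1 = anomaly)
history = np.random.rand(500, 30)          # placeholder for collected vectors
detector = IsolationForest(contamination=0.02, random_state=0).fit(history)
labels = detector.predict(np.random.rand(5, 30))
```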

Improving Memorization and Long Term Recall of System Assigned Passwords
Friday, November 11, 2016
Jayesh Doolani

Abstract: System-assigned passwords have guaranteed robustness against guessing attacks, but they are hard to memorize. To make system-assigned passwords more usable, it is of prime importance that systems that assign random passwords also assist users with memorization and recall. In this work, we have designed a novel technique that employs rote memorization in the form of an engaging game, which is played during the account registration process. Based on prior work on chunking, we break a password into three equal chunks, and the game then helps plant those chunks in memory. We present the findings of a 17-participant user study in which we explored the usability of nine-character-long pronounceable system-assigned passwords. Results of the study indicate that our system was effective in training users to memorize the random password at an average registration time of 6 minutes, but the long-term recall rate of 71.4% did not match our expectation. On thorough evaluation of the system and results, we identified potential areas of improvement and present a modified system design to improve the long-term recall rate.
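
For concreteness, a tiny sketch of the chunking idea (not the study's implementation; the example password is made up), splitting a nine-character system-assigned password into three equal chunks for the memorization game:

```python
def chunk_password(password, n_chunks=3):
    """Split a system-assigned password into equal chunks (9 chars -> 3 x 3)."""
    size = len(password) // n_chunks
    return [password[i * size:(i + 1) * size] for i in range(n_chunks)]

print(chunk_password("bofatilu1"))  # ['bof', 'ati', 'lu1']
```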

INTEGRATION OF APACHE MRQL QUERY LANGUAGE WITH APACHE STORM REALTIME COMPUTATIONAL SYSTEM.
Thursday, November 10, 2016
Achyut Paudel

Abstract: The use of real-time data processing has increased in recent years with the growth of data captured by social media platforms, IoT, and other big data applications. Processing data in real time has become an important part of daily life, from finding trends on the Internet to fraud detection in banking transactions. Finding relevant information in large amounts of data has always been a difficult problem to solve. MRQL is a query language that can be used on top of different big data platforms such as Apache Hadoop, Flink, Hama, and Spark, and it enables professionals with database query knowledge to write queries that run programs on top of these computational systems.

In this work, we have integrated the MRQL query language with a newer real-time big data computational system called Apache Storm. Storm was developed at Twitter to analyze trending topics in social media and is widely used in industry today. A query written in MRQL is converted into a physical plan that involves the execution of different functions such as MapReduce, aggregation, etc., which have to be executed by the platform in its own execution plan. In this work, MapReduce has been implemented for Storm, covering the execution of important physical query plans such as Select and Group By. The MapReduce implementation is an important part of every big data processing platform. This project is the starting point for implementing MRQL on Apache Storm, and the implementation can be extended to support various query plans involving MapReduce.
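
To make the physical plan concrete, here is a tiny in-memory sketch (plain Python, not Storm or MRQL syntax) of how a Select followed by a Group By reduces to a map phase and a reduce phase:

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Minimal MapReduce: map each record to (key, value) pairs, group by key,
    then reduce each group -- the pattern a GROUP BY query plan compiles to."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# SELECT dept, SUM(salary) ... GROUP BY dept, expressed as map/reduce functions
rows = [("sales", 100), ("eng", 200), ("sales", 50)]
result = map_reduce(rows,
                    map_fn=lambda r: [(r[0], r[1])],
                    reduce_fn=lambda k, vs: sum(vs))
# result == {"sales": 150, "eng": 200}
```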

REMOTE PATIENT MONITORING USING HEALTH BANDS WITH ACTIVITY LEVEL PRESCRIPTION
Thursday, November 03, 2016
PRANAY SHIROLKAR

Abstract: With the advent of new commercially available consumer-grade fitness and health devices, it is now possible and common for users to obtain, store, share, and learn about some of their important physiological metrics such as steps taken, heart rate, quality of sleep, and skin temperature. In wearable technology, these sensors are commonly embedded in a smart watch or a dedicated band, so that, among other functionalities, the device can smartly assist users with their activity levels by leveraging the fact that it can be, and typically is, worn for prolonged periods of time.

This new connected wearable technology thus has great potential for physicians to monitor and regulate their patients' activity levels. There exist many software applications and complex Wireless Body Area Network (WBAN) based solutions for remote patient monitoring, but what has been lacking is a solution that lets physicians, especially exercise physiologists, automate and convey appropriate training levels and feedback in a seamless manner. This thesis proposes a software framework that enables users to know their prescribed exercise intensity level, record their exercise sessions, and securely transmit them wirelessly to a centralized data store that physiologists can access.

Linchpin: A YAML template based Cross Cloud resource provisioning tool
Wednesday, October 26, 2016
Samvaran Kashyap Rallabandi

Abstract: A cloud application requires particular cloud resources and a software stack to be deployed in order to run. A resource template enables the design and deployment of the environment required for an application. A template describes the infrastructure of the cloud application in a text file, which includes servers, floating/public IPs, storage volumes, etc. This approach is termed "infrastructure as code." In the Amazon public cloud, the OpenStack private cloud, and Google Cloud, these templates are called CloudFormation templates, HOT (Heat Orchestration Templates), and Google Cloud templates, respectively. Though the existing template systems give the end user the flexibility to define multiple resources, they are limited to provisioning within a single cloud provider with a single set of cloud credentials at a time. For this reason, vendor lock-in arises for the service consumer.

This thesis addresses the vendor lock-in problem by proposing the design and implementation of a framework, known as "Linchpin", for provisioning resources in cross-cloud environments with YAML templates. Linchpin takes a similar infrastructure-as-code approach, where the full requirements of the users are manifested in a predefined YAML structure, which is parsed by an underlying configuration and deployment tool, Ansible, to delegate the provisioning to the cloud APIs. The framework not only solves the vendor lock-in issue but also enables the user to do cross-cloud deployments of an application. This thesis also presents a comparative study of existing template-based orchestration frameworks and Linchpin on the provisioning time of virtual machines. Further, it illustrates a novel way to generate Ansible-based inventory files for post-provisioning activities such as installing and configuring software.

LILAC - The Second Generation Lightweight Low-latency Anonymous Chat
Tuesday, July 26, 2016
Revanth Pobala

Abstract: Instant messaging is one of the most used modes of communication, and there are many instant messaging systems available online. Studies from the Electronic Frontier Foundation show that there are only a few instant messengers that keep your messages safe by providing security and limited anonymity. Lilac, a LIghtweight Low-latency Anonymous Chat, is a secure instant messenger that provides security as well as better anonymity to users compared to other messengers. It is a browser-based instant messaging system that uses a Tor-like model to protect user anonymity. Compared to existing messengers, LILAC protects users from traffic analysis by implementing cover traffic. It is built on OTR (Off-the-Record) messaging to provide forward secrecy and implements the Socialist Millionaire Protocol to guarantee user authenticity. Unlike other existing instant messaging systems, it uses pseudonyms to protect user anonymity. Being a browser-based web application, it does not require any installation and leaves no footprints to trace. It lets users store contact details securely, with an option to download the contacts as an encrypted file; this encrypted file can be used to restore the contacts later. In our experimentation with Lilac, we found the round trip time (RTT) for a message to be around 3.5 seconds, which is good for a messenger that provides security and anonymity. Lilac is readily deployable on different and multiple servers. In this document, we provide in-depth details about the design, development, and results of LILAC.

EVALUATE THE USE OF FPGA SoC FOR REAL-TIME DATA ACQUISITION AND AGGREGATE MICRO-TEXTURE MEASUREMENT USING LASER SENSORS.
Monday, July 18, 2016
Mudit Pradhan

Abstract: Aggregate texture has been found to play an important role in improving the longevity of highways and pavements. Aggregates with an appropriate surface roughness level bond better with asphalt binder and concrete mixture to produce a more durable road surface. Macro-texture has been found to affect certain other important features of the road surface, for example skid resistance, the flow of water on the surface, and the noise of tires on the road. However, more research needs to be done to assess the impact of surface texture at the micrometer level. Accurate measurement of micro-texture at high resolution and in real time is a challenging task. In the first part, this thesis presents a proof of concept for a laser-based micro-texture measurement system capable of measuring texture at 0.2 micrometer resolution, supporting a maximum sampling rate of up to 100 kHz, with precision motion control for aggregate movement at a step size of 0.1 micrometer. In the second part, the usability of a field programmable gate array (FPGA) system on chip is evaluated against the need for high-speed real-time data acquisition and high-performance computing to accurately measure micro-texture. The hardware architecture is designed to efficiently leverage the capabilities of the FPGA fabric. Software is implemented for dedicated multi-softcore operation, concurrently utilizing the capabilities of the on-board ARM Cortex-A9 application processor for real-time processing needs and a high-throughput Ethernet communication model for remote data storage. Evaluation results are presented based on effective use of the FPGA fabric in terms of data acquisition, processing needs, and accuracy of the desired measurement equipment.

ADWIRE: Add-on for Web Item Reviewing System
Monday, April 25, 2016
Rajeshkumar Ganesh Kannapalli

Abstract: The past few decades have seen widespread use and popularity of online review sites such as Yelp, TripAdvisor, etc. As many users depend on reviews before deciding on a product, businesses of all types are motivated to possess an expansive arsenal of user feedback (preferably positive) in order to mark their reputation and presence on the Web (e.g., Amazon customer reviews). In spite of the fact that a large share of buying choices today is driven by numeric scores (e.g., movie ratings on IMDB), detailed reviews play an important role in activities like purchasing an expensive mobile phone, a DSLR camera, etc. Since writing a detailed review for an item is usually time-consuming and offers no incentive, the number of reviews available on the Web is far from large. Moreover, the available corpus of text contains spam, misleading content, typographical and grammatical errors, etc., which further shrinks the text corpus available for making informed decisions. In this thesis, we build a novel system, AD-WIRE, which simplifies the user's task of composing a review for an online item. Given an item, the system provides top-k meaningful phrases/tags which the user can connect with to provide reviews easily. Our system works on three measures, relevance, coverage, and polarity, which together form a general constrained optimization problem. AD-WIRE also visualizes the dependency of tags on different aspects of an item, so that the user can make an informed decision quickly. The current system is built to explore the review writing process for mobile phones. The dataset is crawled from GSMArena.com and Amazon.com.
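
A simplified greedy sketch of the tag-selection problem (scoring and field names are illustrative only; the thesis formulates the real problem as a general constrained optimization over relevance, coverage, and polarity):

```python
def pick_tags(candidates, k, alpha=0.5, beta=0.3, gamma=0.2):
    """Greedily pick k tags balancing relevance, coverage of new aspects, and polarity.

    candidates: list of dicts like
        {"tag": "battery life", "relevance": 0.9, "aspects": {"battery"}, "polarity": 0.7}
    (this schema is hypothetical, not AD-WIRE's)."""
    chosen, covered = [], set()
    pool = list(candidates)
    while pool and len(chosen) < k:
        def gain(c):
            new_aspects = len(c["aspects"] - covered)   # reward covering unseen aspects
            return alpha * c["relevance"] + beta * new_aspects + gamma * abs(c["polarity"])
        best = max(pool, key=gain)
        pool.remove(best)
        chosen.append(best["tag"])
        covered |= best["aspects"]
    return chosen
```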

ROBOTICS CURRICULUM FOR EDUCATION IN ARLINGTON: Experiential, Simple and Engaging learning opportunity for low-income K-12 students
Monday, April 25, 2016
Sharath Vasanthakumar

Abstract: Engineering disciplines (such as biomedical, civil, computer science, electrical, and mechanical) are instrumental to society's wellbeing and technological competitiveness; however, the interest of K-12 American students in these and other engineering fields is fading. To broaden the base of engineers for the future, it is critical to excite young minds about STEM. Research that is easily visible to K-12 students, including underserved and minority populations with limited access to technology, is crucial in igniting their interest in STEM fields. More specifically, research topics that involve interactive elements such as robots may be instrumental for K-12 education in and outside the classroom. Robots have always fascinated mankind. Indeed, the idea of infusing life and skills into a human-made automatic artefact has inspired the imagination of many for centuries, and has led to creative works in areas such as art, music, science, and engineering, just to name a few. Furthermore, major technological advancements, with associated societal improvements, have been made in the past century because of robotics and automation. Assistive technology deals with the study, design, and development of devices (and robots are certainly among them!) to be used for improving one's life. Imagine, for example, how robots could be used to search for survivors in a disaster area. Another example is the adoption of nurse robots to assist people with disabilities during daily-life activities, e.g., to serve food or lift a patient from the bed and position him/her in a wheelchair. The idea of assistive technology is at the core of our pilot Technology Education Academy. We believe kids will be intrigued by the possibility of creating their own assistive robot prototype and making it work in a scenario that resembles activities of daily life. However, it is not enough to provide students with the necessary equipment, since they might easily lose interest due to the technical challenges in creating the robots and in programming them. In fact, achieving these goals requires a student to have problem-solving skills as well as knowledge of basic principles of mechanics and computer programming. The Technology Education Academy has brought UT Arlington, the AISD, and the Arlington Public Library together to introduce young students in the East Arlington area to assistive technology, and to provide them easy-to-use tools, an advanced educational curriculum, and mentorship to nurture their problem-solving skills and introduce them to mechanics and computer programming.

LOCALIZATION AND CONTROL OF DISTRIBUTED MOBILE ROBOTS WITH THE MICROSOFT KINECT AND STARL
Friday, April 22, 2016
Nathan Hervey

Abstract: With the increasing availability of mobile robotic platforms, interest in swarm robotics has been growing rapidly. The coordinated effort of many robots has the potential to perform a myriad of useful and possibly dangerous tasks, including search and rescue missions, mapping of hostile environments, and military operations. However, more research is needed before these types of capabilities can be fully realized. In a laboratory setting, a localization system is typically required to track robots, but most available systems are expensive and require tedious calibration. Additionally, dynamical models of the robots are needed to develop suitable control methods, and software must be written to execute the desired tasks. In this thesis, a new video localization system is presented that utilizes circle detection to track circular robots. The system is low cost, provides ~0.5 centimeter accuracy, and requires minimal calibration. A dynamical model for planar motion of a quadrotor is derived, and a controller is developed using the model. This controller is integrated into StarL, a framework enabling development of distributed robotic applications, to allow a Parrot Cargo Minidrone to visit waypoints in the x-y plane. Finally, two StarL applications are presented: one that demonstrates the capabilities of the localization system, and another that solves a modified distributed travelling salesman problem where sets of waypoints must be visited in order by multiple robots. The methods presented aim to assist those performing research in swarm robotics by providing a low-cost, easy-to-use platform for testing distributed applications with multiple robot types.
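
A short sketch of circle-based tracking with OpenCV's Hough transform (the parameter values are placeholders to be tuned for a given camera setup, not the thesis's calibration):

```python
import cv2
import numpy as np

def locate_robots(frame):
    """Detect circular robot markers in a video frame and return their
    pixel-space centers and radii."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                      # suppress sensor noise
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
                               param1=100, param2=30, minRadius=10, maxRadius=60)
    if circles is None:
        return []
    return [(int(x), int(y), int(r)) for x, y, r in np.round(circles[0]).astype(int)]
```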

A NEW REAL-TIME APPROACH FOR WEBSITE PHISHING DETECTION BASED ON VISUAL SIMILARITY
Friday, April 22, 2016
Omid Asudeh

Abstract: Phishing attacks cause billions of dollars of loss every year worldwide. Among the several solutions proposed for this type of attack, visual similarity detection methods can achieve good accuracy. These methods exploit the fact that malicious pages mostly imitate visual signals of the targeted websites. Visual similarity detection methods usually look for imitations by comparing screenshots of web pages against an image database of the most targeted legitimate websites. Despite their accuracy, existing visual-based approaches are not practical for real-time purposes because of their image processing overhead. In this work, we use a pipeline framework in order to be reliable and fast at the same time. The goal of the framework is to quickly and confidently (without false negatives) rule out the bulk of pages that are completely different from the database of targeted websites, and to do more processing on the more similar pages. In our experiments, the very first module of the pipeline could rule out more than half of the test cases with zero false negatives. Also, the mean and median query times per test case are less than 5 milliseconds for the first module.

Comparison of Machine Learning Algorithms in Suggesting Candidate Edges to Construct a Query on Heterogeneous Graphs
Thursday, April 21, 2016
Rohit Ravi Kumar Bhoopalam

Abstract: Querying graph data can be difficult as it requires the user to have knowledge of the underlying schema and the query language. Visual query builders allow users to formulate the intended query by drawing nodes and edges of the query graph, which can then be translated into a database query. Visual query builders help users formulate the query without requiring knowledge of the query language or the underlying schema. To the best of our knowledge, none of the currently available visual query builders suggest to users which nodes or edges to include in their query graph. We provide suggestions to users via machine learning algorithms and help them formulate their intended query. No readily available dataset can be directly used to train our algorithms, so we simulate the training data using Freebase, DBpedia, and Wikipedia. We also compare the performance of four machine learning algorithms, namely Naïve Bayes (NB), Random Forest (RF), Classification based on Association Rules (CAR), and a recommendation system based on SVD (SVD), in suggesting the edges that can be added to the query graph. On average, CAR requires 67 suggestions to complete a query graph on Freebase, while the other algorithms require 83-160 suggestions; Naïve Bayes requires 134 suggestions to complete a query graph on DBpedia, while the other algorithms require 150-171 suggestions.

Processing Queries over Partitioned Graph Databases: An Approach and Its Evaluation
Thursday, April 21, 2016
Jay Dilipbhai Bodra

Abstract: Representation of structured data using graphs is meaningful for applications such as road and social networks. With the increase in the size of graph databases, querying them to retrieve desired information poses challenges in terms of query representation and scalability. Querying and graph partitioning have each been researched independently in the literature. However, to the best of our knowledge, there is no effective scalable approach for querying graph databases using partitioning schemes. It is also useful to analyze the quality of partitioning schemes from the query processing perspective. In this thesis, we propose a divide and conquer approach to process queries over very large graph databases using available partitioning schemes. We also identify a set of metrics to evaluate the effect of partitioning schemes on query processing. Querying over partitions requires handling answers that: i) are within the same partition, ii) span multiple partitions, and iii) require the same partition to be used multiple times. The number of connected components in partitions and the number of starting nodes of a plan in a partition may be useful for determining the starting partition and the sequence in which partitions need to be processed. Experiments on processing queries over three different graph databases (DBLP, IMDB, and synthetic), partitioned using different partitioning schemes, have been performed. Our experimental results show the correctness of the approach and provide some insights into the metrics gleaned from partitioning schemes on query processing. QP-Subdue, a graph querying system developed at UTA, has been modified to process queries over partitions of a graph database.

Performance evaluation of Map Reduce Query Language on Matrix Operations
Thursday, April 21, 2016
Ahmed Abdul Hameed Ulde

Abstract: Non-negative matrix factorization is a well-known, complex machine learning algorithm used in collaborative filtering. The collaborative filtering technique, used in recommendation systems, aims at predicting the missing values in a user-item association matrix. As an example, a user-item association matrix contains users as rows and movies as columns, and the matrix values are the ratings given by users to the respective movies. These matrices have large dimensions, so they can only be processed with parallel processing. MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Spark, Hama, and Flink. Given that large-scale matrix operations require proper scaling and optimization in distributed systems, in this work we analyze the performance of MRQL on complex matrix operations using different sparse matrix datasets in Spark mode. This work aims at a performance analysis of MRQL on complex matrix operations and the scalability of these operations. We have performed simple matrix operations such as multiplication, division, addition, and subtraction, and also complex operations such as matrix factorization. We have tested two algorithms, Gaussian non-negative matrix factorization and stochastic gradient descent based matrix factorization, in the Spark and Flink modes of MRQL with a dataset of movie ratings. The performance analysis in these experiments will help readers understand and analyze the performance of MRQL and learn more about MRQL.
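
An illustrative NumPy sketch of SGD-based matrix factorization on a ratings matrix (a plain single-machine version of the kind of algorithm benchmarked here, not MRQL syntax):

```python
import numpy as np

def sgd_mf(ratings, n_factors=10, lr=0.01, reg=0.05, epochs=20, seed=0):
    """Factor a list of (user, item, rating) triples into U @ V.T by
    stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    U = 0.1 * rng.standard_normal((n_users, n_factors))
    V = 0.1 * rng.standard_normal((n_items, n_factors))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]                   # prediction error for this rating
            U[u] += lr * (err * V[i] - reg * U[u])  # gradient step with L2 regularization
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

U, V = sgd_mf([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)])
predicted = U[1] @ V[1]   # predicted rating of item 1 by user 1
```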

Spatio-Temporal Patterns of GPS Trajectories using Association Rule Mining
Tuesday, April 19, 2016
Vivek Kumar Sharma

Abstract: The availability of location-tracking devices such as GPS, cellular networks, and other devices makes it possible to log a person's or device's locations automatically. This creates spatio-temporal datasets of user movement with features like the latitude and longitude of a particular location on a specific day and time. With the help of these features, different patterns of user movement can be collected, queried, and analyzed. In this research work, we focus on users' movement patterns and frequent movements at a particular place, day, or time interval. To achieve this we use the association rule mining concept, based on the Apriori algorithm, to find interesting movement patterns. Our dataset for this experiment is from the Geolife project conducted by Microsoft Research Asia, which consists of 18,630 trajectories and 24 million points logged every 1-5 seconds or 5-10 meters per point. First, we consider the spatial part of the data: a two-dimensional (latitude, longitude) space which ranges from the minimum to the maximum pair of latitude and longitude logged for all users. We distribute this space into equal grids along both dimensions to reach a significant spatial distance range. Grids with high-density points are sub-divided into further smaller grid cells. For the temporal part of the data, we transform the dates into days of the week to distinguish the patterns on a particular day, and into 12 time intervals of 2 hours each to split a day in order to distinguish peak hours of movement. Finally, we mine the data using association rules with attributes/features like user id, grid id (a unique identifier for each spatial range/region of latitude and longitude), day, and time. This enables us to discover patterns of a user's frequent movement, and similarly for a particular grid. This gives us better recommendations based on the patterns for a set of like users, points of interest, and time of day.
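
A small sketch (the grid origin, cell size, and time bins are illustrative choices, not the thesis's exact parameters) of turning one GPS log entry into the categorical (grid id, day, time interval) items that an Apriori miner consumes:

```python
from datetime import datetime

def to_transaction(user_id, lat, lon, timestamp,
                   lat_min=39.0, lon_min=115.0, cell_deg=0.01):
    """Map one GPS point to the categorical items used for association rule mining:
    a spatial grid cell, the day of week, and a 2-hour time interval."""
    row = int((lat - lat_min) / cell_deg)
    col = int((lon - lon_min) / cell_deg)
    dt = datetime.fromisoformat(timestamp)
    return {
        "user": f"user_{user_id}",
        "grid": f"grid_{row}_{col}",
        "day": dt.strftime("%A"),
        "slot": f"slot_{dt.hour // 2}",      # 12 two-hour intervals per day
    }

print(to_transaction(4, 39.975, 116.331, "2009-06-12T08:35:00"))
```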

A Data Driven, Hospital Quality of Care Portal for the Patient Community
Monday, April 18, 2016
Sreehari Balakrishna Hegden

Abstract: With the recent changes in health services provision, patients are members of a consumer-driven healthcare system. However, healthcare consumers are not presented with adequate opportunities to enhance their position in choosing high-quality hospital services. As a result, the demand for active patient participation in the choice of quality and safe hospital services has remained unaddressed. In this research work, we developed MediQoC (Medicare Quality of Care), a data-driven web portal for Medicare patients, their caregivers, and healthcare insurance policy designers that grants access to data-driven information about hospitals and quality of care indicators. The portal, which utilizes the Medicare claims dataset, gives patients, caregivers, and other stakeholders the ability to locate high-quality hospital services for specific diseases and medical procedures. MediQoC provides users a list of eligible hospitals and outputs statistics on hospital stay attributes and quality of care indicators, including the prevalence of hospital-acquired conditions. It gives users options to rank hospitals on the basis of the aforementioned in-hospital attributes and quality indicators. The statistical module of the portal models the correlation between length of stay and discharge status attributes in each hospital for the given disease. Finally, the ranking results are visualized as bar charts via MediQoC-viz, the visualization module of the portal. The visualization module also makes use of the Google Geocoding API to locate on a map the hospital nearest to the user's location. It also displays the location, distance, and driving duration to the hospitals selected by the user from the ranked result list.

Ogma - Language Acquisition System using Immersive Virtual Reality
Monday, April 11, 2016
Sanika Sunil Gupta

Abstract: One method of learning a new language, or Second-Language Acquisition (SLA), is immersion, seen today as one of the most effective learning methods. Using this method, the learner relocates to a place where the target language is the dominant language and tries to learn the language by immersing themselves in the local environment. However, this is not a feasible option for everyone, so traditional, less effective learning methods are used. As an alternative solution, we use virtual reality (VR) as a new method of learning a new language. VR is an immersive technology that allows the user to wear a head-mounted display and be immersed in a life-like virtual environment. Ogma, an immersive VR language learning environment, is introduced and compared to traditional methods of language learning. This study focused only on teaching foreign vocabulary. Participants were given a set of ten Swedish words and learned them either by using a traditional list-and-flash-cards method or by using Ogma. They then returned one week later to give feedback and be tested on their vocabulary-training success. Results indicated that the percentage retention using our VR method was significantly higher than that of the traditional method. In addition, the effectiveness and enjoyability ratings given by users were significantly higher for the VR method. This suggests that our system has a potential impact on SLA using VR technology and that the immersive virtual reality technique is better than traditional methods of learning a new language.

INTERACTIVE DASHBOARD FOR USER ACTIVITY USING NETWORK FLOW DATA
Thursday, December 03, 2015
Lalit Kumar Naidu

Abstract: Data visualization is critical in analytical systems that contain multi-dimensional datasets and face problems associated with increasing data size. It facilitates the process of explaining and reasoning about data and discovering trends through visual perception that are otherwise not evident in the data's raw form. The challenge involved in visualization is presenting data in a way that helps end users in the process of information discovery with simple visuals. Interactive visualizations have become increasingly popular in recent years with prominent research in the field of information visualization. These techniques are heavily used in web-based applications to present myriad forms of data from various domains, encouraging viewers to comprehend data faster while they look for important answers. This thesis presents a theme for visualizing a discrete temporal dataset (network flow data) to represent the Internet activity of device (interface) owners with the aid of interactive visualization. The data presentation takes the form of a web-based interactive dashboard with multiple visual layouts designed to focus on end user questions such as who, when, and what. We present an "event map" as a component of this dashboard that represents user activity as collections of individual flows from the dataset. In addition, we look into design issues, data transformation, and aggregation techniques involved in the narration of the data presentation. The outcome of this thesis is a functional proof-of-concept demonstration of a network flow dashboard that can serve as a front-end interface for analytical systems that use such data.

Lung Cancer Subtype Recognition and Classification from Whole Slide Histopathological Images
Tuesday, December 01, 2015
Dheeraj Ganti

Abstract: Lung cancer is one of the most serious diseases causing death in human beings. The progression of the disease and the response to treatment differ widely among patients. Thus it is very important to classify the type of tumor and also to be able to predict the clinical outcomes of patients. The majority of lung cancers are Non-Small Cell Lung Cancer (NSCLC), which constitutes 84% of all lung cancer types. The two major subtypes of NSCLC are Adenocarcinoma (ADC) and Squamous Cell Carcinoma (SCC). Accurate classification of a lung cancer as NSCLC, together with recognition and classification of its subtype, is very important for quick diagnosis and treatment. In this research, we propose a quantitative framework for one of the most challenging clinical cases, the subtype recognition and classification of Non-Small Cell Lung Cancer (NSCLC) as Adenocarcinoma (ADC) or Squamous Cell Carcinoma (SCC). The proposed framework makes effective use of both local features and topological features extracted from whole slide histopathology images. The local features are extracted after robust cell detection and segmentation, so that every individual cell is segmented from the images. Then efficient geometry and texture descriptors based on the results of cell detection are used to extract the local features. We determined the architectural properties from the labelled nuclei centroids to investigate the potential of the topological features. The experimental results from popular classifiers show that the structure of the cells plays a vital role and that, to differentiate between the two subtypes of NSCLC, the topological descriptors act as representative markers.

Detecting Real-time Check-worthy Factual Claims in Tweets Related to U.S. Politics
Tuesday, November 24, 2015
Fatma Dogan

Abstract: In strengthening democracy and improving political discourse, political fact-checking has become a necessity. While politicians make claims about facts all the time, journalists and fact-checkers oftentimes reveal them to be false, exaggerated, or misleading. The use of technology and social media tools such as Facebook and Twitter has rapidly increased the spread of misinformation. Thus, human fact-checkers face difficulty in keeping up with a massive number of claims, and falsehoods frequently outpace truths. U.S. politicians have successively adopted Twitter, and they make use of it for a wide variety of purposes, a great example being making claims to enhance their popularity. Toward the aim of helping journalists and fact-checkers, we developed a system that automatically detects check-worthy factual claims in tweets related to U.S. politics and posts them on a publicly visible Twitter account. The research consists of two processes: collecting and processing political tweets. The process for detecting check-worthy factual claims involves preprocessing collected tweets, computing the check-worthiness score of each tweet, and applying several filters to eliminate redundant and irrelevant tweets. Finally, a political classification model distinguishes tweets related to U.S. politics from other tweets and reposts them on the created Twitter account.

Speaker Identification in Live Events Using Twitter
Friday, November 20, 2015
Minumol Joseph

Abstract: The prevalence of social media has given rise to a new research area. Data from social media is now being used in research to gather deeper insights into many different fields. Twitter is one of the most popular microblogging websites. Users express themselves on a variety of different topics in 140 characters or less. Oftentimes, users “tweet” about issues and subjects that are gaining in popularity, a great example being politics. Any development in politics frequently results in a tweet of some form. The research which follows focuses on identifying a speaker’s name at a live event by collecting and using data from Twitter. The process for identification involves collecting the transcript of the broadcasting event, preprocessing the data, and then using that to collect the necessary data from Twitter. As this process is followed, a speaker can be successfully identified at a live event. For the experiments, the 2016 presidential candidate debates have been used. In principle, the thesis can be applied to identify speakers at other types of live events.

Quantitative Analysis of Scalable NoSQL Databases
Friday, November 20, 2015
Surya Narayanan Swaminathan

Abstract: NoSQL databases are rapidly becoming the customary data platform for big data applications. These databases are emerging as a gateway for alternative approaches outside traditional relational databases and are characterized by efficient horizontal scalability, a schema-less approach to data modeling, high-performance data access, and limited querying capabilities. The lack of transactional semantics among NoSQL databases has left the choice of a particular consistency model to the application. Therefore, it is essential to examine methodically, and in detail, the performance of different databases under different workload conditions. In this work, three of the most commonly used NoSQL databases, MongoDB, Cassandra, and HBase, are evaluated. The Yahoo! Cloud Serving Benchmark (YCSB), a popular benchmark tool, was used for the performance comparison of the different NoSQL databases. The databases are deployed on a cluster, and experiments are performed with different numbers of nodes to assess the impact of cluster size. We present a benchmark suite on the performance of the databases in terms of their capacity to scale horizontally and on the performance of each database for various types of workload operations (create, read, write, scan) on varying dataset sizes.

QP-SUBDUE: PROCESSING QUERIES OVER GRAPH DATABASES
Friday, November 13, 2015
Ankur Goyal

Abstract: Graphs have become one of the preferred ways to store structured data for various applications such as social network graphs, complex molecular structures, etc. The proliferation of graph databases has resulted in a growing need for effective querying methods to retrieve desired information. Querying has been widely studied in relational databases, where the query optimizer finds a sequence of query execution steps (or plans) for efficient execution of a given query. Until now, most of the work on graph databases has concentrated on mining. For querying graph databases, users have to either learn a graph query language to pose their queries or use customized searches for specific substructures. Hence, there is a clear need for posing queries using graphs, considering alternative plans, and selecting a plan that can be processed efficiently on the graph database. In this thesis, we propose an approach to generate plans from a query using a cost-based approach that is tailored to the characteristics of the graph database. We collect metadata pertaining to the graph database and use cost estimates to evaluate the cost of executing each plan. We use a branch and bound algorithm to limit the state space generated for identifying a good plan. Extensive experiments on different types of queries over two graph databases (IMDB and DBLP) are performed to validate our approach. Subdue, a graph mining algorithm, has been modified to process a query plan instead of performing mining.

Evaluating the Effectiveness of BEN in Localizing Different Types of Software Fault
Friday, July 31, 2015
Jaganmohan Chandrasekaran

Abstract: Debugging refers to the activity of locating software faults in a program and is considered one of the most challenging tasks during software development. Automated fault localization tools have been developed to reduce the amount of effort and time software developers have to spend on debugging. In this thesis, we evaluate the effectiveness of a fault localization tool called BEN on different types of software fault. Assuming that combinatorial testing has been performed on the subject program, BEN leverages the results obtained from combinatorial testing to perform fault localization. Our evaluation focuses on how the following three properties of a software fault affect the effectiveness of BEN: (1) Accessibility: accessibility refers to the degree of difficulty of reaching (and executing) a fault during a program execution; (2) Input-value sensitivity: a fault is input-value sensitive if the execution of the fault triggers a failure only for some input values but not for others; and (3) Control-flow sensitivity: a fault is control-flow sensitive if the execution of the fault triggers a failure while inducing a change of control flow in the program execution. We conducted our experiments on seven programs from the Siemens suite and two real-life programs, grep and gzip, from the SIR repository. Our results indicate that BEN is very effective in locating faults that are harder to access. This is because BEN adopts a spectrum-based approach in which the spectra of failed and passed tests are compared to rank suspicious statements. In general, statements that are exercised only in the failed tests are ranked higher than statements that are exercised in both failed and passed tests. Faults that are harder to access are likely to be executed only in the failed tests and are thus ranked near the top. On the other hand, faults that are easier to access are likely to be executed by both failed and passed tests, and are thus ranked lower. Our results also suggest that, in most cases, BEN is effective in locating input-value and control-flow insensitive faults. However, no conclusion can be drawn from the experimental data about the individual impact of input-value sensitivity and control-flow sensitivity on BEN's effectiveness.
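
As a rough illustration of the spectrum-based intuition described above (a generic suspiciousness score, not BEN's actual ranking formula):

```python
def suspiciousness(coverage, outcomes):
    """Rank statements by how strongly they are associated with failing tests.

    coverage: {statement: set of test ids that execute it}
    outcomes: {test id: "pass" or "fail"}
    """
    failed = {t for t, o in outcomes.items() if o == "fail"}
    passed = set(outcomes) - failed
    scores = {}
    for stmt, tests in coverage.items():
        ef = len(tests & failed)           # failing tests that execute stmt
        ep = len(tests & passed)           # passing tests that execute stmt
        # Statements executed mostly (or only) by failing tests score highest
        scores[stmt] = ef / (ef + ep) if (ef + ep) else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```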