Student Projects Completed in 2020-2021

Spring 2021 Student Projects

A DDoS Detection Model Based on RF and SVM

Students: Jiawen Li, Qixian Lu

Faculty Mentor: Ashutosh Dutta

Abstract: Network slicing is one of the key technologies that permit 5G networks to provide dedicated resources to different industries (services). DDoS attacks have a great impact on network slices that share the same physical resources: even with slice isolation, slices that are not directly under attack may fail to respond to normal requests because resources are exhausted. To address this issue, we propose a new model that combines Random Forest and SVM to classify attack and normal traffic. First, Random Forest and the Gini index are used to select the most relevant and important features from those preprocessed from the dataset. An SVM model is then trained on the selected features and corresponding data to detect DDoS attacks. The results demonstrate that the combination of Random Forest and SVM performs better than either the Random Forest model or the basic SVM model alone, with higher accuracy, recall, and F1 score.
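
The Gini-based feature selection step can be illustrated with a minimal sketch. It is not the authors' implementation: instead of a full Random Forest, it ranks each feature by the largest Gini impurity decrease a single threshold split can achieve, a simplified stand-in for the averaged importance a forest would report. The feature names and toy flows are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_gini_gain(values, labels):
    """Largest impurity decrease achievable by one threshold split on a feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Try every distinct value except the largest as a "<= t" threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def rank_features(rows, labels, names):
    """Return feature names sorted by single-split Gini gain, best first."""
    gains = {
        name: best_gini_gain([r[i] for r in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: gains[name], reverse=True)

# Hypothetical toy flows: (packets per second, mean packet size in bytes).
rows = [(900, 60), (850, 512), (40, 64), (30, 500)]
labels = ["ddos", "ddos", "normal", "normal"]
ranking = rank_features(rows, labels, ["pkt_rate", "pkt_size"])
```

In a pipeline like the one described, only the top-ranked features would then be passed to the SVM for training.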
AI Ethics in Chatbots

Students: Hedin Beattie

Faculty Mentor: Lanier Watkins

Abstract: The use of machine learning techniques to train conversational chatbots in the nuances of human interactions raises the concern of whether chatbots will demonstrate a prejudice similar to that of humans. Ideally, a chatbot is incapable of racism, sexism, or any other offensive speech, but several well-known public instances indicate otherwise. We explore the nature of bias in open source chatbots using a data set designed to reflect subtle bias, build a ‘toxicity framework’ capable of assessing the degree of bias in these bots, and attempt to retrain the bots, expanding on the previous data set, to reduce their inherent bias, all without adjusting their underlying code or structure.
Digital Forensic Techniques for Recovering Deleted SQLite Databases Data

Students: Qiao Jiang, Kun Liu, Simin Zhou

Faculty Mentor: Tim Leschke

Abstract: SQLite is small and fast, so it is widely used in software development. Its widespread adoption has made data recovery from SQLite databases a focus for forensic analysts. While various tools for SQLite database recovery have been developed, the quality of these tools and the techniques they use under different conditions remain unclear. Our research investigates how different deletion behaviors and settings affect the quality and availability of these tools, and analyzes the techniques behind different SQLite database recovery tools. Based on the experimental results and the analysis, we provide suggestions and guidance for selecting SQLite data recovery techniques.
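
One way to see why deletion settings matter is the small experiment sketched below, which is an illustration and not part of the project's tooling. It deletes one row with `PRAGMA secure_delete` switched off or on, then searches the raw database file for the deleted payload. The table name and marker string are hypothetical, and the exact residue left behind can vary with the SQLite version and build options.

```python
import os
import sqlite3
import tempfile

MARKER = b"FORENSIC-MARKER-0042"  # hypothetical payload to search for

def marker_survives_delete(secure_delete):
    """Delete one row, then report whether its bytes remain in the database file."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        # secure_delete=ON asks SQLite to overwrite deleted content with zeros.
        conn.execute("PRAGMA secure_delete = %s" % ("ON" if secure_delete else "OFF"))
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.executemany(
            "INSERT INTO notes (body) VALUES (?)",
            [("keep-1",), (MARKER.decode(),), ("keep-2",)],
        )
        conn.commit()
        # Remove only the middle row so the other cells stay on the page.
        conn.execute("DELETE FROM notes WHERE id = 2")
        conn.commit()
        conn.close()
        with open(path, "rb") as f:
            return MARKER in f.read()
    finally:
        os.remove(path)

residue_without_secure_delete = marker_survives_delete(False)
residue_with_secure_delete = marker_survives_delete(True)
```

With `secure_delete` off, the freed cell is typically left in place and remains recoverable by byte-level carving; with it on, the content is zeroed before the page is written back.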
Forensics Analysis and Comparison on Different IMA Applications

Students: Chen Bai, Guoyi Chen, Zheng Qin

Faculty Mentor: Tim Leschke

Abstract: Instant messaging applications (IMAs) have become an indispensable part of people’s daily lives. They help people communicate over distance and offer convenience in study, work, and so on. The number of IMA users has risen exponentially in recent years, and with the outbreak of the COVID-19 pandemic, the number of active users is higher than ever. However, such reliance on IMAs has also created great opportunities for cyber criminals, who may use IMAs to commit crimes such as cyberbullying, sexual harassment, and phishing. Examining what artifacts can be obtained from a given IMA using digital forensic methods is therefore necessary. In this article, we conducted digital forensic examinations of two well-known IMAs, WhatsApp and WeChat, to get an overview of how to perform forensic examinations on IMAs. The tested methods are then used to examine an IMA with a relatively small user base, MOMO. Finally, the three IMAs are compared in terms of their security.
Vulnerability Scanning and Verification for Node.js Packages Based on an Abstract Interpretation Vulnerability Scanning Tool

Students: Huangyin Chen, Qingshan Zhang, Siqi Cao

Faculty Mentor: Yinzhi Cao

Research Assistant: Song Li (CS Ph.D. Student)

Abstract: Node.js is a very popular JavaScript runtime. During development, many developers import external Node.js packages using npm. However, attackers can mount serious attacks by exploiting vulnerabilities in Node.js packages. One example is OS command injection, where malicious commands can be executed through sink functions. Other types of vulnerabilities that can arise from vulnerable Node.js packages include path traversal and prototype pollution. Source code analysis tools are known to be effective at detecting vulnerabilities in Node.js packages. The key to scanning is to filter suspicious code from the source code and raise warnings to developers. OPGen, developed by the JHU System Security Lab, is an abstract interpretation vulnerability scanning tool that generates object property graphs (OPG) and detects vulnerabilities in npm packages. In this project, we first studied the implementation principle of OPGen. Then, we used this tool to extract vulnerabilities from npm packages. After scanning 1,000,000 npm packages for OS command injection, path traversal, and prototype pollution vulnerabilities, we identified four vulnerabilities, each from a distinct npm package. We then checked the code structure and performed penetration testing on each package to verify its vulnerability. After the experiment, we analyzed the list of vulnerable functions for the OS command injection vulnerability and the performance of the tool.

Fall 2020 Student Projects

Security and Threat analysis in SDN under 5G

Students: Shuofeng Wang, Yu Mao, Yue Chen

Faculty Mentor: Ashutosh Dutta

Abstract: Software Defined Network (SDN), as a key foundation for 5G, provides flexibility, resilience, and programmability for the core network. It is expected to be applied in various types of 5G services, such as edge cloud and radio access network (RAN). Considering that massive usage of 5G networks might pose security challenges for SDN, we conduct a security threat analysis of SDN and identify potential vulnerabilities in the SDN northbound and southbound interfaces. We choose the open-source SDN controller OpenDaylight (ODL) as our object of study. Then, distributed denial-of-service (DDoS), man-in-the-middle (MITM), RESTful API parameter exploitation, and fuzzing attacks are carried out on the ODL controller. These attacks cause serious problems such as credential leakage and controller crashes. We also introduce mitigations for these attacks and deploy some of them in our testbed.
Network Intrusion Detection through Machine Learning

Students: Haoran Xu, Tianshi Feng, Shi Tang

Faculty Mentor: Ashutosh Dutta

Abstract: With the rapid development of network technology, network-based intrusions are becoming more and more common. Networked computer systems have become targets for hackers, and their security faces a great threat. Intrusion detection has therefore become a hot topic in the field of network security. As an important supplement to traditional intrusion prevention technologies, such as user identity authentication and information protection, it is another protective wall for networked computer systems.

However, traditional intrusion detection methods require intrusion matching patterns to be updated manually, which is expensive and has poor real-time performance. New intrusions often cause great harm to the network system during the period of a manual update.

This paper discusses intrusion detection methods based on machine learning. To improve the probability of detecting intrusions, we implemented four traditional machine learning algorithms (Decision Tree, Random Forest, K Nearest Neighbors, and Support Vector Machine) and trained them on the NSL-KDD dataset to obtain four classifiers. We describe how we built the models with specific parameters and present the prediction results with detailed analysis. The results are satisfactory, with almost 99% accuracy and a 95% recall rate.
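
For readers less familiar with the reported metrics, the following sketch shows how accuracy, precision, recall, and F1 are derived from a classifier's predictions. The label names and the toy predictions are hypothetical and are not taken from the project's NSL-KDD results.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and predictions for five connections.
y_true = ["attack", "attack", "attack", "normal", "normal"]
y_pred = ["attack", "attack", "normal", "normal", "attack"]
metrics = binary_metrics(y_true, y_pred)
```

Recall is the fraction of real attacks that were flagged, which is why it can sit below accuracy when most traffic is normal and easy to classify.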
Secure Real Time Drug Delivery Device

Students: Arvind Ponnarassery Jayan, Weiheng Bai, Apoorv Dayal

Faculty Mentor: Avi Rubin

Abstract: With the increased integration of wireless device technology into the medical field, a growing number of implanted wireless medical devices are on the horizon. A multi-disciplinary team at Johns Hopkins University is aiming to create a skull-embedded, MRI-compatible medical device that actively pumps therapeutic medicine to the target glioblastoma multiforme (GBM) tumor resection site. This project attempts to enumerate the potential cybersecurity threats and possible mitigations for this device. For this security review, the STRIDE model is used to generate the architecture review, the data flow diagram, and finally the Attack Model and Attack Tree. The conclusion emphasizes the many major threats faced by the device that can put the patient at high risk (including death), which will be considered by the development team in the next iteration.
WDPKR: Wireless Data Processing Kit for Reconnaissance, IoT Profiling

Students: Claudia Moncaliano

Faculty Mentor: Avi Rubin

Research Assistant: Tushar Jois (CS)

Abstract: Internet-of-Things (IoT) devices are the building blocks of a stream of automated smart environments including residential homes, neighborhoods, schools, and office buildings. Due to their rapid growth, quick production cycles, and large market space, IoT devices are susceptible to undiscovered vulnerabilities and privacy concerns. These connected devices are diverse, feature-rich, and not standardized, which makes their coexistence in a Smart Home environment difficult for researchers to study. We propose WDPKR, pronounced “woodpecker”, which stands for Wireless Data Processing Kit for Reconnaissance. WDPKR is a data collection and analysis solution designed for IoT device profiling in smart home environments to strengthen the analysis of 802.11 networks. In this paper, we discuss the design of WDPKR, a smart home testbed, and the generation of a robust IoT network traffic dataset. Through experimentation, we demonstrate that holistic dataset construction and feature selection increase the accuracy of device fingerprinting and that WDPKR’s functionality provides a strong device discovery and profiling framework. We conclude with an analysis of the privacy implications tied to the network analysis conducted through WDPKR and the profiles generated through IoT device usage.
Hackback - Defend Once and for All

Students: Debolina De, Sai Kiran Uppu

Faculty Mentor: Avi Rubin

Abstract: Counterattack is an effective method for preventing targeted, continuous attacks on organizations. Even with limited resources, it gives security teams breathing space to understand and respond to the cyber threat. We propose a holistic approach for making a quick, real-time decision to counter an attack, relying on legal, industry, and honeypot data. A honeypot feedback loop also strengthens this decision making. The significant increase in efficiency of this method comes from the fact that the decision matrix is precomputed from historic threat patterns and real-time IDS data, as opposed to signature-based and other less efficient methodologies. A working prototype was designed, and experimental results were obtained, to demonstrate the effectiveness of the proposed algorithm on real-time network traffic. Our project demonstrated several industry use case scenarios, proving the method to be reliable.
Potential Risk Analysis & Classification in Android Environment

Students: Zuowei Cui, Gaoyuan Du, Haoran Lin

Faculty Mentor: Chris Monson

Abstract: The number of smartphone users, mainly Android users, has been increasing rapidly over the years. In recent years, a number of approaches for Android malware detection have been proposed, using permissions, source code analysis, or dynamic analysis. In this paper, we propose to use machine learning models for Android malware detection. Through extensive experimental evaluation, we demonstrate that they can reach more than 99% accuracy on risk detection. In particular, we propose to use supervised and unsupervised learning methods simultaneously in order to meet the requirements of real industry environments. We also build models comprehensively on data that includes application code and network packets. Furthermore, we tentatively crawl data from Android ourselves to verify our methods and analysis.
Analysis and Implementation of MITRE Shield Framework

Students: Annamarie Casimes, Alex Schultz

Faculty Mentor: Lanier Watkins

External Mentor: Mika Ayenson (JHU/APL)

Abstract: As Advanced Persistent Threats (APTs) continue to develop new exploits and ways to avoid detection by “living off the land”, it is harder and harder for defenders to identify and prevent attacks. Current cyber defense techniques are reactive, focusing on mitigation and remediation, and provide the attacker an asymmetric advantage. Active cyber defense is vital to shifting the advantage from the attacker to the defender. Active cyber defense consists of the moving target problem, cyber deception, and adversarial engagement. While the United States Department of Defense has employed active cyber defense since 2012, little guidance was available for the private sector on how to implement an active defense. This research explores the recently published MITRE Shield framework as an implementation of cyber deception and combative engagement methods within active defense for the private sector. The primary goal of this research is to outline a methodology that defenders can use to implement MITRE Shield active defense techniques. This research evaluated the following defense technique evaluations (DTEs) in depth: DTE 0007 – Application Diversity, DTE 0033 – Standard Operating Procedure, DTE 0034 – System Activity Monitoring. Through our research, we were able to abstract a process to evaluate Shield defensive techniques from the perspective of a novice defender. This Shield implementation guide can be used by private sector defenders to strengthen their cyber defense posture and combat advancing APTs.
Prevention of Reconnaissance Using AI Automation

Students: Yaamini Barathi Mohan, Nikhil Teja Dommeti, Shreya Shrikant Kulkarni

Faculty Mentor: Lanier Watkins

External Mentor: Matt Price (Picnic Score)

Abstract: Many social engineering attacks start with the reconnaissance phase, also known as the “eagle’s eye”, to obtain as much information as possible about the users. Publicly available online information increases a business’s or individual’s attack surface and is used by attackers for social engineering attacks or in the reconnaissance phase of the cyber-attack lifecycle. This project breaks the first step of a social engineering attack, reconnaissance, by discovering actionable information about a person that is present online and automating the takedown of such information. We have developed a framework for breaking reconnaissance via data aggregators and alerting the victims. A detailed overview of tools for developing prototypes and implementing various record linking methods is also discussed.
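
As a rough illustration of what a record linking method does, the sketch below links person records from two sources when their normalized names are similar and their cities match. The abstract does not say which methods the project used; the matching rule, the 0.8 threshold, the field names, and the sample records are all assumptions. Name similarity comes from `difflib.SequenceMatcher` in the Python standard library.

```python
from difflib import SequenceMatcher
import string

def normalize(name):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())

def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def link_records(source_a, source_b, threshold=0.8):
    """Return index pairs (i, j) of records judged to describe the same person."""
    links = []
    for i, ra in enumerate(source_a):
        for j, rb in enumerate(source_b):
            same_city = ra["city"].lower() == rb["city"].lower()
            if same_city and name_similarity(ra["name"], rb["name"]) >= threshold:
                links.append((i, j))
    return links

# Hypothetical listings from two data aggregator sites.
site_a = [
    {"name": "Jane A. Doe", "city": "Baltimore"},
    {"name": "John Smith", "city": "Baltimore"},
]
site_b = [
    {"name": "jane doe", "city": "baltimore"},
    {"name": "Jon Smyth", "city": "Boston"},
]
links = link_records(site_a, site_b)
```

Requiring a second matching field alongside the fuzzy name comparison is a simple way to reduce false links between different people with similar names.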
Forensics Analysis of Private Browsing in Google Chrome

Students: Tab Zhang, Yuannan Yang, Zhenyu Ji

Faculty Mentor: Tim Leschke

Abstract: Google Chrome offers a private browsing feature, the incognito mode, which allows users to browse without leaving any browsing artifact on computer hard drives. While most Chrome users use this feature for respectable reasons, it can facilitate criminal activities as forensic examiners cannot find any evidential artifacts using traditional forensics methods. However, there may still be limited artifacts stored in RAM and virtual memory, which can become valuable evidence. Our research investigates possible methods to recover Google Chrome’s private browsing artifacts on Windows 10 computers and how different factors affect the quality and availability of such artifacts. Based on our experimental data, we proposed a step-by-step guide for future forensic examiners to retrieve and examine Google Chrome’s private browsing.
Echo Dot 2 Forensics

Students: Zhiqi Li, Weichen Wang, Dongyue Yan

Faculty Mentor: Tim Leschke

Abstract: In this article, we concentrate on extracting different types of data from the Echo Dot 2. Records and operation traces, as well as environmental recordings, alarms, voices, and reminders, can all be obtained from common systems including macOS, Windows, Android, and iOS. We analyze the possible and reasonable forensic methods, such as acquisition through the device itself, the Alexa user application, the Alexa Cloud service, and network packet capture analysis. Finally, we provide simulation and experiment results and a general forensics guide for examiners working on the Echo Dot 2 in the future.

In modern home life, more users are choosing smart home products as the technology develops. People enjoy the convenience and intelligence of the smart home while leaving their personal data in smart home systems and devices, which makes these good research subjects for forensic scientists and examiners. Among the many available devices, we chose a smart audio device, the Amazon Echo Dot 2, which supports voice control with the help of its assistant application. Its affordable price and varied functions have made the Echo Dot widely used, giving researchers potential opportunities and data sources for IoT device forensics. Such data and evidence can play an important role in proving guilt or innocence in court.

We reviewed related literature including IoT Forensics – Amazon Echo, which supplied a forensic model for IoT devices, taking Amazon Echo as an example. Then after reading some surveys of IoT digital forensics, we found some frameworks and methods of data acquisition of IoT devices in some other articles.
Protecting Privacy against Unauthorized Facial Recognition Models

Students: Yilun Yang, Ziqing Lin

Faculty Mentor: Tim Leschke

Research Assistant: Qiao Jiang (MSSI)

External Mentor: Lei Ding (American University)

Abstract: Recently, a great deal of research has been done on facial recognition. Undeniably, the extensive application of facial recognition technology has changed our lives drastically and made them more convenient. However, it also poses significant security and privacy concerns for the public. When users share their images online, they are threatened by unauthorized facial recognition models that collect their images for training and identify who they are. This paper presents an in-depth study and analysis of facial recognition and of one notable system, Fawkes. We then provide a technical design and analysis for improving Fawkes. We also carried out numerous experiments to test and evaluate Fawkes and to identify aspects that could be improved. Using a multithreading approach, we optimize the running time of Fawkes. Finally, a set of evaluation methods was applied to assess the effectiveness and feasibility of our approaches.

The primary result is a usable Chrome Extension based on Fawkes with user-friendly GUIs. The most crucial part of our work is the backend server, which provides a set of RESTful APIs. We defined a standard for using these APIs: only when the frontend sends requests that follow this standard can the backend server successfully add imperceptible perturbations (which we call a “cloak”) to the image. This design has two benefits: (1) excellent transferability and (2) a remarkable reduction in the runtime of “cloaking” an image.
Evasion Attacks against Machine Learning Models for Spam Email Detection

Students: Suye Huang, Danyi Zhang, Chenran Wang

Faculty Mentor: Xiangyang Li

External Mentor: Lei Ding (American University)

Abstract: As email becomes more and more important in people’s work and life, the flood of spam emails has become a serious problem affecting work efficiency. As a result, spam detection methods based on various algorithms have been proposed. Although these classifiers handle some complex problems well, they cannot cope well with slight disturbances in the input. Recently, a great deal of research has tried to construct adversarial examples to mislead classifiers using methods such as FGSM and PGD. This paper uses TF-IDF, a natural language processing (NLP) technique, to construct a spam classifier based on the SVM algorithm and then uses PGD to generate adversarial examples for a white-box attack on that SVM classifier. Our contribution lies in analyzing the working principle of PGD, simulating the attack process, generating adversarial examples, and crafting corresponding emails that bypass the SVM classifier. We also conduct black-box attacks by testing these adversarial examples on different classifiers.
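
The PGD step can be sketched for the simplest case, a linear classifier with score w·x + b, where a positive score means "spam". This is an illustrative assumption and not the project's code: the weights, feature vector, and step sizes are hypothetical, and the sketch works purely in feature space without mapping the perturbed TF-IDF vector back to an actual email.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear decision score; positive means the input is classified as spam."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def pgd_linear(w, b, x0, eps, alpha, steps):
    """L-infinity PGD that pushes score(w, b, x) downward, toward 'not spam'."""
    x = list(x0)
    for _ in range(steps):
        # For a linear score, the gradient with respect to x is w itself.
        x = [xi - alpha * sign(wi) for xi, wi in zip(x, w)]
        # Project back into the eps-ball around x0 and keep features non-negative.
        x = [max(0.0, min(max(xi, x0i - eps), x0i + eps)) for xi, x0i in zip(x, x0)]
    return x

# Hypothetical classifier weights and a TF-IDF-like feature vector.
w, b = [2.0, -1.0, 0.5], -0.5
x0 = [0.6, 0.1, 0.4]
x_adv = pgd_linear(w, b, x0, eps=0.3, alpha=0.1, steps=5)
```

Because the gradient of a linear model is constant, each iteration here is the same signed step followed by projection; with a non-linear kernel the gradient would have to be recomputed at every step.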
Training Time Adversarial Attacks with Multiple Autoencoders

Students: Ziyang Lin, Jiawei Guo

Faculty Mentor: Xiangyang Li

External Mentor: Lei Ding (American University)

Research Assistants: Chengsi Yang (ECE), Simin Zhou (MSSI)

Abstract: Training machine learning models on individual devices creates a new attack surface, where these devices are vulnerable to training time attacks. In this project, we consider the training time attack by adding perturbation in the dataset, hoping to alter the behavior of a target classifier. We used an autoencoder-like network to generate the adversarial dataset based on the training data together with the targeted classifier and do experiments on the testing dataset. This raises a non-convex optimization problem that we approximated by a two-step optimization problem to stabilize the final result. By hijacking the training process of the victim classifier, the noise generator can learn how to interfere with the classifier step by step. Based on that, we proposed several multiple-autoencoder models to improve the attack performance. We found that using more autoencoders not only has dramatically improved the convergence speed but also changed the interference between the autoencoders. It is further influenced by the strategies that we adopt to split data according to class labels. Then, we proposed a new global loss function to update the noise generator for mitigating the interference. In this way, we were able to have the best model that outperforms other models in both speed and accuracy. These models were tested on the MNIST dataset. The method that we proposed in this research project can be extended to scenarios where the attacker aims to manipulate class-specific predictions of a classifier.
Practical Blind Membership Inference Attack

Students: Yuchen Yang, Bo Hui, Haolin Yuan

Faculty Mentor: Yinzhi Cao

Abstract: Membership inference (MI) attacks affect user privacy by inferring whether given data samples have been used to train a target learning model, e.g., a deep neural network. There are two types of MI attacks in the literature, i.e., those with and without shadow models. The success of the former heavily depends on the quality of the shadow model, i.e., the transferability between the shadow and the target; the latter, given only blackbox probing access to the target model, cannot make an effective inference of unknowns, compared with MI attacks using shadow models, due to the insufficient number of qualified samples labeled with ground truth membership information.

In this paper, we propose an MI attack, called BlindMI, which probes the target model and extracts membership semantics via a novel approach, called differential comparison. The high-level idea is that BlindMI first generates a dataset with nonmembers via transforming existing samples into new samples, and then differentially moves samples from a target dataset to the generated, non-member set in an iterative manner. If the differential move of a sample increases the set distance, BlindMI considers the sample as non-member and vice versa.

BlindMI was evaluated by comparing it with state-of-the-art MI attack algorithms. Our evaluation shows that BlindMI improves F1-score by nearly 20% when compared to state-of-the-art on some datasets, such as Purchase-50 and Birds-200, in the blind setting where the adversary does not know the target model’s architecture and the target dataset’s ground truth labels. We also show that BlindMI can defeat state-of-the-art defenses.
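
The differential comparison idea described above can be sketched in a heavily simplified form. The assumptions here are not from the abstract: each sample is reduced to a single number (for example, the target model's top confidence), the set distance is the absolute difference of the set means, and every move is evaluated once against the original sets without the iterative updates. The function names and values are hypothetical.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)

def set_distance(a, b):
    """Stand-in set distance: absolute difference of the set means."""
    return abs(mean(a) - mean(b))

def differential_comparison(target, nonmembers):
    """Label each target sample by whether moving it raises the set distance."""
    # Needs at least two target samples so the remainder is never empty.
    base = set_distance(target, nonmembers)
    labels = []
    for i, sample in enumerate(target):
        rest = target[:i] + target[i + 1:]   # target set without this sample
        moved = nonmembers + [sample]        # non-member set with it added
        increased = set_distance(rest, moved) > base
        labels.append("non-member" if increased else "member")
    return labels

# Hypothetical confidences: training members tend to score higher.
target = [0.99, 0.98, 0.60, 0.55]
generated_nonmembers = [0.50, 0.62, 0.58]
labels = differential_comparison(target, generated_nonmembers)
```

Moving a non-member out of the mixed target set leaves it more member-like, so the two sets drift apart; moving a member pulls them together, which is the signal the comparison relies on.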
Privacy and Security of Skills in Smart Home Personal Assistants

Students: Zichen Wang, Xiangjun Ma, Haotian An

Faculty Mentor: Yinzhi Cao

Abstract: As the popularity of Smart Home Personal Assistants (SPAs) has increased over the years, the associated privacy and safety concerns have also become more important. Skills, one of the forms of application that users interact with most when using SPAs, are the main interest of this project. In this project, we first defined the two privacy concerns in skills that are most common across different SPA platforms, and conducted an experiment for each concern. The first concern is insufficiently controlled third-party servers that may bring malicious skills into the market, and the second concern is loose market vetting and over-trust in privacy policies.

In the first part of this project, the goal is to explore whether third-party backends may be using private information for other business purposes. We set up experiments that deliberately feed private information to the most common skills and observe whether the information is used outside the skill scenario. We discovered that some skills violate skill development policies, but so far we have collected no evidence that strongly proves the private information is used outside the skill market.

Next, in the second part of the project, we investigate how the mainstream platforms vet newly developed skills before admitting them to the market. We set up experiments in which we developed two potentially malicious skills and published them in the Google Home market and the Amazon Alexa store. Because these potentially malicious skills successfully passed the review process, we argue that more malicious skills may have passed the check and leaked into the market.

Finally, we propose possible mitigations for these two observations and offer suggestions to further secure the skills market and give users a safer and more secure environment when using such third-party skills.

JHU Information Security Institute