Third-party libraries with rich functionality facilitate the fast development of JavaScript software, leading to the explosive growth of the NPM ecosystem. However, they also bring new security threats: vulnerabilities can be introduced through dependencies on third-party libraries, and these threats can be greatly amplified by transitive dependencies. Existing research either considers only direct dependencies or reasons about transitive dependencies via reachability analysis, which neglects the NPM-specific dependency resolution rules applied during real installation and thus yields wrongly resolved dependencies. Consequently, fine-grained analyses, such as precisely tracking vulnerability propagation and its evolution over time in dependencies, cannot be carried out at a large scale, nor can ecosystem-wide solutions for vulnerabilities in dependencies be derived. To fill this gap, we propose a knowledge graph-based dependency resolution that resolves the inner dependency relations of dependencies as trees (i.e., dependency trees) and investigates the security threats from vulnerabilities in dependency trees at a large scale. Specifically, we first construct a complete dependency-vulnerability knowledge graph (DVGraph) that captures the whole NPM ecosystem (over 10 million library versions and 60 million well-resolved dependency relations). On top of it, we propose a novel algorithm (DTResolver) that statically and precisely resolves the dependency tree, as well as the transitive vulnerability propagation paths, for each package by taking the official dependency resolution rules into account. Based on that, we carry out an ecosystem-wide empirical study on vulnerability propagation and its evolution in dependency trees. Our study reveals many useful findings, and we further discuss the lessons learned and solutions for different stakeholders to mitigate the vulnerability impact in NPM. For example, we implement a dependency tree based vulnerability remediation method (DTReme) for NPM packages, which achieves much better performance than the official tool (npm audit fix).
Authored by Chengwei Liu, Sen Chen, Lingling Fan, Bihuan Chen, Yang Liu, Xin Peng
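The abstract above does not include the DTResolver algorithm itself; as a rough, hypothetical illustration of what expanding a dependency tree from registry metadata looks like, the sketch below recursively resolves pinned dependencies from a toy in-memory registry (the `registry` dict, package names, and versions are made up, and real NPM resolution must additionally honor semver ranges, hoisting, and deduplication rules).

```python
# Minimal sketch: recursively expand a dependency tree from a toy registry.
# The registry layout and package names are hypothetical; real NPM resolution
# must also honor semver ranges, hoisting, and deduplication rules.
registry = {
    ("app", "1.0.0"):      {"left-pad": "1.3.0", "lodash": "4.17.20"},
    ("left-pad", "1.3.0"): {},
    ("lodash", "4.17.20"): {},
}

def resolve_tree(name, version, path=()):
    """Return a nested dict representing the dependency tree of (name, version)."""
    key = (name, version)
    if key in path:                      # guard against cycles along the current path
        return {"name": name, "version": version, "deps": "cycle"}
    deps = registry.get(key, {})
    return {
        "name": name,
        "version": version,
        "deps": [resolve_tree(d, v, path + (key,)) for d, v in deps.items()],
    }

print(resolve_tree("app", "1.0.0"))
```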
Legacy programs are typically monolithic (that is, all code runs in a single process and is not partitioned), so a bug in the program may render the entire program vulnerable and therefore untrusted. Program partitioning can be used to separate a program into multiple partitions in order to isolate sensitive data or privileged operations. Manual program partitioning requires programmers to rewrite the entire source code, which is cumbersome, error-prone, and not generic. Automatic program partitioning tools can separate programs according to a dependency graph constructed from data or program dependencies, but programmers still need to manually implement remote service interfaces for inter-partition communication. Therefore, in this paper, we propose AutoSlicer, whose purpose is to partition a program more automatically, so that the programmer is only required to annotate sensitive data. AutoSlicer constructs accurate data dependency graphs (DDGs) by enabling execution flow graphs (EFGs), and its DDG-based partitioning algorithm computes partition information from the sensitivity annotations. In addition, a code refactoring toolchain automatically transforms the source code into sensitive and insensitive partitions that can be deployed on a remote procedure call framework. The experimental evaluation shows that AutoSlicer effectively improves the accuracy of program partitioning (by 13%-27%) by enabling EFGs, and separates real-world programs with a relatively small performance overhead (0.26%-9.42%).
Authored by Weizhong Qiang, Hao Luo
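As a loose illustration of DDG-based partitioning driven by sensitivity annotations (not AutoSlicer's actual implementation), the sketch below marks every node whose data transitively depends on an annotated sensitive node as belonging to the sensitive partition; the graph, node names, and use of networkx are assumptions for the example.

```python
# Minimal sketch of DDG-based partitioning: everything that (transitively)
# depends on a sensitive node goes into the sensitive partition.
import networkx as nx

ddg = nx.DiGraph()
ddg.add_edges_from([
    ("read_key", "decrypt"),        # edge u -> v: v uses data produced by u
    ("decrypt", "format_output"),
    ("read_config", "format_output"),
])

sensitive = {"read_key"}            # programmer-provided sensitivity annotations

sensitive_partition = set(sensitive)
for node in sensitive:
    sensitive_partition |= nx.descendants(ddg, node)   # data flowing out of node

insensitive_partition = set(ddg.nodes) - sensitive_partition
print("sensitive:", sensitive_partition)
print("insensitive:", insensitive_partition)
```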
Cyber-Physical Systems (CPS) are systems that contain digital embedded devices while depending on environmental influences or external configurations. Identifying the relevant influences of a CPS and modeling its dependencies on external influences is difficult. We propose to learn these dependencies with decision trees in combination with clustering. The approach automatically identifies relevant influences and yields a data-driven explanation of system behavior that involves the system's use case. Our paper presents a case study of our method for a Real-Time Localization System (RTLS), demonstrating the usefulness of our approach, and discusses further applications of a learned decision tree.
Authored by Swantje Plambeck, Görschwin Fey, Jakob Schyga, Johannes Hinckeldeyn, Jochen Kreutzfeldt
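A minimal sketch of the general idea of combining clustering with decision-tree learning, under the assumption of synthetic influence features and scikit-learn as the toolkit; it is not the authors' pipeline.

```python
# Minimal sketch: cluster observed system behavior, then fit a decision tree
# that explains cluster membership from candidate external influences.
# Feature names and data are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
influences = rng.uniform(size=(200, 3))            # e.g. temperature, load, distance
behavior = influences[:, [0]] * 2 + rng.normal(scale=0.05, size=(200, 1))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(behavior)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(influences, labels)
print(export_text(tree, feature_names=["temp", "load", "distance"]))
```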
With the rapid development of cloud storage technology, an increasing number of enterprises and users choose to store data in the cloud, which reduces local overhead while ensuring safe storage, sharing, and deletion. In cloud storage, assured data deletion is a critical and challenging problem. This paper proposes an assured data deletion scheme based on multi-authoritative users in a semi-trusted cloud storage scenario (MAU-AD), which aims to realize secure key management without introducing any trusted third party and to achieve assured deletion of cloud data. MAU-AD uses access policy graphs to achieve fine-grained access control and data sharing. Data security is guaranteed by mutual restriction among authoritative users, and system robustness is improved by having multiple authoritative users jointly manage keys. In addition, the traceability of misconduct in the system is realized through blockchain technology. Simulation experiments and comparison with related schemes show that MAU-AD is safe and effective, and it provides a novel application scenario for the assured deletion of cloud storage data.
Authored by Junfeng Tian, Ruxin Bai, Tianfeng Zhang
Secure data deletion in storage media is an important function of data security management. The internal physical properties of SSDs differ from those of hard disks, so secure deletion methods for disks cannot be applied to SSDs directly. The copyback operation is used to improve the data migration performance of SSDs but is rarely used due to its error accumulation issue. We propose a secure data deletion algorithm based on the copyback operation, which improves the efficiency of secure deletion without affecting data reliability. First, this paper shows that the secure delete operation occupies the channel bus for a long time, increasing I/O overhead and reducing SSD performance. Second, this paper designs an efficient data deletion algorithm that can process read requests quickly. The experimental results show that the proposed algorithm reduces the response time of read requests by 21% and the response time of delete requests by 18.7% compared with the existing algorithm.
Authored by Rongzhen Zhu, Yuchen Wang, Pengpeng Bai, Zhiming Liang, Weiguo Wu, Lei Tang
Documents are a common method of storing information and one of the most conventional forms of expressing ideas. Cloud servers store a user's documents alongside those of thousands of other users in place of physical storage devices. Indexes corresponding to the documents are also stored at the cloud server to enable users to retrieve documents of interest. The index includes keywords, the identities of the documents in which the keywords appear, and Term Frequency-Inverse Document Frequency (TF-IDF) values that reflect the keywords' relevance scores over the dataset. Currently, there are no efficient methods to delete keywords from millions of documents on cloud servers without compromising user privacy. Most existing approaches use divide-and-conquer algorithms that split a larger problem into sub-problems and then combine the results, and they do not focus on fine-grained deletion. This work achieves fine-grained deletion of keywords by keeping the size of the TF-IDF matrix constant after processing a deletion query, which comprises the keywords to be deleted. The experimental results of the proposed approach confirm that the precision of ranked search remains very high after deletion, without recalculation of the TF-IDF matrix.
Authored by Kushagra Lavania, Gaurang Gupta, D.V.N. Kumar
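To make the "constant-size TF-IDF matrix" idea concrete, here is a minimal sketch that zeroes out the column of a deleted keyword instead of rebuilding the matrix; the documents, vocabulary, and use of scikit-learn are illustrative assumptions, and the paper's privacy protections are not modeled.

```python
# Minimal sketch: "delete" a keyword by zeroing its column in the TF-IDF
# matrix instead of rebuilding the matrix, so its shape stays constant.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["secure cloud storage", "cloud keyword search", "ranked keyword search"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs).toarray()
vocab = vec.vocabulary_                       # term -> column index

def delete_keyword(matrix, term):
    col = vocab.get(term)
    if col is not None:
        matrix[:, col] = 0.0                  # matrix shape is unchanged
    return matrix

tfidf = delete_keyword(tfidf, "cloud")
print(tfidf.shape, np.count_nonzero(tfidf[:, vocab["cloud"]]))
```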
At present, cloud service providers control the direct management rights over cloud data, and cloud data cannot be effectively and assuredly deleted, which may easily lead to security problems such as data residue and user privacy leakage. This paper reviews the research on assured deletion of cloud data in recent years from three aspects: encryption key deletion, multi-replica association deletion, and verifiable deletion. The advantages and disadvantages of the various deletion schemes are analyzed in detail, and finally, prospects for future research on assured deletion of cloud data are given.
Authored by Bin Li, Yu Fu, Kun Wang
With the rapid development of general cloud services, more and more individuals and collectives use cloud platforms to store data, so assured data deletion deserves investigation in cloud storage. In time-sensitive data storage scenarios, cloud platforms must automatically destroy data after the data owner-specified expiration time; therefore, assured deletion of time-sensitive data is required. In this paper, a fine-grained assured time-sensitive data deletion (ATDD) scheme in cloud storage is proposed by embedding a time trapdoor in Ciphertext-Policy Attribute-Based Encryption (CP-ABE). Time-sensitive data is self-destructed after the data owner-specified expiration time so that authorized users can no longer access the related data. In addition, a credential is returned to the data owner for data deletion verification. The proposed scheme provides solutions for fine-grained access control and verifiable data self-destruction. Detailed security and performance analyses demonstrate the security and practicability of the proposed scheme.
Authored by Zhengyu Yue, Yuanzhi Yao, Weihai Li, Nenghai Yu
Functional dependencies (FDs) are widely applied in data management tasks. Since FDs on data are usually unknown, FD discovery techniques are studied for automatically finding hidden FDs from data. In this paper, we develop techniques to dynamically discover FDs in response to changes on data. Formally, given the complete set Σ of minimal and valid FDs on a relational instance r, we aim to find the complete set Σ′ of minimal and valid FDs on r ⊕ Δr, where Δr is a set of tuple insertions and deletions. Different from batch approaches that compute Σ′ on r ⊕ Δr from scratch, our dynamic method computes Σ′ in response to Δr by leveraging the known Σ on r, and avoids processing the whole of r for each update from Δr. We tackle dynamic FD discovery on r ⊕ Δr by dynamic hitting set enumeration on the difference-set of r ⊕ Δr. Specifically, (1) leveraging auxiliary structures built on r, we first present an efficient algorithm to update the difference-set of r to that of r ⊕ Δr. (2) We then compute Σ′ by recasting dynamic FD discovery as dynamic hitting set enumeration on the difference-set of r ⊕ Δr and developing novel techniques for dynamic hitting set enumeration. (3) We finally experimentally verify the effectiveness and efficiency of our approaches, using real-life and synthetic data. The results show that our dynamic FD discovery method outperforms the batch counterparts on most tested data, even when Δr is up to 30% of r.
Authored by Renjie Xiao, Yong'an Yuan, Zijing Tan, Shuai Ma, Wei Wang
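The dynamic difference-set and hitting-set machinery cannot be reproduced from the abstract; as a small grounding example, the sketch below only shows what it means for an FD X → Y to be valid on a relation r (tuples that agree on X must agree on Y). The relation and attribute names are made up.

```python
# Minimal sketch: check whether an FD X -> Y holds on a relation r,
# i.e. tuples that agree on X also agree on Y. This is only the validity
# test, not the paper's dynamic difference-set / hitting-set machinery.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

r = [
    {"emp": 1, "dept": "A", "city": "Rome"},
    {"emp": 2, "dept": "A", "city": "Rome"},
    {"emp": 3, "dept": "B", "city": "Oslo"},
]
print(fd_holds(r, ["dept"], ["city"]))   # True: dept -> city
print(fd_holds(r, ["city"], ["emp"]))    # False
```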
Data is a collection of information from activities in the real world. The file in which such data is stored, after being transformed into a form that machines can process, is generally known as a data set. In the real world, many data sets are incomplete and contain various types of noise; missing values are one such kind. Thus, imputing these missing values is one of the significant tasks of data pre-processing. This paper deals with two real-world health care data sets, namely a life expectancy (LE) dataset and a chronic kidney disease (CKD) dataset, which are very different in nature, and provides insights on various data imputation techniques for filling missing values by analyzing them. In data imputation, it is very common to fill missing values with measures of central tendency such as the mean, median, and mode, which can represent the central value of a distribution, but choosing the apt measure is a real challenge. To the best of our knowledge, this is the first paper that provides a complete analysis of the impact of basic data imputation techniques on various data distributions, which can be classified based on the size of the data set, the number of missing values, the type of data (categorical/numerical), etc. This paper compares and analyzes the original data distribution with the data distribution after each imputation in terms of skewness, outliers, and various descriptive statistical parameters.
Authored by Sainath Sankepally, Nishoak Kosaraju, Mallikharjuna Rao
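A minimal sketch of central-tendency imputation and its effect on skewness, using a synthetic skewed series rather than the LE or CKD datasets analyzed in the paper.

```python
# Minimal sketch: mean / median / mode imputation with pandas, plus a quick
# look at how each choice shifts skewness. Data is synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
s = pd.Series(rng.exponential(scale=2.0, size=500))    # skewed distribution
s[rng.choice(s.index, size=50, replace=False)] = np.nan

for name, value in [("mean", s.mean()), ("median", s.median()), ("mode", s.mode()[0])]:
    filled = s.fillna(value)
    print(f"{name:>6}: skew={filled.skew():.3f}")
```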
Missing values are an unavoidable problem for classification tasks of machine learning on medical data. With the rapid development of medical systems, large-scale medical data is increasing. Missing values increase the difficulty of mining hidden but useful information in these medical datasets. Deletion and imputation methods are the most popular methods for dealing with missing values. Existing studies have neglected to compare and discuss deletion and imputation methods for missing values under both the row missing rate and the total missing rate, and they rarely use experimental data sets that are mixed-type and large-scale. In this work, mixed-type medical data sets of various sizes are used. Performance differences between deletion and imputation methods are compared under the MCAR (Missing Completely At Random) mechanism in a baseline classification task using LR (Linear Regression) and SVM (Support Vector Machine) classifiers, with the same row and total missing rates. Experimental results show that under the MCAR missing mechanism, the performance of the two types of processing methods is related to the size of the datasets and the missing rates. As the missing rate increases, the performance of both types of methods decreases, but the deletion method degrades faster, and the imputation methods based on machine learning have more stable and better classification performance on average. In addition, small data sets are easily affected by the processing methods of missing values.
Authored by Lijuan Ren, Tao Wang, Aicha Seklouli, Haiqing Zhang, Abdelaziz Bouras
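A minimal sketch of the deletion-versus-imputation comparison under an MCAR mask, using synthetic data, a 20% missing rate, and logistic regression as a stand-in classifier; the paper's datasets, missing rates, and exact classifiers are not reproduced here.

```python
# Minimal sketch comparing listwise deletion vs. mean imputation under MCAR.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
mask = np.random.default_rng(0).random(X.shape) < 0.2        # MCAR mask
X_missing = X.copy()
X_missing[mask] = np.nan

# Deletion: drop every row containing a missing value.
keep = ~np.isnan(X_missing).any(axis=1)
acc_del = cross_val_score(LogisticRegression(max_iter=1000),
                          X_missing[keep], y[keep], cv=5).mean()

# Imputation: fill missing entries with the column mean.
X_imp = SimpleImputer(strategy="mean").fit_transform(X_missing)
acc_imp = cross_val_score(LogisticRegression(max_iter=1000), X_imp, y, cv=5).mean()

print(f"deletion acc={acc_del:.3f}  imputation acc={acc_imp:.3f}")
```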
This article proposes a big data-based health monitoring system platform for cross-river bridges. The system can realize regionalized bridge operation and maintenance management. It provides functions such as registration, modification, and deletion of sensor equipment; user registration, modification, and deletion; real-time display and storage of sensor monitoring data; and evaluation and early warning of bridge structure safety. Sensors are connected to the lower computer through serial ports, analog signals, fiber grating signals, etc. The lower computer converts the various signals into digital signals through single-chip A/D sampling and demodulators, and transmits them to the upper computer through the serial port. The upper computer uses an ARM Cortex-A9 to run the main program and realize multi-threaded network communication. To test the validity of the model, a variety of model verification methods are used for evaluation on the system platform, ensuring the reliability of the big data analysis method.
Authored by Di Yang, Lianfa Wang, Yufeng Zhang
With the advent of the era of big data, the number of files that need to be stored in storage systems is increasing exponentially. Cloud storage has become the most popular data storage method due to its convenience and storage capacity. However, in order to save costs, some cloud service providers maliciously delete users' infrequently accessed data, causing users to suffer losses. Aiming at data integrity and privacy issues, a blockchain-based cloud storage integrity verification scheme for recoverable data is proposed. The scheme uses Merkle tree properties, the anonymity and immutability of the blockchain, and smart contracts to effectively solve the problems of cloud storage integrity verification and data damage recovery. Testing and analysis show that the scheme is safe and effective.
Authored by Ma Haifeng, Zhang Ji
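As a small grounding example of the Merkle-tree property such a scheme relies on, the sketch below computes a Merkle root over file blocks and shows that any tampering changes the root; block contents are placeholders, and the on-chain contracts and recovery mechanism are not modeled.

```python
# Minimal sketch: compute a Merkle root over file blocks so integrity can be
# checked by recomputing the root.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2"]
root = merkle_root(blocks)
assert merkle_root(blocks) == root                               # intact data verifies
assert merkle_root([b"tampered", b"block-1", b"block-2"]) != root  # tampering detected
```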
In the present era of the internet, image watermarking schemes are used to provide content authentication, security, and reliability for various multimedia contents. In this paper, an image watermarking scheme that utilizes the properties of the Integer Wavelet Transform (IWT), Schur decomposition, and Singular Value Decomposition (SVD) is proposed. In the suggested method, the cover image is subjected to a 3-level IWT, and the HH3 subband is subjected to Schur decomposition. The upper triangular matrix from the HH3 subband's Schur decomposition is then subjected to SVD to retrieve its singular values. The watermark image is first encrypted using a chaotic map, a 3-level IWT is then applied to the encrypted watermark, and the singular values of its LL subband are embedded by manipulating the singular values of the processed cover image. The proposed scheme is tested under various attacks such as filtering (median, average, Gaussian), checkmark attacks (histogram equalization, rotation, horizontal and vertical flipping), and noise (Gaussian, salt & pepper). The suggested scheme provides strong robustness against numerous attacks, and the chaotic encryption provides security for the watermark.
Authored by Anurag Tiwari, Vinay Srivastava
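A rough sketch of the decomposition chain (multi-level wavelet transform, then Schur, then SVD) and the singular-value embedding step, under several assumptions: an ordinary DWT from pywt stands in for the integer wavelet transform, `alpha` is an assumed embedding strength, and the chaotic encryption and inverse/extraction steps are omitted.

```python
# Minimal sketch of the decomposition chain and singular-value embedding.
import numpy as np
import pywt
from scipy.linalg import schur

rng = np.random.default_rng(0)
cover = rng.random((256, 256))
watermark = rng.random((32, 32))
alpha = 0.05                                     # assumed embedding strength

# 3-level wavelet decomposition of the cover; take the HH3 detail band.
coeffs = pywt.wavedec2(cover, "haar", level=3)
hh3 = coeffs[1][2]                               # (cH3, cV3, cD3) -> cD3 = HH3

# Schur decomposition of HH3, then SVD of its upper-triangular factor.
T, Z = schur(hh3)
U, S, Vt = np.linalg.svd(T)

# Embed: perturb the cover's singular values with the watermark's.
Sw = np.linalg.svd(watermark, compute_uv=False)
S_embedded = S + alpha * np.pad(Sw, (0, len(S) - len(Sw)))

# Rebuild HH3 (inverse SVD, then inverse Schur); the inverse DWT would follow.
T_embedded = U @ np.diag(S_embedded) @ Vt
hh3_embedded = Z @ T_embedded @ Z.T
```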
Model compression is one of the most preferred techniques for efficiently deploying deep neural networks (DNNs) on resource-constrained Internet of Things (IoT) platforms. However, a simply compressed model is often vulnerable to adversarial attacks, leading to a conflict between robustness and efficiency, especially for IoT devices exposed to complex real-world scenarios. We, for the first time, address this problem by developing a novel framework dubbed Magical-Decomposition to simultaneously enhance both robustness and efficiency for hardware. By leveraging a hardware-friendly model compression method called singular value decomposition, the defending algorithm can be supported by most existing DNN hardware accelerators. Going further, by using a recently developed DNN interpretation tool, we clearly highlight the underlying mechanism by which adversarial accuracy is increased in the compressed model. Ablation studies and extensive experiments under various attacks/models/datasets consistently validate the effectiveness and scalability of the proposed framework.
Authored by Xin Cheng, Mei-Qi Wang, Yu-Bo Shi, Jun Lin, Zhong-Feng Wang
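A minimal sketch of SVD-based compression of a single fully connected layer, which is the kind of hardware-friendly low-rank factorization the framework builds on; the layer size and retained rank are illustrative, and none of the robustness machinery is shown.

```python
# Minimal sketch: keep only the top-k singular values of a weight matrix and
# replace one large matrix by two thin ones.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1024, 512))          # original dense weight matrix
k = 64                                    # retained rank

U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * S[:k]                      # (1024, k)
B = Vt[:k, :]                             # (k, 512)

x = rng.normal(size=(512,))
err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
print(f"relative output error: {err:.3f}")
print(f"parameters: {W.size} -> {A.size + B.size}")
```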
Image watermarking techniques provide security, reliability, and copyright protection for various multimedia contents. In this paper, an image watermarking scheme based on the Integer Wavelet Transform (IWT), Schur decomposition, and Singular Value Decomposition (SVD) is proposed for the integrity protection of DICOM images. In the proposed technique, a 3-level IWT is applied to the DICOM ultrasound liver image used as the cover image, and Schur decomposition is applied to the HH sub-band. The upper triangular matrix obtained from the Schur decomposition of the HH sub-band is further processed with SVD to obtain the singular values. The X-ray watermark image is pre-processed before embedding by applying a 3-level IWT to it, and the singular matrix of its LL sub-band is embedded into the cover image. The watermarked image is encrypted using Arnold chaotic encryption for its integrity protection. The performance of the suggested scheme is tested under various attacks such as filtering (median, average, Gaussian), checkmark attacks (histogram equalization, rotation, horizontal and vertical flipping, contrast enhancement, gamma correction), and noise (Gaussian, speckle, salt & pepper). The proposed technique provides strong robustness against various attacks, and the chaotic encryption provides integrity for the watermarked image.
Authored by Anurag Tiwari, Vinay Srivastava
Side Channel Attacks (SCAs), which exploit the physical information generated when an encryption algorithm is executed on a device in order to recover the key, have become one of the key threats to the security of encrypted devices. Recently, with the development of deep learning, deep learning techniques have been applied to SCAs with good results in experiments on publicly available datasets. In this paper, we propose a power trace decomposition method that divides the original power traces into two parts: the data-influenced part, defined as the data power traces (Tdata), and the remaining part, defined as the device-constant power traces. The Tdata are then used for training the network model, which has clear advantages over training on the original power traces. To verify the effectiveness of the approach, we evaluated an ATXmega128D4 microcontroller by capturing the power traces generated while it executed AES-128. Experimental results show that network models trained using Tdata outperform network models trained using the raw power traces (Traw) in terms of classification accuracy, training time, cross-subkey key recovery, and cross-device key recovery.
Authored by Fanliang Hu, Feng Ni
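One plausible reading of the Traw/Tdata split, sketched below with synthetic traces: the per-sample mean over many traces is treated as the device-constant component and the residual as the data-dependent component used for training. This may differ from the paper's exact definition.

```python
# Minimal sketch of the decomposition idea on synthetic traces.
import numpy as np

rng = np.random.default_rng(0)
traces = rng.normal(loc=1.0, scale=0.1, size=(1000, 5000))   # Traw: 1000 traces

device_constant = traces.mean(axis=0, keepdims=True)   # shared, data-independent part
t_data = traces - device_constant                      # Tdata used for training

print(t_data.shape, t_data.mean())                     # residuals centered near zero
```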
Watermarking is one of the most common data hiding techniques for multimedia elements. Broadcasting, copy control, copyright protection, and authentication are the most frequent application areas of watermarking. Secret data can be embedded into the cover image by changing pixel values in spatial-domain watermarking. Alternatively, the cover image can be converted with a transformation such as the Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), or Discrete Fourier Transform (DFT), and the watermark can then be embedded into the high-frequency transform coefficients. In this work, the cover image is transformed with one-, two-, and three-level DWT decompositions, and a binary watermark is hidden in the low and high frequencies of each decomposition. Experimental results show that the watermarked image is robust and secure and resists several geometric attacks, especially JPEG compression, Gaussian noise, and histogram equalization. Peak Signal-to-Noise Ratio (PSNR) and Similarity Ratio (SR) values are very good compared with other frequency- and spatial-domain algorithms.
Authored by Ersin Elbasi
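For reference, a minimal sketch of the PSNR metric mentioned above for 8-bit images; the Similarity Ratio, which compares extracted and original watermark bits, is not reproduced.

```python
# Minimal sketch: PSNR between an original and a distorted 8-bit image.
import numpy as np

def psnr(original, distorted, peak=255.0):
    mse = np.mean((original.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256))
noisy = np.clip(img + rng.normal(scale=2, size=img.shape), 0, 255)
print(f"PSNR: {psnr(img, noisy):.2f} dB")
```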
Previous work has shown that a neural network with the rectified linear unit (ReLU) activation function leads to a convex polyhedral decomposition of the input space. These decompositions can be represented by a dual graph with vertices corresponding to polyhedra and edges corresponding to polyhedra sharing a facet, which is a subgraph of a Hamming graph. This paper illustrates how one can utilize the dual graph to detect and analyze adversarial attacks in the context of digital images. When an image passes through a network containing ReLU nodes, the firing or non-firing at a node can be encoded as a bit (1 for ReLU activation, 0 for ReLU non-activation). The sequence of all bit activations identifies the image with a bit vector, which identifies it with a polyhedron in the decomposition and, in turn, identifies it with a vertex in the dual graph. We identify ReLU bits that are discriminators between non-adversarial and adversarial images and examine how well collections of these discriminators can ensemble vote to build an adversarial image detector. Specifically, we examine the similarities and differences of ReLU bit vectors for adversarial images, and their non-adversarial counterparts, using a pre-trained ResNet-50 architecture. While this paper focuses on adversarial digital images, ResNet-50 architecture, and the ReLU activation function, our methods extend to other network architectures, activation functions, and types of datasets.
Authored by Huma Jamil, Yajing Liu, Christina Cole, Nathaniel Blanchard, Emily King, Michael Kirby, Christopher Peterson
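A minimal sketch of extracting the ReLU bit vector of an input via forward hooks; a torchvision ResNet-18 with random weights and a random input stand in for the pre-trained ResNet-50 and real images used in the paper.

```python
# Minimal sketch: record the ReLU firing pattern of an input as a bit vector.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
bits = []

def hook(module, inputs, output):
    bits.append((output > 0).flatten().to(torch.uint8))   # 1 = ReLU fired

for m in model.modules():
    if isinstance(m, torch.nn.ReLU):
        m.register_forward_hook(hook)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

bit_vector = torch.cat(bits)
print(bit_vector.shape, bit_vector.float().mean())   # fraction of firing units
```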
In this work, we conduct a systematic study of data poisoning attacks on Matrix Factorisation (MF) based Recommender Systems (RS), in which a determined attacker injects fake users with false user-item feedback in order to promote a target item by increasing its rating. We explore the capability of an MF-based approach to reduce the impact of the attack on the targeted item in the system. We develop and evaluate multiple techniques to update the user and item feature matrices when incorporating new ratings. We also study the effectiveness of the attack under an increasing number of filler items and different choices of target item. Our experimental results based on two real-world datasets show that the observations from the study can be used to design a more robust MF-based RS.
Authored by Sulthana Shams, Douglas Leith
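A toy sketch of the attack setting, not the authors' method: fake users who rate a target item highly are injected before refitting a small SGD matrix factorisation, and the target item's average predicted rating is compared before and after. Sizes, ratings, and hyper-parameters are synthetic.

```python
# Toy data poisoning experiment against a tiny SGD matrix factorisation.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, target = 50, 20, 5, 7
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)

def fit_mf(R, k=5, epochs=50, lr=0.01, reg=0.1):
    U = rng.normal(scale=0.1, size=(R.shape[0], k))
    V = rng.normal(scale=0.1, size=(R.shape[1], k))
    users, items = np.nonzero(~np.isnan(R))          # observed ratings only
    for _ in range(epochs):
        for u, i in zip(users, items):
            err = R[u, i] - U[u] @ V[i]
            u_old = U[u].copy()
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

U, V = fit_mf(R)
print("before attack:", (U @ V[target]).mean())

fake = np.full((10, n_items), np.nan)                # 10 fake users
fake[:, target] = 5.0                                # each rates the target item 5
U2, V2 = fit_mf(np.vstack([R, fake]))
print("after attack :", (U2[:n_users] @ V2[target]).mean())
```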
By broadcasting false Global Navigation Satellite System (GNSS) signals, spoofing attacks induce false position and time fixes in the victim receiver. In this article, we propose a Sparse Decomposition (SD)-based spoofing detection algorithm for the acquisition process, which can be applied in a single-antenna receiver. In the first step, we map the Fast Fourier Transform (FFT)-based acquisition result into a two-dimensional matrix, which is a distorted autocorrelation function when the receiver is under a spoofing attack. In the second step, the distorted function is decomposed into two main autocorrelation function components with different code phases. The corresponding elements of the result vector of the SD are the code-phase values of the spoofed and the authentic signals. Numerical simulation results show that the proposed method can not only produce a spoofing detection result but also provide reliable estimates of the code-phase delay of the spoofing attack.
Authored by Yuxin He, Yaqiang Zhuang, Xuebin Zhuang, Zijian Lin
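A minimal sketch of the FFT-based acquisition step referenced above, i.e. circular correlation of the received samples against a local code replica, with a synthetic code and an artificially injected, stronger spoofed replica; the sparse decomposition that separates the two peaks is not reproduced.

```python
# Minimal sketch: FFT-based circular correlation showing two code-phase peaks
# (authentic and spoofed) in the acquisition result.
import numpy as np

rng = np.random.default_rng(0)
code = rng.choice([-1.0, 1.0], size=1023)             # PRN-like spreading code
authentic = np.roll(code, 100)                        # true code phase = 100
spoofed = 1.5 * np.roll(code, 400)                    # stronger spoofed replica
received = authentic + spoofed + rng.normal(scale=0.5, size=1023)

corr = np.abs(np.fft.ifft(np.fft.fft(received) * np.conj(np.fft.fft(code))))
peaks = np.argsort(corr)[-2:]                         # two dominant correlation peaks
print(sorted(peaks))                                  # expected near [100, 400]
```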
A dual-image watermarking approach is presented in this research. The presented work utilizes the properties of the Hessenberg decomposition, the Redundant Discrete Wavelet Transform (RDWT), the Discrete Cosine Transform (DCT), and Singular Value Decomposition (SVD). The YCbCr color space is employed for watermarking, and two watermark logos are used for embedding. The RGB input image is first converted to YCbCr format. The host image's Y and Cb components are divided into sub-bands using RDWT, and the Hessenberg decomposition is applied to the high-low and low-high components. After that, SVD is applied to obtain the dominant matrices. RDWT is applied to both watermark logos, followed by DCT and SVD to obtain their dominant matrices. The dominant matrices of the host and watermark images are then added to obtain the watermarked image. Average PSNR, MSE, the Structural Similarity Index Measure (SSIM), and the Normalized Correlation Coefficient (NCC) are used as performance parameters. The resilience of the presented work is tested against various attacks such as Gaussian low-pass filtering, speckle noise, salt and pepper noise, Gaussian noise, rotation, median and average filtering, sharpening, histogram equalization, and JPEG compression. The presented scheme is robust and imperceptible when compared with other schemes.
Authored by Divyanshu Awasthi, Vinay Srivastava
With the advent of the 5G era, high-speed and secure network access services have become a common pursuit. The QUIC (Quick UDP Internet Connections) protocol proposed by Google has been studied by many scholars due to its high speed, robustness, and low latency, but research on its security remains insufficient. Therefore, based on the self-similarity of QUIC network traffic, and combining traffic characteristics with signal processing methods, a QUIC-based network traffic anomaly detection model is proposed in this paper. The model decomposes and reconstructs the collected QUIC network traffic data through the Empirical Mode Decomposition (EMD) method. To judge whether an anomaly occurs, the model also intercepts overlapping traffic segments through sliding windows, calculates their Hurst parameters, and analyzes the obtained parameters to detect abnormal traffic. The simulation results show that, in a network environment based on the QUIC protocol, the Hurst parameter under attack fluctuates violently and exceeds the normal range, demonstrating that anomaly detection of QUIC network traffic can be performed with the EMD method.
Authored by Gang Lei, Junyi Wu, Keyang Gu, Lejun Ji, Yuanlong Cao, Xun Shao
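A minimal sketch of estimating the Hurst parameter with a simple rescaled-range (R/S) analysis, which is one common way to compute it; the EMD preprocessing and sliding-window thresholding described above are omitted, and the traffic series is synthetic.

```python
# Minimal sketch: Hurst parameter estimate via rescaled-range (R/S) analysis.
import numpy as np

def hurst_rs(x, min_chunk=16):
    x = np.asarray(x, dtype=float)
    sizes, rs = [], []
    n = min_chunk
    while n <= len(x) // 2:
        chunks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        vals = []
        for c in chunks:
            dev = np.cumsum(c - c.mean())
            s = c.std()
            if s > 0:
                vals.append((dev.max() - dev.min()) / s)   # rescaled range R/S
        sizes.append(n)
        rs.append(np.mean(vals))
        n *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs), 1)    # H is the log-log slope
    return slope

rng = np.random.default_rng(0)
normal_traffic = rng.normal(size=4096)                     # uncorrelated: H near 0.5
print(f"estimated Hurst parameter: {hurst_rs(normal_traffic):.2f}")
```

Applying such an estimator to overlapping sliding windows and flagging windows whose estimate leaves the expected range is the general shape of the detection step described above.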
Cloud computing provides access to a shared pool of resources such as storage, networking, and processing. Distributed denial of service (DDoS) attacks are dangerous for cloud services because they mainly target the availability of resources, so it is important to detect and prevent DDoS attacks to ensure the continuity of cloud services. In this review, we analyze the different mechanisms for detecting and preventing DDoS attacks in clouds. We identify the major DDoS attacks in clouds and compare the frequently used strategies to detect, prevent, and mitigate those attacks, which will help future researchers in this area.
Authored by Muhammad Tehaam, Salman Ahmad, Hassan Shahid, Muhammad Saboor, Ayesha Aziz, Kashif Munir
One of the major threats in the cyber security and networking world is a Distributed Denial of Service (DDoS) attack. With the massive development of science and technology, the privacy and security of various organizations are of concern. Computer intrusions and DDoS attacks have always been a significant issue in networked environments. DDoS attacks result in the non-availability of services to end-users: they interrupt regular traffic flow and flood the target with packets, causing the system to crash. This research presents a machine learning-based DDoS attack detection system to overcome this challenge. For training and testing, we have used the NSL-KDD dataset. Logistic Regression, Support Vector Machine, K-Nearest Neighbour, and Decision Tree classifiers are the machine learning algorithms used to train our model, with accuracies of 90.4, 90.36, 89.15, and 82.28, respectively. We have also added a feature called BOTNET Prevention, which scans for phishing URLs and prevents a healthy device from becoming part of a botnet.
Authored by Neeta Chavan, Mohit Kukreja, Gaurav Jagwani, Neha Nishad, Namrata Deb
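A minimal sketch of the training and evaluation loop for the four classifiers named above, using a synthetic stand-in for the NSL-KDD features; the reported accuracies come from the paper's setup, not from this sketch.

```python
# Minimal sketch: train and score the four classifiers named above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name:>20}: accuracy = {acc:.3f}")
```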