A Handbook of Science and Technology ISBN: 978-93-93166-44-9
For verification of this chapter, please visit http://www.socialresearchfoundation.com/books.php#8
Artificial Intelligence and Machine Learning: Cyber Attacks
Mallamma V Reddy
Assistant Professor
Computer Science and Application
Rani Channamma University
Belagavi, Karnataka, India
DOI: Chapter ID: 17766
This is an open-access book section/chapter distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract

Artificial Intelligence focuses on algorithms that can be trained to mimic the intelligence of the human brain and accomplish specific tasks, like identifying patterns based on rules. Machine Learning is a subset of Artificial Intelligence which enables machines to learn from training data and make accurate predictions. They are intertwined fields of Computer Science which have been widely used to enhance the performance of automated systems such as Automatic Text Summarization, Speech Recognition, Language Translation Systems, Natural Language Processing, Facial Recognition, Malware Detection, Automated Driving Systems and Intrusion Detection. Researchers have found many Artificial Intelligence security threats, like Espionage, Sabotage, Evasion, Poisoning, Trojaning, Backdooring and Fraud, against a variety of learning algorithms including Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine and Clustering. Finally, the chapter opens a gateway to new trends in research on Artificial Intelligence algorithms for Machine Learning and security mechanisms.

1. INTRODUCTION

Artificial intelligence
is defined as the imitation of human intelligence by computers, with applications in the processing of natural languages and the recognition of speech. Machine Learning is a subset of Artificial Intelligence and requires dedicated machines and software for the training and testing of machine learning algorithms. AI uses programming languages like Java, Python and R, which require large amounts of labeled training data for correlation and pattern analysis, which in turn help in making predictions about future states. The cognitive skills that AI programming exhibits are learning, reasoning and self-correction.

1.1 Learning processes

It mandates
data collection and creation of rules to convert data into actions called
algorithms that provide line-by-line instructions to perform a dedicated task.

1.2 Reasoning processes

It mandates the selection of the correct algorithm to obtain a specific output.

1.3 Self-correction processes

It is a repetitive task that refines algorithms to obtain accurate results. AI provides
ventures a clear view into their operations which may not have been realized previously, and hence AI can perform better than individuals. Ex: analysis of a huge number of legal documents to guarantee that the fields are filled in correctly. AI techniques and tools work faster and with very few errors, which has opened the gates for new small and large businesses. Before AI it was difficult to connect riders to taxis using computer software, but today, with AI, Uber is one of the leading organizations helping drivers reach the customer's destination, thereby reducing customers' waiting time.

Merits of AI
1. Provides better performance for detail-oriented tasks.
2. Time efficient for data-intensive tasks.
3. Gives out reliable results.
4. Virtual AI agents are always available and waiting for a command.

Demerits of AI
1. Costly.
2. Mandates technical skills.
3. Minimal manpower to construct AI tools.
4. Tasks cannot be generalized.

1.4 Types of artificial intelligence

AI can be classified into four types, starting with activity-specific intelligent systems and moving towards physical machines.

1.4.1 Reactive machines

AI tools have
no storage and are activity specific. Ex: Deep Blue is an IBM chess
program that beat Garry Kasparov. It can recognize the pieces on the chess board and make predictions, but since it has no storage it cannot use past experiences to predict future moves.

1.4.2 Limited memory

AI tools have
memory, so that they can use past experiences to predict and inform future
decisions. Ex: decision-making functionality in self-driving cars.

1.4.3 Theory of mind

It is a psychological term adopted in AI for systems with the social intelligence to recognize emotions, which may one day replace individuals.

1.4.4 Self-awareness

AI systems have
an identity, which means that the machines are self-aware and know their current state. This type of AI does not yet exist.

1.5 Machine Learning

Machine learning is a technique for making a machine perform tasks without explicit programming. Deep learning is the automation of predictive analytics. The types of machine learning algorithms are:

1.5.1 Supervised learning

Data sets are labelled so that patterns can be learned and used to tag new data sets.

1.5.2 Unsupervised learning

Data sets are not labeled but are arranged according to resemblance or dissimilarity.

1.5.3 Reinforcement learning

Data sets are not labeled, but after completing an action the AI system is given specific feedback. [1]

1.5.4 Training Data Set

AI programs
mandate an initial set of data known as training data. This dataset is a basic requirement for an application, and it grows as the user enters new data. This dataset must be properly labeled before the prototype can learn from it.

1.5.5 Testing Data Set

AI programs require both testing and training data to build ML algorithms. Once the prototype is trained on a training set, it is evaluated on a test set. Usually these sets are subsets of one huge dataset and are used to increase an algorithm's accuracy. [2]

1.6 Security threats on AI and ML

Many security
threats to machine learning arise from adversarial (conflicting) samples. Conflicting samples are data crafted to produce the contrary result: harmful samples that degrade the performance of machine learning-based systems. There are many concrete security threats to different machine learning models and their corresponding application scenarios, among them malware identification and intrusion detection. In conventional supervised learning, Naive Bayes and the support vector machine are two classical learning algorithms in which early security threats occurred. Specifically, an attacker can insert malicious, specially designed data into the training data during the training procedure of a machine learning-based intrusion detection system, inducing a significant decrease in the performance of the AI system. Ex: an attacking scenario where a Naive Bayes-based spam detection system was attacked by injecting malicious data.

Clustering is a
kind of unsupervised learning method, which can discover implicit patterns of
data distributions. Clustering algorithms have been widely used in many
application scenarios, especially information security, and they also suffer from security issues. Specifically, most attacks against clustering algorithms reduce their accuracy by injecting malicious data. On the other hand, the obfuscation attack is another type of threat that compromises clustering algorithms. The goal of an obfuscation attack against a targeted cluster is to generate a blend of conflicting samples and normal ones from other clusters, without altering the clustering results of these normal samples, resulting in a set of inconspicuous conflicting samples.

Currently, deep
learning is an emerging field of research in machine learning. A typical
architecture of deep learning, the deep neural network (DNN), has been demonstrated to be effective in various pattern recognition tasks like visual classification and speech recognition. However, recent works have demonstrated that DNNs are also vulnerable to various adversarial attacks. Ex: in image classification, a DNN extracts a small set of features, resulting in poor performance on images with minor differences. Attackers can exploit such vulnerabilities to avoid anomaly detection. Models attacked through DNNs and the corresponding intelligent systems include face recognition, speech recognition and autonomous driving. [3]

1.7 AI and ML to enhance performance of Automated systems

1.7.1 Automatic Text Summarization

It is the
process of intelligently shortening long pieces of text such that the meaning
is not altered. Commercial text-processing applications include Grammarly, Microsoft Word and Google Docs. [4]

1.7.2 Speech Recognition

Enables computers to recognize and
transform spoken language into text or action. Commercially available applications that use speech recognition are voice assistants like Alexa, Siri and Cortana. [5]

1.7.3 Language Translation Systems

It helps us transform text in one
language to text in another language. A commercially available application is Google Translate.

1.7.4 Natural Language Processing (NLP)

It is a subset of linguistics. NLP is the process of extracting meaning and learning from textual data or speech. There are two main components of any NLP system: understanding of natural language and generation of natural language.

1. Natural Language Understanding (NLU)

It uses
computer software to understand a dialect by accepting input text or speech. It involves:

i. Lexical Ambiguity (LA)

It refers to the multiple meanings of a word within the sentence in which it is contained. It can be resolved using part-of-speech tagging. Ex: "The fish is ready to eat." Is the fish ready to eat its food, or is it ready to be eaten by someone else?

ii. Syntactic Ambiguity (SA)

It means that a word takes on a different meaning in the same context. Ex: "John met Ajay and Vijay. They went to a restaurant." "They" may refer to Ajay and Vijay, or to all three.

iii. Referential Ambiguity (RA)

It occurs when a spoken word or written text, with respect to a particular sentence, could refer to more than one person or thing in other sentences or in the same sentence. Ex: "The girl explained to her mother about the theft. She was shocked." "She" is referentially ambiguous, as it can refer to both the girl and the mother.

2. Natural Language Generation (NLG)

It is the production of
meaningful words, phrases and statements by a computer. It consists of:

i. Text Planning (TP): the process of retrieving related data from a repository.

ii. Sentence Planning (SP): the process of word selection to form a correct, meaningful statement.

iii. Text Realization (TR): the process of mapping a sentence plan into context. [6]

1.8 Facial Recognition

Face
recognition technology is a biometric technology, which is based on the
identification of facial features of a person. Researchers collect the face
images, and the recognition equipment automatically processes the images. There are different development stages and related technologies for face recognition under real-time conditions, along with evaluation standards and general databases for face recognition. Face recognition has become a key future development direction and has many application prospects. [7]

1.9 Malware Detection

Development of
the web has brought digital dangers, and malware is one of the major threats; hence malware detection is an important factor in the security of computer systems. Polymorphic malware is a type of malware that continuously changes its recognizable features to fool detection techniques that use typical signature-based methods. That is why the need for machine learning-based detection arises. Researchers obtain a behavioral pattern through static or dynamic analysis and then apply different ML techniques to identify whether a sample is malware or not. Behavioral detection methods take advantage of ML algorithms to frame behavior-based malware recognition and classification models. [8]

1.10 Automated Driving Systems

It is one of
the main technology topics in the automotive field for a better society. Various views exist on autonomous driving, which still remains a future goal; the technology is still a work in progress and legislation has to be better defined. Driving risks have to be analysed and limitations considered while developing a safe and reliable ADS. Further, an analysis of current public data for conventional and automated ground vehicles is necessary. [9]

1.11 Intrusion Detection

An easy
accessibility of computer networks makes them vulnerable to several threats from hackers. Threats to networks are numerous and potentially devastating. Researchers have developed Intrusion Detection Systems (IDS) capable of detecting attacks in several available environments. A huge number of methods for misuse detection as well as anomaly detection have been applied. Many of the proposed technologies are complementary to each other, since for different kinds of environments some approaches perform better than others. [10]

1.12 AI security threats
Attacks during the training phase happen more frequently than one might expect: most production ML models retrain their systems periodically with new data. Ex: social networks continuously analyze users' behavior, which means that each user may retrain the system by modifying their own behavior. Attacks on ML models fall into different categories depending on the actual goal of the attacker, namely Espionage, Sabotage and Fraud, and depending on the stage of the machine learning pipeline targeted, namely training or production. They are Evasion, Poisoning, Trojaning, Backdooring, Reprogramming, and Inference attacks; evasion, poisoning, and inference are the most widespread now.

Table 3.6 Types of Attacks on Machine Learning
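The grouping of Table 3.6 can be captured as a small data structure. The assignment of each attack to a pipeline stage below is one reasonable reading of this section, not a definitive taxonomy:

```python
# Attacks on ML models grouped by the attacker's goal (one-line summaries
# paraphrasing this chapter) and by the pipeline stage they target.
ATTACKS_BY_GOAL = {
    "Espionage": "steal sensitive data or model details",
    "Sabotage": "degrade or disrupt the system",
    "Fraud": "obtain a wrong but profitable output",
}

# Stage assignment is an assumption for illustration: poisoning, trojaning
# and backdooring manipulate training/retraining; the rest act on a
# deployed (production) model.
ATTACKS_BY_STAGE = {
    "training": ["Poisoning", "Trojaning", "Backdooring"],
    "production": ["Evasion", "Reprogramming", "Inference"],
}
```

A monitoring tool could key its alerting rules on such a mapping, one rule set per pipeline stage.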
1.13 Cyber Espionage

Cyber espionage is a form of cyber attack that steals classified, sensitive
data or intellectual property to gain an advantage over a competitive company
or government entity [11]. The United States has raised objections to certain types of cyber espionage activity: Chinese economically motivated cyber espionage; the feared transfer of data taken from the US Office of Personnel Management to criminals; and Russian doxing attacks, particularly against the Democratic National Committee (DNC). In effect, the United States has been edging towards advocating a new class of norms for cyber espionage: countries may carry it out, but may not use the results for anything other than traditional intelligence purposes, that is, for informing national security decision making. Establishing a norm that holds some forms of cyber espionage to be acceptable and others not would raise issues. First, can the United States define such norms in ways that render unacceptable those practices it finds objectionable, without its own practices being deemed unacceptable? In particular, can there be norms expressed in ways that allow all targets and methods to be used but restrict only what can be done with the information collected? Second, can monitoring regimes be developed to distinguish acceptable from unacceptable cyber espionage and attribute such actions, not only correctly but in ways that are accepted widely enough to justify a particular course of action? [12]

1.14 Sabotage Attacks

Sabotage attack
is a deliberate action aimed at weakening a group, effort, or organization through subversion, obstruction, disruption, or destruction. Here we consider a novel multi-modal sabotage attack detection system for Additive Manufacturing (AM) machines. By utilizing multiple side-channels, researchers improve system state estimation significantly in comparison to uni-modal techniques. Researchers analyze the value of each side-channel for performing attack detection in terms of the mutual information it shares with the machine control parameters. Researchers evaluate the system on real-world test cases and achieve an attack detection accuracy of 98.15%. AM, or 3D printing, is seeing practical use for the rapid prototyping and production of industrial parts. Digitization not only makes AM a crucial technology in industry but also presents a broad attack surface that is vulnerable to cyberattacks. Sabotage attacks in AM are cyberattacks that introduce unnoticeable defects into a manufactured component in an AM processing chain, resulting in the compromise of the component's structural integrity and load-bearing capabilities. However, most current works model the state of AM systems using a single side-channel, thus limiting their effectiveness at attack detection.

1.15 Fraud Detection and
Prevention Using Machine Learning Algorithms

Digital fraud is a threat across all sectors, and it is necessary for any organization to detect and prevent fraud through increased security. Digitization has revolutionized the performance of our day-to-day transactions with the click of a button. Misuse threats are those that abuse digital apps by impersonating real customers and performing costly transactions, leading to financial losses and damage to brand value. Organizations have geared up to detect fraudulent activities in real time by applying complex algorithms to detect fraud patterns. However, fraudsters are also getting smarter, and fraud prevention requires continuous focus. Differentiating and monitoring key patterns enables detection of real vs. fraudulent transactions. Ex: customer information like geolocation, authentication, session and device IP address can be monitored for automatic detection of fraud patterns. [13]

1.16 Evasion

Evasion attack
refers to designing an input that seems normal to a human but is wrongly classified by ML models. Ex: altering some pixels in an image before uploading it, resulting in wrong classification by the recognition system while the change remains invisible to the human eye.

1.17 Poisoning

Attacker
compromises the learning process for some inputs and further builds a backdoor through which to control the output. The attacker does not have access to the model or the dataset, but can add new data or modify existing data. [14]

1.18 Trojaning

In trojaning, the attacker does not have access to the initial dataset but does have access to the model and its parameters, and can retrain it. Most companies do not build models from scratch but retrain existing models. Ex: if it is necessary to create a model for cancer detection, one takes the latest image recognition model and retrains it with a set of cancer images that was previously unavailable. Trojaning refers to finding ways to change the model's behavior in such a way that its existing behavior remains unchanged.

1.19 Backdooring

Behaviour
modification in black-box, grey-box or full white-box mode is possible through access to the model and the dataset. The main goal is not only to inject some additional behavior but to do it in such a way that the backdoor still operates after the system is retrained. A neural network is used to solve both the main and the specific task. The attack applies broadly because of two facts. First, convolutional neural networks for image recognition are large structures formed by millions of neurons, and minor changes in behaviour require modifying only a small set of neurons. Second, production neural network models are trained with amounts of data and computing power that are out of reach for small and medium-sized companies; hence many companies that process MRI images reuse the pre-trained neural networks of large companies, and as a result a network originally aimed at recognizing celebrities' faces can be retrained to detect cancerous tumors.
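The reuse pattern described above, freezing a large pre-trained feature extractor and retraining only the final layer, can be sketched in pure Python. The two-feature extractor and the toy data below are hypothetical and purely illustrative; the point is that this retraining path leaves the frozen layers, and any backdoor hidden in them, untouched:

```python
# Frozen "pretrained" feature map standing in for the shared layers of a
# large public model (hypothetical two-feature extractor for illustration).
def pretrained_features(x):
    return [x[0] + x[1], x[0] - x[1]]

def train_final_layer(samples, labels, lr=0.1, epochs=100):
    """Retrain only the last layer (a perceptron) on top of frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = pretrained_features(x)
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b >= 0 else -1
            if pred != y:  # perceptron rule: update only on mistakes
                w = [w[0] + lr * y * f[0], w[1] + lr * y * f[1]]
                b += lr * y
    return w, b

def classify(w, b, x):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b >= 0 else -1

# Toy "new task" data reusing the frozen features: classes +1 and -1.
samples = [(0, 0), (1, 0), (3, 3), (4, 3)]
labels = [-1, -1, 1, 1]
w, b = train_final_layer(samples, labels)
```

Only `w` and `b` change during retraining; everything inside `pretrained_features` is inherited from the upstream model as-is.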
Attackers can compromise a storage server holding a public model, upload their own backdoored model in its place, and then rely on this path when the model is retrained. It is hardly possible for a layman to detect backdoors, but researchers discovered a solution through the transfer learning attack. [15]

1.20 Fraud in Learning Algorithms

Fraud in learning algorithms includes frauds through Naive Bayes, Logistic Regression, Decision Tree Induction, Support Vector Machines and Clustering, which are elaborated in the following sections.

1. Using Naive Bayes' Theorem to Find Fraudulent Orders

An online
store is overrun with fraudulent orders; approximately 10% of orders are fraudulent, and researchers need to find and reduce them. The store receives 1,000 orders monthly, and checking every order by hand would mean spending too much money on fighting fraud: assuming it takes up to 60 seconds per order to determine whether it is fraudulent, and a customer service representative costs around $15 per hour, that totals 200 hours and $3,000 per year. Another way of approaching this problem is to flag an order when the probability that it is fraudulent exceeds 50%. In this scenario, the number of orders researchers need to monitor is much lower, around 10%. Since gift cards and multiple promotional codes are often used on fraudulent orders, this requires calculating the probability of fraud given that the purchaser used a gift card, i.e. applying conditional probabilities.

1.1 Conditional Probabilities
Conditional probability, the probability of A happening given that B happened, is the probability of A and B happening divided by the probability of B:

P(A|B) = P(A and B) / P(B)          (1)

The probability of an order being fraudulent is 10%, but what is the probability of an order being fraudulent given that it used a gift card? To answer this we need conditional probability, defined according to equation (1).

Fig 1.20 Illustration of how P(A|B) sits between P(A and B) and P(B). Ref. [15]
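Using the definition of conditional probability and the order counts worked through in this section (1,000 orders, of which 100 are fraudulent; 60 fraudulent and 90 legitimate orders use gift cards), the calculation can be checked in a few lines of Python:

```python
def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

# Counts from this section's worked example.
total = 1000
fraud_and_gift = 60            # fraudulent orders that used a gift card
legit_and_gift = 90            # legitimate orders that used a gift card
gift = fraud_and_gift + legit_and_gift   # 150 gift-card orders in total

p_fraud_given_gift = conditional(fraud_and_gift / total, gift / total)

# Bayes' theorem (used later in this section) must give the same answer:
p_fraud = 100 / total                    # 0.10
p_gift_given_fraud = fraud_and_gift / 100  # 0.60
p_bayes = p_gift_given_fraud * p_fraud / (gift / total)
```

Both routes give a 40% chance of fraud on a gift-card order, and only the 150 gift-card orders need manual review instead of all 1,000.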
Assumption

To measure the probability of fraud given that an order used a gift card, we would write:

P(Fraud|Giftcard) = P(Fraud and Giftcard) / P(Giftcard)

This gives good results if we know the actual probability of Fraud and Giftcard happening together. At this point we cannot calculate P(Fraud|Giftcard) this way, because the joint probability is hard to separate out. To solve this problem, we need the inverse conditional probability, proposed by Bayes and known as Bayes' theorem.

Inverse Conditional Probability
Reverend Thomas Bayes, and later Pierre-Simon Laplace, who extended Bayes' research, produced the following result, known as Bayes' theorem:

P(A|B) = P(B|A) * P(A) / P(B)

With it we can calculate the result using other information. Using Bayes' theorem, we would now calculate:

P(Fraud|Giftcard) = P(Giftcard|Fraud) * P(Fraud) / P(Giftcard)

The probability of fraud is 10%, the probability of gift card usage on a legitimate order is 10%, and, based on our research, the probability of gift card use on a fraudulent order is 60%. From these we can calculate the probability that an order is fraudulent given that it uses a gift card. The advantage is a drastic reduction in work, because we only need to monitor the orders placed with gift cards. The total number of orders is 1,000, of which 100 are fraudulent; we monitor only the 60 fraudulent orders that used gift cards. Out of the remaining 900 orders, 90 used gift cards, which brings the total we need to look at to 150, thereby
reducing the monitoring from 1,000 to 150 orders [15].

2. Credit Card Attack through Logistic Regression

It is estimated that 10,000 credit card transactions take place every second worldwide. Owing to such a high transaction frequency, credit cards have become a primary target of fraud. Indeed, since the Diners Club released its first credit card in 1950, credit card companies have been fighting fraud. Every year, billions of dollars are lost directly to credit card fraud. Fraud occurs under different conditions, e.g. transactions at points of sale, transactions made online or over the telephone (card-not-present cases), or transactions with lost and stolen cards. Credit card fraud in 2015 alone amounted to $21.84 billion, with issuers bearing $15.72 billion of the cost [16]. Many surveys have shown that the increase in dependency on credit cards for financial transactions is accompanied by an increasing rate of fraud. Increasingly capable attackers have exploited security gaps to retrieve sensitive information about users or their credit details and perform malicious activities. Fig. 1 shows the general scenario of credit card fraud. Artificial
intelligence (AI) is defined as the research field that aims at performing
machine learning to obtain an intelligent machine that can perform tasks on
behalf of the user. This is performed through two main steps: training and testing. AI is employed to build systems for fraud detection, such as classification-based systems, clustering-based systems, neural network-based systems, and support vector machine-based systems.

Critical issues to consider: the term "imbalanced data" refers to unbalanced data used for training, where one class of the data dominates the other; this negatively affects detection accuracy. The term "noisy data" refers to outliers used in training that lie outside the normal context of the data, leading to poor detection accuracy. Concept drift means that the behaviour of the client changes, resulting in changes in the data stream when handling online data.

1.
An AI-based system for fraud detection uses logistic regression to build a
classifier called the LogR classifier. The LogR classifier has the ability to
handle imbalanced data and adapt to the user behaviour by employing the
cross-validation technique.

2. High accuracy requires clean data. A mean-based method deals with missing values, and a clustering-based method deals with outliers.

3. Extensive experiments are performed to train and test the classifier using a
standard database. Logistic regression builds a classifier for detecting fraud based on its ability to separate data belonging to different binary classes.

4. Database selection

The dataset contains transactions made using credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred over two days, with 492 fraudulent cases out of 284,807 transactions. The dataset is highly unbalanced: the positive class of fraudulent cases accounts for 0.172% of all transactions.

1.21 Fraud Against Decision Tree Induction Algorithm
Credit card
fraud is a major and growing problem in the banking sector, a side effect of the web services provided by banks. Banking systems have very strong security mechanisms to detect and prevent fraudulent transactions. Frauds can be minimized but never eliminated, and this requires data mining techniques to extract useful information from data. The decision tree induction algorithm is one data mining technique for increased security. A decision tree is a structure that includes a root node, branches and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label. The root node is the topmost node in the tree. An example decision tree depicts the concept "buys a computer": it predicts whether a customer at a company is likely to buy a computer or not. Each internal node represents a test on an attribute and each leaf node represents a class. [17]

1.22 Fraud against Support Vector Machines

Early detection
of fraud in banks can be performed through the existing data in the bank.
Researchers use Support Vector Machines with Spark (SVM-S) to build models representing
normal and abnormal customer behavior, and then use them to evaluate the validity of new transactions. Results obtained from databases of credit card transactions show that these techniques are effective in the fight against banking fraud in big data.

1. Support Vector Machine (SVM)
SVM is a supervised machine learning algorithm that can be used for both classification and regression challenges. The algorithm plots each data item as a point in an n-dimensional space, where n is the number of features, with the value of each feature being the value of a particular coordinate.
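A minimal sketch of the idea on toy 2-D data with "normal" and "abnormal" behaviour clusters (illustrative values): a linear SVM trained by sub-gradient descent on the regularized hinge loss. This is a simplification of what an SVM library would do, and not the SVM-S system itself, which runs on Spark:

```python
def train_linear_svm(points, labels, epochs=1000, lam=0.01, lr=0.01):
    """Toy linear SVM: sub-gradient descent on the regularized hinge loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:
                # Point violates the margin: hinge gradient + regularization.
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:
                # Margin satisfied: only the regularization term shrinks w.
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

def predict(w, b, point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

# Toy data: "normal" customer behaviour (+1) vs "abnormal" (-1).
X = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

The learned hyperplane `w·x + b = 0` separates the two clusters; new transactions falling on the abnormal side would be flagged for review.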
Case 1: Linearly separable. There exists one or more hyperplanes separating the two classes represented by the training data.

Case 2: Non-linearly separable. No linear hyperplane separates all positives and negatives. To overcome this, a soft-margin technique may be applied: some data points are allowed to fall on the wrong side of the margin, permitting a degree of error in the separation. Slack variables ε represent the degree of error for each input data point.

SVM is used to classify the features of credit cards for individual customers. A binary SVM separates
two classes represented by n examples of m attributes each.

2. Money Laundering

Money laundering is a type of fraud that different states work to discover and prosecute as criminal activity. Detection is based on analyzing and processing statements regarding suspicious transactions at financial institutions. The number of operations to be analyzed requires a long time, so AI methods may be used by financial institutions for automatic processing of suspicious data. Developing efficient methods to identify suspicious transactions is a very active research field. Parameters and indices related to money laundering are generated by complex socio-economic conditions. A few of the money laundering indices pointing to suspicious activity are amounts exceeding a threshold predetermined by the bank and transactions that cannot be justified. Issues to consider include the source of a transfer, the date of a transaction, a change of address, and transactions made at night, all of which can lead to suspicion.

3. Credit Card Fraud
Banks and financial companies lose a large amount of money annually to credit card fraud. Detection of this type of fraud is based on forecast indicators analyzed from transaction information retrieved from the historical database. Monitored parameters include frequency of card usage, unpaid balance in each cycle, maximum number of late days, shopping frequency, daily transactions and the largest transaction frequencies in the historical database. These parameters are extracted for each transaction and recorded to discover patterns of fraudulent transactions.

The fraud detection model is built on historical data and uses SVM-S to check whether an outcome is fraudulent, to check the effectiveness of the model, and to evaluate each new transaction. A transaction accepted by the model is executed and then appended to the database to improve the model. Transactions rejected by the model are not executed but are flagged as suspicious; if they turn out to be normal, they are executed and added as stated above.

4. Big Data for Bank Fraud Detection
The drastic development of techniques used by fraudsters has led to a situation where conventional data mining tools can no longer analyze abnormal behaviours. Big data platforms apply machine learning techniques for fraud detection, learning automatically from the database through supervised and unsupervised methods. The supervised learning methods are used for association rules; they assume prior knowledge of the nature of transactions, fraudulent or genuine. Learning in this case consists of building a model that separates the space into two parts according to the available examples, then classifying new examples based on their membership in one of these two classes. The unsupervised method is based on the detection of strange transactions. Big data applications like Spark, Hadoop and Cassandra come with effective algorithms to manage structured, unstructured and semi-structured data. [18]
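The unsupervised detection of "strange transactions" mentioned above can be sketched with a simple z-score rule; the three-standard-deviation threshold and the toy amounts are illustrative assumptions, not part of the SVM-S system:

```python
def flag_strange_transactions(amounts, threshold=3.0):
    """Unsupervised check: flag amounts more than `threshold` standard
    deviations away from the mean of the observed transaction stream."""
    n = len(amounts)
    mean = sum(amounts) / n
    variance = sum((a - mean) ** 2 for a in amounts) / n
    std = variance ** 0.5
    return [a for a in amounts if abs(a - mean) > threshold * std]

# Twenty ordinary transactions and one suspiciously large transfer.
amounts = [10.0] * 20 + [5000.0]
suspicious = flag_strange_transactions(amounts)
```

No labels are needed: the rule learns "normal" from the data itself, which is exactly what makes it an unsupervised method.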
Figure 1.10 shows the SVM-S architecture for the fraud detection framework. The Spark application depends on an external database for its operations. HDFS is popular for managing big data. Machine learning techniques are deployed through SVM to train on the dataset for predictions.

5. Fraud against
Clustering

Fraud can be identified quickly and easily through fraud detection techniques, and clustering is used for credit card fraud detection. Data is generated randomly for credit cards, and clustering algorithms are then used to detect fraudulent or legitimate transactions. Clusters are formed to detect fraud in credit card transactions: low, high, risky and highly risky. The k-means clustering algorithm is a simple and efficient algorithm for credit card fraud and outlier detection. An outlier differs from the others enough to raise the suspicion that it was generated by a different mechanism. In the supervised method, models are used to differentiate between fraudulent and non-fraudulent behavior to obtain the outliers. Clustering has applications in engineering and in scientific disciplines like psychology, biology, medicine, computer vision, communication and remote sensing. In clustering, a set of patterns is grouped by abstracting the underlying structure; patterns are clustered on the basis of features more similar within a group than across groups.

6. K-Means Clustering Algorithm

Clustering is a
process of arranging data into groups of similar objects. Different grouping results are obtained from the various clustering methods available; the choice of a particular method depends on the desired output. The clustering methods are:
1. Hierarchical agglomerative methods
2. Partitioning methods
3. The single link method (SLINK)
4. The complete link method (CLINK)
5. The group average method
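Among these, the partitioning approach is the one this section develops. A minimal pure-Python k-means on toy transaction amounts (the amounts and the choice of two clusters are illustrative assumptions) looks like:

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy transaction amounts: everyday purchases vs a few very large ones.
amounts = [12.0, 8.5, 15.0, 9.9, 11.0, 950.0, 1020.0, 990.0]
centroids, clusters = kmeans(amounts, k=2)
```

One centroid settles among the small everyday amounts and the other among the large ones, so a new transaction far from the "everyday" centroid can be treated as risky.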
The k-means clustering algorithm is an unsupervised technique. Unsupervised techniques are useful when there is no prior knowledge about the particular class of observations in a data set. K-means is a simple and efficient method to cluster data: it partitions N data points into K disjoint subsets S_j, each containing N_j data points, so as to minimize the sum-of-squares criterion

J = Σ_j Σ_{n ∈ S_j} |x_n − μ_j|²

where μ_j is the centroid of the points in S_j. A global minimum of J over the assignments is not achieved by this algorithm, and because discrete assignments are used rather than a set of continuous parameters, the "minimum" it reaches cannot properly be called a local minimum either. The k-means algorithm is popular for the following reasons:
1. It is simple and easy to implement, and every data mining package has an implementation of it.
2. Choices such as the initialization, distance function and termination criterion can be varied.
3. It is invariant to random shuffling of the data points. [19]

Conclusion

We do not have
optimal solutions for the listed problems right now, and perhaps we will never invent a universal solution to them all. That sounds sad, but there is also something encouraging in the fact that AI systems are vulnerable: we need not fear a war between AI and people while we know this secret weapon.

Keywords

Backdooring, Evasion, Poisoning, Trojaning.

References

1. https://searchenterpriseai.techtarget.com/definition/AI-Artificial-Intelligence
2. https://appen.com/blog/training-data/
3. Qiang Liu,
Pan Li, Wentao Zhao, Wei Cai, Shui Yu and Victor C. M. Leung, A Survey on Security Threats and Defensive Techniques of Machine Learning: A Data Driven View.
4. Ahmad T. Al-T, Automatic Text Summarization Approaches, International Conference on Infocom Technologies and Unmanned Systems, Dec. 18-20, 2017, ADET, Amity University Dubai, UAE, 978-1-5386-0514-1/17/$31.00 ©2017 IEEE.
5. Banu priya M. and Kannamal E., Investigation of Speech Recognition System and its Performance, International Conference on Computer Communication and Informatics, Jan. 22-24, 2020, Coimbatore, India, 978-1-7281-4514-3/20/$31.00 ©2020 IEEE.
6. Aiyesha S. and Archana R. H., Survey on Natural Language Processing and its Applications, International Journal of Engineering Research & Technology, Vol. 7, Issue 01, January 2018, ISSN: 2278-0181, IJERTV7IS010150.
7. Li L., Mu X., Li S. and Peng H., A Review of Face Recognition Technology, IEEE Access, vol. 8, pp. 139110-139120, 2020, doi: 10.1109/ACCESS.2020.3011028.
8. Sunita C. and Anand S., Malware Detection & Classification using Machine Learning, International Conference on Emerging Trends in Communication, Control and Computing, 2020, pp. 1-4, doi: 10.1109/ICONC345789.2020.9117547.