THE SIMILARITY QUANTIFICATION OF MULTIDIMENSIONAL TIME SERIES DATA SETS IN SURVEILLANCE APPLICATIONS FOR HEALTH MONITORING SYSTEM

Abstract: In the last decade, data mining techniques have been applied to sensor data in a wide range of applications, such as health care monitoring systems, manufacturing processes, intruder detection, and database management. Many data mining techniques are based on computing the similarity between two sensor data patterns. A number of representations and similarity measures for multi-attribute time series have been proposed in the literature. In this paper, we describe a new method for computing the similarity of two multi-attribute time series based on a temporal version of Smith-Waterman (SW), a well-known bioinformatics algorithm. We then apply our method to sensor data from an eldercare application for early illness detection. Our method mitigates difficulties related to data uncertainty and aggregation that often arise when processing sensor data. The experiments take place at an aging-in-place facility, TigerPlace, located in Columbia, MO. To validate our method, we used data from nine non-wearable sensor networks located in TigerPlace apartments, combined with information from an electronic health record (EHR). We provide a set of experiments investigating the properties of the temporal version of SW, together with experiments on the TigerPlace dataset.


INTRODUCTION
The American Academy of Nursing (AAN) has asked researchers to propose new solutions for changing and improving standards of care for the elderly [1]. In developed countries, rapid population aging has drawn the attention of health care providers. Published statistics show that the share of older people (65 years and over) will rise from 13% in 2010 to 19% in 2030, while the ratio of working-age people (ages 15 to 64) to older people will fall from 4.3 to 2.3 [2]. At the same time, older people are eager to live independently despite complex chronic conditions such as frailty, dementia, and fall risk. However, an independent lifestyle may delay health assessment because of a lack of monitoring, which is associated with poor long-term health outcomes [3]. Late health assessment is an emphasized risk factor, as it usually results from fear of institutionalization or a lack of medical evaluation [4]. A possible solution to unreported health problems is an automatic health monitoring system that can detect and report early signs of illness. In recent decades, a growing number of projects have deployed sensor networks to monitor the health of the elderly. MIT Place Lab, House, and the Independent Lifestyle Assistant at Honeywell, WA, are examples of such projects [5,6,7,8]. A number of activity recognition methods for assessing the ability of individuals to complete activities of daily living, such as eating, cooking, drinking, and taking medicine, have been reported in the literature [9,10,11,12]. The key differences among the reported activity recognition methodologies are attributable to the underlying sensor technology, the machine learning models, and the realism of the experimental settings [13,42]. Regardless of these differences, the majority of the techniques have been applied to recognize or predict sequences of activities. One strength of this paper is that it uses data collected in a realistic environment over a relatively long period.
Moreover, because we did not have a fixed experiment period, our datasets are continuously expanding. An important part of our health monitoring system is the ability to unobtrusively collect information about the daily activities of older adults. The system processes the acquired sensor data, identifies activities such as "bathroom visit" or "apartment", and attempts to detect changes in the behavior of the monitored residents. The first signs of an imminent illness or the worsening of an existing chronic condition can lead to behavioral changes that can be detected. The relationship between behavioral changes and health patterns is based on finding similar (abnormal) behavior in comparable sensor patterns. If a sensor pattern does not match normal behavior but resembles patterns seen earlier, it is assumed to indicate a certain (unknown) state of health. Although these assumptions may not hold for a younger, more active population, they generally apply to the elderly [14,15,16]. To capture a person's behavior, researchers have used different sensors, such as motion, range, radar, and sound sensors [8], which produce multi-attribute time series (MATS) datasets. Computing the similarity of multi-attribute sensor time series is the mathematical basis for comparing sensor-based behavioral patterns. There are several ways to compute MATS similarity, depending on the application and the attribute type. The Euclidean distance function [17] is used for two multi-attribute sequences of equal length, while dynamic time warping (DTW) algorithms are used when the sequence lengths are not identical [18,19,20,21,22]. A type of MATS representation based on DTW lower bounds that reduces the dimensionality of the original time series has also been proposed [41].

A. Temporal Smith-Waterman (TSW) similarity
We define a time series S as a set of n pairs (si, ti), that is, S = {(s1, t1), ..., (sn, tn)}. Each pair (si, ti), 1 ≤ i ≤ n, has two components: a sensor signal si belonging to an alphabet of symbols Σ, and a timestamp ti that represents the moment at which it was recorded in the database. The alphabet Σ is the set of identifiers we use to represent the multidimensional sensor time series. In our case, Σ includes the symbols shown in Table I. Although some sensors, such as motion detectors, are naturally described by symbolic data, others, such as the bed sensor, have to be quantized. This was the case for our bed motion sensor, for which bed movement was empirically divided [11] into 4 categories: less than 3 s, 3-7 s, 7-14 s, and more than 14 s (sensor IDs 1-4 in Table I). Given two discrete time series S1 = {(s11, t11), ..., (s1n, t1n)} of length n and S2 = {(s21, t21), ..., (s2m, t2m)} of length m, with s1i, s2j ∈ Σ, we can compute their TSW similarity, TSW(S1, S2), using the algorithm of [26]. In that algorithm, H is a working matrix used in dynamic programming to trace the best alignment between S1 and S2, and Sim is a symbol similarity matrix that reflects the compatibility between symbols. For example, Sim(BedMovement1, BedMovement2) = 0.9, since both are bed motion sensor firings (see Table I), while Sim(BedMovement1, Cabinet) = 0 because they belong to different types of sensors. The constant g is the penalty for opening a gap, and a second constant is the penalty for extending a gap. The best value of the gap-opening penalty depends on the dataset. We set g = 0 to limit our search space; this choice did not affect the application of TSW to the TigerPlace dataset. Extending the traditional SW used in bioinformatics [23], TSW treats the time difference between two firings as a gap and computes a gap penalty WΔt using the timestamp attached to each symbol (Equation 3).
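As an illustration of the description above, the following Python sketch shows one way a TSW-style score could be computed. It is a minimal sketch under stated assumptions, not the algorithm of [26]: the recurrence is the standard Smith-Waterman local-alignment form, the time-gap penalty is assumed to be WΔt = c·|Δt1 − Δt2| (the difference between the inter-firing times of the two sequences), and the names tsw_similarity and example_sim, the score of 1.0 for identical symbols, and the value c = 0.01 are hypothetical.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, float]  # (sensor symbol, timestamp in seconds)


def example_sim(a: str, b: str) -> float:
    """Hypothetical symbol similarity: 1.0 for identical symbols (assumed),
    0.9 for two bed-movement categories, 0 for different sensor types."""
    if a == b:
        return 1.0
    if a.startswith("BedMovement") and b.startswith("BedMovement"):
        return 0.9
    return 0.0


def tsw_similarity(
    s1: List[Event],
    s2: List[Event],
    sim: Callable[[str, str], float] = example_sim,
    c: float = 0.01,  # assumed time-scale constant
    g: float = 0.0,   # gap-opening penalty; the text sets g = 0
) -> float:
    n, m = len(s1), len(s2)
    # H[i][j] holds the best local-alignment score ending at s1[i-1], s2[j-1].
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Time elapsed since the previous firing in each sequence.
            dt1 = s1[i - 1][1] - s1[i - 2][1] if i > 1 else 0.0
            dt2 = s2[j - 1][1] - s2[j - 2][1] if j > 1 else 0.0
            w_dt = c * abs(dt1 - dt2)  # ASSUMED form of the time-gap penalty
            match = H[i - 1][j - 1] + sim(s1[i - 1][0], s2[j - 1][0]) - w_dt
            H[i][j] = max(0.0, match, H[i - 1][j] - g, H[i][j - 1] - g)
            best = max(best, H[i][j])
    return best
```

Under these assumptions, two identical sequences score the sum of their symbol similarities, and stretching the time between firings in one of them lowers the score.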
Time is given in seconds, and the constant c used in Equation 3 is controlled by the time scale associated with the symbol firings. Although the TSW algorithm can be used on any symbolic time series, the time scale used in Equation 3 depends on the application. For example, when computing the similarity between patients represented as sequences of International Classification of Diseases 9 (ICD-9) diagnoses [27], the time scale can be on the order of months, whereas in our application it is on the order of minutes. The exact value of c can be computed if a training set of known matching sequences is available.

B. Sequential search using TSW
Our main motivation for developing a similarity measure for sensor time series is to compare human behaviors and activities. A simple method [26] is to divide the entire sensor sequence into fixed, non-overlapping periods, such as days (sets of 24-hour sensor sequences), and to use TSW to compute the similarity between them. However, since a given activity can be performed at different times of the day, week, or year, we must be able to search the entire database for a specific pattern. To solve this problem, we use a window-based TSW (WTSW) in this article. This algorithm uses a sliding window to search for subsequences that resemble a user-defined subsequence (query) according to the similarity measure. In our case, examples of user-defined subsequences are bathroom visits or food preparation activities. Suppose we denote the sequence of sensor firings by D and the user-defined query by Q. To find the subsequence of D most similar to Q, we slide a window of size tΔ over D. The window size tΔ must be greater than the difference between the times at which sq1 and sqn were observed in Q, that is, tΔ > (tqn − tq1). Consecutive windows overlap by a time interval (tΔ − wΔ). To exclude trivial matches (subsequences of Q itself), we do not consider subsequences whose timestamps overlap with Q.
After extracting all non-trivial subsequences, we assign a similarity score to each of them using the TSW similarity measure. We then choose the subsequence with the highest score as the subsequence most similar to the user-defined subsequence Q. Pseudocode of the WTSW algorithm is shown in Figure 2. WTSW faces two challenges. The first is speed: the TSW algorithm is time-consuming. The second is the initialization of the wΔ parameter. If wΔ is small, the results are more accurate but the computation time is high. The computation time can be lowered by reducing the overlap; however, some relevant subsequences may then be missed.
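The sliding-window search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the pseudocode of Fig. 2: the function name wtsw_search and its parameters are hypothetical, the similarity measure is passed in as a callable (any TSW implementation could be used), and windows are taken as half-open time intervals [start, start + tΔ) advanced by wΔ.

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, float]  # (sensor symbol, timestamp in seconds)


def wtsw_search(
    D: List[Event],
    Q: List[Event],
    similarity: Callable[[List[Event], List[Event]], float],
    t_delta: float,  # window size
    w_delta: float,  # shift between consecutive windows
) -> Tuple[float, Optional[List[Event]]]:
    q_start, q_end = Q[0][1], Q[-1][1]
    if t_delta <= q_end - q_start:
        raise ValueError("t_delta must exceed the time span of Q")
    if w_delta <= 0:
        raise ValueError("w_delta must be positive")
    best_score, best_window = float("-inf"), None
    start, last = D[0][1], D[-1][1]
    while start <= last:
        end = start + t_delta
        # Linear scan per window keeps the sketch short; an index into D
        # would avoid rescanning in a real implementation.
        window = [e for e in D if start <= e[1] < end]
        # Skip trivial matches: windows whose time span overlaps Q itself.
        overlaps_query = start <= q_end and end > q_start
        if window and not overlaps_query:
            score = similarity(window, Q)
            if score > best_score:
                best_score, best_window = score, window
        start += w_delta
    return best_score, best_window
```

A smaller w_delta produces more candidate windows (and more similarity evaluations), which mirrors the accuracy versus computation-time trade-off noted above.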

Fig. 2. Pseudocode for the WTSW method
Computing the TSW similarity of two sequences of lengths m and n has O(mn) complexity. If the length of D is r, then we need between r/n (no overlap) and r−n (overlap of n−1) TSW evaluations. If r >> n, the upper bound of the complexity is O(rn²), which can be high for large r. For example, a "bathroom visit" behavior usually has 200 symbols (n = 200), and a 5-year behavioral sequence can be 2 million symbols long (r = 2,000,000). To address these two challenges, we use the genetic algorithm approach described in the next section.
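To make these bounds concrete, the short Python sketch below evaluates them for the example values in the text. The function names are hypothetical, and the cost model assumes that the query and every window both have length n, so that one TSW evaluation fills an n × n matrix.

```python
from typing import Tuple


def tsw_evaluation_bounds(r: int, n: int) -> Tuple[int, int]:
    """Number of TSW evaluations: r/n with no overlap, r-n with overlap n-1."""
    return r // n, r - n


def upper_bound_cost(r: int, n: int) -> int:
    """Upper bound r * n^2 on matrix-cell updates (assumes n x n per evaluation)."""
    return r * n * n


low, high = tsw_evaluation_bounds(2_000_000, 200)
print(low, high)                          # 10000 1999800
print(upper_bound_cost(2_000_000, 200))   # 80000000000
```

With n = 200 and r = 2,000,000 the bound is 8 × 10^10 cell updates, which is why an exhaustive fully overlapping search is expensive.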

C. WTSW using a genetic algorithm based approach (GATSW)
Similarity search is a subtask of most time series mining algorithms, and its running time has always been a bottleneck for large time series. To address this, different techniques have been suggested in the literature to speed up the search process [19,20,21,22,40]. In this paper we use a genetic algorithm to speed up WTSW and leave the exploration of other possible speed-up methods for future work. We acknowledge that a genetic algorithm is not the best way to accelerate the search process; we use it because it mimics the sliding window of the WTSW method and delivers satisfactory results within a reasonable time. Lower bounding and early abandoning [22] could also be applied to TSW. Early abandoning keeps track of the minimum distance found so far while computing the distance between two subsequences element by element; as soon as the running distance exceeds the current minimum, the computation stops. Although this speeds up the computation of the similarity of two subsequences, all pairwise comparisons still have to be performed. Lower bounding techniques, on the other hand, perform well for uniform scaling. For larger values of n, however, the algorithm is still intractable. Note that the purpose of this article is to study the effectiveness of the TSW measure on data collected at TigerPlace and its practical utility for real-world eldercare problems such as detecting daily routines and abnormal events. In what follows, we define the most important GA parameters.
Gene: Given a subsequence S of length m, where S = {(s1, t1), ..., (sm, tm)}, a gene is a pair (si, ti) that identifies the sensor si that fired at time ti. In other words, a gene in GATSW records a sensor firing and its timestamp.
Chromosome: A chromosome is a sequence of sensor firings in a given time interval, namely a set of genes.
Given a time series T of length n, a subsequence of T is an ordered sampling of length m,

Where
Population: A population of size p is a set of p distinct chromosomes. In every genetic algorithm, a fitness function decides which chromosomes survive from one generation (iteration) to the next.
Fitness function: In GATSW, the fitness of a chromosome Si is computed by evaluating how well the chromosome matches the user-defined subsequence Q, that is, fitness(Si) = TSW(Si, Q).
Mutation: The location of a chromosome at iteration a is changed according to its fitness in the previous iteration (a−1). The amount of change depends on the fitness of the parent chromosome: the fittest parent chromosome changes its location only slightly, whereas a parent chromosome with lower fitness changes its location significantly. Fig. 4 shows the pseudocode of GATSW.
The parameter threshold denotes the best similarity score, which equals the fitness of a perfect match. We use this parameter in mutation to control the rate of variation of the child chromosomes for the next generation. The parameter mutation rate controls the influence of a chromosome's fitness on its location in the next generation. The more a chromosome differs from the user-defined subsequence, the larger the mutation.
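The fitness-driven mutation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the pseudocode of Fig. 4: each chromosome is represented only by the start time of its window, the step size is assumed to be mutation_rate × (threshold − fitness)/threshold × the time span of D, a child replaces its parent only when it is at least as fit, and the names gatsw_search and extract_window are hypothetical. The exclusion of windows overlapping Q is omitted for brevity.

```python
import random
from typing import Callable, List, Tuple

Event = Tuple[str, float]  # (sensor symbol, timestamp in seconds)


def extract_window(D: List[Event], start: float, t_delta: float) -> List[Event]:
    """Sensor firings of D inside [start, start + t_delta)."""
    return [e for e in D if start <= e[1] < start + t_delta]


def gatsw_search(
    D: List[Event],
    Q: List[Event],
    similarity: Callable[[List[Event], List[Event]], float],
    t_delta: float,
    threshold: float,           # fitness of a perfect match
    mutation_rate: float = 0.5,
    population_size: int = 60,
    generations: int = 60,
    seed: int = 0,
) -> Tuple[float, float]:
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rng = random.Random(seed)
    lo, hi = D[0][1], D[-1][1]

    def fitness(start: float) -> float:
        window = extract_window(D, start, t_delta)
        return similarity(window, Q) if window else 0.0

    # Each chromosome is the start time of a candidate window (assumption).
    population = [rng.uniform(lo, hi) for _ in range(population_size)]
    scores = [fitness(s) for s in population]
    for _ in range(generations):
        for k in range(population_size):
            # Fit parents move a little; unfit parents move a lot.
            shortfall = max(0.0, (threshold - scores[k]) / threshold)
            step = mutation_rate * shortfall * (hi - lo)
            child = min(hi, max(lo, population[k] + rng.uniform(-step, step)))
            child_score = fitness(child)
            if child_score >= scores[k]:  # keep the fitter of parent and child
                population[k], scores[k] = child, child_score
    best = max(range(population_size), key=scores.__getitem__)
    return scores[best], population[best]
```

Because a chromosome whose fitness reaches the threshold has a step size of zero, perfect matches stay in place while poor candidates keep exploring the time line.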

D. Health Pattern Prediction: TSW Applications
An important purpose of our TigerPlace research is to predict changes in the health status of residents based on sensor data produced by the in-home monitoring system. The ability of sensor networks to predict health patterns using logistic regression, multi-instance learning, and temporal clustering has already been studied on aggregated sensor data [28,29]. In [42,43] the authors provided new methods for using sensor network data to track changes in daily routines that can be used in assessing cognitive health. Multiple features, such as aggregated sensor counts and time characteristics, were extracted from sensor sequences to quantify the performed activities. An SVM learner and a search algorithm were used to detect activities and track changes. Linear regression and Gaussian processes were used in [44] to map features extracted from aggregated sensor counts to resident functional ability. With the same data aggregation approach, the authors of [45] proposed a statistical method for estimating the average time spent by a resident in each room of his or her apartment. Our proposed approach differs in that the sensor counts are not aggregated. In this article we investigate the ability of the TSW method to detect health patterns.

1) Detection of Anomalous Activities
In older adults, a decline in functional ability is reflected in subtle changes in normal daily activities. We apply the TSW measure to sensor sequences to track changes in sensor patterns and detect abnormal events. Abnormal events are defined as unusual sensor activity patterns that require assessment of the resident by a nurse. The measure is used to classify sensor sequences as "normal" or "abnormal" using the distribution of similar sensor sequences. We assume that if a particular sensor sequence does not resemble the normal sensor sequences observed previously, this indicates a possible abnormal event [30]. We use the TSW measure because we want to find similar behaviors that occur at different times [30].

E. Assessment Metrics and Experiment Setup
Our goal is to detect events (days) that a nurse should assess (called abnormal days). What counts as an abnormal day is specific to each elderly resident: an event that is considered abnormal for one resident can be a normal event for another. For example, for some residents, several visits to the bathroom during the night are considered an unusual event, whereas for another group of people several visits around midnight are normal. Marking events as normal or abnormal depends on the resident's functional ability, state of health, medication, etc. We report the performance of TSW in early illness recognition using different approaches in Section IV. We use the following statistics in our experiments.
Here, tp is the number of true positives, tn is the number of true negatives, fp is the number of false positives, and fn is the number of false negatives.
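For reference, the Python sketch below computes the standard statistics that can be derived from tp, tn, fp, and fn. It assumes the usual textbook definitions of precision, recall, F-measure, and accuracy; only the F-measure is explicitly reported in the results, so the other three are included as an assumption about which statistics were used. The function names are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted abnormal days that are truly abnormal."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly abnormal days that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all days that were labeled correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

Because abnormal days are rare, accuracy can look high even when few abnormal days are found, which is one reason the F-measure is the more informative figure for this task.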

RESULTS
In this section, we demonstrate the performance of the proposed methods on the TigerPlace dataset.

A. Abnormal Events Detection
Our goal is to build an early illness recognition (EIR) system for TigerPlace residents using sensor pattern similarity. For a given resident with available historical sensor data, the EIR system searches for patterns similar to the currently recorded activity. A high similarity between the current sensor pattern and a past pattern that was related to an abnormal health event indicates the possibility that the previous illness has recurred. The health pattern prediction experiments are performed using a k-nearest neighbor classifier and a leave-one-out cross-validation approach. For each unknown sensor sequence Si, we compute the distances (using WTSW and GATSW for comparison) to all other past sequences. Then, we select the k most similar sequences (k is chosen as a function of n, the number of samples in the training dataset for each resident) together with their "normal" or "abnormal" labels. Finally, the classifier predicts the label of Si based on the labels of its k nearest neighbors using the following heuristic: if any of the labels of the k retrieved sequences is "abnormal", then we label Si as abnormal; if all k labels are normal, then Si is labeled as normal. This heuristic was motivated by the fact that in medical applications the cost of a missed detection far outweighs that of a false alarm. Here, for each resident, we analyze the performance of WTSW and GATSW in health pattern recognition in terms of retrieving "normal" and "abnormal" days. Table IV shows the parameter settings, and Table V compares the average performance of WTSW and GATSW over all residents of the TigerPlace dataset. In this experiment, different parameter settings were tested, and the setting that provided the best performance is reported in Table IV. The column "Time" presents the maximum execution time, in seconds, over all residents' data for each method.
Generally, the F-measure decreases as tΔ or wΔ increases. The reason is that over a larger time interval it is less likely that residents do exactly the same thing at exactly the same time every day. Moreover, a larger wΔ results in fewer window candidates and decreases prediction performance. In the GATSW experiments, with 60 chromosomes and after 60 generations, GATSW achieves an F-measure of 0.75 in almost half the time that WTSW takes. Even though our dataset has about two million sensor hits and stretches over three years, this experiment shows that for larger datasets (billions of hits) GATSW can still be time demanding. This can happen if the search goes, for example, across multiple residents with similar health conditions. We acknowledge that the imbalanced dataset may in some cases reduce the performance of the classifier. We avoided the usual over- and undersampling strategies for the following reasons. Undersampling the majority class would decrease the F-measure in cases where we have only a few abnormal events, because the classifier would then be trained on very few samples overall. Oversampling the minority class is difficult because we do not understand the problem well enough to oversample (it is unclear how to adjust or generate abnormal events). To address the problem, we used a one-class classifier (see Section IV-B.1) that uses only the majority class to identify abnormal events.
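The labeling heuristic and the leave-one-out procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, k is passed in directly, and the similarity callable stands in for WTSW or GATSW scores (higher means more similar).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # type of a sensor sequence


def predict_label(scored: Sequence[Tuple[float, str]], k: int) -> str:
    """Label as 'abnormal' if ANY of the k most similar sequences is abnormal."""
    top_k = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return "abnormal" if any(label == "abnormal" for _, label in top_k) else "normal"


def leave_one_out(
    sequences: Sequence[S],
    labels: Sequence[str],
    similarity: Callable[[S, S], float],
    k: int,
) -> List[str]:
    """Predict each sequence's label from all the other sequences."""
    predictions = []
    for i, held_out in enumerate(sequences):
        scored = [
            (similarity(held_out, sequences[j]), labels[j])
            for j in range(len(sequences))
            if j != i
        ]
        predictions.append(predict_label(scored, k))
    return predictions
```

The "any abnormal" rule trades false alarms for fewer missed detections: as k grows, more sequences are labeled abnormal, so k has to be kept small relative to the number of abnormal examples.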

CONCLUSION
In this paper, we described an extension of TSW to real data; we introduced a window-based algorithm, WTSW, that uses TSW to find the best match in a long MATS.
The proposed method provides a probable sequence segmentation, which offers a solution for fine-grained recognition. Since WTSW may be slow for eldercare applications, we proposed a genetic algorithm based version, GATSW. Finally, we showed how TSW can be used in different frameworks for health pattern detection. We tested our algorithms on different datasets: two synthetic, one obtained at TigerPlace, and another obtained from [32]. On the TigerPlace dataset, we obtained an F-measure of 0.75 for abnormal day prediction across all residents in this study. On the sensor dataset obtained from [32], we obtained an F-measure of 0.82 for activity recognition. For future work, we suggest that researchers use TSW with different methods and a fusion methodology to reduce false alarms.