Fraud represents a significant problem for governments and businesses and specialized analysis techniques for discovering fraud using them are required. Some of these methods include knowledge discovery in databases (KDD), data mining, machine learning and statistics. They offer applicable and successful solutions in different areas of electronic fraud crimes. In general, the primary reason to use data analytics techniques is to tackle fraud since many internal control systems have serious weaknesses. For example, the currently prevailing approach employed by many law enforcement agencies to detect companies involved in potential cases of fraud consists in receiving circumstantial evidence or complaints from whistleblowers. As a result, a large number of fraud cases remain undetected and unprosecuted. In order to effectively test, detect, validate, correct error and monitor control systems against fraudulent activities, businesses entities and organizations rely on specialized data analytics techniques such as data mining, data matching, the sounds like function, regression analysis, clustering analysis, and gap analysis. Techniques used for fraud detection fall into two primary classes: statistical techniques and artificial intelligence. == Statistical techniques == Examples of statistical data analysis techniques are: Data preprocessing techniques for detection, validation, error correction, and filling up of missing or incorrect data. Calculation of various statistical parameters such as averages, quantiles, performance metrics, probability distributions, and so on. For example, the averages may include average length of call, average number of calls per month and average delays in bill payment. Models and probability distributions of various business activities either in terms of various parameters or probability distributions. Computing user profiles. Time-series analysis of time-dependent data. Clustering and classification to find patterns and associations among groups of data. Data matching Data matching is used to compare two sets of collected data. The process can be performed based on algorithms or programmed loops. Trying to match sets of data against each other or comparing complex data types. Data matching is used to remove duplicate records and identify links between two data sets for marketing, security or other uses. Sounds like Function is used to find values that sound similar. The Phonetic similarity is one way to locate possible duplicate values, or inconsistent spelling in manually entered data. The ‘sounds like’ function converts the comparison strings to four-character American Soundex codes, which are based on the first letter, and the first three consonants after the first letter, in each string. Regression analysis allows you to examine the relationship between two or more variables of interest. Regression analysis estimates relationships between independent variables and a dependent variable. This method can be used to help understand and identify relationships among variables and predict actual results. Gap analysis is used to determine whether business requirements are being met, if not, what are the steps that should be taken to meet successfully. Matching algorithms to detect anomalies in the behavior of transactions or users as compared to previously known models and profiles. Techniques are also needed to eliminate false alarms, estimate risks, and predict future of current transactions or users. Some forensic accountants specialize in forensic analytics which is the procurement and analysis of electronic data to reconstruct, detect, or otherwise support a claim of financial fraud. The main steps in forensic analytics are data collection, data preparation, data analysis, and reporting. For example, forensic analytics may be used to review an employee's purchasing card activity to assess whether any of the purchases were diverted or divertible for personal use. == Artificial intelligence == Fraud detection is a knowledge-intensive activity. The main AI techniques used for fraud detection include: Data mining to classify, cluster, and segment the data and automatically find associations and rules in the data that may signify interesting patterns, including those related to fraud. Expert systems to encode expertise for detecting fraud in the form of rules. Pattern recognition to detect approximate classes, clusters, or patterns of suspicious behavior either automatically (unsupervised) or to match given inputs. Machine learning techniques to automatically identify characteristics of fraud. Neural nets to independently generate classification, clustering, generalization, and forecasting that can then be compared against conclusions raised in internal audits or formal financial documents such as 10-Q. Other techniques such as link analysis, Bayesian networks, decision theory, and sequence matching are also used for fraud detection. A new and novel technique called System properties approach has also been employed where ever rank data is available. Statistical analysis of research data is the most comprehensive method for determining if data fraud exists. Data fraud as defined by the Office of Research Integrity (ORI) includes fabrication, falsification and plagiarism. == Machine learning and data mining == Early data analysis techniques were oriented toward extracting quantitative and statistical data characteristics. These techniques facilitate useful data interpretations and can help to get better insights into the processes behind the data. Although the traditional data analysis techniques can indirectly lead us to knowledge, it is still created by human analysts. To go beyond, a data analysis system has to be equipped with a substantial amount of background knowledge, and be able to perform reasoning tasks involving that knowledge and the data provided. In effort to meet this goal, researchers have turned to ideas from the machine learning field. This is a natural source of ideas, since the machine learning task can be described as turning background knowledge and examples (input) into knowledge (output). If data mining results in discovering meaningful patterns, data turns into information. Information or patterns that are novel, valid and potentially useful are not merely information, but knowledge. One speaks of discovering knowledge, before hidden in the huge amount of data, but now revealed. The machine learning and artificial intelligence solutions may be classified into two categories: 'supervised' and 'unsupervised' learning. These methods seek for accounts, customers, suppliers, etc. that behave 'unusually' in order to output suspicion scores, rules or visual anomalies, depending on the method. Whether supervised or unsupervised methods are used, note that the output gives us only an indication of fraud likelihood. No stand alone statistical analysis can assure that a particular object is a fraudulent one, but they can identify them with very high degrees of accuracy. As a result, effective collaboration between machine learning model and human analysts is vital to the success of fraud detection applications. === Supervised learning === In supervised learning, a random sub-sample of all records is taken and manually classified as either 'fraudulent' or 'non-fraudulent' (task can be decomposed on more classes to meet algorithm requirements). Relatively rare events such as fraud may need to be over sampled to get a big enough sample size. These manually classified records are then used to train a supervised machine learning algorithm. After building a model using this training data, the algorithm should be able to classify new records as either fraudulent or non-fraudulent. Supervised neural networks, fuzzy neural nets, and combinations of neural nets and rules, have been extensively explored and used for detecting fraud in mobile phone networks and financial statement fraud. Bayesian learning neural network is implemented for credit card fraud detection, telecommunications fraud, auto claim fraud detection, and medical insurance fraud. Hybrid knowledge/statistical-based systems, where expert knowledge is integrated with statistical power, use a series of data mining techniques for the purpose of detecting cellular clone fraud. Specifically, a rule-learning program to uncover indicators of fraudulent behaviour from a large database of customer transactions is implemented. Cahill et al. (2000) design a fraud signature, based on data of fraudulent calls, to detect telecommunications fraud. For scoring a call for fraud its probability under the account signature is compared to its probability under a fraud signature. The fraud signature is updated sequentially, enabling event-driven fraud detection. Link analysis comprehends a different approach. It relates known fraudsters to other individuals, using record linkage and social network methods. This type of detection is only able to detect fra
Commonsense knowledge (artificial intelligence)
In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address common sense knowledge was Advice Taker in 1959 by John McCarthy. Commonsense knowledge can underpin a commonsense reasoning process, to attempt inferences such as "You might bake a cake because you want people to eat the cake." A natural language processing process can be attached to the commonsense knowledge base to allow the knowledge base to attempt to answer questions about the world. Common sense knowledge also helps to solve problems in the face of incomplete information. Using widely held beliefs about everyday objects, or common sense knowledge, AI systems make common sense assumptions or default assumptions about the unknown similar to the way people do. In an AI system or in English, this is expressed as "Normally P holds", "Usually P" or "Typically P so Assume P". For example, if we know the fact "Tweety is a bird", because we know the commonly held belief about birds, "typically birds fly," without knowing anything else about Tweety, we may reasonably assume the fact that "Tweety can fly." As more knowledge of the world is discovered or learned over time, the AI system can revise its assumptions about Tweety using a truth maintenance process. If we later learn that "Tweety is a penguin" then truth maintenance revises this assumption because we also know "penguins do not fly". == Commonsense reasoning == Commonsense reasoning simulates the human ability to use commonsense knowledge to make presumptions about the type and essence of ordinary situations they encounter every day, and to change their "minds" should new information come to light. This includes time, missing or incomplete information and cause and effect. The ability to explain cause and effect is an important aspect of explainable AI. Truth maintenance algorithms automatically provide an explanation facility because they create elaborate records of presumptions. Compared with humans, all existing computer programs that attempt human-level AI perform extremely poorly on modern "commonsense reasoning" benchmark tests such as the Winograd Schema Challenge. The problem of attaining human-level competency at "commonsense knowledge" tasks is considered to probably be "AI complete" (that is, solving it would require the ability to synthesize a fully human-level intelligence), although some oppose this notion and believe compassionate intelligence is also required for human-level AI. Common sense reasoning has been applied successfully in more limited domains such as natural language processing and automated diagnosis or analysis. == Commonsense knowledge base construction == Compiling comprehensive knowledge bases of commonsense assertions (CSKBs) is a long-standing challenge in AI research. From early expert-driven efforts like CYC and WordNet, significant advances were achieved via the crowdsourced OpenMind Commonsense project, which led to the crowdsourced ConceptNet KB. Several approaches have attempted to automate CSKB construction, most notably, via text mining (WebChild, Quasimodo, TransOMCS, Ascent), as well as harvesting these directly from pre-trained language models (AutoTOMIC). These resources are significantly larger than ConceptNet, though the automated construction mostly makes them of moderately lower quality. Challenges also remain on the representation of commonsense knowledge: Most CSKB projects follow a triple data model, which is not necessarily best suited for breaking more complex natural language assertions. A notable exception here is GenericsKB, which applies no further normalization to sentences, but retains them in full. == Applications == Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl. ConceptNet has also been used by chatbots and by computers that compose original fiction. At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty. == Data == As an example, as of 2012 ConceptNet includes these 21 language-independent relations: IsA (An "RV" is a "vehicle" | X is an instance of a Y) UsedFor (a "cake tin" is used for "making cakes" | X is used for the purpose Y) HasA (A "rabbit" has a "tail" | X possesses Y element or feature) CapableOf (a "cook" is capable of "making baked goods" | X is capable of doing Y) Desires (a "child" desires "the aroma of baking" | X has a desire for Y) CreatedBy ("cake" is created by a "baker" | X is created by Y) PartOf (a "knife" is be part of a "knife set" | X is a part of Y) Causes ("Heat" causes "cooking"| X is what causes Y) LocatedNear (the "oven" is located near the "refrigerator" | X is located near Y) AtLocation (Somewhere a "Cook" can be at a "restaurant" | X is at the location of Y) DefinedAs (a "Cupcake" is defined as a "cake" that also has the qualities of being "small", "baked within a wrapper", and "containing only one area of frosting or icing" | X is defined as Y that also has the properties A, B & C) SymbolOf (a "heart" is a symbol of "affection" | X is a symbolic representation of Y) ReceivesAction ("cake" can receive the action of being "eaten" | X is capable of receiving action Y) HasPrerequisite ("baking" has the prerequisite of obtaining the "ingredients" | X cannot do Y unless A does B) MotivatedByGoal ("baking" is motivated by the goal of "consumption"/"eating" | X has the motivation of Y goal) CausesDesire ("baking" makesYou want to "follow recipe" | X causes the desire to do Y) MadeOf ("Cake" is made of "flour"/"eggs"/"sugar"/"oil"/etc | X is made of Y) HasFirstSubevent ("baking" has first subevent "make batter" | To do X the first thing that needs to be done is Y) HasSubevent ("eat" has subevent "swallow" | Doing X will lead to Y event following) HasLastSubevent ("sleeping" has last subevent of "waking" | Doing X ends with the event Y) == Commonsense knowledge bases == Cyc Open Mind Common Sense (data source) and ConceptNet (datastore and NLP engine) Evi Graphiq
Digital heritage
The Charter on the Preservation of Digital Heritage of UNESCO defines digital heritage as embracing "cultural, educational, scientific and administrative resources, as well as technical, legal, medical and other kinds of information created digitally, or converted into digital form from existing analogue resources". Digital heritage also includes the use of digital media in the service of understanding and preserving cultural or natural heritage. The digitization of both cultural heritage and Natural heritage serves to enable the permanent access of current and future generations to culturally important objects ranging from literature and paintings to flora, fauna, or habitats. It is also used in the preservation and access of objects with enduring or significant historical, scientific, or cultural value including buildings, archeological sites, and natural phenomena. The main idea is the transformation of a material object into a virtual copy. It should not be confused with digital humanities, which uses digitizing technology to specifically help with research. There have been several debates concerning the efficiency of the process of digitizing heritage. Some of the drawbacks refer to the deterioration and technological obsolescence due to the lack of funding for archival materials and underdeveloped policies that would regulate such a process. Another main social debate has taken place around the restricted accessibility due to the digital divide that exists around the world. Nevertheless, new technologies enable easy, instant and cross boarder access to the digitized work. Many of these technologies include spatial and surveying technology to gain aerial or 3D images. Digital heritage is also used to monitor cultural heritage sites over years to help with preservation, maintenance, and sustainable tourism. It aims to observe any changes, diseases, or deterioration that may occur on objects. == Cultural and natural heritage == Digital Heritage that is not born-digital can be divided into two separate groups—digital cultural heritage and digital natural heritage. Digital cultural heritage is the maintenance or preservation of cultural objects through digitization. These are objects, in some cases entire cities, that are considered of cultural importance. These objects are sometimes able to be digitized or physically represented in minute detail. Digital cultural heritage also includes intangible heritage. These are things such as "oral traditions, customs, value systems, skills, traditional dances, diets, performances" and other unique features of a culture. Intangible heritage is particularly vulnerable to destruction due to urbanization. There are several projects and programs which concentrate on digital cultural heritage. One such project is Mapping Gothic France, which aims to document and preserve cathedrals across France using images, VR tours, laser scans, and panoramas. This allows for scientific and historical study and preservation of the cathedrals and also provides detailed access to the sites for anyone in the world. The aim of projects like these is to help with the preservation and restoration of cultural objects. After the fire at Notre-Dame de Paris in 2019, digital scans are a major component in the ongoing restoration. Digital natural heritage pertains to objects of natural heritage that are considered of cultural, scientific, or aesthetic importance. Digital heritage in this instance is used not only to grant access to these objects, but to monitor any changes over time, such as with plant or animal habitats. Geographic information systems are a form of technology that is used primarily in the study of natural heritage. Western Australia has one such digital heritage project where they have created a digital repository of native plants important to both the region and the Aboriginal people. This is in order to protect and preserve the important biological heritage of Western Australia. == Educational impact == The digitization of these heritage objects has impacts around the world and across many disciplines. The increase of digital items means that people, especially the youth, are able to learn about new objects and cultures online through various media. They provide viewers with a more in-depth experience with an item or place, instead of just an image. The media is also able to be curated to age- or educational-level appropriateness, making learning easier. Some of the technology used in education, especially in museums, includes mobile apps, virtual reality, social media, and video games. Cultural heritage institutions are using this technology to try to expand access, increase appreciation for these items, and to gain new viewpoints on their collections. Digital heritage also helps scientists, archeologists, or other historians and specialists collect data on these objects, providing more information on the objects and the past. Digital Heritage is still currently being studied and improved by several sectors invested in cultural and intellectual preservation. It is particularly of interest to museums, governments, and academic institutions. Research by these groups are creating new concepts, methodologies, and techniques for the implementation of digital heritage to protect this type of cultural and natural heritage. As new technologies are created, museums and other heritage institutions are provided with more ways of disseminating their information and engaging with the public. A lack of resources within certain groups may still hinder everyone from accessing digital heritage. == Technologies used == The digitization of cultural heritage is attained through several means. Some of the main technology used is spatial and surveying technology. Space archaeological technology - Observations from space satellites are non-intrusive and can be integrated with other technologies on the ground. It is used to photograph vast areas of earth and help with research. Remnants of ancient civilizations or other human objects are also able to be spotted via satellite imaging. Unmanned aerial vehicles - UAV, such as drones, are commonly used in digitization of cultural heritage objects. The Great Wall of China is one such site that has been digitized and analyzed through unmanned aerial vehicle investigation. The resulting images, 3-D scans, maps, and other data are used to evaluate and maintain the Great Wall. Laser Scanning - Laser scanning is used to scan an area and recreate spatially accurate depictions, such as a 3D model. Virtual and Augmented Reality - VR is used primarily for education but does have uses for reconstruction and research. It is used to provide users with an immersive experience, as though they are actually at the site. Geographic Information systems - GIS are used primarily to study objects and sites over time. It is also important in studying the socioeconomic status of the past. 3D Modeling - 3D modeling has become more widely used due to an increase in technology that works specifically with heritage sites. It is often used in tandem with GIS to reconstruct objects for restoration, documentation, preservation, and educational purposes. Data is collected using satellite or other aerial imaging and ground-based imaging. There is some concern about the accuracy and authenticity of these types of digital reconstructions and their effects on the sites themselves. A major barrier to digital heritage is the amount of resources it takes to undertake such projects, such as money, time, and technology. Money and the lack of qualified personnel are two that are considered the most obstructive. This is especially an issue in less developed areas or within underfunded groups such as minorities. == Virtual heritage == A particular branch of digital heritage, known as "virtual heritage", is formed by the use of information technology with the aim of recreating the experience of existing cultural heritage, as in (approximations of) virtual reality. It is hard to differentiate this branch from the core contribution of digital heritage which is storing the heritage data digitally. Parsinejad et al. developed two techniques for Digital Twinning of the architectural assets and representation of the physical assets virtually in the museum context. Two techniques are hand recording and digital recording and both have challenges in adoption and implementation of Digital Twin as a revolutionary concept. == Digital heritage stewardship == Digital heritage stewardship is a form of digital curation which is modeled after collaborative curation. Digital heritage stewardship means stepping away from typical curatorial practices (e.g. discovering, arranging, and sharing information, material, and/or content) in favor of practices which allow its stakeholders the opportunity to contribute historical, political, and social context and culture. The collaborative practice encourages the creation, engagement, and maintena
Foreground detection
Foreground detection is one of the major tasks in the field of computer vision and image processing whose aim is to detect changes in image sequences. Background subtraction is any technique which allows an image's foreground to be extracted for further processing (object recognition etc.). Many applications do not need to know everything about the evolution of movement in a video sequence, but only require the information of changes in the scene, because an image's regions of interest are objects (humans, cars, text etc.) in its foreground. After the stage of image preprocessing (which may include image denoising, post processing like morphology etc.) object localisation is required which may make use of this technique. Foreground detection separates foreground from background based on these changes taking place in the foreground. It is a set of techniques that typically analyze video sequences recorded in real time with a stationary camera. == Description == All detection techniques are based on modelling the background of the image, i.e., setting the background and detecting which changes occur. Defining the background can be difficult when it contains shapes, shadows, and moving objects. In defining the background, it is assumed that stationary objects may vary in color and intensity over time. Scenarios in which these techniques apply tend to be very diverse. There can be highly variable sequences, such as images with different lighting, interiors, exteriors, quality, and noise. In addition to real-time processing, systems need to adapt to these changes. A foreground detection system should be able to: Develop a background model (estimate). Be robust to lighting changes, repetitive movements (leaves, waves, shadows), and long-term changes. == Background subtraction == Background subtraction is a widely used approach for detecting moving objects in videos from static cameras. The rationale in the approach is that of detecting the moving objects from the difference between the current frame and a reference frame, often called "background image", or "background model". Background subtraction is mostly done if the image in question is a part of a video stream. Background subtraction provides important cues for numerous applications in computer vision, for example surveillance tracking or human pose estimation. Background subtraction is generally based on a static background hypothesis which is often not applicable in real environments. With indoor scenes, reflections or animated images on screens lead to background changes. Similarly, due to wind, rain or illumination changes brought by weather, static backgrounds methods have difficulties with outdoor scenes. == Temporal average filter == The temporal average filter is a method that was proposed at the Velastin. This system estimates the background model from the median of all pixels of a number of previous images. The system uses a buffer with the pixel values of the last frames to update the median for each image. To model the background, the system examines all images in a given time period called training time. At this time, we only display images and will find the median, pixel by pixel, of all the plots in the background this time. After the training period for each new frame, each pixel value is compared with the input value of funds previously calculated. If the input pixel is within a threshold, the pixel is considered to match the background model and its value is included in the pixbuf. Otherwise, if the value is outside this threshold pixel is classified as foreground, and not included in the buffer. This method cannot be considered very efficient because they do not present a rigorous statistical basis and requires a buffer that has a high computational cost. == Conventional approaches == A robust background subtraction algorithm should be able to handle lighting changes, repetitive motions from clutter and long-term scene changes. The following analyses make use of the function of V(x,y,t) as a video sequence where t is the time dimension, x and y are the pixel location variables. e.g. V(1,2,3) is the pixel intensity at (1,2) pixel location of the image at t = 3 in the video sequence. === Using frame differencing === A motion detection algorithm begins with the segmentation part where foreground or moving objects are segmented from the background. The simplest way to implement this is to take an image as background and take the frames obtained at the time t, denoted by I(t) to compare with the background image denoted by B. Here using simple arithmetic calculations, we can segment out the objects simply by using image subtraction technique of computer vision meaning for each pixels in I(t), take the pixel value denoted by P[I(t)] and subtract it with the corresponding pixels at the same position on the background image denoted as P[B]. In mathematical equation, it is written as: P [ F ( t ) ] = P [ I ( t ) ] − P [ B ] {\displaystyle P[F(t)]=P[I(t)]-P[B]} The background is assumed to be the frame at time t. This difference image would only show some intensity for the pixel locations which have changed in the two frames. Though we have seemingly removed the background, this approach will only work for cases where all foreground pixels are moving, and all background pixels are static. A threshold "Threshold" is put on this difference image to improve the subtraction (see Image thresholding): | P [ F ( t ) ] − P [ F ( t + 1 ) ] | > T h r e s h o l d {\displaystyle |P[F(t)]-P[F(t+1)]|>\mathrm {Threshold} } This means that the difference image's pixels' intensities are 'thresholded' or filtered on the basis of value of Threshold. The accuracy of this approach is dependent on speed of movement in the scene. Faster movements may require higher thresholds. === Mean filter === For calculating the image containing only the background, a series of preceding images are averaged. For calculating the background image at the instant t: B ( x , y , t ) = 1 N ∑ i = 1 N V ( x , y , t − i ) {\displaystyle B(x,y,t)={1 \over N}\sum _{i=1}^{N}V(x,y,t-i)} where N is the number of preceding images taken for averaging. This averaging refers to averaging corresponding pixels in the given images. N would depend on the video speed (number of images per second in the video) and the amount of movement in the video. After calculating the background B(x,y,t) we can then subtract it from the image V(x,y,t) at time t = t and threshold it. Thus the foreground is: | V ( x , y , t ) − B ( x , y , t ) | > T h {\displaystyle |V(x,y,t)-B(x,y,t)|>\mathrm {Th} } where Th is a threshold value. Similarly, we can also use median instead of mean in the above calculation of B(x,y,t). Usage of global and time-independent thresholds (same Th value for all pixels in the image) may limit the accuracy of the above two approaches. === Running Gaussian average === For this method, Wren et al. propose fitting a Gaussian probabilistic density function (pdf) on the most recent n {\displaystyle n} frames. In order to avoid fitting the pdf from scratch at each new frame time t {\displaystyle t} , a running (or on-line cumulative) average is computed. The pdf of every pixel is characterized by mean μ t {\displaystyle \mu _{t}} and variance σ t 2 {\displaystyle \sigma _{t}^{2}} . The following is a possible initial condition (assuming that initially every pixel is background): μ 0 = I 0 {\displaystyle \mu _{0}=I_{0}} σ 0 2 = ⟨ some default value ⟩ {\displaystyle \sigma _{0}^{2}=\langle {\text{some default value}}\rangle } where I t {\displaystyle I_{t}} is the value of the pixel's intensity at time t {\displaystyle t} . In order to initialize variance, we can, for example, use the variance in x and y from a small window around each pixel. Note that background may change over time (e.g. due to illumination changes or non-static background objects). To accommodate for that change, at every frame t {\displaystyle t} , every pixel's mean and variance must be updated, as follows: μ t = ρ I t + ( 1 − ρ ) μ t − 1 {\displaystyle \mu _{t}=\rho I_{t}+(1-\rho )\mu _{t-1}} σ t 2 = d 2 ρ + ( 1 − ρ ) σ t − 1 2 {\displaystyle \sigma _{t}^{2}=d^{2}\rho +(1-\rho )\sigma _{t-1}^{2}} d = | ( I t − μ t ) | {\displaystyle d=|(I_{t}-\mu _{t})|} Where ρ {\displaystyle \rho } determines the size of the temporal window that is used to fit the pdf (usually ρ = 0.01 {\displaystyle \rho =0.01} ) and d {\displaystyle d} is the Euclidean distance between the mean and the value of the pixel. We can now classify a pixel as background if its current intensity lies within some confidence interval of its distribution's mean: | ( I t − μ t ) | σ t > k ⟶ foreground {\displaystyle {\frac {|(I_{t}-\mu _{t})|}{\sigma _{t}}}>k\longrightarrow {\text{foreground}}} | ( I t − μ t ) | σ t ≤ k ⟶ background {\displaystyle {\frac {|(I_{t}-\mu _{t})|}{\sigma _{t}}}\leq k\longrightarrow {\text{background}}} where the parameter k {\displaystyle k} is a free threshold (usuall
Digital image correlation for electronics
Digital image correlation analyses have applications in material property characterization, displacement measurement, and strain mapping. As such, DIC is becoming an increasingly popular tool when evaluating the thermo-mechanical behavior of electronic components and systems. == CTE measurements and glass transition temperature identification == The most common application of DIC in the electronics industry is the measurement of coefficient of thermal expansion (CTE). Because it is a non-contact, full-field surface technique, DIC is ideal for measuring the effective CTE of printed circuit boards (PCB) and individual surfaces of electronic components. It is especially useful for characterizing the properties of complex integrated circuits, as the combined thermal expansion effects of the substrate, molding compound, and die make effective CTE difficult to estimate at the substrate surface with other experimental methods. DIC techniques can be used to calculate average in-plane strain as a function of temperature over an area of interest during a thermal profile. Linear curve-fitting and slope calculation can then be used to estimate an effective CTE for the observed area. Because the driving factor in solder fatigue is most often the CTE mismatch between a component and the PCB it is soldered to, accurate CTE measurements are vital for calculating printed circuit board assembly (PCBA) reliability metrics. DIC is also useful for characterizing the thermal properties of polymers. Polymers are often used in electronic assemblies as potting compounds, conformal coatings, adhesives, molding compounds, dielectrics, and underfills. Because the stiffness of such materials can vary widely, accurately determining their thermal characteristics with contact techniques that transfer load to the specimen, such as dynamic mechanical analysis (DMA) and thermomechanical analysis (TMA), is difficult to do with consistency. Accurate CTE measurements are important for these materials because, depending on the specific use case, expansion and contraction of these materials can drastically affect solder joint reliability. For example, if a stiff conformal coating or other polymeric encapsulation is allowed to flow under a QFN, its expansion and contraction during thermal cycling can add tensile stress to the solder joints and expedite fatigue failure. DIC techniques will also allow the detection of glass transition temperature (Tg). At a glass transition temperature, the strain vs. temperature plot will exhibit a change in slope. Determining the Tg is very important for polymeric materials that could have glass transition temperatures within the operating temperature range of the electronics assemblies and components on which they are used. For example, some potting materials can see the Elastic Modulus of the material change by a factor of 100 or more over the glass transition region. Such changes can have drastic effects on an electronic assembly's reliability if they are not planned for in the design process. == Out-of-plane component warpage == When 3D DIC techniques are employed, out-of-plane motion can be tracked in addition to in-plane motion. Out-of-plane warpage is especially of interest at the component level of electronics packaging for solder joint reliability quantification. Excessive warpage during reflow can contribute to defective solder joints by lifting the edges of the component away from the board and creating head-in-pillow defects in ball grid arrays (BGA). Warpage can also shorten the fatigue life of adequate joints by adding tensile stresses to edge joints during thermal cycling. == Thermo-mechanical strain mapping == When a PCBA is over-constrained, thermo-mechanical stress brought about during thermal expansion can cause board strains that could negatively affect individual component and overall assembly reliability. The full-field monitoring capabilities of an image correlation technique allow for the measurement of strain magnitude and location on the surface of a specimen during a displacement-causing event, such as PCBA during a thermal profile. These "strain maps" allow for the comparison of strain levels over full areas of interest. Many traditional discrete methods, like extensometers and strain gauges, only allow for localized measurements of strain, inhibiting their ability to efficiently measure strain across larger areas of interest. DIC techniques have also been used to generate strain maps from purely mechanical events, such as drop impact tests, on electronic assemblies.
ImageNet
The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes. == History == AI researcher Fei-Fei Li began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms. In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the roughly 22,000 nouns of WordNet and using many of its features. She was also inspired by a 1987 estimate that the average person recognizes roughly 30,000 different kinds of objects. As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project. They used Amazon Mechanical Turk to help with the classification of images. Labeling started in July 2008 and ended in April 2010. It took 49K workers from 167 countries filtering and labeling over 160M candidate images. They had enough budget to have each of the 14 million images labelled three times. The original plan called for 10,000 images per category, for 40,000 categories at 400 million images, each verified 3 times. They found that humans can classify at most 2 images/sec. At this rate, it was estimated to take 19 human-years of labor (without rest). They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida, titled "ImageNet: A Preview of a Large-scale Hierarchical Dataset". The poster was reused at Vision Sciences Society 2009. In 2009, Alex Berg suggested adding object localization as a task. Li approached PASCAL Visual Object Classes contest in 2009 for a collaboration. It resulted in the subsequent ImageNet Large Scale Visual Recognition Challenge starting in 2010, which has 1000 classes and object localization, as compared to PASCAL VOC which had just 20 classes and 19,737 images (in 2010). === Significance for deep learning === On 30 September 2012, a convolutional neural network (CNN) called AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up. Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training, an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest, having 3.57% error on the test set. Andrej Karpathy estimated in 2014 that with concentrated effort, he could reach 5.1% error rate, and ~10 people from his lab reached ~12-13% with less effort. It was estimated that with maximal effort, a human could reach 2.4%. == Dataset == ImageNet crowdsources its annotation process. Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification. In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute. The original plan of the full ImageNet would have roughly 50M clean, diverse and full resolution images spread over approximately 50K synsets. This was not achieved. The summary statistics given on April 30, 2010: Total number of non-empty synsets: 21841 Total number of images: 14,197,122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million === Categories === The categories of ImageNet were filtered from the WordNet concepts. Each concept, since it can contain multiple synonyms (for example, "kitty" and "young cat"), so each concept is called a "synonym set" or "synset". There were more than 100,000 synsets in WordNet 3.0, majority of them are nouns (80,000+). The ImageNet dataset filtered these to 21,841 synsets that are countable nouns that can be visually illustrated. Each synset in WordNet 3.0 has a "WordNet ID" (wnid), which is a concatenation of part of speech and an "offset" (a unique identifying number). Every wnid starts with "n" because ImageNet only includes nouns. For example, the wnid of synset "dog, domestic dog, Canis familiaris" is "n02084071". The categories in ImageNet fall into 9 levels, from level 1 (such as "mammal") to level 9 (such as "German shepherd"). === Image format === The images were scraped from online image search (Google, Picsearch, MSN, Yahoo, Flickr, etc) using synonyms in multiple languages. For example: German shepherd, German police dog, German shepherd dog, Alsatian, ovejero alemán, pastore tedesco, 德国牧羊犬. ImageNet consists of images in RGB format with varying resolutions. For example, in ImageNet 2012, "fish" category, the resolution ranges from 4288 x 2848 to 75 x 56. In machine learning, these are typically preprocessed into a standard constant resolution, and whitened, before further processing by neural networks. For example, in PyTorch, ImageNet images are by default normalized by dividing the pixel values so that they fall between 0 and 1, then subtracting by [0.485, 0.456, 0.406], then dividing by [0.229, 0.224, 0.225]. These are the mean and standard deviations for ImageNet, so this whitens the input data. === Labels and annotations === Each image is labelled with exactly one wnid. Dense SIFT features (raw SIFT descriptors, quantized codewords, and coordinates of each descriptor/codeword) for ImageNet-1K were available for download, designed for bag of visual words. The bounding boxes of objects were available for about 3000 popular synsets with on average 150 images in each synset. Furthermore, some images have attributes. They released 25 attributes for ~400 popular synsets: Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow Pattern: spotted, striped Shape: long, round, rectangular, square Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet === ImageNet-21K === The full original dataset is referred to as ImageNet-21K. ImageNet-21k contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22k. The full ImageNet-21k was released in Fall of 2011, as fall11_whole.tar. There is no official train-validation-test split for ImageNet-21k. Some classes contain only 1-10 samples, while others contain thousands. === ImageNet-1K === There are various subsets of the ImageNet dataset used in various context, sometimes referred to as "versions". One of the most highly used subsets of ImageNet is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images. Each category in ImageNet-1K is a leaf category, meaning that there are no child nodes below it, unlike ImageNet-21K. For example, in ImageNet-21K, there are some images categorized as simply "mammal", whereas in ImageNet-1K, there are only images categorized as things like "German shepherd", since there are no child-words below "German shepherd". === Later developments === In the WordNet they built ImageNet on, there were 2832 synsets in the "person" subtree. During 2018--2020 period, they removed the download of the ImageNet-21k as they went through extensive filtering in these person synsets. Out of these 2832 synsets, 1593 were deemed "potentially offensive". Out of the remaining 1239, 1081 were deemed not really "visual". The result was that only 158 syn
General time- and transfer constant analysis
The general time- and transfer-constants (TTC) analysis is the generalized version of the Cochran-Grabel (CG) method, which itself is the generalized version of zero-value time-constants (ZVT), which in turn is the generalization of the open-circuit time constant method (OCT). While the other methods mentioned provide varying terms of only the denominator of an arbitrary transfer function, TTC can be used to determine every term both in the numerator and the denominator. Its denominator terms are the same as that of Cochran-Grabel method, when stated in terms of time constants (when expressed in Rosenstark notation). however, the numerator terms are determined using a combination of transfer constants and time constants, where the time constants are the same as those in CG method. Transfer constants are low-frequency ratios of the output variable to input variable under different open- and short-circuited active elements. In general, a transfer function (which can characterize gain, admittance, impedance, trans-impedance, etc., based on the choice of the input and output variables) can be written as: H ( s ) = a 0 + a 1 s + a 2 s 2 + … + a m s m 1 + b 1 s + b 2 s 2 + … + b n s n {\displaystyle H(s)={\frac {a_{0}+a_{1}s+a_{2}s^{2}+\ldots +a_{m}s^{m}}{1+b_{1}s+b_{2}s^{2}+\ldots +b_{n}s^{n}}}} == The denominator terms == The first denominator term b 1 {\textstyle b_{1}} can be expressed as the sum of zero value time constants (ZVTs): b 1 = ∑ i = 1 N τ i 0 {\displaystyle b_{1}=\sum _{i=1}^{N}\tau _{i}^{0}} where τ i 0 {\textstyle \tau _{i}^{0}} is the time constant associated with the reactive element i {\textstyle i} when all the other sources are zero-valued (hence the superscript '0'). Setting a capacitor value to zero corresponds to an open circuit, while a zero-valued inductor is a short circuit. So for calculation of the τ i 0 {\textstyle \tau _{i}^{0}} , all other capacitors are open-circuited and all other inductors are short-circuited. This is the essence of the ZVT method, which reduces to OCT when only capacitors are involved. All independent sources are also zero-valued during the time constant calculations (voltage sources short-circuited and current source open-circuited). In this case, if the element in question (element i {\textstyle i} ) is a capacitor, the time constant is given by τ i 0 = R i 0 C i {\displaystyle \tau _{i}^{0}=R_{i}^{0}C_{i}} and when element i {\textstyle i} is an inductor is it given by: τ i 0 = L i / R i 0 {\displaystyle \tau _{i}^{0}=L_{i}/R_{i}^{0}} . where in both cases, the resistance R i 0 {\textstyle R_{i}^{0}} , is the resistance seen by elements i {\textstyle i} (denoted by subscript), when all the other elements are zero-valued (denoted by the zero superscript). The second-order denominator term is equal to: b 2 = ∑ i = 1 N − 1 ∑ j = i + 1 N τ i 0 τ j i = ∑ i 1 ⩽ i ∑ j < j ⩽ N τ i 0 τ j i {\displaystyle b_{2}=\sum _{i=1}^{N-1}\sum _{j=i+1}^{N}\tau _{i}^{0}\tau _{j}^{i}=\sum _{i}^{1\leqslant i}\sum _{j}^{