AI Assistant Volume Samsung

AI Assistant Volume Samsung — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Tinybop

    Tinybop

    Tinybop is a Brooklyn based publisher of apps for children. == History == Tinybop is a Brooklyn-based children's media company established in 2011 by Raul Gutierrez. App titles are released in two series: the Explorer's Library - a series of science apps and Digital Toys - series of open-ended construction apps. == Published apps == Explorer's Library Titles: The Human Body – An anatomy app for children. Released 2013. The company's first app was illustrated by Kelli Anderson and has been downloaded millions of times. Selected for the American Library Association's Notable Children's Media List in 2022. Named Apple App Store's Best of 2013. Winner of the Digital Ehon Yuichi Kimura Prize for Children's Digital Media. Plants – An app about biomes around the world. Homes – An app about houses around with world. Illustrated by Tuesday Bassen. Winner of the Parents Gold Choice Award for children's apps. Simple Machines – A children's physics app about simple machines. The Earth – An app for children about the geologic Earth illustrated by Sarah Jacoby. Weather – A children's weather app. Skyscrapers – A children's app about building tall buildings. Space – An interactive solar system. Mammals – A children's app about mammals illustrated by Wenjia Tang. Winner of the Digital Ehon Award for Children's Educational media. Coral Reef – An app about marine ecosystems. Winner of an Excellence in Early Learning Digital Media Honor from the American Library Association. State of Matter – An app covering solids, liquids, and gases. Winner of Excellence in Early Learning Digital Media Honor from the American Library Association. Light and Color – An app about light and color. Selected for The American Library Association's Notable Children's Media List 2023. Winner of the 2022 Yoichi Sakakihara Prize for Children's Media. Digital Toys Titles: The Robot Factory – A robot building app for children illustrated by Owen Davey. Apple named The Robot Factory as iPad App of the Year in 2015. The Everything Machine – A visual coding app for children. The Everything Machine was named Apple's Best of 2015. Monsters – A monster creation app illustrated by Tianhua Mao. The Infinite Arcade – An arcade game building app. Me: A Kids Diary – A digital journal for children. Selected for The American Library Association's Notable Children's Media List 2020. The Creature Garden – An app that allows children to create fantastical animals illustrated by Natasha Durley. Selected for The American Library Association's Notable Children's Media List 2021. Things that Go Bump – A multiplayer game set in an enchanted Japanese house, released on Apple Arcade in 2018.

    Read more →
  • Information seeking

    Information seeking

    Information seeking is the process or activity of attempting to obtain information in both human and technological contexts. Information seeking is related to, but different from, information retrieval (IR). == Compared to information retrieval == Traditionally, IR tools have been designed for IR professionals to enable them to effectively and efficiently retrieve information from a source. It is assumed that the information exists in the source and that a well-formed query will retrieve it (and nothing else). It has been argued that laypersons' information seeking on the internet is very different from information retrieval as performed within the IR discourse. Yet, internet search engines are built on IR principles. Since the late 1990s a body of research on how casual users interact with internet search engines has been forming, but the topic is far from fully understood. IR can be said to be technology-oriented, focusing on algorithms and issues such as precision and recall. Information seeking may be understood as a more human-oriented and open-ended process than information retrieval. In information seeking, one does not know whether there exists an answer to one's query, so the process of seeking may provide the learning required to satisfy one's information need. == In different contexts == Much library and information science (LIS) research has focused on the information-seeking practices of practitioners within various fields of professional work. Studies have been carried out into the information-seeking behaviors of librarians, academics, medical professionals, engineers, lawyers and mini-publics(among others). Much of this research has drawn on the work done by Leckie, Pettigrew (now Fisher) and Sylvain, who in 1996 conducted an extensive review of the LIS literature (as well as the literature of other academic fields) on professionals' information seeking. The authors proposed an analytic model of professionals' information seeking behaviour, intended to be generalizable across the professions, thus providing a platform for future research in the area. The model was intended to "prompt new insights... and give rise to more refined and applicable theories of information seeking" (1996, p. 188). The model has been adapted by Wilkinson (2001) who proposes a model of the information seeking of lawyers. Recent studies in this topic address the concept of information-gathering that "provides a broader perspective that adheres better to professionals' work-related reality and desired skills." (Solomon & Bronstein, 2021). == Theories of information-seeking behavior == A variety of theories of information behavior – e.g. Zipf's Principle of Least Effort, Brenda Dervin's Sense Making, Elfreda Chatman's Life in the Round – seek to understand the processes that surround information seeking. In addition, many theories from other disciplines have been applied in investigating an aspect or whole process of information seeking behavior. A review of the literature on information seeking behavior shows that information seeking has generally been accepted as dynamic and non-linear (Foster, 2005; Kuhlthau 2006). People experience the information search process as an interplay of thoughts, feelings and actions (Kuhlthau, 2006). Donald O. Case (2007) also wrote a good book that is a review of the literature. Information seeking has been found to be linked to a variety of interpersonal communication behaviors beyond question-asking, to include strategies such as candidate answers. Robinson's (2010) research suggests that when seeking information at work, people rely on both other people and information repositories (e.g., documents and databases), and spend similar amounts of time consulting each (7.8% and 6.4% of work time, respectively; 14.2% in total). However, the distribution of time among the constituent information seeking stages differs depending on the source. When consulting other people, people spend less time locating the information source and information within that source, similar time understanding the information, and more time problem solving and decision making, than when consulting information repositories. Furthermore, the research found that people spend substantially more time receiving information passively (i.e., information that they have not requested) than actively (i.e., information that they have requested), and this pattern is also reflected when they provide others with information. == Wilson's nested model of conceptual areas == The concepts of information seeking, information retrieval, and information behaviour are objects of investigation of information science. Within this scientific discipline a variety of studies has been undertaken analyzing the interaction of an individual with information sources in case of a specific information need, task, and context. The research models developed in these studies vary in their level of scope. Wilson (1999) therefore developed a nested model of conceptual areas, which visualizes the interrelation of the here mentioned central concepts. Wilson defines models of information behavior to be "statements, often in the form of diagrams, that attempt to describe an information-seeking activity, the causes and consequences of that activity, or the relationships among stages in information-seeking behaviour" (1999: 250).

    Read more →
  • Informationist

    Informationist

    An informationist (or information specialist in context) provides research and knowledge management services in the context of clinical care or biomedical research. Although there is no one educational pathway or formalized set of skills or knowledge for informationists, one way to think of the informationist is as one who possesses the knowledge and skill of a medical librarian with extensive research specialization and some formal clinical or public health education that goes beyond on-the-job osmosis. Medical librarians and other biomedical professional organizations have been exploring the possibilities for evaluating how informationists are being used and whether their activities supplement or replace medical library activity. More generally, an informationist is a professional who works with information within a particular business, analytic or scientific context to drive toward outcomes based on evidence, analysis, prediction and execution. For example, an extension of the term is increasingly emerging in financial services, life sciences and health care industries. Though still nascently in use, its adoption applies to individuals with extensive industry expertise, acute familiarity with organizational structures and processes, deep domain level information mastery and information systems technical savvy. Informationists in this context support transformational initiatives within and across functional areas of an enterprise as architects, governance experts, continuous improvement advocates and strategists. == Background == The term was proposed in 2000 by Davidoff & Florance. Their editorial suggested that physicians should be delegating their information needs to informationists, just as they currently order CT scans from radiologists or cardiac catheterizations from cardiologists. They conceived of an information professional who was embedded in (and indeed, supported by) the clinical departments. Supporters of the concept see it as a means for librarians to reinvigorate connections with the faculty/clinicians, as well as provide superior service by dint of informationists' biomedical training. Critics complained that the idea is nothing new; librarians already provide in-depth, high quality information services and clinical medical librarians have been working alongside physicians, nurses and other clinicians for years. Large informationist programs in the U.S. exist at the National Institutes of Health and at Vanderbilt University. Welch Medical Library at Johns Hopkins University (JHU) is developing an informationist service model in which its 10 clinical and public health librarians are moving from serving as liaison librarians for assigned departments toward becoming embedded informationists within their departments. To prepare for the embedded informationist role, librarians are undertaking education as needed to supplement their backgrounds. For example, librarians bring experience in clinical behavior counseling, public health, nursing, and more. Informationist training can then focus upon filling gaps in research methods knowledge more so than on gaining additional knowledge in the librarian's area of expertise. Courses, seminars and workshops being undertaken include those covering systematic reviews, evidence-based medicine, critical appraisal, medical language, anatomy and physiology, biostatistics, and clinical research. The term informationist is related to that of informatician—also informaticist—and many informationists do possess skills in clinical topics, bioinformatics, and biomedical informatics. Harvard University, the University of Pittsburgh, and Washington University in St. Louis are examples of institutional libraries which have hired PhD-level scientists (who may or may not have library degrees) to provide informatics support for biomedical research.

    Read more →
  • Information

    Information

    Information is an abstract concept that refers to something which has the power to inform. At the most fundamental level, it pertains to the interpretation (perhaps formally) of that which may be sensed, or their abstractions. Any natural process that is not completely random and any observable pattern in any medium can be said to convey some amount of information. Whereas digital signals and other data use discrete signs to convey information, other phenomena and artifacts such as analogue signals, poems, pictures, music or other sounds, and currents convey information in a more continuous form. Information is not knowledge itself, but the meaning that may be derived from a representation through interpretation. The concept of information is relevant to and connected with various concepts, including constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, proposition, representation, and entropy. Information is often processed iteratively: Data available at one step are processed into information to be interpreted and processed at the next step. For example, in written text each symbol or letter conveys information relevant to the word it is part of, each word conveys information relevant to the phrase it is part of, each phrase conveys information relevant to the sentence it is part of, and so on until at the final step information is interpreted and becomes knowledge in a given domain. In a digital signal, bits may be interpreted into the symbols, letters, numbers, or structures that convey the information available at the next level up. The key characteristic of information is that it is subject to interpretation and processing. The derivation of information from a signal or message may be thought of as the resolution of ambiguity or uncertainty that arises during the interpretation of patterns within the signal or message. Information may be structured as data. Redundant data can be compressed up to an optimal size, which is the theoretical limit of compression. The information available through a collection of data may be derived by analysis. For example, a restaurant collects data from every customer order. That information may be analyzed to produce knowledge that is put to use when the business subsequently wants to identify the most popular or least popular dish. Information can be transmitted in time, via data storage, and space, via communication and telecommunication. Information is expressed either as the content of a message or through direct or indirect observation. That which is perceived can be construed as a message in its own right, and in that sense, all information is always conveyed as the content of a message. Information can be encoded into various forms for transmission and interpretation (for example, information may be encoded into a sequence of signs, or transmitted via a signal). It can also be encrypted for safe storage and communication. The uncertainty of an event is measured by its probability of occurrence. Uncertainty is proportional to the negative logarithm of the probability of occurrence. Information theory takes advantage of this by concluding that more uncertain events require more information to resolve their uncertainty. The bit is the standard unit of information. It is 'that which reduces uncertainty by half'. Other units such as the nat may be used. For example, the information encoded in one "fair" coin flip is log2(2/1) = 1 bit, and in two fair coin flips is log2(4/1) = 2 bits. A 2011 Science article estimates that 97% of technologically stored information was already in digital bits in 2007 and that the year 2002 was the beginning of the digital age for information storage (with digital storage capacity bypassing analogue for the first time). == Etymology and history of the concept == The English word "information" comes from Middle French enformacion/informacion/information 'a criminal investigation' and its etymon, Latin informatiō(n) 'conception, teaching, creation'. In English, "information" is an uncountable mass noun. References on "formation or molding of the mind or character, training, instruction, teaching" date from the 14th century in both English (according to Oxford English Dictionary) and other European languages. In the transition from Middle Ages to Modernity the use of the concept of information reflected a fundamental turn in epistemological basis – from "giving a (substantial) form to matter" to "communicating something to someone". Peters (1988, pp. 12–13) concludes: Information was readily deployed in empiricist psychology (though it played a less important role than other words such as impression or idea) because it seemed to describe the mechanics of sensation: objects in the world inform the senses. But sensation is entirely different from "form" – the one is sensual, the other intellectual; the one is subjective, the other objective. My sensation of things is fleeting, elusive, and idiosyncratic. For Hume, especially, sensory experience is a swirl of impressions cut off from any sure link to the real world... In any case, the empiricist problematic was how the mind is informed by sensations of the world. At first informed meant shaped by; later it came to mean received reports from. As its site of action drifted from cosmos to consciousness, the term's sense shifted from unities (Aristotle's forms) to units (of sensation). Information came less and less to refer to internal ordering or formation, since empiricism allowed for no preexisting intellectual forms outside of sensation itself. Instead, information came to refer to the fragmentary, fluctuating, haphazard stuff of sense. Information, like the early modern worldview in general, shifted from a divinely ordered cosmos to a system governed by the motion of corpuscles. Under the tutelage of empiricism, information gradually moved from structure to stuff, from form to substance, from intellectual order to sensory impulses. In the modern era, the most important influence on the concept of information is derived from the Information theory developed by Claude Shannon and others. This theory, however, reflects a fundamental contradiction. Northrup (1993) wrote: Thus, actually two conflicting metaphors are being used: The well-known metaphor of information as a quantity, like water in the water-pipe, is at work, but so is a second metaphor, that of information as a choice, a choice made by :an information provider, and a forced choice made by an :information receiver. Actually, the second metaphor implies that the information sent isn't necessarily equal to the information received, because any choice implies a comparison with a list of possibilities, i.e., a list of possible meanings. Here, meaning is involved, thus spoiling the idea of information as a pure "Ding an sich." Thus, much of the confusion regarding the concept of information seems to be related to the basic confusion of metaphors in Shannon's theory: is information an autonomous quantity, or is information always per SE information to an observer? Actually, I don't think that Shannon himself chose one of the two definitions. Logically speaking, his theory implied information as a subjective phenomenon. But this had so wide-ranging epistemological impacts that Shannon didn't seem to fully realize this logical fact. Consequently, he continued to use metaphors about information as if it were an objective substance. This is the basic, inherent contradiction in Shannon's information theory." (Northrup, 1993, p. 5). In their seminal book The Study of Information: Interdisciplinary Messages, Almach and Mansfield (1983) collected key views on the interdisciplinary controversy in computer science, artificial intelligence, library and information science, linguistics, psychology, and physics, as well as in the social sciences. Almach (1983, p. 660) himself disagrees with the use of the concept of information in the context of signal transmission, the basic senses of information in his view all referring "to telling something or to the something that is being told. Information is addressed to human minds and is received by human minds." All other senses, including its use with regard to nonhuman organisms as well to society as a whole, are, according to Machlup, metaphoric and, as in the case of cybernetics, anthropomorphic. Hjørland (2007) describes the fundamental difference between objective and subjective views of information and argues that the subjective view has been supported by, among others, Bateson, Yovits, Span-Hansen, Brier, Buckland, Goguen, and Hjørland. Hjørland provided the following example: A stone on a field could contain different information for different people (or from one situation to another). It is not possible for information systems to map all the stone's possible information for every individual. Nor is any one mapping the one "true" mapping. But peop

    Read more →
  • Graphics

    Graphics

    Graphics (from Ancient Greek γραφικός (graphikós) 'pertaining to drawing, painting, writing, etc.') are visual images or designs on some surface, such as a wall, canvas, screen, paper, or stone, to inform, illustrate, or entertain. In contemporary usage, it includes a pictorial representation of data, as in design and manufacture, in typesetting and the graphic arts, and in educational and recreational software. Images that are generated by a computer are called computer graphics. Examples are photographs, drawings, line art, mathematical graphs, line graphs, charts, diagrams, typography, numbers, symbols, geometric designs, maps, engineering drawings, or other images. Graphics often combine text, illustration, and color. Graphic design may consist of the deliberate selection, creation, or arrangement of typography alone, as in a brochure, flyer, poster, web site, or book without any other element. The objective can be clarity or effective communication, association with other cultural elements, or merely the creation of a distinctive style. Graphics can be functional or artistic. The latter can be a recorded version, such as a photograph, or an interpretation by a scientist to highlight essential features, or an artist, in which case the distinction with imaginary graphics may become blurred. It can also be used for architecture. == History == The earliest graphics known to anthropologists studying prehistoric periods are cave paintings and markings on boulders, bone, ivory, and antlers, which were created during the Upper Palaeolithic period from 40,000 to 10,000 B.C. or earlier. Many of these were found to record astronomical, seasonal, and chronological details. Some of the earliest graphics and drawings are known to the modern world, from almost 6,000 years ago, are that of engraved stone tablets and ceramic cylinder seals, marking the beginning of the historical periods and the keeping of records for accounting and inventory purposes. Records from Egypt predate these and papyrus was used by the Egyptians as a material on which to plan the building of pyramids; they also used slabs of limestone and wood. From 600 to 250 BC, the Greeks played a major role in geometry. They used graphics to represent their mathematical theories such as the Circle Theorem and the Pythagorean theorem. In art, "graphics" is often used to distinguish work in a monotone and made up of lines, as opposed to painting. === Drawing === Drawing generally involves making marks on a surface by applying pressure from a tool or moving a tool across a surface. In which a tool is always used as if there were no tools it would be art. Graphical drawing is an instrumental guided drawing. === Printmaking === Woodblock printing, including images is first seen in China after paper was invented (about A.D. 105). In the West, the main techniques have been woodcut, engraving and etching, but there are many others. ==== Etching ==== Etching is an intaglio method of printmaking in which the image is incised into the surface of a metal plate using an acid. The acid eats the metal, leaving behind roughened areas, or, if the surface exposed to the acid is very thin, burning a line into the plate. The use of the process in printmaking is believed to have been invented by Daniel Hopfer (c. 1470–1536) of Augsburg, Germany, who decorated armour in this way. Etching is also used in the manufacturing of printed circuit boards and semiconductor devices. === Line art === Line art is a rather non-specific term sometimes used for any image that consists of distinct straight and curved lines placed against a (usually plain) background, without gradations in shade (darkness) or hue (color) to represent two-dimensional or three-dimensional objects. Line art is usually monochromatic, although lines may be of different colors. === Illustration === An illustration is a visual representation such as a drawing, painting, photograph or other work of art that stresses the subject more than form. The aim of an illustration is to elucidate or decorate a story, poem or piece of textual information (such as a newspaper article), traditionally by providing a visual representation of something described in the text. The editorial cartoon, also known as a political cartoon, is an illustration containing a political or social message. Illustrations can be used to display a wide range of subject matter and serve a variety of functions, such as: giving faces to characters in a story displaying a number of examples of an item described in an academic textbook (e.g. A Typology) visualizing step-wise sets of instructions in a technical manual communicating subtle thematic tone in a narrative linking brands to the ideas of human expression, individuality, and creativity making a reader laugh or smile for fun (to make laugh) funny === Graphs === A graph or chart is a graphic that represents tabular or numeric data. Charts are often used to make it easier to understand large quantities of data and the relationships between different parts of the data. === Diagrams === A diagram is a simplified and structured visual representation of concepts, ideas, constructions, relations, statistical data, etc., used to visualize and clarify the topic. === Symbols === A symbol, in its basic sense, is a representation of a concept or quantity; i.e., an idea, object, concept, quality, etc. In more psychological and philosophical terms, all concepts are symbolic in nature, and representations for these concepts are simply token artifacts that are allegorical to (but do not directly codify) a symbolic meaning, or symbolism. === Maps === A map is a simplified depiction of a space, a navigational aid which highlights relations between objects within that space. Usually, a map is a two-dimensional, geometrically accurate representation of a three-dimensional space. One of the first 'modern' maps was made by Waldseemüller. === Photography === One difference between photography and other forms of graphics is that a photographer, in principle, just records a single moment in reality, with seemingly no interpretation. However, a photographer can choose the field of view and angle, and may also use other techniques, such as various lenses to choose the view or filters to change the colors. In recent times, digital photography has opened the way to an infinite number of fast, but strong, manipulations. Even in the early days of photography, there was controversy over photographs of enacted scenes that were presented as 'real life' (especially in war photography, where it can be very difficult to record the original events). Shifting the viewer's eyes ever so slightly with simple pinpricks in the negative could have a dramatic effect. The choice of the field of view can have a strong effect, effectively 'censoring out' other parts of the scene, accomplished by cropping them out or simply not including them in the photograph. This even touches on the philosophical question of what reality is. The human brain processes information based on previous experience, making us see what we want to see or what we were taught to see. Photography does the same, although the photographer interprets the scene for their viewer. === Engineering drawings === An engineering drawing is a type of drawing and is technical in nature, used to fully and clearly define requirements for engineered items. It is usually created in accordance with standardized conventions for layout, nomenclature, interpretation, appearance (such as typefaces and line styles), size, etc. === Computer graphics === There are two types of computer graphics: raster graphics, where each pixel is separately defined (as in a digital photograph), and vector graphics, where mathematical formulas are used to draw lines and shapes, which are then interpreted at the viewer's end to produce the graphic. Using vectors results in infinitely sharp graphics and often smaller files, but, when complex, like vectors take time to render and may have larger file sizes than a raster equivalent. In 1950, the first computer-driven display was attached to MIT's Whirlwind I computer to generate simple pictures. This was followed by MIT's TX-0 and TX-2, interactive computing which increased interest in computer graphics during the late 1950s. In 1962, Ivan Sutherland invented Sketchpad, an innovative program that influenced alternative forms of interaction with computers. In the mid-1960s, large computer graphics research projects were begun at MIT, General Motors, Bell Labs, and Lockheed Corporation. Douglas T. Ross of MIT developed an advanced compiler language for graphics programming. S.A.Coons, also at MIT, and J. C. Ferguson at Boeing, began work in sculptured surfaces. GM developed their DAC-1 system, and other companies, such as Douglas, Lockheed, and McDonnell, also made significant developments. In 1968, ray tracing was first described by Arthur Appel of the IBM Research Center, Yorktown Heights, N

    Read more →
  • Reverse data management

    Reverse data management

    Reverse data management describes a branch and set of research questions in relational database theory that aim to reverse the common focus of standard data management. Instead of focusing on the "forward" transformation of an input databases (a set of relational tables) to an output table, which is the main focus of standard query evaluation, reverse data management reverses that focus and studies the possible input database transformations that would achieve a desired output. Usually the objective is to find an intervention (a deletion, addition, or change of tuples) of minimal size, in order to achieve a particular change in the output. The problem has been studied at least since the 1980s, but has received renewed attention due to an influential paper in the early 2000s that made a connection between provenance and view propagation. The term was coined in a VLDB 2011 vision paper. The problem has been receiving significant attention in recent years due to its connection to computational fairness. == Topics in reverse data management problems == Example topics in reverse data management include: Deletion propagation with source side-effects: Find a minimal number of tuples to delete in the database in order to delete a particular tuple in the output. Deletion propagation with view side-effects: Find a set of tuples to delete in the database in order to delete a particular tuple in the output, while removing the minimal number of other output tuples. Causal responsibility: Find a minimal number of tuples to delete in the database in order to make a particular input tuple counterfactual. This notion is inspired by the notions of actual cause and causal responsibility from the work of Halpern and Pearl. Resilience: Find a minimal number of tuples to delete in the database in order to make a Boolean query false. The complexity of this problem is identical to the problem of deletion propagation with source-side effects over a different database. Smallest witness problem: Find a minimal number of tuples to keep in the a database (or equivalently, delete a maximal number of tuples) while keeping a particular tuple in the output. Minimum repair: Given a database that violates certain integrity constraints, find a minimal number of tuples to delete in the database in order to fulfill all constraints (also called to "repair" the database).

    Read more →
  • Time Warp Edit Distance

    Time Warp Edit Distance

    In the data analysis of time series, Time Warp Edit Distance (TWED) is a measure of similarity (or dissimilarity) between pairs of discrete time series, controlling the relative distortion of the time units of the two series using the physical notion of elasticity. In comparison to other distance measures, (e.g. DTW (dynamic time warping) or LCS (longest common subsequence problem)), TWED is a metric. Its computational time complexity is O ( n 2 ) {\displaystyle O(n^{2})} , but can be drastically reduced in some specific situations by using a corridor to reduce the search space. Its memory space complexity can be reduced to O ( n ) {\displaystyle O(n)} . It was first proposed in 2009 by P.-F. Marteau. == Definition == δ λ , ν ( A 1 p , B 1 q ) = M i n { δ λ , ν ( A 1 p − 1 , B 1 q ) + Γ ( a p ′ → Λ ) d e l e t e i n A δ λ , ν ( A 1 p − 1 , B 1 q − 1 ) + Γ ( a p ′ → b q ′ ) m a t c h o r s u b s t i t u t i o n δ λ , ν ( A 1 p , B 1 q − 1 ) + Γ ( Λ → b q ′ ) d e l e t e i n B {\displaystyle \delta _{\lambda ,\nu }(A_{1}^{p},B_{1}^{q})=Min{\begin{cases}\delta _{\lambda ,\nu }(A_{1}^{p-1},B_{1}^{q})+\Gamma (a_{p}^{'}\to \Lambda )&{\rm {delete\ in\ A}}\\\delta _{\lambda ,\nu }(A_{1}^{p-1},B_{1}^{q-1})+\Gamma (a_{p}^{'}\to b_{q}^{'})&{\rm {match\ or\ substitution}}\\\delta _{\lambda ,\nu }(A_{1}^{p},B_{1}^{q-1})+\Gamma (\Lambda \to b_{q}^{'})&{\rm {delete\ in\ B}}\end{cases}}} whereas Γ ( α p ′ → Λ ) = d L P ( a p ′ , a p − 1 ′ ) + ν ⋅ ( t a p − t a p − 1 ) + λ {\displaystyle \Gamma (\alpha _{p}^{'}\to \Lambda )=d_{LP}(a_{p}^{'},a_{p-1}^{'})+\nu \cdot (t_{a_{p}}-t_{a_{p-1}})+\lambda } Γ ( α p ′ → b q ′ ) = d L P ( a p ′ , b q ′ ) + d L P ( a p − 1 ′ , b q − 1 ′ ) + ν ⋅ ( | t a p − t b q | + | t a p − 1 − t b q − 1 | ) {\displaystyle \Gamma (\alpha _{p}^{'}\to b_{q}^{'})=d_{LP}(a_{p}^{'},b_{q}^{'})+d_{LP}(a_{p-1}^{'},b_{q-1}^{'})+\nu \cdot (|t_{a_{p}}-t_{b_{q}}|+|t_{a_{p-1}}-t_{b_{q-1}}|)} Γ ( Λ → b q ′ ) = d L P ( b p ′ , b p − 1 ′ ) + ν ⋅ ( t b q − t b q − 1 ) + λ {\displaystyle \Gamma (\Lambda \to b_{q}^{'})=d_{LP}(b_{p}^{'},b_{p-1}^{'})+\nu \cdot (t_{b_{q}}-t_{b_{q-1}})+\lambda } Whereas the recursion δ λ , ν {\displaystyle \delta _{\lambda ,\nu }} is initialized as: δ λ , ν ( A 1 0 , B 1 0 ) = 0 , {\displaystyle \delta _{\lambda ,\nu }(A_{1}^{0},B_{1}^{0})=0,} δ λ , ν ( A 1 0 , B 1 j ) = ∞ f o r j ≥ 1 {\displaystyle \delta _{\lambda ,\nu }(A_{1}^{0},B_{1}^{j})=\infty \ {\rm {{for\ }j\geq 1}}} δ λ , ν ( A 1 i , B 1 0 ) = ∞ f o r i ≥ 1 {\displaystyle \delta _{\lambda ,\nu }(A_{1}^{i},B_{1}^{0})=\infty \ {\rm {{for\ }i\geq 1}}} with a 0 ′ = b 0 ′ = 0 {\displaystyle a'_{0}=b'_{0}=0} === Implementations === An implementation of the TWED algorithm in C with a Python wrapper is available at TWED is also implemented into the Time Series Subsequence Search Python package (TSSEARCH for short) available at [1]. An R implementation of TWED has been integrated into the TraMineR, a R package for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data. Additionally, cuTWED is a CUDA- accelerated implementation of TWED which uses an improved algorithm due to G. Wright (2020). This method is linear in memory and massively parallelized. cuTWED is written in CUDA C/C++, comes with Python bindings, and also includes Python bindings for Marteau's reference C implementation. ==== Python ==== Backtracking, to find the most cost-efficient path: ==== MATLAB ==== Backtracking, to find the most cost-efficient path:

    Read more →
  • Higuchi dimension

    Higuchi dimension

    In fractal geometry, the Higuchi dimension (or Higuchi fractal dimension (HFD)) is an approximate value for the box-counting dimension of the graph of a real-valued function or time series. This value is obtained via an algorithmic approximation so one also talks about the Higuchi method. It has many applications in science and engineering and has been applied to subjects like characterizing primary waves in seismograms, clinical neurophysiology and analyzing changes in the electroencephalogram in Alzheimer's disease. == Formulation of the method == The original formulation of the method is due to T. Higuchi. Given a time series X : { 1 , … , N } → R {\displaystyle X:\{1,\dots ,N\}\to \mathbb {R} } consisting of N {\displaystyle N} data points and a parameter k m a x ≥ 2 {\displaystyle k_{\mathrm {max} }\geq 2} the Higuchi Fractal dimension (HFD) of X {\displaystyle X} is calculated in the following way: For each k ∈ { 1 , … , k m a x } {\displaystyle k\in \{1,\dots ,k_{\mathrm {max} }}\} and m ∈ { 1 , … , k } {\displaystyle m\in \{1,\dots ,k}\} define the length L m ( k ) {\displaystyle L_{m}(k)} by L m ( k ) = N − 1 ⌊ N − m k ⌋ k 2 ∑ i = 1 ⌊ N − m k ⌋ | X N ( m + i k ) − X N ( m + ( i − 1 ) k ) | . {\displaystyle L_{m}(k)={\frac {N-1}{\lfloor {\frac {N-m}{k}}\rfloor k^{2}}}\sum _{i=1}^{\lfloor {\frac {N-m}{k}}\rfloor }|X_{N}(m+ik)-X_{N}(m+(i-1)k)|.} The length L ( k ) {\displaystyle L(k)} is defined by the average value of the k {\displaystyle k} lengths L 1 ( k ) , … , L k ( k ) {\displaystyle L_{1}(k),\dots ,L_{k}(k)} , L ( k ) = 1 k ∑ m = 1 k L m ( k ) . {\displaystyle L(k)={\frac {1}{k}}\sum _{m=1}^{k}L_{m}(k).} The slope of the best-fitting linear function through the data points { ( log ⁡ 1 k , log ⁡ L ( k ) ) } {\displaystyle \left\{\left(\log {\frac {1}{k}},\log L(k)\right)\right\}} is defined to be the Higuchi fractal dimension of the time-series X {\displaystyle X} . == Application to functions == For a real-valued function f : [ 0 , 1 ] → R {\displaystyle f:[0,1]\to \mathbb {R} } one can partition the unit interval [ 0 , 1 ] {\displaystyle [0,1]} into N {\displaystyle N} equidistantly intervals [ t j , t j + 1 ) {\displaystyle [t_{j},t_{j+1})} and apply the Higuchi algorithm to the times series X ( j ) = f ( t j ) {\displaystyle X(j)=f(t_{j})} . This results into the Higuchi fractal dimension of the function f {\displaystyle f} . It was shown that in this case the Higuchi method yields an approximation for the box-counting dimension of the graph of f {\displaystyle f} as it follows a geometrical approach (see Liehr & Massopust 2020). == Robustness and stability == Applications to fractional Brownian functions and the Weierstrass function reveal that the Higuchi fractal dimension can be close to the box-dimension. On the other hand, the method can be unstable in the case where the data X ( 1 ) , … , X ( N ) {\displaystyle X(1),\dots ,X(N)} are periodic or if subsets of it lie on a horizontal line (see Liehr & Massopust 2020).

    Read more →
  • Concurrent MetateM

    Concurrent MetateM

    Concurrent MetateM is a multi-agent language in which each agent is programmed using a set of (augmented) temporal logic specifications of the behaviour it should exhibit. These specifications are executed directly to generate the behaviour of the agent. As a result, there is no risk of invalidating the logic as with systems where logical specification must first be translated to a lower-level implementation. The root of the MetateM concept is Gabbay's separation theorem; any arbitrary temporal logic formula can be rewritten in a logically equivalent past → future form. Execution proceeds by a process of continually matching rules against a history, and firing those rules when antecedents are satisfied. Any instantiated future-time consequents become commitments which must subsequently be satisfied, iteratively generating a model for the formula made up of the program rules. == Temporal Connectives == The Temporal Connectives of Concurrent MetateM can divided into two categories, as follows: Strict past time connectives: '●' (weak last), '◎' (strong last), '◆' (was), '■' (heretofore), 'S' (since), and 'Z' (zince, or weak since). Present and future time connectives: '◯' (next), '◇' (sometime), '□' (always), 'U' (until), and 'W' (unless). The connectives {◎,●,◆,■,◯,◇,□} are unary; the remainder are binary. === Strict past time connectives === ==== Weak last ==== ●ρ is satisfied now if ρ was true in the previous time. If ●ρ is interpreted at the beginning of time, it is satisfied despite there being no actual previous time. Hence "weak" last. ==== Strong last ==== ◎ρ is satisfied now if ρ was true in the previous time. If ◎ρ is interpreted at the beginning of time, it is not satisfied because there is no actual previous time. Hence "strong" last. ==== Was ==== ◆ρ is satisfied now if ρ was true in any previous moment in time. ==== Heretofore ==== ■ρ is satisfied now if ρ was true in every previous moment in time. ==== Since ==== ρSψ is satisfied now if ψ is true at any previous moment and ρ is true at every moment after that moment. ==== Zince, or weak since ==== ρZψ is satisfied now if (ψ is true at any previous moment and ρ is true at every moment after that moment) OR ψ has not happened in the past. === Present and future time connectives === ==== Next ==== ◯ρ is satisfied now if ρ is true in the next moment in time. ==== Sometime ==== ◇ρ is satisfied now if ρ is true now or in any future moment in time. ==== Always ==== □ρ is satisfied now if ρ is true now and in every future moment in time. ==== Until ==== ρUψ is satisfied now if ψ is true at any future moment and ρ is true at every moment prior. ==== Unless ==== ρWψ is satisfied now if (ψ is true at any future moment and ρ is true at every moment prior) OR ψ does not happen in the future.

    Read more →
  • Divide-and-conquer algorithm

    Divide-and-conquer algorithm

    In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem. The divide-and-conquer technique is the basis of efficient algorithms for many problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g., the Karatsuba algorithm), finding the closest pair of points, syntactic analysis (e.g., top-down parsers), SAT solving, and computing the discrete Fourier transform (FFT). Designing efficient divide-and-conquer algorithms can be difficult. As in mathematical induction, it is often necessary to generalize the problem to make it amenable to a recursive solution. The correctness of a divide-and-conquer algorithm is usually proved by mathematical induction, and its computational cost is often determined by solving recurrence relations. == Divide and conquer == The divide-and-conquer paradigm is often used to find an optimal solution of a problem. Its basic idea is to decompose a given problem into two or more similar, but simpler, subproblems, to solve them in turn, and to compose their solutions to solve the given problem. Problems of sufficient simplicity are solved directly. For example, to sort a given list of n natural numbers, split it into two lists of about n/2 numbers each, sort each of them in turn, and interleave both results appropriately to obtain the sorted version of the given list (see the picture). This approach is known as the merge sort algorithm. The name "divide and conquer" is sometimes applied to algorithms that reduce each problem to only one sub-problem, such as the binary search algorithm for finding a record in a sorted list (or its analogue in numerical computing, the bisection algorithm for root finding). These algorithms can be implemented more efficiently than general divide-and-conquer algorithms; in particular, if they use tail recursion, they can be converted into simple loops. Under this broad definition, however, every algorithm that uses recursion or loops could be regarded as a "divide-and-conquer algorithm". Therefore, some authors consider that the name "divide and conquer" should be used only when each problem may generate two or more subproblems. The name decrease and conquer has been proposed instead for the single-subproblem class. An important application of divide and conquer is in optimization, where if the search space is reduced ("pruned") by a constant factor at each step, the overall algorithm has the same asymptotic complexity as the pruning step, with the constant depending on the pruning factor (by summing the geometric series); this is known as prune and search. == Early historical examples == Early examples of these algorithms are primarily decrease and conquer – the original problem is successively broken down into single subproblems, and indeed can be solved iteratively. Binary search, a decrease-and-conquer algorithm where the subproblems are of roughly half the original size, has a long history. While a clear description of the algorithm on computers appeared in 1946 in an article by John Mauchly, the idea of using a sorted list of items to facilitate searching dates back at least as far as Babylonia in 200 BC. Another ancient decrease-and-conquer algorithm is the Euclidean algorithm to compute the greatest common divisor of two numbers by reducing the numbers to smaller and smaller equivalent subproblems, which dates to several centuries BC. An early example of a divide-and-conquer algorithm with multiple subproblems is Gauss's 1805 description of what is now called the Cooley–Tukey fast Fourier transform (FFT) algorithm, although he did not analyze its operation count quantitatively, and FFTs did not become widespread until they were rediscovered over a century later. An early two-subproblem D&C algorithm that was specifically developed for computers and properly analyzed is the merge sort algorithm, invented by John von Neumann in 1945. Another notable example is the algorithm invented by Anatolii A. Karatsuba in 1960 that could multiply two n-digit numbers in O ( n log 2 ⁡ 3 ) {\displaystyle O(n^{\log _{2}3})} operations (in Big O notation). This algorithm disproved Andrey Kolmogorov's 1956 conjecture that Ω ( n 2 ) {\displaystyle \Omega (n^{2})} operations would be required for that task. As another example of a divide-and-conquer algorithm that did not originally involve computers, Donald Knuth gives the method a post office typically uses to route mail: letters are sorted into separate bags for different geographical areas, each of these bags is itself sorted into batches for smaller sub-regions, and so on until they are delivered. This is related to a radix sort, described for punch-card sorting machines as early as 1929. == Advantages == === Solving difficult problems === Divide and conquer is a powerful tool for solving conceptually difficult problems: all it requires is a way of breaking the problem into sub-problems, of solving the trivial cases, and of combining sub-problems to the original problem. Similarly, decrease and conquer only requires reducing the problem to a single smaller problem, such as the classic Tower of Hanoi puzzle, which reduces moving a tower of height n {\displaystyle n} to move a tower of height n − 1 {\displaystyle n-1} . === Algorithm efficiency === The divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It was the key, for example, to Karatsuba's fast multiplication method, the quicksort and mergesort algorithms, the Strassen algorithm for matrix multiplication, and fast Fourier transforms. In all these examples, the D&C approach led to an improvement in the asymptotic cost of the solution. For example, if (a) the base cases have constant-bounded size, the work of splitting the problem and combining the partial solutions is proportional to the problem's size n {\displaystyle n} , and (b) there is a bounded number p {\displaystyle p} of sub-problems of size ~ n p {\displaystyle {\frac {n}{p}}} at each stage, then the cost of the divide-and-conquer algorithm will be O ( n log p ⁡ n ) {\displaystyle O(n\log _{p}n)} . For other types of divide-and-conquer approaches, running times can also be generalized. For example, when a) the work of splitting the problem and combining the partial solutions take c n {\displaystyle cn} time, where n {\displaystyle n} is the input size and c {\displaystyle c} is some constant; b) when n < 2 {\displaystyle n<2} , the algorithm takes time upper-bounded by c {\displaystyle c} , and c) there are q {\displaystyle q} subproblems where each subproblem has size ~ n 2 {\displaystyle {\frac {n}{2}}} . Then, the running times are as follows: if the number of subproblems q > 2 {\displaystyle q>2} , then the divide-and-conquer algorithm's running time is bounded by O ( n log 2 ⁡ q ) {\displaystyle O(n^{\log _{2}q})} . if the number of subproblems is exactly one, then the divide-and-conquer algorithm's running time is bounded by O ( n ) {\displaystyle O(n)} . If, instead, the work of splitting the problem and combining the partial solutions take c n 2 {\displaystyle cn^{2}} time, and there are 2 subproblems where each has size n 2 {\displaystyle {\frac {n}{2}}} , then the running time of the divide-and-conquer algorithm is bounded by O ( n 2 ) {\displaystyle O(n^{2})} . === Parallelism === Divide-and-conquer algorithms are naturally adapted for execution in multi-processor machines, especially shared-memory systems where the communication of data between processors does not need to be planned in advance because distinct sub-problems can be executed on different processors. === Memory access === Divide-and-conquer algorithms naturally tend to make efficient use of memory caches. The reason is that once a sub-problem is small enough, it and all its sub-problems can, in principle, be solved within the cache, without accessing the slower main memory. An algorithm designed to exploit the cache in this way is called cache-oblivious, because it does not contain the cache size as an explicit parameter. Moreover, D&C algorithms can be designed for important algorithms (e.g., sorting, FFTs, and matrix multiplication) to be optimal cache-oblivious algorithms–they use the cache in a probably optimal way, in an asymptotic sense, regardless of the cache size. In contrast, the traditional approach to exploiting the cache is blocking, as in loop nest optimization, where the problem is explicitly divided into chunks of the appropriate size—this can also use the cache optimally, but only when the algorithm is tuned for the specific cache sizes of a particular machine. The same advantage exists with regards to other hierarchical storage systems, such as NUMA or virtual memory, as well as for multip

    Read more →
  • Data (word)

    Data (word)

    The word data is most often used as a singular collective mass noun in educated everyday usage. However, due to the history and etymology of the word, considerable controversy has existed on whether it should be considered a mass noun used with verbs conjugated in the singular, or should be treated as the plural of the now-rarely-used datum. == Usage in English == In one sense, data is the plural form of datum. Datum actually can also be a count noun with the plural datums (see usage in datum article) that can be used with cardinal numbers (e.g., "80 datums"); data (originally a Latin plural) is not used like a normal count noun with cardinal numbers and can be plural with plural determiners such as these and many, or it can be used as a mass noun with a verb in the singular form. Even when a very small quantity of data is referenced (one number, for example), the phrase piece of data is often used, as opposed to datum. The debate over appropriate usage continues, but "data" as a singular form is far more common. In English, the word datum is still used in the general sense of "an item given". In cartography, geography, nuclear magnetic resonance and technical drawing, it is often used to refer to a single specific reference datum from which distances to all other data are measured. Any measurement or result is a datum, though data point is now far more common. Data is indeed most often used as a singular mass noun in educated everyday usage. Some major newspapers, such as The New York Times, use it either in the singular or plural. In The New York Times, the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared within one day. The Wall Street Journal explicitly allows this usage in its style guide. The Associated Press style guide classifies data as a collective noun that takes the singular when treated as a unit but the plural when referring to individual items (e.g., "The data is sound" and "The data have been carefully collected"). In scientific writing, data is often treated as a plural, as in These data do not support the conclusions, but the word is also used as a singular mass entity like information (e.g., in computing and related disciplines). British usage now widely accepts treating data as singular in standard English, including everyday newspaper usage at least in non-scientific use. UK scientific publishing still prefers treating it as a plural. Some UK university style guides recommend using data for both singular and plural use, and others recommend treating it only as a singular in connection with computers. The IEEE Computer Society allows usage of data as either a mass noun or plural based on author preference, while IEEE in the editorial style manual indicates to always use the plural form. Some professional organizations and style guides require that authors treat data as a plural noun. For example, the Air Force Flight Test Center once stated that the word data is always plural, never singular.

    Read more →
  • Document

    Document

    A document is a written, drawn, presented, or memorialized representation of thought, often the manifestation of non-fictional, as well as fictional, content. The etymology of the word "document" derives from the Latin documentum, which denotes a "teaching" or "lesson": the verb doceō denotes "to teach". Historically, the term "document" was usually used to indicate written proof useful as evidence of a truth or fact. In the Computer Age, the term "document" typically refers to a primarily textual computer file, encompassing its structural and format elements, such as fonts, colors, and images. In the contemporary era, the definition of "document" has expanded beyond its traditional medium, such as paper, to encompass electronic documents as well. History, events, examples, opinions, stories, and creativity can all be expressed in documents. "Documentation" is distinct because it has more denotations than "document". Documents are also distinguished from "realia", which are three-dimensional objects that would otherwise satisfy the definition of "document" because they memorialize or represent thought. Documents are usually considered to be two-dimensional representations. == Abstract definitions == The concept of "document" has been defined by Suzanne Briet as "any concrete or symbolic indication, preserved or recorded, for reconstructing or for proving a phenomenon, whether physical or mental." An often-cited article concludes that "the evolving notion of document" among Jonathan Priest, Paul Otlet, Briet, Walter Schürmeyer, and the other documentalists increasingly emphasized whatever functioned as a document rather than traditional physical forms of documents. The shift to digital technology would seem to make this distinction even more important. David M. Levy has said that an emphasis on the technology of digital documents has impeded our understanding of digital documents as documents. A conventional document, such as a mail message or a technical report, exists physically in digital technology as a string of bits, as does everything else in a digital environment. As an object of study, it has been made into a document. It has become physical evidence by those who study it. "Document" is defined in library and information science and documentation science as a fundamental, abstract idea: the word denotes everything that may be represented or memorialized to serve as evidence. The classic example provided by Briet is an antelope: "An antelope running wild on the plains of Africa should not be considered a document[;] she rules. But if it were to be captured, taken to a zoo and made an object of study, it has been made into a document. It has become physical evidence being used by those who study it. Indeed, scholarly articles written about the antelope are secondary documents, since the antelope itself is the primary document." This opinion has been interpreted as an early expression of actor–network theory. == Kinds == A document can be structured, like tabular documents, lists, forms, or scientific charts, semi-structured like a book or a newspaper article, or unstructured like a handwritten note. Documents are sometimes classified as secret, private, or public. They may also be described as drafts or proofs. When a document is copied, the source is denominated the "original". Documents are used in numerous fields, e.g.: Academia: manuscript, thesis, paper, journal, chart, and technical drawing Media: mock-up, script, image, photography, and newspaper article Administration, law, and politics: application, brief, certificate, commission, constitutional document, form, gazette, identity document, license, manifesto, summons, census, and white paper Business: invoice, request for proposal, proposal, contract, packing slip, manifest, report (detailed and summary), spreadsheet, material safety data sheet, waybill, bill of lading, financial statement, nondisclosure agreement (NDA), mutual nondisclosure agreement, and user guide Geography and planning: topographic map, cadastre, legend, and architectural plan Such standard documents can be drafted based on a template. == Drafting == The page layout of a document is how information is graphically arranged in the space of the document, e.g., on a page. If the appearance of the document is of concern, the page layout is generally the responsibility of a graphic designer. Typography concerns the design of letter and symbol forms and their physical arrangement in the document (see typesetting). Information design concerns the effective communication of information, especially in industrial documents and public signs. Simple textual documents may not require visual design and may be drafted only by an author, clerk, or transcriber. Forms may require a visual design for their initial fields, but not to complete the forms. == Media == Traditionally, the medium of a document was paper and the information was applied to it in ink, either by handwriting (to make a manuscript) or by a mechanical process (e.g., a printing press or laser printer). Today, some short documents also may consist of sheets of paper stapled together. Historically, documents were inscribed with ink on papyrus (starting in ancient Egypt) or parchment; scratched as runes or carved on stone using a sharp tool, e.g., the Tablets of Stone described in the Bible; stamped or incised in clay and then baked to make clay tablets, e.g., in the Sumerian and other Mesopotamian civilizations. The papyrus or parchment was often rolled into a scroll or cut into sheets and bound into a codex (book). Contemporary electronic means of memorializing and displaying documents include: Monitor of a desktop computer, laptop, tablet; optionally with a printer to produce a hard copy; Personal digital assistant; Dedicated e-book device; Electronic paper, typically, using the Portable Document Format (PDF); Information appliance; Digital audio player; and Radio and television service provider. Digital documents usually require a specific file format to be presentable in a specific medium. == In law == Documents in all forms frequently serve as material evidence in criminal and civil proceedings. The forensic analysis of such a document is within the scope of questioned document examination. To catalog and manage the large number of documents that may be produced during litigation, Bates numbering is often applied to all documents in the lawsuit so that each document has a unique, arbitrary, identification number.

    Read more →
  • Trigger list

    Trigger list

    Trigger list in its most general meaning refers to a list whose items are used to initiate ("trigger") certain actions. == United States: Private financial information == In the United States, when a person applies for a mortgage loan, the lender makes a credit inquiry about the potential borrower from the national credit bureaus, Equifax, Experian and TransUnion. Unless the borrower is opted out, the credit bureaus put the applicants onto a "trigger list" of "leads" about persons who are interested in new loans. These lists are sold to numerous lenders all over the United States, and soon after the application the applicant starts receiving offers from all parts of the country. The trigger lists contain a significant amount of personal financial information. Among the buyers of trigger lists are "lead generators" which resell filtered information to borrowers, e.g., of people who live in a certain area and have a certain credit score. While the Federal Trade Commission considers the market of "trigger lists" to be a legal business, many people and organizations (such as the National Association of Mortgage Brokers) consider this a serious breach of privacy and lobby for putting this practice under regulatory controls. As of now, American consumers may opt-out from "trigger lists" by calling 1-888-5-OPTOUT (1-888-567-8688). == Nuclear non-proliferation == The Zangger Committee and the Nuclear Suppliers Group maintain lists of items that may contribute to nuclear proliferation; The nuclear non-proliferation treaty forbids its members to export such items to non-treaty members. these items are said to trigger the countries' responsibilities under the NPT, hence the name.

    Read more →
  • Online analytical processing

    Online analytical processing

    In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture. OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.). Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases. OLAP is typically contrasted to OLTP (online transaction processing), which is generally characterized by much less complex queries, in a larger volume, to process transactions rather than for the purpose of business intelligence or reporting. Whereas OLAP systems are mostly optimized for read, OLTP has to process all kinds of queries (read, insert, update and delete). == Overview of OLAP systems == At the core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures that are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. For example: Sales Fact Table +-------------+----------+ | sale_amount | time_id | +-------------+----------+ Time Dimension | 930.10| 1234 |----+ +---------+-------------------+ +-------------+----------+ | | time_id | timestamp | | +---------+-------------------+ +---->| 1234 | 20080902 12:35:43 | +---------+-------------------+ === Multidimensional databases === Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications. Analytical databases use these databases because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem unlike other models. === Aggregations === It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an aggregate function (or aggregation function). The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data. Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A search algorithm. Some aggregation functions can be computed for the entire OLAP cube by precomputing values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a divide and conquer algorithm to the multidimensional problem to compute them efficiently. For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell. Functions that can be decomposed in this way are called decomposable aggregation functions, and include COUNT, MAX, MIN, and SUM, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions. In other cases, the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases, the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include DISTINCT COUNT, MEDIAN, and MODE; for example, the median of a set is not the median of medians of subsets. These latter are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible rollouts (large space). == Types == OLAP systems have been traditionally categorized using the following taxonomy. === Multidimensional OLAP (MOLAP) === MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion. Other MOLAP tools, particularly those that implement the functional database model do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache. Advantages of MOLAP Fast query performance due to optimized storage, multidimensional indexing and caching. Smaller on-disk size of data compared to data stored in relational database due to compression techniques. Automated computation of higher-level aggregates of the data. It is very compact for low dimension data se

    Read more →
  • Master data

    Master data

    Master data represents "data about the business entities that provide context for business transactions". The most commonly found categories of master data are parties (individuals and organisations, and their roles, such as customers, suppliers, employees), products, financial structures (such as ledgers and cost centres) and locational concepts. Master data should be distinguished from reference data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. Master data is, by its nature, almost always non-transactional in nature. There exist edge cases where an organization may need to treat certain transactional processes and operations as "master data". This arises, for example, where information about master data entities, such as customers or products, is only contained within transactional data such as orders and receipts and is not housed separately. ISO 8000 is the international standard for data quality and data portability in master data. == Alternative definition == An alternative definition of the term master data is that it represents the business objects that contain the most valuable, agreed upon information shared across an organization. In this sense, it gives context to business activities and transactions, answering questions like who, what, when and how as well as expanding the ability to make sense of these activities through categorizations, groupings and hierarchies. It can cover relatively static reference data, transactional, unstructured, analytical, hierarchical and metadata. What constitutes master data under this definition is therefore not about an essential quality of the data (e.g. it is a business entity that provides context for business transactions), but rather about the context in which the organisation has decided to treat the data. == Externally-defined master data == For most organisations, most or all master data is defined and managed within that organisation. Some master data, however, may be externally defined and managed. This represents the single source of basic business data used across a marketplace, regardless of organisation or location. Thus, it can be used by multiple enterprises within a value chain, facilitating "integration of multiple data sources and literally [putting] everyone in the market on the same page." An example of market master data is the Universal Product Code (UPC) found on consumer products. == Master data management == Curating and managing master data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's master data. Master Data is therefore the focus of the information technology (IT) discipline of master data management (MDM). Without this discipline in place, organisations commonly encounter difficulties with having multiple versions of "the truth" about a business entity, both within individual applications, and distributed across applications.

    Read more →