AI For Business Free Course

AI For Business Free Course — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Dynamic texture

    Dynamic texture

    Dynamic texture ( sometimes referred to as temporal texture) is the texture with motion which can be found in videos of sea-waves, fire, smoke, wavy trees, etc. Dynamic texture has a spatially repetitive pattern with time-varying visual pattern. Modeling and analyzing dynamic texture is a topic of images processing and pattern recognition in computer vision. Extracting features that describe the dynamic texture can be utilized for tasks of images sequences classification, segmentation, recognition and retrieval. Comparing with texture found within static images, analyzing dynamic texture is a challenging problem. It is important that the extracted features from dynamic texture combine motion and appearance description, and also be invariance to some transformation such as rotation, translation and illumination. == Analysis methods of dynamic texture == The methods of dynamic texture recognition can categorized as follows: Methods based on optical flow: by applying optical flow to the dynamic texture, velocity with direction and magnitude can be detected and used to recognize the dynamic texture. Due to simplicity of its computation, it is currently the most popular method. Methods computing geometric properties: this methods track the surfaces of motion trajectories in spatiotemporal domain. Methods based on local spatiotemporal filtering : this methods analyze the local spatiotemporal patterns and its orientation and energy and employ them as feature used for classification. Methods based on global spatiotemporal transform: this method characterize the motion at different scale using wavelets that can decompose the motion into local and global. Model-based methods : These methods aims at generating a model to describe the motion by a set of parameters. == Applications == - Segmenting the sequence images of natural scenes. This helps on differentiate between streets and grass alongside these streets which could be used in the application of navigations. - Motion detection : Dynamic texture features extracted from footage videos can be exploited to detect abnormal crowd activities. - Video classification: video of natural scenes or other scenes that exhibit dynamic textures. - Video retrieval : Dynamic textures can be employed as a feature retrieve videos that contain, for example, sea-waves, smoke, clouds, wavy trees.

    Read more →
  • Chandy–Misra–Haas algorithm resource model

    Chandy–Misra–Haas algorithm resource model

    The Chandy–Misra–Haas algorithm resource model checks for deadlock in a distributed system. It was developed by K. Mani Chandy, Jayadev Misra and Laura M. Haas. == Locally dependent == Consider the n processes P1, P2, P3, P4, P5,, ... ,Pn which are performed in a single system (controller). P1 is locally dependent on Pn, if P1 depends on P2, P2 on P3, so on and Pn−1 on Pn. That is, if P 1 → P 2 → P 3 → … → P n {\displaystyle P_{1}\rightarrow P_{2}\rightarrow P_{3}\rightarrow \ldots \rightarrow P_{n}} , then P 1 {\displaystyle P_{1}} is locally dependent on P n {\displaystyle P_{n}} . If P1 is said to be locally dependent to itself if it is locally dependent on Pn and Pn depends on P1: i.e. if P 1 → P 2 → P 3 → … → P n → P 1 {\displaystyle P_{1}\rightarrow P_{2}\rightarrow P_{3}\rightarrow \ldots \rightarrow P_{n}\rightarrow P_{1}} , then P 1 {\displaystyle P_{1}} is locally dependent on itself. == Description == The algorithm uses a message called probe(i,j,k) to transfer a message from controller of process Pj to controller of process Pk. It specifies a message started by process Pi to find whether a deadlock has occurred or not. Every process Pj maintains a boolean array dependent which contains the information about the processes that depend on it. Initially the values of each array are all "false". === Controller sending a probe === Before sending, the probe checks whether Pj is locally dependent on itself. If so, a deadlock occurs. Otherwise it checks whether Pj, and Pk are in different controllers, are locally dependent and Pj is waiting for the resource that is locked by Pk. Once all the conditions are satisfied it sends the probe. === Controller receiving a probe === On the receiving side, the controller checks whether Pk is performing a task. If so, it neglects the probe. Otherwise, it checks the responses given Pk to Pj and dependentk(i) is false. Once it is verified, it assigns true to dependentk(i). Then it checks whether k is equal to i. If both are equal, a deadlock occurs, otherwise it sends the probe to next dependent process. == Algorithm == In pseudocode, the algorithm works as follows: === Controller sending a probe === if Pj is locally dependent on itself then declare deadlock else for all Pj,Pk such that (i) Pi is locally dependent on Pj, (ii) Pj is waiting for 'Pk and (iii) Pj, Pk are on different controllers. send probe(i, j, k). to home site of Pk === Controller receiving a probe === if (i)Pk is idle / blocked (ii) dependentk(i) = false, and (iii) Pk has not replied to all requests of to Pj then begin "dependents""k"(i) = true; if k == i then declare that Pi is deadlocked else for all Pa,Pb such that (i) Pk is locally dependent on Pa, (ii) Pa is waiting for 'Pb and (iii) Pa, Pb are on different controllers. send probe(i, a, b). to home site of Pb end == Example == P1 initiates deadlock detection. C1 sends the probe saying P2 depends on P3. Once the message is received by C2, it checks whether P3 is idle. P3 is idle because it is locally dependent on P4 and updates dependent3(2) to True. As above, C2 sends probe to C3 and C3 sends probe to C1. At C1, P1 is idle so it update dependent1(1) to True. Therefore, deadlock can be declared. == Complexity == Suppose there are n {\displaystyle n} controllers and m {\displaystyle m} processes, at most m ( n − 1 ) / 2 {\displaystyle m(n-1)/2} messages need to be exchanged to detect a deadlock, with a delay of O ( n ) {\displaystyle O(n)} messages.

    Read more →
  • Pol.is

    Pol.is

    Polis (or Pol.is) is wiki survey software designed for large group collaborations. As a civic technology, Polis allows people to share their opinions and ideas, and its algorithm is intended to elevate ideas that can facilitate better decision-making, especially when there are lots of participants. Polis has been credited for assisting the passage of legislation in Taiwan. Pol.is has been used by governments in the United States, Canada, Singapore, Philippines, Finland, Spain and elsewhere. == History == Pol.is was founded by Colin Megill, Christopher Small, and Michael Bjorkegren after the Occupy Wall Street and Arab Spring movements. In Taiwan, pol.is has been "one of the key parts" of vTaiwan's suite of open-source tools for its citizen engagement efforts arising out of the Sunflower Student Movement. vTaiwan claims that of the 26 national issues related to technology discussed on the platform, 80% led to government action. Pol.is is also utilized by "Join," a national platform for online deliberation run by the Taiwanese government. In 2022, Wired reported that Polis was an influence on the Community Notes project at Twitter. In 2023, Megill advised OpenAI on how to facilitate deliberation at scale in a way that was more efficient than Polis, which still required significant human labor and analysis at the time. He helped to award $1 million in grants to teams working on solving the problem of deliberation at scale. In 2023, Anthropic was also exploring steering model behavior using Polis. In 2025, it helped the county that includes Bowling Green, Kentucky make a 25 year plan by facilitating the collection and review of ideas from thousands of residents, representing 10% of the county. 2,370 of 3,940 unique ideas were agreed-upon by over 80% of survey respondents. Ideas were screened by volunteers if they were redundant to an existing idea, off-topic or obscene. == How it works == Pol.is participants are anonymous and cannot reply directly to others posts, in an effort to avoid personal attacks for users. Its algorithms are designed not for engagement and scrolling, but to find areas of agreement to better understand the nuances of a wide range of opinions. Participants are prompted for ideas and vote on other participants' ideas. == Reception == Andrew Leonard, The Financial Times, and VentureBeat describe Pol.is as a possible antidote to the divisiveness of traditional internet discourse by gamifying consensus. Audrey Tang agreed saying, "Polis is quite well known in that it's a kind of social media that instead of polarizing people to drive so called engagement or addiction or attention, it automatically drives bridge making narratives and statements. So only the ideas that speak to both sides or to multiple sides will gain prominence in Polis." Niall Ferguson argues that the approach to utilize tools like Pol.is and Join in Taiwan empowers ordinary people instead of the elite and protects individual freedoms, providing a contrast to the AI-enhanced panopticon model seen in China. Carl Miller praised the technology as having "gamified finding consensus." Darshana Narayanan, in an op-ed in the Economist, argues that open-source machine-learning-based tools like Polis can help to bypass the influence of special interests or experts. Jamie Susskind cited polis and vTaiwan as a model for democracies, particularly around digital policy issues.

    Read more →
  • Car–Parrinello molecular dynamics

    Car–Parrinello molecular dynamics

    Car–Parrinello molecular dynamics (CPMD) refers to either a method used in molecular dynamics (also known as the Car–Parrinello method) or the computational chemistry software package used to implement this method. The CPMD method is one of the major methods for calculating ab initio molecular dynamics (ab initio MD or AIMD). Ab initio molecular dynamics (AIMD) is a computational method that uses first principles through quantum mechanics to simulate the motion of atoms in a system. It is a type of molecular dynamics (MD) simulation that does not rely on empirical potentials or force fields to describe the interactions between atoms, but rather calculates these interactions entirely from the electronic structure of the system using quantum mechanics. In an ab initio MD simulation, the total energy of the system is calculated at each time step using density functional theory (DFT), Hartree-Fock (HF), or other electronic structure calculation methods. The forces acting on each atom are then determined from the gradient of the energy with respect to the atomic coordinates, and the equations of motion are solved to predict the trajectory of the atoms. AIMD permits chemical bond breaking and forming events to occur and accounts for electronic polarization effect. Therefore, Ab initio MD simulations can be used to study a wide range of phenomena, including the structural, thermodynamic, and dynamic properties of materials and chemical reactions. They are particularly useful for systems that are not well described by empirical potentials or force fields, such as systems with strong electronic correlation or systems with many degrees of freedom. However, ab initio MD simulations are computationally demanding and require significant computational resources. The CPMD method is related to the more common Born–Oppenheimer molecular dynamics (BOMD) method in that the quantum mechanical effect of the electrons is included in the calculation of energy and forces for the classical motion of the nuclei. CPMD and BOMD are different types of AIMD. However, whereas BOMD treats the electronic structure problem within the time-independent Schrödinger equation, CPMD explicitly includes the electrons as active degrees of freedom, via (fictitious) dynamical variables. The software is a parallelized plane wave / pseudopotential implementation of density functional theory, particularly designed for ab initio molecular dynamics. == Car–Parrinello method == The Car–Parrinello method is a type of molecular dynamics, usually employing periodic boundary conditions, planewave basis sets, and density functional theory, proposed by Roberto Car and Michele Parrinello in 1985 while working at SISSA, who were subsequently awarded the Dirac Medal by ICTP in 2009. In contrast to Born–Oppenheimer molecular dynamics wherein the nuclear (ions) degree of freedom are propagated using ionic forces which are calculated at each iteration by approximately solving the electronic problem with conventional matrix diagonalization methods, the Car–Parrinello method explicitly introduces the electronic degrees of freedom as (fictitious) dynamical variables, writing an extended Lagrangian for the system which leads to a system of coupled equations of motion for both ions and electrons. In this way, an explicit electronic minimization at each time step, as done in Born–Oppenheimer MD, is not needed: after an initial standard electronic minimization, the fictitious dynamics of the electrons keeps them on the electronic ground state corresponding to each new ionic configuration visited along the dynamics, thus yielding accurate ionic forces. In order to maintain this adiabaticity condition, it is necessary that the fictitious mass of the electrons is chosen small enough to avoid a significant energy transfer from the ionic to the electronic degrees of freedom. This small fictitious mass in turn requires that the equations of motion are integrated using a smaller time step than the one (1–10 fs) commonly used in Born–Oppenheimer molecular dynamics. Currently, the CPMD method can be applied to systems that consist of a few tens or hundreds of atoms and access timescales on the order of tens of picoseconds. == General approach == In CPMD the core electrons are usually described by a pseudopotential and the wavefunction of the valence electrons are approximated by a plane wave basis set. The ground state electronic density (for fixed nuclei) is calculated self-consistently, usually using the density functional theory method. Kohn-Sham equations are often used to calculate the electronic structure, where electronic orbitals are expanded in a plane-wave basis set. Then, using that density, forces on the nuclei can be computed, to update the trajectories (using, e.g. the Verlet integration algorithm). In addition, however, the coefficients used to obtain the electronic orbital functions can be treated as a set of extra spatial dimensions, and trajectories for the orbitals can be calculated in this context. == Fictitious dynamics == CPMD is an approximation of the Born–Oppenheimer MD (BOMD) method. In BOMD, the electrons' wave function must be minimized via matrix diagonalization at every step in the trajectory. CPMD uses fictitious dynamics to keep the electrons close to the ground state, preventing the need for a costly self-consistent iterative minimization at each time step. The fictitious dynamics relies on the use of a fictitious electron mass (usually in the range of 400 – 800 a.u.) to ensure that there is very little energy transfer from nuclei to electrons, i.e. to ensure adiabaticity. Any increase in the fictitious electron mass resulting in energy transfer would cause the system to leave the ground-state BOMD surface. === Lagrangian === L = 1 2 ( ∑ I n u c l e i M I R ˙ I 2 + μ ∑ i o r b i t a l s ∫ d r | ψ ˙ i ( r , t ) | 2 ) − E [ { ψ i } , { R I } ] + ∑ i j Λ i j ( ∫ d r ψ i ψ j − δ i j ) , {\displaystyle {\mathcal {L}}={\frac {1}{2}}\left(\sum _{I}^{\mathrm {nuclei} }\ M_{I}{\dot {\mathbf {R} }}_{I}^{2}+\mu \sum _{i}^{\mathrm {orbitals} }\int d\mathbf {r} \ |{\dot {\psi }}_{i}(\mathbf {r} ,t)|^{2}\right)-E\left[\{\psi _{i}\},\{\mathbf {R} _{I}\}\right]+\sum _{ij}\Lambda _{ij}\left(\int d\mathbf {r} \ \psi _{i}\psi _{j}-\delta _{ij}\right),} where μ {\displaystyle \mu } is the fictitious mass parameter; E[{ψi},{RI}] is the Kohn–Sham energy density functional, which outputs energy values when given Kohn–Sham orbitals and nuclear positions. === Orthogonality constraint === ∫ d r ψ i ∗ ( r , t ) ψ j ( r , t ) = δ i j , {\displaystyle \int d\mathbf {r} \ \psi _{i}^{}(\mathbf {r} ,t)\psi _{j}(\mathbf {r} ,t)=\delta _{ij},} where δij is the Kronecker delta. === Equations of motion === The equations of motion are obtained by finding the stationary point of the Lagrangian under variations of ψi and RI, with the orthogonality constraint. M I R ¨ I = − ∇ I E [ { ψ i } , { R I } ] {\displaystyle M_{I}{\ddot {\mathbf {R} }}_{I}=-\nabla _{I}\,E\left[\{\psi _{i}\},\{\mathbf {R} _{I}\}\right]} μ ψ ¨ i ( r , t ) = − δ E δ ψ i ∗ ( r , t ) + ∑ j Λ i j ψ j ( r , t ) , {\displaystyle \mu {\ddot {\psi }}_{i}(\mathbf {r} ,t)=-{\frac {\delta E}{\delta \psi _{i}^{}(\mathbf {r} ,t)}}+\sum _{j}\Lambda _{ij}\psi _{j}(\mathbf {r} ,t),} where Λij is a Lagrangian multiplier matrix to comply with the orthonormality constraint. === Born–Oppenheimer limit === In the formal limit where μ → 0, the equations of motion approach Born–Oppenheimer molecular dynamics. == Software packages == There are a number of software packages available for performing AIMD simulations. Some of the most widely used packages include: CP2K: an open-source software package for AIMD. Quantum Espresso: an open-source package for performing DFT calculations. It includes a module for AIMD. VASP: a commercial software package for performing DFT calculations. It includes a module for AIMD. Gaussian: a commercial software package that can perform AIMD. NWChem: an open-source software package for AIMD. LAMMPS: an open-source software package for performing classical and ab initio MD simulations. SIESTA: an open-source software package for AIMD. ORCA: a general-purpose quantum chemistry package. == Applications == Studying the behavior of water across different environments, such as near a hydrophobic graphene sheet. Investigating the structure and dynamics of liquid water at ambient temperature. Solving the heat transfer problems (heat conduction and thermal radiation), such as in Si/Ge superlattices. Probing the proton transfer along hydrogen-bonds in different environments, such as in 1D water chains inside carbon nanotubes. Evaluating the critical point of crystals, composites, and solid-state materials, such as aluminum. Predicting and modelling different phases and phase transitions, such as in the amorphous phase of the phase-change memory material GeSbTe. Studying the combustion of combustibles, such as lignite-water systems. Measuring th

    Read more →
  • Machine translation software usability

    Machine translation software usability

    The sections below give objective criteria for evaluating the usability of machine translation software output. == Stationarity or canonical form == Do repeated translations converge on a single expression in both languages? I.e. does the translation method show stationarity or produce a canonical form? Does the translation become stationary without losing the original meaning? This metric has been criticized as not being well correlated with BLEU (BiLingual Evaluation Understudy) scores. == Adaptive to colloquialism, argot or slang == Is the system adaptive to colloquialism, argot or slang? The French language has many rules for creating words in the speech and writing of popular culture. Two such rules are: (a) The reverse spelling of words such as femme to meuf. (This is called verlan.) (b) The attachment of the suffix -ard to a noun or verb to form a proper noun. For example, the noun faluche means "student hat". The word faluchard formed from faluche colloquially can mean, depending on context, "a group of students", "a gathering of students" and "behavior typical of a student". The Google translator as of 28 December 2006 doesn't derive the constructed words as for example from rule (b), as shown here: Il y a une chorale falucharde mercredi, venez nombreux, les faluchards chantent des paillardes! ==> There is a choral society falucharde Wednesday, come many, the faluchards sing loose-living women! French argot has three levels of usage: familier or friendly, acceptable among friends, family and peers but not at work grossier or swear words, acceptable among friends and peers but not at work or in family verlan or ghetto slang, acceptable among lower classes but not among middle or upper classes The United States National Institute of Standards and Technology conducts annual evaluations [1] Archived 2009-03-22 at the Wayback Machine of machine translation systems based on the BLEU-4 criterion [2]. A combined method called IQmt which incorporates BLEU and additional metrics NIST, GTM, ROUGE and METEOR has been implemented by Gimenez and Amigo [3]. == Well-formed output == Is the output grammatical or well-formed in the target language? Using an interlingua should be helpful in this regard, because with a fixed interlingua one should be able to write a grammatical mapping to the target language from the interlingua. Consider the following Arabic language input and English language translation result from the Google translator as of 27 December 2006 [4]. This Google translator output doesn't parse using a reasonable English grammar: وعن حوادث التدافع عند شعيرة رمي الجمرات -التي كثيرا ما يسقط فيها العديد من الضحايا- أشار الأمير نايف إلى إدخال "تحسينات كثيرة في جسر الجمرات ستمنع بإذن الله حدوث أي تزاحم". ==> And incidents at the push Carbuncles-throwing ritual, which often fall where many of the victims - Prince Nayef pointed to the introduction of "many improvements in bridge Carbuncles God would stop the occurrence of any competing." == Semantics preservation == Do repeated re-translations preserve the semantics of the original sentence? For example, consider the following English input passed multiple times into and out of French using the Google translator as of 27 December 2006: Better a day earlier than a day late. ==> Améliorer un jour plus tôt qu'un jour tard. ==> To improve one day earlier than a day late. ==> Pour améliorer un jour plus tôt qu'un jour tard. ==> To improve one day earlier than a day late. As noted above and in, this kind of round-trip translation is a very unreliable method of evaluation. == Trustworthiness and security == An interesting peculiarity of Google Translate as of 24 January 2008 (corrected as of 25 January 2008) is the following result when translating from English to Spanish, which shows an embedded joke in the English-Spanish dictionary which has some added poignancy given recent events: Heath Ledger is dead ==> Tom Cruise está muerto This raises the issue of trustworthiness when relying on a machine translation system embedded in a Life-critical system in which the translation system has input to a Safety Critical Decision Making process. Conjointly it raises the issue of whether in a given use the software of the machine translation system is safe from hackers. It is not known whether this feature of Google Translate was the result of a joke/hack or perhaps an unintended consequence of the use of a method such as statistical machine translation. Reporters from CNET Networks asked Google for an explanation on January 24, 2008; Google said only that it was an "internal issue with Google Translate". The mistranslation was the subject of much hilarity and speculation on the Internet. If it is an unintended consequence of the use of a method such as statistical machine translation, and not a joke/hack, then this event is a demonstration of a potential source of critical unreliability in the statistical machine translation method. In human translations, in particular on the part of interpreters, selectivity on the part of the translator in performing a translation is often commented on when one of the two parties being served by the interpreter knows both languages. This leads to the issue of whether a particular translation could be considered verifiable. In this case, a converging round-trip translation would be a kind of verification.

    Read more →
  • Two-phase commit protocol

    Two-phase commit protocol

    In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC, tupac) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction. This protocol (a specialised type of consensus protocol) achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases) the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol. In a "normal execution" of any single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases: The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction's participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or aborting the transaction and to vote, either "Yes": commit (if the transaction participant's local portion execution has ended properly), or "No": abort (if a problem has been detected with the local portion), and The commit phase, in which, based on voting of the participants, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the participants. The participants then follow with the needed actions (commit or abort) with their local transactional resources (also called recoverable resources; e.g., database data) and their respective portions in the transaction's other output (if applicable). The two-phase commit (2PC) protocol should not be confused with the two-phase locking (2PL) protocol, a concurrency control protocol. == Assumptions == The protocol works in the following manner: one node is a designated coordinator, which is the master site, and the rest of the nodes in the network are designated the participants. The protocol assumes that: there is stable storage at each node with a write-ahead log, no node crashes forever, the data in the write-ahead log is never lost or corrupted in a crash, and any two nodes can communicate with each other. The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed then data can be lost. The protocol is initiated by the coordinator after the last step of the transaction has been reached. The participants then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the participant. == Basic algorithm == === Commit request (or voting) phase === The coordinator sends a query to commit message to all participants and waits until it has received a reply from all participants. The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log. Each participant replies with: either an agreement message (participant votes Yes to commit), if the participant's actions succeeded; or an abort message (participant votes No to commit), if the participant experiences a failure that will make it impossible to commit. === Commit (or completion) phase === ==== Success ==== If the coordinator received an agreement message from all participants during the commit-request phase: The coordinator sends a commit message to all the participants. Each participant completes the operation, and releases all the locks and resources held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator completes the transaction when all acknowledgements have been received. ==== Failure ==== If any participant votes No during the commit-request phase (or the coordinator's timeout expires): The coordinator sends a rollback message to all the participants. Each participant undoes the transaction using the undo log, and releases the resources and locks held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator undoes the transaction when all acknowledgements have been received. ==== Message flow ==== Coordinator Participant QUERY TO COMMIT --------------------------------> VOTE YES/NO prepare/abort <------------------------------- commit/abort COMMIT/ROLLBACK --------------------------------> ACKNOWLEDGEMENT commit/abort <-------------------------------- end An next to the record type means that the record is forced to stable storage. == Disadvantages == The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an agreement message as a response to the commit-request message from the coordinator, it will block until a commit or rollback is received. A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond. == Implementing the two-phase commit protocol == === Common architecture === In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named transaction managers (TMs; also referred to as 2PC agents or Transaction Processing Monitors), that carry out the protocol's execution for each transaction (e.g., The Open Group's X/Open XA). The databases involved with a distributed transaction, the participants, both the coordinator and participants, register to close TMs (typically residing on respective same network nodes as the participants) for terminating that transaction using 2PC. Each distributed transaction has an ad hoc set of TMs, the TMs to which the transaction participants register. A leader, the coordinator TM, exists for each transaction to coordinate 2PC for it, typically the TM of the coordinator database. However, the coordinator role can be transferred to another TM for performance or reliability reasons. Rather than exchanging 2PC messages among themselves, the participants exchange the messages with their respective TMs. The relevant TMs communicate among themselves to execute the 2PC protocol schema above, "representing" the respective participants, for terminating that transaction. With this architecture the protocol is fully distributed (does not need any central processing component or data structure), and scales up with number of network nodes (network size) effectively. This common architecture is also effective for the distribution of other atomic commitment protocols besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants. === Protocol optimizations === Database research has been done on ways to get most of the benefits of the two-phase commit protocol while reducing costs by protocol optimizations and protocol operations saving under certain system's behavior assumptions. ==== Presumed abort and presumed commit ==== Presumed abort or Presumed commit are common such optimizations. An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol's execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typical

    Read more →
  • Distributed transaction

    Distributed transaction

    A distributed transaction operates within a distributed environment, typically involving multiple nodes across a network depending on the location of the data. A key aspect of distributed transactions is atomicity, which ensures that the transaction is completed in its entirety or not executed at all. It's essential to note that distributed transactions are not limited to databases. The Open Group, a vendor consortium, proposed the X/Open Distributed Transaction Processing Model (X/Open XA), which became a de facto standard for the behavior of transaction model components. Databases are common transactional resources and, often, transactions span a couple of such databases. In this case, a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases which are distributed among different physical locations. The isolation property (the I of ACID) poses a special challenge for multi database transactions, since the (global) serializability property could be violated, even if each database provides it (see also global serializability). In practice most commercial database systems use strong strict two-phase locking (SS2PL) for concurrency control, which ensures global serializability, if all the participating databases employ it. A common algorithm for ensuring correct completion of a distributed transaction is the two-phase commit (2PC). This algorithm is usually applied for updates able to commit in a short period of time, ranging from couple of milliseconds to couple of minutes. There are also long-lived distributed transactions, for example a transaction to book a trip, which consists of booking a flight, a rental car and a hotel. Since booking the flight might take up to a day to get a confirmation, two-phase commit is not applicable here, it will lock the resources for this long. In this case more sophisticated techniques that involve multiple undo levels are used. The way you can undo the hotel booking by calling a desk and cancelling the reservation, a system can be designed to undo certain operations (unless they are irreversibly finished). In practice, long-lived distributed transactions are implemented in systems based on web services. Usually these transactions utilize principles of compensating transactions, Optimism and Isolation Without Locking. The X/Open standard does not cover long-lived distributed transactions. Several technologies, including Jakarta Enterprise Beans and Microsoft Transaction Server fully support distributed transaction standards. == Synchronization == In event-driven architectures, distributed transactions can be synchronized through using request–response paradigm and it can be implemented in two ways: Creating two separate queues: one for requests and the other for replies. The event producer must wait until it receives the response. Creating one dedicated ephemeral queue for each request.

    Read more →
  • Interviewer effect

    Interviewer effect

    The interviewer effect (also called interviewer variance or interviewer error) is the distortion of response to an interviewer-administered data collection effort which results from differential reactions to the social style and personality of interviewers or to their presentation of particular questions. The use of fixed-wording questions is one method of reducing interviewer bias. Anthropological research and case-studies are also affected by the problem, which is exacerbated by the self-fulfilling prophecy, when the researcher is also the interviewer it is also any effect on data gathered from interviewing people that is caused by the behavior or characteristics (real or perceived) of the interviewer. Interviewer effects can also be associated with the characteristics of the interviewer, such as race. Whether black respondents are interviewed by white interviewers or black interviewers has a strong impact on their responses to both attitude questions and behavioral ones. In the latter case, for example, if black respondents are interviewed by black interviewers in pre-election surveys, they are more likely to actually vote in the upcoming election than if they are interviewed by white interviewers. Furthermore, the race of the interviewer can also affect answers to factual questions that might take the form of a test of how informed the respondent is. Black respondents in a survey of political knowledge, for example, get fewer correct answers to factual questions about politics when interviewed by white interviewers than when interviewed by black interviewers. This is consistent with the research literature on stereotype threat, which finds diminished test performance of potentially stigmatised groups when the interviewer or test supervisor is from a perceived higher status group. Interviewer effects can be mitigated somewhat by randomly assigning subjects to different interviewers, or by using tools such as computer-assisted telephone interviewing (CATI).

    Read more →
  • Fatsecret

    Fatsecret

    Fatsecret, commonly styled as fatsecret, is a mobile application, website and API that helps people achieve their weight loss goals and find accurate nutrition information. It also offers a weight loss clinic with coaching and medically supported programs. The platform powers global health apps. == History == Fatsecret was founded in 2006 in Melbourne, Australia by Lenny Moses and Rodney Moses. As of 2019, Lenny serves as the company's CEO. The company is known for its calorie counting and meal tracking app, and by April 2016, the company claimed to have 45 million users of its services. In August 2018, a premium version of its app was released. Since August 2009, the company has operated the Fatsecret Platform API, which allows access to its global food and nutrition database. Fatsecret reportedly had 900,000 downloads of its app in January 2020. In an analysis of several Health & Fitness app subcategories for the United States in January 2021, Fatsecret was reported to have the highest 30 day user retention rate of top Calorie Counter + Meal Planner for Weight Loss apps.

    Read more →
  • Universal Data Element Framework

    Universal Data Element Framework

    The Universal Data Element Framework (UDEF) was a controlled vocabulary developed by The Open Group. It provided a framework for categorizing, naming, and indexing data. It assigned to every item of data a structured alphanumeric tag plus a controlled vocabulary name that describes the meaning of the data. This allowed relating data elements to similar elements defined by other organizations. UDEF defined a Dewey-decimal like code for each concept. For example, an "employee number" is often used in human resource management. It has a UDEF tag a.5_12.35.8 and a controlled vocabulary description "Employee.PERSON_Employer.Assigned.IDENTIFIER". UDEF has been superseded by the Open Data Element Framework (ODEF). == Examples == In an application used by a hospital, the last name and first name of several people could include the following example concepts: Patient Person Family Name – find the word “Patient” under the UDEF object “Person” and find the word “Family” under the UDEF property “Name” Patient Person Given Name – find the word “Patient” under the UDEF object “Person” and find the word “Given” under the UDEF property “Name” Doctor Person Family Name – find the word “Doctor” under the UDEF object “Person” and find the word “Family” under the UDEF property “Name” Doctor Person Given Name – find the word “Doctor” under the UDEF object “Person” and find the word “Given” under the UDEF property “Name” For the examples above, the following UDEF IDs are available: “Patient Person Family Name” the UDEF ID is “au.5_11.10” “Patient Person Given Name” the UDEF ID is “au.5_12.10” “Doctor Person Family Name” the UDEF ID is “aq.5_11.10” “Doctor Person Given Name” the UDEF ID is “aq.5_12.10”

    Read more →
  • Information

    Information

    Information is an abstract concept that refers to something which has the power to inform. At the most fundamental level, it pertains to the interpretation (perhaps formally) of that which may be sensed, or their abstractions. Any natural process that is not completely random and any observable pattern in any medium can be said to convey some amount of information. Whereas digital signals and other data use discrete signs to convey information, other phenomena and artifacts such as analogue signals, poems, pictures, music or other sounds, and currents convey information in a more continuous form. Information is not knowledge itself, but the meaning that may be derived from a representation through interpretation. The concept of information is relevant to and connected with various concepts, including constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, proposition, representation, and entropy. Information is often processed iteratively: Data available at one step are processed into information to be interpreted and processed at the next step. For example, in written text each symbol or letter conveys information relevant to the word it is part of, each word conveys information relevant to the phrase it is part of, each phrase conveys information relevant to the sentence it is part of, and so on until at the final step information is interpreted and becomes knowledge in a given domain. In a digital signal, bits may be interpreted into the symbols, letters, numbers, or structures that convey the information available at the next level up. The key characteristic of information is that it is subject to interpretation and processing. The derivation of information from a signal or message may be thought of as the resolution of ambiguity or uncertainty that arises during the interpretation of patterns within the signal or message. Information may be structured as data. Redundant data can be compressed up to an optimal size, which is the theoretical limit of compression. The information available through a collection of data may be derived by analysis. For example, a restaurant collects data from every customer order. That information may be analyzed to produce knowledge that is put to use when the business subsequently wants to identify the most popular or least popular dish. Information can be transmitted in time, via data storage, and space, via communication and telecommunication. Information is expressed either as the content of a message or through direct or indirect observation. That which is perceived can be construed as a message in its own right, and in that sense, all information is always conveyed as the content of a message. Information can be encoded into various forms for transmission and interpretation (for example, information may be encoded into a sequence of signs, or transmitted via a signal). It can also be encrypted for safe storage and communication. The uncertainty of an event is measured by its probability of occurrence. Uncertainty is proportional to the negative logarithm of the probability of occurrence. Information theory takes advantage of this by concluding that more uncertain events require more information to resolve their uncertainty. The bit is the standard unit of information. It is 'that which reduces uncertainty by half'. Other units such as the nat may be used. For example, the information encoded in one "fair" coin flip is log2(2/1) = 1 bit, and in two fair coin flips is log2(4/1) = 2 bits. A 2011 Science article estimates that 97% of technologically stored information was already in digital bits in 2007 and that the year 2002 was the beginning of the digital age for information storage (with digital storage capacity bypassing analogue for the first time). == Etymology and history of the concept == The English word "information" comes from Middle French enformacion/informacion/information 'a criminal investigation' and its etymon, Latin informatiō(n) 'conception, teaching, creation'. In English, "information" is an uncountable mass noun. References on "formation or molding of the mind or character, training, instruction, teaching" date from the 14th century in both English (according to Oxford English Dictionary) and other European languages. In the transition from Middle Ages to Modernity the use of the concept of information reflected a fundamental turn in epistemological basis – from "giving a (substantial) form to matter" to "communicating something to someone". Peters (1988, pp. 12–13) concludes: Information was readily deployed in empiricist psychology (though it played a less important role than other words such as impression or idea) because it seemed to describe the mechanics of sensation: objects in the world inform the senses. But sensation is entirely different from "form" – the one is sensual, the other intellectual; the one is subjective, the other objective. My sensation of things is fleeting, elusive, and idiosyncratic. For Hume, especially, sensory experience is a swirl of impressions cut off from any sure link to the real world... In any case, the empiricist problematic was how the mind is informed by sensations of the world. At first informed meant shaped by; later it came to mean received reports from. As its site of action drifted from cosmos to consciousness, the term's sense shifted from unities (Aristotle's forms) to units (of sensation). Information came less and less to refer to internal ordering or formation, since empiricism allowed for no preexisting intellectual forms outside of sensation itself. Instead, information came to refer to the fragmentary, fluctuating, haphazard stuff of sense. Information, like the early modern worldview in general, shifted from a divinely ordered cosmos to a system governed by the motion of corpuscles. Under the tutelage of empiricism, information gradually moved from structure to stuff, from form to substance, from intellectual order to sensory impulses. In the modern era, the most important influence on the concept of information is derived from the Information theory developed by Claude Shannon and others. This theory, however, reflects a fundamental contradiction. Northrup (1993) wrote: Thus, actually two conflicting metaphors are being used: The well-known metaphor of information as a quantity, like water in the water-pipe, is at work, but so is a second metaphor, that of information as a choice, a choice made by :an information provider, and a forced choice made by an :information receiver. Actually, the second metaphor implies that the information sent isn't necessarily equal to the information received, because any choice implies a comparison with a list of possibilities, i.e., a list of possible meanings. Here, meaning is involved, thus spoiling the idea of information as a pure "Ding an sich." Thus, much of the confusion regarding the concept of information seems to be related to the basic confusion of metaphors in Shannon's theory: is information an autonomous quantity, or is information always per SE information to an observer? Actually, I don't think that Shannon himself chose one of the two definitions. Logically speaking, his theory implied information as a subjective phenomenon. But this had so wide-ranging epistemological impacts that Shannon didn't seem to fully realize this logical fact. Consequently, he continued to use metaphors about information as if it were an objective substance. This is the basic, inherent contradiction in Shannon's information theory." (Northrup, 1993, p. 5). In their seminal book The Study of Information: Interdisciplinary Messages, Almach and Mansfield (1983) collected key views on the interdisciplinary controversy in computer science, artificial intelligence, library and information science, linguistics, psychology, and physics, as well as in the social sciences. Almach (1983, p. 660) himself disagrees with the use of the concept of information in the context of signal transmission, the basic senses of information in his view all referring "to telling something or to the something that is being told. Information is addressed to human minds and is received by human minds." All other senses, including its use with regard to nonhuman organisms as well to society as a whole, are, according to Machlup, metaphoric and, as in the case of cybernetics, anthropomorphic. Hjørland (2007) describes the fundamental difference between objective and subjective views of information and argues that the subjective view has been supported by, among others, Bateson, Yovits, Span-Hansen, Brier, Buckland, Goguen, and Hjørland. Hjørland provided the following example: A stone on a field could contain different information for different people (or from one situation to another). It is not possible for information systems to map all the stone's possible information for every individual. Nor is any one mapping the one "true" mapping. But peop

    Read more →
  • Chinese speech synthesis

    Chinese speech synthesis

    Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties due to Chinese characters frequently having different pronunciations in different contexts and the complex prosody, which is essential to convey the meaning of words, and sometimes the difficulty in obtaining agreement among native speakers concerning what the correct pronunciation is of certain phonemes. == Concatenation (Ekho and KeyTip) == Recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis) and this can severely affect prosody; these synthesizers are also inflexible in terms of speed and expression. However, because these synthesizers do not rely on a corpus, there is no noticeable degradation in performance when they are given more unusual or awkward phrases. Ekho is an open source TTS which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and experimentally Korean. Some of the Mandarin syllables have been pitched-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials". cjkware.com used to ship a product called KeyTip Putonghua Reader which worked similarly; it contained 120 Megabytes of sound recordings (GSM-compressed to 40 Megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in 6 different prosodies (4 tones, neutral tone, and an extra third-tone recording for use at the end of a phrase). == Lightweight synthesizers (eSpeak and Yuet) == The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has experimented with Mandarin and Cantonese. eSpeak was used by Google Translate from May 2010 until December 2010. The commercial product "Yuet" is also lightweight (it is intended to be suitable for resource-constrained environments like embedded systems); it was written from scratch in ANSI C starting from 2013. Yuet claims a built-in NLP model that does not require a separate dictionary; the speech synthesised by the engine claims clear word boundaries and emphasis on appropriate words. Communication with its author is required to obtain a copy. Both eSpeak and Yuet can synthesis speech for Cantonese and Mandarin from the same input text, and can output the corresponding romanisation (for Cantonese, Yuet uses Yale and eSpeak uses Jyutping; both use Pinyin for Mandarin). eSpeak does not concern itself with word boundaries when these don't change the question of which syllable should be spoken. == Corpus-based == A "corpus-based" approach can sound very natural in most cases but can err in dealing with unusual phrases if they can't be matched with the corpus. The synthesiser engine is typically very large (hundreds or even thousands of megabytes) due to the size of the corpus. === iFlyTek === Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language to produce a mark-up language called Chinese Speech Synthesis Markup Language (CSSML) which can include additional markup to clarify the pronunciation of characters and to add some prosody information. The amount of data involved is not disclosed by iFlyTek but can be seen from the commercial products that iFlyTek have licensed their technology to; for example, Bider's SpeechPlus is a 1.3 Gigabyte download, 1.2 Gigabytes of which is used for the highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); they claim their English synthesis to be "average". The iFlyTek corpus appears to be heavily dependent on Chinese characters, and it is not possible to synthesize from pinyin alone. It is sometimes possible by means of CSSML to add pinyin to the characters to disambiguate between multiple possible pronunciations, but this does not always work. === NeoSpeech === There is an online interactive demonstration for NeoSpeech speech synthesis, which accepts Chinese characters and also pinyin if it's enclosed in their proprietary "VTML" markup. === Mac OS === Mac OS had Chinese speech synthesizers available up to version 9. This was removed in 10.0 and reinstated in 10.7 (Lion). === Historical corpus-based synthesizers (no longer available) === A corpus-based approach was taken by Tsinghua University in SinoSonic, with the Harbin dialect voice data taking 800 Megabytes. This was planned to be offered as a download but the link was never activated. Nowadays, only references to it can be found on Internet Archive. Bell Labs' approach, which was demonstrated online in 1997 but subsequently removed, was described in a monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0-7923-8027-6), and the former employee who was responsible for the project, Chilin Shih (who subsequently worked at the University of Illinois) put some notes about her methods on her website.

    Read more →
  • Graphical Kernel System

    Graphical Kernel System

    The Graphical Kernel System (GKS) is a 2D computer graphics system using vector graphics, introduced in 1977. It was suitable for making line and bar charts and similar tasks. A key concept was cross-system portability, based on an underlying coordinate system that could be represented on almost any hardware. GKS is best known as the basis for the graphics in the GEM GUI system used on the Atari ST and as part of Ventura Publisher. A draft international standard was circulated for review in September 1983. Final ratification of the standard was achieved in 1985, making it the first ISO graphics standard. A 3D system modelled on GKS was introduced as PHIGS, which saw some use in the 1980s and early 1990s. == Overview == GKS provides a set of drawing features for two-dimensional vector graphics suitable for charting and similar duties. The calls are designed to be portable across different programming languages, graphics devices and hardware, so that applications written to use GKS will be readily portable to many platforms and devices. GKS was fairly common on computer workstations in the 1980s and early 1990s. GKS formed the basis of Digital Research's GSX which evolved into VDI, one of the core components of GEM. GEM was the native GUI on the Atari ST and was occasionally seen on PCs, particularly in conjunction with Ventura Publisher. GKS was little used commercially outside these markets, but remains in use in some scientific visualization packages. It is also the underlying API defining the Computer Graphics Metafile. One popular application based on an implementation of GKS is the GR Framework, a C library for high-performance scientific visualization that has become a common plotting backend among Julia users. A main developer and promoter of the GKS was José Luis Encarnação, formerly director of the Fraunhofer Institute for Computer Graphics (IGD) in Darmstadt, Germany. GKS has been standardized in the following documents: ANSI standard ANSI X3.124 of 1985. ISO 7942:1985 standard, revised as ISO 7942:1985/Amd 1:1991 and ISO/IEC 7942-1:1994, as well as ISO/IEC 7942-2:1997, ISO/IEC 7942-3:1999 and ISO/IEC 7942-4:1998 The language bindings are ISO standard ISO 8651. GKS-3D (Graphical Kernel System for Three Dimensions) functional definition is ISO standard ISO 8805, and the corresponding C bindings are ISO/IEC 8806. The functionality of GKS is wrapped up as a data model standard in the STEP standard, section ISO 10303-46.

    Read more →
  • Knowledge graph

    Knowledge graph

    In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities. Since the development of the Semantic Web, knowledge graphs have often been associated with linked open data projects, focusing on the connections between concepts and entities. They are also historically associated with and used by search engines such as Google, Bing, and Yahoo; knowledge engines and question-answering services such as WolframAlpha, Apple's Siri, and Amazon Alexa; and social networks such as LinkedIn and Facebook. Recent developments in data science and machine learning, particularly in graph neural networks, representation learning, and machine learning, have broadened the scope of knowledge graphs beyond their traditional use in search engines and recommender systems. They are increasingly used in scientific research, with notable applications in fields such as genomics, proteomics, and systems biology. == History == The term was coined as early as 1972 by the Austrian linguist Edgar W. Schneider, in a discussion of how to build modular instructional systems for courses. In the late 1980s, the University of Groningen and University of Twente jointly began a project called Knowledge Graphs, focusing on the design of semantic networks with edges restricted to a limited set of relations, to facilitate algebras on the graph. In subsequent decades, the distinction between semantic networks and knowledge graphs was blurred. Some early knowledge graphs were topic-specific. In 1985, Wordnet was founded, capturing semantic relationships between words and meanings – an application of this idea to language itself. In 2005, Marc Wirk founded Geonames to capture relationships between different geographic names and locales and associated entities. In 1998, Andrew Edmonds of Science in Finance Ltd in the UK created a system called ThinkBase that offered fuzzy-logic based reasoning in a graphical context. In 2007, both DBpedia and Freebase were founded as graph-based knowledge repositories for general-purpose knowledge. DBpedia focused exclusively on data extracted from Wikipedia, while Freebase also included a range of public datasets. Neither described themselves as a 'knowledge graph' but developed and described related concepts. In 2012, Google introduced their Knowledge Graph, building on DBpedia and Freebase among other sources. They later incorporated RDFa, Microdata, JSON-LD content extracted from indexed web pages, including the CIA World Factbook, Wikidata, and Wikipedia. Entity and relationship types associated with this knowledge graph have been further organized using terms from the schema.org vocabulary. The Google Knowledge Graph became a complement to string-based search within Google, and its popularity online brought the term into more common use. Since then, several large multinationals have advertised their use of knowledge graphs, further popularising the term. These include Facebook, LinkedIn, Airbnb, Microsoft, Amazon, Uber and eBay. In 2019, IEEE combined its annual international conferences on "Big Knowledge" and "Data Mining and Intelligent Computing" into the International Conference on Knowledge Graph. The development of large language models expanded interest in knowledge graphs as a way to structure information from unstructured text, with advances in language processing enabling their automatic or semi-automatic generation and expansion. The term knowledge graph has since broadened to include the dynamically constructed and adaptive graph structures, which support retrieval, reasoning, and summarization in generative systems. Microsoft Research's GraphRAG (2024) exemplified this development by integrating LLM-generated graphs into retrieval-augmented generation. == Definitions == There is no single commonly accepted definition of a knowledge graph. Most definitions view the topic through a Semantic Web lens and include these features: Flexible relations among knowledge in topical domains: A knowledge graph (i) defines abstract classes and relations of entities in a schema, (ii) mainly describes real world entities and their interrelations, organized in a graph, (iii) allows for potentially interrelating arbitrary entities with each other, and (iv) covers various topical domains. General structure: A network of entities, their semantic types, properties, and relationships. To represent properties, categorical or numerical values are often used. Supporting reasoning over inferred ontologies: A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge. There are, however, many knowledge graph representations for which some of these features are not relevant. For those knowledge graphs, this simpler definition may be more useful: A digital structure that represents knowledge as concepts and the relationships between them (facts). A knowledge graph can include an ontology that allows both humans and machines to understand and reason about its contents. === Implementations === In addition to the above examples, the term has been used to describe open knowledge projects such as YAGO and Wikidata; federations like the Linked Open Data cloud; a range of commercial search tools, including Yahoo's semantic search assistant Spark, Google's Knowledge Graph, and Microsoft's Satori; and the LinkedIn and Facebook entity graphs. The term is also used in the context of note-taking software applications that allow a user to build a personal knowledge graph. The popularization of knowledge graphs and their accompanying methods have led to the development of graph databases such as Neo4j, GraphDB and AgensGraph. These graph databases allow users to easily store data as entities and their interrelationships, and facilitate operations such as data reasoning, node embedding, and ontology development on knowledge bases. In contrast, virtual knowledge graphs do not store information in specialized databases. They rely on an underlying relational database or data lake to answer queries on the graph. Such a virtual knowledge graph system must be properly configured in order to answer the queries correctly. This specific configuration is done through a set of mappings that define the relationship between the elements of the data source and the structure and ontology of the virtual knowledge graph. == Using a knowledge graph for reasoning over data == A knowledge graph formally represents semantics by describing entities and their relationships. Knowledge graphs may make use of ontologies as a schema layer. By doing this, they allow logical inference for retrieving implicit knowledge rather than only allowing queries requesting explicit knowledge. In order to allow the use of knowledge graphs in various machine learning tasks, several methods for deriving latent feature representations of entities and relations have been devised. These knowledge graph embeddings allow them to be connected to machine learning methods that require feature vectors like word embeddings. This can complement other estimates of conceptual similarity. Models for generating useful knowledge graph embeddings are commonly the domain of graph neural networks (GNNs). GNNs are deep learning architectures that comprise edges and nodes, which correspond well to the entities and relationships of knowledge graphs. The topology and data structures afforded by GNNs provide a convenient domain for semi-supervised learning, wherein the network is trained to predict the value of a node embedding (provided a group of adjacent nodes and their edges) or edge (provided a pair of nodes). These tasks serve as fundamental abstractions for more complex tasks such as knowledge graph reasoning and alignment. === Entity alignment === As new knowledge graphs are produced across a variety of fields and contexts, the same entity will inevitably be represented in multiple graphs. However, because no single standard for the construction or representation of knowledge graph exists, resolving which entities from disparate graphs correspond to the same real world subject is a non-trivial task. This task is known as knowledge graph entity alignment, and is an active area of research. Strategies for entity alignment generally seek to identify similar substructures, semantic relationships, shared attributes, or combinations of all three between two distinct knowledge graphs. Entity alignment methods use these structural similarities between generally non-isomorphic graphs to predict which nodes correspond to the same entity. In 2023, researchers found success in using large language models (LLMs) in the task of entity alignment. This was in particul

    Read more →
  • Token-based replay

    Token-based replay

    Token-based replay technique is a conformance checking algorithm that checks how well a process conforms with its model by replaying each trace on the model (in Petri net notation ). Using the four counters produced tokens, consumed tokens, missing tokens, and remaining tokens, it records the situations where a transition is forced to fire and the remaining tokens after the replay ends. Based on the count at each counter, we can compute the fitness value between the trace and the model. == The algorithm == Source: The token-replay technique uses four counters to keep track of a trace during the replaying: p: Produced tokens c: Consumed tokens m: Missing tokens (consumed while not there) r: Remaining tokens (produced but not consumed) Invariants: At any time: p + m ≥ c ≥ m {\displaystyle p+m\geq c\geq m} At the end: r = p + m − c {\displaystyle r=p+m-c} At the beginning, a token is produced for the source place (p = 1) and at the end, a token is consumed from the sink place (c' = c + 1). When the replay ends, the fitness value can be computed as follows: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})} == Example == Suppose there is a process model in Petri net notation as follows: === Example 1: Replay the trace (a, b, c, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity c {\displaystyle \mathbf {c} } consumes 1 token and produces 1 token ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} and c = 2 + 1 = 3 {\displaystyle c=2+1=3} ). Step 5: The activity d {\displaystyle \mathbf {d} } consumes 2 tokens and produces 1 token ( p = 5 + 1 = 6 {\displaystyle p=5+1=6} , c = 3 + 2 = 5 {\displaystyle c=3+2=5} ). Step 6: The token at the end place is consumed ( c = 5 + 1 = 6 {\displaystyle c=5+1=6} ). The trace is complete. The fitness of the trace ( a , b , c , d {\displaystyle \mathbf {a,b,c,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 0 6 ) + 1 2 ( 1 − 0 6 ) = 1 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {0}{6}})+{\frac {1}{2}}(1-{\frac {0}{6}})=1} === Example 2: Replay the trace (a, b, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity d {\displaystyle \mathbf {d} } needs to be fired but there are not enough tokens. One artificial token was produced and the missing token counter is increased by one ( m = 1 {\displaystyle m=1} ). The artificial token and the token at place [ b , d ] {\displaystyle [\mathbf {b,d} ]} are consumed ( c = 2 + 2 = 4 {\displaystyle c=2+2=4} ) and one token is produced at place end ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} ). Step 5: The token in the end place is consumed ( c = 4 + 1 = 5 {\displaystyle c=4+1=5} ). The trace is complete. There is one remaining token at place [ a , c ] {\displaystyle [\mathbf {a,c} ]} ( r = 1 {\displaystyle r=1} ). The fitness of the trace ( a , b , d {\displaystyle \mathbf {a,b,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 1 5 ) + 1 2 ( 1 − 1 5 ) = 0.8 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {1}{5}})+{\frac {1}{2}}(1-{\frac {1}{5}})=0.8}

    Read more →