Consulenza dps

Laboratory of Experimental Microbiology and Epidemiology LSME, DISCAT, Institute of Microbiology, School of Medicine, University of Genoa, Italy Records Matching model for data survey on applied and Salvo A. Reina , Vito M. Reina and Eugenio A. Debbia Summary
Experimental microbiology provides a huge quantity of raw data which need to be evaluated and classified under a
large variety of situation such marine research, environmental pollution, pharmacokinetics of antimicrobial agents
and epidemiological clinical trials on infectious diseases.
Almost indispensable for all kinds of disciplines is to validate, transform and correlate data clusters to demonstrate a
statistically significance of results. Whether studies are for academy or industrial biotechnological purposes, the
credibility of a work is strongly affected by the statistical methods and the adequacy with which those are used.
Beside simple univariate analysis, many software products, either commercial or open source, are available to
perform a much sophisticate statistics for discriminant and multi-factorial analysis, still the majority of scientists
use partially only a minimal part of statistics methods. This is due to the high competence level requested by a
multivariate approach; it is known that the choice of a test, correct distribution’s assumption, validable experimental
design and not last, preliminary raw data validation are prejudicial to a good science.
The need for any kind of experimentation is an analytical interpretation of descriptive evidence, and sometime the
classical numerical approach is not enough because on practise applied data can not be validate or simply are
incomplete. Microbiologists always wish to quickly discriminate, or correlate, groups and data clusters concerning
clinical patient profiles, auditing of multi-sensor derived numbers, monitoring of a biosphere indicators on either
chemical and physical parameters or dynamics of microbes population. Beside the application fields, very often the
mathematical and statistical analysis is aimed to distinguish phenotypes or constraints.
Basically, and practically, data are stored in spreadsheet and database files which change continuously time-by-time
pending on the data collection and scopes;
We here propose a Records Matching Methods (RMM) suitable for any kind of cluster analysis and pattern
identification which can be use for either parametric or non parametric without necessarily state pre-process
statistical assumption on variable distribution.
The RMM is an application of a theoretical approach based on the Unique Factorisation Domain and is explained
with an ideal generalisation model and then applied to a real-world microbiological study.
Authors have used an easy mathematical formalism and discuss the possible application of the method as largely
applicable to a plethora of taxonomic and phenetic investigation as well as for clinical trials and epidemiology.
Prototyping of the model for a computational automated process are also described in order to realise a simple
software which can infer on data by using a heuristic rules file.

Keywords
: Records Matching, Unique Factorisation Domain, Bioinformatics, Experimental Microbiology,
Statistical Process Control, Quality Assurance, System Audit
Corresponging authors : Dott. Salvo Reina, Prof. Eugenio A. Debbia (eugenio.debbia@unige.it )

Laboratory of experimental Microbiology and Epidemiology, Dept. DISCAT, School of Medicine, University of Genoa, Italy Freelance, ICT professional, Rome, Italy

INTRODUCTION

Method here described, and its software functional specification, were thought to provide a simple tool for data
calculation and experimental analysis on applied and experimental microbiology.
Data analysis achieved with the method is finalised to infer, group, filter or cluster data regardless statistical
assumption so that it could be applied on either diagnostics, clinics or observational measurements.
Generally speaking data are a collection of records and groups of records are considered datasets. Such a scheme can
be generalised to any record profile, thus a dataset is a table where rows are the samples under investigation and each
column is a characteristic of the sample. The analogy of this scheme is typically a table rows by columns and each
column is called “field” of the record (single row)
Beside the discipline and the specific domain, dataset treated on microbiology and biotechnology needs to be
analysed according several pre-process task which allow scientist to sort, classified, categorise groups of record
according to descriptive criteria; afterwards, it will possible to evaluate statistics.
Very often one or more datasets indicate a set of records which share common meaning and values (descriptive
variable and scalar parameter respectively) and for the basic science it is essential to discriminate or associate
samples according empirical criteria. A pre-process phase is indispensable especially when considering large datasets
in that a validation of record integrity and coherent not null information of each filed, will impact on results
credibility.
Usually, dataset can be grouped and or filtered starting from a database by utilising the Standard Query Language
(SQL) [10]; this compel the success of a good statistics to a high informatics competence. Moreover the SQL it is
demonstrated to be effective only in evaluating “identity” criteria rather than matching groups of records according
criteria such as “tolerance” and “proximity”.
Applied and experimental microbiology imply several doubtfulness and “fuzzy” evidences [4,11]. Sometime the
ability to approximate variables range can leverage the probability of a system adaptability (e.g. : environmental
sensor automation); also, it is restrictive to use pre-determined range of significance for variables and indicators
because it would be preferable to dynamically calibrate a variable or a parameter with a “weight factor” which
modulate the influence and the consideration of that variable or parameter on reason of the context.
Authors have already tested logics and mathematical models on several microbiological experiments concerning
microorganism growth and taxonomy, Post Antibiotic Effect, MIC and genetics of quinolones [1,6,7,8,9,12,13] and
these experiences drove to a unified record’s matching model. In the specific cases of marine microorganisms
identification in environmental polluted mud and HIV-eukaryotic cell interaction model unsupervided Kohonen
algorithms were also used [14]. Almost every experiment design used software computation.
Many evolute software are available to study microbiology with multi-factorial and multivariate techniques for
pattern matching such as Neural Networks, Bayesian nets and fuzzy logics. As already it was pointed up for high-
performance statistic tools, artificial intelligence and reasoning software are complicate and burdensome nevertheless
it would be desirable to be able to study similarities, proximity, phenotype’s varieties and cluster analysis on the
every day laboratory routine. We addressed this issue by creating a simple method based on a mathematical model
for cluster analysis and pattern matching; the method is practically realisable as a set of software framework which
can be easily implemented by anyone regardless the programming language and the dataset file format.
The method is called Records Matching Method (RMM) because it is formalised with a record profile metaphora and
its example’s application is based on the recursive comparison of records which can be clustered by mean of an
algorithm which use a simple template file which contains heuristic rules for each variables and parameters
In order to let every one to create its software, functional specification are provided together with software
documents and guide lines web coordinate. Several real world application were used by physicians for clinical trials
and epidemiology applied on Assisted Reproductive Medicine ART, andrology and endometriosis surveys. In those
experiences the theoretical model previously described by authors [15] was verified and tested for its simplicity and
suitability so that it is now possible to provide a software front-end specification for an intuitive easy to use tool with
powerful cluster analysis capability.
SIMPLE MODEL FOR A SIMPLE METHOD

Either experimental or applied microbiology imply an articulated panel of factors analysis both scalars and
descriptive information. Variables and parameters are generally referred to diverse typology of scales and
distribution, thus record-to-record comparison as well as datasets correlation equally need parametric and non
parametric statistics.
We refer to a record as a pertinent set of information concerning a generic sample which is the object under
investigation. Notoriously a record has a typical fields profile which is in our model globally considered as a unique
factorisation index (FI); such an index has the peculiarity of being at the same time a quantitative and a qualitative
expression of that specific record which can thought to as a fingerprint equivalent of the record entirely considered.
If many records, hence a dataset, are serially calculated as an array of unique FI it will be possible to apply univatiate
analysis on a vector of values. This simplification transform the study of complex rows by columns dataset to a series
of indexes which can then evaluated according to a heuristics previously defined by the scientist empirical
experience. The interaction between factorised dataset and “weighted” logics inside a heuristic file, will be the mean
for which theoretical model will allow to recursively correlate FI values according a grouping criteria with dynamic
and programmable range criteria. After all, the method will be represented by the ability of associating (or
discriminate) samples on reason of their affinity and similarity simply because it is able to determine haw much
records are diverse. We shall see that diverse could be analogously considered with the concepts of “weighted
distance
” of two overlapped records fingerprints (mathematical abstraction of a pattern).
Because it can compare records contiguity or closeness, the model find which, and estimate how-much, a subset of
records in a wider dataset table, is phenetically similar to a given record called Master Profile (MP). Generally, a MP
can be a reference record which either is newly inserted in the database or is one already registered record which is
assumed to be a significant paradigm.
An essential step in promoting the model to a method is the definition of a heuristics logic which describe a priori the
relevance of each field in the record profile. Before concepts such as correlation, association and dependence can be
applied to datasets it is necessary to determine the sense and the concurrent relationships between variables
relevance. In order to generalise the use of the method as much as possible we shall refer to variables and parameter
with homologous fields of a record profile in that their contribution coincide with the descriptive element of our
sample. On a practical base, the fields are the columns of a data table or a spread-sheet and this work will use this
scheme to better explain both mathematical model and easily applicable method.
Any experimental discipline uses a variety of analysis on descriptive science based on categorical information
formalised in a table where rows represent records (set of studied samples) and columns which represent the
characters of a sample (informative units, IU). Mathematically, a data table can be formalised ad a matrix of r by c (r
x c, rows by column) and our aim is to substituting the matrix with a vector containing a series of values equivalent
to each row or record.
The transformation cited above it is possible with the Unique Factorisation Domain theorem [2, 3] which profit by a
set of trained matrices containing the relative weights of the fields of a record so that all the range of all the possible
values assumed by a field have to be classified. In fact, the matrices will be used to determine the relative distance
(weighted distance) between homologous field of two records when compared and computed for their record FI.
A specially useful feature of the factorisation technique is to “summarise” and “persist” a quantitative and qualitative
expression of similarity in a two-records comparison by means of a delta value which sums the contribute of each
single field comparison with its corresponding on the opposite record.
We now introduce the definition of Matching Level or ML as the value achived each time a record-to-record
comparison is complete; when operated recursively, this process originate a vector (one-dimension matrix) with all
the ML values derived by the difference of two FI values.
Such array of ML value will be easy to be aliquot, ranked and clusterised according to cut-off values and or arbitrary
range of tolerances so that discrete bands of records can be distinguished to confine coherent groups of records on
the base of phenetic closeness and relevance similarity. In the most simple case we can divide two subsets with a cut-
off in the middle to separate concordant and non concordant records. This process can be repeated with arbitary cut-
off to trace which samples are falling within an acceptable level of similarity
The usage of factorisation gives to the RMM a simple way of treating experimental data because the heuristic
knowledge is empirically dynamically modified by the expert (heuristic Rules file, HRF) so that it can be mould and
adapted to any experiment. More over, the calculation algorithm can be reiterated by systematically changing the
HRF at every run and saving a corresponding results ML of FI delta’s vectors; virtuously it will be possible a
supervised analysis on a well characterised samples control. This variant of the method is strictly related to the
mathematical model demonstration [15] which justify an ultimate RMM usage to create calibration templates of
HRF. Frequently, distinct groups of scientists share observational data typology collected with different survey, yet
they wish to compare and evaluate data under a common impartial standard.
Because implementation of the method is easily translatable with a software acknowledged Template of HRF can be
utilised for large multi-centre audit of consensus trial, still every group could save autonomous ability of filtering,
clustering and monitoring its data according specific experimental schemes.
Despite its theoretical simplicity the model of RMM can lead to sophisticated reasoning software application. In
short, the algorithm could indeed be ran as a self-evaluation learning system; in such a case the process is would be
started without pre-defined HRF knowledge and historical repertoires could be scanned to automatically derive a set
of rules automatically by inferring on raw-data regardless the stochastic and homoscedasticity assumption required
by pre-process statistics.
Self- referential RMM system would bring to an ideal knowledge scanning system oriented meta and cluster analysis
for epidemiology. At the present time these tack can be accomplished with PCO, PCA hierarchical and cladistical
generally available only in high-level statistical software packages.

DISCUSSION
Model Theory and applied method
Mathematical treatise of the model and formal definition are disserted elsewhere [15], while in this work authors
write a divulgative exposure of the theory with a minimal use of mathematical formalisms in that practical example
will be oriented to experimental applied microbiology.
In order to be correctly applied, the method introduced so far have to be formally defined and modelled. Before of a
practical approach, we shall describe a totally theoretical example concerning the RMM. Hypothetical and
imaginative example is preliminary because will simplify the comprehension and concepts realistic applicability.

We premise that the field of a record have to contain non consecutive values with clearly non contiguous rank and
meaning; it is also necessary to extend for each field a definition of weighted-distance which will give a direct
measure of proximity or distance for a comparison of inter-fields as well inter-records entity.

Let us suppose that with the notation
1)
C4 = CITY = {Rome, Viterbo, Naples, Catania, Milan} (m=5)

we shall refer to the fourth field of a generic record R. The field considered is the descriptive value of the name of an
Italian city [CITY] while m is the number of possible value of the field so that it will be generically noted as iM or,
to be adherent to the 1) example we shall have
1a)
4M = 5, meaning with this the set of 5 possible values for the field CITY;

Weighted distance
It is possible an intuitive to express a “weighted-distance“ between two values of the field [CITY] in terms of
geographical distance expressed in kilometres; clearly, this proximity measure is reminiscent of the physical distance
between cities. We now use a graph to visually represent the possible field’s reciprocal relationships. Each arc of the
graph subtend a value for every couple of cities

For readability, graph and its arcs are not proportional, geographical distances are intentionally approximate and
original Italian names of the cities are reported; graph appears as a clear representation of the symbolic relationship
of the cities with each other among those considered as possible values of the field in 1) formula.
Let use the notation Gi to globally indicate the graph of all the “weighted-distances” for each couple of value of a ith
field Ci in a generic record R hence we define as :
2)
di (j,k) , j , for k=1., iM,

the “weighted-distance” between the jth and the kth value of the field Ci in R
The graph Gi can be represent with its associate matrix defined as Mi containing the di (j,k) values for the field Ci
in R:

T1)


The matrix in T1) is symmetric and indeed di (j, k) = di (k, j), albeit it is possible a field typology for which the
possible values d(j, k) not necessarily should have a linear correlation thus Mi would not be mirrored in its diagonal
line. Each cell contains at the intersection of two possible values of the field the phenetic distance which can be
interpreted as an index of affinity and similarity of two value among those possibly assumable by the field.
The example described intentionally uses the geographical distance to emphasise the concept prior to apply the
scheme for a more general kinds of information.
The Records Matching Method (RMM)
To consolidate theoretical approach we now address the model to a more specific real case. As already stressed out
the practical use of the RMM is highly flexible because it can be generalised to any kind of descriptive evidences as
far the scientist defines an a priori knowledgebase which, let say “informs” the algorithm on the relative significant
of the information evaluated and classify all their assumable values according an indexed relevance.
We now describe a case of agent-resistance experiment, still a microbiologists will immediately recognise a much
large spectrum of investigation to which the RMM could result applicable with success. This case, taken as
paradigm, is simple however complete since all the possible types of experimental variables and parameter, including
casting variants, are treated on details.
Let us recall the formalism in 1) and consider a set of fields which taken together represent a record profile. The
record and its composition of fields values is obviously our sample. The RMM finality is to compare two records by
determining their affinity and measuring it with a matching level.
Before a record-to-record level of matching we shall explain a field-by-field matching level which is a propedeutic
step; a sample is globally evaluated as result of the single contribution of each of its character whether is a variable
or a parameter (field).
Consider a record R (Ci) for i=0,.7 which is a sample of an experiment concerning the estimation of the Post
Antibiotic Effect (PAE) under several cultural conditions. Briefly, in vitro bacterial growth can show variable fresh
outbreak after antibiotic exposure pending on cultural media and incubation time.
The information characteristic collected, our fields, were registered to investigate parametric, non parametric
experimental outcomes on relation to phenotype and genotypes. All indicators are also associable to a descriptive
field which register the growth upshot.
Schematically the record’s profile can be formalised as follows
S1)
C0= PAE
= {0|0.10|0.11| 0.12| 0.13| 0.14| ……………|1.0} C1= PAERange = {0-0.30 | 0.31- 0.50 | 0.51-0.60 | 0.61-0.90 | 0.91-1.2 }
C2= Incubation = {60 min | 120 min | 360 min | 480 min}
C3= Resistance = {R | I | S}
C4= Antibiotic = {Amoxicillin | Meropenem | Ciprofloxacin | Gentamycin | Cefotaxime}
C5= Fenotype = {### | PenS | PenI | PenR | EryS | EryR M | ESBL}
C6= Genotype = {### | Pbp | ermB | mefA | ermTR | TEM4}
C7= Growth
= {### | true | false} or {### | - | +}
Each field gives opportunity to explain all the case for which the combination of values can be translated by the
model in a unique “image” which is consequence of the weighted-distance of each information therefore we describe
this pharmacoresistance experiment keeping in mind that any other kind of characters can be applied as well.
The first field C0(PAE), simply contains continuos values in a range of linear variability and the weighted-distance
could be calculated very much as for the example previously shown in 2), thus considering a simple absolute delta
between two values.
On practise the distance of the two records R1 and R2 for the field C0(R1) versus C0(R2) is the algebraic difference
of the values assumed by the two fields, thus if
C0(R1)= 0.27 , then according to formalism in 2) d(0.45 | 0.27) = 0.18

This first example concerns linear and continuous measures and as obvious parametric variable the value itself can
be appreciated as a direct measurement of geometric eucledian position. We shall soon see how the model will
translate even attribute, binaries and categorical descriptive fields.
The second field C1(PAERange), again, pertains to the PAE but is expressed as discrete ranks of values rather then
a variable single values. For microbiologist this attitude can be reminiscent of the MIC in antimicrobial
susceptibilities experiments, which indeed could be treated in the same way.
The field is clearly classified according 5 ranks (restricted groups of values) so recalling the T1 matrix, we can
reproduce a second matrix T2 which symbolise the theoretic graph G2 (non reported).
T2)
0.31- 0.50
0.50-0.60
0.61-0.90
0.61-1.2
0.31- 0.50
0.51-0.60
0.61-0.90
2
0.91-1.2
The matrix evidentiate the relationships of the mutual combination of weighted-distance between two ranks indexes. Recalling 2) we can adapt as follows : C1(R1)= [0.61-0.90]
C1(R2)= [0.31- 0.50]; thus d(3 | 1) = 2
In this case the delta value is calculated by using the ordinal index of the position of the rank. This is reasonable also because the ranges of the ranks, arbitrarily decide in their limits are nevertheless sorted in ascendent way. It will appears intuitive to microbiologist how limits of each rank can be arbitrarily decided pending on the experimental needs; there are no prejudices on the way the scales can be split and no forced schemes for regular length. On the contrary, diverse grouping can be decide to intentionally emphasise specific ranges. Therefore, the phenetic distance can assume all values between 0 and 4. It can be noted how the 0 valu e means that
two records R1 and R2 are identical for the field C1; moreover, this latter implication shows a first important
corollary of the model which demonstrate its coherence on the contour.
The field C2 allows us to consider the case of sorted and discrete variables which however do not follow a linear
function. For the field C2(Incubation) expressed in minutes, the simple difference between values can be calculate
in the way that the example 4) shows, still to important aspects arise. Firstly the sign of the delta value can be taken
in consideration with its negative value, and in a second instance not necessarily the different relative distances from
one the indexed field position could reflect the meaningful desired by the investigator on reality.
Let consider these two situation starting with the ordinary notation
d(120 | 480) = -360

Hence we have two possibile choices which can be adopted pending on a apriori empirical judgment of the scientist
a) to use the delta value the way the are, meaning by this that the difference will be taken on absolute value
5)
d(120 | 480) = -360
become |d(120 | 480) | = 360

b) to use a matrix of heuristic indexes to calculate a phenetic distance in a uniformed and predetermined way
As underscored the intervals taken as absolute values between consecutive incubation time have not geometrical
regularity and there are no regular proportion in the values succession; the three delta values 60|240|120, obtained
the for position 60-120-360-480, are non sorted (only crescent or descendent ) and scraps are not linear.
To understand how the b) situation can be favourable, we take advantage by the matrices T3a e T3b proposed as
follows :
T3a)
The two possibilities will be treated to gives different meaning relevance to the diverse experimental situation, still it will be processed by the RMM exactly in same way and it will be discriminative the human role. If we intend to get a linearity between incubation intervals, namely, will be all considered at the same level and we shall want only cluster and qualitative distinguish among various experiment we could use a heuristic table T3a and an example would be : d(3 | 1) = 2
This example clearly implies proportional increments deltas and the maximal separating factor would be [max d(0|3)] = 3 corresponding to the extremes 60 and 480 minutes.
If the microbiologist will prefer a more evident discrimination among incubation time, and even more, he wants to specifically decide which interval are more relevant to the experiments duration then an hypothetical heuristic matrix would be th T3b. The inter-distances scheme is identical to that in T3a albeit the weighted indexes were clearly chosen according to an exponential progression. If we repeat the step 6) by applying the T3b heuristics and maintaining the same fields values we shall obtain : d(15 | 4) = 11

It appears evident how heuristic matrices can be arbitrary rendered to fine tune the microbiologist decision which are
based on the logics on his empirical experience.
Detailed fields analysis of the model described so far on the first 3 fields is essentially the same for all the others
hence we shall omit the formalism of the heuristic calculation to preferably exhaust all other types of information in
the record structured profile designated in S1). We more briefly complete the plethora of possible fields typology and
their weighted-distance casting.
The field C3(Resistance) is a useful example of how attribute variables can be used as qualitative discrete inter-
values. In such a case is not that relevant to conserve an ideal sorting along the three symbolic values (Resistant,
Intermediate and Susceptible) therefore a simple linear heuristic matrix will adequately fit most cases in that there is
no a priori preferable values direction.
For the field C4(Antibiotic) are legitimate all the consideration already assured for the field C3 since values are not
scalar nor oriented, still it is plausible to establish a special relevance to privilege a type of antimicrobial agent
towards an other. For instance, quinolone and cephalosporin could be considered much similar and therefore much
close, when compared with ampicillin. This scheme could lead the RMM resolution to a better stratification for
clustering purposes because records with ampicillins will tend to segregate more centrifugally in their phenetic score.
Field C5 and C6 (Genotype and Phenotype) share all previously consideration for the descriptive variables except for
the peculiar value Null or symbolically [###]. It is indeed possible that either genotype or phenotype would be
unknown (or not definable). This important case, again, could be a subtle clue which need to be brought on
foreground to perspicuously separate samples.
The C5 and C6 fields are also vital to understand a further concept of the RMM named Extended Matching Score or
ExMS which makes it possible to extend the use of indexed weighted-distance by combining two concurrent fields
considered to be related in some way. This is exactly the case of the genotype and the phenotype fields in that it is
quite probable the expectation of having a specific genotype be associated to a phenotype. Failing this evidences
should rise doubts and it would be optimal to use RMM with an appropriate logics to accurately discern.
The ExMS is helpful in this case and is simple to apply because delta indexed values of two variables can simply be
multiply for a factor called “enhancer” when predetermined combination of values belonging associated fields will
occur. Authors have given an exhaustive treatise of the issue to include with the model the concept of fields
neighbourhood concurrency” [15].
At last consider field C7 which specifically issues the case of binary variables (TRUE/FALSE, YES/NO and
symbolically +/-). Despite two possible values the RMM heuritic matrix would help in discriminating a third level of
information’s type because the Null possibility could be indexed and samples could be diversely interpreted; possible
Null values of a field could acquaint several meanings such as not measurable, unknown value or not trustworthy
data.
Factorial record index
After a basic level of abstraction which explained the intra-record (inter-field) the model can now be scaled up to an
inter-record level. The concept of weighted-distance applied to the fields relationship can be transferred to classified
the entire record to step forward the RMM properly defined which will cumulate all the variable’s weights of each
field of the record. This mechanism aims to substitute a sample/record with a unique number which is together a
quantitative and a qualitative expression of that record.
By recalling the structure in S1) we obtain a set of fields representing a record R formalised as follows :
8)
R= {C0 | C1 | C3 | C3 | C4 | C5 | C6 | C7 }
eventually substituted with nominal definition
9) R= { PAE | PAERange | Incubation | Resistance | Antibiotic | Phenotype | Genotype | Growth }
We can express the content of the record R as an equivalent number called Factorial Record Index or FRI. This
number has a series of features that will be useful to give a qualitative and a quantitative representation of the record.
By utilising the Unique Factorisation Domain approach [2, 3] it is possible to achieve a unique number and by
reversing the algorithm to go back to all the values of the field of the original record [15].
In this paper the FRI will be described with respect to the only the practical suitability with the RMM; we remind
that the sum of all the weighted-indexes derived from the matricial calculation of each field of a record (e.g. : 8 and
9 formulas) is finalised to the comparison between two records. Each field inside the record profile will have a
weight”, all fields taken together, will result in a FRI.
We firstly define a table called Field Weights Table or FWT which is comprehensive of 3-dimensional arrays :
ordinal value of the field in R (its relative position in the record profile), its index value and its contained descriptive
value.
On reference of what was defined in S1) and supposing all descriptive fields as already classified in a heuristic
matrix like T2), we then have a table as follows :
T4)
Contained
Lines dotted signify tacitly omitted fields between C4 and C6; scheme meaning remain unaltered. The fourth column is a Field’s Weight Factor or FWF and will be essential to manipulate a meticulous logics which differs importance of a field towards others. Every row of the table has a weight which act as a multiplicative factor so that the expression in 6) can be applied as difference of two records R1 and R2 for that field; hence, that expression was d(3 | 1) = 2 for the Incubation field
and now would be revised according table T4 as follows :
d(3 | 1) * FWF (2|2) = 2 * 2 = 4

Basically, when a record is a case of an 2 hours incubation, its relevance during RMM is double in terms of
weighted-distance with other kinds of duration. This feature of the FWF is extremely important to understand how a
scientist can freely design a heuritics made with detailed rules and set up a reasoning template for the algorithmic
engine of the RMM. A weighted logics, adequately prepared for a specific set of information, is a sort of optical filter
which will deflect experimental dataset and re-project it on a screen as a clustered map; in a way a metaphor of the
trapezoid that filter a coherent light-wave and separate in wider coloured band
The case FWF (7|1) is zero, meaning by this that the Factorial Weight Index is also an effective mechanism to
selectively exclude a field. This feature is useful when the investigator wants to run a RMM on a dataset considering
only part of the record information; he will simply prevent the model from calculating.
The seventh field C7(Growth) in T4 is a special example because can show how coherent the model would be
considering other experimental situation at the edge; for instance when the detection of a value was not possible or is
not available, this does not means that there is no evidence of growth, simply the information is not available (e.g.
automation and technical accidents). It is obvious the setting to zero the symbolic value of [###] will prevent the
sample from being accidentally considered as [false], which rather means no growth.
As last implication, the RMM ignores, namely will not compare, those records which have even only one Null FWF;
only [True/False or +/-] are meaningful values.

Conclusion

The proposed model of the RMM is suitable to analyse experimental dataset in the daily microbiological routine. The
method is finalised to the cluster analysis and it represent a simple and customisable alternative to complex
modelling software and sophisticated statistics.
Its use and effectiveness is linked to the investigator that decide an a priori set of rules to determine the association
level of the experimental measures studied. The rules are represented with simple and intuitive knowledge tables for
each variable or parameter of a record; the heuristics can arbitrary be calibrated and adjusted so that the dataset can
be scanned by the RMM algorithm which will recursively process records matching on samples’s table.
Mathematical formalism of the model and its basic calculation algorithm is provided on literature [15] thus scientist
who has programmatic skill, can develop its own software program by using any programming language.
Virtually any type of dataset and experiments can be processed, still for practical software implementation, example
of source code concerning the modelling discussed in this work is freely distributed by the authors to anyone who
wish to realise the software toolkit. The auspice is that several other groups, involved in different microbiological
fields, could adopt the RMM an test its efficacy.
Acknowledgement
Authors are grateful to the programmer Carlo Bergamini (Genoa) for the Delphi and MS-VB6 source code
engineering and Franco Ameglio (Rome) for his manuscript revision of the microbiological aspects.
Literature

1. Cavallero A., Reina S., Schito G.C. - Post Antibiotic Effect induced by Ofloxacin in both gram-positivi
and gram-negative bacteria. "Chemoterapia" Jul 1987. 2. FG.M. Artin, Algebra, Prentice Hall (1991)
3. D. S. Dummit, R. M. Foote, Abstract Algebra, Wiley (1999).
4. Hanai T, Honda H. Application of knowledge information processing methods to biochemical engineering,
biomedical and bioinformatics fields. Adv Biochem Eng Biotechnol. 2004;91:51-73. Review. PMID: 15453192 [PubMed - indexed for MEDLINE] 5. Pollera C.F., Ameglio F., Reina S. - Changes in serum iron levels following very high dose of cisplatin.
Cancer Chemotherapy and Pharmacology 1987 6. Reina S., Debbia E.A., Schito G.C. - Ciprofloxacin Induced Modulation of cellular grothw in activated,
normal and lymphoid established Cell Lines. The antimicrobial agent resistances: orin treatment and control.abs 70, 25 5 1991, Principato di Monaco. 7. Reina S., Debbia E., Schito G.C. - Evaluation of the post antibiotic effect induced by various antibiotics
against Staphylocossi and Enterococci. A.A.M.J. 1993 8. Reina S., Debbia E. - Genetic recombination by spheroplast fusion in Escherichia coli K12 . Cytobios by
9. Reina S., E.A. Debbia, G.C. Schito. In Vitro Cellular Growth Modulation by quinolone conditioned
medium. 93rd General Meeting, Atlanta, Georgia, USA. Session 120. Paper nu. I28. 10. Reina S., Boeri E., Lillo F., Cao Y., Varnier E.O. Automation in AIDS research and diagnostic activity: a
Local Area Network with Standard Query Language. 7th European Edition of Conference on Advanced Technology for Clinical Laboratory and Biotechnology. - ATB '91 Nov 26-11-1991 B11. 11. Reina S., Miozza F. - Knowledge Data Base System for Twins study. ACTA GENET MED ET
GEMELLOL. Ed. Mendel Institute, Rome. 1994. 43:83-88 12. Reina S., Reina V. , Giacomini M., Debbia E. - Bio-fouling and micro-organisms identification on
polluted materials: a novel Knowledge Data Base System architecture for a heuristic expert system engine.
Atti congresso Internazionale dei Biologi, 22-25 settembre 1994. Vieste
13. Reina S. Il percent growth rate average (PGRA) migliora l'interpretazione dell'effetto post-antibiotico.
14. Ruggiero C., Giacomini M., REINA. S., Gaglio S. A qualitative process theory based model of the HIV-1
Virus-Cell interaction. Procedings of Medical Informatics Europe 93, Israel. ISBN 965-294-091-7 pp. 147-150. 15. Salvo A. Reina , Vito M. Reina and Eugenio A. Debbia Simple method for Records Matching for
experimental and diagnostic datasets of patient’s records. (Pre-print submitted to BioStatistics, COBRA electronic publishing network)

Source: http://www.salvoreina.it/RMMExpMicro.pdf

March 06 jmcp

Editorial Subjects—In This Issue and in Previous Issues affirms our intuition that (a) patients can discern among specificmeals or rosiglitazone (Avandia) 4 mg twice daily.17 The meansensory attributes of intranasal steroid products, (b) someA1c values at baseline were 9.5% in the INH group versus 9.4%attributes have higher economic value in their avoidance for the rosiglitazone group. Af

Part i: anthropometry, biomechanics & physical ergonomics (11)

PPCOE 2010 Program Session Code: DD-P-R DD : Date of paper presentation / poster exhibition P : Parallel block on the date R : Room for paper presentation (A, B, C) / poster exhibition (D) 08-1-A: The Agricultural Ergonomics Chair: Kwan Suk Lee , Hongik University, Korea A Korean national project of agricultural assistive technology Kwan Suk Lee, Seong Rok Chang, & Yu-Chang

Copyright 2014 Pdf Medic Finder