NAME
clust3dm - Modified K-means Algorithm with Masking
capabilities
SYNOPSIS
clust3dm [ -NR ntapr ] [ -NI ntapi ] [ -NM ntapm ] [ -OR
otapr ] [ -OI otapi ] [ -OC otapc ] [ -tfstart tfstart ] [
-tfskip tfskip ] [ -tfend tfend ] [ -ildm ildm ] [ -cldm
cldm ] [ -nclust nclust ] [ -sclustini sclustini ] [
-sdelthresl sdelthresl ] [ -sdelthresu sdelthresu ] [ -sdel-
ini sdelini ] [ -maxcycle maxcycle ] [ -kmthres kmthres ] [
-seed seed ] [ -fout ] [ -V ] [ -? ]
DESCRIPTION
The clust3dm routine implements a modified K-means cluster
algorithm (see J.T. Tou and R.C. Gonzalez, Pattern
Recognition Principles, Massachusetts, Addison-Wesley,
1974). The performance index that is minimized by the
K-means algorithm is defined as the sum of the squared
distances from all points in a cluster domain to the cluster
center (which can be thought of as a minimization of the
overall noise power contained in the whole cluster domain).
Initial cluster centers can be specified. The user can also
delete clusters, relocate cluster centers or delete feature
vectors. Furthermore, if the user supplies a maskfile (which
can be created using the USP routines clustedit and/or
pickmask), a specified subset of feature vectors can be
selected for classification.
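As an illustration of the performance index described above, the
following minimal Python sketch sums the squared distances of
complex-valued feature vectors to their assigned cluster centers.
The function name and the data layout are assumptions made for this
example; they are not taken from the clust3dm source.

```python
def performance_index(vectors, labels, centers):
    """Sum of squared distances from each vector to its cluster center.

    vectors: list of tuples of complex numbers (hypothetical layout)
    labels:  index into `centers` for each vector
    centers: list of tuples of complex numbers
    """
    total = 0.0
    for vec, lab in zip(vectors, labels):
        center = centers[lab]
        # Squared Euclidean distance for complex-valued entries.
        total += sum(abs(a - b) ** 2 for a, b in zip(vec, center))
    return total


if __name__ == "__main__":
    vecs = [(1 + 1j, 0j), (3 + 1j, 0j)]
    print(performance_index(vecs, [0, 0], [(2 + 1j, 0j)]))
```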
USAGE
clust3dm gets all its parameters from command line
arguments. These arguments specify the real and imaginary
part of the initial input data set (-NR and -NI), the
maskfile (-NM), the real and imaginary part of the output
("initial subset") feature data set (-OR and -OI), the
output file containing the final cluster assignments (-OC),
the sample points of each initial trace you want to use for
processing (-tfstart, -tfskip and -tfend), the
in-line/cross-line spacing of the dataset (-ildm and -cldm),
the number of initial cluster centers (-nclust), the cluster
center initialization method (-sclustini), the thresholds
and modification method used for cluster center relocation,
cluster center removal and feature vector removal
(-sdelthresl, -sdelthresu and -sdelini), the maximum number
of iterations that can occur (-maxcycle), the convergence
threshold used within the K-means subroutine (-kmthres), a
seed parameter for a random number generator (-seed), as
well as a feature output file flag (-fout).
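The arguments listed above can be combined into a single call. The
Python sketch below only assembles and prints such a command line; it
does not run clust3dm. The input file names (real.usp, imag.usp,
mask.usp) and all parameter values are hypothetical and chosen for
illustration only.

```python
import shlex

# Hypothetical file names and values; only the flag names come from
# this manual page.
cmd = [
    "clust3dm",
    "-NR", "real.usp",
    "-NI", "imag.usp",
    "-NM", "mask.usp",
    "-OC", "outc.usp",
    "-nclust", "5",
    "-sclustini", "4",
    "-maxcycle", "50",
    "-kmthres", "0",
    "-seed", "1",
]

# shlex.join quotes each argument safely for a POSIX shell.
print(shlex.join(cmd))
```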
In the IKP environment, input #0 and input #3 are used for
the real part (-NR) and imaginary part (-NI) of the initial
data set, respectively. Input #6 is used for the maskfile
and output #1 contains the cluster assignments (-OC). Output
#4 and output #5 are the real (-OR) and imaginary part (-OI)
of the "initial subset". Output #2 is the usual process
printfile output.
Command line arguments
-NR ntapr
Enter the input file name immediately after typing -NR
(corresponding to the real part of the initial input
data set). The input file should include the complete
path name if the file resides in a different directory.
For this program, the data must be stored as a
rectangular grid of regularly binned data. The number of
samples denoted by lineheader word 'NumSmp' defines the
number of entries used for each initial feature vector.
The number of traces denoted by lineheader word
'NumTrc' defines the number of traces in the x-direction.
The number of records (seismic lines) denoted by
lineheader word 'NumRec' defines the number of traces in
the y-direction. Dead traces have to be flagged by the
dead traceheader flag 'StaCor'=30000. -NR has to be
specified, otherwise the program will abort.
-NI ntapi
Enter the input file name immediately after typing -NI
(corresponding to the imaginary part of the initial
input data set). For this program, the data must be
stored as a rectangular grid of regularly binned data.
The number of samples denoted by lineheader word
'NumSmp' defines the number of entries used for each
initial feature vector. The number of traces denoted by
lineheader word 'NumTrc' defines the number of traces
in the x-direction. The number of records (seismic
lines) denoted by lineheader word 'NumRec' defines the
number of traces in the y-direction. Dead traces have
to be flagged by the dead traceheader flag
'StaCor'=30000. -NI does not have to be specified. If it
is not specified, the imaginary part is set internally to
zero.
-NM ntapm
Enter the maskfile name immediately after typing -NM.
The number of traces denoted by lineheader word
'NumTrc' defines the number of traces in the x-
direction. The number of records (seismic lines)
denoted by lineheader word 'NumRec' defines the number
of traces in the y-direction. Note that the x- and y-
dimensions of all input files have to match. The
number of samples denoted by lineheader word 'NumSmp'
is one; each sample is a masking flag for one x-y
position. The masking flags subdivide traces into
"active" and "passive" traces. "Passive" traces are
marked by a masking flag of -1; all other values
correspond to "active" traces. If a passive trace is
detected, StaCor will be set to 30000, i.e., all traces
of output files corresponding to "passive" traces are
made dead traces. Only "active" traces are used for
classification. -NM does not have to be specified. If it
is not specified, all traces are assumed to be "active".
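The masking rule can be sketched in a few lines of Python. This is a
minimal illustration under the assumption that the mask flags and the
StaCor values are available as plain lists with one entry per trace;
the function names are hypothetical.

```python
DEAD_STACOR = 30000   # dead traceheader flag, as stated above
PASSIVE_FLAG = -1     # masking flag that marks a "passive" trace


def apply_mask(mask_flags, stacor):
    """Return new StaCor values: passive traces become dead traces."""
    return [
        DEAD_STACOR if flag == PASSIVE_FLAG else sc
        for flag, sc in zip(mask_flags, stacor)
    ]


def traces_to_classify(mask_flags, stacor):
    """Indices of traces that are both active and not dead."""
    masked = apply_mask(mask_flags, stacor)
    return [i for i, sc in enumerate(masked) if sc != DEAD_STACOR]


if __name__ == "__main__":
    flags = [0, -1, 5, -1]
    stacor = [0, 0, 30000, 0]
    print(apply_mask(flags, stacor))
    print(traces_to_classify(flags, stacor))
```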
-OC otapc
Enter the output file name immediately after typing -OC
(contains the cluster assignment for every spatial
location). Note that cluster assignments for dead and
masked traces are set to zero and that cluster
assignments for deleted traces are set to "number of
clusters + 1". Has to be specified if in IKP, otherwise
the program will abort. Will be set to outc.usp if not
in IKP and not specified.
Note: The number of clusters used is stored in the
lineheader keyword 'MnDpIn'.
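A reader of the -OC file can interpret the stored values as in the
following Python sketch. It assumes that regular clusters are numbered
1 through nclust, which is implied by the special values 0 and
"number of clusters + 1" but not stated explicitly; the function name
is hypothetical.

```python
def describe_assignment(value, nclust):
    """Interpret one cluster assignment value from the -OC output.

    nclust is the number of clusters used (stored in 'MnDpIn').
    Assumes regular clusters are numbered 1..nclust.
    """
    if value == 0:
        return "dead or masked trace"
    if value == nclust + 1:
        return "deleted trace"
    if 1 <= value <= nclust:
        return "cluster %d" % value
    raise ValueError("unexpected assignment value: %r" % (value,))


if __name__ == "__main__":
    for v in (0, 3, 6):
        print(v, "->", describe_assignment(v, 5))
```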
-OR otapr
Enter the output file name immediately after typing -OR
(contains the real part of the "initial subset" of
feature vectors). The "initial subset" of feature
vectors includes dead and masked traces (StaCor=30000),
which are left out of cluster processing. Has to be
specified if in IKP and the feature output file flag
(-fout) is specified, otherwise the program will abort.
Will be set to outr.usp if not in IKP and not specified.
-OI otapi
Enter the output file name immediately after typing -OI
(contains the imaginary part of the "initial subset" of
feature vectors). The "initial subset" of feature
vectors includes dead and masked traces (StaCor=30000),
which are left out of cluster processing. Has to be
specified if in IKP and the feature output flag (-fout)
as well as the imaginary input file name (-NI) are
specified, otherwise the program will abort. Will be set
to outi.usp if not in IKP and not specified.
tfstart, tfskip and tfend can be used to select a reduced set of
sample points (per trace) of the whole input data set.
Note that the times/frequencies that you specify will
be internally adjusted to fit the time/frequency grid of
the existing data. All times/frequencies have to be
given and will be set in milliseconds/Hz, respectively.
Note that tfend has to be greater than tfstart;
otherwise an error message is issued.
-tfstart tfstart
Specifies the time/frequency of occurrence of the first
sample point in the trace which will be used for
processing. Will be adjusted internally according to the
following rules:
default => TmMsFs (time/frequency of occurrence of first
sample point in trace)
if tfstart specified is less than or equal to TmMsFs =>
set tfstart internally to TmMsFs
if tfstart specified is greater than TmMsFs => set
tfstart internally to the time/frequency of occurrence of
the next available sample whose time/frequency of
occurrence is greater than or equal to tfstart specified
if tfstart specified is greater than the time/frequency
of occurrence of the last sample in the trace => set
tfstart internally to time/frequency of occurrence of
last sample in trace
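The tfstart rules above amount to snapping the requested value up to
the sample grid and clamping it to the trace. The Python sketch below
is one possible reading, not the clust3dm implementation; the
parameter names (t_first for the first sample, dt for the sample
interval, nsamp for the number of samples) are hypothetical.

```python
import math


def adjust_tfstart(tfstart, t_first, dt, nsamp):
    """Snap tfstart onto the grid t_first + k*dt, k = 0..nsamp-1."""
    t_last = t_first + (nsamp - 1) * dt
    if tfstart is None or tfstart <= t_first:
        return t_first              # default, or at/before first sample
    if tfstart > t_last:
        return t_last               # beyond the trace: use last sample
    # Next available sample at or after tfstart; the small epsilon
    # guards against floating-point round-off on exact grid values.
    k = math.ceil((tfstart - t_first) / dt - 1e-9)
    return t_first + k * dt


if __name__ == "__main__":
    # Trace from 0 to 1000 ms sampled every 4 ms.
    for requested in (None, -5, 8, 10, 2000):
        print(requested, "->", adjust_tfstart(requested, 0, 4, 251))
```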
-tfskip tfskip
Specifies the time/frequency increment for the
selection of trace sample points used for processing.
Will be adjusted internally according to the following
rules:
default => one input sample interval
if tfskip specified is less than or equal to zero =>
set tfskip internally to one input sample interval
(given by lineheader words SmpInt and MutVel for time
and frequency, respectively).
if tfskip specified is greater than zero => set tfskip
internally to the multiple of input sample intervals it
encloses
if tfskip specified is greater than the difference
between the time/frequency of occurrence of the last
sample point in a trace and the first sample point in a
trace => set tfskip internally to this time/frequency
difference
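A possible reading of the tfskip rules is sketched below in Python.
The manual does not say what happens when a positive tfskip is smaller
than one sample interval; this sketch assumes that at least one
interval is used. All names are hypothetical.

```python
import math


def adjust_tfskip(tfskip, dt, t_first, t_last):
    """Round tfskip down to a whole number of sample intervals dt."""
    span = t_last - t_first
    if tfskip is None or tfskip <= 0:
        return dt                   # default: one input sample interval
    if tfskip > span:
        return span                 # clamp to the length of the trace
    # Number of whole sample intervals enclosed by tfskip.
    # Assumption: never fewer than one interval.
    k = max(1, math.floor(tfskip / dt + 1e-9))
    return k * dt


if __name__ == "__main__":
    for requested in (None, 0, 2, 10, 5000):
        print(requested, "->", adjust_tfskip(requested, 4, 0, 1000))
```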
-tfend tfend
Specifies the time/frequency of occurrence of the last
sample point in the trace which will be used for
processing. Will be adjusted internally according to the
following rules:
default => time/frequency of occurrence of last sample
point in trace
if tfend specified is less than or equal to TmMsFs =>
set tfend internally to the time/frequency of occurrence
of last sample in trace
if tfend specified is greater than TmMsFs => set tfend
internally to the time/frequency of occurrence of the
last available sample whose time/frequency of occurrence
is smaller than or equal to tfend specified
if tfend specified is greater than the time/frequency
of occurrence of last sample in trace => set tfend
internally to time/frequency of occurrence of last sample
in trace
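The tfend rules mirror those for tfstart, except that the value is
snapped down to the grid and that out-of-range values fall back to the
last sample. The following Python sketch shows one way to express
this; the parameter names are hypothetical.

```python
import math


def adjust_tfend(tfend, t_first, dt, nsamp):
    """Snap tfend onto the grid t_first + k*dt, k = 0..nsamp-1."""
    t_last = t_first + (nsamp - 1) * dt
    # Default, at/before the first sample, or beyond the trace:
    # all three cases use the last sample of the trace.
    if tfend is None or tfend <= t_first or tfend > t_last:
        return t_last
    # Last available sample at or before tfend.
    k = math.floor((tfend - t_first) / dt + 1e-9)
    return t_first + k * dt


if __name__ == "__main__":
    # Trace from 0 to 1000 ms sampled every 4 ms.
    for requested in (None, 0, 10, 1000, 2000):
        print(requested, "->", adjust_tfend(requested, 0, 4, 251))
```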
-ildm ildm
If not found in the line header, you must specify the
in-line spacing (In-line cell increment)
-cldm cldm
If not found in the line header, you must specify the
crossline spacing (Cross-line cell increment)
-nclust nclust
Specifies the number of clusters used initially. Note
that the result might contain fewer clusters than
specified here (if -sdelini was set to 2, i.e., if one
deletes clusters having an insufficient number of
members).
default => 5
-sclustini sclustini
Specifies the cluster center initialization method:
default => Option 4
1 => Select the first nclust feature vectors of the
"start subset" as cluster centers. The "start subset"
consists of all the feature vectors in the "initial
subset" (see -OR, -OI) corresponding to live traces.
2 => Select the initial cluster centers randomly from
the "start subset".
3 => Select the initial cluster centers based on the
one-pass "furthest neighborhood method". The first
feature vector of the "start subset" serves as the
first initial cluster center and the feature vector
which is furthest away from this first cluster center
is chosen as the second cluster center. The next
nclust-2 cluster centers are selected accordingly,
i.e., the distance between each remaining feature
vector and the already established cluster centers is
obtained and the feature vector which is furthest away
from any of these cluster centers is chosen as the
next cluster center. Caution: this method has a
pitfall. The third cluster center, for example, might
be far away from cluster center 1 but actually
coincide with cluster center 2.
4 => Select the initial cluster centers based on the
"maxmin-distance algorithm" (see J.T. Tou and R.C.
Gonzalez, Pattern Recognition Principles,
Massachusetts, Addison-Wesley, 1974). Select the first
and second cluster center as in the "furthest
neighborhood method". For each remaining feature vector,
calculate the distance to each of the already existing
cluster centers and keep the minimum; the feature vector
with the largest such minimum distance becomes the third
cluster center. This procedure is repeated to obtain the
remaining cluster centers.
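The difference between options 3 and 4 can be illustrated with a short
Python sketch. It is a simplified interpretation that uses real-valued
vectors and Euclidean distance; the function name and the `combine`
parameter are hypothetical. With combine=max a candidate is scored by
its largest distance to any existing center (option 3), with
combine=min by its smallest distance (option 4, maxmin).

```python
import math


def pick_centers(vectors, nclust, combine):
    """Pick nclust initial centers from `vectors`.

    combine=max sketches the "furthest neighborhood method",
    combine=min sketches the "maxmin-distance algorithm".
    """
    centers = [vectors[0]]          # first vector is the first center
    remaining = list(vectors[1:])
    while len(centers) < nclust and remaining:
        # Score each candidate by its combined distance to all
        # centers chosen so far, then take the highest score.
        best = max(
            remaining,
            key=lambda v: combine(math.dist(v, c) for c in centers),
        )
        centers.append(best)
        remaining.remove(best)
    return centers


if __name__ == "__main__":
    data = [(0.0,), (10.0,), (9.0,), (5.0,)]
    print(pick_centers(data, 3, max))   # third center lands next to 10.0
    print(pick_centers(data, 3, min))   # third center lands at 5.0
```

In this small example the option 3 reading picks 9.0 as the third
center because it is far from 0.0, even though it almost coincides
with the second center at 10.0. The maxmin variant avoids this by
requiring distance from every existing center.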
-sdelthresl sdelthresl
Specifies the lower membership threshold against which
the number of members of each cluster is compared.
default => 0
-sdelthresu sdelthresu
Specifies the upper membership threshold against which
the number of members of each cluster is compared.
default => 0
sdelthresl and sdelthresu specify the lower and upper
membership threshold. Using these thresholds the user
can select one out of four different boundary ranges
against which the membership numbers of individual
clusters are compared. Clusters whose membership number
falls within the specified boundary will be modified
according to the specification made in sdelini.
range 0 => default; selected if neither sdelthresl nor
sdelthresu is specified (or if both are set to zero
internally). Results in no modification for any of the
clusters.
range 1 => selected if only sdelthresl is specified (or
if sdelthresu is set to zero internally). Results in
modifications of all clusters whose membership number
is <= sdelthresl.
range 2 => selected if sdelthresl and sdelthresu are
specified. Results in modifications of all clusters
whose membership number is > sdelthresl and <
sdelthresu.
range 3 => selected if only sdelthresu is specified (or
if sdelthresl is set to zero internally). Results in
modifications of all clusters whose membership number
is >= sdelthresu.
Note: If sdelthresl is specified to be > sdelthresu,
the program will abort.
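The four boundary ranges can be summarized in a small Python sketch.
It treats a threshold of zero as "not specified" and assumes that the
abort condition only applies when both thresholds are given; both
points are assumptions, as are the function names.

```python
def select_range(sdelthresl=0, sdelthresu=0):
    """Return the boundary range (0..3) selected by the thresholds."""
    if sdelthresl and sdelthresu:
        if sdelthresl > sdelthresu:
            # The manual states that the program aborts in this case.
            raise ValueError("sdelthresl must not exceed sdelthresu")
        return 2
    if sdelthresl:
        return 1
    if sdelthresu:
        return 3
    return 0


def is_flagged(members, sdelthresl=0, sdelthresu=0):
    """True if a cluster with `members` members would be modified."""
    rng = select_range(sdelthresl, sdelthresu)
    if rng == 1:
        return members <= sdelthresl
    if rng == 2:
        return sdelthresl < members < sdelthresu
    if rng == 3:
        return members >= sdelthresu
    return False                    # range 0: no modification


if __name__ == "__main__":
    print(is_flagged(8, sdelthresl=10))                  # range 1
    print(is_flagged(15, sdelthresl=10, sdelthresu=20))  # range 2
    print(is_flagged(15, sdelthresu=20))                 # range 3
```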
-sdelini sdelini
Specifies the modification method. Action is taken if
the membership number of any cluster falls within the
boundary range given by sdelthresl and sdelthresu:
default => Option 3
1 => Deletes all feature vectors which belong to a
cluster whose membership number falls within the
threshold boundary set by sdelthresl and sdelthresu. The
number of cluster centers stays the same. The feature
vectors that have been deleted are given the value
"nclust + 1" in the cluster assignment output file (see
-OC), where nclust is the final number of clusters.
Cluster centers whose members have been deleted are
moved to be near a cluster center of an adequately
(membership number within boundary) populated cluster.
The cluster with the most members is selected.
2 => Deletes all feature vectors which belong to a
cluster whose membership number falls within the
threshold boundary set by sdelthresl and sdelthresu. The
number of cluster centers is also reduced.
3 => Deletes all cluster centers which belong to a
cluster whose membership number falls within the
threshold boundary set by sdelthresl and sdelthresu. This
will reduce the number of clusters used but it will not
affect the number of feature vectors used. Using this
option one can distribute the feature vectors which
fall in sparsely populated clusters among the more
densely populated clusters.
4 => Moves all cluster centers which belong to a cluster
whose membership number falls within the threshold
boundary set by sdelthresl and sdelthresu. The cluster
centers are not deleted. They are moved to be near a
feature vector of an adequately (membership number
within boundary) populated cluster. The cluster whose
cluster center is closest (in the Euclidean sense) to
the cluster center to be moved is selected. This leads
to a splitting of this adequately populated cluster.
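As a rough illustration of option 3, the Python sketch below removes
the flagged cluster centers and assigns every feature vector to the
nearest remaining center. It uses real-valued vectors and assumes that
the redistribution is a plain nearest-center assignment; how clust3dm
carries out this step internally is not described here, and the
function name is hypothetical.

```python
import math


def drop_flagged_centers(points, centers, flagged):
    """Remove centers whose index is in `flagged`, then reassign.

    Returns the remaining centers and, for each point, the index of
    its nearest remaining center. No feature vector is deleted.
    """
    kept = [c for j, c in enumerate(centers) if j not in flagged]
    if not kept:
        raise ValueError("at least one cluster center must remain")
    labels = [
        min(range(len(kept)), key=lambda j: math.dist(p, kept[j]))
        for p in points
    ]
    return kept, labels


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (5.0,), (10.0,)]
    ctrs = [(0.5,), (5.0,), (10.0,)]
    # Cluster 1 has a single member and is flagged for removal.
    print(drop_flagged_centers(pts, ctrs, {1}))
```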
-maxcycle maxcycle
Specifies the maximum number of iteration cycles
(K-means cycles) that will occur.
default => 0; places no limit on the number of
iteration cycles that can occur.
-kmthres kmthres
Specifies the threshold used for each K-means cycle. As
soon as the number of reassignments is less than or
equal to kmthres, convergence is assumed to be achieved.
default => 0
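How -maxcycle and -kmthres interact can be sketched with a plain
K-means loop in Python. This is a generic K-means on real-valued
vectors, not the modified algorithm used by clust3dm; it only shows
the two stopping rules described above. The function name is
hypothetical.

```python
import math


def kmeans(points, centers, maxcycle=0, kmthres=0):
    """Generic K-means; returns (labels, centers, cycles_used)."""
    centers = [tuple(c) for c in centers]
    labels = [None] * len(points)
    cycle = 0
    while True:
        cycle += 1
        reassigned = 0
        # Assignment step: count how many points change cluster.
        for i, p in enumerate(points):
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            if nearest != labels[i]:
                labels[i] = nearest
                reassigned += 1
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
        if reassigned <= kmthres:
            break                   # convergence threshold reached
        if maxcycle > 0 and cycle >= maxcycle:
            break                   # maxcycle = 0 means no limit
    return labels, centers, cycle


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```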
-seed seed
Specifies the seed value for the random number generator
used by the cluster initialization method (see
-sclustini).
default => 1
-fout
Enter the command line argument '-fout' if you want to
output the "initial subset" of feature vectors (see
-OR and -OI). This gives the user the ability to use
the same feature vectors in another clust3dm session
working with a different parameter set.
-V Enter the command line argument '-V' to get additional
printout.
-? Enter the command line argument '-?' to get online
help. The program terminates after the help screen is
printed.
BUGS
None currently known.
See Also:
REFERENCES:
Kirlin, R.L., 1996, "Considerations for Clustering and
Segmentation of Reflections for 3D Horizon Feature
Enhancement", R. Lynn Kirlin, Inc.
Kaufhold, B., Kirlin, R.L., Dizaji, R.M., 1998, "A
Clustering Based Blind System Identification Approach",
submitted to ICA '99, France
CONTRACT AGREEMENT
This product is brought to you by Research Contract
Agreement 2548 (Seismic coherency cube). Thank you for your
support.
AUTHOR
Bertram Kaufhold (1998)
COPYRIGHT
copyright 2001, Amoco Production Company
All Rights Reserved
an affiliate of BP America Inc.