NAME
clust3d - Modified K-means Algorithm
SYNOPSIS
clust3d [ -NR ntapr ] [ -NI ntapi ] [ -OR otapr ] [ -OI
otapi ] [ -OC otapc ] [ -tstart tstart ] [ -tskip tskip ] [
-tend tend ] [ -ildm ildm ] [ -cldm cldm ] [ -nclust nclust
] [ -sclustini sclustini ] [ -sdelthresl sdelthresl ] [
-sdelthresu sdelthresu ] [ -sdelini sdelini ] [ -maxcycle
maxcycle ] [ -kmthres kmthres ] [ -seed seed ] [ -fout ] [
-V ] [ -? ]
DESCRIPTION
The clust3d routine implements a modified K-means cluster
algorithm (see J.T. Tou and R.C. Gonzalez, Pattern
Recognition Principles, Massachusetts, Addison-Wesley,
1974). The performance index minimized by the K-means
algorithm is the sum of the squared distances from all
points in a cluster domain to the cluster center (which can
be thought of as minimizing the overall noise power
contained in the whole cluster domain). Initial cluster
centers can be specified. The user can also delete clusters,
relocate cluster centers, or delete feature vectors.
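The performance index described above can be sketched in a
few lines of Python. This is only an illustration, not the
clust3d source: the function name, the list-based data
layout and the use of abs() to allow complex-valued entries
(real and imaginary parts combined) are assumptions.

```python
def performance_index(vectors, centers, assignments):
    """Sum of squared distances from each vector to its assigned center.

    vectors     -- list of feature vectors (sequences of numbers)
    centers     -- list of cluster centers (same length as a vector)
    assignments -- assignments[i] is the center index of vectors[i]
    All three names are hypothetical; clust3d's internals may differ.
    """
    total = 0.0
    for vec, k in zip(vectors, assignments):
        center = centers[k]
        # abs() makes the squared distance valid for complex entries too
        total += sum(abs(v - c) ** 2 for v, c in zip(vec, center))
    return total


# Three 2-D feature vectors, two centers
vectors = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centers = [(1.0, 0.0), (10.0, 0.0)]
assignments = [0, 0, 1]
index = performance_index(vectors, centers, assignments)
```

K-means alternates between reassigning vectors and moving
centers so that this quantity does not increase.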
USAGE
clust3d gets all its parameters from command line arguments.
These arguments specify the real and imaginary part of the
initial input data set (-NR and -NI), the real and imaginary
part of the output ("initial subset") feature data set (-OR
and -OI), the output file containing the final cluster
assignments (-OC), the sample points of each initial trace
to use for processing (-tstart, -tskip and -tend), the
in-line/cross-line spacing of the data set (-ildm and
-cldm), the number of initial cluster centers (-nclust), the
cluster center initialization method (-sclustini), the
thresholds and modification method used for cluster center
relocation, cluster center removal and feature vector
removal (-sdelthresl, -sdelthresu and -sdelini), the maximum
number of iterations that can occur (-maxcycle), the
convergence threshold used within the K-means subroutine
(-kmthres), a seed for the random generator (-seed), and a
feature output file flag (-fout).
In the IKP environment, input #0 and input #3 are used for
the real part (-NR) and imaginary part (-NI) of the initial
data set, respectively. Output #1 contains the cluster
assignments (-OC). Output #4 and output #5 are the real part
(-OR) and imaginary part (-OI) of the "initial subset".
Output #2 is the usual process printfile output.
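Outside IKP, a command line might be assembled as in the
following Python sketch. The helper name and the file names
are hypothetical, and the spacing between each flag and its
value simply follows the SYNOPSIS above; nothing is executed
here.

```python
import shlex


def build_clust3d_command(real_in, cluster_out, nclust=5, sclustini=1, extra=()):
    """Return an argument list for clust3d (hypothetical helper).

    Only flags listed in the SYNOPSIS are used. The defaults for
    nclust and sclustini mirror the documented defaults.
    """
    argv = [
        "clust3d",
        "-NR", real_in,          # real part of the input data set
        "-OC", cluster_out,      # cluster assignment output
        "-nclust", str(nclust),
        "-sclustini", str(sclustini),
    ]
    argv.extend(extra)           # any further documented flags
    return argv


# File names below are placeholders, not files shipped with clust3d
argv = build_clust3d_command(
    "attributes_real.usp", "clusters.usp",
    nclust=8, sclustini=4, extra=["-maxcycle", "50"],
)
command_line = shlex.join(argv)
```

The resulting list could be handed to a process launcher, or
the joined string pasted into a shell.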
Command line arguments
-NR ntapr
Enter the input file name immediately after typing -NR
(corresponding to the real part of the initial input
data set). The input file should include the complete
path name if the file resides in a different directory.
For this program, the data must be stored as a
rectangular grid of regularly binned data. The number of
samples denoted by lineheader word 'NumSmp' defines the
number of entries used for each initial feature vector.
The number of traces denoted by lineheader word
'NumTrc' defines the number of traces in the
x-direction. The number of records (seismic lines)
denoted by lineheader word 'NumRec' defines the number
of traces in the y-direction. Dead traces have to be
flagged by the dead traceheader flag 'StaCor'=30000, and
the number of padded traces has to be given by the
lineheader word 'Nx_Pad'. This argument has to be
specified; otherwise the program will abort.
-NI ntapi
Enter the input file name immediately after typing -NI
(corresponding to the imaginary part of the initial
input data set). For this program, the data must be
stored as a rectangular grid of regularly binned data.
The number of samples denoted by lineheader word
'NumSmp' defines the number of entries used for each
initial feature vector. The number of traces denoted by
lineheader word 'NumTrc' defines the number of traces
in the x-direction. The number of records (seismic
lines) denoted by lineheader word 'NumRec' defines the
number of traces in the y-direction. Dead traces have
to be flagged by the dead traceheader flag
'StaCor'=30000, and the number of padded traces has to
be given by the lineheader word 'Nx_Pad'. This argument
does not have to be specified. If it is not specified,
the imaginary part is set internally to zero.
-OC otapc
Enter the output file name immediately after typing -OC
(contains the cluster assignments for each and every
spatial location). Note that cluster assignments for
dead traces and deleted traces are set to zero and
"number of clusters + 1", respectively. Has to be
specified if in IKP; otherwise the program will abort.
Will be set to outc.usp if not in IKP and not specified.
Note: The number of clusters used is stored in the
lineheader word 'MnDpIn'.
-OR otapr
Enter the output file name immediately after typing -OR
(contains the real part of the "initial subset" of
feature vectors). The "initial subset" of feature
vectors includes filled traces (StaCor=30000), which
will be taken out for cluster processing. Has to be
specified if in IKP and the feature output file flag
(-fout) is specified; otherwise the program will abort.
Will be set to outr.usp if not in IKP and not specified.
-OI otapi
Enter the output file name immediately after typing -OI
(contains the imaginary part of the "initial subset" of
feature vectors). The "initial subset" of feature
vectors includes filled traces (StaCor=30000), which
will be taken out for cluster processing. Has to be
specified if in IKP and both the feature output file
flag (-fout) and the imaginary input file (-NI) are
specified; otherwise the program will abort. Will be set
to outi.usp if not in IKP and not specified.
-tstart, -tskip and -tend can be used to select a reduced
set of sample points (per trace) from the whole input data
set. Note that the times you specify will be internally
adjusted to fit the time grid of the existing data. All
times have to be given, and will be set, in milliseconds.
Note that tend has to be greater than tstart; otherwise an
error message will occur.
-tstart tstart
Specifies the time of occurrence of the first sample
point in the trace that will be used for processing.
Will be adjusted internally according to the following
rules:
default => TmMsFs (time of occurrence of the first
sample point in the trace)
if tstart specified is less than or equal to TmMsFs =>
set tstart internally to TmMsFs
if tstart specified is greater than TmMsFs => set
tstart internally to the time of occurrence of the next
available sample whose time of occurrence is greater
than or equal to tstart specified
if tstart specified is greater than the time of
occurrence of the last sample in the trace => set tstart
internally to the time of occurrence of the last sample
in the trace
-tskip tskip
Specifies the time increment for the selection of trace
sample points used for processing. Will be adjusted
internally according to the following rules:
default => one input sample interval
if tskip specified is less than or equal to zero => set
tskip internally to one input sample interval
if tskip specified is greater than zero => set tskip
internally to the multiple of input sample intervals it
encloses
if tskip specified is greater than the difference
between the time of occurrence of the last sample point
in a trace and the first sample point in a trace => set
tskip internally to this time difference
-tend tend
Specifies the time of occurrence of the last sample
point in the trace that will be used for processing.
Will be adjusted internally according to the following
rules:
default => time of occurrence of the last sample point
in the trace
if tend specified is less than or equal to TmMsFs =>
set tend internally to the time of occurrence of the
last sample in the trace
if tend specified is greater than TmMsFs => set tend
internally to the time of occurrence of the last
available sample whose time of occurrence is smaller
than or equal to tend specified
if tend specified is greater than the time of
occurrence of the last sample in the trace => set tend
internally to the time of occurrence of the last sample
in the trace
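The adjustment rules for -tstart, -tskip and -tend can be
sketched as one Python function. This is an illustration of
the rules as written, not the clust3d source. The function
and parameter names are hypothetical; t_first stands for
TmMsFs, dt for the input sample interval and nsamp for
NumSmp. Two points are assumptions: a positive tskip smaller
than one sample interval is rounded up to one interval, and
the "tend greater than tstart" check is applied after the
adjustment.

```python
import math


def adjust_window(tstart, tskip, tend, t_first, dt, nsamp):
    """Snap the requested time window (ms) onto the sample grid.

    Pass None for a value that was not given on the command line.
    """
    t_last = t_first + (nsamp - 1) * dt

    # -tstart: next available sample at or after the requested time
    if tstart is None or tstart <= t_first:
        start = t_first
    elif tstart > t_last:
        start = t_last
    else:
        start = t_first + math.ceil((tstart - t_first) / dt) * dt

    # -tskip: whole number of sample intervals enclosed by the request
    span = t_last - t_first
    if tskip is None or tskip <= 0:
        skip = dt
    elif tskip > span:
        skip = span
    else:
        skip = max(1, math.floor(tskip / dt)) * dt  # assumption: at least one interval

    # -tend: last available sample at or before the requested time
    if tend is None or tend <= t_first or tend > t_last:
        end = t_last
    else:
        end = t_first + math.floor((tend - t_first) / dt) * dt

    if end <= start:  # assumption: checked on the adjusted values
        raise ValueError("tend has to be greater than tstart")
    return start, skip, end


# 251 samples at 4 ms, first sample at 1000 ms (last sample at 2000 ms)
window = adjust_window(1003, 10, 1997, t_first=1000, dt=4, nsamp=251)
```

In this example the request 1003/10/1997 ms is moved onto
the 4 ms grid, so every selected sample exists in the input
trace.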
-ildm ildm
If not found in the line header, you must specify the
in-line spacing (In-line cell increment)
-cldm cldm
If not found in the line header, you must specify the
crossline spacing (Cross-line cell increment)
-nclust nclust
Specifies the number of clusters used initially. Note
that the result might contain fewer clusters than
specified here (if -sdelini was set to 2, i.e., if one
deletes clusters having an insufficient number of
members).
default => 5
-sclustini sclustini
Specifies the cluster center initialization method:
default => Option 1
1 => Select the first nclust feature vectors of the
"start subset" as cluster centers. The "start subset"
consists of all the feature vectors in the "initial
subset" (see -OR, -OI) corresponding to live traces.
2 => Select the initial cluster centers randomly from
the "start subset".
3 => Select the initial cluster centers based on the
one-pass "furthest neighborhood method". The first
feature vector of the "start subset" serves as the
first initial cluster center, and the feature vector
which is furthest away from this first cluster center
is chosen as the second cluster center. The next
nclust-2 cluster centers are selected accordingly,
i.e., the distance between each remaining feature
vector and the already established cluster centers is
obtained, and the feature vector which is furthest away
from any of these cluster centers is chosen as the
next cluster center. Caution: this method has a
pitfall. For example, although the third cluster center
might be far away from cluster center 1, it might
actually coincide with cluster center 2.
4 => Select the initial cluster centers based on the
"maxmin-distance algorithm" (see J.T. Tou and R.C.
Gonzalez, Pattern Recognition Principles,
Massachusetts, Addison-Wesley, 1974). Select the first
and second cluster centers as in the "furthest
neighborhood method". Calculate the distance from each
remaining feature vector to each of the already
existing cluster centers, and assign as the third
cluster center the feature vector whose minimum
distance to those centers is largest. This procedure is
repeated to obtain the remaining cluster centers.
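The difference between options 3 and 4 can be illustrated
with a short Python sketch. It is not the clust3d source:
the function names are hypothetical, and option 3 is read
here as "largest distance to any existing center" while
option 4 uses the largest minimum distance. Both start from
the first feature vector of the "start subset".

```python
import math


def distance(a, b):
    """Euclidean distance; abs() also covers complex entries."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def pick_centers(vectors, nclust, combine):
    """Return indices of the vectors chosen as initial centers.

    combine=max sketches option 3 (furthest neighborhood method),
    combine=min sketches option 4 (maxmin-distance algorithm).
    """
    if nclust > len(vectors):
        raise ValueError("more clusters requested than feature vectors")
    chosen = [0]  # the first feature vector is the first center
    while len(chosen) < nclust:
        best_idx, best_score = None, -1.0
        for i, vec in enumerate(vectors):
            if i in chosen:
                continue
            # distance from this vector to every center picked so far,
            # reduced with max (option 3) or min (option 4)
            score = combine(distance(vec, vectors[c]) for c in chosen)
            if score > best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return chosen


# Four 1-D feature vectors; the third lies right next to the second
vectors = [(0.0,), (10.0,), (9.0,), (5.0,)]
option3 = pick_centers(vectors, 3, max)
option4 = pick_centers(vectors, 3, min)
```

With these data option 3 picks the vector at 9.0 as third
center, which nearly coincides with the second center at
10.0 (the pitfall mentioned above), whereas option 4 picks
the vector at 5.0.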
-sdelthresl sdelthresl
Specifies the lower membership threshold against which
the number of members of each individual cluster is
compared.
default => 0
-sdelthresu sdelthresu
Specifies the upper membership threshold against which
the number of members of each individual cluster is
compared.
default => 0
sdelthresl and sdelthresu specify the lower and upper
membership thresholds. Using these thresholds, the user
can select one of four different boundary ranges against
which individual cluster membership numbers are
compared. Clusters whose membership number falls within
the specified boundary will be modified according to
the specification made in sdelini.
range 0 => default; selected if neither sdelthresl nor
sdelthresu is specified (or if both are set to zero
internally). Results in no modification of any of the
clusters.
range 1 => selected if only sdelthresl is specified (or
if sdelthresu is set to zero internally). Results in
modification of all clusters whose membership number
is <= sdelthresl.
range 2 => selected if sdelthresl and sdelthresu are
specified. Results in modification of all clusters
whose membership number is > sdelthresl and <
sdelthresu.
range 3 => selected if only sdelthresu is specified (or
if sdelthresl is set to zero internally). Results in
modification of all clusters whose membership number
is >= sdelthresu.
Note: If sdelthresl is specified to be > sdelthresu, the
program will abort.
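The four boundary ranges can be written down as a small
Python sketch. The function names are hypothetical, and a
threshold of zero (or less) is treated as "not specified",
which is an assumption based on the defaults above. The
abort condition is applied only when both thresholds are
given.

```python
def select_range(lower=0, upper=0):
    """Return the boundary range (0 to 3) implied by the two thresholds."""
    if lower > 0 and upper > 0:
        if lower > upper:
            # mirrors the documented abort when sdelthresl > sdelthresu
            raise ValueError("sdelthresl must not be greater than sdelthresu")
        return 2
    if lower > 0:
        return 1
    if upper > 0:
        return 3
    return 0


def is_flagged(members, lower=0, upper=0):
    """True if a cluster with this many members would be modified."""
    rng = select_range(lower, upper)
    if rng == 1:
        return members <= lower
    if rng == 2:
        return lower < members < upper
    if rng == 3:
        return members >= upper
    return False  # range 0: no cluster is modified


# Membership counts of four clusters, checked against range 1
counts = [3, 40, 5, 120]
flagged = [is_flagged(n, lower=5) for n in counts]
```

Clusters flagged this way are the ones that the method
selected with -sdelini acts on.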
-sdelini sdelini
Specifies the modification method. Action is taken if
the membership number of any cluster falls within the
boundary range given by sdelthresl and sdelthresu:
default => Option 3
1 => Deletes all feature vectors which belong to a
cluster whose membership number falls within the
threshold boundary set by sdelthresl and sdelthresu. The
number of cluster centers stays the same. The feature
vectors that have been deleted are given the value
"nclust + 1" in the cluster assignment output file (see
-OC), where nclust is the final number of clusters.
Cluster centers whose members have been deleted are
moved to be near a cluster center of an adequately
(membership number within boundary) populated cluster.
The cluster with the most members is selected.
2 => Deletes all feature vectors which belong to a
cluster whose membership number falls within the
threshold boundary set by sdelthresl and sdelthresu. The
number of cluster centers is also reduced.
3 => Deletes all cluster centers which belong to a
cluster whose membership number falls within the
threshold boundary set by sdelthresl and sdelthresu.
This will reduce the number of clusters used, but it
will not affect the number of feature vectors used.
Using this option, one can distribute the feature
vectors which fall in sparsely populated clusters among
the more highly populated clusters.
4 => Moves all cluster centers which belong to a cluster
whose membership number falls within the threshold
boundary set by sdelthresl and sdelthresu. The cluster
centers are not deleted. They are moved to be near a
feature vector of an adequately (membership number
within boundary) populated cluster. The cluster whose
cluster center is closest (in the Euclidean sense) to
the cluster center to be moved is selected. This leads
to a splitting of this adequately populated cluster.
-maxcycle maxcycle
Specifies the maximum number of iteration cycles
(K-means cycles) that will occur.
default => 0; places no limit on the number of
iteration cycles that can occur.
-kmthres kmthres
Specifies the threshold used for each K-means cycle. As
soon as the number of reassignments is equal to or
smaller than kmthres, convergence is assumed to be
achieved.
default => 0
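How -maxcycle and -kmthres interact as stopping rules can be
sketched with a plain K-means loop in Python. This is a
textbook K-means under stated assumptions, not the modified
algorithm of clust3d: the cluster deletion and relocation
steps controlled by -sdelini are left out, and all names are
hypothetical.

```python
def kmeans(vectors, centers, maxcycle=0, kmthres=0):
    """Plain K-means; returns (centers, assignments, cycles run).

    maxcycle=0 means no limit on the number of cycles. The loop
    also stops once a cycle reassigns at most kmthres vectors.
    """
    centers = [list(c) for c in centers]
    assignments = [None] * len(vectors)
    cycle = 0
    while maxcycle == 0 or cycle < maxcycle:
        cycle += 1
        reassigned = 0
        # assignment step: attach every vector to its nearest center
        for i, vec in enumerate(vectors):
            nearest = min(
                range(len(centers)),
                key=lambda k: sum(abs(v - c) ** 2 for v, c in zip(vec, centers[k])),
            )
            if nearest != assignments[i]:
                assignments[i] = nearest
                reassigned += 1
        # update step: move each center to the mean of its members
        for k in range(len(centers)):
            members = [vectors[i] for i, a in enumerate(assignments) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        if reassigned <= kmthres:
            break  # convergence is assumed
    return centers, assignments, cycle


# Two well-separated groups, deliberately poor starting centers
vectors = [(0.0,), (1.0,), (10.0,), (11.0,)]
start = [(0.0,), (1.0,)]
final_centers, final_assignments, cycles = kmeans(vectors, start)
```

Raising kmthres above zero ends the loop earlier, at the
price of accepting a few vectors that would still change
cluster in a further cycle.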
-seed seed
Specifies the seed value for the random generator used
by the cluster initialization method (see -sclustini).
default => 1
-fout
Enter the command line argument '-fout' if you want to
output the "initial subset" of feature vectors (see -OR
and -OI). This gives the user the ability to use the
same feature vectors in another clust3d session working
with a different parameter set.
-V Enter the command line argument '-V' to get additional
printout.
-? Enter the command line argument '-?' to get online
help. The program terminates after the help screen is
printed.
BUGS
None currently known.
See Also:
REFERENCES:
Kirlin, R.L., 1996, "Considerations for Clustering and
Segmentation of Reflections for 3D Horizon Feature
Enhancement", R. Lynn Kirlin, Inc.
Kaufhold, B., Kirlin, R.L., Dizaji, R.M., 1998, "A
Clustering Based Blind System Identification Approach",
submitted to ICA '99, France
CONTRACT AGREEMENT
This product is brought to you by Research Contract
Agreement 2548 (Seismic coherency cube). Thank you for your
support.
AUTHOR
Bertram Kaufhold (1998)
COPYRIGHT
copyright 2001, Amoco Production Company
All Rights Reserved
an affiliate of BP America Inc.