Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/3296
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Püskülcü, Halis | en
dc.contributor.author | Sülün, Erhan | -
dc.date.accessioned | 2014-07-22T13:51:15Z | -
dc.date.available | 2014-07-22T13:51:15Z | -
dc.date.issued | 2004 | en
dc.identifier.uri | http://hdl.handle.net/11147/3296 | -
dc.description | Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2004 | en
dc.description | Includes bibliographical references (leaf 78) | en
dc.description | Text in English; Abstract: Turkish and English | en
dc.description | ix, 79 leaves | en
dc.description.abstract | Thanks to the large storage capacities of current computer systems, companies' datasets have expanded dramatically in recent years. The rapid growth of companies' databases has raised the need for faster data mining algorithms, as time is critical for those companies. Large datasets hold historical data about company transactions, which contain valuable hidden patterns that can provide a competitive advantage. Because time is also very important for these companies, they need to mine these huge databases and make accurate decisions quickly in order to gain a marketing advantage. Classical data mining algorithms therefore need to be revised so that they discover hidden patterns and relationships in databases in less time. In this project, it is proposed to improve the performance of the K-means data mining algorithm so that it clusters large datasets in less time. The improvement is achieved through parallelization. Parallelization was considered a suitable solution because a popular way of increasing computing power is to connect computers and execute algorithms simultaneously on a network of computers. This popularity also makes parallel computation clusters increasingly available. A parallel version of the K-means algorithm was designed and implemented in C, using the MPI (Message Passing Interface) library for parallelization. A serial version was also implemented in C for comparison. Both algorithms were then run several times under the same conditions, and the results are discussed. The results, summarized in tables and graphs, show that parallelizing the K-means algorithm provides a performance gain almost proportional to the number of computers used for parallel execution. | en
dc.language.iso | en | en_US
dc.publisher | Izmir Institute of Technology | en
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject.lcc | QA76.9.D343 .S62 2004 | en
dc.subject.lcsh | Data mining | en
dc.title | Improvements in K-means algorithm to execute on large amounts of data | en_US
dc.type | Master Thesis | en_US
dc.institutionauthor | Sülün, Erhan | -
dc.department | Thesis (Master)--İzmir Institute of Technology, Computer Engineering | en_US
dc.relation.publicationcategory | Tez | en_US
item.fulltext | With Fulltext | -
item.grantfulltext | open | -
item.languageiso639-1 | en | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.openairetype | Master Thesis | -
Appears in Collections: Master Degree / Yüksek Lisans Tezleri
Files in This Item:
File | Description | Size | Format
T000441.pdf | MasterThesis | 483.4 kB | Adobe PDF

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.