Data Mining – Gold Nuggets of Data May Prove to be Priceless

Data mining is used to extract knowledge from data sources. Data mining processes may involve tasks such as data filtering, research, data retrieval and queries, statistics, analysis, and graphic model development. Both the search for data and its analysis can be conducted manually or with automated approaches. Data mining practices stem from machine learning, and as data mining has evolved, the information gathered from it has played a role in many areas, including medical innovation, predictive models, software development, analytical applications, marketing, strategic competition, game design, inventory replenishment, and reporting. Learn about the advantages that data mining can present with a comprehensive tutorial.

Patterns and Analysis

In some cases, analysis procedures involve the use of algorithms, the study of historical data, the discovery of patterns, and the application of statistics. One goal of studying patterns found in data is to draw conclusions from them. The more often the same pattern occurs within an environment that allows many possible scenarios, the more confidence can be placed in the resulting conclusions; for this reason, data analysis is often conducted on large amounts of data.

The following example shows how research on a relatively small amount of data may produce less accurate conclusions than the same research on a large amount of data. An analyst found that 2 out of 10 customers in a study had purchased the same two items of interest together, and initially concluded that there was a 20% chance that the two items would be purchased together. When the same type of study included 10,000 orders, it found that there was actually a 35% chance that the two items would be purchased together. The larger study produced more accurate results because there were more opportunities for orders to include none of the items in question, one of them, or both. A larger study is also less likely to be distorted by coincidental or random factors.
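The effect of sample size in the example above can be illustrated with the standard error of a proportion. This is a minimal sketch using the normal approximation, which is an assumption and not something the original study is known to have used; the figure of 3,500 orders is simply 35% of 10,000, and the function name is hypothetical.

```python
import math

def proportion_standard_error(successes: int, total: int) -> float:
    """Standard error of an observed proportion (normal approximation)."""
    p = successes / total
    return math.sqrt(p * (1 - p) / total)

# 2 of 10 customers bought both items (the small study)
small = proportion_standard_error(2, 10)
# 35% of 10,000 orders contained both items (the large study)
large = proportion_standard_error(3500, 10_000)

print(f"n=10:     20% +/- {small:.1%}")
print(f"n=10,000: 35% +/- {large:.1%}")
```

Under these assumptions the small study's estimate carries an uncertainty of roughly 13 percentage points, while the large study's is about half a percentage point, which is why the 20% figure could sit so far from the 35% figure.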

The previous example demonstrated research based on itemset patterns: the deciding factor was how frequently the itemset (here, the two items of interest) was purchased together, which was the pattern being searched for. In addition to frequent itemsets, which focus on items, other kinds of frequent patterns include subsequences and substructures. The researcher for the store had established that the store would treat an itemset as frequent if it had at least a 60% chance of being present, so at 35% the two items in question did not qualify as a frequent itemset. Because of its business relations with the two suppliers of the compatible products, the store has a goal of making joint purchases of the two items a frequent itemset, and it developed a promotion that offers customers a discount when they purchase both items in the same order.
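The frequency check described above can be sketched as a support calculation compared against the store's 60% threshold. The order data and the item names ("paint", "brush", "tape") are hypothetical stand-ins; only the threshold comes from the example.

```python
def support(orders, itemset):
    """Fraction of orders that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for order in orders if itemset <= set(order))
    return hits / len(orders)

MIN_SUPPORT = 0.60  # the store's threshold for a "frequent" itemset

# Hypothetical orders; "paint" and "brush" stand in for the two items.
orders = [
    {"paint", "brush"},
    {"paint"},
    {"brush", "tape"},
    {"paint", "brush", "tape"},
    {"tape"},
]

pair_support = support(orders, {"paint", "brush"})
is_frequent = pair_support >= MIN_SUPPORT
print(pair_support, is_frequent)
```

In this made-up data the pair appears in 2 of 5 orders, so it falls below the threshold, just as the 35% result did in the store's study.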

Manually analyzing the data for 10,000 orders would have taken quite a while, so in the previous example a tool was used to apply automated processes together with the Apriori algorithm. In programming terms, the Apriori algorithm relies on conditional checks (If…Then…Else statements) to rule out data that does not meet the criteria of the itemsets being searched for. It searches for values that meet the criteria outlined in rule sets (such as the values of interest and the minimum count required for them) and can then group itemsets in a hash table. The Apriori algorithm can be used along with association rules when analyzing data. In the example of the retail business trying to determine whether two specific items purchased in the same order formed a frequent itemset, the results of the analysis helped the store make promotional and marketing decisions. In some cases, association rules also assist with predictive analysis and modeling.
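A compact sketch of the Apriori idea follows. It is not the tool used in the example; the function name and the sample orders are hypothetical, and a Python dict (itself a hash table) holds the itemset counts. The key property it relies on is that every subset of a frequent itemset must also be frequent, so candidates with an infrequent subset can be discarded without counting them.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those at or above the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    current = frequent(items)
    result = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into size-k candidates.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result

# Hypothetical orders with items "a", "b", and "c".
sample_orders = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}, {"a", "b"}]
frequent_itemsets = apriori(sample_orders, min_support=0.6)
for itemset, s in frequent_itemsets.items():
    print(sorted(itemset), s)
```

With a 60% threshold, item "c" is eliminated in the first pass, so no candidate containing it is ever counted in later passes.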

I recently came across an app created by Five that retrieves words from the posts a user has made on Facebook and then develops a personality trait summary based on an analysis of those posts. The analysis relied on existing knowledge of the traits associated with particular behaviors, expressions, and styles of communication; that knowledge stems from research in psychology. My hypothesis is that when an individual's Facebook posts are analyzed, the program searches for patterns and matches them with the personality traits that research has associated with those patterns. For instance, someone who uses emotionally driven words may be more likely to be sensitive; someone who posts and then quickly deletes posts may have neurotic tendencies; someone who does not post frequently may be solitary; and someone who composes posts with precision and includes insightful content may be more likely to be conscientious. You may have your own theories about how the app analyzes information and reaches the conclusions it presents; in any case, the app is one of many examples of how the gathering and analysis involved in data mining have been applied.

Another scenario in which data mining provided more insight was a root cause analysis performed within a help desk. A technical support group had been monitoring a recent issue with the processing of some transactions, and research into the issue turned up different errors in the error logs. Queries (including SQL database queries) used to find out which types of transactions were affected showed that the transactions which could not be completed, and which produced the same types of errors within the same time frame, were for projects that had been saved earlier and then recalled at the time of sale. Further research, which began by extracting data on recently saved projects and on recent releases (system updates), narrowed the errors down. Finally, semaphore-related errors were found to be associated with every project in the reported issues. What about cases that met the same criteria and showed the same symptoms but had not been reported? Queries, along with a ROC (Receiver Operating Characteristic) model, were used to determine how often the same error had occurred with purchase orders; the results indicated that the error had occurred many times, even within a single day. As a result, the help desk was able to locate the affected purchase orders before anyone attempted to recall the saved projects, give details to others who might not have been aware of the issue, estimate how much call volume related to the issue could increase, and forward the information to analysts who could then work toward resolving it.
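The kind of query used in that investigation might look like the following sketch. The table layout, column names, error codes, and dates are all hypothetical, since the help desk's actual schema is not described; an in-memory SQLite database is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        day TEXT,
        recalled_project INTEGER,  -- 1 if a saved project was recalled at sale
        error_code TEXT            -- NULL when the transaction succeeded
    )
""")

# Hypothetical sample rows.
rows = [
    ("2024-01-01", 1, "SEMAPHORE_TIMEOUT"),
    ("2024-01-01", 1, "SEMAPHORE_TIMEOUT"),
    ("2024-01-01", 0, None),
    ("2024-01-02", 1, "SEMAPHORE_TIMEOUT"),
    ("2024-01-02", 0, "NETWORK"),
]
conn.executemany(
    "INSERT INTO transactions (day, recalled_project, error_code) VALUES (?, ?, ?)",
    rows,
)

# Count failed transactions per day and error type for recalled projects.
query = """
    SELECT day, error_code, COUNT(*) AS occurrences
    FROM transactions
    WHERE recalled_project = 1 AND error_code IS NOT NULL
    GROUP BY day, error_code
    ORDER BY day
"""
error_counts = conn.execute(query).fetchall()
print(error_counts)
```

Grouping by day and error type is one simple way to see whether an error recurs many times within a single day, including for cases nobody has reported yet.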

Processes

Planning is often the first step of data mining. Data mining processes typically include planning, data collection, cleaning, querying, analysis, modeling, further analysis that takes the modeling output into account, and presentation. Planning helps determine what types of data should be gathered and outlines the objective of the research. Data collection can involve gathering data from multiple repositories or sources. After data has been gathered, it is prepared so that it can be worked with more easily; this preparation involves data cleansing. Data cleansing was used in the previous scenario, although it was not directly mentioned: irrelevant data and other data that would not benefit the analysis (such as duplicate values) were omitted during the early stages. Data cleansing reduces the likelihood of wasting time on low-value data and of making research more tedious than necessary, and it helps ensure that the data being analyzed will provide value. It can also help uncover discrepancies and cases of missing data.
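A cleansing pass of the kind described above can be sketched in a few lines. The record layout and field names are hypothetical; the sketch drops exact duplicates and sets aside records with missing required values so they can be reviewed instead of silently analyzed.

```python
def clean_records(records, required_fields):
    """Drop exact duplicates and separate out records with missing fields."""
    seen = set()
    cleaned, incomplete = [], []
    for record in records:
        # A sorted tuple of (field, value) pairs identifies a duplicate row.
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)
        if any(record.get(field) in (None, "") for field in required_fields):
            incomplete.append(record)
        else:
            cleaned.append(record)
    return cleaned, incomplete

# Hypothetical order records.
raw = [
    {"order_id": 1, "item": "paint"},
    {"order_id": 1, "item": "paint"},   # duplicate
    {"order_id": 2, "item": None},      # missing item
    {"order_id": 3, "item": "brush"},
]
cleaned, incomplete = clean_records(raw, required_fields=["order_id", "item"])
print(len(cleaned), len(incomplete))
```

Keeping the incomplete records in a separate list, instead of discarding them, makes discrepancies and missing data visible as a finding in their own right.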

Operations that cluster and convert data help with data handling, because functions can then be performed on data that resides in different systems. Manual methods and automation tools can then be used to search and analyze the data. Modeling can be used to create visual representations of the information gathered from data, provide predictive analysis, and determine how various factors could affect results. Charts and other visual representations of potentially conclusive data make the information easier to understand when it is presented. Once the results of the research have been presented, decisions and actions can be established. If you are interested in discovering data extraction techniques or want to expand your current knowledge of data mining, you might benefit from a tutorial that demonstrates data mining concepts.

Tools

There is an ample selection of tools that help with data mining. Tools such as RapidMiner may include features for importing data from Excel spreadsheets or other data sources into data repositories, managing projects, building models based on data and selected factors, and controlling processes. When you think of companies that extract and analyze large amounts of data, one that may come to mind is Google. Google uses data mining for many, if not all, of its services, and it also offers users a tool that assists with data mining: Google Analytics. Google Analytics includes a dashboard that organizes features and makes them accessible, import and export capabilities along with the Google Analytics API, event tracking, goal management options, search engine optimization options, and the generation of visuals. If you have discovered the potential of Google Analytics in ecommerce and other areas, consider taking a Udemy tutorial that aims to help you master Google Analytics.

Additional Examples of Application

There are many possible applications of data mining. For example, anomaly detection can assist with discoveries such as those used in IRS audits, and classification analysis can assist with filters such as email spam filters. Data mining in web analysis has contributed to the selective advertisements that show up on pages: when a user views pages, cookies can store information about the pages viewed, and data mining helps extract information from that data in order to customize advertisements to what the user may be interested in, as determined by the pages they have viewed. You may have personally seen advertisements on a page shortly after viewing pages related to what is being advertised.
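As a rough illustration of anomaly detection, the sketch below flags values that lie far from the mean in units of standard deviation (a z-score test). This is one simple technique chosen for illustration, not a description of how audits are actually selected; the figures and the threshold of 2.0 are hypothetical.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / sigma > threshold]

# Hypothetical reported amounts; one is far out of line with the rest.
amounts = [100, 102, 98, 101, 99, 500]
anomalies = find_anomalies(amounts)
print(anomalies)
```

A z-score test is sensitive to the outliers it is trying to find, so on real data a more robust measure (for example, one based on the median) may be a better choice.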

Summary

Data mining consists of steps that often depend on extensive research. It has many different applications across various fields. Tools have been developed to help automate processes and to provide fairly user-friendly options for completing data mining tasks. Gathering data is just one part of data mining; the knowledge obtained from the extracted data is also a major element. One key principle is that clear objectives, along with direct analysis of relevant and valid data, can help produce “nuggets” of knowledge that can be used for beneficial purposes.