Data and Information Quality: Application in Big Data Analytics


Abstract. Business Intelligence (BI) solutions normally aim to facilitate decision-making procedures by providing an inclusive view of a company's main business data together with appropriate abstractions. This business analysis paper discusses metadata and data element classification, data fusion and integration, and predictive analytics. Data fusion is a means of incorporating data from multiple sources. The SAP system, a business application whose functionality comprises risk analysis, financial management, and compliance, is used to assess data and information quality in Big Data analytics. Metadata, structured information that locates, illustrates, explains, and otherwise makes it easier to retrieve and use an information resource, is also analyzed. The modules of SAP ERP Financials execute core accounting and reporting, control of accounts receivable, collective services, and other functions. This paper seeks to analyze the identified data matters in business terms.

Keywords: metadata and data element classification, data fusion and integration, predictive analytics.

1 Introduction

Big Data and, more significantly, the analytics that Big Data drives are major business technology issues. However, developing better, quicker, more robust methods of accessing and analyzing huge data sets can lead to catastrophe if an organization's data quality and data management processes do not keep pace. For Big Data quality (Big DQ) and data management (DM), minimum metadata requirements need to be recognized and, ultimately, metadata principles too (Fan and Floris 2012). To promote cross-enterprise application of data, taxonomies that represent classification or categorical frameworks must be defined, covering, for example, financial data, demographic data, property characteristics, geographic/geospatial data, and individually identifiable data (Fan and Floris 2012).
Data quality and data management principles and procedures have comprised: clearly defining the problem and the questions to be resolved before identifying the data required; defining DQ benchmarks to make sure the data is fit for its proposed use; identifying key DQ variables to be considered, such as validity (Tozer 1999), timeliness, accuracy, completeness, and reasonableness; managing data quality as close to the source as possible; acquiring data that flows from the fundamental business processes; and generating and maintaining data documentation and artefacts. The paper discusses data and information quality as an application in Big Data analytics. SAP software, as a business tool for data and information management, serves as the application example throughout this paper.

2 Metadata and Data Element Classification

This section introduces the elements of data and metadata that are essential for the paper. Metadata has long been utilized in various forms as a means of cataloguing archived data or information (Hillmann and Elaine 2004). In business, time is essential, and with proper cataloguing activities are well managed and controlled. The Dewey Decimal System (DDS) used for the categorization of library resources is an early illustration of metadata application (Zeng and Jian 2008). Library catalogues employed 3 x 5" cards to show a book's author, title, brief description, and subject matter; a shortened alphanumeric identification mechanism indicated the physical place of the book inside the library (Fan and Floris 2012). Such data helped to classify, identify, and select books. Business metadata describes the non-technical expressions employed by business participants to integrate across both business and Information Technology (IT) and to better perform their responsibilities (Inmon et al 2008). In business, metadata reveals that data without business context is worthless.
This statement is illustrated by John Schmidt in his laws of integration. He expressed the concept as a formula:

Information = Data + Context (1)

Classification and nomenclature are mostly employed to cover those data elements useful not just for classifying sensitive information, but for creating and documenting data elements (Li and Yingyi 2012). The data elements are grouped into three key groups, namely classification, characters, and literature. Classification data are needed for every database, as names form the backbone of actual information (Czajkowski et al 2001). Classification of ordinary data elements is broken down into prohibited information, confidential information, restricted information, and unrestricted information. With regard to data element classification, for Big DQ and DM, minimum metadata requirements need to be recognized and, eventually, metadata principles too (Li and Yingyi 2012). To promote cross-enterprise application of data, taxonomies that represent classification or categorical frameworks must be defined, such as financial data, demographic data, property characteristics, geographic/geospatial data, and individually identifiable data (May et al 2006). This is essential in integrating metadata and data element categorization of business activities.

2.1 Metadata Definition

Metadata is defined as structured information that illustrates, locates, explains, and otherwise makes it simpler to use and retrieve an information source. There are different types of metadata in business that assist organizations in managing their databases effectively (Hillmann and Elaine 2004). Administrative metadata relate to the management, use, and encoding procedures of digital objects over a given time period. Descriptive metadata describe a work for purposes of identification and discovery, such as title, creator, and topic.
Structural metadata indicate how complex objects are structured and are provided to support use of the objects (Ma 2007). Metadata schemes, sometimes known as schemas, are sets of metadata elements intended for a precise purpose, like describing a given type of information source. The definition or meaning of the elements themselves is known as the semantics of the schema (Tozer 1999). The values assigned to the metadata elements are the content. Metadata schemes normally specify the names and semantics of elements.

2.2 Data Acquisition

In obtaining data, it is important for data to be organized so that it is more readily quantifiable. Data and data exchange standards for Big DQ and DM are major tools in the acquisition procedure (Austerlitz and Howard 2003). Standards supply the common vocabulary and grammar that sustain and facilitate the outlining of data across materials. For Big DM and DQ, movement towards data and data exchange standards will result in less friction in the data acquisition procedure (Di Paolo 2013).

3 Data Fusion and Integration

Data fusion and integration are essential functionalities that must be applied in business to ensure databases are well organised to save space and access time. Integrating data across numerous sources can be a large part of a Big Data effort. Data standards significantly facilitate the structuring of data (Lee et al 2008). Other DM instruments that assist integration are master data management (MDM), identity management (idM), and entity resolution. MDM generates a distinct, consistent view of data that is distributed within an organization. idM concentrates on identifying persons within a data resource or across data sources. Entity resolution is a procedure or instrument that determines who knows whom and who is who; this is essential in business since identification is one of the marketing strategies (Foster and Kesselman 1999).
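As a minimal illustrative sketch (the record fields and matching rule are hypothetical, not a real MDM or idM product), entity resolution can be reduced to deciding whether two records from different sources refer to the same person by comparing normalized keys:

```python
# Hypothetical entity-resolution sketch: match customer records across sources.

def normalize(record):
    """Build a comparison key from name and e-mail, ignoring case and spacing."""
    name = "".join(record["name"].lower().split())
    email = record.get("email", "").lower()
    return (name, email)

def same_entity(a, b):
    """Match on identical e-mail when both have one, else on normalized name."""
    na, nb = normalize(a), normalize(b)
    if na[1] and nb[1]:
        return na[1] == nb[1]
    return na[0] == nb[0]

# The same customer recorded slightly differently in two source systems.
crm = {"name": "Jane  Doe", "email": "jane@example.com"}
erp = {"name": "jane doe", "email": "JANE@EXAMPLE.COM"}
```

Here `same_entity(crm, erp)` resolves the two records to one entity despite the differences in spacing and case; production entity resolution adds fuzzier similarity measures and survivorship rules on top of this idea.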
For Big DQ and DM, business owners must make greater use of data integration and management tools such as idM, MDM, entity resolution, and data standards. The greatest impact of Big Data is on data value or quality. While data quality has conventionally been assessed in relation to its proposed use, for Big Data ventures data quality may have to be evaluated beyond its proposed application, to address how data can be repurposed (Mitchell 2012). To achieve this, data quality attributes such as validity, accuracy, reasonableness, timeliness, completeness, and reliability must be clearly defined, recorded, measured, and made accessible to end users (Lee et al 2008). Artefacts relating to each data element, comprising business rules and value mappings, must also be documented. If data is transformed or cleaned, care must be taken not to overwrite the initial values. Data element profiles must be generated; the profiles must document the completeness of every data element (Lee et al 2008). Since data can be transferred across applications, reconciliation and control criteria must be generated and recorded to make sure data sets precisely reproduce the data at the stage of acquisition and that no information is lost or duplicated in the process (Smith and Waterman 1981). Special attention ought to be accorded to semi-structured and unstructured data, since their data quality artefacts and attributes may not be easily or readily defined. If structured data is generated from semi-structured or unstructured data, the generation procedure too must be recorded and any of the previously acknowledged data quality procedures applied. For Big DQ and DM, a business must generate data quality metadata that comprises data quality measures, attributes, mappings, business rules, cleansing routines, controls, and data element profiles (Foster et al 2002).
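The data element profiles described above can be sketched in a few lines of code. This is a minimal illustration (the field name and validity rule are hypothetical): for one element it records completeness, the share of non-missing values, and validity, the share of present values passing a business rule.

```python
# Sketch of a data element profile: completeness and rule-based validity.

def profile(rows, element, is_valid):
    """Profile one data element across a list of record dicts."""
    values = [r.get(element) for r in rows]
    present = [v for v in values if v not in (None, "")]
    completeness = len(present) / len(values) if values else 0.0
    validity = (sum(1 for v in present if is_valid(v)) / len(present)
                if present else 0.0)
    return {"element": element, "completeness": completeness,
            "validity": validity}

rows = [{"amount": 120.0}, {"amount": -5.0}, {"amount": None}, {"amount": 80.0}]
# Business rule (illustrative): amounts must be non-negative.
p = profile(rows, "amount", lambda v: v >= 0)
# 3 of 4 values present (completeness 0.75); 2 of the 3 present are valid.
```

Documenting such profiles alongside the business rule itself gives end users the measured data quality attributes the text calls for.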
3.1 Data Fusion

Data fusion is the process of integrating numerous data and knowledge representing the same real-world entity into an accurate, consistent, and useful representation (Manyika and Durrant-Whyte 1995; Castanedo et al 2010). Figure 1 below provides an outline of how data fusion is used in a business entity to coordinate functions.

Fig. 1 Sample of a data fusion of an impact assessment

Data fusion is normally described as the application of methods that merge data from multiple origins, such as measurements #1 and #2 in Figure 2 below, and combine that data in order to reach conclusions (like the scatter plot below) that are more informative and potentially more accurate than if they were obtained from any lone source, such as in a histogram.

Fig. 2 Scatter plot showing data fusion technique

In general, all tasks that require any kind of parameter estimation from multiple sources can benefit from the application of data/information fusion techniques. The terms data fusion and information fusion are typically used as synonyms; but in a number of scenarios the term data fusion is employed when dealing with raw data, and the term information fusion is used for already processed data (Durrant-Whyte and Stevens 2001). In this sense, the term information fusion denotes a higher semantic level than data fusion. Other terms associated with data fusion that typically appear in the literature include multisensor data fusion, data aggregation, decision fusion, sensor fusion, and data combination (Durrant-Whyte 1988; Hall and Llinas 1997). Data fusion is a multidisciplinary area that involves numerous fields, and it is hard to establish a precise and strict classification.
The working techniques and methods can be categorised according to the following criteria: (1) attending to the relations among the input data sources, as proposed by Durrant-Whyte (Durrant-Whyte and Stevens 2001); these relations can be described as (i) cooperative, (ii) redundant, or (iii) complementary data; (2) according to the input/output data types and their nature, as proposed by Dasarathy (Dasarathy 1997; Luo et al 2002); (3) following the abstraction level of the processed data: (i) signals, (ii) raw measurements, or (iii) characteristics or decisions; (4) based on the different data fusion levels defined by the JDL model (Blasch and Plano 2002; Llinas et al 2004); (5) according to the architecture type: (i) centralized, (ii) decentralized, or (iii) distributed.

Classification according to type of architecture: one of the key questions that arise when designing a data fusion system is where the data fusion process will be performed. According to this criterion, the following kinds of architecture can be recognized: (a) Centralized architecture: the fusion node is located in a central system that receives the data from all of the input sources. (b) Decentralized architecture: composed of a network of nodes in which every node has its own processing capabilities and there is no single point of data fusion (Durrant-Whyte and Stevens 2001). (c) Distributed architecture: measurements from every source node are processed separately before the data is transferred to the fusion node, which accounts for the data received from the other sources. (d) Hierarchical architecture: comprises a combination of distributed and decentralized nodes, generating hierarchical schemes in which the data fusion process is performed at different levels of the hierarchy (Durrant-Whyte and Stevens 2001).
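The difference between the centralized and distributed architectures above can be sketched in a few lines (the sensor readings are hypothetical, and averaging stands in for whatever fusion method a real node would run): in the centralized case raw readings travel to the fusion node; in the distributed case each source first reduces its own readings locally.

```python
# Sketch contrasting centralized vs. distributed fusion architectures.

def fusion_node(inputs):
    """The fusion node's method; here simply the mean of its inputs."""
    return sum(inputs) / len(inputs)

# Two sources, each with two raw readings of the same quantity.
raw_sources = [[9.0, 11.0], [10.0, 14.0]]

# Centralized: every raw reading is sent to the single fusion node.
centralized = fusion_node([r for source in raw_sources for r in source])

# Distributed: each source processes its own readings first, then the
# fusion node combines only the local estimates.
local_estimates = [sum(s) / len(s) for s in raw_sources]
distributed = fusion_node(local_estimates)
```

With equal readings per source the two architectures agree numerically; the practical difference is communication load and where the processing capability must sit, which is exactly the design question the classification captures.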
3.2 Data Integration

SAP data integration organizes and blends information to generate a complete picture of a business that drives actionable insights. The data integration platform delivers accurate, analytics-ready data to end users from any source (McDonald 2006). With visual tools that eliminate coding and complexity, SAP puts Big Data and all data sources at the fingertips of business and IT personnel alike.

Fig. 3 SAP data integration platform

The system boosts business activities concerning data and information management through its capability to execute the following:
- a graphical extract-transform-load (ETL) tool to process and load Big Data sources in familiar ways;
- a visual interface to call custom code and to scrutinize image and video data to generate meaningful metadata;
- a rich collection of pre-built components to acquire and transform data from a full range of sources;
- dynamic transformations, using attributes to determine target mappings, validation, and enrichment rules;
- an integrated debugger for testing and tuning job execution (Wood 2007).

4 Predictive Analytics

Predictive analytics is acknowledged as a business intelligence technology that creates a predictive score for every customer or other organizational constituent. Assigning these predictive scores is the work of a predictive model which has, in effect, been trained on business data, learning from the experience of the organization (Siegel 2013). It optimizes marketing procedures and website behaviour to boost customer responses, conversions, and clicks, and to reduce churn (Klimberg and McCullough 2013). Each client's predictive score indicates the actions to be taken with that client. Business intelligence simply does not get more actionable than that.
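A per-client predictive score of this kind can be sketched with a fixed logistic model. This is only an illustration, not SAP's method: the attribute names and weights are hypothetical and would in practice be learned from the organization's historical data.

```python
# Hypothetical per-client predictive score driving a next action.
import math

# Illustrative weights; a real model would learn these from business data.
WEIGHTS = {"recent_purchases": 0.8, "support_tickets": -0.5}
BIAS = -0.2

def predictive_score(client):
    """Map client attributes to a score in (0, 1) via the logistic function."""
    z = BIAS + sum(WEIGHTS[k] * client.get(k, 0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def action(client, threshold=0.5):
    """Each client's score indicates the action to be taken with that client."""
    return "upsell offer" if predictive_score(client) >= threshold else "retention call"

loyal = {"recent_purchases": 3, "support_tickets": 0}
at_risk = {"recent_purchases": 0, "support_tickets": 4}
```

The score itself is just a number; what makes it actionable, as the text argues, is the rule that turns each client's score directly into a decision.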
4.1 Application of Predictive Analytics in Business

In business, predictive scores are referred to as the golden eggs produced by predictive analytics, because there is one predictive score per client or prospect. Each client's predictive score indicates the actions to be taken with that client. Therefore, business intelligence does not get more actionable than this type of decision automation (Taylor 2013). Predictive analytics is used in several ways to assist businesses in overcoming a plethora of problems. The core difference between one form of application and another is in what is being forecast. Businesses of all sizes apply predictive analytics to automate operational decisions, both real-time and offline, across advertising, sales, and more (Schneider and Eric 2013). Which business application of predictive analytics is best for a business is a strategic question, and depends on which kind of decision one prefers to automate; that is, on how predictive scores will best serve to drive decisions within one's business. It allows business systems to support client retention, direct promotion, credit scoring, product recommendations, and more (Schneider and Eric 2013).

Fig. 4 Application of predictive analytics

4.2 SAP Predictive Analysis

It is a fact in business that past performance does not assure future outcomes; so why merely meet client demands when a business can foresee them? SAP Predictive Analysis helps recognize fresh opportunities and deliver targeted services and products. The system operates with a business's existing data landscape and works best with SAP HANA for real-time predictive insight (MacGregor 2014). SAP permits a business owner to take advantage of intuitive and iterative predictive modelling, to generate advanced and powerful data visualizations, and to leverage endless possibilities through R integration.
SAP has completed a number of proof-of-concept (PoC) tests with client systems, including their enhancements as well as relevant copies of their production data (MacGregor 2014). The outcomes are very encouraging. In broad perspective, the expectation is that most customization should not be affected; however, a comprehensive review of the client's customizations will better inform the business operators of any possible impact (Schneider and Eric 2013).

Fig. 5 SAP application (Schneider and Eric 2013)

4.3 Benefits of Predictive Analytics

Predictive analytics, the long-established, more advanced cousin of conventional business analytics, has been receiving more attention of late due to technical advances. The figure below provides a graph showing the benefits of predictive analytics (Maisel and Gary 2014).

Fig. 6 Source: Ventana Research Predictive Analytics Benchmark Research

It is apparent that predictive analytics is no longer just the realm of mathematicians and statisticians. There is an explicit trend towards business analysts and other business operators making use of the technology (Sarma 2007). Sales and marketing are big contemporary users of predictive analytics, and market analysts are employing the technology on a greater scale (Maisel and Gary 2014). The paper therefore also assesses the skills needed to perform predictive analytics and how the method can be used and operationalized across businesses. It explores cultural and business matters involved with deploying predictive analytics (Power 2013).

5 Conclusion

In conclusion, data quality and data management principles for Big Data are largely the same as those used historically for conventional data. But priorities may vary, and certain data quality and data management procedures, such as data integration, metadata, data consistency, and data quality measurement, ought to be given increased emphasis (Fan and Floris 2012).
One key exception involves the time-tested practice of first clearly defining the problem. In the context of Big Data, where information may be employed in ways not initially intended, data elements need to be organized, defined, and produced in a way that optimizes potential use and does not hamper future utility.

References

1. Austerlitz, H. and Howard A. Data Acquisition Techniques Using PCs. 2nd ed. Amsterdam: Academic Press, 2003. Print.
2. Blasch, E. P. and Plano, S. "JDL level 5 fusion model 'user refinement' issues and applications in group tracking," in Proceedings of the Signal Processing, Sensor Fusion, and Target Recognition XI, pp. 270–279, April 2002.
3. Castanedo, F., García, J., Patricio, M. A. and Molina, J. M. "Data fusion to improve trajectory tracking in a cooperative surveillance multi-agent architecture," Information Fusion, vol. 11, no. 3, pp. 243–255, 2010.
4. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. In: 10th IEEE International Symposium on High Performance Distributed Computing, pp. 181–184. IEEE Press, New York (2001)
5. Dasarathy, B. V. "Sensor fusion potential exploitation: innovative architectures and illustrative applications," Proceedings of the IEEE, vol. 85, no. 1, pp. 24–38, 1997.
6. Di Paolo, E. M. Data Acquisition Systems: From Fundamentals to Applied Design. New York, NY: Springer, 2013. Internet resource.
7. Durrant-Whyte, H. F. and Stevens, M. "Data fusion in decentralized sensing networks," in Proceedings of the 4th International Conference on Information Fusion, pp. 302–307, Montreal, Canada, 2001.
8. Durrant-Whyte, H. F. "Sensor models and multisensor integration," International Journal of Robotics Research, vol. 7, no. 6, pp. 97–113, 1988.
9. Fan, W., and Floris G. Foundations of Data Quality Management.
S.l.: Morgan & Claypool, 2012.
10. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Technical report, Global Grid Forum (2002)
11. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco (1999)
12. Gros, X. E. Applications of NDT Data Fusion. Boston: Kluwer Academic Publishers, 2001. Print.
13. Hall, D. L. and Llinas, J. "An introduction to multisensor data fusion," Proceedings of the IEEE, vol. 85, no. 1, pp. 6–23, 1997.
14. Hillmann, D. I., and Elaine L. W. Metadata in Practice. Chicago: American Library Association, 2004.
15. Inmon, W. H., Bonnie K. N., and Lowell F. Business Metadata: Capturing Enterprise Knowledge. Amsterdam: Elsevier/Morgan Kaufmann, 2008. Print.
16. Klimberg, R. K., and McCullough, B. D. Fundamentals of Predictive Analytics with JMP. Cary, NC: SAS Institute, 2013. Print.
17. Lee, S., Hanseok K., and Hernsoo H. Multisensor Fusion and Integration for Intelligent Systems: An Edition of the Selected Papers from the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems 2008. Berlin: Springer, 2009. Print.
18. Li, D., and Yingyi C. Computer and Computing Technologies in Agriculture V: 5th IFIP TC 5/SIG Conference, CCTA 2011, Beijing, China, October 29–31, 2011: Proceedings. Heidelberg: Springer, 2012.
19. Llinas, J., Bowman, C., Rogova, G., Steinberg, A., Waltz, E., and White, F. "Revisiting the JDL data fusion model II," Technical Report, DTIC Document, 2004.
20. Luo, R. C. Multisensor Integration and Fusion for Intelligent Machines and Systems. Norwood, N.J.: Ablex Pub., 1995. Print.
21. Luo, R. C., Yih, C.-C., and Su, K. L. "Multisensor fusion and integration: approaches, applications, and future research directions," IEEE Sensors Journal, vol. 2, no. 2, pp. 107–119, 2002.
22. Ma, J. Metadata. Washington, D.C.: Association of Research Libraries, 2007.
23. MacGregor, J. Predictive Analysis with SAP: The Comprehensive Guide. Bonn: Galileo, 2014. Print.
24. Maisel, L., and Gary C. Predictive Business Analytics: Forward-Looking Capabilities to Improve Business Performance. 2014. Internet resource.
25. Manyika, J. and Durrant-Whyte, H. Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Prentice Hall, Upper Saddle River, NJ, USA, 1995.
26. May, P., Ehrlich, H.C., Steinke, T.: ZIB Structure Prediction Pipeline: Composing a Complex Biological Workflow through Web Services. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds.) Euro-Par 2006. LNCS, vol. 4128, pp. 1148–1158. Springer, Heidelberg (2006)
27. McDonald, Kevin. Mastering the SAP Business Information Warehouse: Leveraging the Business Intelligence Capabilities of SAP NetWeaver. Indianapolis, IN: Wiley Pub, 2006. Internet resource.
28. Metadata. London: LITC, South Bank University, 2000. Print.
29. Mitchell, H. B. Data Fusion: Concepts and Ideas. Berlin: Springer, 2012. Internet resource.
30. National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov (Accessed: 09.06.2014)
31. Power, D. J. Decision Support, Analytics, and Business Intelligence. 2nd ed. New York, NY: Business Expert Press, 2013. Print.
32. Sarma, K. S. Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications. 2007. Internet resource.
33. Schneider, T. and Eric W. ABAP Development for SAP HANA with SAP NetWeaver AS ABAP 7.4: Developing Applications for SAP HANA; Technical Foundations, Development Environment, Data Modeling and Programming; Advanced Functions such as Fuzzy Search and Predictive Analysis. 1st ed. Bonn: Galileo Press, 2013.
34. Siegel, E. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.
Hoboken, N.J.: Wiley, 2013. Print.
35. Sinha, A. K. Geoinformatics: Data to Knowledge. Boulder, Colo.: Geological Society of America, 2006. Print.
36. Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. J. Mol. Biol. 147, 195–197 (1981)
37. Taylor, J. Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics. Upper Saddle River, NJ: IBM Press/Pearson, 2012. Print.
38. Tozer, G. V. Metadata Management for Information Control and Business Success. Boston: Artech House, 1999. Print.
39. Wood, D.C. SAP SCM: Applications and Modeling for Supply Chain Management (with BW Primer). Hoboken, N.J.: J. Wiley & Sons, 2007. Internet resource.
40. Zeng, M. L., and Jian Q. Metadata. New York: Neal-Schuman Publishers, 2008. Print.