Over the past two decades, data has become an invaluable asset for companies, rivaling traditional assets like physical infrastructure, technology, intellectual property, and human capital. For some of the world’s most valuable companies, data forms the core of their business model.
The scale of data production and transmission has grown exponentially. Forbes reports that global data production increased from 2 zettabytes (ZB) in 2010 to 44 ZB in 2020, with projections exceeding 180 ZB by 2025 – a staggering 9,000% growth in just 15 years, partly driven by artificial intelligence.
However, raw data alone doesn’t equate to actionable insights. Unprocessed data can overwhelm users, potentially hindering understanding. Information – data that’s processed, organized, and consumable – drives insights that lead to actions and value generation.
This article shares my experience in data analytics and digital tool implementation, focusing on leveraging “Big Data” to create actionable insights. These insights have enabled users to capitalize on commercial opportunities, identify cost-saving areas, and access useful benchmarking information. Our projects often incorporated automation, yielding time savings and efficiency gains. I’ll highlight key challenges we faced and our solutions, emphasizing early project phases where decisions have the most significant impact.
Key areas of focus include:
- Quantification of benefits
- The risk of scope creep
- Navigating challenges with PDF data
- Design phase and performance considerations
In large organizations, data availability and accessibility often pose significant challenges, especially when combining data from multiple systems. Most of my projects aimed to create a unified, harmonized dataset for self-serve analytics and insightful dashboards. We employed agile methodologies to maintain clear oversight of progress and bottlenecks, ensuring accountability for each team member.
The typical lifecycle of data projects encompasses scoping, design, development, implementation, and sustainment phases. During scoping, the product owner collaborates closely with the client/end-user organization to grasp overall needs, desired data types and insights, requirements, and functionality.
Quantification of benefits
A crucial element of the scoping phase is the benefit case, where we quantify the solution’s potential value. In my experience, this step often proves challenging, particularly when estimating the value of analytical insights. I’ve found that while calculating automation benefits like time savings is relatively straightforward, users struggle to estimate the value of insights, especially when dealing with previously unavailable data.
In one pivotal project, we faced this challenge head-on. We were developing a data model to provide deeper insights into logistics contracts. During the scoping phase, we struggled to quantify the potential benefits. It wasn’t until we uncovered a recent incident that we found our answer.
A few months earlier, the client had discovered they were overpaying for a specific pipeline. The contract’s structure, with different volumetric flows triggering varying rates, had led to suboptimal usage and excessive costs. By adjusting volume flows, they had managed to reduce unit costs significantly. This real-world example proved invaluable in our benefit quantification process.
We used this incident to demonstrate how our data model could have:
- Identified the issue earlier, potentially saving months of overpayment
- Provided ongoing monitoring to prevent similar issues in the future
- Offered insights for optimizing flow rates across all contracts (a simplified monitoring sketch follows)
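To make the monitoring idea concrete, here is a minimal sketch of the kind of check such a model could run. Everything in it is hypothetical – the tier schedule, flow figures, and tolerance threshold are invented for illustration, and a real implementation would read contract terms and actual flows from the harmonized dataset rather than hard-coding them.

```python
# Hypothetical sketch: flag contracts whose booked flow sits in a costlier
# rate tier than a nearby alternative. All numbers are invented.
from dataclasses import dataclass

@dataclass
class RateTier:
    min_flow: float   # lower bound of the tier (e.g. m3/day)
    max_flow: float   # upper bound of the tier
    unit_rate: float  # cost per unit volume within this tier

TIERS = [                       # made-up tier schedule for one contract
    RateTier(0, 10_000, 1.40),
    RateTier(10_000, 25_000, 1.10),
    RateTier(25_000, 50_000, 0.95),
]

def unit_rate(flow: float) -> float:
    """Return the unit rate that applies to a given daily flow."""
    for tier in TIERS:
        if tier.min_flow <= flow < tier.max_flow:
            return tier.unit_rate
    return TIERS[-1].unit_rate

def monthly_cost(flow: float, days: int = 30) -> float:
    return flow * unit_rate(flow) * days

def flag_suboptimal(flow: float, tolerance: float = 0.05) -> bool:
    """Flag the contract if moving the flow up to a higher tier boundary
    would cut monthly cost by more than the tolerance (5% by default)."""
    current = monthly_cost(flow)
    return any(
        monthly_cost(tier.min_flow) < current * (1 - tolerance)
        for tier in TIERS
        if tier.min_flow > flow
    )

print(unit_rate(9_500), flag_suboptimal(9_500))  # 1.4 True: just below a cheaper tier
```

Run across the full contract portfolio at every data refresh, a check along these lines is what the model was intended to provide: early detection instead of months of overpayment.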
This concrete example not only helped us quantify the benefits but also elevated the project’s priority with senior management, securing the funding we needed. It was a crucial lesson in the power of using tangible, recent events to illustrate potential value.
However, not all projects have such clear-cut examples. In these cases, I’ve developed alternative approaches:
- Benchmarking: We compare departmental performance against other departments or competitors, identifying best-in-class performance and quantifying the value of reaching that level.
- Percentage improvement: We estimate a conservative percentage improvement in overall departmental revenue or costs resulting from the model. Even a small percentage can translate to significant value in a large organization, as the quick calculation below shows.
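To illustrate the percentage-improvement approach, here is a back-of-the-envelope calculation. The spend base and improvement rate are assumed figures for illustration only, not taken from any actual project:

```python
# Back-of-the-envelope benefit estimate with assumed, illustrative numbers.
annual_logistics_spend = 400_000_000   # assumed departmental spend base, $
conservative_improvement = 0.005       # a conservative 0.5% cost reduction
estimated_annual_benefit = annual_logistics_spend * conservative_improvement
print(f"${estimated_annual_benefit:,.0f} per year")  # $2,000,000 per year
```

Half a percent sounds modest, yet on a large spend base it is often enough to fund the project several times over.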
Regardless of the method, I’ve learned the importance of defining clear, measurable success criteria. We now always establish how benefits will be measured post-implementation. This practice not only makes later reappraisal easier but also ensures accountability for the decision to invest in the digital solution.
Another valuable lesson came from an unexpected source. In several projects, we discovered “side customers” – departments or teams who could benefit from our data model but weren’t part of the original scope. In one case, a model designed for the logistics team proved invaluable for the finance department in budgeting and forecasting.
This experience taught me to cast a wider net when defining the customer base. We now routinely look beyond the requesting department during the scoping phase. This approach has often increased the overall project benefits and priority, sometimes turning a marginal project into a must-have initiative.
These experiences underscore a critical insight: in large organizations, multiple users across different areas often grapple with similar problems without realizing it. By identifying these synergies early, we can create more comprehensive, valuable solutions and build stronger cases for implementation.
The risk of scope creep
While broadening the customer base enhances the model’s impact, it also increases the risk of scope creep. This occurs when a project tries to accommodate too many stakeholders, promising excessive or overly complex functionality, potentially compromising budget and timeline. The product owner and team must clearly understand their resources and realistic delivery capabilities within the agreed timeframe.
To mitigate this risk:
- Anticipate some design work during the scoping phase.
- Assess whether new requirements can be met with existing data sources or necessitate acquiring new ones.
- Set clear, realistic expectations with client management regarding scope and feasibility.
- Create a manual mockup of the final product during scoping to clarify data source requirements and give end-users a tangible preview of the outcome.
- Use actual data subsets in mockups rather than dummy data, as users relate better to familiar information.
Navigating challenges with PDF data
Several projects highlighted the challenges of capturing data locked in PDFs. Users often requested details from third-party vendor invoices and statements that were not available in our financial systems: accounting teams typically book summarized versions, while users need line-item detail for analytics.
Extracting data from PDFs requires establishing rules and logic for each data element – a substantial effort that is worthwhile only when many PDFs share a similar structure. When documents come from thousands of vendors, each with its own format that may change over time, developing mapping rules becomes an immense task.
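A minimal sketch of what such a mapping rule can look like, assuming the third-party pdfplumber library and two invented vendor layouts, shows why the effort scales with the number of distinct document formats:

```python
# Sketch only: per-vendor parsing rules for invoice line items.
# The vendor names and line layouts below are invented for illustration.
import re
import pdfplumber  # third-party PDF text-extraction library

# Each vendor prints line items differently, so each needs its own rule,
# and the rule breaks whenever that vendor changes its invoice template.
VENDOR_RULES = {
    "vendor_a": re.compile(r"^(?P<desc>.+?)\s+(?P<qty>\d+)\s+\$(?P<amount>[\d,]+\.\d{2})$"),
    "vendor_b": re.compile(r"^(?P<qty>\d+)\s*x\s*(?P<desc>.+?)\s+(?P<amount>[\d,]+\.\d{2}) USD$"),
}

def extract_line_items(path: str, vendor: str) -> list[dict]:
    """Pull line items out of one invoice PDF using that vendor's rule."""
    rule = VENDOR_RULES[vendor]          # an unknown vendor needs a new rule first
    items = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for line in (page.extract_text() or "").splitlines():
                match = rule.match(line.strip())
                if match:
                    items.append(match.groupdict())
    return items
```

Two rules are manageable; thousands of them, each quietly invalidated by a template redesign, are not – which is the heart of the scoping conversation that follows.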
Before including PDF extraction in a project scope, I now require a thorough understanding of the documents involved and ensure the end-user organization fully grasps the associated challenges. This approach has often led to project scope redefinition, as the benefits may not justify the costs, and alternative means to achieve desired insights may exist.
Design phase and performance considerations
The design phase involves analyzing scoped elements, identifying data sources, assessing optimal data interface methods, defining curation and calculation steps, and documenting the overall data model. It also encompasses decisions on data model hosting, software applications for data transfer and visualization, security models, and data flow frequency. Key design requirements typically include data granularity, reliability, flexibility, accessibility, automation, and performance/speed.
Performance is crucial, as users expect near real-time responses. Slow models, regardless of their insights, often see limited use. A common performance improvement is materializing the final dataset, so that joins and calculations are pre-computed once rather than executed every time a user opens or refreshes a view. Visualization tool choice also significantly impacts performance. Testing various tools during the design phase and timing each model step helps inform tool selection. Tool choice may influence design, as each tool has preferred data structures, though corporate strategy and cost considerations may ultimately drive the decision.
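As a simplified sketch of materialization (the table and column names are invented, and in practice this would typically be a materialized view or a scheduled job in whatever database platform hosts the model):

```python
# Sketch: refresh a pre-computed summary table on a schedule so the
# dashboard reads one small table instead of re-joining raw data per query.
import sqlite3  # stand-in for the actual warehouse; names below are invented

def refresh_materialized_dataset(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        DROP TABLE IF EXISTS contract_cost_summary;
        CREATE TABLE contract_cost_summary AS
        SELECT c.contract_id,
               c.vendor,
               strftime('%Y-%m', s.ship_date) AS month,
               SUM(s.volume)                  AS total_volume,
               SUM(s.volume * s.unit_rate)    AS total_cost
        FROM contracts c
        JOIN shipments s ON s.contract_id = c.contract_id
        GROUP BY c.contract_id, c.vendor, month;
    """)
    conn.commit()
```

Run nightly, or at whatever refresh frequency the design specifies, this keeps user-facing queries to a simple scan of a small, pre-aggregated table.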
Future trends
Emerging trends are reshaping the data analytics landscape. Data preparation and analysis tools now allow non-developers to create data models using intuitive graphical interfaces with drag-and-drop functionality. Users can simulate and visualize each step, enabling on-the-fly troubleshooting. This democratization of data modeling extends the self-serve analytics trend, empowering users to build their own data models.
While limits exist on the complexity of end-user-created data products, and organizations may still prefer centrally administered corporate datasets for widely used data, these tools are expanding data modeling capabilities beyond IT professionals.
A personal experience illustrates this trend’s impact: During one project’s scoping phase, facing the potential loss of a developer, we pivoted from a SQL-programmed model to Alteryx. The product owner successfully created the data model with minimal IT support, enhancing both their technical skills and job satisfaction.
Putting the creation of complex analytical tools into many more hands offers significant benefits. Companies should consider providing training programs to maximize the value of these applications. Additionally, AI assistants can suggest or debug code, further accelerating the adoption of these tools. Over time, this shift could turn many more employees into capable data practitioners, extracting maximum value from company data without extensive IT support.
Unlock data’s value
Data-driven decision-making is experiencing rapid growth across industries. To unlock data’s value, it must be transformed into structured, actionable information. Data analytics projects aim to consolidate data from various sources into a centralized, harmonized dataset ready for end-user consumption.
These projects encompass several phases – scoping, design, development, implementation, and sustainment – each with unique challenges and opportunities. The scoping phase is particularly critical, as decisions made here profoundly impact the entire project lifecycle.
The traditional model of relying on dedicated IT developers is evolving with the advent of user-friendly data preparation and analysis tools, complemented by AI assistants. This evolution lowers the barrier to building analytical models, enabling a broader range of end-users to participate in the process. Ultimately, this democratization of data analytics will further amplify its impact on corporate decision-making, driving innovation and efficiency across organizations.