How to Lead and Build a Strategy for a Data Platform
There’s a good article from the data startup Monte Carlo on how to build a data platform technically. I generally agree with it, based on my first-hand experience building data platforms over the past decade.
Well, the Rome of data platforms isn’t built in a day. The architecture is super nice, but no one can get there single-handedly. It takes strategy and time to plan it and make it play out. I want to share my experience and thoughts on how to lead and build a data platform strategy that can actually make all the good things in that article happen.
1. Business Oriented (Business Driven) and Create Alignment
The first thing I have to call out is that, though that architecture is great and attractive, do not treat it as your whole world by aiming for it on day one and forgetting what’s right in front of your eyes: the business. The platform can be internal or external facing, fully vendor-based or self-built, but the most critical thing is to always align with your company’s strategy and keep in mind that the data platform exists to serve the business.
It should emphasize business priorities and pride itself on empowering and enabling the business. It should not be built in a silo that floats outside of the core business. Otherwise, no matter how good your data platform vision is, the strategy fails.
One example: at a startup, I witnessed an infra strategy fail because infra took precedence over the business. It was a general infra failure, not a data infra one, but it is still relevant. The company was a marketplace, and the business priority at the time was to bump sales and profits to become self-sustaining. The tragedy started when the platform engineering team, who were responsible for building the microservice foundation for product engineering, somehow became obsessed with K8S. K8S was new and exciting in 2017, but unstable, and no one actually knew how to run it. The platform engineers somehow shut their ears, isolated themselves, and started to build on K8S while ignoring the whole company’s business priority. As you can imagine, the K8S effort delivered no value and failed to empower product engineers, the infra team left, and the business goal was impacted, all because the manager of that team failed to align the team’s priorities and goals with the business.
So, put your business first. If your core business cannot grow or survive, forget about the data platform.
2. Talent, Team and Hiring Plan
A technology platform can only be as good as the people who build, manage, and operate it.
Once the business goal is determined, the next step is making sure that we, as leaders, put a solid team in place. Hire leads first in key areas, and let them build the team.
3 pitfalls to watch out for:
a) It can be a big mistake to think there’s no need for a platform team if you are using a vendor solution. This post is a good example of why: https://www.safegraph.com/blog/scaling-data-as-a-service-daas-with-platform-engineering
b) Hiring junior engineers to balance the team should usually happen at a later stage, when key seniors are already in place. Try to avoid doing it in reverse, otherwise you’ll burn yourself out hand-holding them. I’ve heard excuses from startup founders, like “oh, our startup is just series A, it’s so hard to hire senior people so we can just attract and hire junior folks”. That is BS. That attitude leads to the same result anywhere, in small or big companies. The right way to think of it is that you haven’t yet hired the key leader who can hire more senior engineers. Once such a key leader comes in, hiring should become much easier. BTW, if you have a hard time finding such a leader, something may be wrong, e.g. uncompetitive comp, or maybe the startup’s business is just not good enough to attract people (99% of startups fail anyway, so be brave and face it :)
c) Act early. Do not wait until you really need someone to start hiring. Start early, as it can take months to source and hire the right people.
3. BI, AI/ML, etc.: Deliver the Last Mile of Data Value End-to-End
Building a platform is not the end of the story. Data is only valuable when turned into some form of product, e.g. BI and dashboards for visualization, AI/ML algorithms for recommendations and forecasting, or events (alerts, messages) in the case of streaming.
More often than not, leaders of the platform should take the lead in filling the last mile, rather than waiting for product engineering, because:
a) Your data users may not have the motivation to do so. What data users want is straightforward value, not the plumbing. E.g. they usually do not want to figure out which BI tools to use, how to connect their dashboards to data sources/warehouses, or how to set a caching or refresh strategy. They expect the data platform to have already done this for them, so your solution either works for them or doesn’t work at all.
In this case, the data platform team can adopt a few best practices, e.g. 1) set clear responsibility boundaries and expectations upfront, 2) continuously educate users, 3) maintain good documentation and guidance, 4) pre-configure parameters and other settings to sensible general-purpose defaults for the users.
b) Your data users may not even know what they can do with the data you provide. So the data platform team has to step forward and tell users what magic they can create, e.g. “you can use dataset A in warehouse B to analyze X”, “use dataset B and infra C to build ML model for use case Y”, “join data D and E to get Z”, etc.
4. Looking Ahead, not Following Behind
A solid strategy should account for potential growth over the next 3–5 years internally, and keep a close eye on industry trends externally.
- If there’s only a data warehouse and structured data now, what about a data lake and un/semi-structured data?
- If there’s only offline data processing now, what about streaming and stream processing to reduce data latency?
- If there’s only small traffic now, what about the volume in the next 1–3 years and 3–5 years?
- Does it make sense to bring vendor solutions in-house for flexibility, cost, or other reasons?
- If entering new markets, what about the data privacy and compliance laws there?
- How to make all data easily discoverable and leveraged to generate value?
- Where is the industry going, and what are the new technologies? ……
Looking ahead is hard, and no preparation can be comprehensive enough to cover every detail. But the key is to have the mentality and habit of thinking about these aspects, so we are better prepared when things come.
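For the traffic-volume question in the list above, a back-of-the-envelope projection can make the 1–3 and 3–5 year horizons more concrete. Here is a minimal sketch in Python that assumes a constant annual growth rate (plain compound growth). The function name `project_volume`, the 10 TB starting volume, and the 80% growth rate are all made-up placeholders, not figures from any real platform; plug in your own numbers.

```python
def project_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Estimate data volume after `years`, assuming a constant annual growth rate.

    annual_growth is a fraction, e.g. 0.8 means the volume grows 80% per year.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    # Compound growth: the volume is multiplied by (1 + rate) once per year.
    return current_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    current_tb = 10.0     # hypothetical: volume stored today, in TB
    annual_growth = 0.8   # hypothetical: 80% growth per year
    for years in (1, 3, 5):
        projected = project_volume(current_tb, annual_growth, years)
        print(f"in {years} year(s): ~{projected:.1f} TB")
```

Real growth is rarely this smooth, so treat the output as a conversation starter for capacity and budget planning rather than a forecast.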
5. Technicals: Scalability, Composability, etc.
They are discussed in this post.
In short, we can summarize these aspects as figuring out:
- what you are doing it for, and why
- who can help you do it
- who you should empower and collaborate with, and how
- what’s potentially next
- what you need to do now, and potentially next