What is a Security Data Fabric?
This post was originally published on https://www.cloudquery.io/blog/what-is-a-security-data-fabric
Industry buzzwords are all around us. Sometimes they are helpful, but sometimes they cause more confusion, and I think that is the case here. So I will try to break down what these terms mean and suggest non-buzzword alternatives (or at least explain how I currently understand them; it’s an opinionated piece, so feel free to disagree).
A bit more background on buzzwords: sometimes they help describe a very big shift in technology, architecture, or way of operating. Sometimes they are helpful in the beginning and then stop being useful (see “modern data stack,” which is now just a data stack). And sometimes they are not useful from the start, because they try to identify a trend that either ends up not being a trend or is just a sub-case of an already well-known technology.
Enough with the introduction and disclaimers; let’s dive right in.
What is a data fabric?
First, let’s break down what a “data fabric” is before going into “security data fabric.”
“Data fabric” is also a pretty confusing buzzword, in my opinion, but I will try to break down what I think it means. A data fabric is not a technology but rather a strategy for building your company’s data stack: collecting, storing, enriching, and analyzing data. That’s it. So now you might think, “Oh, so why do we need a word for it?” Well, because there are vendors trying to differentiate themselves in the market by selling more integrated solutions, as opposed to you buying the different pieces separately. That is not to say that what those vendors provide is not useful: you can definitely buy more integrated solutions, either one or multiple, and build your stack in many different ways depending on your requirements and business goals. So now that we understand that it’s a combination of technologies, I’d like to suggest a different term and just call it a “data stack.” More mature markets already work this way: there we have a “marketing stack” or “martech” (the collection of tools and software that marketers use to improve, analyze, and perform their services), and it’s up to the marketing team to build their stack from integrated and non-integrated solutions.
What is a security data fabric?
Now that we have defined “data fabric,” what is a “security data fabric”? If “data fabric” is “data stack,” then is “security data fabric” a “security data stack”? That doesn’t make a lot of sense, as you have one data stack for different types of data and use cases. So what is it then? Well, actually, I’m not sure myself, but I’ll try to explain. Let’s start from the end: what does a “security data fabric” try to solve? The challenge is the growing number of security tools in an organization (each solving a different problem) and getting insights/analytics from the data those tools generate. We can take that challenge and describe the category in the same words, which I think makes it much clearer: a “security data fabric” is a way to run analytics on your security data from different security tools.
So now that we have eliminated the “security data fabric,” we can just call it “analytics for security data,” which describes both the goal and the “category.”
How to build one?
Now that we have cleared up both terms, let’s discuss how to build one (if you need one). Whether or not you already have some sort of data stack, you will need at least one central location, such as a database, data warehouse, or data lake, where you are going to run your analytics (this can be PostgreSQL for smaller amounts of data, and BigQuery, ClickHouse, Snowflake, Databricks, or other data warehouses and data lakes for larger amounts).
Next, you will need to get the data out of your different security systems, such as CSPM/CNAPP, vulnerability management, EDR, MDM, and anything else in your security stack, and into that central location. Here, I’m going to plug CloudQuery: part of what we do is ELT (Extract-Load-Transform) for cloud and security tools, so you don’t need to write your own connectors (you definitely can, but it’s not a lot of fun and is usually not the interesting business part, unlike writing the analytics you actually want).
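To give a feel for what “writing your own connector” involves, here is a minimal sketch of the extract and load steps. It is not how CloudQuery works internally. The `fetch_edr_alerts` function, the `edr_alerts` table, and its columns are all made up for illustration, and SQLite stands in for whatever central database you actually use.

```python
import sqlite3
from typing import Iterable, Iterator, Sequence


def fetch_edr_alerts() -> Iterator[dict]:
    """Hypothetical stand-in for paging through a security tool's API."""
    yield {"alert_id": "a-1", "host": "laptop-12", "severity": "high"}
    yield {"alert_id": "a-2", "host": "laptop-40", "severity": "low"}


def load(conn: sqlite3.Connection, table: str,
         columns: Sequence[str], records: Iterable[dict]) -> int:
    """Load extracted records into a table, creating it if needed.

    The table and column names are interpolated into the SQL, so they must
    come from trusted code, never from user input. Values are bound safely.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    # Missing keys become NULL rather than raising an error.
    rows = [tuple(record.get(c) for c in columns) for record in records]
    conn.executemany(
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    count = load(conn, "edr_alerts",
                 ["alert_id", "host", "severity"], fetch_edr_alerts())
    print(f"loaded {count} rows")
```

A real connector also has to deal with authentication, pagination, rate limits, schema changes, and incremental syncs for every tool, which is exactly the slog I mean.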
Once you have the data in your central queryable location, you can start running your analytics: creating queries and views that produce the answers you need, and exposing those programmatically or via BI/dashboarding tools.
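As a rough illustration of that last step, here is a small sketch that puts findings from two tools behind one view and asks a single question across both. Everything in it is an assumption for the sake of the example: the table names, columns, sample rows, and CVSS-to-severity thresholds are invented, and an in-memory SQLite database stands in for your warehouse.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical data from two tools."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cspm_findings (resource_id TEXT, severity TEXT, title TEXT);
        CREATE TABLE vuln_findings (host_id TEXT, cvss REAL, cve TEXT);

        -- One view that normalizes both sources to the same shape.
        CREATE VIEW all_findings AS
            SELECT 'cspm' AS source, resource_id AS asset_id, severity, title
            FROM cspm_findings
            UNION ALL
            SELECT 'vuln' AS source, host_id AS asset_id,
                   CASE WHEN cvss >= 9.0 THEN 'critical'
                        WHEN cvss >= 7.0 THEN 'high'
                        WHEN cvss >= 4.0 THEN 'medium'
                        ELSE 'low' END AS severity,
                   cve AS title
            FROM vuln_findings;
    """)
    conn.executemany("INSERT INTO cspm_findings VALUES (?, ?, ?)", [
        ("vm-1", "high", "Public SSH port"),
        ("bucket-7", "medium", "Bucket logging disabled"),
        ("vm-2", "critical", "Disk unencrypted"),
    ])
    conn.executemany("INSERT INTO vuln_findings VALUES (?, ?, ?)", [
        ("vm-1", 9.8, "CVE-EXAMPLE-1"),
        ("vm-3", 7.5, "CVE-EXAMPLE-2"),
        ("vm-2", 5.3, "CVE-EXAMPLE-3"),
    ])
    conn.commit()
    return conn


def critical_or_high_by_asset(conn: sqlite3.Connection) -> list:
    """Count critical and high findings per asset across all tools."""
    return conn.execute("""
        SELECT asset_id, COUNT(*) AS findings
        FROM all_findings
        WHERE severity IN ('critical', 'high')
        GROUP BY asset_id
        ORDER BY findings DESC, asset_id
    """).fetchall()


if __name__ == "__main__":
    for asset_id, findings in critical_or_high_by_asset(build_demo_db()):
        print(asset_id, findings)
```

The point is that once the data sits in one place, “which assets have the most serious findings across all my tools?” is a plain SQL query, and the same view can feed a dashboard or an API.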
Final Thoughts
Before committing to any new buzzword, first understand what you are trying to solve. If you are looking to run analytics on your security data (and have the bandwidth to create those analytics), then definitely give CloudQuery a go to remove the slog of writing connectors for security APIs, so you can focus on your business use case!