On this episode of Data Driven, Frank and Andy interview Lauren
Speaker:Maffeo, author of Designing Data Governance from the Ground
Speaker:Up. Data governance has become more pressing of late,
Speaker:what with all the advancements in generative AI systems.
Speaker:Tune in for a fascinating look at data governance, civic
Speaker:technology, and more.
Speaker:Hello, and welcome to Data Driven
Speaker:Podcast. We cover the emergent fields of data science,
Speaker:AI, and machine learning. Today,
Speaker:I'm here with Andy. My voice is a little crackly because of a
Speaker:sinus infection, but it's all
Speaker:good. I've gotten on the meds and I am definitely feeling like
Speaker:I'm on the mend. How are you doing, Andy? I'm well,
Speaker:Frank. And I just heard how you were doing. Actually, I knew a little bit
Speaker:about it because you texted me when you were in the throes of it, and
Speaker:I knew something was up because usually you communicate
Speaker:more. I was like, Frank's down for the weekend. And
Speaker:I know you've been having very busy weekends the past
Speaker:little bit for something that people will know more about
Speaker:later, right? Much later, probably. But it's all
Speaker:good. It is all good so far. It's ended well. So for
Speaker:folks listening to this episode: we're recording this on
Speaker:July 17, and we're going to release this probably on July
Speaker:18. And you'll hear me refer to a legal
Speaker:case. It looks like that will be resolved this
Speaker:week, hopefully in one form or the other, and it's gone our way. That's all
Speaker:I can say right now. But it is good news. Speaking of
Speaker:good news, we have with us an excellent guest who's
Speaker:based in the DC area. So not that far from Chateau
Speaker:La Vigne. It is Lauren.
Speaker:Sorry, she will correct me, but she's a published
Speaker:author. Her book just came out talking about designing data
Speaker:governance, which is a topic that just more and more
Speaker:keeps coming up. And I think that if you're a data engineer and you think
Speaker:I don't have to worry about that, hold up. Maybe you do need to worry
Speaker:about that. Even data scientists? Especially data scientists, I would
Speaker:say, and doubly so if you're in the
Speaker:generative AI space. I think we'll see when we get into that.
Speaker:And she has a very interesting background, so I'll let her explain
Speaker:it. Welcome to the show, Lauren. Thank you guys, for having me. I'm really
Speaker:excited to be here and to chat with you all. Yeah, likewise,
Speaker:likewise. So your background is
Speaker:amazing. You studied overseas at Cambridge,
Speaker:I think. At LSE,
Speaker:the London School of Economics, which is like, wow,
Speaker:I half expected you to have a British accent, honestly, because I wasn't
Speaker:sure. And you also have spent
Speaker:some time doing arts and design, so
Speaker:I found that fascinating too. I actually
Speaker:am a service designer in my day job, and so I work very
Speaker:closely with data scientists and engineers to
Speaker:design things like pipelines, cloud architecture,
Speaker:environments, different service models for
Speaker:Chief Data Officers. And so I always say as a service
Speaker:designer that I'm the user advocate on a project. I'm the person
Speaker:who is tasked with helping the client define who their key
Speaker:user groups are. And once I do that, I conduct user
Speaker:interviews with people who fit those demographics to figure out what
Speaker:they like or dislike about a product or service. I capture
Speaker:the results of those interviews and design assets like personas and journey
Speaker:maps. And then ultimately I do work with people like
Speaker:you, data architects, engineers, scientists
Speaker:to build a product that will hopefully solve the pain points that
Speaker:we uncovered in the user research. Fascinating.
Speaker:And you were in the Civic Tech space if memory serves as well, which
Speaker:is a fascinating space that once upon a time
Speaker:I was on the Microsoft Civic Tech team. Yes, I am. So
Speaker:I work for an organization called Steampunk and we're a human centered design
Speaker:firm that builds solutions for federal government
Speaker:agencies because as we all know, the federal government is
Speaker:the most progressive when it comes to tech and so they
Speaker:barely need us at all. But the reality actually is that they
Speaker:need us quite a bit and that we very often come in and
Speaker:have that human centered approach that many of their tools
Speaker:were just not built with. And so then we come in and often
Speaker:try to improve them and improve the user experience.
Speaker:And user experience in that context is really about
Speaker:getting the right services to the American public, which I
Speaker:think is what makes the work so interesting. It's not commercial products, it's
Speaker:things like improving unemployment benefits and how
Speaker:easy it is for people to access them,
Speaker:improving the ease with which you can send folks overseas in official
Speaker:roles, defining the service offerings
Speaker:that a Chief Data Officer is going to provide their
Speaker:colleagues. And so the problems that you solve in Civic Tech I think
Speaker:are really fascinating. And I think COVID was the
Speaker:final confirmation that all of these systems are long
Speaker:overdue for major upgrades which we are seeing
Speaker:the influx of now. Yeah, you don't have kind of good
Speaker:user design or good user experience as part of the RFP
Speaker:that went out for building these large federal systems. That was
Speaker:probably not a bullet point on the list, not at
Speaker:worse. So for those not familiar with Civic
Speaker:Tech, how would you define it? I would define
Speaker:Civic Tech as technology which exists to serve
Speaker:the public. And the public is very broad. I would define the
Speaker:public further by saying it's citizens of any
Speaker:country or area where
Speaker:the tech exists. And so for instance, Civic Tech
Speaker:encompasses the tech that a town,
Speaker:my hometown, for instance, Natick, Massachusetts, might use to
Speaker:serve residents of Natick. So this could be anything from
Speaker:tech that allows people to pay their bills online
Speaker:to applying for benefits. And then likewise I
Speaker:work as a designer in the federal space. And so I work with US
Speaker:federal agencies to improve the
Speaker:way that they deliver services to the American public. And the
Speaker:public in this case, is any American who needs to use
Speaker:those services. But then we get more granular about who those
Speaker:particular user groups are. So, for instance, I have worked on
Speaker:many projects in the past with the Department of Agriculture, and within
Speaker:the Department of Agriculture there are many different
Speaker:subdivisions that serve different user groups. And
Speaker:so then I will work with my client to define what those user
Speaker:groups are and figure out how we can tailor a user
Speaker:experience and a product to meet those unique needs. But I would
Speaker:broadly define civic tech as any technology which
Speaker:serves the public. And the public can then be further
Speaker:defined into groups based on things like geography, but
Speaker:also things like role, the day to day experience,
Speaker:things like that. That's a good definition because it
Speaker:used to be very nebulous in terms of what it meant and the implications
Speaker:thereof. But I like your definition. It's probably the most cogent
Speaker:I've heard to date of the field. Thank you.
Speaker:Now this explains so how did you get into data governance, right? Because
Speaker:this is something well, let's start before we do that. How would you
Speaker:define data governance? I love the fact that you
Speaker:start the conversation by asking me to define it, because I think like
Speaker:many terms in tech, it is often left undefined. And that's
Speaker:why there's not only a lot of confusion about it, but also a lot of
Speaker:resistance to it. I think people have in their heads that governance is
Speaker:purely compliance and that it is a blocker
Speaker:to innovation and to tinkering. Other people think
Speaker:that it is something that you can quote unquote, ship after
Speaker:deployment. And I have had C suite leaders say as much. They've
Speaker:said things like, we'll do data governance later, or
Speaker:we will deliver it in the next contract after
Speaker:production. And that refrain is still unfortunately
Speaker:common. So I define data governance as the strategy you
Speaker:have to encompass the people, processes and
Speaker:tools that help you manage your data at scale. And I often
Speaker:say manage your big data at scale. Big data, as we
Speaker:know, is another buzzword that often means both everything
Speaker:and nothing. But I use big data in this context because the
Speaker:reality is that most organizations have more data that
Speaker:they both ingest and produce than ever before.
Speaker:It is too big for one person
Speaker:or one team to manage on their own. And that's why you do need this
Speaker:holistic data governance strategy that is really
Speaker:a business strategy before a technical
Speaker:strategy. Your data governance should never be divorced from what you're
Speaker:doing in development and production environments. It should be
Speaker:integrated into those environments. But at the same time,
Speaker:I think people make a mistake when they think of data governance not just
Speaker:as pure compliance, but also purely as a technical problem to
Speaker:solve. Because the more complicated reality is that it's a
Speaker:cultural transformation that your organization needs
Speaker:to be invested in from the top down. And that's really how you
Speaker:gain success from data governance. Now, that's a good way to put it.
Speaker:And that's why I wanted to define it, because it doesn't have a very firm
Speaker:definition, right? My operating
Speaker:definition is pretty close to yours. I'll say that because
Speaker:in my day job at Red Hat, they ask, well,
Speaker:what does your product do for data governance? And I kind of laugh and say,
Speaker:well, not really much, because
Speaker:data governance is largely around,
Speaker:yes, it's people, processes and technology. But 80% of that is
Speaker:not technology. Right.
Speaker:And you need a vehicle to make it happen in
Speaker:the technology space. But the people and process part,
Speaker:those are going to be the hard ones. Absolutely. And that's why
Speaker:it is so tricky. I think it's also why
Speaker:relatively few organizations have made a lot of headway. And that's also
Speaker:why I think it's really important to frame data governance as a
Speaker:cultural transformation that you can design and embed
Speaker:into your business strategy. You really cannot
Speaker:separate the two. I think a lot of people have been saying that
Speaker:for quite some time now, but we're really seeing the
Speaker:results of that, or rather the results of not
Speaker:doing that. Now we are in a pseudo
Speaker:recession, if not an actual recession. Tech organizations have certainly been
Speaker:acting like there's a recession with both layoffs of
Speaker:employees, but also in their buying behaviors
Speaker:and in not buying as many cloud tools and
Speaker:pieces of software that they used to. And so it's more important than ever
Speaker:that whatever technology you're investing in is
Speaker:producing tangible outputs for your organization. And so
Speaker:we're seeing the consequence of trying to divorce data
Speaker:governance from your business strategy. It's just no longer
Speaker:an option to separate the two. No, I totally agree.
Speaker:And Andy looks like he has a question, but I want to get this out
Speaker:there. I think part of it is that a lot of organizations, and I mean
Speaker:legacy organizations probably, I would say federal would definitely fall into this,
Speaker:is that it's only been in the recent years,
Speaker:maybe decade, that we've thought of data as an asset
Speaker:as opposed to a byproduct of some other process.
Speaker:And maybe that's it. Now it's
Speaker:something of value. And as with anything of value, you probably should
Speaker:have processes, not guards around it, but gatekeepers or gates
Speaker:around it just to make sure it's not wasted, it's not
Speaker:contaminated, that sort of thing. That's where my head is at.
Speaker:I agree with that. I think data as an actual
Speaker:tangible asset is a relatively new concept, certainly
Speaker:within the last decade. And I think what's also new about it
Speaker:is the pure volume of data that exists in the world
Speaker:today. There is more data produced and
Speaker:ingested than ever before, and that number is
Speaker:certainly not going to go down. When you think about all of the Internet connected
Speaker:devices that exist, when you think about the explosion of remote work and the
Speaker:fact that now employees are doing work for their
Speaker:organizations on private devices, which means that you can be
Speaker:having organizational data that exists in several locations,
Speaker:which is a very tangible reality. And then I
Speaker:think that lends itself to the broader conversation
Speaker:that I see happening in data circles now about managing data more
Speaker:as a product and less as a service, which is an approach
Speaker:that I largely support because a big part of what you need to
Speaker:do to be successful at data governance is
Speaker:defining clear data domains and subdomains within
Speaker:your organization. These are the key areas that your
Speaker:organization collects data on, and then it gives you a way of
Speaker:categorizing them more clearly, rolling them up to
Speaker:specific owners. These would be equivalent to your product managers if we're
Speaker:using the product analogy. So there's a lot being done to
Speaker:reframe big data in this way as an
Speaker:asset that you manage like a product. And I think there's a lot of
Speaker:value to that, rather than the top down data
Speaker:as a service model that begins and ends with IT
Speaker:and begins and ends with people who really lack the
Speaker:context to make those decisions about data and
Speaker:its quality across domains. I think that's really
Speaker:important, Lauren. And what would you say
Speaker:to an enterprise or just maybe a small
Speaker:to medium sized company that says, yeah, we
Speaker:understand all of that and they kind of give mental assent to
Speaker:it, but they think about their culture and the way they've always done
Speaker:things and they can't bridge that
Speaker:gap? That's a great question because I
Speaker:think that is realistically where the biggest blockers
Speaker:occur. People are messy, they're
Speaker:intangible, they all have different motivations, even if
Speaker:they work for the same organization, they not only have different roles,
Speaker:but they have different end goals. Very often you have people
Speaker:in organizations who do not want change, they
Speaker:want things to stay the same, they have a vested interest in it, even
Speaker:if that is arguably not what is best for the organization in
Speaker:the long run. You will have people who are invested
Speaker:in not changing the status quo, especially as it pertains
Speaker:to data. I think a lot of that comes down to the fact that data
Speaker:governance has not been practiced to the degree that it should
Speaker:have. And so when people look at how much data they
Speaker:have in an organization and then they think about not only the work it would
Speaker:take to create data governance standards from scratch, but then to
Speaker:retroactively apply those standards to the data they have, it gets
Speaker:very overwhelming very quickly. And so what I would say to someone who is on
Speaker:the fence about implementing data governance is
Speaker:to start small. To start by
Speaker:looking at the key data domains in your organization.
Speaker:So these are the areas like sales data, marketing data,
Speaker:customer success data, that your organization is
Speaker:producing and/or ingesting data about,
Speaker:from a high level. I would also tell them to start
Speaker:small by not only defining those key data domains and
Speaker:respective subdomains. For instance, you could have a data domain on
Speaker:sales data and then two subdomains could be inbound and outbound
Speaker:leads and those are two subdomains you can collect data on. But
Speaker:then you also want to apply that data to a particular
Speaker:project that is contained and that has been
Speaker:already greenlit by the C-level leadership
Speaker:as having high value to the organization. I think
Speaker:that does two things. It helps you contain
Speaker:your efforts so that you are not reinventing the wheel
Speaker:across all areas of the organization, and it also
Speaker:ensures that you are working on something that senior
Speaker:leadership really cares about. That is also essential. I talk in the
Speaker:book about finding the right sponsor for your data
Speaker:governance efforts, and that really is crucial because like any big
Speaker:transformation, it has to be a top down effort. If you're the Chief
Speaker:Data Officer and your C suite, your Chief
Speaker:Executive Officer, is not on board with data governance,
Speaker:you can make some progress. Because, again, if you're a senior data leader,
Speaker:your entire job is to strategically manage data as an
Speaker:asset. And so you can make some progress. But without that high
Speaker:level buy in and without connecting your efforts back to the
Speaker:business, you're really going to stall. So I would say start
Speaker:small. Look for a strategic project where data governance
Speaker:can add value, and then do everything you possibly can to
Speaker:connect your governance efforts back to that business goal.
Speaker:So it sounds like someone should write a book about doing
Speaker:data governance from scratch or something like that. That
Speaker:would be a nice idea. It would have helped me on some of my early
Speaker:projects, which is why I wrote the book. Well, I was
Speaker:going to lead into that. And you mentioned the book in your answer, and
Speaker:Lauren has written a book for those who are listening, and it's
Speaker:called Designing Data Governance from the Ground
Speaker:Up. And I just picked
Speaker:up the ebook. We were looking at your
Speaker:bio before the show. Frank and I connect about five or six
Speaker:minutes before the show, and I said, that sounds
Speaker:like something I need to dig into. So I picked it up, I'll read
Speaker:it. I've got a little bit of vacation coming up here starting at
Speaker:the end of the month, so maybe I'll get to it then. I'm looking
Speaker:forward. Hopefully you'll read it on the plane there or
Speaker:back. Because I always joke that if someone's reading my book on a beach somewhere,
Speaker:something's gone wrong, because this is not exactly a light hearted beach
Speaker:read. And I always joke with people
Speaker:because when I encounter resistance to the concept of data governance, I
Speaker:joke with them, well, you might not want to read my book, but you're going
Speaker:to have to read the book at some point. So hopefully it will be helpful
Speaker:when you do. I look forward to it. And as we were talking
Speaker:a little in the virtual green room about this,
Speaker:and I said, I'm basically a data
Speaker:engineer. I came into data
Speaker:from software and I made the leap about
Speaker:probably 20 to 25 years ago when
Speaker:a lot of I would call it process
Speaker:control, because before I did software, I was in manufacturing.
Speaker:So it had a lot of the same types of
Speaker:thinking around engineering and process control.
Speaker:And even back then, some of the buzzwords that sound
Speaker:new in software, or new-ish, we were doing in
Speaker:the 90s in manufacturing: stuff like Kanban and Six
Speaker:Sigma and those sorts of metrics collection.
Speaker:And I was very fortunate to be trained by
Speaker:someone who was trained by W. Edwards Deming
Speaker:himself on that information. So very
Speaker:fresh, probably some insights that I'll never
Speaker:share, but just interesting to
Speaker:get. Definitely a true believer and someone who came at it with an open
Speaker:mind and really understood it, but
Speaker:these sorts of things have grown out of that, and I see
Speaker:data governance as one of the things that grew out of
Speaker:a combination of compliance and quality. Would you agree
Speaker:with that or would you correct me? No, I do agree with
Speaker:that. I think that actually hits the nail on the head. We
Speaker:have let data grow
Speaker:unchecked, broadly speaking, and
Speaker:that is because we just didn't know, as an industry
Speaker:and society how to manage it. You're exactly right that there are people who have
Speaker:been data architects, engineers, scientists for decades, and
Speaker:they've been doing this work for a very long time outside of
Speaker:the public view. But what's different about the work today is
Speaker:the volume of data that is produced by consumer products
Speaker:and the amount of sensitive data that is effectively
Speaker:floating out in the world today through various
Speaker:cloud systems and various products that are used. And
Speaker:to that end, we're now in the earliest stages of
Speaker:figuring out how to manage that from legislative standpoints, both
Speaker:in the US and abroad. GDPR legislation in
Speaker:Europe comes to mind. That's fairly recent legislation that gives EU
Speaker:citizens a lot more personal rights over their personal
Speaker:data and what organizations can do in terms of profiting from
Speaker:that data. We do not have the equivalent of federal legislation
Speaker:here in the US. But I do see that changing over the next
Speaker:five to ten years. And I think what you also said about
Speaker:quality really rings true. That's a huge issue because
Speaker:we as an industry really lack consistent,
Speaker:clear standards which define what data quality
Speaker:is and how we should be measuring it. And that's a big difference.
Speaker:If you look at fields like medicine and law, areas
Speaker:that have very high impact on the
Speaker:public, they have pretty clear governing bodies and
Speaker:standards for how doctors and lawyers should do their
Speaker:work. We have things like IEEE, we have
Speaker:the Association for Computing Machinery, we certainly have membership
Speaker:organizations where people can get together and discuss these things
Speaker:and debate these issues. But we really lack a
Speaker:clear framework for data quality and
Speaker:compliance, which I think is very long overdue. So
Speaker:I do see that as being the double pronged issue today. And I'm
Speaker:also curious what your take is, as someone who's been doing this work for
Speaker:decades. How have you seen data governance evolve
Speaker:from the 90s through to the present day?
Speaker:Well, it's interesting
Speaker:as I've made the transition from being an employee to
Speaker:being a consultant, which happened around 2005,
Speaker:2006, I definitely saw some difference there.
Speaker:But as an employee at one place, and actually I was a
Speaker:contractor there too, a temp,
Speaker:they worked with medical devices. And so there
Speaker:I saw a strict compliance, but it almost fed down
Speaker:from the culture. You mentioned culture earlier as being very important.
Speaker:I totally agree. But it was almost an
Speaker:accidental culture shift that came from the medical
Speaker:device part, the medical part of the medical device field
Speaker:into all aspects of software and
Speaker:data. And it was really interesting to see how
Speaker:that sort of thinking led to
Speaker:almost a practice of data governance. And we weren't even
Speaker:calling it data governance back then, right? We were
Speaker:just considering it software and data. That
Speaker:all fell under that umbrella. And having that experience
Speaker:there was very eye opening and going from there to more of a startup
Speaker:culture, which not picking on startups. There's
Speaker:a priority difference, though, between that and somebody
Speaker:in kind of a more staid and stable
Speaker:environment. And I'm not picking again, I'm not calling
Speaker:startups unstable. There's a lot of
Speaker:benefits to startups and a lot of
Speaker:innovative cultures, and some of that wasn't
Speaker:present in the more medical device environment. Some of the benefits
Speaker:of that kind of drive and ambition and go, go
Speaker:and get things done. But it's very easy to overlook. And I saw
Speaker:it, I saw important aspects of
Speaker:what we now call data governance and really just good
Speaker:engineering practices. Some of that was overlooked, some of it was
Speaker:deprioritized for what I consider
Speaker:to be mostly legitimate business concerns in a startup
Speaker:world. I would agree with that. I think when you
Speaker:consider startups and the landscape they're in, they
Speaker:have to innovate and be different or else they will not
Speaker:survive in the marketplace. And so their priority really is to
Speaker:move fast and figure it out later. I gave a talk
Speaker:at Data Architecture Online last week and the
Speaker:keynote moderator made a joke about how
Speaker:developers are often like, don't bother me with requirements, I'm
Speaker:coding, meaning they're tinkering and they'll figure it out
Speaker:later. And we've really taken that approach with data,
Speaker:and it's a really tricky balance
Speaker:to balance those standards and the creation of those standards
Speaker:with the need to innovate and stay
Speaker:in business. And that's really what startups are focused
Speaker:on. And then on the flip side, you have these
Speaker:large, highly regulated, highly bureaucratic industries
Speaker:like government, healthcare, medicine,
Speaker:law, which are highly regulated, and they have
Speaker:to exist to be stable and to provide
Speaker:services in a way that their users can rely
Speaker:on. And so innovating, not only is it not the
Speaker:priority in those environments very often, it's also
Speaker:an inherent risk because people in those environments are not
Speaker:really rewarded for doing something in a new way,
Speaker:but they will be very highly penalized if something goes wrong.
Speaker:I think you talked and touched on motivation earlier,
Speaker:and you really have to examine the motivations of whomever
Speaker:you're working with and consider the context. The book that I wrote is
Speaker:a 100-page, six-step guide to designing your
Speaker:first data governance program from scratch. And it is short
Speaker:because there is a lot of nuance when it comes to data governance.
Speaker:When you implement a data governance program for a 100,000-person
Speaker:multinational firm, that is going to look very different than doing
Speaker:it for a 25-person startup. But the
Speaker:key aspects of governance are the same,
Speaker:I argue, across those nuances. And so that's why the book
Speaker:is short in the first instance, because it's meant to be the first
Speaker:prelude to whatever gets more specific about
Speaker:how to do data governance in your own environment. And that context per
Speaker:environment is really crucial. No, I mean, that's a
Speaker:good point. Data governance, it's come up more and more in my
Speaker:day job as well, because it becomes and it's
Speaker:also interesting. And as the world's imagination is
Speaker:captured by generative AI,
Speaker:I think it's important to realize the generative
Speaker:AI. Well, first off, there's a lot of legal
Speaker:questions that remain unresolved, right? Like, if I tell it
Speaker:to produce a novel in the style of a particular author,
Speaker:Andy's laughing because we've been doing some experiments with
Speaker:that. I was muted, but I was laughing. You were
Speaker:laughing. Yeah, more on that later. But no, I mean,
Speaker:what does that mean? If you produce an image in the style of a particular
Speaker:artist, obviously, that is
Speaker:but I think the legislative hammer is coming down on that.
Speaker:And my opinion is it's probably best to start with governance
Speaker:today to save you what a stitch in time will save nine
Speaker:legal bills later. Like something like that.
Speaker:Do you think that generative AI is really going to
Speaker:make data governance cool, for lack of a better
Speaker:term? That's a really interesting question. I think it is absolutely going to make
Speaker:data governance essential. And I was speaking to somebody on
Speaker:a separate podcast this month about this very issue
Speaker:because you mentioned writing a book in the style of a particular
Speaker:author giving generative AI the prompt
Speaker:to write a novella in the style
Speaker:of Cormac McCarthy, for example. In that case, you
Speaker:are maybe not
Speaker:copying or plagiarizing Cormac McCarthy's work directly,
Speaker:or maybe you are. It really depends on whether the generative
Speaker:AI can actually understand what you mean, and it can understand
Speaker:Cormac McCarthy's style of writing enough to
Speaker:produce a novella in his
Speaker:likeness, if you will. Likeness is a very interesting
Speaker:concept, I think, these days. And you're right, it is incredibly
Speaker:murky from the legal standpoint. And I was speaking on a
Speaker:podcast recently about this in the sense of,
Speaker:when we look at the legal landscape of generative AI, where
Speaker:is there going to be progress? And rather
Speaker:than making progress on the consumer data
Speaker:privacy and consumer rights aspect of the issue,
Speaker:I actually think that we're going to see more progress
Speaker:made and more cases brought to court on the grounds of
Speaker:copyright infringement. If you look at things like
Speaker:using music in a movie or
Speaker:using images that a corporation owns in a book,
Speaker:I just went through this with my own book. I wanted to use
Speaker:commercial software to make a few diagrams
Speaker:and use templates to do it. And my editor
Speaker:said, are those templates that are pre built into the software? I
Speaker:said, yes. And he said, you either have to get permission
Speaker:legally from their legal department to use those in the book, or you have to
Speaker:create some from scratch and make them yourself. So I chose
Speaker:the latter because it was the path of least resistance. And I think
Speaker:when we consider generative AI and what that means for
Speaker:data, we in the United States are going to see more
Speaker:progress on the grounds of copyright
Speaker:infringement than we are on data privacy and consumer
Speaker:rights in the short term. Now, having said that, I think humans are
Speaker:inherently reactive. And I do foresee
Speaker:in the future, within the next five years, certainly there's going to
Speaker:be a data breach to such a degree
Speaker:that there is going to be enough groundswell for
Speaker:organizations to really get serious about protecting
Speaker:consumer rights as it pertains to data.
Speaker:The other model you can look at is
Speaker:what's happened in cybersecurity. Three to five years ago, there were very
Speaker:few conversations happening about being proactive when it comes to
Speaker:cybersecurity. And in recent years, we've seen a
Speaker:large increase in breaches, not just within
Speaker:software companies, not just within organizations, but even
Speaker:breaches of oil and gas pipelines,
Speaker:things like that. And so just like with data governance
Speaker:no longer being a nice to have, it never was to begin with, but now
Speaker:it really is something that you need. Likewise, we're
Speaker:seeing tech teams really prioritize cyber,
Speaker:not just in their pipelines, not just on the technical side, but
Speaker:also creating a more cyber literate workforce. And I think there's actually
Speaker:a lot that data practitioners can learn from their
Speaker:counterparts among CISOs to move the needle on that
Speaker:front. No, that's a good point. I think connecting those dots
Speaker:is important because
Speaker:when the C suite realizes that this isn't a game anymore,
Speaker:when the SCADA drivers got hacked,
Speaker:or when the Colonial Pipeline incident happened,
Speaker:I think that realized in obviously a number of ransomware
Speaker:attacks. I think security became very serious, like, oh, wait a
Speaker:minute, this could affect us and it's not
Speaker:optional anymore, or nice to have. Right. And I think data governance
Speaker:is going to follow that same thing. I think
Speaker:that's an interesting take that you have, is that up till now, the only
Speaker:driver in this space has effectively been privacy legislation,
Speaker:right. GDPR probably being the poster child for
Speaker:that. But I can easily see
Speaker:fear of being involved in some massive
Speaker:copyright lawsuit would probably like, I know there's some
Speaker:controversy about how GPT was trained, right? Like it was trained on Twitter
Speaker:data and then Elon Musk said, wait a minute, did you get anyone's approval for
Speaker:that? On that
Speaker:note, I would also encourage people because every now and then I have
Speaker:the strong urge when I am transcribing,
Speaker:for instance, user interviews, to use a tool like ChatGPT. It would be
Speaker:incredible if I could feed that video content into
Speaker:a system to spit out an accurate transcript.
Speaker:And that is absolutely not an option for the
Speaker:role that I'm in, for the industry I'm in. I cannot give that proprietary
Speaker:information to anyone outside of my organization. And if
Speaker:I did, the consequences would be things that I don't even
Speaker:really want to think about because I am beholden
Speaker:to keeping that information private. And what
Speaker:that calls to mind is the Samsung incident.
Speaker:Pretty early on in ChatGPT where folks fed
Speaker:proprietary Samsung data to ChatGPT.
Speaker:OpenAI owns that now. Again,
Speaker:we as a society, we as an industry don't
Speaker:have the full context or real
Speaker:comprehension of what that actually means, what ownership really means.
Speaker:But on a very practical level, it does mean that highly
Speaker:sensitive commercial data is now in the hands
Speaker:of this very large nonprofit to be used
Speaker:in very different contexts in very different ways.
Speaker:And the consequences of that are really going to be felt
Speaker:and continue to be felt, I think, over the next several years.
Speaker:That's interesting. I was just going to say it's almost
Speaker:like the I'm not sure how accurate
Speaker:it is, but knowing the source I heard it from, it's probably
Speaker:likely that a
Speaker:game manufacturer received
Speaker:proprietary information from a defense contractor
Speaker:in the US. I don't want to get too specific.
Speaker:It sounds like something is hitting the fan and it's not
Speaker:parmesan cheese. Well, it
Speaker:was an argument. The bit that I will share is it was an argument
Speaker:about someone had made a guess about what the
Speaker:interior of some piece of equipment looked like and someone said, no,
Speaker:it looks like this. And they actually
Speaker:supplied documents to prove that. And that wasn't
Speaker:good. Wow. Yeah, that was pretty wild. It was like
Speaker:all on a Discord server too. Exactly. Which was
Speaker:notoriously secure.
Speaker:So many wrong things about that, yet that happened. It's
Speaker:off the charts. But I mean, it's a good example of good
Speaker:intentions going horribly wrong. And you think that's
Speaker:a thing in data governance as well, like a risk?
Speaker:Absolutely. And when I talk about bias in AI, which is
Speaker:one, I don't believe, again, that data governance is separate
Speaker:from bias mitigation in the training
Speaker:process. I think data governance is a form of
Speaker:risk reduction and bias
Speaker:troubleshooting. And I do think that the
Speaker:overarching issue here is that we
Speaker:really need to think of this as an integrated problem
Speaker:that is one with the business. But I also think
Speaker:that people it's a misnomer to
Speaker:say, of course hackers have nefarious intent in many
Speaker:cases. Of course, there are always going to be people that want to manipulate
Speaker:data, that want to use it to cause harm.
Speaker:There's no doubt about that. But the vast majority of times when we
Speaker:see the biased outputs of algorithms or we
Speaker:see data governance gone wrong, no one was trying to
Speaker:harm someone. There was no negative
Speaker:intent. There are many complicated technical reasons why an
Speaker:algorithm can produce biased outputs towards one user group over
Speaker:another. And this is kind of where when people say, assume positive
Speaker:intent, I think that only goes so far because I
Speaker:don't believe that most developers or data scientists are
Speaker:trying to or executives are trying to harm people by
Speaker:a long shot. They're really doing the best that they can. But if the end
Speaker:result is still that people's
Speaker:rights are being abused, that
Speaker:resumes are getting screened out automatically instead of being
Speaker:given the proper consideration,
Speaker:if those negative results are still occurring, the intent,
Speaker:how much does it matter? But I do think that's an important
Speaker:distinction. Rather than painting the
Speaker:industry overall as a group of
Speaker:bad people with ill intent, I just don't think that's accurate, and I think there's
Speaker:a lot more nuance to it. It's also important, I think, to show
Speaker:that while these challenges are part of the job,
Speaker:they're inherent in the work of doing data today.
Speaker:Whether you're an engineer, a scientist, a governance
Speaker:person, this is part of the job. And so to that
Speaker:degree, it's somewhat inevitable, but it's not
Speaker:unsolvable. There are tactics that you can use to
Speaker:improve your work in this space, and so I don't want it
Speaker:to be a doom and gloom scenario. There are things that we can do
Speaker:as practitioners to avoid a lot of the consequences
Speaker:we're talking about, and there
Speaker:are a lot of blueprints out there for how to do this. Like I mentioned,
Speaker:cybersecurity is doing a lot to
Speaker:educate workforces on how to spot phishing attacks.
Speaker:Things like that. If you look at governance
Speaker:from a stewardship perspective and a governance council
Speaker:perspective, if you've ever served on a nonprofit board, nonprofits
Speaker:are actually surprisingly advanced when it comes to
Speaker:things like data governance. When I was writing the book, I found
Speaker:many universities, Washington University in St. Louis
Speaker:comes to mind that have full websites devoted to their
Speaker:data governance charter, who serves on the governance
Speaker:council, what they manage on it. And I'm sure those
Speaker:people would tell you that their governance council is far from perfect,
Speaker:but they're doing the work, they're holding themselves accountable,
Speaker:and they've set up the structure to succeed. So
Speaker:nonprofits and the cyberspace are both two
Speaker:really strong models to look towards when we're thinking about
Speaker:what the future of data governance looks like.
Speaker:No, that's a good way to look at it. It's an evolving
Speaker:field, and it's
Speaker:interesting how it's finally coming up, and it's becoming more and more
Speaker:prevalent, at least in the conversations I have. And
Speaker:that's encouraging to hear, because like I said, when I was pitching the book and
Speaker:then writing it, I felt confident that this
Speaker:information was necessary, that people in the field
Speaker:could use it. But at the same time, I was seeing
Speaker:relatively little being written about data governance. I was seeing a lot of
Speaker:articles on different things you could do with data from the data
Speaker:science side or engineering side, but I wasn't seeing a lot about
Speaker:governance, and there was that nagging part of me that
Speaker:worried. I feel confident about this book and
Speaker:its subject, and I do worry that it's going to
Speaker:land with a bit of a little thump and
Speaker:then go nowhere. But I've actually really seen the conversation in
Speaker:our industry shift this year. I think it's no accident that that
Speaker:happened when ChatGPT became mainstream, when generative AI
Speaker:officially became mainstream. And that really was
Speaker:my thought all along, was that we were going to reach a
Speaker:tipping point where data governance was necessary. And so I would even
Speaker:go so far as to say when the book was in beta last fall, I
Speaker:still had some of those concerns about whether it was going to be
Speaker:relevant enough or perceived to be relevant enough, and
Speaker:I don't have that doubt anymore.
Speaker:So it's interesting. I see that there's an Audible version too. That's
Speaker:awesome. There is. And so they did turn it into an audiobook. So if
Speaker:people want to read it, they can either pick up an e
Speaker:copy, which is available on any ereader, they can also
Speaker:order a print copy, but it is also available on
Speaker:audiobooks. So if people want to utilize that I know
Speaker:that audiobooks are preferred for people on the go. I listen to
Speaker:them at the gym or on planes, and so I
Speaker:find that audiobooks can be a great
Speaker:alternative. If you don't have that time to sit and read every
Speaker:day, you probably at least are sitting down at some point during the
Speaker:day, whether on a commute, whether on a plane. And so hopefully the audiobook
Speaker:can help. No, absolutely. Because
Speaker:of circumstances related to what I mentioned early
Speaker:in the show about the good news, I was just spending a lot of time
Speaker:in the car between here and Pittsburgh. So I've gotten a lot of
Speaker:audiobooks done in there and I think this is an
Speaker:awesome conversation. This could probably go on for the 2 hours, but I want
Speaker:to switch to the pre canned questions. But while
Speaker:hopefully Lauren, you've had a chance to review those before. Oh, Andy
Speaker:just posted them, it looks like. Well, let me post them over
Speaker:here in our Teams chat. Oh, I just did
Speaker:it. They're not brain teasers,
Speaker:but they're just fun little questions that we have, we ask of every guest.
Speaker:But I will point out that Audible is a sponsor
Speaker:of Data Driven, and if you go to
Speaker:thedatadrivenbook.com, you could pick up a free book.
Speaker:And I'm looking forward to listening to your book. Lauren.
Speaker:Awesome. Thank you so much. That really means
Speaker:excellent. Yes. And if listeners want to
Speaker:buy the book, you can go to Pragprog.com. That's
Speaker:Pragprog.com. The book is
Speaker:called Designing Data Governance from the Ground Up, and your listeners can
Speaker:use the code Datagov 23 all
Speaker:Caps to get 35% off the e copy.
Speaker:So if folks are interested and they need a little bit of a
Speaker:boost, that code should be good, and I
Speaker:would love to know what folks think. So I'm happy to be connected with on
Speaker:LinkedIn and if folks want to leave reviews of the book on sites
Speaker:like Amazon and Goodreads, that is also hugely helpful.
Speaker:Those reviews really do make a difference in books getting found and
Speaker:discovered on those platforms, so every review helps.
Speaker:Awesome. All right, our first question. How did you find
Speaker:your way into Data? Did you find Data or did Data find you?
Speaker:Data did find me. I'm a writer at heart,
Speaker:and I have a background in mixed methods
Speaker:research, journalism, and digital media and
Speaker:content management. I started using open source CMS
Speaker:systems to manage that content. So that's my
Speaker:first foray into open source tech and communities. But I
Speaker:didn't really get interested in Data until I was a research analyst at
Speaker:Gartner and I started learning about AI
Speaker:that way. That's where I started hearing about different types of AI,
Speaker:things like natural language processing versus robotic process
Speaker:automation and how you could use these different types of tech to
Speaker:solve very specific business problems. And I was
Speaker:surprised by how interesting I found
Speaker:that whole aspect of it and how interesting I found the fact that at
Speaker:the end of the day, AI is data, and the more
Speaker:you learn about data and the more you know about it, the more you can
Speaker:use those technologies effectively.
Speaker:Awesome. You want to take the next question, Andy?
Speaker:Yes, sure. Sorry.
Speaker:I was thinking of how that parallels Frank's story a little bit.
Speaker:I beat Frank up about this every chance I get because I
Speaker:begged him for, like, ten years to come over to
Speaker:data and specifically analytics and business
Speaker:intelligence because Frank is a gifted natural
Speaker:artist. He's one of those people that can draw.
Speaker:And I'm almost 60 years old. I still can't
Speaker:color in the lines. So I had to do something like data engineering
Speaker:that didn't require that artistic bend.
Speaker:But I was thinking of that, as you mentioned, that could I use this
Speaker:to beat Frank up and see, I did.
Speaker:It's in love, Frank, you know that. Oh, I totally know. I totally know.
Speaker:Yeah. It only took the collapse of Silverlight
Speaker:and Windows Phone for me to see the light. I'm so sorry that
Speaker:happened. That's okay. Our second question.
Speaker:Lauren, what's your favorite part of your current gig?
Speaker:My favorite part of my current gig is talking
Speaker:to users of a particular product. And
Speaker:when the light bulb goes off between what they're saying
Speaker:is a pain point and a possible solution that we can build or
Speaker:design, that gets really exciting to me. And
Speaker:so you can get a little overwhelmed by all of the user interviews
Speaker:that you do, especially in the beginning when you're taking in a lot of information.
Speaker:But then as you zoom back and then start looking at the big
Speaker:picture to see how you might solve some of those
Speaker:challenges with technology, that's where I see the
Speaker:real clear overlap between those user interviews and
Speaker:what is designed and put out into the world through tech. And
Speaker:that's really exciting to me. Got you.
Speaker:Our next complete the sentences when I'm not working. Well, we have
Speaker:three questions sorry, too much coffee. We
Speaker:have three questions that are complete the sentence. Right. So the first one is, when
Speaker:I'm not working, I enjoy blank. I enjoy
Speaker:traveling. I love to travel as much as my time
Speaker:and money allow. And one of the cool things about working in Tech is that
Speaker:you get to attend a lot of conferences that are in really cool places. So
Speaker:by virtue of being in Tech, I've gotten to see a lot of
Speaker:new cities and even some countries in places.
Speaker:For instance, I'm scheduled to go to North
Speaker:Macedonia next month to help teach at a tech
Speaker:camp in Ohrid, North Macedonia. And I would not
Speaker:be going if not for my career in Tech. But I love
Speaker:to explore new places, and doing that is one of the few things that actually
Speaker:gets me to turn my brain off, and that's one of the things that I
Speaker:value about it. So I do that as much as time and money
Speaker:allow. I am with you. Yes. I like to not
Speaker:look at a calendar. That's kind of my thing. Yeah.
Speaker:And it's a luxury in this day and age, and when I get
Speaker:to do it, that's really special. Macedonia.
Speaker:I've never been to that part of the world and I am jealous.
Speaker:Yes, I'm looking forward to it. Other than Croatia,
Speaker:I haven't been to the Balkans. I've seen very little of Central and
Speaker:Eastern Europe as a region. And that's the thing about travel. As much
Speaker:as you've seen, there's always more to see and you know that
Speaker:you can't possibly scratch the surface of all of it. So I really
Speaker:value every opportunity that I get to see something new.
Speaker:Excellent. So our second complete the sentence is I think the
Speaker:coolest thing in technology today is blank.
Speaker:I think the coolest thing in technology today
Speaker:is the opportunity to
Speaker:get time back to plan more
Speaker:effectively. And so that might sound like a
Speaker:catch-22, but I think when we look for opportunities to
Speaker:automate really repetitive tasks that take people hours,
Speaker:if not days to complete, it does give you a lot of
Speaker:time back to be more strategic about how you complete
Speaker:the essence of your work. And so one example of that is I teach a
Speaker:course on interaction design at George Washington University and I had a student this past
Speaker:semester ask me about the
Speaker:impact that I think AI will have on the design profession. And I said,
Speaker:well, you're already using AI in design today because it's embedded
Speaker:into Canva and Mural and all of the
Speaker:software that you use to make these designs. And you're
Speaker:already pretty adept at using AI, but what it can't do
Speaker:is teach you to get really granular about the best
Speaker:way to design that technology to
Speaker:do a particular task that can solve a user need. And
Speaker:so I think that that is what's really cool. I think
Speaker:that is what cannot be easily automated.
Speaker:And I think that if we can use technology to do
Speaker:the dull stuff, for instance, using natural language processing to comb
Speaker:through hundreds of documents and get you the information you need within
Speaker:minutes, that is on the surface kind of boring,
Speaker:but it's also hugely valuable. It's better in many cases than
Speaker:what humans can do and it gives you more time back.
Speaker:Good answer.
Speaker:Oh, you're on mute, Frank. Frank, I'm on mute. Sorry,
Speaker:but I was coughing. The third and final complete the sentence is I look
Speaker:forward to the day when I can use technology to blank
Speaker:to drive. I would really love. I
Speaker:grew up learning to drive in the suburbs of Boston and then I moved to
Speaker:Washington, DC, which means that driving is not a fun
Speaker:experience for me. And I do look forward to the
Speaker:day when the technology for self driving cars is advanced
Speaker:enough that I can use it to just get in the
Speaker:car, have it drive for me. I
Speaker:do not know what exactly that looks like beyond this idea that I just
Speaker:shared because obviously self-driving cars and regulation
Speaker:is a whole other podcast. But I do look forward to the day
Speaker:when, like, planes being effectively flown on
Speaker:autopilot today. I do look forward to the day when we can actually do that
Speaker:with cars. I wholeheartedly agree on
Speaker:that one. Driving in there's something about driving in and around
Speaker:DC that is just an unpleasant experience. It is. And it's gotten
Speaker:worse over the pandemic, for sure. I notice a lot more speeding,
Speaker:a lot more people running red lights, a lot more people going through intersections.
Speaker:And as someone who straddled the border of DC and Maryland
Speaker:for seven years, Maryland drivers are truly terrifying.
Speaker:And so I hope that self driving
Speaker:cars can alleviate a lot of that. As a Maryland resident,
Speaker:I do not disagree. I was
Speaker:just going to interject that here in Farmville, Virginia. It's tough, too. I
Speaker:mean, just the other day there were like five cars at the light.
Speaker:It's a rough one. The struggle is real,
Speaker:by the way. I agree with self driving, even though it's all
Speaker:rural around me. Share something different about
Speaker:yourself, Lauren. But we remind all of our guests
Speaker:that we want to keep our clean rating. Yes. So
Speaker:something different about me is that I foster
Speaker:dogs. So I have a dog myself. I have a
Speaker:rescue dog who is my little work from home buddy.
Speaker:But I also foster dogs every now and then. And so I fostered
Speaker:a total I did the math recently. I've fostered a total of
Speaker:ten within the past two years. And so every now and then
Speaker:I have two pups at home, and I always encourage people to
Speaker:foster whenever I can. We're in the summer right now.
Speaker:Summer is a notoriously busy season at shelters. So
Speaker:if you have ever considered fostering a
Speaker:dog, a cat, any other animal that just needs a home to
Speaker:decompress in before they get adopted, I highly recommend that people
Speaker:consider it. That's cool. My wife and I have done the
Speaker:same, and we've only managed to keep two.
Speaker:Yeah, well, so one of them I did end up adopting. I did
Speaker:adopt one foster, but the others and
Speaker:people say they're like, well, is it hard to give them up?
Speaker:And it is to some extent, but I also think,
Speaker:you know, when you're a stop on their journey versus
Speaker:their final destination and it's hard to
Speaker:explain it more than that, but it is a gut feeling. And
Speaker:so I think you actually know, like I said,
Speaker:I highly encourage people to do it. The way I also sell it to people
Speaker:is you get all the fun of having a pet around without
Speaker:the bills and long term responsibility. So
Speaker:that's also good if you just want a little buddy for a while
Speaker:but don't want a pet long term, that works out, too. It is a bit
Speaker:like Uber for dogs in that sense, or whatever animal.
Speaker:Yeah, no,
Speaker:we had a whole litter of puppies once that were fostered with us, and it
Speaker:was really cool to have that little baby puppy experience,
Speaker:but. Yeah, it sounds like a lot of work, though.
Speaker:It was. And then as they got adopted, I was like, okay,
Speaker:yeah. I'm happy to see them go to their new homes where they're the center
Speaker:of attention.
Speaker:That's part of the justification for moving where we did now, where we have like,
Speaker:four acres, was for the dogs, basically. I work hard
Speaker:so my dog has a better life. Oh, totally.
Speaker:I work to support my dog. At the end of the day,
Speaker:we have a dog, but we're owned by five cats. Share it
Speaker:on. That's also a good way to put it. Yeah. You're including
Speaker:the dog. The dog is also owned by the cats, I'm guessing.
Speaker:And our final question, where can people find more about you and what you're
Speaker:up to? Yes. So I am active on LinkedIn, so
Speaker:if people want to connect to me, I would welcome that. I'm on there under
Speaker:my full name, and then they can also, like I
Speaker:mentioned, go to Pragprog.com to find the book.
Speaker:So that would be fantastic if your listeners want to find it and
Speaker:download it and then let me know what they think. So those are the main
Speaker:avenues. I am on Twitter as well, although less so
Speaker:these days. And I am trying out new
Speaker:platforms like Threads. I'm active on
Speaker:Instagram already, and so I did decide to
Speaker:try out Threads as well. That is TBD, but that's used
Speaker:in more of a personal context. I don't talk to my friends
Speaker:about data governance in my everyday life, but that's also partially why I like
Speaker:talking to people like you about it. Cool. Well, thank you. And
Speaker:with that, we'll let Bailey, our AI
Speaker:assistant, end the show. Thanks for joining us.
Speaker:Thank you, guys. Thanks for listening to Data Driven.
Speaker:Have you checked out Data Driven magazine yet? We are looking for
Speaker:writers for the Autumn 2023 issue. Please check
Speaker:out Data Driven magazine.com for more information. Thanks
Speaker:for listening, and be sure to rate and review us on whatever podcasting app