
Background: I attended the Enterprise Data Governance Online event hosted by Dataversity. I’ve gone several times before and have found it packed with useful information. I would recommend it for anyone working as a governance professional. It is free and runs all day with several 40-minute talks.
The first presentation that I came in on was one by Katrina Ingram on data ethics. The full title of the presentation was “Integrating Data Ethics into Your Data Governance Program,” which is quite appropriate nowadays. Katrina is from Canada and, according to her LinkedIn profile, centers her work around AI, ethics, and privacy. She runs a consulting company called Ethically Aligned AI. Her full story is given on her website here: https://www.ethicallyalignedai.com/about-1. Katrina also teaches a Dataversity course entitled Data and AI Ethics, which can be found here: https://training.dataversity.net/courses/daie0625-data-and-ai-ethics-practical-approaches-and-solutions.
I often like to visit the chat room during these types of events. This event had a plethora of participants from various sectors. One comment from a participant that struck me was “Data governance is more of people governance than data.” I have to agree, in that governance deals with changing how people work with data day-to-day. Another participant commented, “Data quality – people do generate 20% or more of our data problems.” I actually think it is probably more than 20%. Consider that data quality problems not only arise when someone “fat-fingers” an entry into a data form; they can also arise in the creation, handling, storage, and movement of the data on down the line. Data quality issues are the result of problems that can occur anywhere in the data lifecycle.
I had a brief conversation with someone in the chat about this point. His comment to the group was, “Everyone who touches data is responsible for data quality. So it’s everybody’s problem.” I replied that as a data consumer, I touch data, but I’m not sure I’d be responsible for the quality of the data from that source. Everyone involved in the data lifecycle is somewhat responsible for its quality. Now, if I consume from a data source to create a dashboard, then I have to be accountable for what I put out there. I have the responsibility to be transparent about what I’ve done with the original data to present it on my dashboard.
The speaker also told stories about ethical issues around metadata and its use. Specifically, she told the story of an AI-based app that shared location data about the users it tracked, and the privacy implications of doing so. As a metadata aficionado, I was thrilled to see metadata included in a discussion of ethics. For example, a search of LinkedIn Learning courses on Responsible AI yields 32 courses at this writing (https://www.linkedin.com/learning/topics/responsible-ai). But try doing a search on Responsible AI and metadata…crickets. Yet it is metadata that can help build responsible AI systems. One academic paper that drives this point home is “Metadata in Trustworthy AI: From Data Quality to ML Modeling” by Jian Qin and Bei Yu (https://dcpapers.dublincore.org/files/articles/953354037/dcmi-953354037.pdf).
In the paper, the authors present how metadata can be used to “enhance the traceability of an AI system, thus increasing its trustworthiness.” Metadata can also be used to improve data quality as AI systems are designed, developed, and tested, since the algorithms involved themselves produce metadata.
Some key opportunities would be developing appropriate metadata schemas to describe AI-produced objects. Think about the datasets, models, data pipelines, algorithms, lineage flows, etc. used in an AI system. The authors correctly make the point that work needs to be done in the metadata development arena to keep up with the fast pace of AI. We cannot use the same conventions for generating and standardizing metadata that we do in digital repositories. Metadata is at the heart of the development of these AI systems, so this work would be well worth the effort.
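To make that idea concrete, here is a minimal sketch of what a metadata record for an AI-produced object might look like, written as Python dataclasses. This is my own illustration, not anything from the talk or the paper, and every field name here (trained_on, training_code_commit, responsible_owner, and so on) is a hypothetical choice:

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class DatasetRef:
    # Pointer to one dataset consumed somewhere in the pipeline (hypothetical schema).
    name: str
    version: str
    source_uri: str   # where the data came from, for lineage tracing
    license: str      # usage/consent terms, if known

@dataclass
class ModelMetadata:
    # Hypothetical lineage record for a trained model.
    model_name: str
    model_version: str
    trained_on: List[DatasetRef] = field(default_factory=list)
    training_code_commit: str = ""   # e.g., git SHA of the training pipeline
    responsible_owner: str = ""      # accountable person or team
    known_limitations: List[str] = field(default_factory=list)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Record enough lineage to trace a model back to its inputs.
record = ModelMetadata(
    model_name="churn-classifier",
    model_version="1.2.0",
    trained_on=[DatasetRef("customer_events", "2024-06",
                           "s3://warehouse/customer_events", "internal-use")],
    training_code_commit="a1b2c3d",
    responsible_owner="data-science-team",
    known_limitations=["not validated for customers outside North America"],
)
print(record)

Even a small record like this gives you something to audit: a model can be traced back to its datasets, the code that produced it, and the person accountable for it, which is exactly the kind of traceability the paper argues for.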
Below are some of the questions asked during the talk, along with the answers Katrina gave.
Question: What are the best ways to communicate data governance in a company?
Layered approaches work best, such as newsletters and lunch ‘n’ learns that get into data governance topics. I certainly agree here that a multi-layered approach works best to spread the word about governance. Multi-modal communications (written, oral, visual) that vary between long and short, formal and informal, etc. are the better practice to resonate with your audience. You actually need a comprehensive communication strategy that breaks down what layers you will have, what modes you will use, how often you will communicate, what you will deliver, and how you will deliver it.
Question: Do you find that most companies are driven by cost, risk, and compliance?
There can be financial, legal, and reputational risks, all of which equate to costs. Compliance can be used as good leverage to start a conversation because it is what a company has to do. Then you can talk about the ethical issues within governance and, eventually, the ethics in AI-based systems. Having worked in governance, I think starting with a compliance or cost problem, and then a risk issue, is always smarter when bringing up governance matters. Corporate leaders tend to look at you dumbfounded until you bring up potential problems that could cost the company money.
Question: How do you approach talking about ethically designed AI systems and LLMs (large language models)?
Katrina calls AI development a “Fair Trade Coffee Issue,” in that we are excited about getting the coffee but now need to figure out how to use it fairly. Generative AI systems that produce text, images, and audio are using original works from artists who have not been compensated for their copyrighted work. There is a huge issue around generative AI and the compensation and fair use of original data. In 2023, I attended a few important “listening sessions” sponsored by the US Copyright Office on this very topic. Various groups were represented there, from artists, writers, and musicians to corporations such as Microsoft. Academics were there as well. The discussion around the fair use of copyrighted material to generate new material, often for commercial use, was pretty heated. I greatly appreciated the nature of these discussions in that they attempt to get out in front of the technology, which has not happened before in my recollection. Although the commercialization of technology cannot be stopped, talking about the impact and ramifications of a new technology can lead to helpful legislation with regard to handling intellectual property such as the copyright of AI-generated material. Although I don’t think the laws can keep up, I do think that appropriate laws can be developed within the same generation!
Question: How do we change corporate culture to value data ethics?
Katrina recommends that in our own circles, both professionally and socially, we share stories about real people, real situations, and the real adverse effects of AI. Stories resonate with people and can drive change. To me, this sounds like a long road, but it is a start on an important topic that will only grow as more and more AI systems get developed and become ubiquitous.
