Talking Drupal #545 - DKAN

March 23, 2026

Today we are talking about the open data platform DKAN, what it’s used for, and how it applies to Drupal with guests Liz Tupper & Dan Feder. We’ll also cover Modern Drupal Dashboard as our module of the week.

Listen:

direct Link

Topics

  • What Is DKAN
  • Who Uses Open Data
  • DKAN Origin Story
  • Why Drupal Fits DKAN
  • From Distribution to Module
  • DKAN 2 Rebuild and JSON Shift
  • Async Jobs and API First
  • How Teams Publish Data
  • What a Dataset Really Is
  • Metadata vs Data Access
  • Why DKAN Left Drupal Org
  • Migration Path to DKAN Four
  • Harvesting and Data Store ETL
  • APIs Visualizations and Bots
  • Roadmap Data Store and AI
  • Contributing and Where to File Issues
  • Brief description:
    • Have you ever wanted to have your Drupal site admins start with a fast, widget-based interface that surfaces key site metrics, system health, and operational insights? There’s a module for that.
  • Module name/project name:
    • Modern Drupal Dashboard
  • Brief history
    • How old: created in Feb 2026 by Gaurav Kapoor (gaurav.kapoor) of werk21 in Berlin
    • Versions available: 1.0.5, which works with Drupal core 10.3 and 11
  • Maintainership
    • Actively maintained
    • Security coverage
    • Number of open issues: no open issues
  • Usage stats:
    • 4 sites
  • Module features and usage
    • With the module installed, site visitors with the new “Access modern dashboard” permission can access a React-based dashboard with widgets to provide insights on topics like:
    • Content overview: total content count, published vs unpublished, and per content type breakdown.
    • Users overview: user count per role (users with multiple roles are counted in each role), plus pie chart visualization.
    • Additional Content (Entity overview): lists all entity types (content + configuration), shows counts, and provides direct “Manage” links.
    • Modules overview: installed modules summary, including enabled/disabled and core/contrib breakdown.
    • System & status: key environment details such as Drupal core version, PHP version, and database information.
    • Health checks: displays Drupal requirement checks grouped by status (pass/warning/error) with a dedicated detail view.
    • Each widget can be clicked to open a detail view of the extended data, making it easy for admins to dig into any area
    • The widget-based architecture should also help to pull in data from other sources, potentially including things like analytics
Transcript

 

John: This is Talking Drupal, a weekly chat about web design and development from a group of people with one thing in common: we love Drupal. This is episode 545, DKAN. On today's show, we're talking about the open data platform DKAN, what it's used for, and how it applies to Drupal, with our guests Liz Tupper and Dan Feder.

We'll also cover Modern Drupal Dashboard as our module of the week.

Welcome to Talking Drupal. Our guests today are Liz Tupper and Dan Feder. Liz Tupper is a strategic technology leader with over 25 years of experience driving transformational change across the civic tech, social good, AI, Internet of Things, web, mobile application, and video game industries. She's the senior director of product at CivicActions, where she's passionate about making an impact in civic tech, especially using open source technology.

Liz, thanks for joining us today.

Liz: Yeah, thanks for having me.

John: And Dan is a web developer who has worked at CivicActions since 2017. He has served the public and nonprofit sectors for over two decades and is one of the principal maintainers of the DKAN open source data software. He lives in Philadelphia with his wife and son.

Dan, welcome to the show and thanks for joining us.

Dan: Thanks.

John: I'm John Picozzi, solutions architect at EPAM, and today my co-hosts are: joining us for the third time, Steve Wirt, tech lead at CivicActions. He's a backend developer and product caretaker to government websites, a Drupal module developer practicing contribute-first.

We like that. A FIRST Robotics team mentor and a builder of odd furniture and other woodworky things. Steve, I don't, I don't even know what to do with all those things. It seems like you're doing so much: building furniture, building robots, building open source. My gosh. Thanks for joining us and

Steve: thanks for having me.

John: Regular listeners might be noticing that we're kind of mixing things up a little bit. This is actually gonna be Steve's fourth show, but in the third show spot, because next week we'll be recording our live DrupalCon show. Right. So listeners, look forward to that one. You'll also notice maybe that in our other guest host spot today is Martin Anderson-Clutz. Martin Anderson-Clutz, product marketing manager for Drupal at Acquia. There we go.

Martin: Happy to be here.

John: And yes, Martin is filling in for Nick today as he is powerless to join us. Babu. Alright, let's actually just turn it right over to Martin for our module of the week.

What do you got for us this week? Martin?

Martin: Thanks, John. Have you ever wanted to have your Drupal site admins start with a fast, widget-based interface that surfaces key site metrics, system health, and operational insights? There's a module for that. It's called Modern Drupal Dashboard, and it was created in February of 2026 by Gaurav Kapoor of werk21 in Berlin.

It has a 1.0.5 version available, which works with Drupal core 10.3 and 11. It is actively maintained, has security coverage, has no open issues, and is officially in use by four sites according to drupal.org. Now, with the module installed, site visitors with the new Access Modern Dashboard permission can access a React-based dashboard with widgets to provide insights on topics like a content overview, to see total content count, published versus unpublished, and a per content type breakdown.

A users overview to see user count per role, including users with multiple roles counted in each role, plus a pie chart visualization; additional content, listing all entity types, showing counts, and providing direct manage links; a modules overview indicating an installed modules summary, including enabled or disabled and core or contrib breakdown; a system and status summary including key environment details like Drupal core version, PHP version, and database information; as well as health checks, including Drupal requirement checks grouped by status, sort of pass, warning, or error, with a dedicated detail view. Now, each widget can be clicked to open a detailed view, including the extended data, making it easy for admins to dig into any specific area of interest. The widget-based architecture should also help to pull in data from other sources, potentially including other things like analytics as well.

But let's talk about Modern Drupal Dashboard.

Steve: I took a quick look at it this morning and I like the look and feel of it. It has a nice, nice appearance to the dashboard. Gives you what you need to know.

John: Yeah, I think it's interesting 'cause I feel like everybody wants a dashboard. Every time I hear about like, oh, a user needs to log in, anybody needs to log in.

They need a dashboard. So, you know, the fact that this is in existence is interesting. And it's interesting to me also because it says it's a dashboard for Drupal 11, but it looks like it supports 10.3 and above. So you can also get it on your Drupal 10 site.

Martin: Yeah, I think the Drupal 10.3 compatibility was kind of recently added.

So probably the, maybe the project page needs to be updated to reflect that. I do think it's interesting as well, considering Drupal CMS 2 includes a dashboard using a different project, the Dashboard project. So it's kind of interesting, you know, we sort of have a growing number of options in this space. But it is interesting to see this as specifically a React-based approach to providing that dashboard, and it would be interesting to see if there was some way for some of these competing solutions to maybe, at kind of an API layer, have some commonality.

So, you know, one given plugin could be written that could support either display using a React widget in one ecosystem or displayed in more of a native Drupal interface in another. I don't know if that's possible, but would be interesting if, some of the people behind some of these different solutions could maybe compare notes on, on some of those ideas.

John: Now it, you made it sound like it was slightly customizable, where you could probably build your own widgets to kind of plug and play into this. Is that, is that true or did I just kind of fabricate that?

Martin: Yeah, so the, the module does include kind of a, of a base class that you can use to sort of extend and create custom widgets if you wanted to have some kind of, let's say CRM display or, you know, I think analytics is a thing I've heard a lot of you know, people looking at Drupal as a CMS ask about.

So I think the degree to which you can make Drupal the glass for things that are not just about sort of managing content, but other kinds of ways that people evaluate the success of a website, being able to pull all those into Drupal and, and have it sort of that single source of truth for all kinds of ways that people might evaluate the success of their website would, would definitely be helpful

John: not to, not to shift focus from our module of the week this week, but have you seen a module that allows for kind of providing that analytics widget.

Did we, did we do a previous module, am I remembering correctly, that provides that analytics data back to back to Drupal in a widget form?

Martin: So there is a, a dashboard module that does that. I can't remember if it's the one that's in Drupal CMS or a different one that I have used over the years, but I definitely know there is one that, if I recall correctly, works with blocks.

And because there's like a Google Analytics reporting module that provides a block, then you can sort of drop that into your dashboard. It's maybe also worth mentioning, while we're sort of broadening our discussion to include dashboards in general: the Homebox module is one that a lot of people will be familiar with, because it's sort of how on drupal.org people can customize their own little dashboard to include whichever widgets are sort of meaningful for them.

So if your website needs kind of more that individually configurable dashboard, then Homebox could also be one to check out.

Steve: Alright. It does seem like there's a handful of things competing for the space, and I like your comments, Martin, about bringing 'em together more. Like when I first looked at it, I was thinking about my Site Documentation module that has a bunch of reports. Will Jackson's got an Audit Export module that does a bunch of reporting.

And it's like, you know, if we were all collaborating a little bit more and connecting the dots instead of having like three or four competing products, you know, if they were blended together, I think that'd be great.

John: Yes, I agree with that statement. 150 to 200%. I do often feel like we need to combine our efforts into some of these solutions to, to kind of unify, unify things, but hey, we can all agree.

Everybody needs a good dashboard. Right. Martin, thanks for bringing us a great module of the week. If folks wanted to get a hold of you to suggest a module or just talk about modules in general, how could they best do that?

Martin: We're always happy to talk about candidates for a module of the week in the Talking Drupal channel of Drupal Slack, or people can reach out to me directly as mandclu on a variety of Drupal and social channels,

John: or if you happen to be at DrupalCon this week, I'm sure Martin will be around for you to talk modules with.

Martin: Yeah. I expect to be in the Acquia booth during a lot of the breaks. So if folks want to stop by and chat, would love to talk about modules of the week.

John: Did I not to, not to go down the, the rabbit hole of, of DrupalCon booths, but did I, did I hear that?

Acquia is doing some sort of beach theme this year. Was that accurate, or am I making that up again?

Martin: Yeah. Tiki theme. Yeah.

John: Yeah, yeah, yeah. Okay, cool. I look forward to it. I'm always interested by the Acquia booth. All right. Let's jump into our site, or our site content.

Oh boy. Our show content today. And Liz, I'm gonna, I'm gonna point the first question to you. Because I'm not overly familiar, and I'm sure maybe many of our listeners are not overly familiar or maybe new to it, but what exactly is DKAN and what problem does it solve?

Liz: Yeah. For this question, I'll start with just like a brief summary about DKAN, but then we need to step back and talk a little bit more about open data and then we can come back to the discussion around DKAN.

But essentially, DKAN is a free, open source data catalog built on Drupal. It helps governments and organizations publish and manage open data. And if we take the, you know, the step back about open data: since around like the early 2010s, you know, many governments have required open data. So basically the idea is when government creates data, it should belong to the public.

And so, you know, open data takes aim at taking information that's either hidden or really hard to access in private databases or spreadsheets and making it easy for anyone to find and use that data. You know, having it be searchable by keyword, date, agency, and the list just goes on from there.

And you can think of open data like open source software, but for data sets. So this means, you know, publishing data in open formats like CSV or XML and under open licenses such as Creative Commons. And so to get back to, to DKAN

Martin: mm-hmm.

Liz: You know, DKAN lets you build an open data catalog.

Sometimes, you know, it'll be called a portal or repository. And the name really reflects its inspiration from CKAN, which is an influential and popular open source Python data catalog.

John: Mm-hmm.

Liz: And you know, while many governments have chosen to use centralized or proprietary software like Socrata to power their open data programs, we believe tools should be open as well.

And that's why DKAN exists. So really the bottom line is DKAN makes it easy to publish and share data.

John: Mm-hmm.

Liz: And just easy to find and download and use data.

John: So. I'm gonna try to explain this back to you like I'm a fifth grader to see if I got it. So open data, open data's great. Yes. A big fan, right?

So like, I, I don't know, let's, let's like use a big data set like the census data, right? They collect census data, they publish it, people can access it and use it for all sorts of good things, right? So they may use something like DKAN as a piece of software that basically provides them with a framework to publish that, that data on the internet,

Liz: correct?

Yep.

John: Yes. Alright. You got

Liz: it. Smart 5-year-old.

John: I'm, I'm taking the rest of the show off guys. I, I, I've gotten it. Okay. So that's, that's cool. And I'm sure as we ask more questions, we'll get into how it, how it connects to Drupal. But you know, I think listeners right now are probably thinking like. I can kind of see where they're going with this because, you know, Drupal's structure of content might lend itself very well to that.

Liz: Yes, 100%.

Martin: So Liz, you mentioned government, but are there other kinds of organizations that use DKAN, and maybe what are some of the common use cases?

Liz: Yeah, for sure. So yeah, you know, DKAN is mainly used by government. So think about government not only at the federal level, but also state and local. Nonprofits and NGOs, you know, also wanna share data with the public too.

So those are some common use cases. So there's sort of like two major varieties of open data, you know, organizations. So, you know, there's government data, so those are agencies sharing information with the public because they're either required to or they want to, right?

Mm-hmm. Like they wanna, they wanna be open to their constituents. Another, you know, great variety is research data. So data that's like, you know, used for publishing studies and sharing findings for academic purposes or for research purposes. And so some like common examples that you can think of, of like, think of like, city data.

So traffic data, transit data, crime data, fire data. So like public works related type data. Public health data is another great example. And, you know, there can also be environmental data; ocean data, oceanographic data, could be another example. So that's if you think more broadly, but then there's also these very specific use cases as well.

And DKAN's really built for government data. That was sort of our intended use case, which really makes it more straightforward. With research data, there's a lot of great benefits to using DKAN. It is more complex because there's more rules. There's more software integrations that can be clunkier.

But many research organizations still use DKAN and CKAN for research because of that flexibility piece, especially with the Drupal customizations. Oak Ridge National Labs, which is out of the Department of Energy, uses DKAN in this way for their Constellation data portal. So it's really, it's like terabytes of data.

It's peer reviewed data that they're then sharing out to their, to their users. So that's one, you know, specific research example.

John: Interesting. So I'm wondering, you talked about government, you talked about kind of like, educational research kind of facilities, right? I'm wondering do you see any corporate users of DKAN, like are they using it for like their, like corporate social responsibility data or their, like environmental impact data or stuff like that?

Like any, any use cases there? Or are they just like using whatever web software they're currently using for that?

Liz: Yeah. So we haven't seen a ton of that. And the one thing I will say about DKAN is it's better at handling really large data sets. So like in that example of Oak Ridge, like that's terabytes of data.

There's a lot of other, you know, open data tools that can handle smaller data sets. The, you know, the great benefit with DKAN and Drupal is like all this taxonomy, filtering, and searching, the additional sort of governance and user roles and everything on the publishing flow. So, and Dan may disagree with this, or Steve might disagree with this, but I don't think this is a great tool for a small data set.

Like maybe if you have many small data sets and you wanna aggregate those, it's sort of like a powerful tool behind this. But this is actually really helping to provide clarity and transparency, to find the specific information that you need. Whether, you know, you're an academic, you're a lawyer, you're a journalist, you're the general public.

Like, you know, whoever is going to look at this data can find the information that they need in this sort of sea of a data set that might exist.

Steve: I disagree slightly on in that if you've already got a Drupal site and you've got five, five data sets you wanna share, I think it's still the right tool for the job just because of all the things it can do.

You know, Drupal doesn't care that it only has five things to play with. Like

Liz: that's fair. Yeah, yeah, yeah,

Steve: yeah. But, but if you didn't have a Drupal site, then, you know, adding the burden of of Drupal and all of that to sharing five CSVs or something like that, you know, might not be worth it. But if you're already running Drupal, it would be nothing to

Liz: yeah, I would agree.

I would agree with that statement. If you're already in the Drupal ecosystem, why not? It's there. You might not take advantage of all the tools, but there's no harm in that. But if you're not on Drupal, not planning to get to Drupal, it's probably not the right use case for you.

Steve: Or it's the reason to go to.

Sure.

Dan: If I could jump in. I also think, you know, sort of the major thing that DKAN, I might say this, but the major thing, there's a lot of, but you know, a big reason that people use these kinds of portals is about sort of implementing standard metadata schemas. So it could be a good choice for a government or, sorry, for a corporation that has a bunch of data sets that they wanna publish with this specific kind of metadata.

I think we might've seen like a health insurance company randomly use it for something a number of years ago. Don't, don't quote me on that, I guess. Except that I just said it. But anyway, I think that use case is more common in a sort of institutional framework where you're like, you know, you're sharing with people that need to cite it in a certain way for a paper, or people that use, you know, data kind of scraping tools, and it's just more common for that to be kind of in the public context.

Steve: Dan, you've been hip deep in DKAN and open data for a long time. Can you describe sort of the origin story of DKAN and how it first emerged? It's kind of a cool story.

Dan: Yeah. The idea first came from the sort of web, this is before my time, so I may not get all the terms right, but there was sort of a web team or technology team at the New York State Senate that, was a sort of pioneering office in releasing open data. You know, people don't usually think of the New York State Senate and innovation in the same breath, but there are, there are some exceptions to the rule. And this was, this was sort of a happy story. They, you know, they looked at some of these new ideas about how you should be publishing data, how you should be licensing it, how you should be making it discoverable.

And you know, they kind of set up a one-off kind of view and set of fields and probably some sort of RDF feed at the New York State Senate website, which was already running Drupal. And I think as they saw more of these things, you know, start to roll out across governments around the world, you had a kind of major executive order in 2013 from President Obama instructing federal agencies to sort of standardize around this stuff.

You know, we started to see CKAN get really popular for this and, you know, we love CKAN. We've kind of talked to them about a lot of things and are inspired by their work. But there were a lot of challenges to implementing CKAN, especially for smaller governments. Especially at the time, I think this has changed a bit, but at the time it really didn't use the kind of stack that people were used to deploying in kind of standard web contexts.

And there's just a lot of investment already in Drupal in the public sector, you know, the whole ecosystem of vendors and skills. So a few people that left the State Senate and formed a new company that was called NuCivic said, well, let's build CKAN in Drupal, basically.

And that's where DKAN came from. You know, at this point it's pretty different from CKAN and, you know, we're not sort of suggesting that we're a direct port of CKAN anymore, but they have, you know, similar origins, and I think the initial version of DKAN definitely resembled CKAN a lot more, just visually.

So that took off. You know, this was during this sort of boom in open data where just thousands of these portals were being spun up all over the world. So there was a lot of adoption in those early days. And eventually this company, NuCivic, went through a number of acquisitions and mergers and kind of got spun down eventually.

But CivicActions decided to take over the stewardship of DKAN as a project in 2017. And myself and Janette Day, who had been working at NuCivic for a long time on this, came on board shortly after that, and we've been maintaining it at CivicActions ever since.

John: So I may have missed this, or we might not have said it, but what does DKAN stand for?

Dan: Well, it sort of stands for CKAN and Drupal, but CKAN stands for Comprehensive Knowledge Archive Network. So it's a little bit of an acronym or something. You know, we don't really spell it out that often, so it's just

John: Drupal comprehensive.

Dan: Yeah, and there's been, I think, a JKAN and a couple of other kind of riffs on CKAN, because CKAN is so known and influential in that space.

John: Got it. So we talked a little bit, and I think Liz mentioned this a little bit, about kind of the capabilities of Drupal that are useful in DKAN. I think, you know, taxonomy, users. I'm just wondering, Dan, can we go into a little bit more detail on that? Like, I think, you know, regular listeners are gonna understand like, yes, Drupal is a good use case for this, but I'm just curious specifically, like what features is DKAN really leveraging out of Drupal?

Dan: Well, I think the biggest things are kind of obvious, right? I mean, the basics of a content management system are already there. And we don't have to reinvent the wheel on user authentication and granular permissions. You know, there's this huge ecosystem of modules out there. You can never really have this one-size-fits-all kind of solution that's gonna be right for everyone; every use case is gonna be very different.

You're gonna have a research institute where, you know, people have their academic single sign-on that they need to use. You have a government where, I don't know, you know, there's some kind of other standards that they need to meet, or a PIV card login or whatever it is. You know, Drupal is modular, and because it has such a large footprint in government and academia, a lot of these problems are already solved.

And then, yeah, in addition to that, you know, this whole system of entities and schemas and all these things, there's a lot of things already there in Drupal that you don't have to reinvent the wheel for.

John: What are some, I guess, common integrations that you see when people are using DKAN? Obviously the SSO and the institutional login piece makes sense.

But I'm wondering, are there other, are there other integrations, other like, I don't know, modules that are installed that you, you're seeing?

Dan: Let me think. I mean, you know, when we talk about integrations with DKAN, you know, we talk about it in different directions. You know, DKAN's very sort of API-first. It has its own kind of RESTful API that actually circumvents the core REST stuff and implements, you know, its own routes and things like that.

And it's really meant to power a lot of decoupled front end kinds of things, because so many people are doing data visualizations and kind of complicated searches that just work better as custom front end stuff.

In terms of integrations on the backend, I mean, I would say the biggest thing is workflow. You know, there's a lot of very kind of use-case-specific, you know, requirements around, okay, well, this class of user role should have permission to create a data set, but they can't publish it.

And this other type of role, it's sort of a curator role, needs to review and approve it. So even though, you know, DKAN doesn't have any of that stuff in its core code base, you know, we've seen different ways of integrating the dataset entity with, you know, Drupal workflow and content moderation.

Steve: So, the powers in Drupal to, like, send out notifications, things like that, are also really helpful. Yeah,

Martin: so I, I mean, I guess maybe extending that a little bit. Does DKAN have any integrations with some of the more modern tools in Drupal, like ECA?

Dan: No, I don't, I don't, I, I don't, I think it's possible, but to be honest, that's not something, I don't know.

Steve might have more of an idea than me how that might happen, but not in my experience.

Steve: It has a lot of events that can be subscribed to, you know, so it's, it's easy for other modules to, you know, custom modules to kind of hook into it and, and pull out what's needed in terms of actions and that kind of thing.

But no, it hasn't been, been brought up to speed with ECA yet.

Martin: Interesting. Now, I heard that DKAN started as a Drupal distribution, so is that true? Is it still a distro? And has there been any talk about maybe upgrading it to be something like a recipe or even a site template?

Dan: Well, the stories are true.

It was a distribution in its first release, which has required some finagling on the drupal.org side to sort of migrate it, because now, in its Drupal 8, you know, incarnation, it is just a module that lives at that level. But in Drupal 7, you know, we were focused more on this sort of turnkey, out-of-the-box solution, and, you know, there was just a whole different world back then.

We didn't have Composer, we didn't have a lot of the CI/CD stuff. It was just very hard to sort of manage a, you know, a group of modules that were composed together in a lot of different ways. So, you know, a lot of that stuff is just redundant now. I think the community has basically moved on from distributions, as have we, and, you know, is more focused on things that are composable and, you know, more flexible.

So I think, you know, there are some things that would make DKAN play a little bit better with recipes. But in general, you know, DKAN is more focused on doing a few things very well and, you know, could obviously be part of a larger recipe. Like, I don't know, if you had an academic website recipe, it could be part of that to add a data repository.

Martin: Yeah. I wonder if the team that works on LocalGov has ever thought about having kind of like a DKAN recipe that could sort of drop into those kinds of websites.

Steve: That's a great idea.

Dan: We should chat about that. Yeah. We'll get to this. There have been some things about the way DKAN is packaged that have probably also been a little bit of a barrier to that, but that is hopefully changing very soon as we move to drupal.org for our releases.

Steve: The DKAN 2 release was a major rebuild. What were the biggest challenges that you encountered and the biggest changes that were made?

Dan: Right. So, in 2020 we started sort of a ground-up rebuild and we kind of broke and rebuilt everything. There were a lot of things that we really struggled with in the Drupal 7 version that we kind of wanted to just reimagine and focus on sort of getting right.

And the biggest change, which is kind of a, you know, a challenging thing I think for some people to adapt to when they use DKAN, is that we essentially abandoned the Field API for content storage. I mean, we have a field that's called JSON metadata in the kind of core DKAN entity.

But we had so many problems you know, in sort of real world use of these really complicated metadata schemas for open data portals in implementing them as individual fields because these tend to be very sort of nested complex, you know, repeating kinds of sets of fields where you might have you know, one field that contains a group of eight fields that could repeat and themselves have children.

And, you know, I think even today, with sort of the latest approaches to this, it's very hard to, you know, manage this. I might not be up on the latest things, but you've had, you know, field groups and paragraphs and all these things, but you inevitably have, you know, sort of a mess. That's probably not fair, but you have a lot of entity relationships and things to keep in sync.

Whereas we have this thing called JSON where you just nest it as much as you want. And, you know, PHP and MySQL and all the tools in the stack support it very well natively. And I should say, in addition to that, there's a performance issue, where in some of the operations you wanna do on the data portal, you might need to update dozens or hundreds of records at once. And when that means making joins across 25 tables because that's the way that your content is structured, you know, we were sometimes waiting for hours for certain things to complete. And switching to one field has made the equivalent operation take seconds.

So, there's a lot of challenges. You know, I have to recognize it's a deal breaker for some people that there isn't just a set of fields they can move around in the Field UI. But it has enabled a lot of really powerful functionality. So that was just one point on my list, but that was a really big thing, to sort of rebuild and reimagine the system based on JSON to store the different fields.

And that required creating, you know, an interface for that. So we have this submodule, which has just been released as its own module, called JSON Form Widget. And that's a form widget that you can assign to a text field in the Drupal Field API that takes a JSON schema and builds an entire form from it.

So if your JSON schema is properly composed, and you can add a little bit of config to it to say, well, actually I want this to be an autocomplete or whatever, you get a full form. You submit the form and it's stored as JSON. So, I mean, that was quite a lot. You know, it probably took us a good year between starting and having a usable release, but we've been at it now for six years and it gets better and better.
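To make the schema-to-form idea concrete, here is a minimal Python sketch of walking a JSON schema and listing the inputs a form would need, including nested and repeating groups. It is only an illustration of the concept, not the JSON Form Widget's code; the schema, the `form_fields` helper, and every field name are hypothetical.

```python
import json

# Hypothetical, simplified JSON schema for a dataset record (assumption).
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "contactPoint": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
            },
        },
        "distribution": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "format": {"type": "string"},
                    "downloadURL": {"type": "string"},
                },
            },
        },
    },
}


def form_fields(schema, path=""):
    """Walk a schema and list the input paths a generated form would need."""
    kind = schema.get("type")
    if kind == "object":
        fields = []
        for name, sub in schema.get("properties", {}).items():
            child = f"{path}.{name}" if path else name
            fields.extend(form_fields(sub, child))
        return fields
    if kind == "array":
        # A repeating group: "[]" marks that more items can be added.
        return form_fields(schema.get("items", {}), path + "[]")
    return [path]


fields = form_fields(SCHEMA)
print(fields)

# A submitted form is stored as one JSON string, however deep the nesting.
submitted = {
    "title": "City trees",
    "contactPoint": {"name": "Parks Dept", "email": "parks@example.org"},
    "distribution": [
        {"format": "csv", "downloadURL": "https://example.org/trees.csv"}
    ],
}
stored = json.dumps(submitted)
```

The point of the sketch is that nesting and repetition cost nothing extra on the storage side: the whole record round-trips through a single text value.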

There's a couple more things I could probably mention. We kind of started using the Drupal Queue system heavily. You know, a lot of these operations take a long time. It's not a good combination with something that's just a request and a response. So rather than having, you know, an API request or a submit or a Drush command that just hangs for 10 minutes, it adds it to a queue and it kind of, you know, happens in the background on a cron run.

That's been really powerful. And I think probably the last thing I'll mention is that we built the whole thing API first. So before we even started to think about how the backend UI would work, we, you know, put it all together with, with an API basically because we did the opposite way last time and felt like we were always catching up.

Like the API was always behind what it could actually do. And it's been much easier to make it, make the software, do what you want, and then create the UI for it. I think that's sort of a more sustainable modern approach.
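The queue pattern Dan mentions (accept the request quickly, do the slow work later in a background run) can be sketched in a few lines of Python. This is a generic illustration using an in-memory `deque`, not Drupal's Queue API; `handle_request`, `slow_import`, and `run_cron` are hypothetical names.

```python
from collections import deque

queue = deque()   # stands in for a persistent queue (assumption)
results = {}      # job id -> result, filled in by the background run


def handle_request(job_id, payload):
    """Enqueue the slow work and answer immediately instead of hanging."""
    queue.append((job_id, payload))
    return {"job": job_id, "status": "queued"}


def slow_import(payload):
    """Placeholder for a long-running operation such as a large import."""
    return len(payload["rows"])


def run_cron(max_items=10):
    """Process up to max_items queued jobs, as a periodic background run would."""
    processed = 0
    while queue and processed < max_items:
        job_id, payload = queue.popleft()
        results[job_id] = slow_import(payload)
        processed += 1
    return processed


response = handle_request("import-1", {"rows": [1, 2, 3]})
handle_request("import-2", {"rows": [4, 5]})
done = run_cron()
```

Capping each run with `max_items` is one way to keep a single background pass short; whatever is left simply waits for the next run.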

John: I have so many questions right now.

Dan: Sorry, I know that was a lot.

John: No, no, no.

That was all great. But I feel like a little bit of a data dum-dum, because I kind of was coming at this thinking some poor guy was sitting there typing their data into Drupal. Liz, I'm wondering, when you see people use DKAN, how are they typically using it?

Is it like, here's all my data in this spreadsheet that's, you know, a hundred thousand lines? Or is it, as Dan was just talking about, like, hey, there's a JSON file that has all my stuff in it? I guess what I'm asking is, you install DKAN, right? Then what? Are you uploading a CSV?

Are you typing it in manually, like.

Liz: Yeah.

John: What does that look like?

Liz: So it kind of depends on what your use case is and where your data actually lives. And we find that sometimes, even within an organization or an agency, different teams might have different processes too. So it's really about understanding, for the data publisher, where are they getting their data from?

In some cases, we've got people going into Drupal, and then going into our DKAN module and uploading a file, right? And then they can go ahead and add specific information about the file as well. In other cases we have people, you know, running the API on their own.

And then in some cases too, especially for large files, we have like a folder, and then the folder is running some queries to pull that in so they don't have to log into DKAN. Those are some of the more common things that we see. But there is some uniqueness to an organization and what type of data we're actually talking about, and who needs to publish and how frequently they need to publish as well.

And then sometimes within an agency it could be multiple scenarios. And Dan, you might be able to speak in more depth around that too, but, you know, the more recent things I've seen over the years tend to fall into those three buckets.

Dan: Yeah. It makes me realize we maybe should have given a more specific outline of what a data catalog consists of. I mean, it's a set of records that we call datasets.

Stephen: Mm-hmm.

Dan: You could think of each one of those as, you know, like a blog. That's sort of the basic entity type, a dataset. And this gets a little difficult sometimes for people, but the dataset entity describes the dataset.

So the data itself is usually a file, ideally a CSV file. That's the absolute easiest thing to work with. But it might be a zip of a bunch of XML files. In the research field, you can see some really... I think once, working with Oak Ridge, one of the datasets was like 2,000 TIFF files in a directory somewhere. So it could be anything. With CSV files you can do a lot more than with anything else. But the dataset is kind of a wrapper in that way around the file, or it could be multiple files. The reason we were able to do it the first time with the Field API is that it is, at the end of the day, just a set of fields that someone else has said you need to provide: you need to tell me when it was published.

You need to tell me what language it's in. You need to tell me what the file format is. You need to tell me what the geospatial boundaries of it are: this is a dataset about Alaska, so here's a set of points that surround it. A lot of 'em are optional, but the data standards have a lot of fields that have evolved over time as a standard way to describe a dataset. So that's what we're talking about. And it's true, there are government agencies, especially in the early days, that would hand-type a JSON file and then look for software that could ingest that, almost like an RSS feed.
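As a rough picture of what such a record looks like, here is a small Python sketch of a dataset's metadata: a handful of descriptive fields wrapped around one or more files. The field names loosely follow DCAT-style conventions, which is an assumption on my part since no specific standard is named here, and which fields count as required is also assumed purely for illustration.

```python
# Hypothetical dataset record. Field names are DCAT-style by assumption,
# and the values (including the bounding box) are illustrative only.
dataset = {
    "title": "Alaska weather stations",
    "issued": "2024-01-15",                 # when it was published
    "language": ["en"],                     # what language it's in
    "spatial": "-179.1,51.2,-129.9,71.4",   # rough geographic extent
    "distribution": [                       # the file(s) the record wraps
        {"format": "csv", "downloadURL": "https://example.org/stations.csv"},
    ],
}

# Which fields are mandatory is an assumption made for this sketch.
REQUIRED = ("title", "issued", "distribution")


def missing_fields(record, required=REQUIRED):
    """Return the required fields that are absent or empty."""
    return [name for name in required if not record.get(name)]


def formats(record):
    """List the file formats of every file attached to the dataset."""
    return [d["format"] for d in record.get("distribution", [])]


complete = missing_fields(dataset)
incomplete = missing_fields({"title": "Untitled"})
```

Note that nothing in the record holds the actual rows of data; it only describes and points at the file.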

John: So if somebody has a dataset in a CSV file, right? That CSV file, just as an example, could contain, you know, a thousand data points. So I'm trying to, in my brain, Drupalize this. So your dataset entity has those multiple data points on it, and is DKAN converting that into JSON, or is it...

Dan: I think it's a little simpler than you're imagining it. Let's say, a common sort of example is the tree dataset. A lot of cities have a dataset that's a list, a CSV maybe, of all the trees that they know of in the city. And it'll be, you know, a lat-long point and then maybe like species, you know, data permit, all that kind of thing.

So the dataset content in Drupal is probably not going to have anything at the tree level, right? It's going to say, you know, maybe there's like a taxonomy of, this is streets data, and then potentially it could say, well, this is the geographic boundary of this dataset, so it's only gonna have maybe four or five points that say, this is the extent described.

And anything more detailed than that is in the file, which is just, it's just a...

John: Got it. So then somebody downloads the file to see the actual data points.

Dan: They download the file or, and we'll probably get to this in a bit, that file has been sort of ingested into the data store, which is one of the main components of DKAN, and they can query it directly with the API.

John: Got it.

Okay. And then at that point, is that data in JSON format?

Dan: Yeah. With the API, you can request the data as JSON or CSV.

John: Got it. Okay. So you wouldn't necessarily edit the data directly in DKAN; the data is provided to DKAN as a file or by some other method?

Dan: Correct. Yeah.

John: Got it. Got it.

Dan: Yeah.

John: But the metadata for the file is editable through DKAN.

Dan: Exactly, yes.

John: Got it. Okay. Whew. All right. We made it. So, going back to DKAN and Drupal: we talked about it being a distribution, and you alluded a little bit to it kind of coming back to drupal.org. I noticed when I was researching for the show that the module looks like it's currently hosted on GitHub instead of drupal.org.

I was wondering if you could talk a little bit about why that is and then like how you kind of plan to bring it back to drupal.org if I understood you correctly.

Dan: Absolutely. Yeah. So when we first released DKAN, and I wasn't quite on board yet, I would say a big reason is that in those days, this was kind of the beginning of civic tech and the beginning of GitHub, and all these hackathons were happening and people were, you know, staying up all night in city hall and writing little Ruby applications and posting 'em. GitHub was like the cool place to be if you were doing civic tech. That was one thing. And so the people doing this said, well, let's just put it on GitHub, see what people think.

Also, it had a lot of JavaScript in it. And because package management was not really there yet, the JavaScript was in the module, and it didn't work with the licensing rules on drupal.org. I think that probably would've been solvable, but we kind of just got used to, honestly, working in GitHub, and it kind of stayed there for a long time.

But that's kind of been not working so great lately. I mean, it does work. Obviously it works; that's how we've been doing it this whole time, just adding it through a composer file. It's published on Packagist. But, you know, I think Steve noticed this: when he tried to publish another module on drupal.org that depended on DKAN, it caused all these problems because DKAN wasn't in the drupal.org registry.

We also had some submodules that had naming conflicts with modules that were already on drupal.org. And we just decided, we really need to sort of be where the community's at. You know, I think that time when being on GitHub mattered has kind of passed. The tooling on drupal.org is there now.

You know, there weren't CI pipelines and all this stuff that we were pretty early adopters of, so we said, well, we gotta use GitHub if we want that stuff. Now all that stuff is there, and we want to make DKAN easier to use and compose with other things. So we're just at the tail end of a pretty, honestly, arduous process of renaming a number of the submodules, which is way harder than it... maybe it's not harder than it sounds, 'cause you're nodding like you know exactly how hard that sounds.

And that's required... we're actually jumping two major versions, because in order to enable new modules and disable the old ones and then remove them from the code base, you know, you can't do it in the wrong order.

So, Drupal 4... sorry, I always do this. Two big D words in my life. DKAN 4 will release on drupal.org pretty soon. There's a dev branch up there already, and we're working out some of the stuff with the packaging and all that. But I expect that within weeks we'll have at least a release candidate up there.

John: Great. We're, we're excited to have you back. Thanks. I imagine, like, I imagine like Composer probably makes it a lot easier to include like some external libraries and stuff like that and bring those, bring that stuff all together, right?

Dan: Yeah, yeah, sure. I mean, yeah, all that stuff is solved, you know? And I mean, most DKAN sites have a, like a React front end.

So, you know, they've got an npm file, they might be pulling in... Yeah, there's all kinds of things that work pretty well. And, you know, for a lot of things it doesn't really matter that DKAN wasn't on drupal.org, but it creates enough hurdles that we thought it was worth moving.

Liz: Yeah. You know, I also think that being on drupal.org, we're gonna be exposed to more people who wanna take a look at it and either use it or contribute to it. That's been another challenge, frankly, of being on GitHub: having to do our own big marketing push around it. Frankly, it just makes more sense to be on drupal.org, especially now.

And I've been really excited that we're almost there. It just makes sense, and there'll be a huge benefit to doing it across the board.

Steve: Just a side note for our current DKAN fans: anybody new, you know, trying DKAN for the first time, will be able to jump directly into DKAN 4 from drupal.org.

Anybody that's currently on DKAN 2 will have to update to the newest version of DKAN 2, then update to DKAN 3, and then update to DKAN 4, and have to do it in those hops to get configuration and everything else to play nicely, unfortunately.

John: I don't wanna turn this into a negative, but it sounds like you're doing your own Drupal 7 to 8 migration there.

Dan: Almost.

Oh, I mean, the funny thing is that the code is all the same, and the test is: does it work exactly the same as it did before? So in some ways it's nothing like that. But it is... yeah, in some ways it is. It's, Dan, you have to run.

John: It's okay, Dan, you could just tell me I'm wrong.

Like, no, don't call it that.

Dan: Well, it's just that that 7-to-8 upgrade had so many specific things that I'm still a little traumatized by that I'm like, eh, but...

John: aren't we, aren't we all?

Dan: Yeah. But it is like that in the sense that there's a lot of update hooks that need to run in a very specific order to get where you're going.
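The hop-by-hop upgrade described above (newest DKAN 2, then 3, then 4, with each step's update hooks running before the next) can be written down as a tiny Python helper. This is only a planning sketch under that stated assumption; `upgrade_path` is a hypothetical function, not a DKAN or Drush command.

```python
def upgrade_path(installed_major, target_major=4):
    """List the ordered hops from an installed major version to the target.

    Assumes, as described in the conversation, that majors cannot be
    skipped and that each hop's update hooks run before the next hop.
    """
    if installed_major > target_major:
        raise ValueError("installed version is newer than the target")
    steps = []
    if installed_major < target_major:
        steps.append(f"update to the newest {installed_major}.x release")
    for major in range(installed_major + 1, target_major + 1):
        steps.append(f"update to {major}.x and run its update hooks")
    return steps


for step in upgrade_path(2):
    print(step)
```

A brand-new install has no hops at all in this model, which matches the point that new users can start directly on DKAN 4.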

Martin: Dan, earlier on you were talking about the process of ingesting a dataset. It sounds like maybe DKAN performs some ETL processes in there. So what is the typical life cycle of a dataset in DKAN?

Dan: So, yeah, ETL stands for Extract, Transform, Load. It's kind of a data engineering term that definitely applies to two major things in DKAN.

So I'll start with what we call the harvest. There's a few ways to get a dataset into DKAN. One is just a POST to the API or a submission of the dataset form; that gets one of them in there. Or you can point it at another catalog, or a JSON file that describes a lot of datasets at once, and have it harvest that.

And that's not our word. That's kind of a standard in the data portal or data cataloging industry. So if you're using a harvest, it goes through, I mean, it's called ETL in the code. It goes through the extract, which is, you know, kind of a Guzzle request to the URL of that JSON file.

It's a very flexible system, so you can point it at another extract class that could parse an XML file and, you know, transform it, that's the next step, into something that you can ingest. But it grabs your records from the external source, munges them into JSON that looks like your schema, and then loads it into dataset entities.
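The three harvest steps just described can be sketched as three small Python functions. This is a generic illustration, not DKAN's harvest code: the source document is an inline string standing in for the fetched JSON file, and `extract`, `transform`, `load`, and the record fields are all hypothetical.

```python
import json

# Stand-in for the body fetched from another catalog's JSON file (assumption).
SOURCE_JSON = json.dumps({
    "dataset": [
        {"identifier": "a1", "title": " Street trees ", "keyword": ["trees"]},
        {"identifier": "b2", "title": "Bike lanes", "keyword": ["transport"]},
    ]
})


def extract(raw):
    """Extract: pull the list of records out of the source document."""
    return json.loads(raw)["dataset"]


def transform(record):
    """Transform: reshape one record so it matches the local schema."""
    return {
        "identifier": record["identifier"],
        "title": record["title"].strip(),
        "keyword": sorted(record.get("keyword", [])),
    }


def load(records, store):
    """Load: save each record as a JSON string keyed by its identifier."""
    for record in records:
        store[record["identifier"]] = json.dumps(record)
    return len(records)


catalog = {}
count = load([transform(r) for r in extract(SOURCE_JSON)], catalog)
```

Keeping the steps separate is what makes the flexibility possible: a different `extract` (for XML, say) can feed the same `transform` and `load`.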

The second side of it is the data itself. If you're ingesting or importing a CSV file into the data store system, that's gonna be triggered sort of automatically when you load a dataset, no matter how you do it, whether it's through the API or a save. If DKAN sees a URL to a CSV file, it's gonna kick off a process where, if it's a remote file, it's gonna make a local copy.

It's going to parse that all into a database table and then expose that to the API. So that's got a little bit less of a transform aspect to it, but it definitely extracts and loads the CSV.

Excuse me. I could go into a lot of detail about that second process. That's where a lot of our engineering time goes these days, refining that.
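The second process (parse a CSV into a database table so it can be queried) looks roughly like this in Python. It is a minimal sketch using the standard library's `csv` and an in-memory SQLite database rather than DKAN's actual importer or MySQL; the table name, the sample rows, and the all-text column types are assumptions.

```python
import csv
import io
import sqlite3

# Stand-in for the local copy of a CSV file (illustrative data only).
CSV_TEXT = "species,lat,lon\noak,44.05,-123.09\nmaple,44.06,-123.10\n"


def import_csv(text, conn, table="datastore_demo"):
    """Create a table from the CSV header and load every row into it.

    Column names come straight from the header, so this sketch assumes
    a trusted file; every column is stored as text for simplicity.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
imported = import_csv(CSV_TEXT, conn)

# Once loaded, the rows can be queried instead of downloading the file.
species = [
    row[0]
    for row in conn.execute("SELECT species FROM datastore_demo ORDER BY species")
]
```

As noted in the conversation, there is little transformation here: the header becomes the columns and the rows go in as they are.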

Steve: What integrations or APIs make DKAN attractive for organizations that wanna connect their data catalogs with other tools or platforms?

Dan: I mean, I'd say the DKAN API. You know, it has a bunch of endpoints for the metadata and a bunch of endpoints for the data itself. And being able to get all of that in a really straightforward, simple way, well, simple for an API, kind of makes the possibilities endless for third-party applications.

So, you know, we see a lot of visualizations built, because they can use any kind of charting library and build a query with the API. Maybe it needs to find the right dataset using the metadata API, and then build this query.

There's all kinds of things that you can do that, you know, that the API empowers. And then, yeah, I mean, there's integrations with other services that we talked about when we were talking about the, you know, the Drupal ecosystem.
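The two-step flow Dan outlines for a visualization (find the right dataset through the metadata, then query its rows and shape them for a chart) can be mocked up in Python without any network calls. Everything here is hypothetical: `CATALOG` and `ROWS` stand in for responses from the metadata and data endpoints, and `find_dataset` and `chart_series` are illustrative helpers, not DKAN API calls.

```python
from collections import Counter

# Stand-ins for metadata and data responses (assumptions, no real endpoints).
CATALOG = [
    {"identifier": "trees", "title": "Street trees", "keyword": ["trees", "parks"]},
    {"identifier": "lanes", "title": "Bike lanes", "keyword": ["transport"]},
]
ROWS = {
    "trees": [{"species": "oak"}, {"species": "maple"}, {"species": "oak"}],
    "lanes": [{"street": "Main"}],
}


def find_dataset(keyword):
    """Step 1: use the metadata to locate a dataset by keyword."""
    for ds in CATALOG:
        if keyword in ds["keyword"]:
            return ds["identifier"]
    return None


def chart_series(dataset_id, column):
    """Step 2: query the rows and count values, ready for a bar chart."""
    counts = Counter(row[column] for row in ROWS[dataset_id])
    return sorted(counts.items())


series = chart_series(find_dataset("trees"), "species")
```

The resulting list of label and count pairs is the kind of shape most charting libraries accept with little extra work.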

John,

John: you're muted. John, you're on mute.

All right, let's try again. Talking about APIs and integrations, is that something that you use, like, Tableau for? Do you see a lot of folks pulling data and using it in that way? Or what kind of charting libraries are you guys seeing people use?

Liz: Well, I can speak to some of the more recent use cases that we've seen people use. So there's, and now it's falling outta my brain, but was it Recharts and Vis? Those are two of them.

John: and

Liz: Yeah. Mm-hmm. Yeah. Recharts and Vis are two of those. You know, I've asked the same question about Tableau.

Like, I think, you know, I've seen in previous lives a lot of interest in Tableau for either internal use or sometimes, you know, having it on a public dashboard to be like, here's all the data. I had worked on a project where it was shelter animal data that they were presenting in Tableau.

Dan, I'll let you jump in 'cause you had something else to add there as well.

Dan: Yeah. Well, I mean, I think one thing is we can talk about the things that we've used to build applications on top of DKAN, which is, you know, these kinds of chart libraries we've mentioned, Vis, Recharts, you know, D3 and Vega, and all these things that we've either deployed or at least experimented with.

We don't really, I mean, we don't know, like if someone pulls data from the API, we can't really tell what, what it is or what they're doing with it.

John: Right.

Dan: This is maybe a whole can of worms. We're definitely seeing things that are pretty clearly, you know, all kinds of bots probably being driven by various AI agents just kind of going through the data and who knows what they're doing with it.

So that's, I think, gonna be a challenge in the future, just on the infrastructure side: how to deal with that and, you know, continue to make the data available to whoever wants it without just, you know, having your servers falling over every day. We're not there yet, but there are some worrying signs.

John: So as we come to a close here, Liz, I'm wondering, looking ahead, what do you see on the roadmap for DKAN, and how do you see it evolving alongside the broader Drupal ecosystem?

Liz: Yeah. So, you know, for the past 18 months there's been a, this big focus on like, you know, cleaning up old code and fixing technical issues so we can start adding new features.

You know, right now, as Dan mentioned, we're working on upgrading to DKAN 4 and getting everything on drupal.org. As for what's next, we've got a bunch of exciting things here. So we're looking at implementing architectural changes to both the metadata model and the data store.

So this will make many things work better and integrate more smoothly with Drupal. You know, we've got DKAN set up with a decoupled front end that uses APIs to display tables, and we're building a data table that doesn't rely on React, so making things simpler and faster with that.

So we wanna be able to not have to rely on React, so that, you know, if people don't have that set up, it's not required. We're also making the data store more flexible so it can work with different types of data and be easier to customize. And then we're working on a stored query system that will let users save and share reports for specific datasets.

There's been a lot of interest in that with some of our projects that we've been working on. And then sort of down the road you know, this is on our radar, but we're being really thoughtful and cautious about it. We're, you know, experimenting with AI and what problems it could solve, but like that, that's one that we've been sort of kicking the can a little bit.

I think we've got both AI-curious people and skeptics within this group, which is a good, healthy balance of, like, what are the great use cases here in DKAN, and how can we implement them intentionally?

John: Well, I mean, I think that lends itself to a natural, I'm gonna go out on a limb here and say a natural use case for AI, right?

Like, hey, mm-hmm. Look at my big, big data set and like, tell me like what commonality you see between this point, right?

Stephen: Yep.

John: And being able to like, get that back and have it go well, we are seeing like this and okay, now graph it, okay, now do this or that. Do that with it like that, that seems like a, a natural, a natural evolution.

So, that is super interesting. We will once you get that, once you get that up and running, we'll have to have you guys back to talk more about it. Thanks. Dan, anything on the roadmap that you're specifically excited about?

Dan: Yeah. It's kind of hard to get into, 'cause it requires getting way more in the weeds about the internals than people care about.

But I mean, these changes to the data store that we wanna make, I'm really interested in. Maintaining this data store functionality but getting the tables out of the Drupal database. There's a lot of challenges if you have bigger datasets or a lot of datasets. You know, you can end up with a Drupal database that's tens of gigabytes in size, even though it's only two megabytes of, you know, CMS config data and the rest is just these data tables.

So, I think for these sort of real-world applications... it works fine to, you know, sort of demo the software, to get stuff into the Drupal database. But we've been looking into both just having these kind of big data store tables in a separate database, maybe on the same infrastructure, but there's also other technologies that are much more suited to what we're doing with these tables, which is kind of letting people run whatever queries they want on them.

John: Yeah.

Dan: You know, these sort of analytic database engines that just blow MySQL outta the water with their performance and even their storage efficiency.

So I'd love to be able to point the data store at something like that.

John: That sounds really cool.

Dan: I mean, we're looking into that. Yeah.

Liz: Yeah. And I like, I would say too, for anyone who's listening to this, if you're interested in DKAN if you wanna contribute, if you have ideas for future features or improvements, just let us know.

Like we're super happy to chat about it at any time

John: And, you know, soon to be in the Drupal issue queue, right? You can drop an issue in the Drupal issue queue and get right involved there. That sounds awesome.

Dan: Well, I wanna be clear, just wanna clarify for now, we're just publishing releases to drupal.org.

We're not quite ready to move the issues. I think once the GitLab migration is complete and we can do issues in GitLab, it'll be easier to make the transition. But we're almost there. We're not just flipping the switch on the GitHub issues just yet. So do leave an issue in GitHub for now, but we will...

But that's chapter two of the move to drupal.org.

John: There you go.

Steve: We are on the list to get it converted to the GitLab issues when that continues rolling.

John: Awesome. Well, Liz and Dan, thank you for joining us. It was great, great chatting with both of you.

Dan: Yeah. Thanks for having us.

Yeah, thanks. It's been great.

Martin: Do you have questions or feedback? Reach out to Talking Drupal on the socials with the handle of Talking Drupal or by email with [email protected]. You can connect with our hosts and other listeners on Drupal Slack in the Talking Drupal channel.

John: Would you like to be a guest on Talking Drupal or our new show TD Cafe?

Click the guest request button in the [email protected].

Martin: You can promote your Drupal community event on Talking Drupal. Learn more at talkingdrupal.com/tdpromo.

John: Get the Talking Drupal newsletter to learn more about our guest hosts, show news, upcoming shows, and much more. Sign up for the newsletter at talkingdrupal.com/newsletter.

Martin: Thank you, patrons, for supporting Talking Drupal. Your support is greatly appreciated. You can learn more about becoming a patron at talkingdrupal.com and choose Become a patron.

John: All right, again, everyone, thanks for joining us. Liz, if folks wanted to get ahold of you to talk about DKAN or CivicActions or anything else, how could they go about doing that?

Liz: All right, so you can go to getdkan.org. There's a contact us form on that; that'll ping us in Slack. You can also reach out to me on LinkedIn. I'm pretty active on there. Send me a message, you know, you can add me as a contact as well, and I would be happy to connect.

John: Awesome. Dan, what about you?

Dan: Yeah. I'm dafeder on GitHub, on Drupal, on Bluesky. You can find me on LinkedIn. Yeah, get in touch. Oh, and also, we do have a DKAN channel on Drupal Slack, so I'd love to see people there.

John: Fabulous.

Liz: Oh yeah, for sure. Dan and I are both active on that channel, so you can ping either of us.

John: Good deal. And we will have links to some of the DKAN stuff in the show notes for our listeners who are interested. Steve, what about you?

Steve: I'm swirt on drupal.org and Drupal Slack, and Steve Wirt on LinkedIn.

John: Martin, thanks for filling in today. Where can folks find you if they want to chat?

Martin: Folks can find me as mandclu on a variety of Drupal and social channels, or occasionally they can find blog posts by me on mandclu.com.

John: And I'm John Picozzi. You can find me [email protected], on social media and drupal.org at johnpicozzi. And you can find out about EPAM [email protected].

Steve: Thanks all. If you've enjoyed listening, we've enjoyed talking.

John: Have a good one, everyone.