[00:00:00] Intro: Welcome to the Research Culture Uncovered podcast, where in every episode we explore what is research culture and what should it be. You'll hear thoughts and opinions from a range of contributors to help you change research culture into what you want it to be.

[00:00:26] Nick Sheppard: Hi, it's Nick, and for those who don't yet know me, I'm Open Research Advisor based in the library here at the University of Leeds. You're joining us in season three of the Research Culture Uncovered Podcast where we'll be speaking to colleagues from both the University of Leeds and from other universities and organizations about open research, what it is, how it's practiced in different disciplines, and how it relates to research culture.

If you haven't already, you can catch up with season one, which was an introduction to the podcast and to my co-hosts and season two with my colleague Tony Bromley, who was in conversation with a number of presenters from the REDS Conference of 2022. That's the Researcher Education and Development Scholarship International Conference held here in Leeds.

But now I'd like to introduce my guest for today, Hugh Shanahan, who is Professor of Open Science at Royal Holloway, University of London. Hugh has expertise in computational biology and statistics. He's co-chair of the CODATA-RDA Schools of Research Data Science, and Vice-Chair of the World Data System Scientific Committee.

So welcome to the podcast Hugh and thank you for taking the time to join us.

[00:01:26] Hugh Shanahan: Thank you very much, Nick, for having me along.

[00:01:29] Nick Sheppard: And before we get onto, um, CODATA and FAIR data, and you know, the definition of that acronym, um, I know FAIR data is a thing that you'll probably talk about, I must first ask you about Royal Holloway. Um, uh, so it happens to be, as I mentioned to you before, the, uh, university that I went to back in the nineties, and it's quite a spectacular campus, especially in the snow. Have you, have you had much snow in Egham?

[00:01:51] Hugh Shanahan: We had over the last few days quite, quite nice. I'm sorry I didn't take enough pictures of it, but yeah, no, it looks great.

[00:01:58] Nick Sheppard: But for those that don't know, it's, it's quite a spectacular building, isn't it? If you've not seen it, it's based on a French chateau, as I recall, Founders building?

[00:02:06] Hugh Shanahan: That's right. Although it being a Victorian building, they, they, instead of being built in white marble, it was built in red brick.

[00:02:12] Nick Sheppard: Red brick. Yeah. A red brick chateau. Um, and again, I was just telling you...in actual fact, the very first time I went on the web was on campus in the geology department. I actually studied English, but I had a friend in geology, uh, and we went and played on the web, um, in about 1994. So that would've been the Mosaic browser, I'm guessing. I don't know. I'm not sure. That was the first browser, wasn't it back then?

[00:02:33] Hugh Shanahan: Something like that. Yeah. The dim distance of time.

[00:02:37] Nick Sheppard: So, um, so as I say, uh, thanks for for joining us today and, uh, I suppose to start with just a, perhaps a little bit of your academic background. I mentioned, um, so you're a biologist?

[00:02:48] Hugh Shanahan: Um, uh, actually no. I'm, I am, I'm something of a cuckoo in some respects. Um, so back in the early nineties, um, I did my PhD in high energy physics, in particular a very, very sort of computational end of high energy physics, which was, kind of the, the, the, the starting point of thinking about all of this. So I did a number of postdocs, one in in Glasgow and then in Cambridge and then over in Japan.

Um, at that stage I kind of got tired of tramping around the world. Um, and I got, um, a fellowship working in bioinformatics. Uh, so that was working with Janet Thornton's team in, she was then based at UCL in Central London, and then we moved out to the European Bioinformatics Institute. Um, and I was there until, uh, 2005. So, uh, at that stage I started working in the computer science department. So, yet again, another discipline change. Um, so I was there from 2005, uh, although it was about, uh, 2014, 2015 when I, I kind of got more and more frustrated with trying to get to data and so on. And at that stage kind of started that journey into me thinking about, uh, moving into the open end of things, and I'd been kind of working on the, the CODATA-RDA schools. I'd been doing all of that. And then to the point where, uh, when I actually got my post as, as professor at at at Royal Holloway, I decided okay, it's time to, to sort of step out and, uh, I almost said come out of the closet. Um, uh, but yeah, I suppose it was a bit of a coming out moment when I said, Okay, I am, I'm labeling myself as Professor of Open Science.

[00:04:48] Nick Sheppard: Well, I was, I was gonna ask you about that. So how, how did you become a Professor of Open Science? Not being an academic myself, how's a professorship work? Is that you, you sort of, uh, well decide what, what your area is for professorship? Do, do you or...?

[00:05:02] Hugh Shanahan: Yeah. So I wish it was something which was, which was more exotic, uh, than that, but you know, if you go to the HR department at Royal Holloway, they'll put me down as professor. Uh uh, the Professor of Open Science, the open science bit, that's the thing that I put on my door basically. And I decided from day one, this is, this is what I, this is what I, this is what I want to do. Um, I'm not the only person to, to, to sort of do this. I mean, you know, looking around amongst my colleagues. Everybody does this. So, you know, you have professors of machine learning and we have professors of software, language engineering and so on. And I sort of simply said, yeah, okay, fine. If they can do that, I can do that as well. Which I think is, you know, there's that quiet lesson in academia, which is um, ask for forgiveness rather than asking for...ask for permission.

[00:05:57] Nick Sheppard: It's easier to ask forgiveness than ask permission, but you haven't had to ask for forgiveness I guess? You are now recognized as a, an expert in open science?

[00:06:05] Hugh Shanahan: So far so good. Nobody's, nobody's knocked on my door and, or, or, uh, you know, called me up from the higher offices and said, ah, um, Hugh, uh, yes, wanted to talk about this title you've been giving yourself? So I'll run with it until somebody says no.

[00:06:24] Nick Sheppard: And you're explicitly, obviously, talking about open science. So here at Leeds, and I think in the sector generally, we often talk about open research and I think that's, it's sort of an active decision to, to maybe be inclusive of the humanities. But obviously you're from a STEM background and you're coming at it are you from a particularly STEM focused perspective?

[00:06:41] Hugh Shanahan: Yeah, I think I, I think so. Although so, so number one, there's kind of, uh, the language difference. So, uh, you know, in British English there's, there's definitely distinction between science and research and the need to be more inclusive. Um, uh, other countries tend to be more relaxed about it, so, you know, if you're in Germany...

[00:07:06] Nick Sheppard: Yeah in Europe, it tends to be called open science, but you've worked on quite a lot of European projects, I think?

[00:07:12] Hugh Shanahan: Yeah, yeah, and, and likewise, I think in the US, you know, in North America it tends to be, still tends to be open science. But that...now, that said, uh, totally get the fact that there needs to be open research, and in fact, if you go and look at Royal Holloway's policies on this, the conversation is always about open research rather than, than open science. Um, because we think, yeah, a lot of the practices, they map over, although, at the same time, personally, I'm always trying to be careful because I don't want to end up, um, you know, telling...hey, you digital humanities folks, get a, get a grip, this is how you do your stuff, you know? And it's just like, no, no, no.

[00:07:58] Nick Sheppard: Yeah, well, I think that's an interesting point that's come up in some other discussions, you know, the fact that sometimes, um, colleagues in the humanities perhaps feel that it's being foisted on them from the STEM disciplines a little bit. And it, you know, you need to, maybe we need to define open research in different disciplines in different ways...

[00:08:14] Hugh Shanahan: absolutely.

[00:08:15] Nick Sheppard: But I suppose that's a good time to sort of ask, you know, what, what is open science? I mean, you've already mentioned sort of access to data, um, but, you know, could you give us a sort of, uh, elevator pitch for what open science actually is from your perspective?

[00:08:30] Hugh Shanahan: Yeah, so I think being an academic I should sort of pull out the, uh, 10 minute seminar description, yes, well, this was...it appeared here at this point, and then here are these...and here are the different interpretations and so on. I'm gonna be lazy Nick to be honest with you, and I'm gonna give you what, what I think of open science as, and I want to try and also keep it...I'm gonna go for as minimal a definition as possible, and then what I wanna do is just try and unpack that a bit, if that's okay?

[00:09:09] Nick Sheppard: Yeah, no problem.

[00:09:10] Hugh Shanahan: So, the way I would talk about open science, and I'll try and I'll talk about open science initially and then move on to, to maybe to, to open research, is, you know, in, in two sentences.

It's...open science is a set of practices to make scientific research more efficient and effective when we live in an era when the questions that we're thinking about have become more complex, more challenging, and also are data and computation driven.

Underlying a whole variety of different, sort of, aspects that sit with the the open moniker, in this, is that there's, that, there's this principle of being as transparent as possible during the process of scientific discovery. Now, what I would also do is to say that with respect to open research, you are extending that remit in terms of, in terms of saying, yeah, actually there are many areas of research which are, again, facing bigger issues and are very often being driven by large data and so on. And hence the process of being transparent in terms of what you're, you're doing is, is...it also holds there.

Now, I think it's, it's worth unpacking that a little bit now, trying to come up with something which is short and pithy and so on, but then you're gonna go, oh, what is all that? And I, what I wanted to do is just spend a little bit of time talking about the things that aren't included in that definition. Alright? And, uh, because. I, uh, there are, if you know, uh, during, you know, these interviews, I'm sure you're gonna talk to a variety of different people who say, well, what's open research? What's open science? And you'll get a different spectrum of opinions. Okay. And I'm, I'm not here to sort of say, well, I'm right because I, you know, because I put Professor of Open Science on my door. So I'd like to kind of just explain those things which I don't sort of mention in there.

So the first point is, is that in those two sentences I made no reference to community or collaboration and those, those are really, really sort of important ideas. Alright. Uh, uh, I would argue and say they kind of flow from the idea of, well, if you're gonna be transparent, if you're gonna be sharing stuff, actually by fiat, you know, you start working with people and start realizing, hey, hang on, we've gotta, we've gotta share standards and so on.

There's no reference to research integrity, even though, I think the more that time goes by, the more we kind of realize how grey the landscape is and how much we, and I mean, that's, that's everybody - that's the academics, that's research pro...you know those in professional services, that's the funders - uh, all need to have a good, long, hard look in the mirror of ourselves and say, we need to reform ourselves.

There's no reference to equality. Uh, uh, in terms of, uh, the fact that, you know, that's kind of an elephant in the room here is that I'm chatting away and I'm...then say, I put open science on my door because you know what? I say that, I say so, and I, so with a certain level of confidence and oh yeah, I'm a white middle class cis male. Uh, uh, so I can, I can afford to be a bit cheeky. And, you know, so there's no reference to those kinds of kinds of issues there, and there are a variety of people who really think hard about and say, no, no, no, hang on, this is super, super important.

There's no reference to expansion of roles in terms of sort of saying, no, no, no, it's not just the holy academic, it's the data steward, the research software engineer, um, the curator. All of those people who, who, who also sort of play roles.

[00:13:35] Nick Sheppard: I think that's, it's really interesting. Sorry to interrupt Hugh, but I'm just, uh, you know, from that perspective, I think what you're getting at there as well, cause, you know, um, at Leeds we've got, um, a lot of initiatives around what we're calling research culture.

[00:13:47] Hugh Shanahan: Yeah.

[00:13:49] Nick Sheppard: And all of these are part of that, aren't they? They are about community. We, we start to talk a lot more, certainly our university and elsewhere, collaboration over competition, um, research integrity as you said, and, and quality and the equality, diversity issues. You know, I was talking to a colleague recently for this podcast around so-called "Bropen" science. Have you heard that term?

[00:14:12] Hugh Shanahan: Oh no, but I think I can guess.

[00:14:14] Nick Sheppard: Yeah, the concept of "bro" I mean I'll link it on, on her podcast. You know, this idea that there's people already, you know, often white males in positions of power that can actually, um, bully, uh, younger ECRs, you know, and there's a gendered element to that and all that kind of stuff.

So, I suppose that's one of the aspects I'm most interested in, is how this sort of underpins developing research culture, um, which as you say, isn't captured in that sort of raw definition, I suppose, of open science.

[00:14:43] Hugh Shanahan: Yeah, yeah, exactly. And I, the things...I mean, you know, it's kind of from my background, I still do tend to go towards the, well, surely there's a high tech fix to all of this rather than saying, hang on, there are cultural issues that we need to address here.

I think the things I would argue in terms of saying...focusing on that, that sort of minimal definition is to say that if you're clear about what it is that you're trying to do, which is to say, look, we want to be more efficient and effective, and what that means is, we need to put aside this relentless obsession with papers and so on, and kind of understand that there are lots of different ways in which we need to be better at what we do, and we do that by making sure that we share all those ideas, whether that be code or data or, uh, pre-registered reports or protocols or, you know, this whole sort of gamut of things that they enter into this milieu, then that enables us to have that stop and hard look at the culture and say we need to change things. I mean, if we think about different organisations that, you know, in the commercial sector or in terms of, uh, um, things like the military and so on, there's this sort of clarity in terms of what it is that you want to achieve, then you start saying, well it's ridiculous that things should be run in this, in this way.

As I said, I think there would be many others who would push against that and say, uh, no, culture first. You get the culture fixed, and then that follows. And I would sort of say, I totally respect that. I totally get what's being said there. Uh, and you know, respect is being said there in not just in a trivial kind of way. It is absolutely a total sense of respect. I guess. It's my way of trying to figure things through.

[00:17:18] Nick Sheppard: Yeah. And then...and it's both, isn't it? I suppose you are almost talking like top down and bottom up, aren't we I guess, coming at the problem, from different directions? And do you think we are making progress? I mean, you were very good enough...you were very good to do a talk for us about 18 months ago now. I was just looking back. It was back in May, 2021. So, um, a little over 18 months, isn't it now, on open research.

I was looking at some of the things you pulled out there, you know, you talked about openness as a spectrum. You know, it's not all or nothing, I think there can be this sort of concern that, you know, if you're not doing everything, then you can't do anything at all. Um, but I suppose just, maybe a trite question, but have we made any progress in those 18 months, do you think? Are we moving in the right direction with open science and open research?

[00:18:11] Hugh Shanahan: Yeah, good question. I think it's, um, let me focus on the UK, though I think a lot of the same arguments will hold in other high income countries as well. So what I think we've seen is, uh, much more sort of institutional, well, not just institutional recognition, but institutionally, kind of, them moving on at that kind of level. So 18 months ago you had, you know, universities like say Reading, who were the only ones who would have an open science or open research policy, and now you're starting to see a kind of a flowering of different things and it's pretty open, which is, you know, like Royal Holloway has an open research policy. I was very lucky to be part of that with Scott Glover, who's based in psychology at Royal Holloway, one of the kind of key leads plus a number of other people. And that's, um, politely nicking stuff from places like Reading, and modifying it and so on.

And what you see now is the kind of senior management of universities saying, yeah, we get this. We need to do these things. And you see them providing support in terms of rolling things out and thinking. So I think what you've got is definitely an acceptance of the importance of open research, open science by university management. And that's more than simply, yes, this is quite a nice idea, but, you know, make sure you still get five Nature papers out tomorrow, sort of perspective. It's much more kind of saying, yeah, we need to do this and backing that up. And it feels like a lot of the effort now is sort of in the training side of things and raising awareness and also sort of saying, okay, how do we do this in a sustainable fashion?

Because we can't look at universities to sort of say...as you know, one of the things I'll argue is sort of say, yeah, infrastructure costs alright? And furthermore, it's not just a once off, it's a thing that keeps on going. So how do you organise this in a sustainable fashion? So what I think you're seeing a lot more of is more awareness raising and starting to do things in a more kind of concerted campaign in terms of training and so on.

So that you get to a point where all academics are, at least in some way, aware of what the thinking is behind this approach. Not necessarily saying, like what I was talking to initially, not to say to them, now you have to change everything because this is year zero. It's to sort of say, okay, here's a whole bunch of things that you could do. Pick one of them or could you pick one of them? Have a think about that, you know, is the stuff that's there.

[00:21:44] Nick Sheppard: No, as I say, I will post the link to that talk you did for us in the show notes for this. But I like your metaphor of a bridge, you know, you don't need a massive, you know, suspension bridge, uh, necessarily to get across a river, you know, it's starting, isn't it, and just doing something in the first instance and taking it from there?

[00:22:07] Hugh Shanahan: Exactly.

[00:22:08] Nick Sheppard: And in that context as well, I mean, you've mentioned infrastructure a few times and another concept that you talked a lot about at that talk was this concept of a first class research object.

So myself working in libraries, you know, we've tended to be perhaps a little bit obsessed with open access, if that's not overstating it, and the journal article as the, you know, the final research output? And I think perhaps we're trying to get away from that a little bit, would you say in terms of a first class research object?

[00:22:43] Hugh Shanahan: Yeah, I think so. So first of all, maybe to explain, I used this phrase "first class research object" I think in my talk, and for the life of me, I can't remember...I know I wasn't the person who thought of it but basically somebody else. For the life of me, I can't find it but I need to look harder.

So let's think about it again from a UK audience is to say, what do we mean by first class research object? It's a thing which you can present at the REF is, you know, that's the most kind of blunt definition. And that's of course the thing that all academics suddenly... their, you know, their antennae start, kind of, looping up. And if you do look at the REF rules, even the previous one in 2016, the people who are organizing the REF said, yeah, it doesn't have to be a paper, what that submission is. Now the problem was that, you know, the relevant committees...everybody kind of second guessed the committees, and sort of said, well, no, they won't be interested in data sets or bits of software or anything like that, so there was a great deal of sort of self censoring. So it was like, in the 2016 REF, I think there was like a handful of non-paper based submissions to the kind of things that are there for the research.

I'd be interested, I haven't tried to look at what happened with the REF in whatever, 2021 to see if there was any improvement on the situation. But still, overall, you know, we have this, I would call obsession with papers is what you need to do to make your career, and of course papers is broad, that's monographs, books, you know, written word and you know, conference proceedings, all of that jazz.

And, one of the things that, that I'm arguing is to say that we should think of data sets, the software that we write, even things like lab protocols and so on, all of those things should be things that we value because they're stuff that other people can go and run with and actually and go and make use of their research. And when you think about that in that way, that's I think an incredibly empowering thing because, you know, what usually happens is, let's be honest, what we have in research groups, is that there are people who are good at writing, and then there are other people who do all the other bits and pieces. And there's always that thing about, well, you know, this person is...x is really good, they just, they're not, you know, they don't write the papers, so how do we, we can't ignore them, but they're not one of us, you know, and if you start saying to them...Let me pick one example: person X writes pieces of software or workflows or whatever that really work really well. Now, if you, if you think of what they do as a research object, a first class research object, then they get the credit. Other people are using their stuff directly rather than trying to figure stuff out from the paper and so on, and people get on with doing things much, much more quickly.

[00:26:23] Nick Sheppard: And is there an infrastructure aspect to that? Because perhaps...and again, I dunno if this is true or not, up to what extent it's true, but a lot of infrastructure, institutional infrastructure's focused on that paper. You know, things like repositories and systems like Pure that I know you use at Royal Holloway or Symplectic for us here at Leeds. They tend to focu...and the broader infrastructure perhaps focuses on papers?

[00:26:52] Hugh Shanahan: Yeah. Yeah. Yeah. So I think the steps are being made in terms of...we now have, the thing that's really important that's happened over the last, say, 10 years is that persistent identifiers are now a thing.

[00:27:16] Nick Sheppard: That's things like a DOI...

[00:27:18] Hugh Shanahan: DOI, and ORCID and so on, and correspondingly. So, and again, you know, you get DOIs for data sets. You get DOIs, you know, you submit something onto, to Zenodo or Figshare or Data Dryad, you get a DOI for your data set. You get a DOI for your software...there's the Software Heritage archive. All of these things you now have, so they're that first layer of sort of atoms that, you know, things that people can call and gradually we start seeing databases building up. Which are initially, right now fairly generic, but then gradually will become more domain specific so that people can answer this.

I mean, I think one of the things that's kind of one of the projects I've been interested in is this thing called computational notebooks. So examples of this are things like Jupyter Notebooks and R notebooks and so on, which kind of put your code and your text and visualization all in one place. And, at some level, they're a fantastic tool for doing analysis of your data and in some respect they really are like the inheritors, for some disciplines at least, they really are the inheritors of the thing that papers should be doing next, all right. Because it's very, very interactive, you can play around with it as you see fit, but it's only now that we're starting to get our heads around publishing notebooks. And there are, so for example, there's, there's something called NeuroLibre which is doing that. And publishers are trying, and there's a bunch of other efforts that are there to try and make that happen, but it's still not quite there. And I think one of the things that's really, really key there is the search element. So, and that's because searches...you know, say, Google Scholar, everybody uses Google Scholar, but it's a service that could disappear tomorrow, so you know...

[00:29:29] Nick Sheppard: Well, yeah, I mean perhaps the open source, you know, that's another podcast in itself, isn't it I think, discussing the issues around open source versus commercial software. I mean, I think we've seen that a little bit with Twitter recently, you know, the fact that a tool that's used by a lot of scientists and academics, is then suddenly taken over by, you know, a billionaire in this case, and he can run it according to his whim. But that, you know, it is instructive, isn't it, to think about software in that sort of context?

[00:30:00] Hugh Shanahan: Yeah. Yeah. And how much our services are really, really dependent, you know, I mean, Google Scholar I think is a very extreme example of something that, you know, it's not anywhere part of Google's mission in some respect, it could be pulled tomorrow and then it's like...and of course we know, I mean there's Scopus, there's WoS and so on, but it's kind of, that's the one which is you take that away and then I think lots of academics are gonna go, everybody's gonna go, oh yeah, we had these resources that we were using previously, and they'll have to kind of figure out how to use them again.

[00:30:39] Nick Sheppard: And I think perhaps the sector is thinking a bit about infrastructure at the moment. I mean, you'll be aware of Octopus, which is a platform that's intended to sort of pull things together in an open way, and I've spoken to Alex Freeman who created Octopus on this podcast. So I dunno if you've got any thoughts on that?

[00:31:01] Hugh Shanahan: So Nick, I have to, to, to make a full confession. I mean, I've heard of Octopus, but I haven't had a chance to have a play around with it. So I won't particularly comment on Octopus myself.

I'm delighted to see that that services like that are coming into place. Um, I am very disappointed that at the same time Jisc have decided to withdraw support for CORE, which, we have to get to a point where it can't be just like on the other hand, taketh away, you know...

[00:31:43] Nick Sheppard: Just for people that may not be aware what CORE is, cos again, it's kind of behind the scenes, isn't it, a bit really? So people might not be aware of it. Can you tell us what it is?

[00:31:52] Hugh Shanahan: So it's a repository, well, not so much a repository, but a service in terms of listing sort of open access publications and so on, that's there. So it's, as you said, it's a thing which kind of sits in the background. It's not...the plug hasn't been pulled on it entirely, but the funding's been withdrawn and now they have to, they, those people have to now figure out how to get support for this. So all the time, there's always...I think this will be like a motif with all of these interviews in terms of infrastructure costs.

Never ever trust anybody who says, oh, don't worry, it's really important and people somehow, some way it'll be paid for. X will be paid for. You have to kind of say, this is like having, this is like roads and buildings, you know, you gotta build them and you gotta keep them updated and every day you gotta, you know, and that's the quid pro quo of this...

[00:32:54] Nick Sheppard: Is it almost....I mean I don't wanna go too far off piste, but almost a fundamental issue with the internet? You know, thinking back to when I tried the internet for the first time, back in 1994, you know, things have changed since then, haven't they, I mean, it's this commercial ownership and the, you know, the big data companies, whether it's Facebook or Apple or Google, you know. This is all part of an issue across the internet I think?

[00:33:19] Hugh Shanahan: Yeah. I think, yeah. I mean, I want to stress and say that I don't have a problem with commercial organisations running services, as long as there isn't a point where you can't change your mind and say, I'd like to use this sort of service from another. I think that's the important element. I mean, again, if we use the internet sort of idea, is that again, in the background, there's a colossal amount of people and organisations who were involved in, you know, laying down the optical fibre and the routers and, you know, the stuff that's there. And they do a really good job. And it's just that you're not like dependent on one organisation that do that. Huge numbers of different organizations who are there, who are doing all of that, you know, providing the backbones, the tier one, the tier two, and the tier three, all that, you know, the sort of level of connectivity and so on. So, as long as we're not dependent that we can always say, Thank you, but you know these other people can do just as good a job at 80% of your cost, thanks, we'll use them now. That's the thing that keeps everybody honest.

[00:34:51] Nick Sheppard: Well with your admission that you haven't yet tried Octopus, perhaps you can listen to, um, Alex Freeman's podcast, and perhaps hope that she doesn't listen to yours.

[00:35:01] Hugh Shanahan: I'm not, I'm not having a go at at Octopus.

[00:35:03] Nick Sheppard: No, no. I know.

[00:35:04] Hugh Shanahan: I'm the next level, next level up I'm afraid.

[00:35:06] Nick Sheppard: No, no, completely. But just, yeah, in terms of, as I said, I did talk to Alex and, you know, I think it's an interesting model and it will be interesting to see how that evolves.

So I'm just a little conscious of time, not least cos I'll have to transcribe this podcast as I said to you, but you touched on it a little bit, but, you know, talked about infrastructure and the fact that policies are developing in universities, et cetera. And I think earlier on you suggested that training's the challenge now, or is that particular challenge actually training people? I mean, already you've mentioned Jupyter notebooks, you know, uh, Figshare, you know, Zenodo, all these different services, pre-registration, registered reports, you know, there's data sharing and software and, you know, there's so many skills. So, I mean, two aspects to that, I suppose for me, just to ask you about. There's training for them and also recognition. I mean, people are busy, aren't they? Researchers are, are busy.

[00:36:02] Hugh Shanahan: Sure, sure.

[00:36:02] Nick Sheppard: How, you know, why should they do this stuff if they're not getting sort of rewarded for it?

[00:36:09] Hugh Shanahan: Yeah, good question, Nick. I mean, I think there is a...there is obviously a chicken and egg type of scenario, which is, unless people start developing expertise in that background... in a particular area, then they won't sort of evolve to the point where they get recognition for things.

The thing I would say is that we need to be fairly focused in terms of what it is, the type of training that we provide. So if you're talking to say somebody who's, say, a PI who's got a research group. Okay? The first thing you're doing is you're saying to them, um, you know what, you don't actually need to know all the ins and outs of what's happening here.

You need to kind of understand, probably have a sort of fairly broad understanding in terms of saying, okay, keeping an eye on your data and so on. And if you are, if your area means that you're developing software, then yeah, actually your team should be doing things in a certain sort of way, but you don't necessarily have to have everything at your fingertips.

I think the other...you know, if you're a PhD student, if you're an early career researcher, then again, it should be about, okay, what is it that's useful for you? Uh, and you know, I mean, if we think particularly about PhD students, we also should be aware that, you know, every PhD student now...we also need to, kind of, be giving them the skills so that actually if they want to go, you know, that if they decide not to go into a research career, that they can think about other options in their lives, and to say to them, hey, hang on, there's some practices here which are...which might be of use to them. And again, it's not saying to them, oh yeah, you need to know everything. You know, you kind of, sort of say to them, well, you know, actually things like annotation of data, that might be something that's useful to you for the things that you do. Or it could be developing software or it could be thinking about documenting lab protocols and so on.

Um, so I think it's also about, before we say to people, do this, and you will eventually at some unspecified point, you will get some, you know, you'll be rewarded for this, is more to take them...you know, the first win is to say, if you do this, this is actually something that's just gonna make what you do more efficient, first of all. Okay. So in terms of, you know, talking to a PI to say to them, hey, you know what, um, you remember that thing where, you know, a postdoc would go or a PhD student would go, and then, the next person along would spend like six months figuring out what the hell the last person did with the data that they generated or so on.

You can say, you know what, if you do things, you know, in this way. Think about that. You can bridge that gap and you can get them much closer to making that transition much easier for you. If you can say to, you know, a graduate student, um, listen, you know what, you kind of understand how to do stuff in Excel, but if you do things in R, yeah, I know it takes a few afternoons to get your head around R, and stuff like that. And it's, ooh, scary, it's programming. It, it means that you can process a thousand files in one go rather than one file, spending a thousand days doing the thing with an Excel spreadsheet. And you can do it in a reliable, reproducible way, then those are the wins.

So again, it's always...still kind of what I talked about 18 months ago in terms of small wins, uh, the efficiency gains, I think are the things which are important, which is, you know, which is probably the Achilles heel in some respect of open research policies, when you kinda say, oh, here's the big picture, when you want to kind of say one good thing for you now, you know, and trust me, this will help you a lot.
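
Illustration (added for readers, not spoken in the episode): the "thousand files in one go" point can be sketched in a few lines of code. Hugh mentions R; this sketch uses Python instead, and everything in it is an assumption made for illustration: the folder layout, the column name `reading`, and the helper names `column_mean` and `summarise_folder` are hypothetical. It is a minimal sketch of applying one calculation to every CSV file in a folder, not a recommended pipeline.

```python
import csv
import statistics
import tempfile
from pathlib import Path
from typing import Dict


def column_mean(csv_path: Path, column: str) -> float:
    """Return the mean of one numeric column in a single CSV file."""
    with csv_path.open(newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return statistics.mean(values)


def summarise_folder(folder: Path, column: str) -> Dict[str, float]:
    """Apply the same calculation to every CSV file in a folder."""
    return {
        path.name: column_mean(path, column)
        for path in sorted(folder.glob("*.csv"))
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Three tiny stand-in files; a real folder could hold a thousand.
        for index in range(3):
            (folder / f"run_{index}.csv").write_text(
                f"reading\n{index}\n{index + 2}\n"
            )
        for name, mean in summarise_folder(folder, "reading").items():
            print(name, mean)
```

Because the same function runs on every file, rerunning the script on the same folder repeats exactly the same steps, which is the reliability and reproducibility gain described above.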

[00:41:06] Nick Sheppard: Yeah, that's a good way of thinking of it. And then as, as I say, I will let you go in a moment, but just, aware that we haven't yet defined the acronym FAIR. Perhaps you can give us a quick overview of what FAIR means? And for my personal benefit, perhaps focus a little bit on the I and the R? I'm okay with the F and the A, but I struggle a bit with the I and the R. Could you tell us what FAIR means?

[00:41:27] Hugh Shanahan: Sure thing. So, uh, findable, it's up there. FAIR is, it's an acronym. It's shorthand for, I think it's 16 principles. I, I don't actually have them tattooed across my chest.

[00:41:38] Nick Sheppard: Not yet.

[00:41:40] Hugh Shanahan: Not yet. But the letters stand for findable, accessible, interoperable, reusable. And what it is, is it's in essence it's about doing two things, which is a) trying to encapsulate the kind of hard-won principles that came out from sort of sharing data, you know, and research data management, sort of saying, these are the things that you should be doing.

And the second point, which I think hopefully addresses, at least the 'I' one, is taking that next step, which is to say, can we make this sort of machines talking to machines? I'm a little bit cautious about that because as I mentioned previously, you can get very carried away with the machine to machine everything. Everything works perfectly, and that's a big ask. Uh, and I'd rather have stuff which is, which is again, something small that works now, almost fair, all in little letters rather than in big capital letters.

[00:42:48] Nick Sheppard: Well, I think we struggle perhaps cos we look after a, an institutional data repository, and our data is so heterogeneous, I think it's really difficult to look at the interoperable aspect of it in that sort of machine to machine...

[00:43:02] Hugh Shanahan: So, so you asked me to kind of maybe dive a little bit more into that. So, uh, in, uh, so the findable aspect is to say, well, can I actually get onto my laptop and find this data set wherever it sits, I go, aha, here's a DOI for this.

Uh, accessible means that, oh, I've got the DOI. And you know what, when I click on the DOI it takes me to the landing page where I can go click and I can download the data set. And this is all kind of a short, kind of cartoon-like figure.

Interoperable, the 'I' says I've downloaded that, and now it's...that data is in a format that I recognise and know how I can read into my computer. And then the 'R' is reusable, which is to say there aren't any licenses associated with the data set, which say, oh, you know, yes, you're free to download, but no, you can't really publish anything about, you know, that it gives you that freedom.

Now those principles do things...they're a lot more subtle than that. So they, you know, for example, they sort of say things like, well, data might not be there for reasons, perfectly reasonable ones, but metadata, you know, the data about the data should always be there and so on.
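
Illustration (added for readers, not spoken in the episode): the cartoon version of FAIR described above can be written down as a four-question check. This is a rough sketch only, and it does not implement the actual FAIR principles, which, as noted, are more subtle. The record fields, the list of recognised formats, the list of licences treated as permitting reuse, and the names `DatasetRecord` and `cartoon_fair_check` are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed, illustrative lists; a real project would take these
# from the guidance for its own domain.
KNOWN_FORMATS = {"csv", "json", "netcdf"}
REUSE_FRIENDLY_LICENCES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class DatasetRecord:
    """Hypothetical description of one deposited data set."""
    doi: Optional[str] = None
    download_url: Optional[str] = None
    file_format: Optional[str] = None
    licence: Optional[str] = None


def cartoon_fair_check(record: DatasetRecord) -> Dict[str, bool]:
    """Answer the four cartoon questions for one record."""
    return {
        # Findable: is there a DOI I can look up?
        "findable": bool(record.doi),
        # Accessible: does the landing page lead to a download?
        "accessible": bool(record.download_url),
        # Interoperable: is it a format I recognise and can read?
        "interoperable": (record.file_format or "").lower() in KNOWN_FORMATS,
        # Reusable: does the licence leave me free to reuse it?
        "reusable": record.licence in REUSE_FRIENDLY_LICENCES,
    }


if __name__ == "__main__":
    # Placeholder values, not a real data set.
    example = DatasetRecord(
        doi="10.0000/placeholder",
        download_url="https://example.org/dataset.csv",
        file_format="CSV",
        licence=None,
    )
    for letter, passed in cartoon_fair_check(example).items():
        print(letter, passed)
```

A check like this is closer to "fair in little letters": something small that works now, rather than full machine-to-machine interoperability.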

So, um, I think we've kind of reached a point where FAIR is something that's accepted. We've stopped having kind of cosmic discussions about what does FAIR mean and much more people sort of pulling up their...you know, sort of saying, okay, we have an idea of what's FAIR. I think what's kind of interesting is that the next phase is much more about use cases to sort of say, yeah, this is what works in a particular domain.

So something to keep an eye out for are things called FAIR implementation profiles, which are effectively use cases of how FAIR gets implemented for a particular type...for a domain piece of data. And I do recommend, uh, so there's a new collaboration, which I'm not part of, so I'm not...I don't have shares in it, but it's something called WorldFAIR, which is doing a lot of work in that particular area.

So if you're kind of struggling for saying, oh, okay, how do I make my data FAIR? You can look up these profiles and sort of say, oh yeah, I'm working with geophysics data of some kind. And I can say, oh yeah, this is an example of how this was done. And now that kind of gives you, in some respect, a recipe to work at it from there and so on. So, I hope that answers...

[00:46:02] Nick Sheppard: Yeah, yeah, it does. I mean, thank you. And as I say, you know, I do kind of know what FAIR means, but I suppose actually putting it into practice, especially when, as I say, we've got a heterogeneous collection of data in a local repository and then trying to point people...you know, research colleagues at maybe, you know, domain specific repositories and standards. And a lot of it comes down to, you know, as I say, these disciplinary differences, which is quite a challenge for someone in a role like mine, I think.

[00:46:29] Hugh Shanahan: Exactly. So, as I said, I think the FAIR implementation profiles, I think, you know, you could see something on the horizon where there's a whole library of them that you can kind of pull out and go, ah, this data set, this type of data set, this is what I could do. That I think is an interesting sort of thing. It's a lot of work, but it could be something quite important.

[00:46:51] Nick Sheppard: Yeah. No, I'll have a, I'll have a look at those. Thank you. So I will let you go. I mean, you did warn me at the outset that, you know, you might talk a lot, which is great. That's fine. Thank you very much. I suppose just a final sort of parting shot, you know, the question that we've touched on already, but, you know, what, what is next for the sector, do you think, in open science? I mean, what should be our priorities whether locally our institutions, you know, nationally in terms of the UK or internationally? I mean, very quickly off the top of your head, what, you know, what should be the priorities?

[00:47:21] Hugh Shanahan: So I think the challenge now is that...so institutionally things are starting to get their act together. We also have domain specific work, which is there. And what we need to do is to kind of get our heads around that matrix so that we have people who are working...and, and by the way I'm doing lots of hand motions right now, which will mean absolutely nothing to those who listen to this podcast. But in some respect, we have the people, you know, the data stewards, the research software engineers, who do things at an institutional level, who are doing things quite generically and thinking about things that way, but then also, for particular disciplines, you have corresponding people who have much more of a domain-specific area who will work across institutions. So, you know, so that things like N8 or, you know, and so on, could sort of say, oh yeah, there's, you know, there's the data stewards associated with archeological data, say, and they work, you know, in the UK, you know, there are a handful of people who work in the UK or indeed across Europe.

[00:48:38] Nick Sheppard: So we know where to signpost people for the expertise in a specific discipline.

[00:48:42] Hugh Shanahan: Exactly. Yeah. So that I think is the sort of the...we need to kind of get our heads around how to work at those sort of two levels. And I think it's sort of like, almost like that matrix type of model in terms of training and also in terms of services and infrastructure.

[00:49:02] Nick Sheppard: Yeah. No, brilliant. That's great. Thanks very much Hugh, as I say, I'll draw to a close there, but thanks very much for your time and perhaps we'll have you back on in another 18 months, see if we've made any more progress, if you can spare the time, and if there's still any snow, perhaps take a few photos of Founders for old times' sake.

[00:49:21] Hugh Shanahan: Absolutely.

[00:49:21] Nick Sheppard: Thank you very much and I'll see you.

[00:49:23] Hugh Shanahan: Thank you, Nick. Cheers.

[00:49:28] Outro: Thanks for listening to the Research Culture Uncovered podcast. Please subscribe so you never miss out on our brand new episodes. And if you are enjoying the discussions, give us some love by dropping a five star rating and written review as it helps other research culturists find us and please share with a friend and show them how to subscribe.

Email us at academicdev@leeds.ac.uk. Thanks for listening, and here's to you and your research culture.