Safeguarding Data

AGU seeks your input on a new draft of a position statement addressing safeguarding data to address global challenges for our future. Your feedback will be considered in revisions of this statement. Submit your comments by 30 April.

Safeguarding data to address global challenges for our future

Achieving accessibility, transparency and reproducibility in research

It is urgent for humanity to address serious and complex global challenges including sustainability, ecology, biodiversity, water and food security, and climate. Trustworthy and reproducible science requires that data - including their provenance - behind an assertion be accessible to evaluate and build upon. International collaboration and the open sharing of data are essential for addressing these challenges and promoting new scientific advances. While great strides have already been made regarding data sharing platforms and data and metadata standards, work remains to facilitate careful collection, use, stewardship and rewarding of data across the Earth and space sciences.

This statement takes the broadest interpretation of the term “data” according to the Beijing Declaration on Research Data: data can be collected, generated or compiled by humans or machines and include (but not limited to) metadata, samples, methods, software and algorithms. Infrastructure includes hardware, intangible assets such as software, and human capital.

Empowering scholarship

To achieve an ecosystem where data are consistently collected, well described, shared, preserved and reused there needs to be significant continuous evolution in (i) equitable access to data and infrastructure, (ii) a change in the research culture and the scientific rewarding system, and (iii) standards around describing and documenting data for multidisciplinary research.

Ensuring equitable access to trusted and reusable data

Researchers have a responsibility to collect, document and share data in an ethical manner that is as open and transparent as possible. Persistently funded open research infrastructure must facilitate ease of deposit and long-term reuse, both to manage data through the research process and to ensure the data are FAIR - Findable, Accessible, Interoperable, and Reusable - for people and machines. Given the challenges associated with data related to national security, intellectual property concerns and cultural sensitivities, this statement takes the position of ‘as open as possible, as closed as necessary’, in alignment with the UNESCO Recommendation on Open Science. Future data ecosystems could implement federated infrastructure to ensure equitable access without compromising national security, privacy, and other concerns. Acknowledging the context and provenance of data is highly important and should be done with respect to nature and people (according to the CARE Principles for Indigenous Data Governance), and to build the foundation for trustworthy Artificial Intelligence (AI).

Changing research culture to prize data contributions

Researcher behaviour is strongly influenced by the cultures of institutions, publishers, and funders. Thus, it is critical that these entities recognize and reward contributions by people serving different roles across the life cycle of data. Accordingly, robust data stewardship requires cooperation between individual researchers, scientific facility teams, disciplinary communities, and repository personnel. In addition, researcher training should incorporate a curriculum on best practices for data stewardship and disciplinary communities. Domain repositories should prioritise and reward efforts to develop, share, and adopt these best practices. Dedicated data managers/curators are integral to building synergies among these many players and need to be appropriately funded and recognized. 

Implementing metadata standards across disciplines

In our interdisciplinary and diverse research community, data allow researchers from different domains to communicate concretely about specific measurements and analysis. However, which salient features of data are captured, and how they are described, often differ by domain, creating challenges to collaboration and data reuse. Additionally, describing and accounting for the specific contexts (e.g. terminologies, uncertainty, sampling biases) in data is especially important for responsible use of automated data-driven analysis and artificial intelligence. It is thus paramount that our community continuously improve data documentation and adopt standards for transparent, machine-readable and understandable metadata and data properties.

Taking collective responsibility

Robust stewardship, preservation and sharing of data requires individual actions by all players in the global research ecosystem: researchers, institutions, funders, publishers, and research infrastructure stewards. Infrastructure supporting the complete data life cycle should be globally distributed across public, private, commercial and not for profit entities. Limiting infrastructure within national boundaries restricts the development of science addressing global challenges. All entities forming this ecosystem have a responsibility to ensure their infrastructure not only facilitates equitable and sustained ease of use for both data depositors and users, but is also interoperable and focused on collective reuse of data.

In summary:

  • Researchers should engage with research infrastructure and repositories to ensure their data are FAIR and data are developed/used responsibly
  • Research infrastructure can be key partners for researchers in making their data FAIR and should ensure metadata capture is robust, standardized, and comprehensive (including widespread usage of persistent identifiers for researchers, samples, datasets, and software, for instance)
  • Publishers and Journals should require data (as defined broadly in the beginning of this statement) be archived in federated and recognized (domain) repositories and be available on the publication of the work while mitigating concerns of privacy, national security and data sovereignty
  • Institutions should recognise, reward and incentivize the work associated with responsible data sharing and stewardship
  • Funders should recognise data as a primary scholarly output and include the requirement for responsible and FAIR data sharing in all funding agreements, with consequences for non-compliance
Position Statement Guidelines
Before submitting feedback, please review our guidelines for writing comments on position statement drafts.

Join the Conversation

Please enter constructive feedback on this position statement. Your comments will be reviewed and added to the public comment section below.

Public Comments
30 April 2024
First, I really appreciate this statement, and agree with most of the content. I think there is one area/theme that should be emphasized a bit more. Another large part of "achieving accessibility, transparency and reproducibility in research" is investment in improving infrastructure that makes submitting high-quality data and recording provenance easier, automating as much as possible, and creating valuable data integration/visualization products. And this should be highlighted more throughout the statement. For example, we can provide persistent identifiers for samples and datasets, but there are currently no tools that provide automated cross-linking and exchanging relevant metadata when multidisciplinary samples are sent to multiple data systems. We need to build automated connections for anything using a PID, such that when a PID is referenced we automatically update provenance records and cross-link where relevant. This statement does not need that level of detail, but something to the effect of investing in improving data infrastructure and connections across systems in a way that incentivizes using (meta)data standards. Other examples would be to create tools for data integration, integrated data products, visualizations, etc. for similar data types that may be in high demand.

As written, this document emphasizes researchers using data standards more than infrastructure providers creating high-value systems and data products that incentivize use. And as stated this requires yet additional funding and investment in data infrastructure itself (in addition to dedicated data managers/curators). Many data repositories, for example, do not currently have sufficient resources to create and maintain tools that make data curation and integration within and across data systems easier for researchers.
This is an excellent and complete statement. I particularly appreciate the bullets targeted at institutions and funders, since it has long been the case, even with recent progress, that a career in data issues has lower chances of being a successful one, particularly in academia.
As a general comment, there are some aspects that I particularly liked from the current position statement on data that might be missing from this statement:

* data management planning: there is no reference in this statement
* data as a world heritage was for me a very strong statement that highlighted the uniqueness of Earth and Space sciences.
* Most of the statements are quite specific, with concrete examples that focus on solutions. For instance, when speaking about recognition, there was an explanation of how recognition could be achieved:

"Recognition can take many forms but is most often seen in the scientific community in the form of credit and attribution through citation. All elements of the science ecosystem are eligible for citation, including data processing, creating curated products, and code development, as well as the creation of the research artefacts themselves. Citing data sets and other research artefacts in a precise and persistent manner increases the use and sharing of data, publication, and other recognized impacts of scientific research."

I feel that adding some context as per the previous statement would make the statement stronger and more easily understandable.

Specific to the word "Safeguarding" in the title - I am not sure we are talking only about safeguarding data here, but rather about initiating a data culture change to address global challenges for the future.

Specific to the section header "Empowering Scholarship" - I am not sure we need a title section here. It depends on what we want to highlight in the statement though.

Specific to the phrase "trustworthy Artificial Intelligence" - I totally agree with Anca that a reference could be added here. It could potentially also be the report Shelley wrote: Stall, Shelley, et al. "Ethical and responsible use of AI/ML in the earth, space, and environmental sciences." Authorea Preprints (2023).

Specific to the header "Implementing metadata standards across disciplines" - I would emphasize data and metadata standards here. Although we indeed need to develop the metadata standards, the data themselves have their own challenges, and with the development of AI/ML there might be a change in how we want to store (formats) or share data (protocols).
Specific to the phrase "trustworthy Artificial Intelligence (AI)" - I suggest adding a reference here. In the EU we have this: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai

Specific to the term "provenance" - consider using "lineage" instead of "provenance" to be inclusive of the transformations applied on the data.
These comments may also be found at:
https://docs.google.com/document/d/1vsdgiadkLLaW2DXr9QuU-aGlaPYDmqiuSK-GC8u1t3o/edit?usp=sharing

First, the Informatics Section extends its sincere appreciation and thanks to the writing team for their diligent efforts and intentions in drafting the Position Statement on Safeguarding Data. This topic is critical to ensure that geoscience research can thrive, catalyzing solutions to our most pressing societal and environmental challenges. The comments offered below are intended to provide additional perspective and seek clarity in areas to improve the final product. Thank you for your work and the opportunity to comment. Danie Kinkade, Informatics Section President.

Comments on overall statement:
Martina: I miss the role of repositories and the sustainability aspect for infrastructure but also data/software.

Alexey: Data sharing is a necessary precondition for the overall objective of increasing accessibility, etc., but it is far from sufficient. “Just” making the data available, even with good metadata, etc., does not make it technically possible to do anything with those data (e.g., if the data are really large, or in a complicated format). And, even if the data are technically easy to use (e.g., analysis-in-place environments; interactive data portals; analysis-ready formats), that is usually not sufficient to make the data accessible to a non-expert audience. Some examples: (1) a dataset of 500 mb geopotential height anomalies will be mostly useless to people without meteorology training no matter how good the format or data portal; (2) “surface temperature” means subtly but significantly different things in the context of a weather station, a satellite retrieval, and a model estimate; (3) daily output from a reanalysis or weather forecast has very different utility from daily estimates from a multidecadal climate simulation. In addition to the policy and culture components of requiring data sharing and the technical components of how data are shared and distributed, we need to devote serious attention to the “information management” side of stewardship. In the same way that good librarians do much more than store and organize books (they help visitors find books best suited to addressing their specific questions or general interests), this document needs to address the critical role of “library science” (e.g., how data are cataloged; semantics and ontologies of datasets and variables) in data stewardship.

Danie: The body of the full statement asserts that research data infrastructure must be persistently funded to enable long-term provision of FAIR data put forth in the statement; however, the Funder and Institution Summaries lack reinforcement of this assertion. Who, if not funders and institutions, will support the sustainable infrastructure necessary to steward the data which they are being charged to require the sharing of? More broadly, if the infrastructure needs and costs are not addressed, the safeguarding of data will be a difficult goal to achieve.

Comments on specific sections or lines of the statement:
Martina: Section “Taking collective responsibility”: Add repositories with their role or alternatively include them explicitly in the “research infrastructure”. Section “Ensuring equitable access to trusted and reusable data”: mention importance of sustainability and the TRUST principles, which capture this long-term aspect.

Alexey: [24-40, “Ensuring equitable access…”] This section should mention the need for technical investment in research and development of data architectures. Although policy / culture challenges are arguably the most important barrier to data sharing, there are some significant technical challenges that should be addressed. Some examples include: (1) Compression, including intelligent lossy compression, of large datasets; (2) provenance tracking of datasets, especially for datasets derived from other datasets, and in situations where multiple copies of data are stored in different repositories; (3) incompatibilities in analysis libraries and techniques between different data models (e.g., multidimensional gridded data a la NetCDF vs. raster data a la GeoTIFF); (4) differences in latency and performance between local file systems and network-based object stores like S3; (5) guidance and best practices for “hybrid compute” that takes full advantage of both local compute resources like institutional HPC and analysis-in-place capabilities of cloud computing environments.

[41-52; “Changing research culture…”] This section needs to more explicitly address current incentive structures that (are perceived to) disincentivize data sharing. A few suggestions: Explicitly state that existing academic incentive structures discourage (or, at least, are perceived to discourage) proactive data sharing — e.g., concerns about “getting scooped”; pressures to publish and compete for grants; well-funded groups with technical resources and deep benches of expertise can cheaply publish high-impact meta-analyses, but the data behind these studies is expensive to collect and often collected by early career researchers and non-academic staff (who, at most, get a lower-impact journal citation). Explicitly state that these disincentives for data sharing need to be recognized and addressed. In some cases, the perceived disincentive is not real — e.g., some research shows that studies that publish their data and code get more citations, so there is an incentive to do open science within the existing incentive structure. In some cases, we may not know whether the disincentive is real or not — e.g., we need to do formal studies on how often people actually “get scooped” and how that relates to open science practices. Finally, in some cases, the disincentive is real and requires meaningful changes to incentive structures — e.g., dedicated funding specifically for data production or curation.

Mark:
Section on Ensuring equitable access to trusted and reusable data:
It is good to acknowledge that data should be equitably and ethically accessible, but it is important to note that the CARE principles are necessary but not sufficient. CARE was developed from an Indigenous rights perspective. It should be explicitly noted that there are other marginalized groups who have limited access to data and can be harmed through general openness. This can be issues of environmental justice or the fact that openness often privileges tier one researchers and institutions and discounts those with disabilities or lack of access to relevant infrastructure (e.g., cloud computing).

The short clause on building “the foundation for trustworthy Artificial Intelligence (AI)” is insufficient. More needs to be said about the need for data and model transparency and documentation, reproducibility, and mitigation of risk and bias. Referencing AGU’s guidelines on Ethical and Responsible Use of AI/ML in the Earth, Space, and Environmental Sciences could probably meet the need.

Section on Taking collective responsibility:
The role of data stewards (broadly conceived) should be explicitly included.

Danie: Section “Changing research culture to prize data contributions”: This paragraph opens with the idea that research behavior is influenced by actions of funders and institutions. However, sustained support for data management infrastructure that provides the sound foundation for Open Data access and interoperability can also influence perception and behavior. The rationale being that funder and institutional investment demonstrates the importance and value of this necessary infrastructure.

The sentence near line 47, beginning with: “In addition, researcher training should incorporate a curriculum on best practices for data stewardship and disciplinary communities.” seems awkward grammatically, implying training should incorporate curriculum on disciplinary communities in addition to best practices. I suggest a slight edit to “... on best practices for data stewardship within disciplinary communities.”, or “... on disciplinary best practices for data stewardship.” (if that is what the authors wish to convey).

As a domain repository manager, the statement in this section that repositories should “...reward efforts to develop, share and adopt best practices” is challenging to understand and appreciate, and imposes some pressure on repositories. What types of “reward” systems or structures do the authors envision on behalf of repositories? Although I’m supportive, many existing repositories are challenged to simply sustain their infrastructure and services, so I cannot think of substantial rewards other than in the form of formal accolades for researchers. Some additional detail from the authors on this vision would help here.

Section: “In Summary…”

“• Researchers”: Consider language encouraging proactive behavior on behalf of researchers to educate themselves on the concepts and practices of Open Data sharing.
“• Research Infrastructure”: This term seems vague, yet the section clearly is referring to repositories and their supporting infrastructure, which should be explicitly included. Is there a reason for explicitly referencing persistent identifiers? Specifically elevating the use of PIDs at the expense of other critical actions on behalf of repositories that enable FAIR data seems to devalue the full gamut of work performed by repositories.
“• Publishers and Journals”: Consider including the concept of “trustworthy” in addition to “federated” and “recognized” (repositories should be trustworthy or continually aspire to this goal. Additionally, if they cannot demonstrate trustworthiness, the aspiration to federate will remain aspirational).
As a scientist who is both a data user and a data provider, I see two aspects which should be improved in this statement.

First, as written, the statement does not explicitly call out the responsibilities of data users in the open science lifecycle. To maintain financial and institutional support for long-term data sets, providers usually need to demonstrate that the data are being used and have clear value. Therefore, data users have the responsibility to correctly cite and attribute data used in their studies. While this requirement is usually addressed in journal expectations, lack of citation or attribution harms data providers’ ability to continue producing open data sets, and so has bearing on this position. Additionally, data users have the responsibility to use and represent these data sets correctly in published work, and to take reasonable steps (including engaging with data providers if needed) to ensure that this is the case. Misrepresenting an open data set in a publication can do significant harm if that publication becomes the basis for other researchers’ approach to using the dataset in question.

Second, the paragraph “Changing research culture to prize data contributions” addresses two important but distinct points that would be better split into two paragraphs. The first two sentences, “Researcher behavior is strongly influenced by the cultures of institutions, publishers, and funders. Thus, it is critical that these entities recognize and reward contributions by people serving different roles across the life cycle of data,” address the need for institutions employing individuals contributing to open science endeavors to give appropriate value to data sets, software, or other open-science-supporting material in addition to or in lieu of traditional publications when considering career advancement. This is a critical culture shift that must happen to maintain an open science community.

Later in the same paragraph, the statement calls for “…researcher training should incorporate a curriculum on best practices for data stewardship and disciplinary communities.” This is also an important point, as such skills must be taught. However, this is a distinctly different point from that of rewarding open science contributions made earlier in the paragraph. Therefore, I recommend that this paragraph be split to better highlight these distinct requirements.
1) Congratulations on an excellent document.

2) It is great to see the Beijing Declaration being cited. Having been developed by CODATA with the support of the ISC, it is authoritative, has one of the broadest, but most relevant, scientific definitions of data, and is one of the few to include samples.

3) I applaud the highlighting of the CARE principles in the body of the text, but given their growing importance, I feel that they should be mentioned in the summary dot points alongside the three mentions of FAIR, otherwise it looks like it is tokenism. Both FAIR and CARE should be cited in the summary.

4) Although the FAIR principles were published in 2016, implementing them has not been easy, and today few systems fully comply with FAIR, particularly for Interoperability and Reuse, as well as machine actionability. Implementing the CARE principles will be orders of magnitude harder and we need to raise awareness of how important the CARE principles are by ensuring they are specifically mentioned in the summary.
Specific to the header "Achieving accessibility, transparency and reproducibility in research" - I actually really like the way it is formulated now. This is probably a discussion about semantics ;) but the way I understand "research integrity" is that it is an overarching term, and accessibility, transparency and reproducibility are elements of research integrity (among many others).

Specific to the phrase "sustainability, ecology, biodiversity, water and food security, and climate." - I fully understand what you meant here, but for the sake of being pedantic (since that's the headline to the statement - useful to make it perfect), it might be helpful to phrase a few of these examples as actual challenges. What I mean is that "water" or "climate" are not challenges on their own. But for example, water shortages, climate change, or ensuring food security are.

Specific to the paragraph "This statement takes the broadest interpretation of the term “data” according to the Beijing Declaration on Research Data: data can be collected, generated or compiled by humans or machines and include (but not limited to) metadata, samples, methods, software and algorithms. Infrastructure includes hardware, intangible assets such as software, and human capital." - I really appreciate the fact that you have defined what is meant by data :)

Specific to the phrase "a change in the research culture and the scientific rewarding system," - [Here Marta is responding to Nick Wigginton's comment on the same phrase] - I actually really like the framing, and I very often use the reference to Brian Nosek's pyramid for culture change when talking about data and open science: https://www.cos.io/blog/strategy-for-culture-change :)

And I also like the explicit addition "and the scientific rewarding system" :) I see this not as blaming, but more thinking about what's needed (the essential components) to get there.

Specific to the phrase "Persistently funded open research infrastructure" - Perhaps consider rephrasing to "sustainably funded open research infrastructure ecosystem" - to highlight that such infrastructure is likely to consist of various components and that what is perceived as essential components by the community might evolve over time (hence sustainably might be a better word than persistently). [Note that Marta's comment was supported by another board member.]

Specific to the phrase "scientific facility teams," - Could this be replaced with "professional support staff" to make the statement more inclusive? Especially when thinking about complex research projects, one can easily imagine a myriad of other essential contributors, e.g. data stewards, research software engineers, data engineers, but also colleagues such as legal experts, privacy experts, ethics advisors etc., who altogether are playing a role in data stewardship.

Recommend changing "In addition, researcher training should incorporate a curriculum on best practices for data stewardship and disciplinary communities." to "In addition, researcher training should incorporate a curriculum on best practices for data stewardship, which should take into account disciplinary norms." - I was a little bit unsure what was meant by disciplinary communities, so I suggested a rephrase. See what you think and if this is what you meant.

Specific to the phrase "Domain repositories" - I am a bit confused by the explicit mention of domain repositories here. Surely, the institutions, publishers and funders also play a key role here. I would find the reference to domain repositories more appropriate in the following section (about metadata standards implementation).

Specific to the phrase "research infrastructure stewards" - Perhaps consider referring simply to "research infrastructure" to be more inclusive of various roles within research infrastructures.

Specific to the summary bullet "Researchers should engage with research infrastructure and repositories to ensure their data are FAIR and data are developed/used responsibly" - Here on the other hand I would add a preferential mention of domain repositories :) (to nudge in the right direction :))

Specific to the summary bullet text "ensure metadata capture is robust, standardized, and comprehensive" - I wonder if it might again be useful here to refer to disciplinary standards as well? The basic standards are of course essential (e.g. Dublin Core, DataCite), but applying disciplinary standards makes datasets truly FAIR.

Specific to the summary bullets - "Publishers and Journals" - Consider also an explicit requirement for implementing the recommendations for data and software citations from this article: https://www.nature.com/articles/s41597-023-02491-7

This is essential for the completeness of the scholarly record and to recognise and reward the value of data and people's contributions.

Specific to the summary bullet phrase for Institutions - "data sharing and stewardship" - You might consider dropping the word "sharing" (part of data stewardship). Alternatively, name more elements of responsible data stewardship.

Specific to the summary bullet for Funders - I wonder if there would be also a role funders could play in supporting the sustainable ecosystem of research infrastructures?

In addition, I believe that funders can play a crucial role in advancing research assessment.
Specific to the header "Achieving accessibility, transparency and reproducibility in research" - I might consider "integrity" here (instead of perhaps transparency, which I think goes hand-in-hand with reproducibility?). That also supports the notion of trustworthiness that is discussed further in the statement.

Specific to the phrase "and rewarding of data across the Earth and space sciences": "rewarding of data" is awkward. I think this is intended to reward people for sharing of data, but I think that's a different issue (discussed below). I think this line is stronger with just "collection, use, and stewardship". [Note that three other board members "+1" Nick's comment.]

Specific to the section "Empowering scholarship", the phrase "a change in the research culture" - for here and below, "culture change" is perhaps not the right framing. The desired incentives and behaviors exist, just not at scale. I wonder if here and below the framing can shift more to amplifying positive behaviors/practices... or demonstrating the value of data stewardship/etc. If scientists know/see that the benefits of these practices outweigh the costs/time/etc, they will do it. Sometimes blaming "culture" casts too wide a net and the parties described below don't see themselves in it to take responsibility (or see it as accusatory). Needs to be framed as a positive for everyone to hop on board!

Specific to the section header "Changing research culture to prize data contributions" - see comment above... I am not sure "changing culture" is an achievable goal or an apt description of the problem.

Recommend changing "Thus, it is critical that these entities recognize and reward contributions by people serving different roles across the life cycle of data." to "Thus, it is critical that these entities continue to recognize and reward contributions by people serving different roles across the life cycle of data."

Recommend changing "Limiting infrastructure within national boundaries restricts the development of science addressing global challenges." to "Expanding infrastructure across national boundaries can better aid in the development of science addressing global challenges." - reframing in a more constructive/aspirational voice allows readers to see the future more positively.
Specific to the title "Safeguarding data to address global challenges for our future" - The title limits the importance of the statement to just future use of data. Please consider a title that helps a researcher value the data they create and use, as well as its long-term value. Thank you.

Specific to the sentence "It is urgent for humanity to address serious and complex global challenges including sustainability, ecology, biodiversity, water and food security, and climate." - This is a powerful sentence. Researchers who do not work directly on the listed global challenges may not see themselves as being affected by this position statement. Could you make an adjustment that is more inclusive? Thank you.
Thank you for this statement emphasizing the continued improvements needed in data creation and use. In the summary, researchers, institutions, and funders must also all recognize the increase in time and effort needed to achieve these recommended data outcomes. Institutions and researchers will also need to increase training and skills development, an activity that funders can also support.
26 April 2024
Valuing data contributions needs to extend to the generation of data as well. Often, there is little support for or recognition of the basic research that generates the data. Likewise, the responsibility needs to extend to the quality of data. Databases need to have a way to easily submit corrections and flag anomalies, and there needs to be adequate curation to deal with those problems.
2 April 2024
Thanks for the work. For nearly 30 years, AGU has framed its leading data position statement as “Earth science data are a world heritage.” This concept was introduced in the very first data position statement in 1997 (which stated that “Earth and space data are a national, and in many cases, an international resource”; https://www.codata.info/data_access/policies.html#AGU), and the next position statement included the “World Heritage” framing. This has been an impactful statement and has been used (I can personally attest) in discussing and arguing for open science across federal agencies, National Academies, internationally and with other societies and publishers. AGU was providing a leading example of the need for open data, for science and the public (both). This framing provides that ALL data should be open and available and curated using leading practices. This draft, unfortunately imho, reframes the need for open Earth science data primarily in terms of grand challenges and trust. These are important but definitely narrow the larger justifications.

The need for high-quality open Earth science data extends well beyond these issues. The Earth (and solar system, etc.) are noisy and complex; understanding them and the various processes and history requires data across space and time. Collectively, individual observations, data, and models integrated in aggregate have led to our understanding of evolution (the fossil and tectonic records), Earth’s magnetic field and history, the interior structure of the Earth, its geochemical history, and on and on. Many of these data are not directly related to “Grand challenges and sustainability” as usually understood, in that the primary need for quality open data is to advance science and build this understanding (in addition to trust in the conclusions). Collectively many of these data and integrated knowledge have provided huge (HUGE) economic and other benefits, related to health and medicine, weather prediction, energy, mineral resources, hazard mitigation, navigation (GPS), and more (see https://eos.org/editors-vox/earth-and-space-science-for-the-benefit-of-humanity for many examples). Going forward, diverse high-quality data of all sorts will be needed for growing AI/ML and other applications, some of which are connected to grand challenges directly, but not all (e.g., see https://doi.org/10.17226/26532; doi: 10.22541/essoar.168132856.66485758/v1; doi: 10.1038/d41586-023-03316-8 for discussion).

Would thus suggest keeping the broader framing and also (rather than instead) emphasizing the importance of diverse ESS data for grand challenges, sustainability, and many other societal needs (really already resulting in $trillion benefits). Indeed, this is an opportunity to be more explicit about the diverse benefits (understanding the Earth, integration with other data, advancing science, benefiting humanity, and trust), and also to call more specifically for addressing the largest risk: lack of support for quality data curation (infrastructure and culture).

Thanks for listening.