Data & Software for Authors
What is Needed?
AGU requires that the underlying data and/or software (including code) needed to understand, evaluate, and build upon the reported research be available at the time of peer review and publication. Additionally, the code (e.g., Python, Jupyter Notebooks, R, MATLAB) used to perform any data analysis and to produce the manuscript’s figures should be made available on a free and open platform (e.g., GitHub) and preserved in a repository (e.g., Zenodo). This entails:
- Depositing the data and software in a community-accepted, trusted repository, as appropriate, and preferably with a DOI
- Including an Availability Statement as a separate paragraph in the Open Research section that explains where and how readers can access the data and software
- Including citations to the deposited data and software in the References section
The sections below provide detailed information on:
- Models & Simulations
- Data and Software Sharing Guidance for Authors Submitting to AGU Journals
- International Generic Sample Numbers
Most of your questions regarding data and software should be answered by the resources below. In addition, this example paper from Koymans et al. demonstrates the use of an Open Research section with data and software citations. If you still have questions, contact [email protected].
What Data Needs to be Available?
Primary and processed data used for your research should be preserved and made available. Generally, the underlying data are the types of data usually preserved in domain repositories for each discipline. These may include raw data but are usually the processed or refined data that support and lead to the described results and allow readers to assess your conclusions and build on your work.
In your manuscript, cite these data, as well as any data you used from other sources, and explain how to access the data in the Availability Statement. For model or simulation data, follow the Data and Software Sharing Guidance for Authors Submitting to AGU Journals/Publications on prioritizing which output to preserve; in general, availability of the software is most important.
Very large datasets (greater than 1 terabyte, TB) can be challenging to preserve because they often incur fees and require additional resources. One option to consider is your institution, which may offer solutions for data preservation and compliance. For more information, refer to the Data and Software Sharing Guidance for Authors Submitting to AGU Journals/Publications or email [email protected].
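To find out early whether a dataset crosses the 1 TB threshold mentioned above, you can total the file sizes in the directory you plan to deposit. The sketch below is a minimal, hypothetical helper (the function names are illustrative, not an AGU tool) and assumes "1 terabyte" means a decimal TB of 10^12 bytes.

```python
import os
from pathlib import Path

# Assumption: "1 terabyte" is read as a decimal TB (10**12 bytes).
# Use 2**40 instead if you mean a binary tebibyte (TiB).
TB = 10**12


def dataset_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`.

    Symbolic links are skipped, and os.walk does not descend into
    symlinked directories by default.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def exceeds_large_data_threshold(root: str, threshold: int = TB) -> bool:
    """True if the files under `root` total more than `threshold` bytes."""
    return dataset_size_bytes(root) > threshold
```

For example, `exceeds_large_data_threshold("model_output/")` returning `True` is a signal to contact your institution or AGU about preservation options before submission.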
Repository Selection
The data that support the research reported in your manuscript must be deposited in a community-accepted, trusted repository. To identify the most appropriate repositories for your data, first refer to the Data and Software Sharing Guidance for Authors Submitting to AGU Journals/Publications below. We recommend a repository that specializes in data for your scientific domain, as this maximizes the probability that the deposited data will be findable, accessible, interoperable, and reusable (FAIR). Otherwise, look to your institutional repository, your computing center, or a generalist repository. Note that cited sources must be in English or have an English translation. For your reference:
- National Repository
- Institutional Repository - information (US-based)
- Generalist Repository comparison chart
Availability Statement
An Availability Statement is located in the Open Research section of a journal article (a section provided in AGU's LaTeX and Word templates) or at the end of a book chapter. It describes your data, software, and other research objects (e.g., notebooks) and how readers can access them. The Statement should include:
- A brief description of the type(s) of data or software
- Repository name(s) where they are deposited
- DOI (persistent identifier) [required]; if no DOI is available, a link to the data or software
- Citation in the References section (mandatory for all data and software with DOIs)
- For software: version, and a link to the publicly accessible development platform (e.g., GitHub)
- Access conditions (e.g., if registration is required)
- Licensing/permissions (e.g., Creative Commons Attribution)
When developing the Availability Statement, consider how best to direct the reader/reviewer to your data (or software). For instance, do not simply provide a web link to the homepage of the repository. Directly link to the data (or software) or provide information/guidance necessary to get to the data (or software) efficiently.
Check whether the repository or data/software source has an “Acknowledgements” or “How to Cite” page to follow when preparing your Availability Statement and the citation in the References section.
It is not sufficient to state that your data are available upon request, or to archive your data only in the supplementary information of your manuscript; repositories align with the FAIR principles better than supplementary information does. Likewise, FTP sites and project web pages are not suitable preservation choices (see Repository Selection). Embargoing your own data is generally not acceptable. See Availability Statements and Template Examples for guidance on data owned by others that are not in a preservation repository.
For data that are not available at the time of submission, describe in the Availability Statement where the data will be shared. You may provide the data in the supplementary information for peer review purposes only (use the file upload type "Data File(s) for Peer Review (will not publish)").
Availability Statement Templates:
- The [type of data] data used for [brief context, description] in the study are available at [repository, source name] via [DOI, persistent identifier link, OR URL if no persistent identifier is available] with [license, access conditions]. *[Citation in References section, required for each DOI]
- [Version number] of [software name] used for [brief context, description of what the software or code was used for] is preserved at [DOI, persistent identifier link, OR URL if no persistent identifier is available], available via [license type, access conditions] and developed openly at [software development platform link]. *[Citation in References section, required for each DOI]
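The templates above are fill-in-the-blank sentences, and a common mistake is submitting one with a placeholder left empty. As a minimal sketch, assuming you keep the statement fields in Python, `string.Formatter` can list the placeholders and refuse to build a statement until every one is filled. The template wording is copied from above; the field names and the `fill_template` helper are hypothetical, and the References citation required for each DOI still has to be added by hand.

```python
from string import Formatter

# Wording copied from the AGU templates above; the field names are hypothetical.
DATA_TEMPLATE = (
    "The {data_type} data used for {context} in the study are available at "
    "{repository} via {identifier} with {license}."
)
SOFTWARE_TEMPLATE = (
    "{version} of {software} used for {context} is preserved at {identifier}, "
    "available via {license} and developed openly at {platform_url}."
)


def fill_template(template: str, **fields: str) -> str:
    """Fill every placeholder in `template`; raise ValueError if any is empty."""
    # Formatter().parse yields (literal, field_name, format_spec, conversion).
    required = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = sorted(n for n in required if not fields.get(n, "").strip())
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return template.format(**fields)
```

With placeholder values, `fill_template(DATA_TEMPLATE, data_type="sea surface temperature", context="model validation", repository="Zenodo", identifier="https://doi.org/10.5281/zenodo.0000000", license="a CC BY 4.0 license")` yields a complete sentence, while omitting any field raises an error that names it.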
* For Jupyter Notebooks, R Script(s)/Markdown guidance, please see the following resources:
The Methodology section of your manuscript should also describe how your data and software pertain to your research.
Data & Software Citation
Please include in your References section a formal citation of the data and software described in the Availability Statement; doing so provides citation credit for the data and software. Also cite data and software created by others that you used in your research, to ensure proper credit for that work. If the data or software is described in a separate data or software paper, cite both that paper and the deposited data or software as separate references. Citations should include:
- Author(s) or project name(s)
- Date published
- Title / Software name
- Data or software release/version (optional)
- Bracketed description type (e.g., [Dataset], [Software], [Collection], [ComputationalNotebook])
- Repository name / Publication venue
- DOI
AGU now checks whether data and software have been properly cited rather than simply linked via a URL, website, or platform. For an example of what AGU expects, see this example paper, specifically its Open Research section and the dataset and software citations in its References.
For more information on citations, reference the Data and Software Sharing Guidance for Authors Submitting to AGU Journals/Publications.
Data Citation Examples:
- Fiechter, J., & Cheresh, J. (2020). Physical and biogeochemical drivers of alongshore pH and oxygen variability in the California Current System (Version 7) [Dataset]. Dryad. https://doi.org/10.7291/D1D96Q
- Edmunds, P. J., Didden, C., & Frank, K. (2021). Mean percentage cover of corals and Porites astreoides at each site by year at St. John, VI from 1992 to 2019 (Version 1) [Dataset]. Biological and Chemical Oceanography Data Management Office (BCO-DMO). https://doi.org/10.26008/1912/BCO-DMO.843284.1
- Alwarda, R., & Smith, I. (2021). Elevation data for Reflectors within the CO2 Deposit in Planum Australe, Mars [Dataset]. Zenodo. https://doi.org/10.5281/ZENODO.4639669
- Gries, C., Downs, R. R., O’Brien, M., Parr, C., Duerr, R., Koskela, R., et al. (2019). Return on Investment Metrics for Data Repositories in Earth and Environmental Sciences [Dataset]. Environmental Data Initiative. https://doi.org/10.6073/PASTA/D49BEC63F51603512EFA7E0FD2717203
Software Citation Examples:
- Lab for Exosphere and Near Space Environment Studies. (2019, March 20). lenses-lab/LYAO_RT-2018JA026426: Original Release (Version 1.0.0) [Software]. Zenodo. https://doi.org/10.5281/zenodo.2598836
- Bell, S. W. (2020). samwbell/saturn_counts: April 26, 2020 Release (Version 1.1.0) [Software]. Zenodo. https://doi.org/10.5281/ZENODO.3766959
- Hu, S. (2019, December 25). Direct surface wave radial anisotropy tomography package (Version 1.0) [Software]. Zenodo. https://doi.org/10.5281/zenodo.3592528
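The examples above share one pattern: authors, date, title, optional version, bracketed type, repository, and a DOI link. A minimal sketch of that pattern follows, assuming author names are already formatted; the `ResearchObjectCitation` class and `normalize_doi` helper are hypothetical illustrations, not an official AGU or Crossref formatter. It also rewrites `doi:` and `http://doi.org/` forms to `https://doi.org/` links.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Accepts a bare DOI or common prefixed forms: doi:, http(s)://(dx.)doi.org/
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Return the bare DOI (e.g. '10.5281/zenodo.3766959'); raise if it is not one."""
    bare = _DOI_PREFIX.sub("", doi.strip())
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return bare


@dataclass
class ResearchObjectCitation:
    authors: str                   # already formatted, e.g. "Bell, S. W."
    date: str                      # "2020" or "2019, March 20"
    title: str                     # title or software name
    object_type: str               # bracketed type, e.g. "Dataset", "Software"
    repository: str                # repository name / publication venue
    doi: str                       # bare DOI or doi.org link
    version: Optional[str] = None  # optional release/version

    def render(self) -> str:
        """Assemble the reference string in the order used by the examples."""
        version = f" (Version {self.version})" if self.version else ""
        return (
            f"{self.authors} ({self.date}). {self.title}{version} "
            f"[{self.object_type}]. {self.repository}. "
            f"https://doi.org/{normalize_doi(self.doi)}"
        )
```

Fed the fields of the Bell (2020) example, `render()` reproduces that reference exactly; always check the result against the repository's own "How to Cite" text.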
Authors are asked to preserve and cite their research software (e.g., GitHub-Zenodo) and also to include a link to where the software is actively developed (e.g., GitHub). See the instructions on referencing and citing your software (e.g., GitHub-Zenodo) and on adding a citation file (e.g., GitHub). Sharing pseudocode may be necessary where policies, the environment, or other factors prevent full sharing of the research software. Where only a few lines of code are used to implement equations documented in the manuscript, state this in the Availability Statement. For more citation examples, see the Data and Software Sharing Guidance for Authors Submitting to AGU Journals/Publications.
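A citation file on GitHub is conventionally a `CITATION.cff` file (Citation File Format) in the repository root; GitHub uses it to show a "Cite this repository" option. The sketch below builds a minimal CFF 1.2.0 file using only the standard library. The `citation_cff` helper and every value in the usage note are hypothetical placeholders; check the generated file against the Citation File Format documentation before committing it.

```python
import json
from typing import Sequence, Tuple


def citation_cff(
    title: str,
    authors: Sequence[Tuple[str, str]],  # (family names, given names)
    version: str,
    doi: str,                            # bare DOI, e.g. "10.5281/zenodo.0000000"
    date_released: str,                  # ISO date, "YYYY-MM-DD"
    repository_code: str,                # development platform URL
) -> str:
    """Return the text of a minimal CITATION.cff (Citation File Format 1.2.0)."""
    q = json.dumps  # a JSON string is also a valid YAML double-quoted scalar
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {q(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {q(family)}")
        lines.append(f"    given-names: {q(given)}")
    lines += [
        f"version: {q(version)}",
        f"doi: {q(doi)}",
        f"date-released: {q(date_released)}",
        f"repository-code: {q(repository_code)}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `Path("CITATION.cff").write_text(citation_cff("example-model", [("Doe", "Jane")], "1.0.0", "10.5281/zenodo.0000000", "2024-01-31", "https://github.com/example/example-model"))` writes the file; the DOI should be the one your repository (e.g., Zenodo) assigns to the archived release.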
Citation Formatter
Models, Simulations, and Code
Although numerical modeling or theoretical work may not use input data, outputs such as figures or tables are often considered data and should be made available in electronic form. Additionally, the software code (e.g., Python, Jupyter Notebooks, R, MATLAB) used to perform any data analysis and to produce the manuscript’s figures should be made available on a free and open platform (e.g., GitHub) and preserved in a repository (e.g., Zenodo). If a manuscript makes no use of models, data, or analysis software (e.g., a purely theoretical paper or a review paper), state this in the Availability Statement.
When the primary data for the research comes from numerical model simulations, follow the steps outlined in the Guidelines for Research Primarily Based on Numerical Models or Theory.
Data and Software Sharing Guidance for Authors Submitting to AGU Journals/Publications
AGU editors, staff, and community members have developed the Data and Software Sharing Guidance for Authors Submitting to AGU Journals/Publications. Its sections are listed below for quick reference:
- Considerations for publication related to data and software
- Guidelines for Research Primarily Based on Numerical Models or Theory
- Selecting Your Repository
- During Peer Review
- Paper (Manuscript) Acceptance
- Availability Statements and Template Examples
Fox, Peter, Erdmann, Chris, Stall, Shelley, Griffies, Stephen M., Beal, Lisa M., Pinardi, Nadia, Hanson, Brooks, Friedrichs, Marjorie A. M., Feakins, Sarah, Bracco, Annalisa, Pirenne, Benoît, & Legg, Sonya. (2021). Data and Software Sharing Guidance for Authors Submitting to AGU Journals. Zenodo. https://doi.org/10.5281/zenodo.5124741
AGU editors, staff, and community members have also provided a list of Domain-Discipline Repositories Useful to AGU Journals. Contributors to AGU Books can also use these repositories.
International Generic Sample Numbers
AGU recommends the use of IGSNs (International Generic Sample Numbers) for citing samples reported in manuscripts. An IGSN is a unique identifier that allows samples to be linked across publications and searched through a central metadata repository. We strongly encourage authors to register samples with an IGSN Allocating Agent to obtain IGSNs, and to use them throughout their manuscripts, tables, and archived datasets. We recognize IGSNs during our production process and will link them in the manuscript and tables to the registered sample descriptions. IGSNs can be reserved before a field season or assigned afterward.
Contact & Resources
If you have questions about how to comply with AGU data and software requirements for your manuscript, please contact us at [email protected].
For resources and further reading on the topics covered in this guidance, visit the Data Leadership page.