Published 2020-12-16
Keywords
- Datasets, Synthetic data, Modelling best practices, Model communication, ICHGS-2019
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
Detailed datasets of real-world systems are becoming increasingly available, and their use in research is growing accordingly. However, datasets are often provided to researchers under restrictions on their publication. This is a major limitation for the dissemination of computational tools, since understanding a tool often requires access to the detailed dataset around which it was built. This paper discusses the potential of synthetic datasets for circumventing such limitations: it is often the data content itself that is proprietary, rather than the dataset schema. New data can therefore be generated that conform to the schema and then distributed freely alongside the relevant models, allowing other researchers to explore the tools in action to their full extent. This paper presents the process of creating synthetic geospatial data within the scope of a research project that relied on real-world data originally captured through close collaboration with industry partners.