Open access data: What do we share?

2020-10-26 / Blog 8

1. Introduction

We celebrated Open Access Week (OAW) 2020 this past week, from 19th to 25th October. The official theme was "Open with Purpose: Taking Action to Build Structural Equity and Inclusion". To celebrate open data, open publications and open theses, Figshare organised an open access data upload competition this year. Since I started my postdoctoral researcher position at Newcastle University, I have embraced the idea of open data and started sharing the data and scripts used in my publications publicly. The upload competition coincided with the period in which I am leaving my position at Newcastle University, so I decided to hand over my data by uploading it to Figshare. You can visit the Dataset(s) page on my personal website to see the published datasets.

When I looked at the figshare homepage, the first thing I noticed was the large number of figures and media items uploaded. My initial assumption was that people were uploading lots of figures to increase their number of items, so that their institutions would have an advantage over the others. I could have just closed the figshare tab and moved on with my prejudice against my fellow researchers (I know it is not nice to say). Instead, the aim of this blog post is to test the following hypotheses:

H0: People upload lots of figures and videos to win the competition.

H1: There is still hope and I am happily wrong.

2. Materials and methods

Figshare has an API through which you can access a vast amount of data very easily. So I started by learning a bit about the structure of the API, and realised that I could query the items published since a given date. I was initially interested in the following categories of uploaded items:

Figure; Media; Dataset; Journal contribution

A short summary of the method: I accessed the data using R and queried the items published since 01.07.2020. I then extracted the dates on which the items were posted, rather than published; all the figures on this page are based on the "date posted" data. I automated the process of requesting, accessing, stitching, merging and calculating, but running the script took a very long time...
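The request-and-stitch loop described above can be sketched as follows. This is a Python illustration, not the author's R script, and the helper names are hypothetical. It assumes Figshare's public v2 endpoint `https://api.figshare.com/v2/articles` accepts `item_type`, `published_since`, `page` and `page_size` query parameters, that the numeric item-type codes below are correct, and that each returned record carries a `timeline` object with a `posted` timestamp; all of these should be checked against the Figshare API documentation.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List

# Assumed public endpoint of the Figshare v2 API; check the official API documentation.
BASE_URL = "https://api.figshare.com/v2/articles"

# Assumed Figshare item_type codes for the four categories of interest.
ITEM_TYPES = {"figure": 1, "media": 2, "dataset": 3, "journal contribution": 6}


def build_url(item_type: int, published_since: str, page: int, page_size: int = 1000) -> str:
    """Build the URL for one page of items of one type published since a date (YYYY-MM-DD)."""
    query = urllib.parse.urlencode({
        "item_type": item_type,
        "published_since": published_since,
        "page": page,
        "page_size": page_size,
    })
    return f"{BASE_URL}?{query}"


def fetch_json(url: str) -> List[dict]:
    """Download one page and decode the JSON list of item records it contains."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_posted_dates(item_type: int, published_since: str,
                      fetch: Callable[[str], List[dict]] = fetch_json,
                      page_size: int = 1000) -> Iterator[str]:
    """Request page after page and stitch them together, yielding each item's
    'posted' date (YYYY-MM-DD) rather than its published date.
    Stops at the first page that holds fewer than page_size records."""
    page = 1
    while True:
        records = fetch(build_url(item_type, published_since, page, page_size))
        for record in records:
            # The 'timeline' field and its 'posted' key are assumptions about the response.
            posted = (record.get("timeline") or {}).get("posted")
            if posted:
                yield posted[:10]  # keep only the date part of the timestamp
        if len(records) < page_size:
            break
        page += 1
```

Calling `list(iter_posted_dates(ITEM_TYPES["dataset"], "2020-07-01"))` would issue the real requests. The `fetch` parameter is there so the paging logic can be exercised offline with canned pages; a real run over several months of uploads needs many requests, which is consistent with the script taking a long time.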

3. Results

The four aforementioned categories are the main types of items uploaded to figshare. The total numbers of figures, media, datasets and journal contributions posted between 01.07.2020 and 26.10.2020 were 48584, 6736, 55838 and 57952, respectively. The maximum numbers posted within a single day were 1632, 293, 1371 and 2152, while the minimum daily numbers were 2, 0, 26 and 33. The highest numbers of uploads were made on Thursdays for figures, datasets and journal contributions, whereas a busy Wednesday saw the most media uploads. The lowest numbers occurred at weekends. Interestingly, the fewest uploads of figures, datasets and journal contributions were on Sundays, but the fewest media uploads were on a Saturday.
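Summary numbers like these (totals, daily maximum and minimum, busiest and quietest weekday) can be derived from a list of posted dates roughly as in the sketch below. It is a Python illustration rather than the author's R code, and the function names are hypothetical. It assumes that days without uploads count as zero (which is how a daily minimum of 0 media items can arise) and reads "most uploads on Thursdays" as totals summed over all Thursdays in the period.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, List

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def daily_counts(posted_dates: List[str], start: date, end: date) -> Dict[date, int]:
    """Count items per day from start to end (both inclusive); days without uploads get 0."""
    counts = Counter(date.fromisoformat(d) for d in posted_dates)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return {day: counts.get(day, 0) for day in days}


def summarise(daily: Dict[date, int]) -> dict:
    """Total, daily maximum and minimum, and the weekdays with the most and fewest uploads."""
    by_weekday = Counter()
    for day, n in daily.items():
        by_weekday[WEEKDAYS[day.weekday()]] += n  # sum over all Mondays, Tuesdays, ...
    return {
        "total": sum(daily.values()),
        "max_per_day": max(daily.values()),
        "min_per_day": min(daily.values()),
        "busiest_weekday": max(by_weekday, key=by_weekday.get),
        "quietest_weekday": min(by_weekday, key=by_weekday.get),
    }
```

Filling empty days explicitly matters for the minimum: counting only the dates that appear in the data would never report a day with zero uploads.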

Figure 1 illustrates the total number of items posted in each calendar week between 06.07.2020 and 26.10.2020. Except for Week 35 at the end of August and Week 40 at the end of September, the variation between weeks is small.
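A weekly breakdown like Figure 1 can be sketched by grouping posted dates by ISO 8601 calendar week. That numbering scheme is an assumption, though it fits the text: under it, Week 28 of 2020 starts on Monday 06.07.2020, Week 35 runs from 24 to 30 August and Week 40 from 28 September to 4 October. The sketch also assumes the end date is exclusive, so 26.10.2020 (the Monday that starts Week 44) is not counted.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def weekly_counts(posted_dates: Iterable[str], start: date, end: date) -> Dict[Tuple[int, int], int]:
    """Count items per (ISO year, ISO week), keeping dates with start <= date < end."""
    counts = Counter()
    for d in posted_dates:
        day = date.fromisoformat(d)
        if start <= day < end:
            iso_year, iso_week, _ = day.isocalendar()
            counts[(iso_year, iso_week)] += 1
    return dict(sorted(counts.items()))
```

Keying on the ISO year as well as the week number keeps late-December and early-January dates from being merged into the wrong week if the period ever spans a year boundary.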

Figure 1: Weekly number of figure, media, dataset and journal contribution items posted on figshare between 06.07.2020 and 26.10.2020.

Let's have a look at the data from OAW. A total of 10707 items have a posting date during that week, whereas 11090 items were posted in the week before (12.10.2020-18.10.2020). Most of these items belong to the usual suspects: figures, datasets and journal contributions. These three categories made up 89.5% of all items uploaded during OAW and 91.2% of all items in the week before. The remaining items were dominated by media (4.4%), preprints (1.3%) and presentations (1.2%) during OAW, while the corresponding shares in the week before were 3.4%, 1.3% and 0.9%. To compare the daily numbers of items posted during OAW and in the week before, I performed a Wilcoxon rank sum test, because of the small sample size (7 days per week). None of the categories showed a significant difference between the two weeks.
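The Wilcoxon rank sum test compares the ranks of the seven daily counts from each week rather than the raw values. The sketch below is a pure-Python exact version written as a permutation test: for two weeks of seven days it enumerates all C(14, 7) = 3432 ways of splitting the pooled days into two groups of seven, and reports the share of splits whose rank sum lies at least as far from its null mean as the observed one. Tied counts receive average ranks. This is an illustration, not the author's R analysis; R's `wilcox.test` should give the same exact p-value when there are no ties (it reports W, the rank sum minus n(n+1)/2, rather than the rank sum), but by default it switches to a normal approximation when ties are present.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided Wilcoxon rank sum test by full enumeration.

    Returns the rank sum of x and the share of all splits of the pooled sample
    whose rank sum lies at least as far from its null mean as the observed one.
    """
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n, total = len(x), len(pooled)
    observed = sum(ranks[:n])
    null_mean = n * (total + 1) / 2  # expected rank sum of x if both samples are alike
    distance = abs(observed - null_mean)
    splits = 0
    extreme = 0
    for idx in combinations(range(total), n):
        splits += 1
        if abs(sum(ranks[i] for i in idx) - null_mean) >= distance - 1e-9:
            extreme += 1
    return observed, extreme / splits
```

With one category's daily counts from the week before OAW as `x` and from OAW as `y`, a p-value above the chosen threshold (conventionally 0.05; the post does not state which level was used) corresponds to the "no significant difference" result reported above.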

Figure 2: Number of items posted before (12.10.2020-18.10.2020) and during (19.10.2020-25.10.2020) Open Access Week.

4. Discussion

The limited variation between weeks shown in Figure 1, apart from two individually high weeks, suggests that there was no upload frenzy during OAW. Comparing the numbers of items posted in the week before and during OAW shows a slight increase in datasets, media and presentations, and a reduction in journal contributions and figures. However, none of these changes were found to be statistically significant. In every week, the majority of posted items are figures, datasets and journal contributions, although their order changes from week to week.

5. Conclusions

I am very happy to be wrong about my initial assumption that people were uploading a lot of figures and media files to win the competition. Furthermore, the number of datasets available on figshare is very motivating. Accessibility, availability and reproducibility are quite important. Let's keep on sharing our datasets.

This is a collection of information I gathered from a wide range of sources, so there might be changes, mistakes or typos in it. If you think anything is wrong, please send me an email at anil dot yildiz at newcastle dot ac dot uk.


Further reading:

Yildiz, A. and Emmerson, C. 2020. Open Access Week 2020: What do we share? Newcastle University. doi: 10.25405/data.ncl.13160069

keywords: open access; open access week; figshare; doi