Generating a conflict of interest form automatically
While most of my grant proposals go to the NIH, which thankfully does not require an arcane conflict of interest (COI) document, I am sometimes part of a grant proposal that goes to the NSF or another agency that require a COI. Instead of just asking for any actual conflicts of interest, these documents ask one to list every co-author in the last N years, which is fairly stupid these days when most papers in the biomedical sciences have many co-authors. I hope the agencies get rid of this in my opinion pointless document soon. Until then, I have to do it.
I don’t want to retrieve all my co-authors and fill the form by hand. In a previous post, I showed how one can use the
bibliometrix R package to do an analysis of a set of publications. Among other things, this approach returns all co-authors, which I will use here to make the COI table almost completely automated.
The RMarkdown file to run this analysis is here.
library(dplyr) library(knitr) library(flextable) #remotes::install_github('massimoaria/bibliometrix') library(bibliometrix)
As explained in a previous post, the currently best way to get all my papers is to download them from NIH’s “My Bibliography” and export it in MEDLINE format. Then read in the file with the code below.
#read bib file, turn file of references into data frame pubs <- bibliometrix::convert2df("medline.txt", dbsource="pubmed",format="pubmed")
## ## Converting your pubmed collection into a bibliographic dataframe ## ## Done! ## ## ## Generating affiliation field tag AU_UN from C1: Done!
Each row of the data frame created by the
convert2df function is a publication, the columns contain information for each publication.
For a list of what each column variable codes for, see here.
Getting the right time period
This specific funding agency I’m currently writing a COI for (NIFA) requires co-authors of the last 3 years, so let’s get them. I don’t know if they mean 3 full years. I’m doing this mid-2020, so to be on safe side, I go back to 2017.
period_start = 2017 pubs_new = pubs[pubs$PY>=period_start,]
I need the full names of the authors. They are stored for each publication in the AF field. This is the only information I need for the COI form. I pull it out, then do a bit of processing to get it in the right shape, then remove duplicates and sort.
allauthors = paste0(pubs_new$AF,collapse = ";") #merge all authors into one vector allauthors2 = unlist(strsplit(allauthors, split =";")) authors = sort(unique(allauthors2)) #split vector of authors, get unique authors
Note that I originally did the above steps using
biblioAnalysis(pubs_new). However, this function/approach broke in a recent version of the package, and I realized that I can just use a few base R commands to get what I need, which is the approach shown above. If you use the
biblioAnalysis() function, the Authors are in the
Authors field of the returned object.
These kinds of COI documents that ask for all co-authors are in my opinion antiquated and should go away. In the meantime using a somewhat automated approach makes the problem not too bad. I will have to make a few manual adjustments to the table, but overall it’s not too bad. I’m still glad that NIH does not require this.