Dr. Noam Ross

Musings, Explorations, and Announcements

Git Hosting for the Distraught and the Restless

15 December 2019

It’s generally impossible to use only services, private or government, that perfectly align with one’s values, so one must choose one’s battles. The controversy over GitHub’s contract with U.S. Immigration and Customs Enforcement is the latest such battle in the open-source software world. GitHub employees and users are trying to pressure GitHub to drop the contract, as a way to place greater pressure on ICE and the U.S. government to curtail crimes and human rights abuses.

A letter to GitHub signed by many open-source maintainers has raised the profile of this campaign. It stops short of calling for users to abandon GitHub, but many users concerned about this issue are searching for alternatives and considering how much they are locked in to GitHub’s ecosystem. I realize that I’m fairly locked in myself, in both personal and professional projects. While I’m not prepared to leave GitHub entirely, I wanted to see how hard it would be to set up a system where I have greater control. So here is documentation of setting up a git hosting service using Gitea and nearlyfreespeech.net.

Job Posting: Research Software Engineer at EcoHealth Alliance

4 December 2019

I’m recruiting a Research Software Engineer to join my team at EcoHealth Alliance in New York. Details and how to apply can be found at https://www.ecohealthalliance.org/career/research-software-engineer.

Drake, Docker, and Gitlab-CI

24 September 2019

For a number of reasons I’ve been trying out GitLab as a replacement for both GitHub and various continuous integration systems, and have been exploring configurations useful for model-fitting pipelines. I turned one of these into an example repository that shows how to use GitLab together with the Rocker Docker images and the drake build system to reproducibly run a project pipeline, using the caching functionality across all three tools to make things reasonably speedy and enable both local and remote builds.

A New Website

9 August 2019

For me, the task of building a personal website is fraught with so many of my technical, aesthetic, and personal hangups that I hadn’t updated mine since mid-graduate school. Thanks to consistent pestering by Maëlle, though, I finally got around to re-building this one using a modern toolkit. Hopefully I can keep it up to date.

Here are some of the pieces that I used to build it:

Questions for Jonathan Cornelissen, Dieter De Mesmaeker, Martijn Theuwissen, and Stephen LeSieur (the DataCamp Board), and Anurima Bhargava

30 April 2019

See my original post for background and updates. I was glad to see the announcement from the DataCamp Board that, after a long period of silence and inaction, the company is taking seriously an incident of sexual misconduct and re-examining its approach to the issue and its relationship with concerned instructors. The CEO’s leave of absence and the engagement of Ms. Bhargava indicate a turn in the right direction.


Recent Works

R, Coronavirus, and Pandemic Prevention, New York Open Statistical Programming Meetup, Apr 20, 2020
For the New York R Meetup, I spoke about the types of models and software virus hunters and epidemiologists use to understand emerging diseases, and how they relate to problems and models familiar to data scientists.
Daniel Nüst, Dirk Eddelbuettel, Dom Bennett, Robrecht Cannoodt, Dav Clark, Gergely Daroczi, Mark Edmondson, Colin Fay, Ellis Hughes, Lars Kjeldgaard, Sean Lopp, Ben Marwick, Heather Nolis, Jacqueline Nolis, Hong Ooi, Karthik Ram, Noam Ross, Lori Shepherd, Péter Sólymos, Tyson Lee Swetnam, Nitesh Turaga, Charlotte Van Petegem, Jason Williams, Craig Willis and Nan Xiao (2020, preprint) The Rockerverse: Packages and Applications for Containerization with R. doi:
Building Software and Communities With Peer Review: rOpenSci, pyOpenSci, and Beyond, PyData NYC, Nov 5, 2019
One Health Surveillance Data: Sharing, Standards, and Stories, Bat One Health Research Network Meeting, Phuket, Thailand, Aug 28, 2019
This presentation kicked off the BOHRN working sessions on data sharing for synthesis and collaboration. How can we structure disease surveillance data so we can better integrate them for macro-level insights?
Epidemiological Modeling: From Basics to Artificial Intelligence, Department of Homeland Security Biosurveillance Presentation Series, Jun 19, 2019
Simulating Epidemiological Models with R, EcoHealth Net Workshop, George Mason University, Fairfax, Virginia, Jun 4, 2019
The EcoHealth Net program brings together scientists and STEM undergraduate and graduate students for internships and a week-long intensive workshop on One Health research.
Generalized Additive Models in R: A Free Interactive Course, (online), May 30, 2019
You can take this course online! I’ve ported this course, originally created for a commercial online learning company, to a free, interactive website that allows you to work through the exercises live.
citesdb: A high-performance database of shipment-level CITES trade data. (2019)
redoc: Reversible reproducible documents with R Markdown and Microsoft Word. (2019)
redoc is an attempt at the holy grail of mixed-workflow collaboration: it enables analysts to produce Microsoft Word documents using R Markdown that can be edited and then de-compiled into R Markdown again without loss of information. Its goal is for teams with different tools to be able to work together with minimal need to adopt each other’s workflows.
Reproducibility in an Office world: tools for crossing the abyss, New York R Conference, May 11, 2019