The Github threat

Every now and then, voices are raised against the risks linked to the use of Github by Free Software projects. Yet the infatuation with the collaborative forge of the Californian start-up with the Octocat doesn’t seem to fade away.

In recent years, Github and its services have taken on an important role in software engineering: they are seen as easy to use and efficient for daily work, with interesting features for collaborative workflows, whether in a company or within a Free Software project. What are the arguments against using its services, and are they valid? We will list them first, then examine their validity.

1. Critical points

1.1 Centralization

The Github application belongs to a single entity, Github Inc, a US company which manages it alone. So a single company, under US legislation, manages access to the source code of most Free Software applications, which may be a problem for the groups using it when some source code is no longer available, for political or technical reasons.

The most recent consequence of this centralization is Github barring developers from Iran, Syria or Crimea from accessing its services, in order to comply with the latest US laws.

The Octocat, the Github mascot

This centralization leads to another problem: now that Github has reached critical mass, it is becoming more and more difficult not to have a Github account. People who don’t use Github, by choice or not, are becoming a silent minority. It is now fashionable to use Github, and not doing so is seen as “out of date”. This phenomenon is a classic, and even the norm, for proprietary social networks (Facebook, Twitter, Instagram).

1.2 Proprietary software

When you interact with Github, you are using proprietary software: you have no access to its source code, and it may not work the way you think it does. This is a problem on several levels, ideologically first, but above all in practice. In the case of Github, we send them code that we can still control outside of their interface. We also send them personal information (profile, Github interactions). And above all, Github forces any project which goes through the US platform to use a crucial proprietary tool: its bug tracking system.

Windows, the epitome of proprietary software, even if others took the same path

1.3 Uniformization

Working with the Github interface seems easy and intuitive to most people. Lots of companies now use it as a source repository, and many developers leaving a company find the same Github working environment in the next one. This pervasive presence of Github in Free Software development environments is part of the uniformization of developers’ working space.

Uniforms always bring the army to my mind, here the Clone army

2. Cross-examination of the critical points

2.1 Regarding the centralization

2.1.1 Service availability rate

As said above, Github is nowadays the main repository of Free Software source code. As such, it is a favorite target for cyberattacks. DDoS attacks hit it in March and August 2015. On December 15, 2015, an outage led to the inaccessibility of 5% of the repositories. The same occurred on November 15. And these are only the incidents reported by Github itself. One can imagine that the mean outage rate of the platform is underestimated.

2.1.2 A chain reaction could block Free Software development

Today many dependency management tools, such as npm for JavaScript, Bundler for Ruby or even pip for Python, can fetch an application’s source code directly from Github. As Free Software projects are getting more and more interlinked and codependent, if one component is down, the whole development process stops.

One of the best examples is the npmgate. Any company could legally demand that Github take down some source code from its repositories, which could trigger a chain reaction and block the development of many Free Software projects, just as the Node.js community suffered from the decisions of Npm, Inc, the company managing npm.
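To get a feel for how exposed a project is to this kind of chain reaction, one could start by listing which of its dependencies are fetched straight from Github. Here is a minimal sketch in Python, assuming a pip-style list of requirements. The function name find_github_dependencies, the pattern and the example-org repositories are illustrative assumptions, not part of any real tool, and the pattern only covers URL-style references (not the git@github.com:owner/repo form).

```python
import re

# Assumed pattern: URL-style references to a Github repository,
# e.g. "git+https://github.com/owner/repo.git@tag".
GITHUB_URL = re.compile(
    r"(?:git\+)?(?:https?|ssh|git)://(?:[^/@\s]+@)?github\.com/([\w.-]+)/([\w.-]+)",
    re.IGNORECASE,
)


def find_github_dependencies(lines):
    """Return (owner, repo) pairs for each requirement line pointing at Github."""
    found = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        match = GITHUB_URL.search(line)
        if match is None:
            continue
        owner, repo = match.group(1), match.group(2)
        # Drop the optional ".git" suffix of clone URLs.
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        found.append((owner, repo))
    return found


if __name__ == "__main__":
    # Hypothetical requirements, for illustration only.
    requirements = [
        "requests==2.31.0",
        "git+https://github.com/example-org/example-lib.git@v1.2#egg=example-lib",
    ]
    for owner, repo in find_github_dependencies(requirements):
        print(f"{owner}/{repo} is fetched directly from Github")
```

Every pair returned is a component that stops being installable if Github, or that single repository, becomes unavailable.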

2.2 A historical precedent: SourceForge

Github didn’t appear out of the blue. In its time, its predecessor, SourceForge, was also extremely popular.

Heavily centralized and based on strong interaction with the community, SourceForge is now seen as an aging SaaS (Software as a Service) and sees most of its customers fleeing to Github, which creates lots of hurdles for those who stayed. The Gimp project first suffered from spam and terrible advertising, which led to the departure of the VLC project, then from installers bundled with adware distributed instead of the official Gimp installer for Windows. And finally, the Gimp project’s SourceForge account was hacked by… the SourceForge team itself!

These are very recent examples of what a commercial entity can do when it is under pressure from its stakeholders. It is vital to really understand what it means to entrust such an entity with the centralization of data and exchanges, as it could have tremendous repercussions on the day-to-day life and habits of the Free Software and open source community.

2.3 Regarding proprietary software

2.3.1 One community, several opinions on proprietary software

Mostly based on ideology, this point deals with the definition each member of the community gives to Free Software and open source, and mostly comes down to one question: is it viral or not? In other words, GPL vs MIT/BSD.

Those on the side of viral Free Software will have trouble using proprietary software, as in their view it shouldn’t even exist. It must be assimilated, to quote Star Trek, as it is a connected black box that endangers privacy, corrupts our habits for profit, restrains our freedom to use what we own as we please, etc.

Those on the side of complete freedom have no qualms about using proprietary software, as its very existence is a consequence of freedom without restriction. They even accept that code they developed may become part of proprietary software, which is quite a common occurrence. This part of the Free Software community has no qualms about using Github, which is well within the parameters of their ideology. Just take a look at the Janson amphitheater during Fosdem and count how many Apple laptops running macOS are around.

FreeBSD, the main BSD project under the BSD license

2.3.2 Data loss and data restrictions linked to proprietary software use

Even leaving ideological considerations aside and focusing only on the Github infrastructure, the bug tracking system is a major problem in itself.

Bug reports build the memory of Free Software projects. The bug tracker is the entry point for new contributors, the place to find bug reports, feature requests, etc. The project history can’t be limited to the code alone. It’s very common to find bug reports when you copy and paste an error message into a search engine. Their historical importance is precious not only for the project itself, but also for its present and future users.

Github makes it possible to extract bug reports through its API. What would happen if Github were down, or if the platform stopped supporting this feature? In my opinion, not that many projects have ever thought about this outcome. How could they move all the data generated on Github into a new bug tracking system?
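As an illustration of what such an extraction could look like, here is a minimal sketch in Python, assuming the public REST endpoint /repos/{owner}/{repo}/issues and unauthenticated access, which is heavily rate-limited. The function names and the shape of the exported records are my own assumptions, not a format any bug tracker expects, and the comments attached to each report would need further requests.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_issues(owner, repo, per_page=100):
    """Yield every issue of a repository, page by page.

    Note: this endpoint also returns pull requests, marked by a
    "pull_request" key in each item.
    """
    page = 1
    while True:
        url = (
            f"{API_ROOT}/repos/{owner}/{repo}/issues"
            f"?state=all&per_page={per_page}&page={page}"
        )
        request = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:  # an empty page means everything has been read
            return
        yield from batch
        page += 1


def to_portable(issue):
    """Keep only a few fields, in a tracker-neutral shape (assumed format)."""
    return {
        "number": issue["number"],
        "title": issue["title"],
        "state": issue["state"],
        "author": (issue.get("user") or {}).get("login"),
        "created_at": issue.get("created_at"),
        "labels": [label["name"] for label in issue.get("labels", [])],
        "body": issue.get("body") or "",
        "is_pull_request": "pull_request" in issue,
    }


def export_issues(owner, repo, path):
    """Write all issues of owner/repo to a local JSON file; return the count."""
    records = [to_portable(issue) for issue in fetch_issues(owner, repo)]
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2, ensure_ascii=False)
    return len(records)
```

Calling export_issues with an owner, a repository name and a file path would leave a local copy that no longer depends on the platform being up. A large project would quickly hit the API rate limits, so a real migration would need authentication and patience.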

An example, old by now, is Astrid, a to-do list application bought by Yahoo a few years ago. Very popular, it grew fast until it was shut down overnight, leaving its users only a few weeks to extract their data. And it was only a to-do list. The same situation with Github would be tremendously difficult to manage for many projects, if they are able to deal with it at all. Code would still be available and could still live somewhere else, but the project memory would be lost. A project like Debian today has more than 800,000 bug reports, a treasure trove of data about solved problems, feature requests and where development stands on each of them. The developers of the CPython project anticipated the problem and decided not to use the Github bug tracking system.

Issues, the Github proprietary bug tracking system

Another thing we could lose if Github suddenly disappeared: all the work in progress in pull requests (aka PRs). This Github feature lets you clone a project’s Github repository, modify it to fit your needs, then offer your modifications to the original repository. The owner of the original repository then reviews these modifications and, if he or she agrees with them, merges them into the original repository. It is one of the main advantages of Github, since all of this can be done easily through its graphical interface.

However, reviewing all the PRs may take quite a long time, and most successful projects have several ongoing PRs. These PRs and/or the proprietary bug tracking system are also commonly used as a platform for comments and discussion between developers.

The code itself is not lost if Github is down (except in one specific situation, see below), but the peer review work materialized in the PRs and the bug tracking system is lost. Let’s remember that the PR mechanism lets you clone and modify projects and then generate PRs directly from its proprietary web interface, without downloading a single line of code to your computer. In this particular case, if Github is down, all the code and the work in progress are lost.

Some also use Github as a bookmarking tool. They follow the activity of their favorite projects through the Watch function. This kind of technology watch would also be lost if Github were down.
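The code of open PRs, at least, can be copied locally while the platform is up, because Github exposes the head of each pull request as a Git ref named refs/pull/<number>/head. Here is a minimal sketch in Python that builds and runs the corresponding git fetch; the function names and the local pr/ namespace are assumptions chosen for illustration. It saves the commits only: the review discussion stays on Github.

```python
import subprocess

# Refspec mapping every pull request head to a local remote-tracking ref.
# The "pr/" namespace on the local side is an arbitrary choice.
PR_REFSPEC = "+refs/pull/*/head:refs/remotes/{remote}/pr/*"


def pr_fetch_command(remote="origin"):
    """Build the git command that fetches all pull request heads."""
    return ["git", "fetch", remote, PR_REFSPEC.format(remote=remote)]


def mirror_pull_requests(repo_dir, remote="origin"):
    """Run the fetch inside an existing local clone (requires git and network)."""
    subprocess.run(pr_fetch_command(remote), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(pr_fetch_command()))
```

After mirror_pull_requests has run in a clone, a given PR can be inspected offline with a command such as git log origin/pr/42, where 42 is a hypothetical PR number.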

Debian, one of the main Free Software projects with at least a thousand official contributors

2.4 Uniformization

The Free Software community is walking a tightrope between the standardization needed for easier interoperability between its products and an attraction to novelty, driven by a strong need to differentiate itself from what already exists.

Github popularized the use of Git, a great tool now used in various sectors far from its original field of programming. Step by step, Git has become so prominent that it’s almost impossible to even think of another source control manager, even though awesome alternatives such as Mercurial exist, unfortunately not as popular.

A new Free Software project is now a Git repository on Github with a README.md added as a quick description. All the other solutions are ostracized. How? Few, if any, potential contributors would notice projects hosted elsewhere. It now seems very difficult to encourage potential contributors to learn a new source control manager AND a new forge for every project they want to contribute to, which was a basic requirement a few years ago.

It’s quite sad because Github, while offering an original experience to its users, cuts them off from a whole realm of possibilities. Maybe Github is one of the best web-based version control services. But its dominance leaves no room for a new competitor to grow. And it lets Github initiate newcomers to development into a narrow set of features, totally unrelated to the strengths of the Git tool itself.

3. Centralization, uniformization, proprietary software… What’s next? Laziness?

The fight against centralization is a major part of the Free Software ideology, as centralization strengthens the power of those who manage it and who, through it, control those who depend on it. The allergy to uniformization, born in reaction to the major software companies and their wish to impose a closed commercial software world, was for a long time the main fuel for the thirst for innovation and the development of intelligent alternatives. As we said above, part of the Free Software community was built as a reaction to proprietary software and the threat it poses. The other part, without hoping for its disappearance, still chose a development model opposite to that of proprietary software, at least in the beginning, as there are now more and more bridges between the two.

The Github effect is a harmful one because of its consequences: at the very least centralization, uniformization, and the use of proprietary software such as its bug tracking system. But some years ago the “Dear Github” buzz revealed one more side effect, one I had never thought about: laziness. For those who don’t know what it is about, this open letter is a complaint from spokespersons of several Free Software projects, demanding that the Github team finally implement new features after years of polite requests.

Since when do Free Software projects facing a roadblock beg for clemency instead of building the path they need themselves? When Torvalds was caught up in the Bitkeeper problem and the Linux kernel development team could no longer use its revision control software, he developed Git. The mere fact of not being able to use a tool, or of missing features, is the main motivation for seeking alternative solutions and, as such, for the Free Software movement itself. Every member of the Free Software community who is able to code should have this reflex. You don’t like what Github offers? Switch to Gitlab. You don’t like Gitlab? Improve it or make your own solution.

The Gitlab logo

Let’s be crystal clear. I’ve never said that every blocked Free Software developer should code his or her own alternative. We all have our own priorities, and some of us even like our beauty sleep, myself included. But seeing that this open letter to Github has 1340 names attached to it, among them spokespersons for major Free Software projects, showed me that the need, the willpower and the strength to code a replacement are there. Maybe such a replacement will be born from this letter; it would be the best outcome of this buzz.

In the end, the use of Github is just another example of the massification of Internet usage. Just as Internet users flock to massively centralized social networks such as Facebook or Twitter, developers are following the same path with Github. Even if a large fraction of developers realize the threat posed by this centralized and proprietary organization, the whole community is following this trend toward centralization and uniformization. The Github service is useful, free or reasonably priced (depending on the features you need), easy to use and up most of the time. Why would we try something else? Maybe because others are using us while we savor the convenience? The Free Software community seems quite sleepy to me.

The lion enjoying the warmth of the hearth

About Me

Carl Chenet, Free Software Indie Hacker, founder of the French-speaking Hacker News-like Journal du hacker.

Translated from French by Stéphanie Chaptal. Original article written in 2015.

24 thoughts on “The Github threat”

  1. Centralization makes for a very nice user experience and makes joining a Free Software project easier.

    We need to work on tools that will allow us to work in a decentralized fashion without the loss of convenience provided by centralized solutions. Sure, that may sound lazy, but there are strong economic incentives to take the quickest route to product delivery.

    Note that federated systems like email or Mastodon tend to have a few major players that most of the network relies on. I think that tools like GNUnet and IPFS have a lot of potential.

    • I think we need to realize that Mastodon is today an initiative worth following, because it’s already the biggest success of a decentralized Free Software social network.

  2. I cannot agree more. You could add to the list what happened to a project of which I am a core developer: namely Gadgetbridge.
    Basically we received a DMCA takedown notice that shut our repo down, and in order to make it accessible again we had to find a lawyer who could help us formulate a “Counter notice”.
    The main point is that github is a US-based company and every user accepts “to submit to the exclusive jurisdiction and venue of the courts located in the City and County of San Francisco, California” when signing up: many European developers might not be aware of the consequences.

    You can read the details over at our blog:
    https://blog.freeyourgadget.org/our-dmca-takedown-a-post-mortem.html

  3. It’s odd that the author only pointed out Gitlab as an alternative. There are more open-source solutions, namely:

    – RhodeCode AGPL3, Python
    – Gogs/Gitea, MIT, GO-LANG
    – Kallithea GPL3, Python

  4. Carl, many thanks for this article. I agree completely.

    Let me add two things:

    1. Currently, the “GitHub threat” is accompanied by the “Slack threat”. While not as many free software projects use this proprietary walled garden as use GitHub, it is alarming to see that many social and political groups as well as SMBs are using it for all their decision-making discussions. This is a huge privacy problem, because Slack can see how many people are working in a company or an activism group, how different companies or groups are related, and what they are actually doing. Fortunately, there are people who are working on free and federated alternatives, e.g. Salut A Toi (https://salut-a-toi.org/).

    2. We all should probably move to distributed bug tracking. So far, there are some, e.g. fossil and at least a dozen based on git. No “winner”, so far, but the future looks bright.

    • You’re totally right. In fact “The Slack Threat” is the title of my next English blog post 😉

    • The problem with any distributed bug tracker or messaging system is that spam filtering is required whenever adoption hits critical mass.

      Email already has the most spam filtering options available, so I favor sticking and building upon email and being able to choose between SpamAssassin, rspamd, or whatever else comes along. git development itself has always been heavily based around email, and anybody can follow along on https://public-inbox.org/git/ or https://marc.info/?l=git and see how everybody interacts.

      Disclaimer: I run and maintain https://public-inbox.org/git/ but have no special access or rights to the git project or kernel.org.

  5. Thanks for bringing more attention to this issue. One aspect that’s overlooked is GitHub is a centralized MESSAGING platform; like most message boards.

    All messages end up being routed through their servers and with the popularity of users using “noreply” addresses in their commits, developers lose the ability to communicate freely with each other when GitHub goes down (or they use equally centralized and proprietary platforms like Slack or Twitter)

    When a mailing list goes down, participants can still email each other directly and get work done. As a result, hardly anybody notices vger.kernel.org downtime when working on git or the Linux kernel because the culture is reply-to-all. That said, it’d be much better if more developers ran their own SMTP servers instead of letting GMail handle it.

  6. The greatest irony in the setup is that key advantages of using distributed version control are undermined by using a centralised repository for bugs and other key aspects of the development process.

    It’s most unfortunate, but indeed ubiquity comes with lots of side-effects. People join without considering, and many people joining will not have the background or information to even be able to consider.

    For an example of a distributed version control system that has its bug tracking (and other aspects) built in, see Fossil, by the author of SQLite, Richard Hipp.
    https://www.fossil-scm.org/index.html/doc/trunk/www/index.wiki
    The approach has specific merits that we should consider, and they can “easily” be applied with Git also.

    Many of the GitHub alternatives are in themselves centralised – yes you can run your own instance, but they still split the code from the bugs and other info. Why?

    Finally, it should not be necessary to have a centralised userbase. It would be good to have a distributed notification system for distributed repos, using signed messages. That way even “politically endangered” projects would be able to exist effectively without an intrinsic risk of being taken out. Secondary hosts can automatically clone and broadcast availability.

  7. I have the pragmatic opinion to use the tool Github as long as it is available, but to establish a second way if Github should close at any time in the future.

    I use the Python script github-backup [1] to back up metadata like issues or pull requests in addition to a simple, native “git clone” or “git pull --all” for each repo (as fetched by a wget call to Github’s API).

    [1] https://pypi.python.org/pypi/github-backup

  8. And now github is Microsoft !?? Wonderful !!!!!!
    * “Multiple punctuation is a sure sign of a diseased mind.” The more the better

  9. Thanks for elaborating all these points 🙂

    Agreeing with all of them, I still remain a big-platform-1st-proponent, precisely because tools like the already-commented “github-backup” and many others exist. The platform-inherent risks are IMHO well-manageable, esp. in GH’s case, because of https://docs.gitlab.com/ce/user/project/import/github.html

    That’s one answer to “How [to] move all the data […] into a new bug tracking system?” although the question remains, whether this scales to large projects that need to move quickly if API-calls are limited.

    IMHO, dominance of GitHub, GitLab, etc. can most effectively be mitigated (but also exploited) by maintaining useful, flexible & compatible export/import tools.
