Author: Michael Carraz

Infographic: Who is behind open-source software?

Post author By Michael Carraz
Post date June 17, 2020

In our 18th survey wave, we asked developers whether they contribute to open-source software and, if so, why. In this post, we’ll explore who the contributors to open-source software are, their reasons for contributing, and finally what open-source support they expect from companies.

Open-source contributors tend to be younger than non-contributors.

A third (33%) of developers who contribute to open-source software are under 24 years old, compared to 26% of non-contributors. This is not to say that they are inexperienced programmers; 41% of open-source contributors have 1 to 5 years of experience, 4 percentage points higher than non-contributors.

Contrary to what one might think, open-source contributors are not necessarily professionals. In fact, they are just as likely to be amateurs as non-contributors are. You don’t have to be working professionally in the software industry to be involved and contribute to open-source software development.

Open-source contributors are more likely to be involved in multiple development areas than non-contributors. In particular, they are significantly more likely to be involved in emerging sectors such as machine learning/AI and AR/VR, where innovations are mostly driven by open-source tools.

Finally, as you’d expect, developers’ likelihood of contributing to open-source software is also reflected in their activity on the most popular open-source hosting site, GitHub. The correlation is clear. Two-thirds of developers who don’t contribute (67%) have no personal public repositories on GitHub, whereas close to half of the contributors (48%) have two or more public repositories. We observe a somewhat similar relationship with Stack Overflow. Non-contributors are significantly more likely to not use the Q&A site at all, or to visit the site without having an account. On the other hand, open-source contributors are twice as likely as developers who don’t contribute to have earned at least one badge (30% vs 15%). Working on open-source projects encourages developers to actively engage with their peers on Q&A sites.

We’ve seen which developers contribute to open-source software projects. Let’s now dive into the reasons for contributing.

Why contribute to open-source software

The most common motivations for contributing to open-source projects are improving coding skills (29%) and a belief in the benefits of open source (26%). What’s more, 22% of developers contribute to open-source software because it’s fun or to solve an issue with an existing open-source software project, such as fixing a bug or creating a new feature.

By contrast, financial compensation is the least important motivation. Only 3% of developers are getting paid for their work on open-source projects. As it turns out, developers are more likely to get involved in open-source projects to build their reputation (14%) or to network (11%) than for direct financial gain. Furthermore, developers who get paid to contribute are almost 20 percentage points less likely to think it’s fun than those who contribute for other reasons. They are also significantly less likely to believe in open-source as a source of freedom, as an ideological imperative.

Typically, developers don’t contribute to open-source for a single reason but are motivated by multiple factors. For example, half of the developers who contribute to open-source to improve their coding skills also think it’s fun. Similarly, 56% of contributors who want to network also feel that contributing gives them a sense of belonging.

What developers expect from companies

In our Q4 2019 Developer Economics survey, we also asked developers what open-source support they expect from companies. Thirty-three percent of developers not contributing to open-source don’t expect anything from companies, as compared to 15% among open-source contributors. That said, two-thirds of non-contributors still think that companies should be involved and provide support to the open-source software movement; they realise how important open-source is and believe that companies should be a part of it.

On the other hand, 44% of open-source contributors expect companies to support and contribute to open-source communities. This increases to 55% for developers who contribute to solve an issue. Many contributors (44%) expect full documentation on how to use open-source software on companies’ products or services. This is especially important to developers who get paid for their work (53%).

Interestingly, open-source developers do not necessarily expect companies to build products and services upon open-source software (39%). This is the least important vendor expectation from developers in terms of support for open-source software.

Open-source software contributors are a diverse group of people. Their motivations to contribute range from learning, having fun and solving issues to building relationships and reputations. In summary, developers have plenty of reasons to contribute to open-source, and they expect companies to support them along the way.

If you are involved in open-source and want to share your views, visit our latest survey and help shape the trends.

Tags developer community, developer survey, infographic, open source

Tips

Where do ML developers run their code?

Post author By Michael Carraz
Post date June 8, 2020

In this blog post we’ll explore where ML developers run their app or project’s code, and how it differs based on how they are involved in machine learning/AI, what they’re using it for, as well as which algorithms and frameworks they’re using.

Machine learning (ML) powers an increasing number of applications and services which we use daily. For some organisations and data scientists, it is not just about generating business insights or training predictive models anymore. Indeed, the emphasis has shifted from pure model development to real-world production scenarios that are concerned with issues such as inference performance, scaling, load balancing, training time, reproducibility, and visibility. These require computing power, the lack of which has in the past been a huge hindrance for machine learning developers.

A shift from running code on laptop & desktop computers to cloud computing solutions

The share of ML developers who run their app or project’s code locally on laptop or desktop computers dropped from 61% to 56% between mid-2019 and the end of 2019. Although the five-percentage-point drop is significant, the majority of developers continue to run their code locally. Unsurprisingly, amateurs are more likely to do so than professional ML developers (65% vs 51%).
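To make the difference between a percentage-point change and a relative change concrete, here is a minimal Python sketch using the two shares quoted above. The function and variable names are our own illustration and are not part of the survey methodology.

```python
def percentage_point_change(before: float, after: float) -> float:
    """Absolute difference between two shares, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting share, expressed in percent."""
    return (after - before) / before * 100


# Shares of ML developers running code locally, as quoted in the text.
mid_2019 = 61.0
end_2019 = 56.0

pp = percentage_point_change(mid_2019, end_2019)
rel = relative_change(mid_2019, end_2019)

# Prints: -5.0 percentage points, -8.2% relative
print(f"{pp:+.1f} percentage points, {rel:+.1f}% relative")
```

In other words, a drop of five percentage points means that roughly one in twelve of the developers who ran code locally in mid-2019 no longer did so by the end of the year, assuming a comparable population across the two survey waves.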

By contrast, in the same period, we observe a slight increase in the share of developers who deploy their code on public clouds or mainframe computers. In this survey wave, we introduced multi cloud as a new possible answer to the question: “Where does your app/project’s code run?” in order to identify developers who are using multiple public clouds for a single project.

As it turns out, 19% of ML developers use multi cloud solutions (see this multi-cloud cheat sheet here) to deploy their code. It is likely that, by introducing this new option, we underestimate the real increase in public cloud usage for running code; some respondents may have selected multi cloud in place of public cloud. That said, it has become increasingly easy and inexpensive to spin up a number of instances and run ML models on rented cloud infrastructures. In fact, most of the leading cloud hosting solutions provide free Jupyter notebook environments that require no setup and run entirely in the cloud. Google Colab, for example, comes preinstalled with most of the machine learning libraries and acts as a place where you can plug and play to build machine learning solutions without dependencies or compute being an issue.

While amateurs are less likely to leverage cloud computing infrastructures than professional developers, they are as likely as professionals to run their code on hardware other than CPU. As we’ll see in more depth later, over a third of machine learning enthusiasts who train deep learning models on large datasets use hardware architectures such as GPU and TPU to run their resource intensive code.

Developers working with big data & deep learning frameworks are more likely to deploy their code on hybrid and multi clouds

Developers who do ML/AI research are more likely to run code locally on their computers (60%) than other ML developers (54%), mostly because they tend to work with smaller datasets. On the other hand, developers in charge of deploying models built by members of their team or developers who build machine learning frameworks are more likely to run code on cloud hosting solutions.

Teachers of ML/AI or data science topics are also more likely than average to use cloud solutions, more specifically hybrid or multi clouds. It should be noted that a high share of developers teaching ML/AI are also involved in a different way in data science and ML/AI. For example, 41% consume 3rd party APIs and 37% train & deploy ML algorithms in their apps or projects. They are not necessarily using hybrid and multi cloud architectures as part of their teaching activity.

The type of ML frameworks or libraries which ML developers use is another indicator of running code on cloud computing architectures. Developers who are currently using big data frameworks such as Hadoop, and particularly Apache Spark, are more likely to use public and hybrid clouds. Spark developers also make heavier use of private clouds to deploy their code (40% vs 31% of other ML developers) and on-premise servers (36% vs 30%).

Deep learning developers are more likely to run their code on cloud instances or on-premise servers than developers using other machine learning frameworks/libraries such as the popular scikit-learn Python library.

There is, however, a clear distinction between developers using Keras and TensorFlow – the popular and most accessible deep learning libraries for Python – and those using Torch, DeepLearning4j or Caffe. The former are less likely to run their code on anything other than their laptop or desktop computers, while the latter are significantly more likely to make use of hybrid and multi clouds, on-premise servers and mainframes. These differences stem mostly from developers’ experience in machine learning development; for example, only 19% of TensorFlow users have over 3 years of experience as compared to 25% and 35% of Torch and DeepLearning4j developers respectively. Torch is definitely best suited to ML developers who care about efficiency, thanks to an easy and fast scripting language, Lua (run through the LuaJIT compiler), and an underlying C/CUDA implementation.

Hardware architectures other than CPU are used more heavily by ML developers working with speech recognition, network security, robot locomotion and bioengineering. Those developers are also more likely to use advanced algorithms such as Generative Adversarial Networks and work on large datasets, hence the need for additional computing power. Similarly, developers who are currently using C++ machine learning libraries make heavier use of hardware architectures other than CPU (38% vs 31% of other developers) and mainframes, presumably because they too care about performance.

Finally, there is a clear correlation between where ML developers’ code runs and which stage(s) of the machine learning/data science workflow they are involved in. ML developers involved in data ingestion are more likely to run their code on private clouds and on-premise servers, while those involved in model deployment make heavier use of public clouds to deploy their machine learning solutions. Among developers involved across all stages of the machine learning workflow, end to end, 31% run code on self-hosted solutions, as compared to 26% of developers who are not. They are also more likely to run their code on public and hybrid clouds.

By contrast, developers involved in data visualisation or data exploration tend to run their code in local environments (62% and 60% respectively), even more so than ML developers involved in other stages of the data science workflow (54%).

Developer Economics 18th edition reached 17,000+ respondents from 159 countries around the world. As such, the Developer Economics series continues to be the most global independent research on mobile, desktop, industrial IoT, consumer electronics, 3rd party ecosystems, cloud, web, game, AR/VR and machine learning developers and data scientists combined ever conducted. You can read the full free report here.

If you are a machine learning programmer or data scientist, join our community and voice your opinion in our current survey to shape the next State of the Developer Nation report.

Tags ai, bigdata, cloud, data science, machine learning

Business

Ethics in AI

AI is a powerful and disruptive technology altering the landscape of application development and the wider world as we know it. The adoption of AI is increasing at a fast pace. While AI helps developers in every area of society to create solutions, implement change, and drive progress, it also forces us to think more deeply about our relationship with technology and the ethics of AI.

Indeed, adoption and availability of tools to build AI have caught up with the promises of the field and what once seemed unachievable is now within reach. As a result, many people are concerned and are actively discussing the implications of AI and to what standard we must hold ourselves in order to ensure that AI is aligned with our widely shared human values.

WHERE DO DEVELOPERS STAND ON ETHICS IN AI?

Their views are surely of the utmost importance because they are, after all, on the front line of building and implementing the algorithms that underlie AI products. In the 16th edition of our Developer Economics survey, we asked developers to what degree they agree or disagree with issues such as AI’s unintended consequences, algorithm bias, and jobs replacement, as well as their views about data collection and protection.

WE GOT THE BASICS RIGHT

It should give us peace of mind to know that the vast majority of developers take user rights very seriously. Developers agree that they should not only ask for user consent to collect data and follow security and data protection laws but that they should also go above and beyond legal requirements – 72% of developers told us so. Scandals such as the Facebook/Cambridge Analytica one have indicated that regulations are lagging behind and it is very encouraging that developers are aware of their ethical responsibility while regulators are still trying to catch up.

When it comes to AI specifically, however, developers have diverging opinions on a range of topics.

CAN AI BE TAUGHT TO BEHAVE AS THOUGH MORAL & HUMAN-FRIENDLY?

No topic divides developers more than the unintended consequences of AI. When asked whether AI can be taught to behave as though moral and human-friendly, developers’ responses split almost equally among those who agree (33%), those who neither agree nor disagree (40%), and those who disagree (27%). While such distribution of opinion could be expected from the general population, one might expect developers to have a more unified view as they possess a better technical understanding of what ML/AI can and cannot achieve.

Looking at the breakdown of developers’ opinions by age group, we find that individuals who are under 25 years old have a much more positive outlook (45% agree) than those who are over 35 years old (28%). Where developers live is another differentiator: Europeans are more neutral (42% neither agree nor disagree) whereas South Asia has the highest percentage of developers who agree that AI can be taught to behave as moral and human-friendly (49%). These differences may be the result of the type of involvement in ML/AI, as developers in South Asia are more likely to be using ML for medical diagnosis and prognosis, object recognition/image classification and NLP (Natural Language Processing), whereas Europeans are more likely to be working in more ‘traditional’ ML fields such as fraud detection.

Responses of ML/AI developers and data scientists also differ when considering their types of involvement (as professionals, hobbyists or students) and their use cases. Half of developers who teach AI, ML or data science have favourable views towards the ability of AI to behave in a moral and human-friendly way – in fact, teachers are twice as likely to strongly agree compared to all developers involved in ML/AI. On the other hand, developers who build machine learning frameworks are more likely to strongly disagree (12% vs. 8% for all developers).

Another very interesting insight is that more than half (56%) of ML developers who work in bioengineering and/or bioinformatics agree that AI can be taught to behave morally and be human-friendly. This is worth noting as these developers develop ML/AI that applies engineering principles of design and analysis to biological systems and therefore are likely to have a deeper understanding of the feasibility of such a lofty goal.

A burning question is “Will AI steal your job?”

Discover the answer, and more details on ethics in AI, in our State of the Developer Nation 16th Edition report.

It’s free and full of insights.