Things About Open Source Projects

Last week, prompted by an email from a CppJieba user (I still wonder why so many people prefer email to opening an issue), I refactored the CppJieba code and consolidated its various APIs, making the library simpler and easier to get started with. I took the opportunity to release CppJieba v3.0.0 and NodeJieba 1.0.0.

These are the two most-starred projects on my GitHub account, and I have been maintaining them for almost two years. Besides making my GitHub profile look better, they have taught me a lot. GitHub is a treasure trove of knowledge for the whole IT industry, and almost everything I have learned in recent years has come from open source projects there.

Students often ask me how to learn from open source projects, so I wrote this post to share a few methods.

Everything starts out simple

I once posted on Weibo about Node.js code archaeology:

Software archaeology is really interesting. Checking out a very old version and seeing the Makefile from Node's early days, I suddenly felt as if I had switched a game to easy mode.

I felt a mischievous delight at the time, because I had found that Node.js was also very simple at the beginning. And simple usually means easy to understand. For example, early Node.js was built with a Makefile, so every file that got compiled was clearly visible. That is not the case now: with node-gyp and piles of source code, it takes a long time just to work out how the files relate to one another.

This is similar to when I first started writing CppJieba. Because of gaps in the C++ standard library and the limited adoption of C++11 at the time, many projects relied on libraries such as Boost for common functionality. I don't know whether others feel the same way, but whenever I use someone else's C++ code, the most annoying part is installing its dependencies, and installation often fails because of version problems in those dependencies. This is especially bad on Linux, so much so that vczh once joked about installing software on Linux: "You're not reading instructions, you're reading a walkthrough."

So from the beginning, I decided not to rely on any third-party libraries and wrote every function I needed myself. This is how limonp, the C++ library I now use all the time, was born. Although the code was very simple at first, my program could at least run directly without installing any dependencies, which kept it extremely lightweight. The unit tests do depend on gtest, but I included the gtest source code in my project.
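As an illustration of how small such a self-written helper can be, here is a minimal sketch of a string-splitting function that needs nothing beyond the standard library. The name Split and its signature are hypothetical and chosen for this example only; this is not limonp's actual API.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for illustration; not limonp's actual API.
// Splits `text` on every occurrence of `delimiter`, keeping empty fields.
inline std::vector<std::string> Split(const std::string& text, char delimiter) {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  for (;;) {
    const std::string::size_type end = text.find(delimiter, start);
    if (end == std::string::npos) {
      // No more delimiters: the rest of the string is the last field.
      fields.push_back(text.substr(start));
      return fields;
    }
    fields.push_back(text.substr(start, end - start));
    start = end + 1;
  }
}
```

A helper like this can live in a single header, so a user only has to copy the file into their project instead of installing a library.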

This turned out to be very important. As the saying goes: no dependencies, no harm. Later I saw that ideawu's SSDB takes the same approach. Open source projects like these are the ones that value user experience.

Be good at accumulating your own code snippets

I have a project called practice where I collect my own code snippets. Whenever I try out a new technology, I write some small practice code and put it into this repository. Sometimes these small snippets turn out to be the seeds of big projects. As Node.js code archaeology shows, early Node.js was also the author's experimental code: an experiment in writing asynchronous I/O services in JavaScript. (At the beginning, the code did nothing more than read a JS source file when the service received a request and return the result of executing it.)

For example, a simple HTTP server I wrote was pieced together from small experiments: sending and receiving over sockets, parsing HTTP headers, building a thread pool on top of a BlockingQueue, and so on. Each of these is ordinary code, but assembled together they become a project. Writing such snippets is very helpful for learning new technologies, and you can go back to them later. With GitHub and its great community, this is much more convenient than it was when I first learned to program.

Be good at standing on the shoulders of predecessors

This one is easy to understand: make good use of the analysis that others have already written. For example, if you want to read the Redis source code in depth, read "Redis Design and Implementation" first. It is like being handed a map before walking through a maze: the walk becomes much easier, and you can enjoy the scenery along the way.

I don't understand why so many people prefer to grind through the source code unaided. I only do that when I run into a very new open source project and cannot find any existing documentation. These days, though, open source authors pay more attention to documentation and usually make a point of writing some notes on the source code themselves.

From a learning perspective, more stars do not make a better project

For learning, the best open source project is not the one with the most stars but the one that is easiest for you to understand. Generally, the closer a project is to your current work, the easier it is to understand, and reading its source code can give you ideas for your own development or refactoring. Sometimes you can tell which technologies are used just from file names, class names, and source code comments, and then look up related technical articles and papers to study them in detail. All of this is very helpful.

If the source code you read has nothing to do with your work, you won't understand it thoroughly, and you will forget it within a few days. At that point it is mostly useful for bragging.

Blindly seeking out high-performance code is not a good idea either. "Code is written for people to read, and only incidentally for machines to run." But performance matters so much to many famous open source projects that they sometimes have to sacrifice a little readability to achieve it. Heavily starred projects also have to handle many other concerns, such as compatibility and ease of operation and maintenance, and these account for a substantial amount of code even though they are not the core of the project. When reading source code, it is better to choose code that lets you reach the core as easily as possible.

Some thoughts

I feel that open source authors have been getting more and more attention lately, and that this is making open source projects increasingly utilitarian.

I hope we don't forget the original purpose of open source projects, which is to share and spread knowledge better.

Reprinted with permission: Developer Relations