Bash is a command-line language (you know, that black screen) for Unix-based operating systems such as Linux. It lets you control your computer using programmable commands. Whether you are a software developer or a system administrator, there are several reasons to learn Bash, or shell scripting in general: it lets you get the most out of your environment through the command line and increases your productivity.

Beyond programmers and system administrators, Bash is also a valuable skill for anyone working with data. Don’t worry, you don’t have to be a hacker to use the terminal and program on the command line.

What is Bash?

Briefly, Bash is a command-line interface (CLI) shell for Unix-like systems such as Linux (CentOS, for example). It is commonly referred to as the terminal, console, command line or shell. It is a command language that lets us work with files on our computers very efficiently, and often more powerfully than with a GUI (graphical user interface).

Making the switch from graphical user interfaces (GUIs) to a command-line interface may seem intimidating, but I assure you that it is a simple learning process and that it soon starts to improve your everyday productivity.

Next, I will present some reasons to convince you that learning Bash is well worth it:

According to the 2020 Stack Overflow Developer Survey, Bash / Shell / PowerShell (i.e. the family of command-line shell languages) is the sixth most used language overall, ahead of R. It was also associated with higher salaries than Python or R in the same survey, and it scored well on the list of most loved languages (53.7%).

And while the Stack Overflow survey covers programmers and software engineers of all kinds, the command line is particularly relevant to data scientists, because Bash / Shell / PowerShell is strongly correlated with data science technologies such as Python, IPython / Jupyter, TensorFlow and PyTorch. The most recent Python developer surveys (2019, 2020), conducted by the Python Software Foundation, point to the same conclusions.

Command line skills help build repeatable data processes

Part of a data scientist’s job is to ensure that certain information is available regularly, often daily. Most of the time, this data is acquired, processed and displayed in the same way.

The command line is well suited to this, because a series of commands can easily be organized to run automatically and be reproduced again and again.

_Consider the following scenario:_

Your company decides to invest in data analysis. Several data professionals will join the team. You, as the administrator of systems and servers, have the task of ensuring that their machines have a working environment with everything they need to get started.

If you work with a CLI (command-line interface), you can write a few scripts that install, configure and test everything automatically.

Otherwise, you will have to resort to a GUI and repeat the same mouse movements and clicks on several machines - a real chore.

This is just one example of how terminal programming can help make data science processes more scalable and repeatable.
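As a minimal sketch of the "test" part of that scenario, the script below checks whether a machine already has the tools the team needs. The tool list and the function name `missing_tools` are assumptions made up for this illustration; a real setup script would also install and configure whatever is missing.

```shell
#!/usr/bin/env bash
# Hypothetical list of tools the new data team needs; adjust to your environment.
required_tools="python3 git curl"

# Print the name of every tool passed as an argument that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Report what still needs installing on this machine.
# $required_tools is left unquoted on purpose so it splits into separate names.
missing=$(missing_tools $required_tools)
if [ -z "$missing" ]; then
    echo "All required tools are installed."
else
    echo "Missing tools:" $missing
fi
```

Because the check is a plain script, the same file can be copied to every new machine and run there without a single click.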

Learning Bash makes you more flexible

In roles such as data scientist, programmer or system administrator, you will often find that you have more flexibility if you can use the terminal instead of depending on mouse movements and clicks in GUIs.

As the command line is a program that runs other programs (this is the origin of the name “Shell”), the interaction between the programs is often easier to adjust through the command line.
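One way to picture this interaction between programs is the pipe, which feeds the output of one program into the next. The sketch below chains four standard utilities; the function name `count_unique` and the sample input are invented for the example.

```shell
# Count how many distinct values appear on standard input, one value per line.
# sort groups equal lines, uniq collapses them, wc -l counts what is left,
# and tr strips the padding some versions of wc add.
count_unique() {
    sort | uniq | wc -l | tr -d ' '
}

# Sample input: four lines, three distinct values.
printf 'python\nbash\npython\nr\n' | count_unique   # prints 3
```

Swapping any stage for another program (for instance `uniq -c` to get a count per value) changes the behaviour without touching the other stages.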

After mastering the Bash commands, even just the most basic ones, it is relatively easy to write scripts - small programs that run in the terminal. And shell scripts make building all kinds of data pipelines and workflows much simpler.

More broadly, knowing how to use Shell offers a second option to interact with your computer.

Even long command lines that seem impossible to remember, and that you need to repeat in your daily work, can be easily organized using aliases in Bash.
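A small sketch of what that looks like in practice. The alias names `ll` and `biggest` are arbitrary choices for this example, and the second one assumes a `sort` that supports the `-h` flag (GNU coreutils does).

```shell
# Lines like these would normally live in ~/.bashrc so every new terminal gets them.
alias ll='ls -alF'
# List the ten largest files and directories under the current directory.
alias biggest='du -ah . | sort -rh | head -n 10'

# Print a definition to confirm it was registered.
alias biggest
```

After that, typing `biggest` in an interactive terminal runs the whole pipeline.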

You can always use a GUI when you want, but the command line gives you more direct power and control when you need it. It also saves time on repetitive work, as discussed above under repeatable data processes.

Working with text files is easy

Plain text files are among the most common ways of storing and processing data. Almost any data science project will involve some work with text files. Being able to deal with text files quickly and efficiently is therefore a very useful skill for a data scientist.

Software developers benefit too, since plain text files make it easier to move data between systems or environments.

The shell has very powerful text processing tools, such as AWK and sed, which help you get familiar with files and make data cleaning easier.

For example, the command below uses AWK to print the first and third columns of a file called a_csv_file for the rows where the value of the second field is Mazer, using a comma as the field separator.

awk 'BEGIN {FS=","} {if ($2=="Mazer") {print $1, $3} }' a_csv_file
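sed, the other tool mentioned above, is handy for the cleaning side. The following is a minimal sketch, assuming the goal is simply to trim stray whitespace and drop empty lines; the function name `clean_lines` and the sample rows are made up for the illustration.

```shell
# Trim leading and trailing whitespace and drop empty lines from standard input.
clean_lines() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Sample input: padded rows with a blank line between them.
printf '  alice,Mazer,42  \n\n\tbob,Other,7\n' | clean_lines
```

The cleaned output can then be piped straight into an AWK filter like the one above.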

Uses less processing resources

When you are working with limited computing resources or simply want to maximize your speed, using the command line will generally be better than using a graphical interface. This is because a GUI has to dedicate many resources to rendering graphical output.

This holds whether you work locally or remotely. When you connect to a server through a graphical interface, bandwidth consumption will be much higher than when only plain text travels to and from a terminal.

In addition, latency, i.e. the “time interval between request and response”, will be higher when using a GUI, which can be particularly frustrating if you are trying to control a mouse that lags a second or two behind your actual movements.

If you are just typing on the command line, the latency will probably be lower, and it will also be easier to cope with because you know exactly where the cursor is at any time.

Cloud management is done through the command line

Cloud services are usually connected and operated through a command line interface.

This is particularly important for more advanced data deployment and data science work such as deep learning and data mining, where your local computing resources are probably insufficient for the tasks you would like to run.

According to the 2018 article “TensorFlow on AWS”, from Nucleus Research, “96 percent of Deep Learning today is executed in the cloud.”

In short, if you are going to work with advanced cloud services, you will need command line knowledge, from moving your data to and from the cloud efficiently to managing and running routines in these environments.

Unix shell knowledge is reusable in other shells

There are only a few popular shells (Bash, Zsh, fish, ksh, tcsh, CMD, Windows PowerShell, etc.) and they are more similar than different, which makes it easier to switch between them.

For example, the Bash commands you know will work on Unix-based machines such as Macs and Linux computers. But many of the same commands also work on Windows, in the Command Prompt and / or Windows PowerShell.

This cross-compatibility is particularly useful when you are using online services that require some kind of command-line interface. Even if their system does not use Bash, it will use a CLI similar enough for you to work with few or no adjustments.

Executing a large number of actions is faster by typing than by clicking

The study “Hidden Costs of Graphical User Interfaces: Failure to Make the Transition from Menus and Icon Toolbars to Keyboard Shortcuts” shows that efficiency with the mouse plateaus very quickly, while the keyboard, despite its initial learning curve, tends to be more efficient.

251 experienced Microsoft Word users were given a questionnaire assessing their choice of methods for the most frequently occurring commands. Contrary to expectations, most experienced users rarely used the efficient keyboard shortcuts, preferring icon toolbars instead.

A second study was done to verify that keyboard shortcuts are, in fact, the most efficient method. Six participants executed common commands using menu selection, icon toolbars and keyboard shortcuts. The keyboard shortcuts were, as expected, the most efficient.

In other words: even if you feel you are working quickly through a GUI, there is a good chance that, at least for some tasks, you would be more efficient on the command line.

Auditing and debugging are easier

Because it is very easy to track everything you do on the command line, auditing and debugging are much easier.

You can easily examine the history (log) to trace each action performed in the shell, whereas if a wrong click leads to an error when working with a GUI, there is probably no record of it.
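In an interactive terminal, `history | tail -n 5` shows the last five commands you typed. Scripts do not record history that way, so the sketch below shows one possible approach for them: a small wrapper that writes a timestamped line to a log file before running a command. The wrapper name `logged` and the log path are hypothetical, chosen only for this example.

```shell
# Hypothetical log location for this sketch; override by setting AUDIT_LOG.
AUDIT_LOG="${AUDIT_LOG:-/tmp/audit_demo.log}"

# Append a timestamped record of the command, then run it unchanged.
logged() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$AUDIT_LOG"
    "$@"
}

logged echo "rotating reports"
# Show the record that was just written.
tail -n 1 "$AUDIT_LOG"
```

When something goes wrong later, the log file answers the question of what was run and when.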

Auditing and debugging errors are essential everyday tasks for server administrators, programmers and data scientists. Relying on graphical interfaces therefore tends to hurt both quality and security analysis in these professionals’ environments.

In addition, auditing is closely related to security, and managing security through users and user groups on Linux and Unix using Bash is very simple.
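As a small illustration of users, groups and permissions from the shell, the sketch below inspects the current account and restricts a scratch file to its owner. The variable name `report` is an assumption for the example; creating users or groups needs administrator rights and is left out.

```shell
# Show the current user and the groups that user belongs to.
id -un
id -Gn

# Create a scratch file and make it readable and writable by its owner only.
report=$(mktemp)
chmod 600 "$report"

# The first column of the listing shows the resulting permissions.
ls -l "$report"
```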

Linux / Unix Shell is available everywhere

Although it comes built in only on Mac and Linux machines, Windows users can still use tools like WSL (Windows Subsystem for Linux), Cygwin and MinGW - and, as mentioned earlier, many of the Bash commands you will learn also work in native Windows options such as the Command Prompt.

This means that the command line knowledge you gain can be used on virtually every computer you come across (including your personal machine, regardless of the operating system you use).

The command line is simpler than you think and will make your day-to-day work easier

There is a common misconception among beginners that using the command line, or programming in the terminal, requires you to know several hundred commands. In fact, although there are hundreds of commands available, you are likely to need only a small percentage of them to perform the most common day-to-day tasks of an administrator or data scientist.

Here are some excerpts from the excellent book The Linux Command Line:

When I am asked to explain the difference between Windows and Linux, I usually use an analogy with toys.

Windows is like a Game Boy. You go to the store and buy one brand new in the box. You take it home, turn it on and play with it. Beautiful graphics, cool sounds. After a while, however, you get tired of the game that came with it, so you go back to the store and buy another one. This cycle repeats itself indefinitely.

Finally, you go back to the store and tell the person behind the counter: “I want a game that does this!” Only to be told that no such game exists because there is no “market demand” for it. So you say, “But I only need to change this one thing!” The person behind the counter says you can’t change it. The games are all sealed in their cartridges. So you find that your toy is limited to the games others have decided that you need.

Linux, on the other hand, is like the largest Erector Set in the world. You open it and it is just a huge collection of parts. There are many steel struts, screws, nuts, gears, pulleys, motors and a few suggestions on what to build. So you start playing with it. You build one of the suggestions and then another.

After a while, you find that you have your own ideas about what to make. You no longer need to go back to the store, because you already have everything you need. The set takes on the shape of your imagination. It does what you want. Your choice of toys is obviously a personal thing, so which toy would you find more satisfying?

Ready to learn the “command line”?

Now that I have introduced you to the reasons for learning Bash / Shell, I should let you know that I am preparing an excellent course, from basic to advanced, that will fit your needs whether you are an advanced user, a systems administrator or a data scientist.