08.09.2024

Spaces or tabs as a programming style. Indentation styles when programming in C. Differences in the behavior of spaces and tabs

Religious wars over code formatting never subside among programmers. One of the most pressing questions is how to indent and align source code: with tabs, with spaces, or with one for indentation and the other for alignment? Each option has its pros and cons.

In my observation, supporters of spaces still win in most cases among seasoned programmers. Their decisive argument is that aligned lines look the same regardless of each programmer's editor settings. They prefer to agree within the team on the number of spaces per indent. What prevents them from also agreeing on the size of the tab is unclear :)

Spaces probably become relevant when programmers have not agreed on the tab size but are ready to work with each other's code that uses different indentation. In that case the code and comments never drift out of alignment, and nobody has to switch editor settings every time to keep the appearance stable. The only problem arises when code blocks written by different programmers are merged into one file.

Differences in the behavior of spaces and tabs

Consider the following illustrative piece of code:

if (mStatus != Status.PENDING) {
    switch (mStatus) {
        case RUNNING:
            throw new IllegalStateException("Cannot execute task: "
                                            + "the task is already running.");
        case FINISHED:
            throw new IllegalStateException("Cannot execute task: "
                                            + "the task has already been executed "
                                            + "(a task can be executed only once)");
    }
}

We can see that text message blocks are easy to read because the beginning of each line is left aligned. When all indents (both code and alignment) are made with spaces, then blocks of text will always be aligned, on any computer and in any source code editor (after all, when writing source code, a monospaced font with spaces of the same width as the characters is traditionally used). But if the size of the indentation is unusual for another programmer, he will not be able to simply change it.

If we align with tabs instead of spaces, we ourselves will see the same appearance. For another programmer whose editor uses a different tab width, though, the picture changes: with a width of 8 instead of our 4 the lines shift far to the right, and with a width of 2 they shift to the left. In both cases the continuation lines of the text message inevitably drift out of alignment. The same happens to comments aligned at the end of lines of code. They are no longer neatly left-aligned throughout a block that has different levels of nesting.
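The effect can be checked with a small sketch. It is only an illustration: the two sample lines are made up, and Python's `str.expandtabs` stands in for an editor rendering tabs at a given width. The trailing comments are aligned with tabs so that they line up when the tab width is 4.

```python
# Two hypothetical lines at different nesting levels. The trailing comments
# were aligned with tabs in an editor whose tab width is 4.
LINES = [
    "\tx = 1;\t\t\t\t// first",
    "\t\tlong_name = 2;\t// second",
]


def comment_columns(lines, tab_width):
    """Return the column where '//' lands on each line for a given tab width."""
    return [line.expandtabs(tab_width).index("//") for line in lines]


for width in (4, 8, 2):
    # Equal numbers mean the comments line up; different numbers mean they drifted.
    print(width, comment_columns(LINES, width))
```

At width 4 both comments start in the same column; at widths 8 and 2 the columns differ, which is exactly the drift described above.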

An intermediate option is to indent with tabs and to align with a combined method: the alignment starts with tabs up to the indentation level of the previous line and continues with spaces. The disadvantage is the relative complexity of this method and the potential for confusion. Without making control characters visible, it is not immediately clear where the tabs are and where the spaces are. It is also unclear how they will behave when copied to other places in the code.

Moreover, the method described above will not prevent comments from being misaligned at the end of lines. And not all programmers who will subsequently edit your code will understand or accept your idea.

So what is the correct way to format your code – with spaces or tabs? Or maybe both? Let's try to figure this out.

History of tabulation - interesting facts

A short excursion into history. Tabulation (horizontal tabulation) was originally introduced in mechanical typewriters to make tables easier to create, and it was also used for paragraph indentation. It did not have a fixed size. Before starting work, the user set the required tab stops with a special mechanism. The same can be done today in MS Word. With successive presses of the tab key (denoted as ←), the carriage, driven by a spring, automatically moved to the left through each of the previously set positions in turn, thereby moving the input area to the right.

When the time came for electronic technology, it was modeled on typewriters so as not to reinvent the wheel. At first this equipment was not much different from them. The teletype, for example, was essentially a hybrid of the telegraph and the typewriter. Its developers therefore simply converted all its keys and some other elements (carriage return, line feed, and for some reason even the signal for reaching the end of the line) into ASCII codes. The tab key was assigned code 9, but since emulating the mechanism for setting custom tab stops was difficult, they decided to simply make the tab size fixed.

Why did the choice of a fixed tab size fall on the familiar 8 spaces? As a rule, typewriters had a line length of 80 characters. So did the teletypes used for output in the first computers, since they were the heirs of typewriters. Even punched cards came to store information in lines of 80 characters. Thus, typewriters set a certain standard for line length.

Since tabulation was primarily used to create table columns, the most convenient size had to be no less than the difference in length between the most common words written in a column (so that moving to the next column would not require a varying number of tab presses). At the same time, this size had to allow for the maximum possible number of columns. The third condition was that the size had to divide the 80-character line length evenly.

According to the last condition, the developers had a choice of the following five real options: 4, 5, 8, 10 and 16. The first two options were not very convenient, since the difference in the lengths of English words often exceeded these values. The 16-character indentation seemed excessively large, reduced the usability of the tool, and severely limited the number of columns in the tables. There was a choice between 8 and 10 characters.

The value of 8 characters was the first value that satisfied the word length difference condition. In addition, it was the only one that satisfied the second condition—building the maximum possible number of columns. The option with a 10-character tab looked unreasonably wasteful and gave only 8 columns, which could somewhat narrow the scope of the tab. In addition, the number 10 for the maximum number of columns looked more impressive psychologically. All these reasons, apparently, persuaded the developers of the teletype to use tabulation of 8 characters.
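The arithmetic behind this reasoning is easy to reproduce. The sketch below is only an illustration: the 80-character line comes from the text, while the lower and upper bounds of 4 and 16 are an assumption chosen to match the five options the article calls realistic.

```python
LINE_LENGTH = 80  # line length the article attributes to typewriters and teletypes


def tab_size_candidates(line_length, smallest=4, largest=16):
    """Return (tab_size, columns) pairs for sizes that divide the line evenly."""
    return [
        (size, line_length // size)
        for size in range(smallest, largest + 1)
        if line_length % size == 0
    ]


for size, columns in tab_size_candidates(LINE_LENGTH):
    print(f"tab size {size:2d} -> {columns:2d} columns")
```

The sizes that survive are 4, 5, 8, 10 and 16, giving 20, 16, 10, 8 and 5 columns respectively, which matches the figures quoted above for the 8-character and 10-character tabs.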

A few words about paragraph indentation, since it is closely related to our topic. According to OST 29.115-88, when typing on typewriters, the paragraph indentation should be three strokes, although an indentation of five strokes is also allowed. In addition, it prescribes 29 (+-1) lines per page (which approximately corresponds to one-and-a-half line spacing on a typewriter). The requirement of a 3-stroke indentation at one-and-a-half spacing is quite strange, and below I will explain why.

In typographic layout, the paragraph indent equals one and a half times the font size, that is, roughly speaking, the line height (the distance from the bottom of one line to the bottom of the next) multiplied by 1.5. Since one-and-a-half line spacing is almost never used in typographic layout and the lines follow immediately one after another, the correct paragraph indent in this case is approximately equal to three average-width Cyrillic characters.

Probably the compilers of OST 29.115-88, without going into the details of typography, took this value as a constant for the monospaced fonts of typewriters. At the same time they established one-and-a-half line spacing as the standard, since with a monospaced font and low print quality, single spacing looked really bad.

But increased spacing, by typographic rules, also requires increased paragraph indentation. By the time the standard was drafted, the increased 5-stroke indentation was already widely used intuitively. Apparently, this is why the de facto indentation of 5 strokes was legalized in the standard.

By the way, typewriters influenced not only the number of columns of the monitor’s text mode, but also its vertical size of 30 lines. After all, when printing with one and a half intervals, that’s exactly how many lines fit on one sheet!

Strategic view of problem solving

Of course, it is most logical to indent code nesting levels with tabs, because that is what the tab was originally intended for: starting text at different positions. In addition, in the computer era the tab character turned out to be very well suited to the role of a logical delimiter. The number of consecutive tab characters clearly indicates the level of logical nesting.

From here it becomes obvious that spaces are a forced crutch. Unfortunately, the tab size is not strictly fixed for each language. Programmers move from language to language, often keeping the traditions of the previous language as well as their own preferences. Under these conditions, spaces are a way to fix the appearance of the source and protect its formatting from falling apart.

The solution could be to force the tab size to be tied to the programming language. Ergonomic experts could clearly define the most convenient tab sizes for each language's syntax, and this size could be fixed in the compiler, issuing persistent warnings when it is violated.

Since we cannot influence the strategic solution of the problem, we have to find a tactical way out within the existing reality. To work out the right option for yourself, let's look at the issues of spaces and tabs in a little more detail.

Space issues

When a programmer opens someone else's source code with indentation and alignment using spaces, but this indentation turns out to be inconvenient for him, he, in fact, cannot do anything to bring it to a convenient form without performing some complex manipulations.

Another problem with whitespace is errors where one of the indentation spaces is accidentally removed and goes undetected. When you move to a new line, the editor will automatically create the same erroneous indentation as the previous line, and this error can spread over several lines of code. Correcting this error will require some time spent on dull, purely mechanical operations.
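Such slips can be found mechanically. The following is a minimal sketch, assuming an indent size of 4 spaces and a hypothetical helper name `misindented_lines`; it simply reports lines whose leading spaces are not a multiple of the indent size.

```python
def misindented_lines(source, indent_size=4):
    """Return 1-based numbers of lines whose leading spaces are not a multiple of indent_size."""
    suspects = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped:  # skip blank lines and lines made only of spaces
            continue
        leading = len(line) - len(stripped)
        if leading % indent_size != 0:
            suspects.append(number)
    return suspects


SAMPLE = "if (a) {\n    b();\n   c();\n   d();\n}\n"
print(misindented_lines(SAMPLE))
```

In the sample, the lost space on the third line was copied to the fourth by auto-indent, so both are reported. Continuation lines that are deliberately aligned with spaces would be reported too, so the output is a list of suspects rather than a list of errors.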

In addition, to remove excess indentation, instead of the usual pressing on Backspace, you have to press the Shift+Tab key combination. But this, of course, is a matter of habit.

Tab problems

Tab has only one problem - when its size changes, the alignment of comments and other aligned elements is disrupted, although, as a rule, there are not so many of them that it makes reading the source difficult, and correcting them is not so difficult.

In addition, nothing prevents you from simply setting your editor to the tab size chosen by the author, which makes the code look exactly as it would if it had been formatted with spaces. For such cases, some programmers write the recommended tab size in a comment at the top of the file, which in my opinion is a very sound decision.

Conclusions

First, a few words about the size of indents as such. For most common programming languages, the ideal indentation size is exactly 4 characters. Why? Because most sources are already formatted this way, so fine-tuning to a subjective "ideal" of 3 or 5 characters loses its meaning.

The occasional 8-character indentation stretches the code horizontally for no good reason and leaves no room at all for comments on the right. In addition, because the indent is wider than most keywords, the steps of the nesting ladder expose empty areas that create visual gaps, which does not help the readability of the code.

Two-character indentation, which also occurs, is not a very good solution either. Although it saves horizontal space, it becomes quite difficult to tell the nesting levels apart, which inevitably makes the reader tire faster.

If programmers stick to the golden mean of 4 characters, then the debate between proponents of spaces and tabs will become irrelevant. In the meantime, we can give the following recommendations:

  1. Use the de facto standard indent size. For Java, Pascal, C, C++ and similar languages, the de facto standard is 4 characters of indentation, no matter how much we would like to use a different size.
  2. If point 1 is followed, it does not matter at all what you use to indent. If you use spaces, that is good: your code will look readable everywhere, and nothing will ever drift. Other programmers will get used to correct formatting by reading your code. If you use tabs, that is even better, because you give other programmers a choice: set the correct tab size in the editor and read correctly formatted code, or keep their usual indents and put up with drifting comments and with individual lines where alignment was applied.
  3. The choice in point 2 can be made depending on how your code will be used in the future. If its blocks will be inserted into someone else's code, then it definitely makes sense to choose tabs so that the person inserting them can adjust the formatting more easily. If the source is intended for publication on the Internet, where tabs are either swallowed or fixed at 8 characters, you can choose spaces to avoid an additional step of preparing the source for publication. If your organization has already established formatting rules, then you no longer have a choice :)

If, no matter what, you are determined to use your own number of characters in indentation, then the use of spaces and tabs will necessarily depend on how your code is subsequently used by other programmers.

If the code will only receive routine maintenance such as bug fixes, it is probably better to use spaces. Nobody will work on your code for long, and if it is well formatted, nobody will particularly need to change the size of the indents.

If your code will be actively used by other programmers, inserting it into their programs or simply developing your project, then it is better to use tabs. In this case, it will be easier for another programmer to adapt the type of your code to the standard accepted in a particular group.

If you have one but need another

There is a wonderful editor called Notepad++. In it, replacing tabs with spaces and vice versa takes one click in the menu section "Edit → Operations with Spaces". I use this editor to convert tabbed pieces of code intended for publication on a blog, since the blog automatically replaces each tab character with a single space, which is unacceptable.
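For readers without such an editor at hand, the conversion itself is simple. The sketch below is not how Notepad++ implements it; it is a hedged illustration with hypothetical function names that converts only the leading whitespace of a line, assuming a tab width of 4.

```python
def leading_tabs_to_spaces(line, tab_width=4):
    """Replace each leading tab with tab_width spaces; the rest of the line is untouched."""
    body = line.lstrip("\t")
    tabs = len(line) - len(body)
    return " " * (tab_width * tabs) + body


def leading_spaces_to_tabs(line, tab_width=4):
    """Replace every full group of tab_width leading spaces with one tab."""
    body = line.lstrip(" ")
    spaces = len(line) - len(body)
    tabs, remainder = divmod(spaces, tab_width)
    # Leftover spaces are kept, since they are most likely alignment.
    return "\t" * tabs + " " * remainder + body


print(repr(leading_tabs_to_spaces("\t\tx = 1;")))
print(repr(leading_spaces_to_tabs("        x = 1;")))
```

Only the leading whitespace is touched, so tabs or spaces inside string literals and in trailing alignment stay as they were.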

Conclusion

Many may ask what the author of this article chose for himself. He chose tabs with a size of 4 character positions. I haven't found a good enough reason to use spaces just so that my code looks right to someone who hasn't bothered to set their editor's indentation to the de facto standard of 4 characters.

As for Notepad from Windows, which has no tab size setting, and other similar editors that force tabs to be 8 characters, I can't imagine a reason why a programmer would open my code in them when all operating systems have long had convenient specialized editors for source code.

Working in a notepad is not cool. It's cool to work in a hex editor :)

For inquisitive developers, the issue of using tabs and spaces to format code still remains relevant. Can they be interchanged: for example, 2 spaces per tab or 4? But there is no single standard, so sometimes misunderstandings arise between developers. Additionally, different IDEs and their compilers handle tabs differently.

The solution to the issue is usually an agreement on formatting rules within the project or programming language as a whole.

A team of developers from Google examined projects hosted on GitHub. They analyzed code written in 14 programming languages. The purpose of the study was to find the ratio of tabs to spaces, that is, the most popular way of formatting text for each language.

Implementation

For the analysis, an existing table listing the names of GitHub repositories was used.

Recall that about two months earlier all open source GitHub code became available in the form of BigQuery tables.

However, not all repositories were selected for analysis, but only the top 400 thousand repositories with the largest number of stars they received for the period from January to May 2016.

From this table, files containing code in the 14 most popular programming languages were extracted. To do this, the extensions of the corresponding files were specified as parameters of the SQL query: .java, .h, .js, .c, .php, .html, .cs, .json, .py, .cpp, .xml, .rb, .cc, .go.

SELECT a.id id, size, content, binary, copies, sample_repo_name, sample_path
FROM (
  SELECT id, FIRST(path) sample_path, FIRST(repo_name) sample_repo_name
  FROM [...]
  WHERE REGEXP_EXTRACT(path, r"\.([^\.]*)$") IN ("java","h","js","c","php","html","cs","json","py","cpp","xml","rb","cc","go")
  GROUP BY id
) a
JOIN [...] b
ON a.id = b.id

864.6s elapsed, 1.60 TB processed

The query took quite a long time to complete. This is not surprising, since it had to join a table of 190 million rows with a table of 70 million rows. A total of 1.6 TB of data was processed. The query results are available at this address.

The table contains files without their duplicates. Below is the total number of unique files and their total size. Duplicate files were not included in the analysis.

After that, all that remained was to write and run the final query.

SELECT ext, tabs, spaces, countext, LOG((spaces+1)/(tabs+1)) lratio
FROM (
  SELECT REGEXP_EXTRACT(sample_path, r"\.([^\.]*)$") ext,
         SUM(best="tab") tabs, SUM(best="space") spaces,
         COUNT(*) countext
  FROM (
    SELECT sample_path, sample_repo_name,
           IF(SUM(line=" ")>SUM(line="\t"), "space", "tab") WITHIN RECORD best,
           COUNT(line) WITHIN RECORD c
    FROM (
      SELECT LEFT(SPLIT(content, "\n"), 1) line, sample_path, sample_repo_name
      FROM [...]
      HAVING REGEXP_MATCH(line, r"[ \t]")
    )
    HAVING c>10 # at least 10 lines that start with space or tab
  )
  GROUP BY ext
)
ORDER BY countext DESC
LIMIT 100

16.0s elapsed, 133 GB processed

Analyzing every line of 133 GB of code took 16 seconds. BigQuery itself is what made this speed possible.
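The per-file logic of the final query can be restated outside BigQuery. The following is a rough sketch rather than the study's actual code: the function names are hypothetical, and it only mirrors the rules visible in the query (look at the first character of each line, keep files with more than 10 lines that start with a space or a tab, pick the majority, then take the logarithm of the smoothed ratio).

```python
import math
from collections import Counter


def classify_file(content, min_lines=10):
    """Return 'space' or 'tab' for a file, or None if too few lines are indented."""
    firsts = [line[0] for line in content.split("\n") if line[:1] in (" ", "\t")]
    if len(firsts) <= min_lines:  # mirrors HAVING c>10
        return None
    spaces = firsts.count(" ")
    tabs = firsts.count("\t")
    return "space" if spaces > tabs else "tab"


def log_ratio(labels):
    """Mirror LOG((spaces+1)/(tabs+1)) over a list of per-file labels."""
    counts = Counter(labels)
    return math.log((counts["space"] + 1) / (counts["tab"] + 1))


files = ["\n".join(["    x = 1"] * 12), "\n".join(["\tx = 1"] * 12), "x = 1"]
labels = [label for label in map(classify_file, files) if label is not None]
print(labels, log_ratio(labels))
```

A positive ratio means files indented with spaces dominate for that extension, a negative one means tabs dominate, and the +1 in both terms keeps the ratio defined when one of the counts is zero.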


Most often, tabs are found in C, and spaces are most often found in Java.

For some people, though, the ratio of these control characters does not matter, and debates on the topic seem far-fetched. It does not matter for some IDEs either, which store tabs as a number of spaces. There are also IDEs in which this number can be configured manually.

Some time ago, this problem was played out in the series "Silicon Valley". A guy and a girl disagreed on the formatting issue. As a result, the old holy war not only led to professional misunderstandings but also created problems in their personal relationship.

One of the hallmarks of a good programming style is consistency: the fewer surprises, the better. Consistency makes a program easier to read, primarily by reducing distractions. It also guides the reader's eyes: for example, consistent placement of functions or file includes makes them easier to find later. It also settles questions of style, helping the reader get used to the code more easily.

Code clarity

Good style is necessary to ensure that your program is clear, understandable, and easy to change. If in doubt, choose the most understandable technique for the problem. Remember that everything you write, you will probably have to re-read. Make your future easier and gain clarity right away.

Spaces and Formatting

Whitespace can reduce strain on the reader's eyes. Since the compiler ignores spaces, you are free to place them anywhere and format your code with spaces however you want. If you do it wisely, it will be helpful.

Whitespace is used for indentation, for spacing around operators and in function signatures, and for laying out function arguments. Of course, these aren't all the places where whitespace is used, but they should give you an idea of where it can improve readability.

Indentation

If you don't already indent your code, you will soon. This is absolutely necessary because it will help you quickly find the necessary control lines of the code or find errors in it. You should indent every block of code:

if (true)
{
    // code block
}

Brace styles

There are many ways to indent code, and many of them depend on where you place the braces. Some people prefer the style used above. Some people prefer the style shown below:

if (true) {
    // code block
}

There are other styles:

if (true)
    {
    // code block
    }

Which brace style you choose is up to you, although I recommend that everyone working on the same project use the same one. There are arguments for each style. It's good to use a brace style that lets you fit as much code on the screen as possible, but consistency is just as important.

Indent Width

How much you're going to indent is a matter of personal preference—in any case, it's generally best to choose an indent size small enough that the code will fit on one screen. I consider any indentation width between 2 and 8 to be reasonable and readable, although I have found that more than four spaces for indentation can result in lines that are too long.

In general, the best solution for lines that are too long is to reduce the complexity of the code, or at least pull some of the functionality into separate functions. This will reduce the number of indentation levels and can make the code more readable (if done correctly).

Tabs and spaces

There is something of a controversy over the use of tabs or spaces. Note that this is not the same as asking whether you indent by pressing the space bar or the Tab key. Most people let the text editor handle that for them (or choose to convert tabs to spaces).

The real problem with tabs and spaces is what happens when someone opens your code. You can set tabs for any number of columns, and the person opening your code can have a different tab width. This can play havoc even with well-formatted code. Using only spaces fixes this problem because everything will appear the same.

Sometimes a decent code formatter, which can be found in some text editors, helps mitigate this problem by reformatting the code. You can also change your own settings to display the code correctly (although this is not very convenient).

The best solution, if you do decide to use tabs, is to be very careful about what you use them with. The real problem, in fact, arises when tabs are used not only for indentation, but also as a quick way to move four or eight characters to the right. For example, let's look at the following code:

if (long_term_one &&
    long_term_two)
{
    // code
}

If the second condition was formatted using a tab indented four spaces followed by a space, then when opened in another editor with a tab width of eight spaces, the code will look ugly:

if (long_term_one &&
        long_term_two)
{
    // code in the body of the selection statement
}

If spaces were used for formatting, the code will display correctly:

if (long_term_one &&
    long_term_two)
{
    // if statement code
}

Incorrect use of spaces

How many spaces you use is up to you. But, it is worth being aware of some issues. First, the more white space that helps emphasize the logic of your program, the better. But you don't want your gaps to disorient you. It may not confuse you at the moment, but may confuse you in the future, or anyone else who reads your code. What does bad formatting look like?

if(true)
    ++i;
    ++j;

The indentation tells us that the two expressions will fire when the conditional statement is executed, but that's not what actually happens. When you're tracking down a syntax error across hundreds or thousands of lines of code, you end up just skimming the code instead of carefully checking each line. The easier it is for you to review your code and pick out the important details, the faster you can find bugs that creep in unnoticed. As in this example, only the first statement belongs to the body of the if statement.

There are times when you don’t want to change styles for the sake of a single element, or you need to insert several spaces in the text for reasons of aesthetics or text formatting style. And then the question arises: “How to add white space in HTML so that the text is displayed beautifully, and at the same time avoid code redundancy?” To do this, let's look at the types of spaces and examples of their use in HTML code.

HTML non-breaking space

When parts of a text must not be separated from each other by a line break, a non-breaking space helps. Its code looks like this:

&nbsp;

This is the so-called "non-breaking space".

Examples of using non-breaking space:

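Abbreviations such as "etc." and "because" (which are written with internal spaces in the original Russian), E.&nbsp;Veltistov, 11&nbsp;thousand&nbsp;rubles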

Thin space

The HTML whitespace code we covered above is ubiquitous. But there are times when a regular space turns out to be too "big". Then it is replaced by a thin space, which is about one-fifth of an em wide. A thin space is written as follows:

&thinsp;

It is used mostly to separate the digit groups of numbers. For example, "$15 000 000" should be written like this:

$15&thinsp;000&thinsp;000

Note: The thin space may not display correctly in older versions of some browsers, but it works fine in all recent versions.

Other types of spaces in HTML

In addition to the most relevant types that we discussed above, there are others.

  • &ensp; - a space the width of the letter N (en space);
  • &emsp; - a space the width of the letter M (em space);
  • &zwnj; - a zero-width non-joiner;
  • &zwj; - a zero-width joiner.
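The characters behind these HTML character references can be inspected programmatically. This is a small sketch using Python's standard `html.unescape`; the entity names `&nbsp;`, `&thinsp;`, `&ensp;`, `&emsp;`, `&zwnj;` and `&zwj;` are the standard HTML names for the spaces discussed in this section, and the helper name `code_points` is made up for the illustration.

```python
import html

# Standard HTML named character references for the spaces discussed here.
ENTITIES = ["&nbsp;", "&thinsp;", "&ensp;", "&emsp;", "&zwnj;", "&zwj;"]


def code_points(entities):
    """Map each entity to the Unicode code point of the character it produces."""
    return {name: f"U+{ord(html.unescape(name)):04X}" for name in entities}


for name, point in code_points(ENTITIES).items():
    print(f"{name:10s} {point}")
```

Seeing the code points makes it easier to check what actually ended up in a page when the characters themselves are invisible.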

Note: If you need to put multiple spaces in a row, surround the text with the <pre> tag:

<pre>Website   builder   "Nubex"</pre>

Space using CSS

Indentation can also be created with CSS, using the following technique:

Website builder "Nubex"

I wanted to respond to the comments, but due to the volume and desire to be independent from the original topic, I decided to create a new topic.

So, under the cut - why tabs are better than spaces, the most significant misconceptions about tabs and how to use them correctly.

Let's start with the fact that most people (at least on Habré) prefer tabs.

In fact, the strange thing is that many still do not distinguish between indentation and alignment. Well, this is indentation:
for (int i = 0; i < 10; i++) {
    if (a[i] == 0) do_something(i);
}

And this is alignment:
int some_variable = 0;
int v1            = 0;

The first can be done with either tabs or spaces, but when you do it with tabs, everyone can adjust the width of the indent to their own taste and nothing shifts out of place. The second must be done strictly with spaces.

The IDE has a Smart Tabs option for this:

If you use tabs correctly (namely, only for indentation), you can easily change the size of the tabs without violating your programming style.
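This claim can be checked with a short sketch. It is an illustration under stated assumptions: the sample statement is made up, and Python's `str.expandtabs` stands in for an editor rendering tabs at a given width. Indentation uses a real tab, while the continuation line is aligned with spaces placed after the same leading tab.

```python
INDENT = "\t"                # one level of indentation, a real tab
CALL = "result = compute("   # hypothetical statement used for the demo

LINES = [
    INDENT + CALL + "first_argument,",
    INDENT + " " * len(CALL) + "second_argument);",  # aligned with spaces
]


def alignment_gap(lines, tab_width):
    """Return how far apart the two arguments are horizontally (0 means aligned)."""
    first, second = (line.expandtabs(tab_width) for line in lines)
    return first.index("first_argument") - second.index("second_argument")


for width in (2, 5, 9):
    print(width, alignment_gap(LINES, width))
```

The gap is zero for every tab width, because both lines start with the same number of tabs and everything after them is measured in spaces.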

[Screenshots: the same code displayed with 2, 5, and 9 spaces per tab]

So what problems do we avoid this way?

1. Each programmer can adjust the tab length to suit his taste. Always works in practice. When the code is highly nested, you can set the tab width to two spaces, otherwise - to four.
2. It is easier to work with third-party libraries. Some libraries support a style with a tab width of two spaces, some with a width of four spaces. Only the use of tabs does not impose restrictions on the style.

I'll quote a couple of thoughts from the previous topic:

"It is difficult to work with projects that use libraries containing tabs in their text. Let's say that in one library the tab is 3 characters and in another it is 4 characters, while you use 2 characters in your project. As a result, some part of your code will be displayed in the editor with incorrect formatting."

In fact, projects that use tabs have no such problem, because a tab has no fixed size. Supporting a couple of libraries with different space-based indent sizes at the same time, on the other hand, becomes problematic, because you can no longer just press Tab (and have the IDE replace tabs with spaces). Of course, you can work around this with separate projects and separate settings, but that is still a crutch, and the differing indent sizes are still disorienting.
"It is easy to let the goat into the garden. Let's say your tab is equal to 4 spaces. Someone tweaked something a little, using a different tab size or explicitly inserting spaces. Everything looked fine for him, but your line of code will shift somewhere."

Again, a tab has no fixed size. This problem only occurs in projects that use spaces. Where tabs are used, they can be displayed 2 or 10 characters wide.
"You need to constantly adjust various editors to the tab size you need, even if you just want to look at the code without editing it. Otherwise everything will fall apart. This is especially inconvenient when you have to do something with your code on a third-party machine."

Let's say I open Kate to quickly fix the code in some file. Oops, the indent size is two spaces. I need to go into the config. And in the next file, from another library, it is four spaces. I would have to type spaces instead of pressing Tab for indentation, which is terrible. There is no such problem with tabs.
"Extra complications for those who work simultaneously with projects where coding standards require different indentations. If the standards require the use of tabulation, then this is still that ever-aching tooth. In the case of spaces, again everything is much simpler."

As discussed above, this problem exists specifically with spaces, and not with tabs.

In addition, spaces have other disadvantages: you cannot move quickly with the arrow keys (the cursor steps over each space rather than over a whole indent), it is easy to make a mistake (typing 3 spaces in one place instead of 4, which breaks the structure that follows), the file size grows, and much more.

Conclusion

Spaces do not have any significant advantage over tabs, while with tabs we do not force the programmer into a rigid framework or make him suffer with indents that are too small (or too large) for him.

Main

It doesn't really matter what you use. What matters is that you keep your code in order and always stick to the same style. Turn on the display of tabs and spaces, change the tab size to a different one from time to time, and run your eyes over the code to make sure that you haven't inserted spaces instead of tabs or tabs instead of spaces somewhere.

UPD: note according to comments

I have long wanted to write an article about tabs. But not about “Tabs VS Spaces”, but about how to use tabs correctly. The comments confirmed that many did not know about indentation and alignment. The point of this article is not at all that everyone who uses tabs is right. There are coding standards, there are language features, and there are personal preferences.
The most important thing is to know the rules for indentation and be able to use them. And never mix the two styles. Note - not “don’t mix tabs and spaces”, but don’t mix the two styles.
Personally, I recommend using the approach described in the topic, but only if the standards of the code you are working with do not imply something different.