14.09.2024

How to open UTF-8 in Excel. Changing encoding in Microsoft Excel. Changing encoding in Word

Users of browsers, text editors, and word processors often need to change text encoding. The same need can arise in the Excel spreadsheet application, because it processes text as well as numbers. Let's figure out how to change the encoding in Excel.


A text encoding is a set of rules that maps the numeric values stored in a file to the characters a user sees. There are many encodings, each with its own rules and its own set of supported languages. Whether an application can work with a given text depends on whether it can recognize the encoding and translate it into readable characters (letters, digits, and other symbols). The following are among the most popular text encodings:

  • Windows-1251;
  • KOI-8;
  • ASCII;
  • ANSI;
  • UCS-2;
  • UTF-8 (Unicode).

The last of these is the most widely used encoding in the world and is regarded as a kind of universal standard.

Most often, the program itself recognizes the encoding and automatically switches to it, but in some cases the user needs to indicate its type to the application. Only then will it be able to work correctly with encoded characters.
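As a rough illustration of what automatic recognition can look like, here is a minimal Python sketch. It is not how Excel detects encodings; the function name `decode_with_fallback` and the candidate list are assumptions made for this example. It tries strict UTF-8 first and falls back to Windows-1251, relying on the fact that typical Windows-1251 Cyrillic text is not valid UTF-8.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1251")):
    """Return (text, encoding_name) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate order is an assumption, not a standard.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode with the first candidate and mark bad bytes.
    return raw.decode(candidates[0], errors="replace"), candidates[0]


utf8_bytes = "Привет".encode("utf-8")
cp1251_bytes = "Привет".encode("cp1251")

print(decode_with_fallback(utf8_bytes))    # detected as utf-8
print(decode_with_fallback(cp1251_bytes))  # falls back to cp1251
```

A heuristic like this can guess wrong on short or unusual inputs, which is why applications still let the user name the encoding explicitly.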

Most encoding problems in Excel occur when opening CSV files or exported txt files. Instead of ordinary letters, Excel often shows unreadable characters, known as mojibake (in Russian, “krakozyabry”). In these cases, the user has to take some extra steps before the program displays the data correctly. There are several ways to solve this problem.
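The effect is easy to reproduce outside Excel. The short Python sketch below (an illustration only, with a made-up sample word) encodes Cyrillic text as UTF-8 and then decodes the same bytes as Windows-1251, which is one common way such unreadable characters arise.

```python
original = "Привет"

# The file on disk holds UTF-8 bytes...
utf8_bytes = original.encode("utf-8")

# ...but the reader assumes Windows-1251, so every byte becomes its own character.
garbled = utf8_bytes.decode("cp1251")

# As long as no bytes were lost, the mistake can be reversed.
restored = garbled.encode("cp1251").decode("utf-8")

print(garbled)
print(restored)
```

The garbled string is twice as long as the original because each Cyrillic letter takes two bytes in UTF-8 and each byte is shown as a separate Windows-1251 character.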

Method 1: Changing Encoding Using Notepad++

Unfortunately, Excel does not have a full-fledged tool that would allow you to quickly change the encoding of any type of text. Therefore, for these purposes, you have to use multi-step solutions or resort to the help of third-party applications. One of the most reliable ways is to use the Notepad++ text editor.


Even though this method relies on third-party software, it is one of the simplest options for transcoding file contents for Excel.

Method 2: Using the Text Wizard

In addition, you can make the conversion using the program’s built-in tools, namely the Text Wizard. Oddly enough, using this tool is somewhat more complicated than using the third-party program described in the previous method.

Method 3: Saving the File in a Specific Encoding

There is also the opposite situation, when the task is not to open a file so that its data displays correctly but to save it in a specified encoding. You can perform this task in Excel as well.


The document will be saved to your hard drive or removable storage device in the encoding you selected. Keep in mind, however, that from now on Excel will always save documents in this encoding. To change that, open the "Web Document Options" window again and change the setting.

There is another way to change the encoding settings of saved text.

In fact, the question is not as trivial as it might seem at first glance. The CSV format, as its name suggests (comma-separated values), uses the comma (,) as a delimiter. However, many programs and services allow other characters. MS Excel is one of them, especially its Russian-localized version. In this article I want to offer a solution to the problem of opening a CSV export file from Google Webmaster Tools in MS Excel. The topic is relevant for other cases as well.

  • Problem exporting search queries (SQ) from Google webmaster
    • Encoding problem
    • Comma delimiter in CSV
    • What helped
  • Let's sum it up

Problem exporting search queries from Google Webmaster Tools

Many of us use Google Webmaster Tools only to add a site. That is a mistake: it holds a lot of useful information, for example the list of search queries that bring users to the site. See the report Search traffic > Search queries, after first selecting the desired site from the list.

The data can also be exported in CSV format or opened in Google Sheets (an online analogue of Excel). Look for the “Download this table” button. By the way, the number of queries displayed on the page does not affect the completeness of the exported data.

The problem is that the CSV file uses a comma delimiter and is UTF-8 encoded. So even in Google Sheets, instead of Cyrillic, you will get only a string of question marks.

In other words, we need to change the encoding to ANSI. And to open a CSV file in Excel, replace the comma delimiter (,) with the semicolon character (;).

Comma delimited CSV in UTF-8

Encoding problem

The encoding problem is the easiest to solve. Any text editor with an encoding conversion feature will do, for example Notepad++. You can download it for free from the official website: notepad-plus-plus.org. Launch the editor, open the CSV file in it, then open the “Encoding” menu and choose the desired conversion, that is, the item “Convert to ANSI”.

My screenshot shows the reverse process, conversion from ANSI to UTF-8, but the principle is the same.
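For those who prefer a script to an editor, the same conversion can be sketched in Python. This is an illustration under stated assumptions: "ANSI" is taken to mean Windows-1251 (the usual case on a Russian-language Windows), and the function names `transcode_bytes` and `transcode_file` are hypothetical.

```python
from pathlib import Path


def transcode_bytes(raw, src_encoding="utf-8-sig", dst_encoding="cp1251"):
    """Decode raw bytes from src_encoding and re-encode them in dst_encoding.

    "utf-8-sig" also accepts UTF-8 without a BOM and strips the BOM if present.
    Characters missing from the target encoding are replaced with "?".
    """
    text = raw.decode(src_encoding)
    return text.encode(dst_encoding, errors="replace")


def transcode_file(src_path, dst_path, src_encoding="utf-8-sig", dst_encoding="cp1251"):
    """Convert a whole file; working on bytes keeps the line endings untouched."""
    data = Path(src_path).read_bytes()
    Path(dst_path).write_bytes(transcode_bytes(data, src_encoding, dst_encoding))
```

A call such as `transcode_file("export.csv", "export_ansi.csv")` (both file names are placeholders) would write a Windows-1251 copy next to the original. Swapping the two encoding arguments gives the reverse conversion.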

Comma delimiter in CSV

Now for the comma delimiter. For Google Sheets you basically do not need to change anything. It may also work as-is in the English version of MS Excel; check for yourself. If a replacement is needed, however, keep in mind that simply replacing every comma (,) with a semicolon (;) will not work, because the file may contain values that themselves include a comma. Such values are usually enclosed in quotation marks. For example:

value,1,"value, with comma",
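A CSV-aware parser avoids this trap, because it splits on the delimiter only outside quoted values. Below is a minimal Python sketch using the standard `csv` module; the function name `change_delimiter` is an assumption for this example, and it works on text that has already been decoded.

```python
import csv
import io


def change_delimiter(csv_text, old=",", new=";"):
    """Re-write CSV text with a different delimiter, respecting quoted fields."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=old)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=new, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


sample = 'value,1,"value, with comma",\n'
print(change_delimiter(sample))
```

The comma inside the quoted value survives as part of the value, while the three separating commas become semicolons. The writer adds quotes again only where a field contains the new delimiter.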

In theory, changing the settings of the Windows operating system itself should help here: Start > Control Panel > Regional and Language Options. On the “Formats” tab, click the “Change this format...” button. In the window that opens, on the “Numbers” tab, change the “List separator” to the desired one, that is, replace the semicolon (;) with a comma (,).

True, it didn't work for me. And changing operating system settings is, in my opinion, not a good idea. Still, I decided to mention this solution because most specialized forums refer to it.

What helped

Alas, I did not find any additional processing options for files with the .csv extension in MS Excel. Such processing is available, but only if you change the file extension, for example to .txt. So we change the file extension and open the file in Excel.

The “Text Import Wizard” window will appear. Here you can select the source data format, the row at which to start the import, and the file encoding (we need 1251: Cyrillic). By the way, why did we have to change the encoding beforehand when it can be selected in the wizard? Because, at least for me, UTF-8 is not offered here. Click the “Next” button.

In the next step, we can select the delimiter character, or several of them. Check the “Comma” box and uncheck “Tab”. Here you can also select the text qualifier, the character that encloses values containing special characters (double quotes, comma, semicolon, newline, etc.). The double quote option is what I need in my case. Look at the “Data preview” field: the data has turned into a table. Click the “Next” button.

In the last step, you can specify the data format of each column, although there is not much choice here: general (default), text, date, and skip column. The last option seems the most interesting to me, because it lets you exclude unnecessary columns right away. Simply select the column and click “Skip column”. Click the “Finish” button.
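The three wizard steps (choose the encoding, choose the delimiter and text qualifier, skip unneeded columns) map onto a few lines of Python. This is only a sketch of the same idea, not what Excel does internally; the function name `import_text` and its parameters are hypothetical.

```python
import csv
import io


def import_text(raw, encoding="cp1251", delimiter=",", quotechar='"', skip_columns=()):
    """Decode raw bytes, split them into rows, and drop the skipped column indexes."""
    text = raw.decode(encoding)                       # step 1: file encoding
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter,          # step 2: delimiter
                        quotechar=quotechar)          #         and text qualifier
    return [
        [cell for index, cell in enumerate(row) if index not in skip_columns]
        for row in reader                             # step 3: skip columns
    ]


raw = 'запрос,10,"a, b"\r\n'.encode("cp1251")
print(import_text(raw, skip_columns={1}))
```

Column indexes are zero-based here, so `skip_columns={1}` drops the second column.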

Excel will generate the table we need, where we can set the desired column widths and cell formats, but that is a different story. I will only note that percentage values are not always recognized correctly.

Online service for normalizing CSV files

However, all of the above problems can be solved by a simple online service, Normalization CSV. It lets you change the encoding and the delimiter character. There is a limit of 64,000 bytes (62.5 KB) on the uploaded file, but CSV files are usually small, so this should be enough. The result is output as a plain text file, which can be saved with the .csv extension.

By the way, the script converts a percentage value such as 95% into the form Excel expects in a “percentage” column: it divides by 100 to give a floating-point number, for example 0.95. If you need any other changes, write to us and we'll try to improve it.
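The service's own code is not shown, so the following Python sketch is only a guess at the idea: a hypothetical helper `percent_to_fraction` that turns a string ending in "%" into a fraction and leaves every other value untouched.

```python
def percent_to_fraction(value):
    """Convert a string like "95%" to 0.95; return other values unchanged."""
    stripped = value.strip()
    if stripped.endswith("%"):
        number = stripped[:-1].strip()
        return float(number) / 100
    return value


print(percent_to_fraction("95%"))
print(percent_to_fraction("clicks"))
```

Values that end in "%" but are not numeric would raise `ValueError` here; a real converter would need to decide how to treat them.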

Well, if the offered online service does not suit you, you can always use the above instructions.

Let's sum it up

Some online services let you export data in the simple and convenient CSV format. As the name suggests, it uses a comma (,) as a separator. However, many applications interpret this format rather loosely, which predictably leads to problems. In addition, UTF-8 is increasingly popular, while Excel or Google Sheets may expect ANSI.

Notepad++, for example, will solve the encoding problem, and to deal with the delimiter you only need to change the file extension to .txt and use the Text Import Wizard in Excel. All of these problems can also be solved by the free online service Normalization CSV, though its limit on the size of the uploaded file should be kept in mind. That's all I have. Thank you for your attention. Good luck!


Files and documents created on a computer always have their own encoding. It often happens that when exchanging files or downloading them from the Internet, the encoding in which they were created is not readable by our computer. The reasons for this may be different - either the lack of the necessary encoding in the program with which we want to open the file, or simply the absence of some program components (an additional font package, for example).

Below we will look at how to change the encoding of an unreadable file or document in different programs.

Changing the encoding on a browser page

For Google Chrome

  1. Select the menu “Settings” → “Tools”.
  2. Hover over the “Encoding” line; a list of available encodings appears in the browser.
  3. Select “Windows 1251” for Russian sites. If that doesn't help, try “Automatic”.

For Opera

  1. Click “Opera” → “Settings”
  2. On the left menu “Websites” → field 2 “Display” → “Customize fonts”
  3. In the “Encoding” field, select “Cyrillic (Windows 1251)”.

For Firefox

  1. Firefox → Settings → Content.
  2. Opposite the “Default font” line, click the “Advanced” button.
  3. At the bottom of the window, select “Encoding” → “Cyrillic (Windows 1251)”.

Changing encoding in Word

Let's look at the procedure for changing the encoding using Word 2010 as an example.

  1. Open the document.
  2. “File” tab → “Options”.
  3. Select the “Advanced” line. In the “General” section, next to the line “Confirm file format conversion when opening”, check the box. Click OK.
  4. Next, the “File Conversion” window will open. Select “Encoded Text” and click OK.
  5. Next, in the window that opens, mark “Other” and select from the list the encoding that will display the required text. In the “Sample” window you can see how the text is displayed in a particular encoding that we have chosen.

If the above procedure did not help display the document, you can try changing the font. Sometimes a document may appear as “squares” or other symbols if the program does not have the appropriate font.

Changing encoding in Excel

Let's look at the procedure for changing the encoding for Excel 97-2003 and 2007:

  1. Open an unreadable document using Notepad++.
  2. Select the menu Encoding → Convert to UTF-8.
  3. The characters will not change, only the encoding at the bottom of the screen will change. Next, select a character set. If it is Russian: Encoding → Character sets → Cyrillic → Windows-1251.
  4. Click "Save". Open the file in Excel. If the text is not readable, try repeating steps 3-4.

Changing text encoding

  1. Open the file in Notepad, the standard text editor.
  2. Click “Save As”.
  3. In the Save dialog that opens, choose where to save the file, set the document type to text, and select a different encoding.
  4. Save.
  5. Try to open the document.


With the move to newer versions of Windows, the problem of multiple coexisting encodings for the Russian language has almost disappeared.

Often, when working with various files in Excel, you may find unreadable characters in place of clear letters. This is caused by an incorrect encoding. In this article we will talk about how to change the encoding in Excel so that the text can be read.

Method one: using Notepad++

It so happens that it is easier to change text encoding in a third-party program than in Excel itself. That is why we will now look at how to change the encoding in Excel using the Notepad++ program.

So, to perform all the steps correctly, follow the instructions:

  1. Launch Notepad++.
  2. Click on the "File" button.
  3. In the menu that appears, select “Open” (this can also be done by pressing the key combination CTRL+O).
  4. In the Explorer window that appears, navigate to the required file, the encoding of which is incorrect, and click the "Open" button.
  5. With the file open, click the "Encoding" menu, which is located above the toolbar.
  6. In the menu, select the “Convert to UTF-8” item, since this is an encoding that Excel handles well.
  7. Click the "Save" button on the left side of the toolbar and close the program window.

That's all: now you know how to change the encoding in Excel using the first method. Although it is the fastest, simplest, and most convenient, there are other methods that are worth mentioning.
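When the CSV file is produced by a script rather than converted by hand, the same result can be approached by writing it as UTF-8 with a byte order mark (BOM). The Python sketch below assumes, as is commonly the case but not guaranteed for every Excel version, that Excel uses the BOM to recognize UTF-8 when a CSV file is opened by double-click. The function name `write_csv_for_excel` is hypothetical.

```python
import csv


def write_csv_for_excel(path, rows, delimiter=","):
    """Write rows to a CSV file encoded as UTF-8 with a leading BOM.

    The "utf-8-sig" codec adds the BOM automatically; newline="" lets the
    csv module control line endings itself.
    """
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(rows)
```

For example, `write_csv_for_excel("queries.csv", [["запрос", "клики"], ["excel utf-8", "10"]])` would create a file (the name is a placeholder) whose first three bytes are the UTF-8 BOM. On a system where Excel expects semicolons, pass `delimiter=";"`.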

Method two: using the "Text Wizard"

Excel itself also has tools that let you change the encoding. This is what we will talk about now, specifically the “Text Wizard”:

  1. Open Excel. Note that you should start the program itself with a blank sheet rather than double-click the file that displays incorrectly.
  2. Go to the "Data" tab.
  3. Click the "Get External Data" button and select "From Text" from the drop-down menu.
  4. In the Explorer window that appears, navigate to the file that has encoding problems and click the "Import" button.
  5. The “Text Wizard” window will now open. To change the encoding, open the "File origin" drop-down list and select “Unicode (UTF-8)”.
  6. Click “Next”.
  7. Skip the next window as well by clicking “Next”; leave all settings in it at their defaults.
  8. Now determine the data format of each column and select the matching item in the "Column data format" list. Finally, click “Finish”.

After all this, specify the first cell so that the data is placed at the start of the sheet.

It’s clear how to change the encoding in Excel using the second method. Let's move on to the third.

Method three: saving

How to change encoding in Excel? To implement the third method, you need to perform the following steps:

  1. Open the file in the program.
  2. Click "File".
  3. Select "Save As".
  4. In the window that appears, select the file type, then click "Tools" and choose "Web Document Options" from the drop-down menu.
  5. In the window that appears, go to the "Encoding" tab and select the desired encoding from the "Save document as" list.
  6. Click OK.

Now all that remains is to specify the folder where the file will be saved. The next time you open it, the text should be displayed correctly.

When you open a text file in Microsoft Word or another program (for example, on a computer whose operating system language differs from the language of the text in the file), the encoding helps the program determine how the text should be displayed on the screen so that it can be read.

Understanding text encoding

Information that is displayed as text is actually stored in a text file as numeric values. The computer converts these values into display characters using an encoding.

An encoding is a numbering scheme in which each text character in a set is assigned a specific numeric value. The encoding may contain letters, numbers and other symbols. Different languages often use different character sets, so many of the existing encodings are designed to represent the character sets of their respective languages.

Different encodings for different alphabets

The encoding information saved with the text file is used by the computer to display text on the screen. For example, in the "Cyrillic (Windows)" encoding, the character "Й" corresponds to the numeric value 201. When you open a file containing this character on a computer that uses the "Cyrillic (Windows)" encoding, the computer reads the number 201 and displays the "Й" character.

However, if the same file is opened on a computer that uses a different encoding by default, the screen will show whichever character corresponds to the number 201 in that encoding. For example, if the computer uses the "Western European (Windows)" encoding, the "Й" character from the Cyrillic source text file will be displayed as “É”, since that is the character that corresponds to the number 201 in this encoding.
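The value 201 from this example can be checked directly. In the Python sketch below, `cp1251` and `cp1252` are Python's codec names for the "Cyrillic (Windows)" and "Western European (Windows)" encodings; the variable names are just for illustration.

```python
raw = bytes([201])  # a single byte with the numeric value 201

cyrillic = raw.decode("cp1251")  # Cyrillic (Windows)
western = raw.decode("cp1252")   # Western European (Windows)

print(cyrillic)  # Й
print(western)   # É
```

The byte on disk is identical in both cases; only the table used to interpret it differs.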

Unicode: a single encoding for different alphabets

To avoid problems with encoding and decoding text files, you can save them in Unicode. This encoding includes most characters from all languages that are commonly used on modern computers.

Since Word is based on Unicode, all files in it are automatically saved in this encoding. Unicode files can be opened on any computer with an English operating system, regardless of the language of the text. In addition, on such a computer you can save files in Unicode that contain characters that are not in Western European alphabets (for example, Greek, Cyrillic, Arabic or Japanese).

Selecting encoding when opening a file

If the text in an open file is distorted or appears as question marks or squares, Word may have determined the encoding incorrectly. You can specify the encoding to use for displaying (decoding) the text.

    Open the File tab.

    Select Options.

    Select Advanced.

    Go to the General section and check the Confirm file format conversion when opening box.

    Note: When this check box is selected, Word displays the File Conversion dialog box whenever you open a file in a format other than Word (that is, a file that does not have a DOC, DOT, DOCX, DOCM, DOTX, or DOTM extension). If you work with such files frequently but don't usually need to select an encoding, be sure to clear this option so that the dialog box does not appear.

    Close and then reopen the file.

    In the File Conversion dialog box, select Encoded Text.

    In the File Conversion dialog box, select Other and choose the desired encoding from the list.

    In the Sample area, check whether the text displays correctly in the selected encoding.

If almost all of the text looks the same (for example, squares or dots), your computer may not have the correct font installed. In this case, you can install additional fonts.

To install additional fonts, do the following:

    Click the Start button and select Control Panel.

    Do one of the following:

    On Windows 7

      In Control Panel, select Uninstall a program.

      In the list of programs, click Microsoft Office, or Microsoft Word if it was installed separately from Microsoft Office, and click Change.

    On Windows Vista

      In Control Panel, select Uninstall a program.

      In the list of programs, click Microsoft Office, or Microsoft Word if it was installed separately from Microsoft Office, and click Change.

    On Windows XP

      In Control Panel, click Add or Remove Programs.

      In the Installed programs list, click Microsoft Office, or Microsoft Word if it was installed separately from Microsoft Office, and click Change.

    In the Change your installation of Microsoft Office group, click the Add or remove components button, and then click Continue.

    In the Installation options section, expand Office Common Tools, and then expand Multi-language support.

    Select the font you want, click the arrow next to it, and select Run from my computer.

Tip: When opening a text file in a particular encoding, Word uses the fonts defined in the Web Document Options dialog box. (To open the Web Document Options dialog box, click the Microsoft Office button, then click Word Options and select the Advanced category. In the General section, click the Web Document Options button.) Using the options on the Fonts tab of the Web Document Options dialog box, you can customize the font for each encoding.

Selecting encoding when saving a file

If you do not select an encoding when saving the file, Unicode will be used. In general, Unicode is recommended because it supports most characters in most languages.

If you plan to open the document in a program that does not support Unicode, you can select the desired encoding. For example, on an English-language operating system you can create a document in Traditional Chinese using Unicode. However, if the document will be opened in a program that supports Chinese but does not support Unicode, the file can be saved in the "Chinese Traditional (Big5)" encoding. As a result, the text will display correctly when you open the document in a program that supports Traditional Chinese.

Note: Because Unicode is the most comprehensive standard, some characters may not appear when saving text in other encodings. For example, suppose that a Unicode document contains text in both Hebrew and Cyrillic. If you save the file in the "Cyrillic (Windows)" encoding, the Hebrew text will not be displayed, and if you save it in the "Hebrew (Windows)" encoding, the Cyrillic text will not be displayed.

If you select an encoding standard that doesn't support some characters in the file, Word will mark them in red. You can preview the text in the selected encoding before saving the file.
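The same kind of check can be sketched in Python for the Hebrew-plus-Cyrillic example. This is an illustration of the idea, not Word's implementation; the function name `unsupported_chars` is hypothetical, and `cp1251` and `cp1255` are Python's codec names for "Cyrillic (Windows)" and "Hebrew (Windows)".

```python
def unsupported_chars(text, encoding):
    """Return the characters of text that the given encoding cannot represent."""
    missing = []
    for char in text:
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            missing.append(char)
    return missing


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom", written with escapes
mixed = "Привет" + " " + hebrew

print(unsupported_chars(mixed, "cp1251"))  # the Hebrew letters
print(unsupported_chars(mixed, "cp1255"))  # the Cyrillic letters
print(unsupported_chars(mixed, "utf-8"))   # empty list: Unicode covers both
```

Running such a check before saving shows in advance which characters would be lost in the chosen encoding.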

When you save a file as encoded text, the text for which the Symbol font is selected, as well as the field codes, are removed from the file.

Encoding selection

    Open the File tab and select Save As.

    In the File name field, enter a name for the new file.

    In the File type field, select Plain text.

    If the Microsoft Office Word - Compatibility Check dialog box appears, click Continue.

    In the File Conversion dialog box, select the appropriate encoding.

      To use the standard encoding, select the Windows (default) option.

      To use MS-DOS encoding, select the MS-DOS option.

      To set a different encoding, select the Other radio button and choose the desired item from the list. In the Sample area you can preview the text and check whether it displays correctly in the selected encoding.

      Note: To enlarge the document display area, you can resize the File Conversion dialog box.

    If the message "Text highlighted in red cannot be saved correctly in the selected encoding" appears, you can select a different encoding or check the box Allow character substitution.

    If character substitution is enabled, characters that cannot be displayed will be replaced with the nearest equivalent characters in the selected encoding. For example, an ellipsis is replaced by three dots, and corner quotes are replaced by straight ones.

    If the selected encoding has no equivalents for the characters highlighted in red, they will be saved as out-of-context characters (for example, as question marks).
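Character substitution of this kind can be imitated in a few lines of Python. The sketch below is not Word's algorithm: the substitution table `SUBSTITUTES` and the function name `encode_with_substitution` are assumptions made for the example, covering only the ellipsis and corner quotes mentioned above.

```python
# Hypothetical substitution table: nearest plain equivalents for a few characters.
SUBSTITUTES = {"…": "...", "«": '"', "»": '"'}


def encode_with_substitution(text, encoding):
    """Encode text, swapping unsupported characters for equivalents or "?"."""
    pieces = []
    for char in text:
        try:
            char.encode(encoding)
            pieces.append(char)
        except UnicodeEncodeError:
            # Use the nearest equivalent if one is known, otherwise a question mark.
            pieces.append(SUBSTITUTES.get(char, "?"))
    return "".join(pieces).encode(encoding)


print(encode_with_substitution("Wait… «ok»", "ascii"))
print(encode_with_substitution("snow \u2603", "ascii"))
```

The first call replaces the ellipsis and the corner quotes with plain equivalents; in the second, the snowman character has no entry in the table and becomes a question mark.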

    If the document will be opened in a program that does not wrap text from one line to the next, you can insert hard line breaks into the document. To do this, check the Insert line breaks box and specify the desired break character (carriage return (CR), line feed (LF), or both) in the End lines with field.

Finding encodings available in Word

Word recognizes multiple encodings and supports encodings that are included with the system software.

Below is a list of scripts and their associated encodings (code pages).

Writing system: Encodings (Font used)

  • Multilingual: Unicode (UCS-2 little endian, UTF-8, UTF-7) (Standard font for the "Normal" style of the localized version of Word)
  • Arabic: Windows 1256, ASMO 708
  • Chinese (Simplified): GB2312, GBK, EUC-CN, ISO-2022-CN, HZ
  • Chinese (Traditional): BIG5, EUC-TW, ISO-2022-TW
  • Cyrillic: Windows 1251, KOI8-R, KOI8-RU, ISO8859-5, DOS 866
  • English, Western European and others based on the Latin alphabet: Windows 1250, 1252-1254, 1257, ISO8859-x
  • Greek
  • Japanese: Shift-JIS, ISO-2022-JP (JIS), EUC-JP
  • Korean: Wansung, Johab, ISO-2022-KR, EUC-KR
  • Vietnamese
  • Indian (Tamil)
  • Indian (Nepali): ISCII 57002 (Devanagari)
  • Indian (Konkani): ISCII 57002 (Devanagari)
  • Indian (Hindi): ISCII 57002 (Devanagari)
  • Indian (Assamese)
  • Indian (Bengali)
  • Indian (Gujarati)
  • Indian (Kannada)
  • Indian (Malayalam)
  • Indian (Oriya)
  • Indian (Marathi): ISCII 57002 (Devanagari)
  • Indian (Punjabi)
  • Indian (Sanskrit): ISCII 57002 (Devanagari)
  • Indian (Telugu)

    To use Indian languages, you need to support them in the operating system and have the appropriate OpenType fonts.

    Only limited support is available for Nepali, Assamese, Bengali, Gujarati, Malayalam and Oriya.