
Convert Windows text to Ubuntu


Convert Windows text file to Ubuntu text file

I am trying to port a C++ program from Windows to Ubuntu. In the Ubuntu environment, my program reads a text file that was created on Windows; however, the carriage returns appear to interfere with reading the file, so I want to remove all of them.

I have tried the following commands on Windows to convert the text file to Linux format

Neither of these methods works; I am getting the following error:

What am I doing wrong here?

I have tried the following commands on Windows to convert the text file to Linux format

Try running the commands on your Ubuntu machine instead. tr is part of coreutils and is therefore always available; dos2unix needs to be installed.
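A minimal sketch of the tr approach (the file names are placeholders):

```shell
# Create a sample file with Windows (CRLF) line endings.
printf 'first line\r\nsecond line\r\n' > input.txt

# Delete every carriage return (\r); the result uses plain LF endings.
tr -d '\r' < input.txt > output.txt
```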

As a side note: in Python, a file object has a newlines attribute that records the end-of-line characters encountered so far. Is something similar available in C++?

nixCraft

Linux and Unix tutorials for new and seasoned sysadmins

HowTo: Convert Between Unix and Windows text files

How can I convert the newline [line break or end-of-line (EOL) character] between Unix and Windows text files?

  1. Almost all Unix commands and text editors display the EOL as Ctrl-M (^M) characters at the end of each line for text files created on MS-Windows operating systems.
  2. MS-Windows may not display line breaks at all for text files created on UNIX operating systems.

Option #1: dos2unix and unix2dos Commands

You can use the dos2unix and unix2dos commands as follows. To convert newlines in a Unix file to MS-Windows format, type:

$ unix2dos input.txt output.txt

$ cat -v output.txt

To convert newlines in an MS-Windows file to Unix format, type:

$ dos2unix input.txt output.txt

$ cat -v output.txt

Option #2: awk command

You can use the awk command to convert an MS-Windows file to Unix format; type:

$ awk '{ sub("\r$", ""); print }' input.txt > output.txt

$ cat -v output.txt

To convert newlines in a Unix file to MS-Windows format, enter:

$ awk 'sub("$", "\r")' input.txt > output.txt

$ cat -v output.txt

Please note that the cat command with the -v option is used to display non-printing characters using ^ and M- notation, except for LFD (line feed) and TAB.
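For example, cat -v renders each stray carriage return as ^M (a quick sketch; dos.txt is a throwaway sample file):

```shell
# Build a two-line DOS-style file and inspect it.
printf 'hello\r\nworld\r\n' > dos.txt
cat -v dos.txt
# Each line ends in ^M, the visible form of the carriage return:
# hello^M
# world^M
```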

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin and trainer in the Linux operating system and Unix shell scripting. He has worked with global clients in various industries, including IT, education, defense and space research, and the nonprofit sector.



This saved me a lot of effort.

It replaces line endings in Linux so that Windows Notepad can understand where a new line starts. Before that command, Notepad showed everything on one long line, even though in Linux there were multiple lines.

I’m using Ubuntu 12.04. For unix2dos and dos2unix version 5.3.1 (2011-08-09), you must use one of these two syntaxes (new-file mode):

# unix2dos -n infile outfile

# unix2dos --newfile infile outfile

If you use the syntax below, then both a.txt and b.txt will be converted to DOS text files. Both original files will be overwritten and you WILL LOSE the originals:

# unix2dos a.txt b.txt

I have text like "61403334343,[email protected] password=clas2012". How can I put the username and password on separate lines? I am using something like sed -e 's/password//g' 's/username\n/g'. Please suggest.
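A possible sketch for that record, assuming GNU sed and using a placeholder address (the real one is redacted above):

```shell
# Sample record; user@example.com stands in for the redacted address.
printf '61403334343,user@example.com password=clas2012\n' > creds.txt

# GNU sed: replace the space before "password=" with a newline so the
# username field and the password field land on separate lines.
sed 's/ password=/\npassword=/' creds.txt
# 61403334343,user@example.com
# password=clas2012
```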

Thanks a lot, this saved me a lot of time, using awk to convert newlines in a Unix file to MS-Windows format.


Convert image to text

I received a scanned image document from my bank and I want to convert it to a normal text document, with images, in Ubuntu.

Is there a tool for this?

There are a number of OCR readers for Linux that can convert images to text. Look at the following options:

All the above, except ocropus, are present in the Ubuntu repository in a package of the same name.

Different readers support different image formats, so you may be limited in your options by the file format your document is in. Alternatively, you can use the convert tool from ImageMagick to change the format if you wish to use a particular OCR reader.

Adapted from my answer here.

You need to install "tesseract-ocr" on your Linux machine first.

You can do it manually from the CLI, or I have written PHP code for the same task, which you can use if you want.

Note: to run this code, the exec command must be enabled in php.ini.

Put this code in the root folder and access it from the browser.

Note: the 1.png file should be present in your current directory.


Is there a better pdf to text converter than pdftotext?

I'm using pdftotext (part of poppler-utils) to convert PDF documents to text. It works, for the most part, but one thing I wish it did was insert blank lines between separate paragraphs instead of mashing them together.

Is there a way to get pdftotext to do this? And if not, is there another PDF-to-text utility that can?

You could try ebook-convert from Calibre.

If anything, I'd say it errs in the other direction: too many line breaks.

Another thing I'd definitely consider though is converting to HTML using pdfreflow, and then convert the HTML to TXT.

If you are using pdftotext, you can use the -layout flag to preserve the layout of the text on the pages of your input PDF file:

$ pdftotext -layout input.pdf output.txt

As a fan of open source (and automation) I hate to say this, but the best results I just got (on quite a large, complex PDF) were to open it in Adobe Reader, then choose File|Save As Text.

(I am pre-processing for text analysis experiments, not as a reader, but I think my first and second choice would be the same.)

I've been comparing the output side-by-side. My second choice is ebook-convert.

Adobe: left in form feeds (FF) for page breaks, left in page numbers, hasn't converted headings/paragraphs to single lines, but it has fixed hyphens. Junk that was hidden in the PDF did not get output. Correctly got the big capitals at the start of sections, e.g. "The", not "T he".

ebook-convert: Left in page numbers, and some hidden junk in header/footer (but no FFs). Converts most paragraphs to be single lines. The ones it missed are double-spaced though! Bullets don't always line up with the text. Correctly got "The" at the start of the chapter.

pdftotext (without -layout): Not bad, bullets line up, but header/footer noise. FFs are in there. Hyphens removed. Worst for the big letters at the start of a chapter: "T\n\nhe".

pdftotext (with -layout): Similar, but more indents. "T he" at the start of a chapter.

pdftohtml >> pdfreflow >> htmltotext: It removed page numbers, but still junk in header/footer. "T he" for start of chapter. Hyphens removed. (It uses multiple lines per paragraph, yet they are not the same line breaks as in the other versions!)

If you have a Google account, you can use Google Docs to upload the PDF and transform it into editable text.

I also tried pypdf and compared it against pdftotext on two documents. It had more linebreaks and split some section names (REFERENCES was R E F E R E N C E S).

pdf2txt output complete garbage.

I often use pdfBox (Java) if pdftotext screws up the output. You might give it a try.


Convert Text File Encoding

I frequently encounter text files (such as subtitle files in my native language, Persian) with character-encoding problems. These files are created on Windows and saved with an unsuitable encoding (it seems to be ANSI), which makes them look like gibberish and unreadable, like this:

In Windows, one can fix this easily using Notepad++ to convert the encoding to UTF-8, like below:

And the correct readable result is like this:

I've searched a lot for a similar solution on GNU/Linux, but unfortunately the suggested solutions (e.g. this question) don't work. Most often, I've seen people suggest iconv and recode, but I have had no luck with these tools. I've tested many commands, including the following, and all have failed:

None of these worked!

I'm using Ubuntu 14.04 and I'm looking for a simple solution (either GUI or CLI) that works just as Notepad++ does.

One important aspect of being "simple" is that the user should not be required to determine the source encoding; rather, the tool should detect it automatically, and only the target encoding should be provided by the user. Nevertheless, I would also be glad to know about a working solution that requires the source encoding to be provided.

If someone needs a test case to examine different solutions, the above example is accessible via this link.

These Windows files with Persian text are encoded in Windows-1256, so they can be deciphered by a command similar to those the OP tried, but with different charsets. Namely:

recode Windows-1256..UTF-8 <Windows_file.txt > UTF8_file.txt

(revised following the original poster's comments; see below)

This one assumes that the LANG environment variable is set to a UTF-8 locale. To convert to any encoding (UTF-8 or otherwise), regardless of the current locale, one can say:

iconv -f WINDOWS-1256 -t UTF-8 <Windows_file.txt > UTF8_file.txt

The original poster is also confused about the semantics of text-recoding tools (recode, iconv). As the source encoding (before the .. in recode, or after -f in iconv), one must specify the encoding the file was actually saved with (by the program that created it), not a (naïve) guess based on the mojibake shown by programs that try, and fail, to read it. Trying either ISO-8859-15 or WINDOWS-1252 for Persian text was obviously a dead end: those encodings simply do not contain any Persian letters.
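To make this concrete, a small sketch: Windows-1256 byte 0xC7 is the letter alef (U+0627), which UTF-8 encodes as the two bytes d8 a7 (the file names are placeholders):

```shell
# One Windows-1256 byte: 0xC7 (octal 307) is the Arabic/Persian letter alef.
printf '\307' > Windows_file.txt

# Name the real source encoding with -f and the target with -t.
iconv -f WINDOWS-1256 -t UTF-8 Windows_file.txt > UTF8_file.txt

# Show the bytes: the UTF-8 encoding of alef is d8 a7.
od -An -tx1 UTF8_file.txt
```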

Apart from iconv, which is a very useful tool either on its own or in a script, there is a really simple solution I found while trying to figure out the same problem for Greek charsets (Windows-1253 + ISO-8859-7).

All you need to do is open the text file through Gedit's "Open" dialog rather than by double-clicking it. At the bottom of the dialog box there is a drop-down for Encoding, which is set to "Automatically Detected". Change it to "Windows-125x" or another suitable codeset, and the text will be perfectly readable in Gedit. You can then save it with UTF-8 encoding, to be sure you won't run into the same issue again.

As a complementary solution to the problem, I have prepared a useful Bash script based on the iconv command from Incnis Mrsi's answer:

Save this script as fix-encoding.sh, give it execute permission using chmod +x fix-encoding.sh, and use it like this:

This script will try to fix the encoding of any number of files passed to it as input. Note that the files are fixed in place, so their contents will be overwritten.
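The script itself does not survive above; a minimal sketch of what it might look like, assuming a fixed Windows-1256 source encoding (the commenter's original may differ):

```shell
#!/bin/bash
# fix-encoding.sh -- illustrative sketch, not the commenter's original script.
# Converts each file named on the command line from Windows-1256 to UTF-8,
# overwriting it in place via a temporary file.
fix_encoding() {
    for f in "$@"; do
        tmp=$(mktemp)
        iconv -f WINDOWS-1256 -t UTF-8 "$f" > "$tmp" && mv "$tmp" "$f"
    done
}

fix_encoding "$@"
```

Invoked as ./fix-encoding.sh subtitle1.srt subtitle2.srt, it would rewrite both files in UTF-8.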

I don't know if this works with Farsi: I use Gedit; it raises an error on the wrong encoding, and I can choose what I want to convert to UTF-8. In my case it was just text, not lit format, but here is a screenshot!

Sorry, I finally got through my text files, so now they are all converted.

I loved Notepad++ too, and still miss it.

If you like working in a GUI instead of the CLI, as I do:

  1. Open file with Geany (editor)
  2. Go to the File menu -> Reload as
  3. Choose the assumed encoding to change the gibberish into identifiable characters in your language. For example, to read Greek subs I would reload as West European -> Greek (Windows-1253)
  4. Go to the Document menu -> Set Encoding -> Unicode -> UTF-8
  5. Save

The working solution I found is the Microsoft Visual Studio Code text editor, which is freeware and available for Linux.

Open the file whose encoding you want to convert in VS Code. At the bottom of the window there are a few buttons; one of them shows the file encoding, as below:

Clicking this button pops up a menu with two items. From this menu select the "Reopen with Encoding" option, like below:

This will open another menu with a list of different encodings, as shown below. Now select "Arabic (Windows 1256)":

This will fix the gibberish text like this:

Now click the encoding button again and this time select the "Save with Encoding" option, just as below:

And in the new menu select the "UTF-8" option:

This will save the corrected file using the UTF-8 encoding:


mstep: Linux uses (LF) to mark the end of lines. DOS/Windows uses (CR)(LF), and classic Mac OS, just to be different, used (CR). The usual complaint you hear about this is that in Windows Notepad, Linux text files display all on a single line, with squares showing where the lines should end.
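The three conventions can be inspected directly with od (a quick sketch):

```shell
# The same one-character line under each convention, byte by byte.
printf 'a\n'   | od -An -c   # Linux/Unix:      a \n
printf 'a\r\n' | od -An -c   # DOS/Windows:     a \r \n
printf 'a\r'   | od -An -c   # classic Mac OS:  a \r
```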