Extract Text from PDF Document in C#.NET

A C#.NET PDF Text Extraction Control for Developers to Extract Text from PDF

Extracting text from a PDF file can be a frustrating task, and a dedicated PDF Text Extraction Library for C#.NET solves the problem. This library is designed to extract text from a PDF file. You can use it in many kinds of applications (Web pages, word processing documents, PowerPoint presentations, desktop publishing software, search and indexing applications, or content management systems). The library runs on common .NET Framework versions and, being standalone, does not require the installation of any additional software.

With our C# PDF Text Extraction Library, you can extract text from a batch of PDF files. This page describes how to extract text from a PDF document quickly and easily with an advanced PDF Text Extractor, without using other PDF processing tools. The extracted content is saved to text files, where it can be easily searched, archived, repurposed and managed.
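
For batch work, it can help to separate the file bookkeeping from the extraction call itself. The following is a minimal sketch under stated assumptions, not part of the SDK: the class name BatchTextExtraction and its methods are hypothetical helpers, and the extraction step is passed in as a callback so that the sketch relies only on standard .NET file APIs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the SDK): pairs PDF files with output text files.
public static class BatchTextExtraction
{
    // Pairs every *.pdf file in inputDir with a .txt path of the same base name in outputDir.
    public static List<KeyValuePair<string, string>> PlanJobs(string inputDir, string outputDir)
    {
        var jobs = new List<KeyValuePair<string, string>>();
        foreach (string pdfPath in Directory.GetFiles(inputDir, "*.pdf"))
        {
            string txtName = Path.GetFileNameWithoutExtension(pdfPath) + ".txt";
            jobs.Add(new KeyValuePair<string, string>(pdfPath, Path.Combine(outputDir, txtName)));
        }
        return jobs;
    }

    // Calls the supplied extraction callback once per PDF and returns how many calls succeeded.
    // A failure on one file is reported and does not stop the rest of the batch.
    public static int Run(string inputDir, string outputDir, Action<string, string> extract)
    {
        Directory.CreateDirectory(outputDir);
        int succeeded = 0;
        foreach (KeyValuePair<string, string> job in PlanJobs(inputDir, outputDir))
        {
            try
            {
                extract(job.Key, job.Value);
                succeeded++;
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine("Skipped " + job.Key + ": " + ex.Message);
            }
        }
        return succeeded;
    }
}
```

The callback receives the input PDF path and the output text path, so it can wrap whichever extraction call your project uses, for example the PdfProcessorExtractText call from the demo code on this page.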

Requirements for Extracting Text from PDF in C#.NET

To extract text from a PDF document, your PDF file should meet some basic conditions. First, the PDF file must contain text or images to extract. Second, the PDF file must not carry security restrictions, because such restrictions disable text selection.

Your development environment should also meet the .NET imaging application requirements below:

.NET Framework 2.0, 3.0, 3.5 & 4.0
Microsoft Visual Studio 2005 and above
Windows 2000, Windows XP, Windows 7, Windows Server 2003, etc

Extract Text from PDF with C# Demo Code

Extract Text from Certain PDF Page with C# Demo Code

You can extract text from a certain PDF page in C#.NET programming by using our mature C# PDF Text Extraction SDK. The trial version of this PDF Text Extraction API is free of charge and can be downloaded from our website. It supports not only text extraction from a PDF document, but also extraction of an individual image or all images from the PDF document. The steps below guide you through text extraction from a certain PDF page.

First, go to our website and download the free evaluation version of .NET Imaging Control.
Next, open Visual Studio 2005 or a later version and create a new C# project.
Then, activate the .NET Imaging Control License and copy the generated license text file to your newly created project folder.
Now, find the two references ("YiiGo.Imaging.Basic.dll" and "YiiGo.Imaging.PDF.dll") under the bin folder of the unzipped evaluation package and add them to your C#.NET project.
After that, copy the C# sample code below into your project and compile it to extract text from a certain PDF page.
Finally, run your C#.NET project.

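     using YiiGo.Imaging.Basic;
     using YiiGo.Imaging.Basic.Core;
     using YiiGo.Imaging.Basic.Codec;
     using YiiGo.Imaging.PDF;

     public class PdfTextPageDemo
     {
         YiiGoImaging PDF = new YiiGoImaging();

         public void PdfProcessorExtractTextPage()
         {
             string pdfInputFile = @"C:/1.pdf";          // source PDF document
             string pdfPageNumberStart = "0";            // start of the page range
             string pdfPageNumberStop = "4";             // end of the page range
             string pdfOutputFile = @"C:/extract.txt";   // text file that receives the extracted text

             // Extract the text of the chosen page range into the output text file.
             PDF.PdfProcessorExtractText(pdfInputFile, pdfPageNumberStart, pdfPageNumberStop, pdfOutputFile);
         }
     }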
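
Once the text file has been written, it can be searched with ordinary .NET file APIs. The following is a minimal sketch, assuming the extraction produced a plain text file; the class ExtractedTextSearch and its FindLines method are hypothetical names chosen for illustration and are not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the SDK): searches an extracted text file.
public static class ExtractedTextSearch
{
    // Returns the 1-based numbers of the lines that contain the keyword, ignoring case.
    public static List<int> FindLines(string textFilePath, string keyword)
    {
        var hits = new List<int>();
        string[] lines = File.ReadAllLines(textFilePath);
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(i + 1);
            }
        }
        return hits;
    }
}
```

For example, ExtractedTextSearch.FindLines(@"C:/extract.txt", "invoice") would list the lines of the extracted file that mention "invoice".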

Extract Text from Certain PDF Area with C# Demo Code

You can extract text not only from a certain PDF page in C#.NET, but also from a certain PDF area. The steps for extracting text from a certain PDF area are the same as those for extracting text from a certain PDF page, except for the C# sample code. Follow the steps above and copy the following demo code to extract text from a certain PDF area.

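     using System.Drawing;   // assumption: the sample's "rectangle" is System.Drawing.Rectangle
     using YiiGo.Imaging.Basic;
     using YiiGo.Imaging.Basic.Core;
     using YiiGo.Imaging.Basic.Codec;
     using YiiGo.Imaging.PDF;

     public class PdfTextAreaDemo
     {
         YiiGoImaging PDF = new YiiGoImaging();

         public void PdfProcessorExtractTextArea()
         {
             string pdfInputFile = @"C:/1.pdf";                        // source PDF document
             Rectangle extractedArea = new Rectangle(0, 0, 300, 300);  // area to extract (x, y, width, height)
             string pdfOutputFile = @"C:/extract.txt";                 // text file that receives the extracted text

             // Extract the text found inside the chosen area into the output text file.
             PDF.PdfProcessorExtractArea(pdfInputFile, extractedArea, pdfOutputFile);
         }
     }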

Other Useful PDF Processing Functions

If you want to extract text from PDF in VB.NET, you may use our professional VB.NET PDF Extraction SDK. Users can also find the step-by-step guide with VB sample code online.

You can also carry out other C# PDF processing tasks. If you want to create PDF documents in C#.NET, refer to our C#.NET PDF document creation guide. A C# PDF merging SDK and its detailed tutorial are also offered online so that users can merge PDF files in C# programming quickly. Developers who are not familiar with splitting PDF files can download the trial version of our C# PDF Splitting Library and follow the online guide to try it.
