The PDBLib library can extract both text and graphics, but it will not give
you the document structure.
As I said before, you need to define the problem clearly. If "the other
format" is a text file, then just extracting text would be sufficient.
You say you also want graphics, but that does not tell me how faithfully
your "other format" is expected to represent the original PDF.
You almost certainly need the document structure if you are trying to
make a converter for a format in which you expect the converted document
to look exactly like the original PDF. But otherwise, there is a whole
gamut of features you might or might not need, depending on what the
to be shown in that.
There is a product from Stellent, which is now part of Oracle's
But unfortunately it does not do nearly as good a job of handling
character encoding when it comes to extracting the text.
When I was researching this stuff last year, I found that Adobe claimed
to have a library that provided more elaborate access to PDF data
structures, beyond that which is found in the library they provide with
the full retail Acrobat product. But in the Acrobat product, they did
an even worse job of dealing with character encoding (that is, they
did not bother to do anything with it: you got bytes back and if you
did not already know the encoding, too bad), and I have no reason to
believe that the other library their web site mentioned would be any better.
Note also that PDF is essentially derived from PostScript; a page is really
a series of drawing instructions in that imaging model. One of the reasons text extraction
is difficult is that the document only expresses where on the page the
text should go, and that may or may not relate well to the logical flow
of the text (I have even seen PDF documents where each character of the
text appears in the PDF in the exact opposite order from that in which
it appears visually).
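
To make the ordering problem concrete, here is a minimal sketch of how an extractor might rebuild reading order from positions alone. The Fragment type, the reading_order function and the sample coordinates are all hypothetical, and the sketch assumes unrotated, left-to-right, single-column text, which a real PDF does not guarantee.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """One piece of drawn text (hypothetical type, for illustration only)."""
    x: float   # left edge on the page
    y: float   # baseline height, with the origin at the bottom-left
    text: str


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then sort each line left to right."""
    lines = []  # each entry is [baseline_y, [fragments on that line]]
    # Higher y means nearer the top of the page, so visit those first.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    return "\n".join(
        "".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    )


# Made-up content stream that emits each line's characters right to left.
stream = [
    Fragment(30, 700, "F"), Fragment(20, 700, "D"), Fragment(10, 700, "P"),
    Fragment(20, 686, "k"), Fragment(10, 686, "o"),
]

as_stored = "".join(f.text for f in stream)   # order in the file
as_seen = reading_order(stream)               # order on the page
```

Joining the fragments in file order gives "FDPko", while sorting by position recovers "PDF" and "ok" on separate lines. Multiple columns, tables and rotated text all break this simple heuristic, which is part of why a decent converter is expensive to write.
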
Another reason text extraction is difficult is that the character data
in the PDF may or may not be in some recognized character encoding;
often, the text is instead just an index for each character into a glyph
table stored within the PDF, which in turn may not necessarily have any
direct way to map a glyph back to the actual character it represents.
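
As a rough illustration of the glyph-table problem, the sketch below decodes a string of glyph indices through a lookup table. The table, the codes and the decode_glyphs function are invented for the example; in a real PDF such a mapping, when it exists at all, typically comes from the font's ToUnicode CMap.

```python
def decode_glyphs(codes, glyph_to_char, missing="\ufffd"):
    """Map each glyph index to a character, marking unmapped glyphs.

    glyph_to_char is a hypothetical table; a font may supply only part
    of it, or none of it.
    """
    return "".join(glyph_to_char.get(code, missing) for code in codes)


# Hypothetical subset font: indices assigned in order of first use,
# so they bear no relation to ASCII or any other standard encoding.
glyph_to_char = {1: "H", 2: "e", 3: "l", 4: "o"}

# The bytes as they might appear in a text-showing operator.
codes = bytes([1, 2, 3, 3, 4, 5])

decoded = decode_glyphs(codes, glyph_to_char)   # "Hello" plus U+FFFD
naive = codes.decode("latin-1")                 # control characters only
```

Glyph 5 has no entry in the table, so it comes back as the replacement character; without the table, every one of these bytes is meaningless, which is the "you got bytes back" situation described above.
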
So if your "other format" is similar in nature, that is, if it is
sufficient simply to _draw_ the text without caring what the actual
characters are, the job might be simpler than if you actually needed to
interpret the text.
But regardless, the fact is, for what it would cost to write a decent
PDF-to-whatever converter, you can buy a lot of licenses for the retail
Acrobat product (so your users can edit PDFs instead of whatever the
do not have to pay anything; just use Acrobat Reader.
If you still believe you do have a critical business case that requires
you to implement the converter, I am not personally aware of any good
products that can handle the full document structure that might be
required (again, depending on your _actual_ needs, which you have not
really expressed very well yet). But you surely know how to use a web
search engine. :)