Viewing 6 posts - 1 through 6 (of 6 total)
  • #7159 Reply

    Rafael Steil

    Hello,

    there are some PDFs for which pdfDoc.extractText returns the text without any blank spaces, as if it were a single huge word. Likewise, pdfDoc.extractTextRects either returns no rects when called with puctuatedBySpace set to true, or one rect per character when called with false.

    I looked for any method that allowed me to configure the threshold of how many points or pixels should be considered as white space (as some other tools have), but couldn’t find any.

    The PDF can be downloaded at https://dl.dropboxusercontent.com/u/1492042/test.pdf
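
The kind of configurable whitespace threshold described above can be sketched as follows. This is only an illustration and not part of PlugPDF: `GapSpacer`, `CharBox` and `gapThreshold` are hypothetical names, and it assumes the per-character rects have already been converted into boxes with left and right x-coordinates in points, sorted left to right on a single line.

```java
import java.util.Arrays;
import java.util.List;

public final class GapSpacer {

    /** Hypothetical per-character box: the glyph and its horizontal extent in points. */
    public static final class CharBox {
        final String text;
        final double left;
        final double right;

        public CharBox(String text, double left, double right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Joins the characters of one line, inserting a space whenever the horizontal
     * gap between a character and the previous one exceeds gapThreshold points.
     * Assumes the boxes are sorted left to right and share one baseline.
     */
    public static String joinWithSpaces(List<CharBox> boxes, double gapThreshold) {
        StringBuilder out = new StringBuilder();
        CharBox prev = null;
        for (CharBox box : boxes) {
            if (prev != null && box.left - prev.right > gapThreshold) {
                out.append(' ');
            }
            out.append(box.text);
            prev = box;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<CharBox> line = Arrays.asList(
            new CharBox("H", 0.0, 6.0),
            new CharBox("i", 6.5, 8.0),
            new CharBox("y", 11.0, 16.0),
            new CharBox("o", 16.4, 21.0));
        System.out.println(joinWithSpaces(line, 1.5)); // prints "Hi yo"
    }
}
```

A fixed threshold in points is the simplest choice. A value relative to the font size or to the average character width would probably hold up better across documents, but that is a guess and has not been tried on the PDF linked above.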

    #7178 Reply

    Dr. Plug
    Moderator

    Hello, Rafael

    Thank you for your inquiry. We’ll look for the cause as soon as possible.
    Could you wait a few days?

    Thank you,
    Dr.Plug

    • This reply was modified 1 year, 7 months ago by  Dr. Plug.
    #7185 Reply

    Rafael Steil

    Sure, no problem. I’ll keep an eye on this thread. Thanks a lot.

    #7229 Reply

    Dr. Plug
    Moderator

    Hello, Rafael

    I’m sorry for the late reply.
    PlugPDF does actually support this function (it returns a string that includes the blank spaces),
    so if you test it with a PDF from the sample files, it works well.
    Unfortunately, your PDF file is a special case, so it is hard to fix the function for your PDF.
    Fixing the issue would require a lot of changes,
    so I’m sorry, but we can’t support this issue.
    If you have any other questions, we’ll support you as much as we can.

    Thank you,
    Dr.Plug

    • This reply was modified 1 year, 7 months ago by  Dr. Plug.
    #7235 Reply

    Rafael Steil

    Thanks for the explanation. In fact, PDFs from newspapers usually contain a lot of garbage, which is a pain :(. Nevertheless, the probability of other people facing the same problem grows at the same pace as the popularity of PlugPDF, which is a great library at a great price, so it may be worth keeping this issue open for an eventual future release. Just to give you some perspective, this particular PDF is from the biggest Brazilian newspaper.

    Thanks again for the help on this issue, I appreciate it.

    #7236 Reply

    Dr. Plug
    Moderator

    Hello, Rafael

    Thank you for your feedback!
    The PlugPDF team will discuss this issue.
    As you said, I also think we should keep this issue open for a future release.
    I hope this is good news for you.

    Thank you,
    Dr.Plug

Reply To: Extracted text is returned without any white spaces