
  • Getting data from a PDF into csv or excel

    Hello,

    I am trying to get data from a few PDF documents into a spreadsheet and eventually into Stata. Does anyone have experience with this? I have taken a screenshot: https://imgur.com/a/LweFy. I'd like each of the boxes to become an observation.

    So far I have been using Adobe DC and Nitro Pro to convert them to Excel, but each has its own set of problems.

    I also have a few specific questions:
    - Is there a way to cut the PDF documents so that each column becomes a separate document? My next step is to try doing that in Microsoft Word.
    - Is there a way to select only the shared boxes (as shown in the screenshot)?

    Many thanks for any assistance.

    Khat

  • #2
    Have you taken a look at Tabula (http://tabula.technology)? It's not perfect, but it's open source and I've had good experiences with it in the past. That said, you have some unusual formatting in your PDF, so your results may vary.
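
    Whichever converter you end up using, the "one box = one observation" step could be scripted once you have plain text out of the PDF. Below is a rough sketch in Python (standard library only). It assumes, and this is only a guess about your files, that each box comes out as a run of consecutive lines with a blank line between boxes. The function names and the `field1`, `field2`, ... column names are made up for illustration.

```python
import csv
import io


def blocks_to_rows(text):
    """Group consecutive non-blank lines into blocks; each block is one row."""
    rows = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the current block (assumed box boundary).
            rows.append(current)
            current = []
    if current:
        rows.append(current)
    return rows


def rows_to_csv(rows):
    """Write rows as CSV text, padding short rows so every row has equal width."""
    width = max((len(r) for r in rows), default=0)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Hypothetical generic column names; rename them once the layout is known.
    writer.writerow([f"field{i + 1}" for i in range(width)])
    for r in rows:
        writer.writerow(r + [""] * (width - len(r)))
    return buf.getvalue()


if __name__ == "__main__":
    sample = "Box A\n12\n\nBox B\n7\n3\n"
    print(rows_to_csv(blocks_to_rows(sample)))
```

    The CSV this writes can then be read into Stata with `import delimited`. If your boxes are not separated by blank lines in the extracted text, the boundary test in `blocks_to_rows` would need to change.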
