Python Program to Read a Book (docx Word Document) & Store It in a DataFrame
The following code reads a book stored on your system as a Word document and stores it in a DataFrame in Python.
- Step 1: Convert the PDF book into .docx format.
- Step 2: Import the necessary libraries in the code.
- Step 3: Initialize the address (path to the file to be read) and the dataframe.
- Step 4: Create a function that takes the address (path to the file to be read) as input and stores the file's contents in a dataframe.
- Step 5: Call the function, passing the address as the parameter.
- Step 6: End.
How Does it Work?
The python-docx package, which is imported in code as "docx", lets Python read and access .docx documents. To install it, run the following at your command prompt:
pip install python-docx
We then import this package in our code to access the .docx document. With it, you can open the document and read all the paragraphs of the Word document.
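Because the module is imported as docx but the distribution on PyPI is named python-docx, it can help to confirm which one is installed before running the program. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and chosen for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# python-docx is the distribution that provides the docx module used in this post.
print("python-docx:", installed_version("python-docx"))
# A separate, older distribution is published under the name docx.
print("docx:", installed_version("docx"))
```

If the first line prints None, installing python-docx with pip should make the import in the program below work.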
Program/Code To Read Paragraphs in Word Docx:

    from docx import Document

    address = 'H:/Work/Practice/OOW/1st_text/Heidi_w.docx'  # path to the file in your system
    text_chunks = []  # create an empty list to hold the paragraphs

    def doc_to_df(address):  # define a function
        document = Document(address)  # open the document
        for paragraph in document.paragraphs:  # read each paragraph in turn
            text_chunks.append(paragraph.text)  # append its text to the list

    doc_to_df(address)  # call the function
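The body text of a Word document is represented as a sequence of paragraphs (text inside tables is not part of document.paragraphs). The for loop reads each paragraph in turn and appends its text to text_chunks. Passing that collection to pandas.DataFrame then gives one row per paragraph.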
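The last step, turning the collected paragraph text into a pandas DataFrame, could look like the sketch below. It is one possible arrangement rather than the original program: the helper paragraphs_to_df, the column name "text", and the sample strings are assumptions made for illustration, and pandas must be installed.

```python
import pandas as pd


def paragraphs_to_df(texts):
    """Build a one-column DataFrame with one row per paragraph string."""
    # "text" is an assumed column name, not something the original code defines.
    return pd.DataFrame({"text": list(texts)})


def doc_to_df(address):
    """Open the .docx file at `address` and return its paragraphs as a DataFrame."""
    # Imported here so paragraphs_to_df can be tried without python-docx installed.
    from docx import Document

    document = Document(address)
    return paragraphs_to_df(p.text for p in document.paragraphs)


# Hypothetical sample paragraphs, standing in for a real document.
sample = ["Heidi", "", "Chapter I"]
df = paragraphs_to_df(sample)
print(df.shape)  # (3, 1)

# With a real file, the call would be:
# df = doc_to_df('H:/Work/Practice/OOW/1st_text/Heidi_w.docx')
```

Blank paragraphs come through as empty strings; if they are not wanted, df[df["text"].str.strip() != ""] keeps only the rows that contain text.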
Python To Read Word Document – DataFrame