Selenium: How to get the text of the entire page

If you use Selenium for automation, you may need to get the content of the whole page. Selenium can return the full HTML source of the current page with one line of code:

  • python
driver.page_source
  • java / groovy
driver.getPageSource();

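You can also get only the contents of the body element. Note that innerHTML returns the body's HTML markup, not plain text; for the visible text only, read the element's text instead (element.text in Python, element.getText() in Java):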

  • python
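from selenium.webdriver.common.by import By
# find_element_by_tag_name was deprecated and later removed in Selenium 4
element = driver.find_element(By.TAG_NAME, "body")
element.get_attribute('innerHTML')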
  • java / groovy
WebElement element = driver.findElement(By.tagName("body"));
element.getAttribute("innerHTML");
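
Because innerHTML returns markup rather than plain text, you may want to strip the tags yourself when the element's text property is not an option. The following is a minimal sketch using only Python's standard library; the names VisibleTextParser and html_to_text are made up for this example. It is an approximation: it skips the contents of script and style tags, but it cannot know which elements are hidden by CSS the way a real browser does.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def get_text(self):
        # One text node per line
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return an approximation of the text contained in an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.get_text()
```

You could pass it the string returned by get_attribute('innerHTML') or driver.page_source, for example html_to_text(driver.page_source).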

The code above works in most cases but may fail with some drivers (such as HtmlUnitDriver). The following alternative reads the same property through JavaScript; it produces similar output and works more widely:

// Select the body so the result covers the whole page
WebElement element = driver.findElement(By.tagName("body"));
String contents = (String) ((JavascriptExecutor) driver).executeScript("return arguments[0].innerHTML;", element);
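
The same fallback idea can be sketched in Python. The helper name get_body_html is hypothetical; it assumes a driver object that offers the standard find_element, get_attribute and execute_script calls, and it only switches to JavaScript when the attribute lookup fails or returns nothing.

```python
def get_body_html(driver):
    """Return the body's innerHTML, falling back to JavaScript if needed."""
    # "tag name" is the locator string behind By.TAG_NAME
    body = driver.find_element("tag name", "body")
    try:
        html = body.get_attribute("innerHTML")
    except Exception:
        # Kept broad for the sketch; in real code you would likely catch
        # selenium.common.exceptions.WebDriverException instead.
        html = None
    if html is None:
        # Ask the browser for the property through JavaScript
        html = driver.execute_script("return arguments[0].innerHTML;", body)
    return html
```

Call it as get_body_html(driver) once the page has loaded.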

Full example for Python:

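from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the chromedriver path goes through a Service object
# (older versions accepted the path as the first positional argument)
driver = webdriver.Chrome(service=Service('./chromedriver_linux64/chromedriver'))
driver.maximize_window()
driver.get("https://www.google.com/ncr")
# .text returns the visible text of the body element
print(driver.find_element(By.TAG_NAME, "body").text)
driver.quit()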

Result:

Gmail
Images
Sign in
Google offered in: french
A privacy reminder from Google
REMIND ME LATER
REVIEW NOW
France
PrivacyTermsSettings
AdvertisingBusinessAbout

Note that if you don't provide the path to your chromedriver executable, you may get an error like:

FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver': 'chromedriver'

os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
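
One way to catch this problem before the browser starts is to check for the driver binary yourself. The sketch below uses only the standard library; resolve_chromedriver is a made-up helper name, and the error messages are illustrative rather than Selenium's own.

```python
import os
import shutil


def resolve_chromedriver(explicit_path=None):
    """Return a usable chromedriver path or raise FileNotFoundError."""
    if explicit_path is not None:
        # An explicit path must point to an existing, executable file
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        raise FileNotFoundError(
            f"chromedriver not found or not executable: {explicit_path}"
        )
    # Otherwise look for a 'chromedriver' executable on PATH
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError(
            "chromedriver is not on PATH; pass its location explicitly"
        )
    return found
```

With Selenium 4 the returned path can then be handed to the driver, for example webdriver.Chrome(service=Service(resolve_chromedriver('./chromedriver_linux64/chromedriver'))).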
