How to Extract Heading Content (h1, h2, etc.) from an HTML String Using Regex

Headings are usually among the most relevant and descriptive pieces of information on an HTML page. You might want to include them in the description <meta> tag of that page. Here is a simple regular expression to extract all of them:

preg_match_all( '|<h[1-6][^>]*>(.*)</h[1-6]>|isU', $html, $headings );

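where $html is the HTML source and $headings is populated with the matches: $headings[0] holds the full matched elements, tags included, and $headings[1] holds the content captured between the opening and closing tags. The U modifier makes the quantifiers ungreedy, so each match stops at the first closing heading tag.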
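As a rough illustration of how the extracted headings could be turned into text for a description, here is a minimal sketch. The helper name extract_headings() is hypothetical, and the sketch assumes a pattern restricted to h1 through h6 with a backreference for the closing tag, plus the s modifier so that headings spanning several lines are matched. A regex like this is only a quick heuristic; for untrusted or messy markup a real HTML parser is the safer choice.

```php
<?php
// Hypothetical helper (not part of the original post): returns the plain
// text of every h1-h6 element found in $html, in document order.
function extract_headings( string $html ): array {
    // [1-6] limits the match to real heading tags (not <head>, <hr>, <html>);
    // the backreference \1 requires the closing tag to have the same level.
    // Modifiers: i = case-insensitive, s = dot matches newlines, U = ungreedy.
    if ( ! preg_match_all( '|<h([1-6])[^>]*>(.*)</h\1>|isU', $html, $matches ) ) {
        return [];
    }

    // $matches[2] holds the inner HTML of each heading: drop nested tags,
    // decode entities and collapse runs of whitespace.
    return array_map(
        function ( $inner ) {
            $text = html_entity_decode( strip_tags( $inner ), ENT_QUOTES, 'UTF-8' );
            return trim( preg_replace( '/\s+/', ' ', $text ) );
        },
        $matches[2]
    );
}

$html = '<html><head><title>Demo</title></head><body>'
      . '<h1 class="title">Hello <em>World</em></h1><hr>'
      . "<h2>Second\nheading</h2></body></html>";

$headings    = extract_headings( $html );
$description = implode( ' | ', $headings );

echo $description, PHP_EOL; // prints: Hello World | Second heading
```

The resulting string would still need escaping (for example with htmlspecialchars()) before being written into a <meta> tag.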


2 Comments

  1. David says:

    Saved my day thanks!

  2. Georgios Stampolis says:

    Thank you very much!!

    I have modified it a bit to get the h-tags AND the ids of the h-tags:

    preg_match_all('|<h[1-6][^>]*\sid="([^"]*)"[^>]*>(.*)</h[1-6]>|isU', ...)
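
For capturing ids together with heading text, one detail is worth keeping in mind: under the U modifier a trailing ? flips a quantifier back to greedy, so a .*? placed before id= can run past the tag it started in. The sketch below avoids that by using [^>]* so the id has to sit inside the opening tag itself. The helper name extract_headings_by_id() is hypothetical, and the pattern assumes double-quoted id attributes; headings without an id are skipped.

```php
<?php
// Hypothetical helper: maps each heading's id attribute to its plain text.
// Assumes ids are written with double quotes; headings without an id are skipped.
function extract_headings_by_id( string $html ): array {
    // [^>]* cannot cross the end of the opening tag, so the id must belong
    // to the heading itself; \s before "id" avoids matching e.g. data-id.
    $pattern = '|<h([1-6])[^>]*\sid="([^"]*)"[^>]*>(.*)</h\1>|isU';

    $result = [];
    if ( preg_match_all( $pattern, $html, $matches, PREG_SET_ORDER ) ) {
        foreach ( $matches as $m ) {
            // $m[2] = id attribute value, $m[3] = inner HTML of the heading.
            $result[ $m[2] ] = trim( strip_tags( $m[3] ) );
        }
    }
    return $result;
}

$html = '<h1>No id here</h1>'
      . '<h2 class="section" id="intro">Introduction</h2>'
      . '<h3 id="details" data-level="3">More <b>details</b></h3>';

$by_id = extract_headings_by_id( $html );
print_r( $by_id );
```

With the sample input, the h1 without an id is ignored and the other two headings are keyed by "intro" and "details".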
