How to Extract Heading Content (h1, h2, etc.) from an HTML String Using Regex

February 17, 2012 Development PHP Snippet 2 Comments

Headings are usually among the most relevant and descriptive pieces of information on an HTML page, so you might want to include them in that page's <meta> description tag. Here is a simple regular expression to extract them:

preg_match_all( '|<h[1-6][^>]*>(.*)</h[1-6]>|isU', $html, $headings );

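where $html is the HTML source and $headings will be populated with the matches: $headings[0] holds the complete heading elements and $headings[1] holds the content captured between the opening and closing tags.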
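To show how the result might be turned into a meta description, here is a minimal sketch. The sample `$html` string and the `$texts` and `$description` variables are made up for illustration. The pattern is a variant restricted to `h1` through `h6` so that tags such as `<html>`, `<head>`, or `<hr>` are not picked up, with the `s` modifier added so headings spanning several lines still match.

```php
<?php
// Hypothetical sample input; any HTML string would do.
$html = '<html><head><title>Demo</title></head><body>'
      . '<h1 class="title">Main <em>Title</em></h1>'
      . '<p>Intro</p><hr>'
      . '<h2>Second heading</h2>'
      . '</body></html>';

// i = case-insensitive, s = dot also matches newlines,
// U = quantifiers are lazy by default.
preg_match_all('|<h[1-6][^>]*>(.*)</h[1-6]>|isU', $html, $headings);

// $headings[0] = full matches, $headings[1] = captured inner HTML.
// Strip nested tags and decode entities to get plain text.
$texts = array_map(
    function ($inner) {
        return trim(html_entity_decode(strip_tags($inner)));
    },
    $headings[1]
);

$description = implode(', ', $texts);
echo $description, PHP_EOL; // Main Title, Second heading
```

Regex-based extraction like this is fine for quick snippets on reasonably clean markup; it does not handle nested headings or malformed HTML.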

2 Comments

David says:

June 2, 2014 at 00:48

Saved my day thanks!

Georgios Stampolis says:

October 23, 2018 at 04:52

Thank you very much!!

I have modified it a bit to get the h-tags AND the ids of the h-tags:
```
preg_match_all('|<h.*?id=\"([^\"]*)\".*?>(.*)</h[^>]+>|iU', ...)
```
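One thing to watch with a pattern like this: under the `U` modifier the meaning of `?` after a quantifier is inverted, so `.*?` becomes greedy and can run past the end of the tag. The following is only a sketch of one possible way to collect ids and heading contents together, not the commenter's code. It drops `U`, confines the attribute search to the opening tag with `[^>]*`, and assumes double-quoted `id` attributes. The sample `$html` and the `$byId` array are hypothetical.

```php
<?php
// Hypothetical sample input.
$html = '<h1 id="intro" class="big">Introduction</h1>'
      . '<h2>No id here</h2>'
      . '<h3 class="x" id="usage">Usage</h3>';

// No U modifier here, so "*?" keeps its usual lazy meaning.
// Group 1 = heading level, group 2 = id value, group 3 = inner HTML.
// [^>]* cannot cross ">", which keeps the id lookup inside the opening tag.
$pattern = '|<h([1-6])\b[^>]*\sid="([^"]*)"[^>]*>(.*?)</h\1>|is';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Map each id to the plain text of its heading.
$byId = [];
foreach ($matches as $m) {
    $byId[$m[2]] = strip_tags($m[3]);
}
print_r($byId);
```

Headings without an `id` attribute (the `h2` in the sample) are skipped by this pattern.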