Regex: extract URLs and their labels from an HTML file

I have a large HTML file exported from Google Docs. When opened in a text editor, it appears as a single line. It contains many URLs that I need to extract. They appear in this form:

<a class="c13" href="https://www.google.com/url?q=https://example.com/page1/&amp;sa=D&amp;ust=1530382105580000">Text Label One</a>SOME HTML TEXT HERE<a class="c13" href="https://www.google.com/url?q=https://example.com/page2/&amp;sa=D&amp;ust=1530382105719000">Text Label Two</a>

So far I have found this solution:

(?<=https://www\.google\.com/url\?q=)(.*?)(?=&amp)

This generates:

https://example.com/page1/ https://example.com/page2/

How can I also extract the labels, so that the output looks like this?

https://example.com/page1/ Text Label One https://example.com/page2/ Text Label Two

No, I need each URL to appear just once in the output (I see that my "solution" also produces two outputs per URL). And I need the text label in each output. Here is what I have: regex101.com/r/Bh9AGD/1
– Serg
Jun 30 at 18:53

Does this help? regex101.com/r/Bh9AGD/2
– NoobProgrammer
Jul 1 at 6:27
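
One possible approach, sketched here in Python (the question does not name a tool, so the language is an assumption), is to match each whole anchor once and capture both the href and the link text, then read the target URL from the `q` query parameter instead of relying on lookarounds. The names `ANCHOR_RE`, `extract_links` and `sample` are made up for illustration. This is only a sketch for markup shaped like the sample above; for arbitrary HTML a real parser would likely be more robust than a regex.

```python
import html
import re
from urllib.parse import parse_qs, urlsplit

# Matches an <a> tag whose href is a Google redirect link and captures
# the raw href value (group 1) and the link text (group 2).
ANCHOR_RE = re.compile(
    r'<a\b[^>]*?\bhref="(https://www\.google\.com/url\?[^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(page):
    """Return a list of (target_url, label) pairs, one per matching anchor."""
    results = []
    for href, label in ANCHOR_RE.findall(page):
        # Decode entities such as &amp; before parsing the query string.
        query = urlsplit(html.unescape(href)).query
        target = parse_qs(query).get("q", [None])[0]
        if target is not None:
            # Drop any nested tags inside the label, then decode entities.
            text = html.unescape(re.sub(r"<[^>]+>", "", label)).strip()
            results.append((target, text))
    return results


# Sample input modelled on the markup in the question.
sample = (
    '<a class="c13" href="https://www.google.com/url?q=https://example.com/page1/'
    '&amp;sa=D&amp;ust=1530382105580000">Text Label One</a>SOME HTML TEXT HERE'
    '<a class="c13" href="https://www.google.com/url?q=https://example.com/page2/'
    '&amp;sa=D&amp;ust=1530382105719000">Text Label Two</a>'
)

for url, label in extract_links(sample):
    print(url, label)
```

Because the pattern consumes one complete anchor per match, each URL is reported once, paired with its own label. The same function should also cope with hrefs where `&` has not been escaped as `&amp;`.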
