Removing HTML Tags from HTML Source Code
How do search engines see your web pages? To pull the text you want out of a page's content, you can use a StripHTML function, which removes all HTML tags from a given text string.
For example, an HTML fragment whose tags surround the text
Safir Medya
is converted to the plain text:
Safir Medya
The StripHTML function we will use for this:
// Requires StrUtils in the unit's uses clause (for PosEx).
function StripHTML(S: string): string;
var
  TagBegin, TagEnd: Integer;
begin
  TagBegin := Pos('<', S);                 // position of the first <
  while TagBegin > 0 do                    // while there is a < in S
  begin
    TagEnd := PosEx('>', S, TagBegin);     // find the matching > after that <
    if TagEnd = 0 then
      Break;                               // unclosed tag: stop instead of looping forever
    Delete(S, TagBegin, TagEnd - TagBegin + 1); // delete the tag
    TagBegin := PosEx('<', S, TagBegin);   // search for the next <
  end;
  Result := S;                             // return the result
end;
To use this function in Delphi:
procedure TForm1.Button1Click(Sender: TObject);
begin
  Memo2.Text := StripHTML(Memo1.Text);
end;
Guy Gordon
This code assumes correct HTML. Real-world websites often contain errors. Think of the code as a state machine with two states: InsideTag = True or False. While inside a tag you might find another '<', and while not in a tag you might find a '>', e.g. <br class="Apple-interchange-newline"> id="aswift_3" ...> (actual example from eBay). In both of these cases the state machine may be out of step with the input stream. To get back in step, the code needs to determine the correct state from the surrounding text. This is non-trivial.
2 years ago
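The two-state view described in the comment above can be sketched roughly as follows. This is only an illustration, not the article's function: the name StripHTMLStateMachine is hypothetical, and the recovery policy is an assumption chosen for simplicity (a '<' found inside a tag is ignored, a '>' found outside a tag is kept as ordinary text, and an unclosed tag swallows the rest of the input). It does not solve the harder problem of working out the correct state from the surrounding text; it only guarantees that the scan finishes in a single pass.

```pascal
program StripTagsStateMachine;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper, not from the article: a two-state scanner.
// Assumed policy: a '<' seen while already inside a tag is ignored,
// and a '>' seen outside any tag is kept as ordinary text.
function StripHTMLStateMachine(const S: string): string;
var
  I: Integer;
  InsideTag: Boolean;
begin
  Result := '';
  InsideTag := False;
  for I := 1 to Length(S) do
    if InsideTag then
    begin
      if S[I] = '>' then
        InsideTag := False;        // tag closed; the '>' itself is dropped
    end
    else if S[I] = '<' then
      InsideTag := True            // tag opened; skip until the next '>'
    else
      Result := Result + S[I];     // ordinary text, including a stray '>'
end;

begin
  WriteLn(StripHTMLStateMachine('<b>Safir Medya</b>'));  // prints: Safir Medya
  WriteLn(StripHTMLStateMachine('a > b <i>c</i>'));      // prints: a > b c
  WriteLn(StripHTMLStateMachine('x <unclosed tag'));     // prints: x (plus a trailing space)
end.
```

With malformed input such as the eBay fragment quoted in the comment, this sketch would pass the stray attribute text and the extra '>' through as plain text, which shows why real recovery needs more context than a single Boolean state.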