HtmlDocument
Table of Contents
Constants
- MAX_HTML_LENGTH = 1048576
- MAX_IMAGE_URL_LENGTH = 2000
- MAX_IMAGES = 4
Methods
- __construct() : mixed
- HtmlDocument constructor.
- checkMetadata() : bool
- Returns true if metadata is complete
- detectEncoding() : string
- Auto-detect and set HTML document encoding
- extractElementAttributes() : array<string|int, mixed>
- Parses the HTML content for attributes of the specified elements and returns the found attributes as an array
- getDateExpire() : DateTime|null
- Returns the expiration date of the metadata.
- getDescription() : string
- getEmdbed() : string
- getEncoding() : string
- getExtraField() : string|null
- Returns value of the additional metadata field
- getHtml() : string
- Returns full html code of the document
- getImage() : string
- getLinkHref() : string
- Returns value of the href attribute.
- getMetaContent() : string
- Returns value of the content attribute
- getMetadata() : array<string|int, mixed>|false
- Returns metadata, extracted from the page. Should return an array with required key TITLE and optional keys DESCRIPTION and URL
- getTitle() : string
- Returns document's TITLE metadata
- getUri() : Uri
- Returns Uri of the document
- setDateExpire() : mixed
- Sets the expiration date of the metadata.
- setDescription() : void
- Sets document's DESCRIPTION metadata
- setEmbed() : void
- Sets document's EMBED metadata, if site is allowed to be embedded.
- setEncoding() : void
- Set HTML document encoding
- setExtraField() : void
- Sets additional metadata field.
- setImage() : void
- Sets document's IMAGE metadata
- setTitle() : void
- Sets document's TITLE metadata
Constants
MAX_HTML_LENGTH
public
mixed MAX_HTML_LENGTH = 1048576
MAX_IMAGE_URL_LENGTH
public
mixed MAX_IMAGE_URL_LENGTH = 2000
MAX_IMAGES
public
mixed MAX_IMAGES = 4
Methods
__construct()
HtmlDocument constructor.
public
__construct(string $html, Uri $uri) : mixed
Parameters
- $html : string
  Document HTML code.
- $uri : Uri
  Document's URL.
checkMetadata()
Returns true if metadata is complete
public
checkMetadata() : bool
Return values
bool
detectEncoding()
Auto-detect and set HTML document encoding
public
detectEncoding() : string
Return values
string —Detected encoding.
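The reference does not say how the encoding is detected. The following is a minimal sketch of one common approach, assuming the charset is taken from a `<meta>` declaration when present and guessed from the bytes otherwise. The function name `sketchDetectEncoding`, the candidate encoding list, and the `UTF-8` default are illustrative assumptions and are not part of HtmlDocument.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of HtmlDocument: guesses the encoding of an
 * HTML string from its <meta> declaration, falling back to byte inspection.
 */
function sketchDetectEncoding(string $html, string $default = 'UTF-8'): string
{
    // Matches both <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?\s*([a-z0-9_\-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }

    // No declaration found: let mbstring choose among a few candidates
    // (the candidate list is an assumption made for this sketch).
    $guess = mb_detect_encoding($html, ['UTF-8', 'Windows-1251', 'ISO-8859-1'], true);

    return $guess !== false ? $guess : $default;
}
```

A caller could pass the result to setEncoding(), although whether detectEncoding() works this way internally is not confirmed by this page.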
extractElementAttributes()
Parses the HTML content for attributes of the specified elements and returns the found attributes as an array
public
extractElementAttributes(string $tagName) : array<string|int, mixed>
Parameters
- $tagName : string
  Name of the tag.
Return values
array<string|int, mixed>
getDateExpire()
Returns the expiration date of the metadata.
public
getDateExpire() : DateTime|null
Return values
DateTime|null
getDescription()
public
getDescription() : string
Return values
string
getEmdbed()
public
getEmdbed() : string
Return values
string —HTML code to embed url to the page.
getEncoding()
public
getEncoding() : string
Return values
string —Document encoding.
getExtraField()
Returns value of the additional metadata field
public
getExtraField(string $fieldName) : string|null
Parameters
- $fieldName : string
  Name of the field.
Return values
string|null —Value of the additional metadata field.
getHtml()
Returns full html code of the document
public
getHtml() : string
Return values
string
getImage()
public
getImage() : string
Return values
string —Main image's url.
getLinkHref()
Returns value of the href attribute.
public
getLinkHref(string $rel) : string
Parameters
- $rel : string
  Value of the rel attribute.
Return values
string
getMetaContent()
Returns value of the content attribute
public
getMetaContent(string $name) : string
Parameters
- $name : string
  Value of a name or property attribute.
Return values
string
getMetadata()
Returns metadata, extracted from the page. Should return an array with required key TITLE and optional keys DESCRIPTION and URL
public
getMetadata() : array<string|int, mixed>|false
Return values
array<string|int, mixed>|false
getTitle()
Returns document's TITLE metadata
public
getTitle() : string
Return values
string
getUri()
Returns Uri of the document
public
getUri() : Uri
Return values
Uri
setDateExpire()
Sets the expiration date of the metadata.
public
setDateExpire(DateTime $dateExpire) : mixed
Parameters
- $dateExpire : DateTime
setDescription()
Sets document's DESCRIPTION metadata
public
setDescription(string $description) : void
Parameters
- $description : string
  Description.
setEmbed()
Sets document's EMBED metadata, if site is allowed to be embedded.
public
setEmbed(string $embed) : void
Parameters
- $embed : string
  HTML code for embedding the object in the page.
setEncoding()
Set HTML document encoding
public
setEncoding(string $encoding) : void
Parameters
- $encoding : string
  Document's encoding.
setExtraField()
Sets additional metadata field.
public
setExtraField(string $fieldName, string $fieldValue) : void
Parameters
- $fieldName : string
  Name of the field. Expected values:
  - FAVICON: $fieldValue must contain the URL of the document's favicon.
  - IMAGES: $fieldValue must be an array of URLs of images detected in the document.
  - In other cases, $fieldValue must contain plain text.
- $fieldValue : string
  Field value.
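The page does not state how the MAX_IMAGES and MAX_IMAGE_URL_LENGTH constants relate to the IMAGES field. One plausible reading, shown here purely as an assumption, is that a list of detected image URLs is trimmed to those limits before being stored. The helper `sketchLimitImages` is hypothetical and not part of HtmlDocument. Note also that the signature types $fieldValue as string while the IMAGES entry expects an array; the sketch works on a plain array and leaves that question open.

```php
<?php
declare(strict_types=1);

// Values copied from the constants documented for HtmlDocument.
const MAX_IMAGES = 4;
const MAX_IMAGE_URL_LENGTH = 2000;

/**
 * Hypothetical helper, not part of HtmlDocument: trims a list of candidate
 * image URLs to the documented limits.
 *
 * @param string[] $urls
 * @return string[]
 */
function sketchLimitImages(array $urls): array
{
    $kept = [];
    foreach ($urls as $url) {
        // Skip URLs that exceed the documented length limit.
        if (strlen($url) > MAX_IMAGE_URL_LENGTH) {
            continue;
        }
        $kept[] = $url;
        // Stop once the documented maximum number of images is reached.
        if (count($kept) >= MAX_IMAGES) {
            break;
        }
    }
    return $kept;
}
```

The trimmed list could then be handed to setExtraField() under the IMAGES name, assuming the method accepts an array for that field as its description suggests.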
setImage()
Sets document's IMAGE metadata
public
setImage(string $image) : void
Parameters
- $image : string
  Main image's url.
setTitle()
Sets document's TITLE metadata
public
setTitle(string $title) : void
Parameters
- $title : string
  Title.