Extract all URLs from a Webpage (xurls)
Extract all URLs from a webpage and provide counts and lists of internal and external URLs.
Purpose
The function extracts all URLs from a webpage supplied by the user, either as a URL or as pasted page content. It separates the URLs into internal and external ones and reports a count and a list for each group.
Use Cases
- To extract all URLs from a webpage for analysis or monitoring purposes.
- To identify internal and external links on a webpage.
How to Use
- Enter the webpage URL, or paste the page content.
- Click the "Extract URLs" button to start the extraction.
- The function analyzes the webpage and returns the lists and counts of internal and external URLs.
Input Values
- Webpage URL - The URL of the webpage to extract URLs from.
- Page Content - The content of the webpage to extract URLs from.
Output Values
- Webpage Url - The URL of the webpage.
- Domain Name - The domain name of the webpage.
- Total Url Count - The total number of URLs found.
- Internal Url Count - The number of internal URLs found.
- External Url Count - The number of external URLs found.
- Internal Url List - A list of internal URLs.
- External Url List - A list of external URLs.
- Page Content - The content of the webpage.
Additional Instructions
- Ensure the webpage URL is valid and accessible.
- Review the extracted URLs for further action or analysis.
Code Analysis
The function analyzes the webpage content by identifying internal and external URLs. It stores the URLs in separate sets and provides a count of each type of URL.
Technical Parameters: page_url, page_content
Return Values: Webpage Url, Domain Name, Total Url Count, Internal Url Count, External Url Count, Internal Url List, External Url List, Page Content
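The analysis described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the helper name classify_urls, the regular expression, and the rule that a URL is internal when its host matches the host of page_url are all assumptions made for this example. Only the parameter names and output field names come from this page, and the Page Content field is left out for brevity.

```python
import re
from urllib.parse import urljoin, urlparse

# Assumption: links are read from href/src attributes with a simple regex.
# A real extractor would more likely use an HTML parser.
LINK_RE = re.compile(r'''(?:href|src)\s*=\s*["']([^"'#\s]+)''', re.IGNORECASE)


def classify_urls(page_url, page_content):
    """Hypothetical helper: split the links in page_content into internal and external sets."""
    domain = urlparse(page_url).netloc.lower()
    internal, external = set(), set()

    for raw in LINK_RE.findall(page_content):
        # Resolve relative links such as "/about" against the page URL.
        absolute = urljoin(page_url, raw)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        # Assumption: "internal" means the same host as the page itself.
        if parsed.netloc.lower() == domain:
            internal.add(absolute)
        else:
            external.add(absolute)

    return {
        "Webpage Url": page_url,
        "Domain Name": domain,
        "Total Url Count": len(internal) + len(external),
        "Internal Url Count": len(internal),
        "External Url Count": len(external),
        "Internal Url List": sorted(internal),
        "External Url List": sorted(external),
    }


sample = (
    '<a href="/about">About</a> '
    '<a href="https://example.com/contact">Contact</a> '
    '<a href="https://other.org/x">Other</a> '
    '<a href="mailto:me@example.com">Mail</a>'
)
result = classify_urls("https://example.com/", sample)
```

Storing the URLs in sets means a link that appears several times on the page is counted once. Whether subdomains count as internal is a design choice; this sketch treats them as external.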
To run the function non-interactively, evaluate one of the following expressions with eval():
xurls("https://example.com/")
xurls(page_content="Sample page content with URLs")
Click the Help icon to open the help page in a separate window.