This is a web-scraping Python program. The writer will not be able to scrape the actual target site, because it is only reachable from inside the university servers. Output the data in a PrettyTable; if the output looks ugly, I will lose a lot of points. The code must run in a Windows environment, as we use the Wing IDE 7.2 on Windows 10 to run the program. The program should also write the formatted output to a file, so that I can submit the .py file along with the output file. The program should scrape all subpages as well.
Instructions
Your Final Scripting Project Details:
Your final script will scrape the [website]
IMPORTANT: you will not follow any links that veer outside of [website] (you cannot access this website, since it is inside the university environment).
IMPORTANT: you are allowed to use standard Python libraries and any 3rd party library that we have used during the class.
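The stay-on-site rule above could be sketched roughly as follows. This is only an illustration under assumptions: `LinkParser`, `crawl`, and the `fetch` callable are hypothetical names, and the standard library's `html.parser` is used here instead of any particular 3rd party library from the class. Passing the page loader in as `fetch` lets the crawl logic be tried on sample pages without access to [website].

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl that never leaves the host of start_url.

    fetch is a hypothetical callable: URL in, HTML text out (None on failure).
    Returns the sorted list of unique page URLs that were discovered.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so each page counts once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Only queue links whose host matches the starting host.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)
```

In a real run, `fetch` might wrap `urllib.request.urlopen` and decode the response body; that part is left out because it depends on the site.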
Your script will generate a report that contains the following information.
1) Unique URLs of all the pages found on the website
2) Unique URL links to images found on the website
3) Extract any phone numbers found on the website
4) Extract all text content from each of the pages and store them in a string variable
5) Extract any Zip Codes
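For items 2 and 4, one possible approach is a small parser that gathers image URLs and visible text from a single page. The sketch below is an assumption-laden illustration, not the required method: `PageParser` and `parse_page` are hypothetical names, it relies on the standard library's `html.parser`, and it treats anything outside `<script>` and `<style>` as page text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect image URLs and visible text from one HTML page."""

    SKIP = {"script", "style"}  # tags whose contents are not page text

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = set()   # a set keeps the image URLs unique
        self.chunks = []      # text fragments in document order
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Turn relative image paths into absolute URLs.
                self.images.add(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def parse_page(base_url, html):
    """Return (sorted unique image URLs, page text as one string)."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return sorted(parser.images), " ".join(parser.chunks)
```

The text returned for each page can be concatenated into the single string variable that item 4 asks for.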
NOTE: for items 6-8 you will use NLTK to process all the text found on the website, using the text content you extracted in item 4 above.
6) A list of all unique vocabulary found on the website
7) A list of all possible verbs
8) A list of all possible nouns
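Items 6-8 might be approached along these lines. This is a sketch under assumptions: `tag_text` and `summarize` are hypothetical names, the tagger is assumed to emit Penn Treebank tags (verb tags start with `VB`, noun tags with `NN`), and NLTK's tokenizer and tagger data must already be installed with `nltk.download` (the exact resource names vary between NLTK versions).

```python
def tag_text(text):
    """Tokenize and part-of-speech tag text with NLTK."""
    # Imported here so summarize() can be tried without NLTK installed.
    import nltk  # third-party; needs tokenizer and tagger data downloaded
    return nltk.pos_tag(nltk.word_tokenize(text))


def summarize(tagged):
    """Split (word, tag) pairs into unique vocabulary, verbs, and nouns.

    Assumes Penn Treebank tags: VB* marks verbs, NN* marks nouns.
    """
    vocab, verbs, nouns = set(), set(), set()
    for word, tag in tagged:
        if not word.isalpha():
            continue  # drop punctuation and numbers
        word = word.lower()
        vocab.add(word)
        if tag.startswith("VB"):
            verbs.add(word)
        elif tag.startswith("NN"):
            nouns.add(word)
    return sorted(vocab), sorted(verbs), sorted(nouns)
```

A typical call would be `summarize(tag_text(all_text))`, where `all_text` stands for the string built in item 4. The tagger works from context, so the verb and noun lists are best treated as "possible" matches, as the assignment words it.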
NOTE: REGEX PATTERN HELP
ZipCode Regex
pattern = b'\d{5}(?:-\d{4})?'
Phone Number Regex
b'\(?\d{3}\)?-? *\d{3}-? *-?\d{4}'
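A minimal sketch of applying the two hint patterns, written as raw bytes literals (`rb"..."`) with the `\d` and `\(` escapes spelled out. The names `ZIP_PATTERN`, `PHONE_PATTERN`, and `find_unique` are hypothetical. Because the patterns are bytes patterns, the text is encoded before searching and each match is decoded afterwards.

```python
import re

# Raw bytes literals keep the backslashes intact for the regex engine.
ZIP_PATTERN = re.compile(rb"\d{5}(?:-\d{4})?")
PHONE_PATTERN = re.compile(rb"\(?\d{3}\)?-? *\d{3}-? *-?\d{4}")


def find_unique(pattern, text):
    """Return the sorted unique matches of a bytes pattern in a str."""
    data = text.encode("utf-8")
    # findall returns whole matches here because neither pattern
    # contains a capturing group.
    return sorted({match.decode("utf-8") for match in pattern.findall(data)})
```

For example, `find_unique(ZIP_PATTERN, all_text)` would return the Zip Codes for item 5, with `all_text` standing for the string from item 4. The Zip Code pattern has no word boundaries, so it also matches the first five digits of any longer run of digits; whether to filter those out is a judgment call.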
You will submit:
1) Your python script .py file
2) The full output of your script
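One way to meet the PrettyTable and output-file requirements is sketched below. It is an illustration only: `build_report` and `write_report` are hypothetical names, each report item is assumed to be shown as a one-column table, and the plain-text fallback exists only so the sketch still runs where the third-party `prettytable` package is missing.

```python
try:
    from prettytable import PrettyTable  # third-party package
except ImportError:
    PrettyTable = None  # fall back to plain text so the sketch still runs


def build_report(title, values):
    """Return one single-column table as a string."""
    if PrettyTable is None:
        return "\n".join([title, "-" * len(title)] + list(values))
    table = PrettyTable([title])
    table.align = "l"  # left-align so long URLs line up
    for value in values:
        table.add_row([value])
    return table.get_string()


def write_report(path, sections):
    """Print every section and write the same text to path.

    sections is a list of (title, list_of_strings) pairs.
    """
    text = "\n\n".join(build_report(title, values) for title, values in sections)
    print(text)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return text
```

Calling `write_report("report.txt", [("Zip Codes", zips), ("Phone Numbers", phones)])` would show the tables on screen and save the same text in a file that can be submitted with the .py file; `zips` and `phones` stand for the lists gathered earlier.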
