- # PyWebCopy © 5
+ # PyWebCopy © 6

`Created By: Raja Tomar`
`License: MIT`
`Email: rajatomar788@gmail.com`

- Web scraping and saving complete webpages and websites with Python.
+ Cloning websites and webpages with Python, at ease.
+ Web scraping or saving complete webpages and websites with Python.

Web scraping and archiving tool written in Python.
Archive any online website and its assets, css, js and
@@ -14,9 +15,13 @@ It's easy with `pywebcopy`.
Why it's great? Because it:

- respects `robots.txt`
- - has a single-function basic usage
+ - saves a webpage with its css, js and images in one call
+ - clones a complete website, with assets and links remapped, in one call
+ - has direct APIs for simplicity and ease
+ - supports subclassing for advanced usage
+ - supports custom html tag handlers
- offers lots of configuration options for many custom needs
- - bundles several scraping packages in one Object (thanks to their original owners)
+ - bundles several scraping packages in one object (thanks to their original owners)
  - beautifulsoup4
  - lxml
  - requests
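The `robots.txt` behavior mentioned in the list can be previewed without pywebcopy at all. This is a stdlib-only sketch of the kind of check such a tool performs, not pywebcopy's internal code; the robots.txt content here is illustrative.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, parsed offline (no network needed)
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Paths outside the disallowed prefix are fetchable, others are not
print(rp.can_fetch("*", "http://example.com/index.html"))      # True
print(rp.can_fetch("*", "http://example.com/private/a.html"))  # False
```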
@@ -37,12 +42,12 @@ You are ready to go. Read the tutorials below to get started.
## First steps

- You should always check whether pywebcopy is installed successfully.
+ You should always check whether the latest pywebcopy is installed successfully.

```python
>>> import pywebcopy
>>> pywebcopy.__version__
- 5.x
+ 6.x
```

Your version may differ; now you can continue with the tutorial.
@@ -54,10 +59,12 @@ To save any single page, just type in python console
```python
from pywebcopy import save_webpage

+ kwargs = {'project_name': 'some-fancy-name'}

save_webpage(
    url='http://example-site.com/index.html',
-   project_folder='path/to/downloads'
+   project_folder='path/to/downloads',
+   **kwargs
)
```
@@ -66,15 +73,18 @@ To save a full website (this could overload the target server, so be careful)

```python
from pywebcopy import save_website

+ kwargs = {'project_name': 'some-fancy-name'}
+
save_website(
    url='http://example-site.com/index.html',
    project_folder='path/to/downloads',
+   **kwargs
)
```

### 1.2.1 Running Tests

Running tests is simple and doesn't require any external library.
- Just run this command from the root directory of the pywebcopy package
+ Just run this command from the root directory of the pywebcopy package.
```shell
@@ -89,24 +99,24 @@ from pywebcopy import WebPage
url = 'http://example-site.com/index.html' or None
project_loc = 'path/to/downloads/folder'

- wp = WebPage(url,
-             project_folder,
-             default_encoding=None,
-             HTML=None,
-             **configKwargs)
+ wp = WebPage()

# You can choose to load the page explicitly using the
# `requests` module
wp.get(url, **requestsKwargs)

+ # OR
+ # You can choose to set the source yourself
+ handle = open('file.html', 'rb')
+ wp.set_source(handle)
+
# if you want assets only
wp.save_assets()

# if you want html only
wp.save_html()

- # if you want the complete webpage
+ # if you want the complete webpage with css, js and images
wp.save_complete()
```
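`wp.set_source()` in the block above takes an open file handle, and later sections note that any object with a `read` method works. That contract can be checked with the standard library alone; this sketch does not import pywebcopy, and the HTML string is illustrative.

```python
import io

html = b"<html><body><p>hello</p></body></html>"
handle = io.BytesIO(html)  # file-like; could stand in for open('file.html', 'rb')

# the minimal contract set_source() relies on: a read method
assert hasattr(handle, "read")
print(handle.read(6))  # b'<html>'
```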
@@ -171,6 +181,7 @@ then check if website allows scraping of its content.
>>> pywebcopy.config['bypass_robots'] = True

# rest of your code follows..
+
```

### Overwrite existing files when copying
@@ -183,6 +194,7 @@ use the over_write config key.
>>> pywebcopy.config['over_write'] = True

# rest of your code follows..
+
```

### Changing your project name
@@ -196,6 +208,7 @@ below
>>> pywebcopy.config['project_name'] = 'my_project'

# rest of your code follows..
+
```

## How to - Save Single Webpage
@@ -204,28 +217,42 @@ Particular webpage can be saved easily using the following methods.
Note: if you get `pywebcopy.exceptions.AccessError` when running any of this code, use the code provided in the later sections.

- ### Method 1
+ ### Method 1: via api - `save_webpage()`

A webpage can easily be saved using an inbuilt function called `.save_webpage()`, which also takes several arguments.

```python
- >>> import pywebcopy
- >>> pywebcopy.save_webpage(project_url='http://google.com', project_folder='c://Saved_Webpages/')
+ >>> from pywebcopy import save_webpage
+ >>> save_webpage(project_url='http://google.com', project_folder='c://Saved_Webpages/')

- # rest of your code follows..
```
### Method 2

- This use case is slightly more powerful, as it can provide every functionality of the WebPage
- data class.
+ This use case is slightly more powerful, as it can provide every functionality of the WebPage class.

```python
- >>> from pywebcopy import Webpage
+ >>> from pywebcopy import Webpage, config
+ >>> url = 'http://some-url.com/some-page.html'
+
+ # You should always start with setting up the config, or use the apis
+ >>> config.setup_config(url, project_folder, project_name, **kwargs)
+
+ # Create an instance of the webpage object
+ >>> wp = Webpage()
+
+ # If you want to use `requests` to fetch the page
+ >>> wp.get(url)
+
+ # Else, if you want to use plain html or urllib
+ >>> wp.set_source(object_which_have_a_read_method, encoding=encoding)
+ >>> wp.url = url  # you need to do this if you are using set_source()

- >>> wp = WebPage('http://google.com', 'e://tests/', project_name='Google')
+ # Then you can access several methods, like
>>> wp.save_complete()
+ >>> wp.save_html()
+ >>> wp.save_assets()

# This Webpage object contains every method of the Webpage() class and thus
# can be reused later.
```
@@ -242,44 +269,50 @@ One feature is that the raw html is now also accepted.
```python
- >>> from pywebcopy import Webpage
+ >>> from pywebcopy import Webpage, config

>>> HTML = open('test.html').read()

>>> base_url = 'http://example.com'  # used as a base for downloading imgs, css, js files
>>> project_folder = '/saved_pages/'
+ >>> config.setup_config(base_url, project_folder)

- >>> wp = WebPage(base_url, project_folder, HTML=HTML)
+ >>> wp = WebPage()
+ >>> wp.set_source(HTML)
+ >>> wp.url = base_url

>>> wp.save_webpage()
+
```

- ## How to - Whole Websites
+ ## How to - Clone Whole Websites

Use caution when copying websites, as this can overload or damage the
site's servers and could, in rare cases, be illegal; so check everything before
you proceed.
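Independent of pywebcopy, one way to honor this caution is to throttle your own requests. A minimal sketch (the class name and delay value are illustrative, not part of pywebcopy):

```python
import time

class Throttle:
    """Enforce a minimum delay between successive requests."""

    def __init__(self, delay=1.0):
        self.delay = delay
        self._last = 0.0

    def wait(self):
        # sleep just long enough to keep `delay` seconds between calls
        gap = time.monotonic() - self._last
        if gap < self.delay:
            time.sleep(self.delay - gap)
        self._last = time.monotonic()

throttle = Throttle(delay=0.01)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # call this before each page fetch
print(time.monotonic() - start >= 0.02)  # True: two enforced gaps after the first call
```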
- ### Method 1 -
+ ### Method 1: via api - `save_website()`

Use the inbuilt api `.save_website()`, which takes several arguments.

```python
- >>> import pywebcopy
+ >>> from pywebcopy import save_website
+
+ >>> save_website(project_url='http://localhost:8000', project_folder='e://tests/')

- >>> pywebcopy.save_website(project_url='http://localhost:8000', project_folder='e://tests/')
```
303
272
304
# ## Method 2 -
273
305
274
306
By creating a Crawler() object which provides several other functions as well.
275
307
276
308
```python
277
- >> > import pywebcopy
309
+ >> > from pywebcopy import Crawler, config
278
310
279
- >> > pywebcopy. config.setup_config(project_url = ' http://localhost:5000/' , project_folder = ' e://tests/' , project_name = ' LocalHost' )
311
+ >> > config.setup_config(project_url = ' http://localhost:5000/' , project_folder = ' e://tests/' , project_name = ' LocalHost' )
280
312
281
- >> > crawler = pywebcopy. Crawler(' http://localhost:5000/' )
313
+ >> > crawler = Crawler(' http://localhost:5000/' )
282
314
>> > crawler.crawl()
315
+
283
316
```
284
317
285
318
# # Contribution
@@ -296,33 +329,36 @@ If you have any suggestions or fixes or reports feel free to mail me :)
`pywebcopy` is highly configurable.

- ### 1.3.1 Direct Call Method
+ ### 1.3.1 APIs

- To change any configuration, just pass it to the `init` call.
+ To change any configuration, just pass it to the `api` call.

Example:

```python
- from pywebcopy.core import save_webpage
+ from pywebcopy import save_webpage
+
+ kwargs = {
+     'key1': 'value1',
+     ...
+ }

save_webpage(
    url='http://some-site.com/',        # required
    download_loc='path/to/downloads/',  # required

-   # config keys are case-insensitive
-   any_config_key='new_value',
-   another_config_key='another_new_value',
+   **kwargs

    ...
    # add as many as you want :)
)
+
```
### 1.3.2 `config.setup_config` Method

- > **This function has moved from `core.setup_config`**

You can manually set every configuration option by using a
`config.setup_config` call.
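A minimal sketch of such a call, reusing only arguments and config keys that appear elsewhere in this README; treat the exact keyword set as an assumption that may vary between versions.

```python
from pywebcopy import config

config.setup_config(
    project_url='http://localhost:5000/',
    project_folder='e://tests/',
    project_name='LocalHost',
    bypass_robots=True,  # extra kwargs update config keys (assumed from the **kwargs form above)
)
```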
@@ -378,12 +414,6 @@ below is the list of `config` keys with their `default` values :
# delete the project folder after making a zip archive of it
'delete_project_folder': False

- # which parser to use when parsing pages
- # for speed choose 'html.parser' (will break some webpages)
- # for an exact webpage copy choose 'html5lib' (a little slow)
- # or leave it at the default 'lxml' (balanced)
- 'PARSER': 'lxml'
-
# whether to download css files or not
'LOAD_CSS': True
@@ -398,10 +428,7 @@ below is the list of `config` keys with their `default` values :
'OVER_WRITE': False

# list of allowed file extensions
- 'ALLOWED_FILE_EXT': ['.html', '.css', '.json', '.js',
-                      '.xml', '.svg', '.gif', '.ico',
-                      '.jpeg', '.jpg', '.png', '.ttf',
-                      '.eot', '.otf', '.woff']
+ 'ALLOWED_FILE_EXT': ['.html', '.css', ...]

# log file path
'LOG_FILE': None
@@ -425,6 +452,7 @@ below is the list of `config` keys with their `default` values :
# bypass the robots.txt restrictions
'BYPASS_ROBOTS': False
+
```

Told you there were plenty of `config` vars available!
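The keys appear in UPPERCASE in this list but lowercase in earlier snippets (`'OVER_WRITE'` vs `'over_write'`), which implies case-insensitive lookup. A toy model of that behavior, not pywebcopy's actual config class:

```python
class CaseInsensitiveConfig(dict):
    """Toy dict whose string keys ignore case."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

config = CaseInsensitiveConfig()
config['OVER_WRITE'] = True
print(config['over_write'])  # True
```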