Scraping With Scrapy
This post walks through installing Scrapy and starting your first project.
What is Scrapy?
Scrapy is an application framework for crawling web sites and extracting structured data, which can then be used for a wide range of applications such as data mining, information processing, or historical archival. It is written entirely in Python.
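To make "extracting structured data" a little more concrete, here is a minimal sketch that uses only Python's standard library, not Scrapy itself. The `LinkExtractor` class, the `extract_links` helper, and the sample HTML are all hypothetical names made up for illustration, and the sketch assumes Python 3. Scrapy ships its own selectors and crawling machinery, which do far more than this.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the text and href of every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # extracted records, in document order
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(
                {"text": "".join(self._text).strip(), "href": self._href}
            )
            self._href = None


def extract_links(html):
    """Turn raw HTML into a list of {"text": ..., "href": ...} dicts."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="/docs">Documentation</a></li>
  <li><a href="/blog">Blog</a></li>
</ul>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

The point is the shape of the result: unstructured markup goes in, a list of uniform records comes out. A framework like Scrapy adds the downloading, scheduling, and link following around that step.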
Installing Scrapy
1. Install the build dependencies (development headers) and lxml.
sudo apt-get install python-dev
sudo apt-get install libevent-dev
sudo apt-get install libxml2 libxml2-dev libxslt-dev
sudo apt-get install python-lxml
2. Install Twisted
sudo apt-get install python-twisted python-libxml2 python-simplejson
3. Install pyOpenSSL
wget http://pypi.python.org/packages/source/p/pyOpenSSL/pyOpenSSL-0.13.tar.gz
tar -zxvf pyOpenSSL-0.13.tar.gz
cd pyOpenSSL-0.13
sudo python setup.py install
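# If the build fails with an error such as "gcc exit status 1", the OpenSSL development headers are probably missing.
# On Debian/Ubuntu, install them and then re-run "sudo python setup.py install":
sudo apt-get update
sudo apt-get install libssl-dev
# On RPM-based distributions (Fedora, CentOS, RHEL), use yum instead of apt-get:
sudo yum install python-devel libxml2-devel libxslt-devel
sudo yum install pyOpenSSL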
4. Install pycrypto
wget http://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.5.tar.gz
tar -zxvf pycrypto-2.5.tar.gz
cd pycrypto-2.5
sudo python setup.py install
5. Install easy_install (skip this step if you already have it)
wget http://peak.telecommunity.com/dist/ez_setup.py
python ez_setup.py
6. Install w3lib
sudo easy_install -U w3lib
7. Install Scrapy
sudo easy_install Scrapy
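Once the steps above have finished, a quick way to confirm that everything landed is to try importing each package. The following is a small sketch, not part of Scrapy: `check_modules` and `REQUIRED_MODULES` are hypothetical names chosen for this example. The list uses import names rather than package names, so pyOpenSSL appears as `OpenSSL` and pycrypto as `Crypto`.

```python
import importlib

# Import names for the dependencies installed in the steps above.
# Note: pyOpenSSL is imported as "OpenSSL" and pycrypto as "Crypto".
REQUIRED_MODULES = ["lxml", "twisted", "OpenSSL", "Crypto", "w3lib", "scrapy"]


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print("%-10s %s" % (name, "ok" if ok else "MISSING"))
```

Run it with the same Python interpreter you installed the packages for. Any module reported as MISSING points back to the corresponding install step.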
Creating a project in Scrapy
scrapy startproject my_first_project
The directory structure will look like this:
my_first_project/
|___scrapy.cfg
|___my_first_project/
|______ __init__.py
|______ items.py
|______ pipelines.py
|______ settings.py
|______ spiders/
|__________ __init__.py
|__________ ...
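If you want to double-check that `scrapy startproject` produced the layout shown above, a short script can compare the directory against the expected file list. This is only a sketch: `expected_paths` and `missing_project_files` are hypothetical helpers written for this post, and the file list is copied from the tree above, so it may differ for other Scrapy versions.

```python
import os


def expected_paths(project_name):
    """Relative paths of the files listed in the tree above."""
    return [
        "scrapy.cfg",
        os.path.join(project_name, "__init__.py"),
        os.path.join(project_name, "items.py"),
        os.path.join(project_name, "pipelines.py"),
        os.path.join(project_name, "settings.py"),
        os.path.join(project_name, "spiders", "__init__.py"),
    ]


def missing_project_files(root, project_name):
    """Return the expected files that do not exist under the project root."""
    return [
        rel
        for rel in expected_paths(project_name)
        if not os.path.isfile(os.path.join(root, rel))
    ]


if __name__ == "__main__":
    # Assumes the project was created in the current working directory.
    missing = missing_project_files("my_first_project", "my_first_project")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Here `root` is the outer `my_first_project/` directory (the one containing `scrapy.cfg`), and `project_name` is the inner package directory of the same name.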
Some useful links: Scraper?, Web Crawler?
© Copyright Pazhani Ragunathan All Rights Reserved