Must-read for beginners! Web scraping with Python and exporting to a CSV file

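Last time I showed you how to do web scraping with Python. Once you've scraped the information, you may need to export it to a CSV file, so today I'll continue by showing how to write the scraped data to a CSV file. If you haven't read the previous post yet, please read it first via the link below: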

Must-read for beginners! A simple web scraping example using Python

As before, the site we scrape is the books.com.tw bestseller chart, at: https://www.books.com.tw/web/sys_saletopb/books

To export a CSV file, first import the csv module at the very top of the program:

import csv

Next, open the output file and give it a name:

with open('books.csv', 'w', encoding='utf-8', newline='') as csv_file:

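This line opens an output file named books.csv (creating it if it doesn't exist). The 'w' mode overwrites the file every time the program runs, encoding='utf-8' lets the Chinese text be saved correctly, and newline='' is what the csv module's documentation asks for, so that no extra blank lines appear between rows. If the Chinese text shows up garbled when you open the file in Excel, encoding='utf-8-sig' usually fixes it. Next, write text into the file, starting with a header row: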

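    # These lines go inside the with block: create a writer for the open file
    csv_writer = csv.writer(csv_file)
    # Header row: rank, title, author, price, link
    csv_writer.writerow(['排名', '書名', '作者', '價錢', '連結'])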

If you run the program now, the header row is written into books.csv. Opening the file shows:

排名,書名,作者,價錢,連結
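To see exactly what writerow produces without touching the disk, here's a minimal sketch that writes the same header into an in-memory io.StringIO buffer. The buffer is only a stand-in for books.csv in this demonstration. It also shows something you can't see in a text editor: by default csv.writer ends each row with '\r\n', which is why the real file is opened with newline=''.

```python
import csv
import io

# An in-memory text buffer standing in for books.csv (demonstration only)
buffer = io.StringIO()

csv_writer = csv.writer(buffer)
# Header row: rank, title, author, price, link
csv_writer.writerow(['排名', '書名', '作者', '價錢', '連結'])

content = buffer.getvalue()
# csv.writer ends every row with '\r\n' (its default lineterminator)
print(repr(content))
```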

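Next, write the scraped text into the file. This has to happen inside the for loop, which itself sits inside the with block. Together with the existing loop code, it looks like this: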

    for item in info_items:
        number = item.find('div', 'stitle').text.strip()
        bookname = item.find('div', 'type02_bd-a').a.text.strip()
        price = item.find('li', 'price_a').text.strip()
        writter = item.find(["ul", "li"]).a.text.strip()
        link = item.find('div', 'type02_bd-a').a.get('href').strip()

        # New line: write this book as one row of the CSV file
        csv_writer.writerow([number, bookname, writter, price, link])
        # Also print it to the console (排名 = rank, 書名 = title)
        print(' 排名：{}, 書名:{}, {}, {}, {}'.format(number, bookname, writter, price, link))
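One reason to use the csv module instead of joining the strings with commas yourself: book titles can contain commas or quotation marks, and csv.writer quotes such fields automatically (its default QUOTE_MINIMAL behaviour), so every value still lands in its own column. The sketch below uses a made-up row, not real data from the site, and reads it back with csv.reader to show that the values survive the round trip.

```python
import csv
import io

# A made-up row: the title contains a comma, the author contains quotes
row = ['1', 'Title, with a comma', 'Author "Nickname"', '300', 'https://example.com/book']

buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()
# The title is wrapped in quotes and the embedded quotes are doubled
print(line)

# Reading the line back splits it into the original five values
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)
```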

That's all it takes to write the scraped text to the CSV file. Here is the complete scraper:

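import csv
import requests
from bs4 import BeautifulSoup

# Download the bestseller chart page
url = "https://www.books.com.tw/web/sys_saletopb/books/"
response = requests.get(url)

# Parse the HTML ('lxml' needs the lxml package; 'html.parser' is a built-in alternative)
soup = BeautifulSoup(response.text, 'lxml')

# Find the <li class="item"> elements (one per book); limit=10 keeps only the first 10
info_items = soup.find_all('li', 'item', limit=10)

# newline='' stops the csv module from adding blank lines between rows
with open('books.csv', 'w', encoding='utf-8', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    # Header row: rank, title, author, price, link
    csv_writer.writerow(['排名', '書名', '作者', '價錢', '連結'])

    for item in info_items:
        number = item.find('div', 'stitle').text.strip()
        bookname = item.find('div', 'type02_bd-a').a.text.strip()
        price = item.find('li', 'price_a').text.strip()
        writter = item.find(["ul", "li"]).a.text.strip()
        link = item.find('div', 'type02_bd-a').a.get('href').strip()

        # Write one row per book, in the same column order as the header
        csv_writer.writerow([number, bookname, writter, price, link])
        # Also print it to the console (排名 = rank, 書名 = title)
        print(' 排名：{}, 書名:{}, {}, {}, {}'.format(number, bookname, writter, price, link))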
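If you'd rather keep the scraping and the file writing separate, one option is to collect each book as a list first and write everything at the end with csv.writer's writerows. Here is a minimal sketch of that idea. write_books_csv and HEADER are names I made up for illustration, and the rows are placeholders rather than data from the site. Because the function accepts any open text file object, you can try it with io.StringIO before pointing it at books.csv.

```python
import csv
import io

# Header row: rank, title, author, price, link (same as the program above)
HEADER = ['排名', '書名', '作者', '價錢', '連結']


def write_books_csv(file_obj, rows):
    """Write the header, then every row, to an already-open text file object.

    Hypothetical helper: `rows` is a list of
    [number, bookname, writter, price, link] lists built in the scraping loop.
    """
    csv_writer = csv.writer(file_obj)
    csv_writer.writerow(HEADER)
    csv_writer.writerows(rows)


# Placeholder rows standing in for scraped data
rows = [
    ['1', 'Book A', 'Author A', '100', 'https://example.com/a'],
    ['2', 'Book B', 'Author B', '200', 'https://example.com/b'],
]

# Try it in memory first; for the real file you would instead write:
# with open('books.csv', 'w', encoding='utf-8', newline='') as csv_file:
#     write_books_csv(csv_file, rows)
buffer = io.StringIO()
write_books_csv(buffer, rows)
output = buffer.getvalue()
print(output)
```

In the scraping loop you would then replace the csv_writer.writerow(...) line with rows.append([number, bookname, writter, price, link]) and call write_books_csv once after the loop finishes.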

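The limit=10 in info_items = soup.find_all('li', 'item', limit=10) is there because some bestsellers have no author. For those books, the author line writter = item.find(["ul", "li"]).a.text.strip() finds nothing: one of the lookups returns None, so the attribute access raises an AttributeError and the program stops. Taking only the first 10 books works around this. It's a known problem in this program that I haven't fixed yet, and I'll get to it when I have time.

Pay close attention to indentation: whitespace is meaningful in Python, and last time a wrong indent in my code kept the program from running.

That's it for this tutorial on exporting scraped data to a CSV file.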
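For the missing-author problem, one possible fix is to check for None before reading .text. This is a sketch I haven't run against the live site. BeautifulSoup's find() returns None when nothing matches, and tag.a is also None when there's no <a> inside the tag. get_author is a hypothetical helper name. It repeats the original item.find(["ul", "li"]) lookup and returns an empty string instead of crashing.

```python
def get_author(item, default=''):
    """Return the author name for one <li class="item"> element, or `default`.

    Hypothetical helper: `item` is a BeautifulSoup tag from
    soup.find_all('li', 'item'). Both find() and `.a` give None when
    nothing matches, so each step is checked before using .text.
    """
    author_block = item.find(['ul', 'li'])  # same lookup as the original code
    if author_block is None:
        return default
    author_link = author_block.a
    if author_link is None:
        return default
    return author_link.text.strip()
```

In the loop, writter = get_author(item) would replace the original author line. With this guard the limit=10 workaround might no longer be needed. However, other fields such as the price could also be missing on some items, so they may need the same kind of check.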