
Things to watch out for

The points below are a few cautions to keep in mind when scraping, quoted from a related article.

If you are new to scraping, please be sure to read them.

A few scraping rules

You should check a site's terms and conditions before you scrape them. It's their data and they likely have some rules to govern it.

Before you crawl, read the site's terms and conditions. The data belongs to the site, and there may be rules governing how it can be used.

Be nice - A computer will send web requests much quicker than a user can. Make sure you space out your requests a bit so that you don't hammer the site's server.

Be polite. A computer sends requests to a web page far faster than a person does, so leave some interval between requests. Otherwise you may put a real load on the website.
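One way to space out requests is to pause between consecutive fetches. The following is a minimal sketch, not a definitive implementation: the names `fetch` and `fetch_politely` and the one-second default delay are assumptions made for illustration, and a suitable delay depends on the site you are scraping.

```python
import time
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download one page with the standard library."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def fetch_politely(
    urls: Iterable[str],
    delay_seconds: float = 1.0,  # assumed default; tune it per site
    fetch_one: Callable[[str], bytes] = fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages.append(fetch_one(url))
    return pages
```

The `fetch_one` and `sleep` parameters are there only so the pacing logic can be exercised without touching the network. In normal use you would call `fetch_politely(urls)` and let the defaults apply.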

Scrapers break - Sites change their layout all the time. If that happens, be prepared to rewrite your code.

When a page's structure changes, the code that scrapes it has to change too. Even if you analyzed the page and implemented a scraper once, check it periodically.
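Because layouts change without notice, it can help to make a scraper fail loudly when the elements it expects are missing, rather than silently returning nothing. The sketch below uses only the standard library's `html.parser`. The names `LayoutChangedError` and `extract_or_fail` are hypothetical, as is the `title` class used in the usage example.

```python
from html.parser import HTMLParser
from typing import List

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class LayoutChangedError(RuntimeError):
    """Raised when the expected elements are missing from a page."""


class _ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # greater than 0 while inside a matching element
        self.texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested element inside a match
        elif self.css_class in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts[-1] += data


def extract_or_fail(html: str, css_class: str) -> List[str]:
    """Return the text of all elements with css_class, or raise if none exist."""
    collector = _ClassTextCollector(css_class)
    collector.feed(html)
    collector.close()
    if not collector.texts:
        raise LayoutChangedError(
            f"no element with class {css_class!r}; the page layout may have changed"
        )
    return [text.strip() for text in collector.texts]
```

For example, `extract_or_fail(page_html, "title")` returns the title texts while the page still uses that class, and raises `LayoutChangedError` once the site renames it, which is a signal to re-inspect the page and update the scraper.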

Web pages are inconsistent - There's sometimes some manual clean up that has to happen even after you've gotten your data.

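Web pages are neither uniform nor static. Even after you have collected your data, it may need manual cleanup, and the page's structure or content can still change after you scraped it.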
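Part of the cleanup mentioned in the rule above can often be automated. The following sketch normalizes a single scraped text field. The function name `clean_field` and the `PLACEHOLDERS` set are assumptions for illustration, and real data will usually need site-specific rules on top of this.

```python
import re
import unicodedata
from typing import Optional

# Assumed markers that a site might use for "no value"; adjust per site.
PLACEHOLDERS = {"", "-", "n/a", "none"}


def clean_field(raw: Optional[str]) -> Optional[str]:
    """Normalize one scraped text field, returning None for empty values."""
    if raw is None:
        return None
    # NFKC folds compatibility characters, e.g. a non-breaking space
    # becomes a plain space and full-width letters become ASCII letters.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    if text.lower() in PLACEHOLDERS:
        return None
    return text
```

Applying `clean_field` to every field right after extraction keeps later steps simple, since they only have to handle either a tidy string or `None`.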
