<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.2">Jekyll</generator><link href="https://jjwsgit.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jjwsgit.github.io/" rel="alternate" type="text/html" /><updated>2022-10-26T14:56:52+00:00</updated><id>https://jjwsgit.github.io/feed.xml</id><title type="html">DS_Circle</title><subtitle>동글뱅's blog</subtitle><author><name>JJWS</name></author><entry><title type="html">[Likelion AI 7th cohort] Random Forest</title><link href="https://jjwsgit.github.io/likelion/randomforest/" rel="alternate" type="text/html" title="[Likelion AI 7th cohort] Random Forest" /><published>2022-10-26T00:00:00+00:00</published><updated>2022-10-26T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/randomforest</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/randomforest/">&lt;p&gt;:octocat: modeling&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;decisiontree&quot;&gt;DecisionTree&lt;/h1&gt;

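&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;br /&gt;
Works well as a baseline model&lt;br /&gt;
Serves as the base learner for boosting models, trains quickly, and makes feature importances easy to inspect&lt;br /&gt;
&lt;strong&gt;Cons&lt;/strong&gt;&lt;br /&gt;
Results and performance vary widely depending on randomness&lt;br /&gt;
Because the approach is hierarchical, an error at one split propagates to every level below it&lt;br /&gt;
Sensitive to noise, so it tends to overfit and generalizes poorly&lt;br /&gt;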
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;랜덤포레스트&quot;&gt;Random Forest&lt;/h1&gt;

&lt;p&gt;bagging : bootstrap + aggregating&lt;br /&gt;
A method that combines (aggregates) base learners, each trained on a slightly different sample of the training data.&lt;br /&gt;
Aggregating many decorrelated trees makes the model robust to noise&lt;br /&gt;
Randomization gives good generalization performance&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;주요-파라미터&quot;&gt;Key parameters&lt;/h2&gt;

&lt;p&gt;n_estimators : number of trees&lt;br /&gt;
n_jobs=k : parallelization; -1 uses all available cores&lt;br /&gt;&lt;/p&gt;
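&lt;p&gt;A minimal sketch of the two parameters above, assuming scikit-learn is installed. The synthetic data from make_regression and the specific values (100 trees, seed 42) are illustrative choices, not part of the course notes.&lt;/p&gt;

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)

model = RandomForestRegressor(
    n_estimators=100,  # number of trees in the forest
    n_jobs=-1,         # -1 uses every available CPU core
    random_state=42,   # fixes the bootstrap sampling for reproducibility
)
model.fit(X, y)

n_trees = len(model.estimators_)            # the fitted trees
importances = model.feature_importances_    # one value per feature
```

&lt;p&gt;More trees usually stabilize the result at the cost of training time, which is where n_jobs helps.&lt;/p&gt;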

&lt;p&gt;With so many trees, the forest is hard to visualize&lt;br /&gt;
&lt;a href=&quot;https://github.com/andosa/treeinterpreter&quot;&gt;Interpretation tool: treeinterpreter&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Hold-out Validation&lt;br /&gt;
Fast to evaluate&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Cross Validation&lt;br /&gt;
&lt;strong&gt;Pros&lt;/strong&gt;&lt;br /&gt;
Every row of the dataset is used for evaluation, which avoids bias from one particular evaluation split&lt;br /&gt;
=&amp;gt; prevents underfitting caused by a shortage of data&lt;br /&gt;
=&amp;gt; the evaluation results can guide you toward a more general model&lt;br /&gt;
~ accuracy can improve.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt; (of cross validation)&lt;br /&gt;
Training and evaluating the model takes longer&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluation metrics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;MAE(Mean Absolute Error)&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;R^2 (coefficient of determination)&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
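&lt;p&gt;The two validation strategies and the two metrics can be sketched together as follows. This assumes scikit-learn; the synthetic dataset, the 80/20 split, and cv=5 are illustrative assumptions rather than values from the course.&lt;/p&gt;

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
model = RandomForestRegressor(n_estimators=50, random_state=42)

# Hold-out validation: a single split, so it is fast to evaluate.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)
holdout_pred = model.fit(X_train, y_train).predict(X_valid)
holdout_mae = mean_absolute_error(y_valid, holdout_pred)

# Cross validation: every row is predicted once by a model that never saw it.
cv_pred = cross_val_predict(model, X, y, cv=5)
cv_mae = mean_absolute_error(y, cv_pred)
cv_r2 = r2_score(y, cv_pred)
```

&lt;p&gt;Cross validation trains five models here instead of one, which is the extra cost noted above.&lt;/p&gt;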

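&lt;h1 id=&quot;오늘의-이모저모&quot;&gt;Today's odds and ends&lt;/h1&gt;
&lt;p&gt;&lt;br /&gt;
Miscellaneous&lt;br /&gt;
wandb : a library that tracks and logs tuning experiments&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;:bulb:&lt;a href=&quot;https://speakerdeck.com/weirdx/99con-junieo-gaebaljayi-iryeogseo-sseugi-idongug&quot;&gt;Writing a proper resume&lt;/a&gt;&lt;br /&gt;
List entries in reverse chronological order&lt;br /&gt;
Do not submit a zip file&lt;br /&gt;
In the self-introduction, state your contributions: technical / collaboration / communication&lt;br /&gt;
Tech stack - summarize what you can actually do (your main skills) in one unambiguous line&lt;br /&gt;
Listing unrelated work experience is still recommended. It can show experience working in an organization, diligence, and so on. What matters is how you frame it!&lt;br /&gt;
&lt;a href=&quot;https://jojoldu.github.io/&quot;&gt;Resume&lt;/a&gt;&lt;br /&gt;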
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- DecisionTree&lt;br /&gt;
https://scikit-learn.org/stable/index.html&lt;br /&gt;
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html&lt;br /&gt;
- RandomForest&lt;br /&gt;
https://github.com/andosa/treeinterpreter&lt;br /&gt;
- 99CON : Writing a Resume as a Junior Developer - 이동욱&lt;br /&gt;
https://speakerdeck.com/weirdx/99con-junieo-gaebaljayi-iryeogseo-sseugi-idongug&lt;br /&gt;
https://jojoldu.github.io/&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;Post notice&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
This post is based on course material from &lt;strong&gt;멋쟁이 사자처럼 (Likelion) AI SCHOOL&lt;/strong&gt;.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat: modeling</summary></entry><entry><title type="html">[Likelion AI 7th cohort] Machine Learning Basics</title><link href="https://jjwsgit.github.io/likelion/ML_basic/" rel="alternate" type="text/html" title="[Likelion AI 7th cohort] Machine Learning Basics" /><published>2022-10-25T00:00:00+00:00</published><updated>2022-10-25T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/ML_basic</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/ML_basic/">&lt;p&gt;:octocat: scikit-learn&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;ml-tools&quot;&gt;ML Tools&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://scikit-learn.org/stable/index.html&quot;&gt;scikit-learn&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;the most popular ML framework&lt;br /&gt;
Six things you can do with scikit-learn&lt;br /&gt;&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Classification - Supervised learning&lt;/li&gt;
  &lt;li&gt;Regression - Supervised learning&lt;/li&gt;
  &lt;li&gt;Clustering - Unsupervised learning&lt;/li&gt;
  &lt;li&gt;Dimensionality reduction&lt;/li&gt;
  &lt;li&gt;Model selection&lt;/li&gt;
  &lt;li&gt;Preprocessing - e.g. converting variables to numeric form&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;&lt;br /&gt;
Built on NumPy, SciPy, and Matplotlib =&amp;gt; highly accessible and reusable&lt;br /&gt;
Simple and efficient tools&lt;br /&gt;
Open source, commercially usable&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

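&lt;p&gt;&lt;strong&gt;Cons&lt;/strong&gt;&lt;br /&gt;
No support for deep learning or reinforcement learning&lt;br /&gt;
Deep learning needs a rich vocabulary for defining architectures and additionally needs GPUs for efficient computation,
so scikit-learn leaves it out of scope&lt;br /&gt;
&lt;br /&gt;
:pushpin:sklearn.neural_network implements a simple multi-layer perceptron&lt;br /&gt;
Only bug fixes are accepted for this module&lt;br /&gt;
For more complex deep learning models, switch to a popular deep learning framework such as tensorflow, keras, or pytorch&lt;br /&gt;&lt;/p&gt;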

&lt;p&gt;XGBoost / LightGBM / CatBoost&lt;br /&gt;
PyCaret&lt;br /&gt;
Prophet&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Deep learning&lt;br /&gt;
TensorFlow / Keras / PyTorch&lt;br /&gt;
Fast.ai - geared toward education&lt;br /&gt;
Caffe&lt;br /&gt;
MXNet&lt;br /&gt;
&lt;br /&gt;
XAI : explainable AI&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

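&lt;h1 id=&quot;scikit-learn-과-ml&quot;&gt;scikit-learn and ML&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Machine learning&lt;/strong&gt;&lt;br /&gt;
Supervised learning (classification, regression) + unsupervised learning (clustering, transformation, association) + reinforcement learning + etc&lt;br /&gt;&lt;/p&gt;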

&lt;p&gt;&lt;a href=&quot;https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html&quot;&gt;choosing the right estimator&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Categorical&lt;/th&gt;
      &lt;th&gt;Numerical&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Supervised learning&lt;/td&gt;
      &lt;td&gt;Classification&lt;/td&gt;
      &lt;td&gt;Regression&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Unsupervised learning&lt;/td&gt;
      &lt;td&gt;Clustering&lt;/td&gt;
      &lt;td&gt;Dimensionality reduction&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Unsupervised learning - feature engineering&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;머신러닝-과정&quot;&gt;Machine learning workflow&lt;/h2&gt;

&lt;p&gt;fit : train =&amp;gt; predict : predict =&amp;gt; evaluate : evaluate&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Example) decision tree&lt;br /&gt;&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Configure the model : model = DecisionTreeClassifier(random_state,,,)&lt;/li&gt;
  &lt;li&gt;Train : model.fit(X_train, y_train)&lt;/li&gt;
  &lt;li&gt;Predict : y_predict = model.predict(X_test)&lt;/li&gt;
  &lt;li&gt;Visualize the decision tree &lt;br /&gt;
 plot_tree(model, max_depth, feature_names, filled,,,)&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Feature importances&lt;br /&gt;
 model.feature_importances_&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Measure accuracy&lt;br /&gt;
 1) model.score(X_test, y_test)&lt;br /&gt;
 2) accuracy_score(y_test, y_predict)&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ol&gt;
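&lt;p&gt;The steps above can be sketched end to end as follows. The iris dataset, the 80/20 split, and max_depth=3 are assumptions chosen only to make the example runnable; the plotting step is left out so the sketch needs no display.&lt;/p&gt;

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=42)  # 1. configure
model.fit(X_train, y_train)                                   # 2. train
y_predict = model.predict(X_test)                             # 3. predict

importances = model.feature_importances_                      # 5. feature importances

# 6. both accuracy measurements compute the same quantity for a classifier
score_a = model.score(X_test, y_test)
score_b = accuracy_score(y_test, y_predict)
```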

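&lt;h3 id=&quot;데이터-전처리-feature&quot;&gt;Data preprocessing, Feature&lt;/h3&gt;

&lt;p&gt;Data preprocessing methods&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Normalization : rescales values into the range 0 to 1&lt;br /&gt;
When numeric variables span very different ranges, normalization puts them on a common scale so the model can pick up detail in every variable&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Outlier : outliers can distort predictions for the rest of the data, so decide whether to keep them&lt;/li&gt;
  &lt;li&gt;Imputation : missing values&lt;br /&gt;
Many models cannot be fit when the data contains missing values&lt;br /&gt;
Missing values must be filled in, dropped, or replaced&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Encoding&lt;br /&gt;
Machine learning algorithms cannot learn from raw string values&lt;br /&gt;
Types include one-hot encoding, among others&lt;br /&gt;
&lt;br /&gt;
Some reasons to bin a numeric variable into categories&lt;br /&gt;
It can give the algorithm a hint and can also help prevent overfitting.&lt;br /&gt;&lt;/li&gt;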
&lt;/ul&gt;
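&lt;p&gt;A small sketch of imputation, normalization, and encoding from the list above, assuming pandas and scikit-learn. The toy frame and its column names are hypothetical, and median imputation is just one of several reasonable strategies.&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy frame: a numeric column with a gap and a text column.
df = pd.DataFrame({
    "age": [20.0, 30.0, np.nan, 40.0],
    "city": ["Seoul", "Busan", "Seoul", "Jeju"],
})

# Imputation: fill the missing age with the column median.
age_filled = SimpleImputer(strategy="median").fit_transform(df[["age"]])

# Normalization: rescale the filled column into the range 0 to 1.
age_scaled = MinMaxScaler().fit_transform(age_filled)

# Encoding: one-hot encode the text column so it becomes numeric.
city_encoded = pd.get_dummies(df["city"])
```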

&lt;h3 id=&quot;modeling&quot;&gt;Modeling&lt;/h3&gt;

&lt;h4 id=&quot;알고리즘&quot;&gt;Algorithm&lt;/h4&gt;

&lt;dl&gt;
  &lt;dt&gt;&lt;strong&gt;Decision tree&lt;/strong&gt;&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;Shaped like an upside-down tree&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;Learns by asking a chain of yes/no questions, much like the game of twenty questions&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;Each box that holds a question or an answer is called a node, and the data reaching a node shrinks with every further split.&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;CART(Classification And Regression Tree)&lt;br /&gt;&lt;/dt&gt;
  &lt;dd&gt;A tree that can be used for both classification and regression&lt;br /&gt;
&lt;br /&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;Key parameters&lt;br /&gt;
criterion: measures the quality of a split (gini, entropy)&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Gini impurity&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Entropy&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;max_depth: maximum depth of the tree&lt;br /&gt;
min_samples_split: minimum number of samples required to split an internal node&lt;br /&gt;
min_samples_leaf: minimum number of samples required at a leaf node&lt;br /&gt;
max_leaf_nodes: upper limit on the number of leaf nodes&lt;br /&gt;
random_state: controls the randomness of the estimator =&amp;gt; gives the same result on every run&lt;br /&gt;&lt;/p&gt;
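&lt;p&gt;To make the two criterion options concrete, here is a standard-library sketch of Gini impurity and entropy for the labels in one node. The helper names gini and entropy are hypothetical; this illustrates the textbook formulas, not scikit-learn's internal implementation.&lt;/p&gt;

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


pure = ["yes"] * 4                   # a node holding a single class
mixed = ["yes", "yes", "no", "no"]   # a 50/50 node, the worst case for two classes

pure_scores = (gini(pure), entropy(pure))
mixed_scores = (gini(mixed), entropy(mixed))
```

&lt;p&gt;Both measures are zero for a pure node and largest for an evenly mixed one, so a split is good when it lowers them.&lt;/p&gt;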

&lt;h4 id=&quot;훈련&quot;&gt;Training&lt;/h4&gt;

&lt;p&gt;overfitting / optimum / underfitting&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;오늘의-이모저모&quot;&gt;Today's odds and ends&lt;/h1&gt;

&lt;p&gt;From a linear algebra standpoint, it is conventional to use uppercase Latin letters for matrices (e.g. the design matrix X)&lt;br /&gt;
and lowercase Latin letters for vectors (the response vector y) - &lt;a href=&quot;https://stats.stackexchange.com/questions/389395/why-uppercase-for-x-and-lowercase-for-y&quot;&gt;source&lt;/a&gt;&lt;br /&gt;
no free lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- scikit-learn&lt;br /&gt;
https://scikit-learn.org/stable/index.html&lt;br /&gt;
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html&lt;br /&gt;
- stackexchange&lt;br /&gt;
https://stats.stackexchange.com/questions/389395/why-uppercase-for-x-and-lowercase-for-y&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;Post notice&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
This post is based on course material from &lt;strong&gt;멋쟁이 사자처럼 (Likelion) AI SCHOOL&lt;/strong&gt;.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat: scikit-learn</summary></entry><entry><title type="html">[Likelion AI 7th cohort] Git and Streamlit</title><link href="https://jjwsgit.github.io/likelion/streamlit/" rel="alternate" type="text/html" title="[Likelion AI 7th cohort] Git and Streamlit" /><published>2022-10-18T00:00:00+00:00</published><updated>2022-10-18T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/streamlit</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/streamlit/">&lt;p&gt;:octocat: git/github &amp;amp; streamlit&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

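&lt;h1 id=&quot;git-과-github&quot;&gt;Git and GitHub&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;git : a distributed version control system (many users working and making changes)&lt;br /&gt;
A system that tracks changes to computer files&lt;br /&gt;
A snapshot-stream-based distributed version control system for coordinating work on those files among multiple users&lt;br /&gt;
git focuses on speed and supports data integrity and distributed, non-linear workflows&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;github : source code management + storage + social coding&lt;br /&gt;
A web service that hosts repositories for Git, the distributed version control tool.&lt;br /&gt;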
&lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;commit--git-commit&quot;&gt;&lt;a href=&quot;https://backlog.com/git-tutorial/kr/intro/intro1_3.html&quot;&gt;commit&lt;/a&gt; / &lt;a href=&quot;https://steady-coding.tistory.com/m/277&quot;&gt;git commit&lt;/a&gt;&lt;/h2&gt;

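&lt;p&gt;Part of version control&lt;br /&gt;
A version is a meaningful change, a state in which a piece of work is complete&lt;br /&gt;
Recording such a meaningful change is what a commit does&lt;br /&gt;
A commit records changes: it stores additions and modifications of files and folders in the repository&lt;br /&gt;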
&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;repository-만들기&quot;&gt;Creating a repository&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;gitignore : excludes files from git tracking, e.g. caches and logs that do not need to be uploaded&lt;br /&gt;
A configuration file that keeps unwanted backup files, log files, or compiled files in a project out of Git.&lt;br /&gt;
Usually placed in the top-level directory of the repository&lt;br /&gt;
Rules can be written to exclude specific file extensions.&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;LICENSE &lt;a href=&quot;https://wooono.tistory.com/m/379&quot;&gt;reference&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;README.md&lt;br /&gt;
&lt;a href=&quot;https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax&quot;&gt;GitHub Markdown documentation&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://imgur.com/&quot;&gt;imgur&lt;/a&gt; : a handy site for uploading an image and using its link&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

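&lt;p&gt;pip freeze &amp;gt; requirements.txt : writes the installed packages to a list&lt;br /&gt;
requirements.txt&lt;br /&gt;
The conventional file that declares which modules/libraries a project needs&lt;br /&gt;
When used for deployment, the server installs the packages listed in it&lt;br /&gt;
Versions can be pinned in the list, e.g. pandas==1.5&lt;br /&gt;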
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;streamlit-시작하기&quot;&gt;streamlit &lt;a href=&quot;https://docs.streamlit.io/library/get-started&quot;&gt;Get started&lt;/a&gt;&lt;/h1&gt;

&lt;p&gt;st.markdown&lt;br /&gt;
st.sidebar&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;markdown&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;header&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;selectbox&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;multiselect&lt;br /&gt;
&lt;br /&gt;
st.dataframe&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chart element&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;line_chart&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;bar_chart&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;pyplot&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

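&lt;p&gt;@st.cache : apply caching (newer Streamlit releases provide st.cache_data and st.cache_resource instead)&lt;br /&gt;
After the first load, the already-loaded data is reused, which reduces the load.&lt;br /&gt;
Without caching, the data has to be fetched again on every rerun, which slows the app down&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;:pushpin: plotly express charts integrate fairly well&lt;br /&gt;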
&lt;br /&gt;&lt;/p&gt;

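&lt;h1 id=&quot;오늘의-이모저모&quot;&gt;Today's odds and ends&lt;/h1&gt;

&lt;p&gt;Why use a data center or cloud servers?&lt;br /&gt;
Reliability, scalability, redundancy &lt;br /&gt;
Also because of power consumption, large capacity, and heat&lt;br /&gt;
&lt;br /&gt;
:bulb: The inconvenience of managing servers yourself&lt;br /&gt;
Servers must run 24 hours a day, and hardware wears out under that load, so it has to be replaced periodically&lt;br /&gt;
&lt;br /&gt;
Emoji shortcut: Windows key + .&lt;br /&gt;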
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- Commit&lt;br /&gt;
https://backlog.com/git-tutorial/kr/intro/intro1_3.html&lt;br /&gt;
https://steady-coding.tistory.com/m/277&lt;br /&gt;
- LICENSE&lt;br /&gt;
https://wooono.tistory.com/m/379&lt;br /&gt;
- GitHub basic writing and formatting syntax&lt;br /&gt;
https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax&lt;br /&gt;
- imgur&lt;br /&gt;
https://imgur.com/&lt;br /&gt;
- Streamlit&lt;br /&gt;
https://docs.streamlit.io/library/get-started&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;Post notice&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
This post is based on course material from &lt;strong&gt;멋쟁이 사자처럼 (Likelion) AI SCHOOL&lt;/strong&gt;.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat: git/github &amp;amp; streamlit</summary></entry><entry><title type="html">[Likelion AI 7th cohort] Saving</title><link href="https://jjwsgit.github.io/likelion/saving/" rel="alternate" type="text/html" title="[Likelion AI 7th cohort] Saving" /><published>2022-10-17T00:00:00+00:00</published><updated>2022-10-17T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/saving</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/saving/">&lt;p&gt;:octocat:Saving&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;오늘의-키워드--절약&quot;&gt;Today's keyword : saving&lt;/h1&gt;

&lt;p&gt;Data handled in industry is usually much larger than the data used in exercises&lt;br /&gt;
Only as much as fits in the computer's RAM can be loaded at once&lt;br /&gt;
To load more data for analysis or modeling, memory has to be used efficiently&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;:bulb: Ways to save&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Saving memory =&amp;gt; downcast&lt;br /&gt;
Can saving memory let us load and analyze more data?&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Saving storage (disk space) =&amp;gt; parquet&lt;br /&gt;
Can shrinking files let us store more of them?&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;메모리-절약-by-downcast&quot;&gt;Saving memory by downcast&lt;/h2&gt;

&lt;p&gt;The memory a value occupies depends on its data type and range (int64, uint32, etc.).&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../../images/2022-10-17-saving/datatype.png&quot; alt=&quot;datatype&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Downcasting uses &lt;a href=&quot;https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html&quot;&gt;pd.to_numeric&lt;/a&gt;.&lt;br /&gt;
downcast : {‘integer’, ‘signed’, ‘unsigned’, ‘float’}, default None&lt;br /&gt;
    If not None, and if the data has been successfully cast to a&lt;br /&gt;
    numerical dtype (or if the data was numeric to begin with),&lt;br /&gt;
    downcast that resulting data to the smallest numerical dtype&lt;br /&gt;
    possible according to the following rules:&lt;br /&gt;
    &lt;br /&gt;
    - ‘integer’ or ‘signed’: smallest signed int dtype (min.: np.int8)&lt;br /&gt;
    - ‘unsigned’: smallest unsigned int dtype (min.: np.uint8)&lt;br /&gt;
    - ‘float’: smallest float dtype (min.: np.float32)&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;int&lt;br /&gt;
No negative values: unsigned&lt;br /&gt;
Negative values present: signed = integer&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;float =&amp;gt; float - when the column has decimals or missing values&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;bool =&amp;gt; int8 - almost no difference
&lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;object &lt;br /&gt;
Categorical data =&amp;gt; df.astype(&quot;category&quot;)&lt;br /&gt;
Not suitable when there are too many distinct values, as with free-text post bodies.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
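&lt;p&gt;The downcasting rules above can be tried on a small frame as follows. The column names and value ranges are hypothetical and were chosen so that each rule applies; memory_usage(deep=True) is used to compare the footprint before and after.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical columns that start as the defaults int64 / float64 / object.
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64") % 200,        # no negatives, max 199
    "delta": np.arange(1000, dtype="int64") % 200 - 100,  # has negatives
    "price": np.linspace(0.0, 1.0, 1000),
    "grade": ["A", "B", "C", "D"] * 250,                  # few distinct values
})
before = df.memory_usage(deep=True).sum()

df["count"] = pd.to_numeric(df["count"], downcast="unsigned")  # uint8
df["delta"] = pd.to_numeric(df["delta"], downcast="integer")   # int8
df["price"] = pd.to_numeric(df["price"], downcast="float")     # float32
df["grade"] = df["grade"].astype("category")
after = df.memory_usage(deep=True).sum()
```

&lt;p&gt;Downcasting to float32 trades precision for memory, so it is worth checking that the rounded values are still accurate enough for the analysis.&lt;/p&gt;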

&lt;h2 id=&quot;스토리지-절약---파일-크기-줄이기-by-parquet&quot;&gt;Saving storage - reducing file size by parquet&lt;/h2&gt;

&lt;p&gt;csv : row-oriented vs parquet : column-oriented&lt;br /&gt;
Column-oriented storage compresses better.&lt;br /&gt;
&lt;br /&gt;
:pushpin: Install pyarrow and fastparquet as well,&lt;br /&gt;
because they are used as the engine&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://pandas.pydata.org/pandas-docs/version/1.1/reference/api/pandas.DataFrame.to_parquet.html&quot;&gt;pandas.DataFrame.to_parquet&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
engine : {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’ Parquet library to use.&lt;br /&gt;
    If ‘auto’, then the option io.parquet.engine is used.&lt;br /&gt;
    The default io.parquet.engine behavior is to try ‘pyarrow’,&lt;br /&gt;
    falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.&lt;br /&gt;
compression : {‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’&lt;br /&gt;
    Name of the compression to use. Use None for no compression.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../../images/2022-10-17-saving/parquet_performance.png&quot; alt=&quot;performance&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;
A parquet file includes metadata.&lt;br /&gt;
=&amp;gt; For small datasets the file can actually be larger than the equivalent csv.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h3 id=&quot;파일-크기-비교&quot;&gt;Comparing file sizes&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.python.org/ko/3/library/os.html#module-os&quot;&gt;os : Miscellaneous operating system interfaces&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This module provides a portable way of using operating system dependent functionality.&lt;br /&gt;
&lt;br /&gt;
os.stat(path, *, dir_fd=None, follow_symlinks=True)&lt;br /&gt;
Get the status of a file or a file descriptor. &lt;br /&gt;
Perform the equivalent of a stat() system call on the given path.&lt;br /&gt;
Return a stat_result object.&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;st_size&lt;br /&gt;
Size of the file in bytes, if it is a regular file or a symbolic link. &lt;br /&gt;
The size of a symbolic link is the length of the pathname it contains, without a terminating null byte.&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;os.path : Common pathname manipulations&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;os.path.isfile(path)&lt;br /&gt;
Return True if path is an existing regular file.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
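&lt;p&gt;A standard-library sketch of how os.path.isfile and os.stat can be combined to read a file's size. The helper file_size and the file names are hypothetical; a tiny csv is written to a temporary directory so the example is self-contained.&lt;/p&gt;

```python
import csv
import os
import tempfile


def file_size(path):
    """Return the size in bytes, or None when path is not a regular file."""
    if not os.path.isfile(path):
        return None
    return os.stat(path).st_size


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # hypothetical file name
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["id", "value"], [1, "a"], [2, "b"]])
    size = file_size(path)
    missing = file_size(os.path.join(tmp, "nope.parquet"))  # never created
```

&lt;p&gt;Calling the same helper on a csv and on the parquet file written from the same DataFrame gives the size comparison described above.&lt;/p&gt;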

&lt;h1 id=&quot;대시보드&quot;&gt;Dashboard&lt;/h1&gt;

&lt;h2 id=&quot;streamlit&quot;&gt;streamlit&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://streamlit.io/&quot;&gt;A faster way to build and share data apps&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;오늘의-이모저모&quot;&gt;Today's odds and ends&lt;/h1&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PySpark Koalas&lt;/strong&gt;&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://spark.apache.org/docs/latest/api/python/migration_guide/koalas_to_pyspark.html#&quot;&gt;Migrating from Koalas to pandas API on Spark — PySpark documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://koalas.readthedocs.io/en/latest/getting_started/10min.html&quot;&gt;10 minutes to Koalas — Koalas 1.8.2 documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=Y9kdUq_qIa8&quot;&gt;pandas 코드로 대규모 클러스터에서 더 빠르게 빅데이터를 분석 해보자 - Koalas - 박현우 - PyCon Korea 2020 - YouTube&lt;/a&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Linux-family commands&lt;/strong&gt;&lt;br /&gt;
ls : list the contents of the current directory (dir on Windows)&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;cd : change directory; moves into a directory under the current one or to another directory such as the parent&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;~ : go to the home directory (the root directory is /)&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;.. : go to the parent directory&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;mkdir dirName : create a new directory in the current directory&lt;br /&gt;
&lt;br /&gt;
mv source destinationPath : move a directory or file&lt;br /&gt;
mv source newName : rename a directory or file&lt;br /&gt;
&lt;br /&gt;
cp sourceFile copyName : copy a file&lt;br /&gt;
cp -R dir/ destination : copy a directory and its contents&lt;br /&gt;
&lt;br /&gt;
rm fileName : delete a file&lt;br /&gt;
rm -rf dirName : delete a directory and everything in it, without confirmation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- Data type&lt;br /&gt;
https://github.com/rougier/numpy-tutorial#quick-references&lt;br /&gt;
- pd.to_numeric&lt;br /&gt;
https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html&lt;br /&gt;
- pandas.DataFrame.to_parquet&lt;br /&gt;
https://pandas.pydata.org/pandas-docs/version/1.1/reference/api/pandas.DataFrame.to_parquet.html&lt;br /&gt;
- Parquet read performance&lt;br /&gt;
https://wesmckinney.com/blog/python-parquet-update/&lt;br /&gt;
- os&lt;br /&gt;
https://docs.python.org/ko/3/library/os.html#module-os&lt;br /&gt;
- streamlit&lt;br /&gt;
https://streamlit.io/&lt;br /&gt;
- PySpark Koalas&lt;br /&gt;
https://spark.apache.org/docs/latest/api/python/migration_guide/koalas_to_pyspark.html#&lt;br /&gt;
https://koalas.readthedocs.io/en/latest/getting_started/10min.html&lt;br /&gt;
https://www.youtube.com/watch?v=Y9kdUq_qIa8&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;Post notice&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
This post is based on course material from &lt;strong&gt;멋쟁이 사자처럼 (Likelion) AI SCHOOL&lt;/strong&gt;.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat:Saving</summary></entry><entry><title type="html">[Likelion AI 7th cohort] EDA odds and ends</title><link href="https://jjwsgit.github.io/likelion/todayEDA/" rel="alternate" type="text/html" title="[Likelion AI 7th cohort] EDA odds and ends" /><published>2022-10-13T00:00:00+00:00</published><updated>2022-10-13T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/todayEDA</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/todayEDA/">&lt;p&gt;:octocat:EDA&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;파일시스템의-대소문자-구분&quot;&gt;Case sensitivity in file systems&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://learn.microsoft.com/ko-kr/windows/wsl/case-sensitivity&quot;&gt;Adjust case sensitivity&lt;/a&gt;&lt;br /&gt;
“Differences between Windows and Linux case sensitivity”&lt;br /&gt;
When working with both Linux and Windows files and directories, you may need to adjust how case sensitivity is handled.&lt;br /&gt;
Standard behavior:&lt;br /&gt;
The Windows file system treats file and directory names as case-insensitive, e.g. FOO.txt and foo.txt are treated as the same file&lt;br /&gt;
The Linux file system treats file and directory names as case-sensitive, e.g. FOO.txt and foo.txt are treated as distinct files&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;random-generator&quot;&gt;&lt;a href=&quot;https://numpy.org/doc/stable/reference/random/generator.html&quot;&gt;Random Generator&lt;/a&gt;&lt;/h1&gt;

&lt;p&gt;numpy.random.seed() vs numpy.random.default_rng() (recommended)&lt;br /&gt;
&lt;a href=&quot;https://builtin.com/data-science/numpy-random-seed&quot;&gt;Stop Using NumPy’s Global Random Seed&lt;/a&gt;&lt;br /&gt;
global random state vs own random state&lt;br /&gt;
Benefits of default_rng&lt;br /&gt;
A random number generator can be passed between functions and classes. &lt;br /&gt;
That is, each person or function can have its own random state without resetting the global seed.&lt;br /&gt;
Each script can also pass a random number generator into functions that need to be reproducible.&lt;br /&gt;
=&amp;gt; You know exactly which random number generator is used in each part of the project.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
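&lt;p&gt;A short sketch of passing a generator into a function instead of seeding the global state. The function name sample_noise and the seed value are hypothetical.&lt;/p&gt;

```python
import numpy as np


def sample_noise(rng, size):
    """Draw noise from the generator passed in, leaving global state alone."""
    return rng.normal(loc=0.0, scale=1.0, size=size)


rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = sample_noise(rng_a, 3)
second = sample_noise(rng_b, 3)  # same seed, separate generator: same draws
third = sample_noise(rng_a, 3)   # rng_a has advanced, so these draws differ
```

&lt;p&gt;Because each generator carries its own state, code elsewhere that draws random numbers cannot silently change the sequence a function sees.&lt;/p&gt;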

&lt;h1 id=&quot;pandas&quot;&gt;Pandas&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html&quot;&gt;to_datetime(arg, format)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://pandas.pydata.org/docs/reference/series.html#api-series-dt&quot;&gt;dt accessor&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html&quot;&gt;options and setting&lt;/a&gt;&lt;br /&gt;
pd.options.display.max_columns = None&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;merge&quot;&gt;&lt;a href=&quot;https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html&quot;&gt;merge&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;df1.merge(df2, on=&quot;key&quot;)&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;how : {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;on : label or list&lt;br /&gt;
Column or index level names to join on.&lt;br /&gt;
These must be found in both DataFrames.&lt;br /&gt; 
If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
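&lt;p&gt;To see what the how parameter changes, here is a sketch with two small hypothetical frames that share a key column.&lt;/p&gt;

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

inner = df1.merge(df2, on="key")               # default how="inner": keys in both
left = df1.merge(df2, on="key", how="left")    # all rows of df1, NaN where unmatched
outer = df1.merge(df2, on="key", how="outer")  # union of keys from both frames
```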

&lt;h2 id=&quot;groupby-메서드-체이닝&quot;&gt;groupby method chaining&lt;/h2&gt;

&lt;p&gt;The result of the groupby method is a DataFrameGroupBy object, which is not a DataFrame.&lt;br /&gt;
Many methods can be applied to a DataFrameGroupBy object.&lt;br /&gt;
:bulb: Method chaining: calling several methods one after another&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
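&lt;p&gt;A minimal chaining sketch on a hypothetical sales table: group, aggregate, sort, and turn the result back into a DataFrame in one expression.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Seoul", "Seoul", "Busan", "Busan", "Jeju"],
    "sales": [10, 30, 5, 15, 8],
})

grouped = df.groupby("city")       # DataFrameGroupBy, not a DataFrame
summary = (
    grouped["sales"]
    .mean()                        # aggregate each group
    .sort_values(ascending=False)  # keep chaining Series methods
    .reset_index()                 # back to a regular DataFrame
)
```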

&lt;h2 id=&quot;map-function&quot;&gt;map function&lt;/h2&gt;

&lt;p&gt;When map needs no lambda - passing a dict&lt;br /&gt;
gender_dict = {1 : &quot;남자&quot;, 2 : &quot;여자&quot;}&lt;br /&gt;
df[&quot;성별&quot;] = df[&quot;성별코드&quot;].map(gender_dict) / map(lambda x : gender_dict[x])&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;:pushpin: replace can be used instead of map&lt;br /&gt;
&lt;a href=&quot;https://abluesnake.tistory.com/142&quot;&gt;Difference between map and replace&lt;/a&gt;&lt;br /&gt;
map converts by looking each value up in the mapping, so a value missing from the dictionary cannot be mapped and becomes NaN&lt;br /&gt;
replace only swaps the values it is given, so a value missing from the dictionary is kept unchanged&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
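&lt;p&gt;The difference shows up as soon as the Series contains a value the dictionary does not cover. In this sketch the code 9 is a deliberately unmapped, hypothetical value, and the labels are written in English for readability.&lt;/p&gt;

```python
import pandas as pd

gender_dict = {1: "male", 2: "female"}
codes = pd.Series([1, 2, 9])  # 9 is deliberately missing from the dictionary

mapped = codes.map(gender_dict)        # the unmatched value becomes NaN
replaced = codes.replace(gender_dict)  # the unmatched value stays 9
```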

&lt;h1 id=&quot;시각화&quot;&gt;Visualization&lt;/h1&gt;

&lt;h2 id=&quot;heatmap-상관계수&quot;&gt;heatmap of correlation coefficients&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Building a mask with numpy&lt;br /&gt;
np.ones(shape) / np.ones_like(array_like) : both create an array of ones&lt;br /&gt;
np.triu / np.tril : keep the upper / lower triangle&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;seaborn heatmap&lt;br /&gt;
sns.heatmap(corr, annot=True, fmt=&quot;.2f&quot;, cmap=&quot;coolwarm&quot;, vmin=-1, vmax=1, mask=mask)&lt;br /&gt;
If vmin/vmax are not given, they are inferred from the data and other keyword arguments (the data minimum/maximum)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
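&lt;p&gt;A sketch of building the mask with numpy. The random data and column names are hypothetical, and the seaborn call is only shown as a comment so the example runs without a plotting backend.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))  # hypothetical data
corr = df.corr()

# True marks cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones_like(corr, dtype=bool))

# With seaborn installed, the mask can be passed straight through, e.g.
# sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, mask=mask)
```

&lt;p&gt;Hiding the upper triangle removes the duplicated half of the symmetric correlation matrix.&lt;/p&gt;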

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- Adjust case sensitivity&lt;br /&gt;
https://learn.microsoft.com/ko-kr/windows/wsl/case-sensitivity&lt;br /&gt;
- Random Generator&lt;br /&gt;
https://numpy.org/doc/stable/reference/random/generator.html&lt;br /&gt;
- Stop Using NumPy’s Global Random Seed&lt;br /&gt;
https://builtin.com/data-science/numpy-random-seed&lt;br /&gt;
- pandas.to_datetime&lt;br /&gt;
https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html&lt;br /&gt;
- Datetime properties&lt;br /&gt;
https://pandas.pydata.org/docs/reference/series.html#api-series-dt&lt;br /&gt;
- Options and settings&lt;br /&gt;
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html&lt;br /&gt;
- pandas.DataFrame.merge&lt;br /&gt;
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html&lt;br /&gt;
- Difference between map and replace&lt;br /&gt;
https://abluesnake.tistory.com/142&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;Post notice&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
This post is based on course material from &lt;strong&gt;멋쟁이 사자처럼 (Likelion) AI SCHOOL&lt;/strong&gt;.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat:EDA</summary></entry><entry><title type="html">[Likelion AI 7th cohort] EDA2</title><link href="https://jjwsgit.github.io/likelion/eda2/" rel="alternate" type="text/html" title="[Likelion AI 7th cohort] EDA2" /><published>2022-10-11T00:00:00+00:00</published><updated>2022-10-11T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/eda2</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/eda2/">&lt;p&gt;:octocat:EDA&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;pandas&quot;&gt;Pandas&lt;/h1&gt;

&lt;p&gt;:bulb: &lt;a href=&quot;https://vita.had.co.nz/papers/tidy-data.pdf&quot;&gt;Tidy Data&lt;/a&gt;&lt;br /&gt;
Tidy data / data that is easy to analyze&lt;br /&gt;
Data arranged so that each variable is a column and each observation is a row - Hadley Wickham&lt;br /&gt;
&lt;br /&gt;
Making tidy data with melt&lt;br /&gt;
pd.melt(df, id_vars, value_vars, var_name, value_name)&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
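&lt;p&gt;A sketch of pd.melt turning a wide table into tidy form. The table, with one column per year, is hypothetical.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({
    "city": ["Seoul", "Busan"],
    "2021": [100, 50],
    "2022": [110, 55],
})

tidy = pd.melt(
    wide,
    id_vars=["city"],             # columns kept as identifiers
    value_vars=["2021", "2022"],  # columns unpivoted into rows
    var_name="year",              # name for the former column headers
    value_name="sales",           # name for the former cell values
)
```

&lt;p&gt;Each row of the result is now one observation (a city in a year), which is the layout that groupby and plotting functions expect.&lt;/p&gt;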

&lt;h2 id=&quot;subset-observations---rows&quot;&gt;Subset Observations - rows&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;df.nlargest(n, &quot;value&quot;)&lt;/li&gt;
  &lt;li&gt;df.nsmallest(n, &quot;value&quot;)&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;subset-variables---columns&quot;&gt;Subset Variables - columns&lt;/h2&gt;
&lt;p&gt;select single column with specific name&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;df[&quot;colname&quot;]&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;df.colname    * take care: this form fails when the name contains special characters, spaces, etc.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;reshaping-data&quot;&gt;Reshaping Data&lt;/h2&gt;

&lt;h3 id=&quot;pandas-crosstab&quot;&gt;&lt;a href=&quot;https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html&quot;&gt;Pandas Crosstab&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;normalize : bool, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False&lt;br /&gt;
Normalize by dividing all values by the sum of values - count in the cell / total frequency&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;If passed ‘all’ or True, will normalize over all values&lt;/li&gt;
  &lt;li&gt;If passed ‘index’ will normalize over each row&lt;/li&gt;
  &lt;li&gt;If passed ‘columns’ will normalize over each column&lt;/li&gt;
  &lt;li&gt;If margins is True, will also normalize margin values&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
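&lt;p&gt;A sketch of the normalize options on a small hypothetical survey table.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "F"],
    "smoker": ["yes", "no", "no", "no", "yes"],
})

counts = pd.crosstab(df["gender"], df["smoker"])                         # raw frequencies
share_all = pd.crosstab(df["gender"], df["smoker"], normalize=True)      # all cells sum to 1
share_row = pd.crosstab(df["gender"], df["smoker"], normalize="index")   # each row sums to 1
```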

&lt;h3 id=&quot;컬럼-제거하기&quot;&gt;Dropping columns&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;df.drop(labels=[&quot;col1&quot;, &quot;col2&quot;], axis=1)&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;df.drop(columns=[&quot;col1&quot;, &quot;col2&quot;])&lt;br /&gt;
labels requires axis to be set explicitly&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;컬럼명-변경하기&quot;&gt;Renaming columns&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;df.columns = colname_list&lt;/li&gt;
  &lt;li&gt;df = df.rename(columns={&quot;old_name&quot; : &quot;new_name&quot;})&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
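&lt;p&gt;A short sketch of the two drop forms and of rename, using a throwaway frame whose column names (col1 to col3, value) are placeholders:&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"col1": [1], "col2": [2], "col3": [3]})

# Both calls drop the same columns; labels needs axis=1, columns does not.
a = df.drop(labels=["col1", "col2"], axis=1)
b = df.drop(columns=["col1", "col2"])

# rename returns a new frame; only the listed names change.
c = df.rename(columns={"col3": "value"})
print(list(a.columns), list(c.columns))
```

&lt;p&gt;Neither drop nor rename modifies df itself here, which is why the note assigns the result back with df = df.rename(...).&lt;/p&gt;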

&lt;h2 id=&quot;데이터-타입-변경&quot;&gt;Changing data types&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;pd.to_numeric
    &lt;ul&gt;
      &lt;li&gt;errors : {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’&lt;br /&gt;
  If ‘raise’, then invalid parsing will raise an exception.&lt;br /&gt;
  If ‘coerce’, then invalid parsing will be set as NaN.&lt;br /&gt;
  If ‘ignore’, then invalid parsing will return the input.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
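&lt;p&gt;A minimal sketch of how the errors option behaves, assuming a small Series with one unparseable entry (the sample values are made up):&lt;/p&gt;

```python
import pandas as pd

raw = pd.Series(["1", "2.5", "abc"])

# errors="coerce": entries that cannot be parsed become NaN.
coerced = pd.to_numeric(raw, errors="coerce")

# errors="raise" (the default): the bad value raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

print(coerced.tolist(), raised)
```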

&lt;h2 id=&quot;series&quot;&gt;Series&lt;/h2&gt;

&lt;h3 id=&quot;handling&quot;&gt;Handling&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Series&lt;/th&gt;
      &lt;th&gt;DataFrame&lt;/th&gt;
      &lt;th&gt;Match required for replacement&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;replace&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;Replaces only when the whole value matches &lt;br /&gt; (with a regular expression, a partial match is enough)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;str.replace&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;X&lt;/td&gt;
      &lt;td&gt;Replaces even on a partial (substring) match&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
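&lt;p&gt;The difference in the table can be checked with a small Series. A sketch only; the fruit strings are sample values.&lt;/p&gt;

```python
import pandas as pd

s = pd.Series(["apple", "apple pie", "banana"])

# Series.replace swaps a value only when the whole cell matches.
whole = s.replace("apple", "X")

# With regex=True, Series.replace also rewrites partial matches.
partial_regex = s.replace("apple", "X", regex=True)

# Series.str.replace always works on substrings.
partial_str = s.str.replace("apple", "X", regex=False)

print(whole.tolist(), partial_regex.tolist(), partial_str.tolist())
```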

&lt;p&gt;:pushpin: Notes&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://ko.wikipedia.org/wiki/%EC%A0%95%EA%B7%9C_%ED%91%9C%ED%98%84%EC%8B%9D&quot;&gt;Regular expressions&lt;/a&gt;&lt;br /&gt;
The metacharacters are worth memorizing.&lt;br /&gt;
regex=True&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Series Accessor&lt;br /&gt;
The .str accessor can only be used on a Series that holds string data.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;str.split(pat, expand=True)&lt;br /&gt;
expand : bool, default False&lt;br /&gt;
Expand the split strings into separate columns.&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;If True, return DataFrame/MultiIndex expanding dimensionality.&lt;/li&gt;
  &lt;li&gt;If False, return Series/Index, containing lists of strings.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
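&lt;p&gt;A quick sketch of what expand changes, assuming a Series of year-month strings (sample data):&lt;/p&gt;

```python
import pandas as pd

s = pd.Series(["2022-10", "2022-11"])

as_lists = s.str.split("-")               # Series whose cells are lists
as_frame = s.str.split("-", expand=True)  # DataFrame, one column per piece

print(type(as_lists).__name__, type(as_frame).__name__)
```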

&lt;p&gt;:pushpin: Pay attention to what kind of object each function is applied to&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h3 id=&quot;pandas-style&quot;&gt;&lt;a href=&quot;https://pandas.pydata.org/docs/reference/style.html&quot;&gt;Pandas Style&lt;/a&gt;&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;background_gradient&lt;/li&gt;
  &lt;li&gt;bar, and others&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;값-찾기filtering&quot;&gt;Finding values (Filtering)&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Series.isin / DataFrame.isin&lt;/li&gt;
  &lt;li&gt;str.contains&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
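&lt;p&gt;A sketch of the two filters: isin keeps rows whose value equals one of the listed values, while str.contains keeps rows whose string contains the pattern. The city column is hypothetical.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"city": ["Seoul", "Busan", "Seongnam", "Daegu"]})

exact = df[df["city"].isin(["Seoul", "Daegu"])]   # whole-value membership
partial = df[df["city"].str.contains("Seo")]      # substring / regex match

print(exact["city"].tolist(), partial["city"].tolist())
```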

&lt;h1 id=&quot;오늘의-이모저모&quot;&gt;Today's odds and ends&lt;/h1&gt;

&lt;h2 id=&quot;시각화&quot;&gt;Visualization&lt;/h2&gt;

&lt;h3 id=&quot;seaborn&quot;&gt;&lt;a href=&quot;https://seaborn.pydata.org/tutorial/function_overview.html#&quot;&gt;Seaborn&lt;/a&gt;&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;heatmap&lt;/li&gt;
  &lt;li&gt;pairplot&lt;br /&gt;
Draws a scatter plot for each pair of columns and, on the diagonal where a column meets itself, a histogram of that column.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

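&lt;p&gt;&lt;a href=&quot;https://dailyheumsi.tistory.com/97&quot;&gt;Adjusting legend position&lt;/a&gt;&lt;br /&gt;
plt.legend(loc, bbox_to_anchor)&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;loc : position inside the bounding box (with bbox_to_anchor, the corner of the legend that is pinned to the anchor)&lt;/li&gt;
  &lt;li&gt;bbox_to_anchor : anchor point in axes coordinates, used to move the legend outside the bounding box&lt;br /&gt;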
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
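&lt;p&gt;A minimal sketch of placing a legend just outside the plot area with loc and bbox_to_anchor. The two lines and the anchor (1.02, 1) are arbitrary example values, and the Agg backend is chosen only so the snippet runs without a display.&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label="a")
ax.plot([0, 1], [1, 0], label="b")

# Pin the legend's upper-left corner at (1.02, 1) in axes coordinates,
# which is just outside the right edge of the plot.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1))
fig.tight_layout()
```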

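&lt;p&gt;annot : whether to write the data value in each heatmap cell&lt;br /&gt;
fmt : format string for those values (for example, to avoid scientific notation)&lt;br /&gt;
cmap : colormap - print(plt.colormaps())&lt;br /&gt;
palette&lt;br /&gt;
&lt;br /&gt;
point plot : points with confidence intervals; ci -&amp;gt; errorbar&lt;br /&gt;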
&lt;br /&gt;
hist =&amp;gt; kde(density) =&amp;gt; violin&lt;br /&gt;
scatter =&amp;gt; strip =&amp;gt; swarm&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h3 id=&quot;plotly&quot;&gt;Plotly&lt;/h3&gt;
&lt;p&gt;plotly.express&lt;br /&gt;
px.histogram : works much like seaborn's barplot&lt;br /&gt;
histfunc : similar to seaborn's estimator option&lt;br /&gt;
histfunc: str (default ‘count’ if no arguments are provided, else ‘sum’)&lt;br /&gt;
One of ‘count’, ‘sum’, ‘avg’, ‘min’, or ‘max’. Function used
to aggregate values for summarization (note: can be normalized with histnorm).&lt;br /&gt; 
The arguments to this function are the values of y(x) if orientation is ‘v’(‘h’).&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;color&lt;br /&gt;
barmode&lt;br /&gt;
facet_row[col]&lt;br /&gt;
marginal&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;option&quot;&gt;option&lt;/h2&gt;
&lt;p&gt;Show every column when displaying a DataFrame&lt;br /&gt;
pd.options.display.max_columns = None&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- Tidy Data&lt;br /&gt;
https://vita.had.co.nz/papers/tidy-data.pdf&lt;br /&gt;
- Pandas Cheat Sheet&lt;br /&gt;
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf&lt;br /&gt;
- Pandas Crosstab&lt;br /&gt;
https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html&lt;br /&gt;
- Regular expressions&lt;br /&gt;
https://ko.wikipedia.org/wiki/%EC%A0%95%EA%B7%9C_%ED%91%9C%ED%98%84%EC%8B%9D&lt;br /&gt;
- Pandas Style&lt;br /&gt;
https://pandas.pydata.org/docs/reference/style.html&lt;br /&gt;
- pairplot&lt;br /&gt;
https://velog.io/@addison/%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%B6%84%EC%84%9D-3-7-%ED%83%90%EC%83%89%EC%A0%81-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%B6%84%EC%84%9D-%EC%83%81%EA%B4%80%EA%B4%80%EA%B3%84-%EB%B6%84%EC%84%9D&lt;br /&gt;
- Seaborn&lt;br /&gt;
https://seaborn.pydata.org/tutorial/function_overview.html#&lt;br /&gt;
- Adjusting legend position&lt;br /&gt;
https://dailyheumsi.tistory.com/97&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;포스팅 공지&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
작성한 포스팅은 &lt;strong&gt;멋쟁이 사자처럼 AI SCHOOl&lt;/strong&gt;의 수업 내용입니다.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat:EDA</summary></entry><entry><title type="html">[멋사 AI 7기] EDA1</title><link href="https://jjwsgit.github.io/likelion/eda1/" rel="alternate" type="text/html" title="[멋사 AI 7기] EDA1" /><published>2022-10-06T00:00:00+00:00</published><updated>2022-10-06T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/eda1</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/eda1/">&lt;p&gt;:octocat:EDA&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;:pushpin:Attitude&lt;br /&gt;
What matters is building judgment through steady practice. Once you are comfortable with one tool, you can adapt to new ones.&lt;br /&gt;
It is like swimming or riding a bicycle.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;시각화-라이브러리pandasseabornplotly&quot;&gt;Visualization libraries (Pandas/Seaborn/Plotly)&lt;/h1&gt;

&lt;h2 id=&quot;pandas&quot;&gt;Pandas&lt;br /&gt;&lt;/h2&gt;

&lt;p&gt;With pandas you can compute the values you need ahead of time.&lt;br /&gt;
Doing so reduces slow-plotting problems.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;plotly&quot;&gt;Plotly&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://plotly.com/python-api-reference/plotly.express.html&quot;&gt;Plotly Express&lt;/a&gt; is recommended&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;high-level interface for data visualization&lt;/li&gt;
  &lt;li&gt;Usage is similar to seaborn&lt;/li&gt;
  &lt;li&gt;Relatively simple to use compared with plotly.graph_objects
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The points option can stand in for a strip plot (try out the various options)&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Cufflinks&lt;br /&gt;
It depends on several other libraries, so version compatibility problems can occur.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;eda&quot;&gt;EDA&lt;/h1&gt;

&lt;h2 id=&quot;라이브러리-로드&quot;&gt;Loading libraries&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;pandas&lt;/li&gt;
  &lt;li&gt;numpy&lt;/li&gt;
  &lt;li&gt;matplotlib.pyplot
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;시각화-폰트-설정&quot;&gt;Plot font settings&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;koreanize_matplotlib&lt;/li&gt;
  &lt;li&gt;%config InlineBackend.figure_format = 'retina'
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;일부-데이터-보기&quot;&gt;Viewing part of the data&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;head()&lt;/li&gt;
  &lt;li&gt;tail()&lt;/li&gt;
  &lt;li&gt;sample()
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;데이터-합치기&quot;&gt;Combining data&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;pd.concat()   * axis
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;중복-제거&quot;&gt;Removing duplicates&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;duplicated() : check for duplicates&lt;/li&gt;
  &lt;li&gt;drop_duplicates() : remove duplicates (inspect them first with df[df.duplicated()])
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
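&lt;p&gt;A small sketch of checking duplicates before dropping them. The frame is made-up sample data.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

dup_mask = df.duplicated()       # True for rows that repeat an earlier row
dup_rows = df[dup_mask]          # look at them before dropping
deduped = df.drop_duplicates()   # keeps the first occurrence by default

print(dup_mask.tolist(), len(deduped))
```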

&lt;h2 id=&quot;인덱스-값-설정&quot;&gt;Setting the index&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;set_index()
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;정렬&quot;&gt;Sorting&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;sort_index()	* ascending
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;판다스-attributes&quot;&gt;pandas attributes&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;shape&lt;/li&gt;
  &lt;li&gt;dtypes&lt;/li&gt;
  &lt;li&gt;columns&lt;/li&gt;
  &lt;li&gt;index
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;데이터-요약&quot;&gt;Data summary&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;info()
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;결측치-보기&quot;&gt;Checking missing values&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;isnull()		*sum(), mean()
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
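&lt;p&gt;Chaining sum() or mean() after isnull() gives the count or the share of missing values per column. A sketch with a made-up frame:&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, None, 3.0, None],
    "b": ["x", "y", None, "z"],
})

missing_count = df.isnull().sum()    # number of missing values per column
missing_ratio = df.isnull().mean()   # share of missing values per column

print(missing_count.to_dict(), missing_ratio.to_dict())
```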

&lt;h2 id=&quot;기술통계&quot;&gt;Descriptive statistics&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;describe()	* include&lt;/li&gt;
  &lt;li&gt;unique() : array of the unique values in a column; available on a Series only&lt;/li&gt;
  &lt;li&gt;nunique() : number of unique values in a column
&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;오늘의-이모저모&quot;&gt;Today's odds and ends&lt;/h1&gt;

&lt;h2 id=&quot;pandas-accessors&quot;&gt;&lt;a href=&quot;https://pandas.pydata.org/docs/reference/series.html#accessors&quot;&gt;Pandas Accessors&lt;/a&gt;&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Data Type&lt;/th&gt;
      &lt;th&gt;Accessor&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Datetime, Timedelta, Period&lt;/td&gt;
      &lt;td&gt;dt&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;String&lt;/td&gt;
      &lt;td&gt;str&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Categorical&lt;/td&gt;
      &lt;td&gt;cat&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Sparse&lt;/td&gt;
      &lt;td&gt;sparse&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;데이터-타입-변경&quot;&gt;Changing data types&lt;/h2&gt;

&lt;p&gt;astype(“type”)&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Strings&lt;br /&gt;
astype(str) &amp;lt;= pandas Series&lt;br /&gt;
str() &amp;lt;= Python string&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Dates&lt;br /&gt;
pd.to_datetime()&lt;br /&gt;
&lt;a href=&quot;https://pandas.pydata.org/docs/reference/series.html#datetimelike-properties&quot;&gt;Datetime properties&lt;/a&gt;&lt;br /&gt;
pd.date_range(start, end)&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
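&lt;p&gt;A short sketch tying the conversions above together: astype(str) for strings, pd.to_datetime plus the dt accessor for dates, and pd.date_range for a run of days. The frame and its dates are sample values.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"date": ["2022-10-05", "2022-10-06"], "n": [1, 2]})

df["n_str"] = df["n"].astype(str)         # numbers to strings
df["date"] = pd.to_datetime(df["date"])   # strings to datetime64
df["month"] = df["date"].dt.month         # datetime properties via .dt
df["weekday"] = df["date"].dt.dayofweek   # Monday is 0

days = pd.date_range("2022-10-05", "2022-10-07")
print(df.dtypes)
```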

&lt;h2 id=&quot;map--apply-applymap&quot;&gt;map / apply/ applymap&lt;/h2&gt;
&lt;p&gt;Reviewing the differences&lt;br /&gt;
&lt;a href=&quot;https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas&quot;&gt;Reference&lt;/a&gt;&lt;br /&gt;
USE CASE&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;map&lt;br /&gt;
map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;apply&lt;br /&gt;
apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;applymap&lt;br /&gt;
applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;https://i.stack.imgur.com/IZys3.png&quot; alt=&quot;image&quot; /&gt;&lt;br /&gt;&lt;/p&gt;
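&lt;p&gt;The three use cases can be sketched on one small frame. The columns A and B are sample data. One assumption to flag: newer pandas releases rename DataFrame.applymap to DataFrame.map, so the sketch picks whichever method the installed version provides.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [" x ", " y ", " z "]})

# map: value-to-value lookup on a single Series.
mapped = df["A"].map({1: "a", 2: "b", 3: "c"})

# apply: call an arbitrary function on each element of a Series.
lengths = df["B"].apply(len)

# applymap: elementwise over a whole DataFrame. Newer pandas calls this
# DataFrame.map, so use that name when it is available.
sub = df[["B"]]
elementwise = sub.map if hasattr(sub, "map") else sub.applymap
stripped = elementwise(str.strip)

print(mapped.tolist(), lengths.tolist(), stripped["B"].tolist())
```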

&lt;h2 id=&quot;빈도수-구하기&quot;&gt;Counting frequencies&lt;/h2&gt;
&lt;p&gt;One variable&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;value_counts()&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two variables&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;crosstab&lt;/li&gt;
  &lt;li&gt;pivot_table&lt;/li&gt;
  &lt;li&gt;groupby
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
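&lt;p&gt;A sketch showing that the three two-variable approaches give the same counts. The gender and smoker columns are hypothetical, and the helper column n (a constant 1) is added only so that pivot_table has something to sum.&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M"],
    "smoker": ["no", "yes", "no", "no", "yes"],
})

# One variable.
one_var = df["gender"].value_counts()

# Two variables, three ways.
by_crosstab = pd.crosstab(df["gender"], df["smoker"])
by_groupby = df.groupby(["gender", "smoker"]).size().unstack(fill_value=0)
by_pivot = df.assign(n=1).pivot_table(
    index="gender", columns="smoker", values="n", aggfunc="sum", fill_value=0
)

print(by_crosstab)
```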

&lt;h2 id=&quot;기타&quot;&gt;Miscellaneous&lt;/h2&gt;
&lt;p&gt;Filling missing values&lt;br /&gt;
fillna(value)&lt;br /&gt;
&lt;br /&gt;
to_frame()&lt;br /&gt;
&lt;br /&gt;
Deleting a column&lt;br /&gt;
del df[&quot;col&quot;]&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;To review: operations between pandas vectors, and bitwise operations&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- Plotly Express&lt;br /&gt;
https://plotly.com/python-api-reference/plotly.express.html&lt;br /&gt;
- Pandas Accessors&lt;br /&gt;
https://pandas.pydata.org/docs/reference/series.html#accessors&lt;br /&gt;
- Datetime properties&lt;br /&gt;
https://pandas.pydata.org/docs/reference/series.html#datetimelike-properties&lt;br /&gt;
- Differences between map / apply / applymap (Stack Overflow)&lt;br /&gt;
https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;포스팅 공지&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
작성한 포스팅은 &lt;strong&gt;멋쟁이 사자처럼 AI SCHOOl&lt;/strong&gt;의 수업 내용입니다.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat:EDA</summary></entry><entry><title type="html">[멋사 AI 7기] 데이터 시각화</title><link href="https://jjwsgit.github.io/likelion/plot/" rel="alternate" type="text/html" title="[멋사 AI 7기] 데이터 시각화" /><published>2022-10-05T00:00:00+00:00</published><updated>2022-10-05T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/plot</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/plot/">&lt;p&gt;:octocat:Plot&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;jupyter-notebook&quot;&gt;Jupyter Notebook&lt;/h1&gt;
&lt;ul&gt;
  &lt;li&gt;Jupyter Extension&lt;/li&gt;
  &lt;li&gt;Collapsing the output area&lt;br /&gt;
double-click / Esc + o&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;Help&lt;br /&gt;
Shift + Tab + Tab&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;plot&quot;&gt;Plot&lt;/h1&gt;

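&lt;p&gt;:pushpin: Plotting takes a long time when there is a lot of data. How can it be sped up?&lt;br /&gt;
If only summary values need to be shown, compute them beforehand instead of having the plotting function compute them.&lt;br /&gt;&lt;/p&gt;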

&lt;h2 id=&quot;matplotlib--koreanize-matplotlib&quot;&gt;matplotlib / &lt;a href=&quot;https://github.com/ychoi-kr/koreanize-matplotlib&quot;&gt;koreanize-matplotlib&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Render plots at retina resolution&lt;br /&gt;
%config InlineBackend.figure_format = 'retina'&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://matplotlib.org/3.3.3/tutorials/introductory/customizing.html&quot;&gt;Customizing&lt;/a&gt; Matplotlib with style sheets&lt;br /&gt;
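plt.style.use(&quot;style&quot;) # fivethirtyeight and ggplot are recommended&lt;br /&gt;
Setting a style does not reset values that were changed earlier.&lt;br /&gt;
=&amp;gt; When switching to another style, restarting the kernel is the easiest way.&lt;br /&gt;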
&lt;br /&gt;
plt.legend(bbox_to_anchor)&lt;br /&gt;
secondary_y&lt;br /&gt;
plt.axhline(val, color)&lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Ways to hide the text output printed along with a plot&lt;br /&gt;
plt.show(), assigning the result to a variable, or a trailing ;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;plotly&quot;&gt;plotly&lt;/h2&gt;
&lt;p&gt;JavaScript-based and interactive&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# plotly offline mode
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;plotly.offline&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iplot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;init_notebook_mode&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;plotly.subplots&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;make_subplots&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;init_notebook_mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;color : like hue in seaborn, an attribute that distinguishes groups of data by color&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;오늘의-이모저모&quot;&gt;Today's odds and ends&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Series&lt;/th&gt;
      &lt;th&gt;DataFrame&lt;/th&gt;
      &lt;th&gt;Example&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;map&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;X&lt;/td&gt;
      &lt;td&gt;df[&quot;column&quot;].map(function or dict)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;apply&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;df.apply(function)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;applymap&lt;/td&gt;
      &lt;td&gt;X&lt;/td&gt;
      &lt;td&gt;O&lt;/td&gt;
      &lt;td&gt;df.applymap(function)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

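&lt;p&gt;These replace an explicit loop and are often faster than one (built-in vectorized operations are faster still).&lt;br /&gt;
A lambda works too, but a named function is preferable when the lambda hurts readability.&lt;br /&gt;
tqdm's progress_map&lt;br /&gt;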
&lt;br /&gt;
Series.to_list()&lt;br /&gt;
return a list of the values&lt;br /&gt;
&lt;br /&gt;
merge/join&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;on : key&lt;/li&gt;
  &lt;li&gt;how&lt;br /&gt;
inner / left / right / outer&lt;br /&gt;
Even when a right merge is called for, left is more intuitive,&lt;br /&gt;
so it is common to swap the order of the two DataFrames and use how='left'.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
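&lt;p&gt;A sketch of the how options, including the swap trick mentioned above: a right merge returns the same keys as a left merge with the two frames exchanged. The frames and the key column are made-up examples.&lt;/p&gt;

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

inner = left.merge(right, on="key", how="inner")   # keys in both
outer = left.merge(right, on="key", how="outer")   # keys in either

# A right merge keeps every key of the right frame...
right_join = left.merge(right, on="key", how="right")
# ...which is what a left merge does once the frames are swapped.
swapped_left = right.merge(left, on="key", how="left")

print(inner["key"].tolist(), right_join["key"].tolist())
```

&lt;p&gt;Only the column order differs between right_join and swapped_left.&lt;/p&gt;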

&lt;p&gt;resample / binning&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html&quot;&gt;pandas.DataFrame.resample&lt;/a&gt; / &lt;a href=&quot;https://pandas.pydata.org/docs/getting_started/intro_tutorials/09_timeseries.html#min-tut-09-timeseries&quot;&gt;time series&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
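  &lt;li&gt;pd.cut() : split into bins of equal width&lt;br /&gt;
e.g. absolute grading of an exam (fixed score cutoffs)&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;pd.qcut() : split into bins holding equal counts&lt;br /&gt;
e.g. relative grading of an exam (grading on a curve)&lt;br /&gt;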
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
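&lt;p&gt;The difference shows up as soon as the data is skewed. A sketch with five made-up scores, one of them an outlier, and hypothetical labels low and high:&lt;/p&gt;

```python
import pandas as pd

scores = pd.Series([10, 20, 30, 40, 100])

# cut: 2 bins of equal width over the value range (10 to 100).
by_width = pd.cut(scores, bins=2, labels=["low", "high"])

# qcut: 2 bins holding roughly the same number of values.
by_count = pd.qcut(scores, q=2, labels=["low", "high"])

print(by_width.tolist())
print(by_count.tolist())
```

&lt;p&gt;With cut only the outlier lands in the upper bin, while qcut splits the values around the median.&lt;/p&gt;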

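&lt;p&gt;cp949 is ASCII-compatible: ASCII characters fall inside its encoding range and keep their single-byte values, while Hangul takes two bytes.&lt;br /&gt;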
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- koreanize-matplotlib&lt;br /&gt;
https://github.com/ychoi-kr/koreanize-matplotlib&lt;br /&gt;
- matplotlib customizing&lt;br /&gt;
  https://matplotlib.org/3.3.3/tutorials/introductory/customizing.html&lt;br /&gt;
- resampling&lt;br /&gt;
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html&lt;br /&gt;
https://pandas.pydata.org/docs/getting_started/intro_tutorials/09_timeseries.html#min-tut-09-timeseries&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;포스팅 공지&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
작성한 포스팅은 &lt;strong&gt;멋쟁이 사자처럼 AI SCHOOl&lt;/strong&gt;의 수업 내용입니다.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat:Plot</summary></entry><entry><title type="html">[멋사 AI 7기] 웹 데이터 ETC</title><link href="https://jjwsgit.github.io/likelion/webdata_etc/" rel="alternate" type="text/html" title="[멋사 AI 7기] 웹 데이터 ETC" /><published>2022-10-04T00:00:00+00:00</published><updated>2022-10-04T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/webdata_etc</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/webdata_etc/">&lt;p&gt;:octocat:Web | DataFrame&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;web&quot;&gt;WEB&lt;/h1&gt;

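&lt;p&gt;:pushpin:Notes&lt;br /&gt;
When you post on the Naver Finance discussion board,&lt;br /&gt;
the database rights belong to Naver and the copyright belongs to the author.&lt;br /&gt;
&lt;br /&gt;
GET : sends the required data in the query string&lt;br /&gt;
POST : sends the data as form data in the body of the HTTP message&lt;br /&gt;
Whether a request is GET or POST can be checked in the browser's Network tab under Headers &amp;gt; Request Method&lt;br /&gt;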
&lt;br /&gt;&lt;/p&gt;
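&lt;p&gt;A sketch of where the data goes in each method, using only the standard library. Nothing is sent over the network; the URL https://example.com/board and the parameters code and page are placeholders, not a real API.&lt;/p&gt;

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"code": "005930", "page": 1}  # hypothetical parameters

# GET: the data travels in the URL's query string.
get_req = Request("https://example.com/board?" + urlencode(params))

# POST: the same data travels in the request body as form data.
post_req = Request(
    "https://example.com/board",
    data=urlencode(params).encode("utf-8"),
)

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```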

&lt;p&gt;referer&lt;br /&gt;
A request header holding the URL of the previous page, the one that issued the request for the current page&lt;br /&gt;
By reading referer, the server can tell which web page the current page was requested from&lt;br /&gt;
&lt;a href=&quot;https://inpa.tistory.com/entry/WEB-%F0%9F%93%9A-HTTP-referer-%EB%9E%80&quot;&gt;Reference&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;비동기-통신&quot;&gt;Asynchronous communication&lt;/h2&gt;

&lt;p&gt;XHR(XMLHttpRequest)&lt;br /&gt;
Requests only the needed part from the server and receives only that content&lt;br /&gt;
=&amp;gt; an interaction that reduces wasted bandwidth by requesting only what is needed&lt;br /&gt;
&lt;a href=&quot;https://velog.io/@ldaehi0205/ajax-fetch-xhr-%EB%B9%84%EB%8F%99%EA%B8%B0%ED%86%B5%EC%8B%A0-%EC%9D%B4%ED%95%B4%ED%95%98%EA%B8%B0&quot;&gt;Reference&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;웹-데이터-수집&quot;&gt;Collecting web data&lt;/h1&gt;

&lt;h2 id=&quot;pandas-read_html&quot;&gt;Pandas read_html&lt;/h2&gt;

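&lt;p&gt;Find the URL to collect from, then use pd.read_html(url) to pull in the data inside table tags.&lt;br /&gt;
Guard against garbled Korean text (for example with the encoding argument)!&lt;br /&gt;&lt;/p&gt;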

&lt;h2 id=&quot;beautifulsoup&quot;&gt;BeautifulSoup&lt;/h2&gt;

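&lt;p&gt;Collecting the links that lead to detail pages&lt;br /&gt;
=&amp;gt; use BeautifulSoup to find the link information in the fetched HTML document&lt;br /&gt;
&lt;br /&gt;
The way CSS classes are written in HTML tags differs from the selector syntax BeautifulSoup expects, so check before use.&lt;br /&gt;
nth-child was not supported here, so it had to be changed to nth-of-type (older BeautifulSoup releases lack nth-child)&lt;br /&gt;
&lt;br /&gt;
get_text() is a method and text is an attribute; no other difference was observed&lt;br /&gt;&lt;/p&gt;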

&lt;h2 id=&quot;예외-처리&quot;&gt;Exception handling&lt;/h2&gt;

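&lt;p&gt;The try / except statement&lt;br /&gt;
When an error occurs inside the try block, execution jumps to the except block&lt;br /&gt;
This keeps a function that is called repeatedly from stopping midway because of one error&lt;br /&gt;
&lt;br /&gt;
Which exceptional cases are serious enough that they must be fixed?&lt;br /&gt;
Security issues&lt;br /&gt;&lt;/p&gt;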
&lt;ul&gt;
  &lt;li&gt;Cross Site Scripting (XSS)&lt;br /&gt;
The attacker injects a malicious script from their own site into a website the victim trusts&lt;br /&gt;
The name comes from the attack crossing from one website to another.&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://nordvpn.com/ko/blog/xss-attack/&quot;&gt;Reference&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;
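&lt;p&gt;Returning to the try / except notes in this section, a minimal sketch of keeping a loop alive when one item fails. The helper parse_rank and its inputs are hypothetical stand-ins for a scraping function called once per item.&lt;/p&gt;

```python
def parse_rank(text):
    """Return the integer in text, or None if it cannot be parsed."""
    try:
        return int(text)
    except ValueError:
        # One bad item should not stop the whole loop.
        return None


results = [parse_rank(t) for t in ["1", "2", "n/a", "4"]]
print(results)
```

&lt;p&gt;Catching only ValueError, rather than every exception, keeps unexpected problems visible.&lt;/p&gt;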

&lt;h1 id=&quot;dataframe-화&quot;&gt;Converting to a DataFrame&lt;/h1&gt;

&lt;dl&gt;
  &lt;dt&gt;T = transpose, same as transpose()&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;set_index(column to use as the index)&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;pd.concat([df1, df2, ...], axis) with axis=1 : side by side&lt;br /&gt;&lt;/dt&gt;
  &lt;dt&gt;map, apply&lt;br /&gt;&lt;/dt&gt;
  &lt;dd&gt;Apply a function to a DataFrame (or a specific column) all at once, without writing a loop&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/dd&gt;
&lt;/dl&gt;
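&lt;p&gt;A sketch chaining the first three items: concat with axis=1, set_index, then T. The two small frames and their column names (item, value, note) are invented for illustration.&lt;/p&gt;

```python
import pandas as pd

df1 = pd.DataFrame({"item": ["price", "volume"], "value": [100, 5]})
df2 = pd.DataFrame({"note": ["won", "shares"]})

side_by_side = pd.concat([df1, df2], axis=1)   # axis=1 adds columns
indexed = side_by_side.set_index("item")       # item becomes the index
transposed = indexed.T                         # rows and columns swapped

print(transposed)
```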

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- referer&lt;br /&gt;
https://inpa.tistory.com/entry/WEB-%F0%9F%93%9A-HTTP-referer-%EB%9E%80&lt;br /&gt;
- XHR&lt;br /&gt;
https://velog.io/@ldaehi0205/ajax-fetch-xhr-%EB%B9%84%EB%8F%99%EA%B8%B0%ED%86%B5%EC%8B%A0-%EC%9D%B4%ED%95%B4%ED%95%98%EA%B8%B0&lt;br /&gt;
- XSS&lt;br /&gt;
https://nordvpn.com/ko/blog/xss-attack/&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;포스팅 공지&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
작성한 포스팅은 &lt;strong&gt;멋쟁이 사자처럼 AI SCHOOl&lt;/strong&gt;의 수업 내용입니다.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><summary type="html">:octocat:Web | DataFrame</summary></entry><entry><title type="html">[멋사 AI 7기] 멜론 TOP100</title><link href="https://jjwsgit.github.io/likelion/melon100/" rel="alternate" type="text/html" title="[멋사 AI 7기] 멜론 TOP100" /><published>2022-09-29T00:00:00+00:00</published><updated>2022-09-29T00:00:00+00:00</updated><id>https://jjwsgit.github.io/likelion/melon100</id><content type="html" xml:base="https://jjwsgit.github.io/likelion/melon100/">&lt;p&gt;:octocat:Web Scraping&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;https://www.melon.com/chart/index.htm&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# The site seems to treat this request as a bot
# pd.read_html(url)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
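&lt;p&gt;HTTPError: HTTP Error 406: Not Acceptable&lt;br /&gt;
Returned when the server finds no content that matches the criteria given by the user agent&lt;br /&gt;
The request is treated as a bot and gets no data back, so a user-agent header has to be set&lt;br /&gt;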
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;headers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;user-agent&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Mozilla/5.0&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read_html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;temp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/head.png&quot; alt=&quot;head&quot; /&gt;&lt;br /&gt;
&lt;img src=&quot;../../images/2022-09-29-melon100/tail.png&quot; alt=&quot;tail&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;temp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;['Unnamed: 0', '순위', '순위등락', '앨범이미지', '곡 상세가기', '곡정보', '앨범', '좋아요', '듣기',
       '담기', '다운', '뮤비']&lt;br /&gt;
&lt;br /&gt;
'Unnamed: 0' : selection checkbox, a meaningless column &lt;br /&gt;
['곡 상세가기', '듣기', '담기', '다운', '뮤비'] : extra features&lt;br /&gt;
&lt;br /&gt;
'순위등락' : usable after preprocessing&lt;br /&gt;
'곡정보' : title and artist need to be separated&lt;br /&gt;
'좋아요' : every value comes back as 0 although the website shows real numbers; why?&lt;br /&gt;
'앨범이미지' : comes back as NaN; needs extra processing&lt;br /&gt;
['순위', '순위등락', '곡정보', '앨범'] : columns to use&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;rank_transform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;str_rank&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

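    &lt;span class=&quot;c1&quot;&gt;# Strip the trailing "위" so that multi-digit ranks such as 10위 are kept whole
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;str_num&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;str_rank&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;replace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;위&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;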
    
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;str_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Convert the 순위 (rank) column to numeric data &lt;br /&gt;
e.g. 1위 -&amp;gt; 1&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_transform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/rank.png&quot; alt=&quot;rank&quot; /&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;unique_list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위등락&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;unique_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;[‘순위 동일  0’, ‘단계 상승  1’, ‘단계 하락  2’, ‘단계 하락  1’, ‘단계 상승  2’,
       ‘단계 하락  3’, ‘단계 상승  4’, ‘단계 상승  3’, ‘단계 하락  4’, ‘단계 하락  9’,
       ‘순위 진입’]&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# https://codingspooning.tistory.com/138
# https://kynk94.github.io/devlog/post/re-match-hangul
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;re&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;regex_hangul&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;hangul&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;re&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'[^ㄱ-ㅣ가-힣]'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# matches every character that is not Hangul (spaces and digits included)
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hangul&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sub&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# remove every non-Hangul character, leaving only the Hangul label
&lt;/span&gt;    
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
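&lt;p&gt;Slicing off the first five characters would also work, but a regular expression is used here instead.&lt;br /&gt;
Unicode code points: ㄱ = 12593, ㅣ = 12643, 가 = 44032, 힣 = 55203&lt;br /&gt;
A single range ㄱ-힣 would also match code points 12644 through 44031, which include non-Hangul characters such as CJK ideographs, so the class is written as ㄱ-ㅣ가-힣.&lt;br /&gt;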
&lt;br /&gt;&lt;/p&gt;
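&lt;p&gt;The code points and the effect of the two character classes can be checked directly. The following is a minimal sketch; the names keep_hangul, WIDE and NARROW are made up for this illustration, and the hanja 漢 is just one example of a character that falls between ㅣ and 가.&lt;/p&gt;

```python
import re

# Same idea as regex_hangul above: delete everything that is not Hangul.
NON_HANGUL = re.compile("[^ㄱ-ㅣ가-힣]")

def keep_hangul(text):
    """Return text with every non-Hangul character removed."""
    return NON_HANGUL.sub("", text)

# Code points quoted in the text.
code_points = {ch: ord(ch) for ch in "ㄱㅣ가힣"}

# One wide range versus the two ranges used in the post.
WIDE = re.compile("[ㄱ-힣]")
NARROW = re.compile("[ㄱ-ㅣ가-힣]")

label = keep_hangul("단계 상승  2")
wide_matches_hanja = WIDE.search("漢") is not None
narrow_matches_hanja = NARROW.search("漢") is not None

print(code_points)
print(label, wide_matches_hanja, narrow_matches_hanja)
```

&lt;p&gt;The wide range accepts the hanja while the two-range class rejects it, which is the reason for writing the class as ㄱ-ㅣ가-힣.&lt;/p&gt;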

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unique&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;enumerate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;unique_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regex_hangul&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unique_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;{‘단계상승’, ‘단계하락’, ‘순위동일’, ‘순위진입’}&lt;br /&gt;
Converting to a set removes the duplicates.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;regex_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;re&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;r&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'\d+'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;findall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# findall returns a list, so take the first match
&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A regular expression that extracts the number from the string.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
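&lt;p&gt;Note that regex_num indexes the result of findall, so it raises an IndexError for a string without digits such as '순위 진입'. A possible variant that returns a default instead is sketched below; first_int is a hypothetical name, not part of the original notebook.&lt;/p&gt;

```python
import re

DIGITS = re.compile(r"\d+")

def first_int(text, default=None):
    """Return the first run of digits in text as an int, or default if there is none."""
    match = DIGITS.search(text)
    return int(match.group()) if match else default

print(first_int("단계 하락  9"))
print(first_int("순위 진입"))
```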

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;rank_updown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_change&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regex_hangul&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_change&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;단계상승&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regex_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_change&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regex_hangul&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_change&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;단계하락&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regex_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_change&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regex_hangul&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_change&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;순위동일&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;new&quot;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

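&lt;p&gt;단계상승 (rank up) =&amp;gt; positive number, 단계하락 (rank down) =&amp;gt; negative number, 순위동일 (rank unchanged) =&amp;gt; 0 &lt;br /&gt;
New chart entries are mapped to “new”. This column will probably have to be converted to a numeric type for later analysis.&lt;br /&gt;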
&lt;br /&gt;&lt;/p&gt;
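&lt;p&gt;One way to keep the column purely numeric is to return None for new entries instead of the string “new”. The sketch below is only an illustration under that assumption; rank_change_to_int is a made-up name, and the sample strings are taken from the unique values shown earlier.&lt;/p&gt;

```python
import re

def rank_change_to_int(rank_change):
    """Map a rank-change label to a signed int, or None for new entries."""
    label = re.sub("[^가-힣]", "", rank_change)   # e.g. "단계 상승  2" becomes "단계상승"
    digits = re.search(r"\d+", rank_change)
    if label == "순위동일":
        return 0
    if label in ("단계상승", "단계하락") and digits:
        step = int(digits.group())
        return step if label == "단계상승" else -step
    return None  # "순위진입" (new entry) or any unexpected label

samples = ["순위 동일  0", "단계 상승  2", "단계 하락  9", "순위 진입"]
converted = [rank_change_to_int(s) for s in samples]
print(converted)
```

&lt;p&gt;With this variant the new entries can later be handled as missing values rather than as a string mixed into a numeric column.&lt;/p&gt;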

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위등락_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위등락&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rank_updown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;순위_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위등락_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/rank_change.png&quot; alt=&quot;rank_change&quot; /&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;bs4&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BeautifulSoup&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bs&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;html&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;html.parser&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From here on, BeautifulSoup is used for the fields that Pandas could not extract.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Title
# #lst50 &amp;gt; td:nth-child(6) &amp;gt; div &amp;gt; div &amp;gt; div.ellipsis.rank01 &amp;gt; span &amp;gt; a
&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;title&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;div.ellipsis.rank01 span a&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;title_list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;title_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;title_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;제목&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;title_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Extract the titles, collect them in a list, and add the list as a DataFrame column.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Artist
#lst50 &amp;gt; td:nth-child(6) &amp;gt; div &amp;gt; div &amp;gt; div.ellipsis.rank02 &amp;gt; a
#lst50 &amp;gt; td:nth-child(6) &amp;gt; div &amp;gt; div &amp;gt; div.ellipsis.rank02 &amp;gt; span &amp;gt; a
#lst50 &amp;gt; td:nth-child(6) &amp;gt; div &amp;gt; div &amp;gt; div.ellipsis.rank02 &amp;gt; span, restricted to class=&quot;checkEllipsis&quot;
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# musician = html.select(&quot;div.ellipsis.rank02 a&quot;) # also matches the a inside the span, so each artist is returned twice
# musician = html.select(&quot;div.ellipsis.rank02 span a&quot;) # returns one element per artist when a song has several artists
&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;musician&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;div.ellipsis.rank02 span.checkEllipsis&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;musician_list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;musician&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;musician_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;    
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;musician_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;가수&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;musician_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Extract the artists, collect them in a list, and add the list as a DataFrame column.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;제목&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;가수&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/title_musician.png&quot; alt=&quot;title_musician&quot; /&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Album image
#lst50 &amp;gt; td:nth-child(4) &amp;gt; div &amp;gt; a &amp;gt; img
# https://cdnimg.melon.co.kr/cm2/album/images/110/45/985/11045985_20220905151107_500.jpg/melon/resize/120/quality/80/optimize
# the trailing resize/optimize part has to be removed to get the large image
&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;image&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;div a img&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;img_list&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:]:&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# skip the first two matches: the Melon logo and one unneeded image
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;img_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;src&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/melon&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;img_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The part of the URL from /melon onward (the resize and optimize options) has to be removed to get the image at its original size.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
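&lt;p&gt;Applied to the sample URL quoted in the code comment, the trimming step looks roughly like this. original_image_url is a hypothetical helper name; it does the same split as the loop above, limited to the first occurrence of /melon.&lt;/p&gt;

```python
def original_image_url(src):
    """Drop the '/melon/resize/...' suffix, if present, from an album image URL."""
    return src.split("/melon", 1)[0]

thumb = (
    "https://cdnimg.melon.co.kr/cm2/album/images/110/45/985/"
    "11045985_20220905151107_500.jpg/melon/resize/120/quality/80/optimize"
)
full = original_image_url(thumb)
print(full)
```

&lt;p&gt;The host name cdnimg.melon.co.kr is not affected because the split string starts with a slash.&lt;/p&gt;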

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;앨범이미지_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;img_list&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;앨범이미지_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/image.png&quot; alt=&quot;image&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;#lst50 &amp;gt; td:nth-child(8) &amp;gt; div &amp;gt; button &amp;gt; span.cnt
# this also comes back as 0. What is the problem?
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;span.cnt&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When the page is fetched with requests.get, the like count comes back as 0.&lt;br /&gt;
Parsing the response with BeautifulSoup gives the same result.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Melon loads the like counts at runtime with JavaScript
# BeautifulSoup only sees the static page, so a tool for dynamic content such as Selenium is needed
# Is there a way to get the counts with BeautifulSoup alone?
# https://sdallman.medium.com/web-scraping-dynamic-content-only-using-beautiful-soup-631496473c0e
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# JavaScript code for the likes
# likeAttr = data-song-no
# SUMMCNT is the like count
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;script&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p class=&quot;text-center&quot;&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/like_set.png&quot; alt=&quot;like&quot; /&gt;&lt;br /&gt;
(JavaScript Like Func)&lt;/p&gt;

&lt;p&gt;As the JavaScript like function above shows, the like counts are loaded dynamically through JavaScript.&lt;br /&gt;
Selenium would also work, but the &lt;a href=&quot;https://sdallman.medium.com/web-scraping-dynamic-content-only-using-beautiful-soup-631496473c0e&quot;&gt;source article&lt;/a&gt; above, which my team leader pointed out, showed that it can be done with BeautifulSoup as well.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;#lst50 &amp;gt; td:nth-child(8) &amp;gt; div &amp;gt; button
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;button&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;button.button_etc.like&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;data_song_no&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;button&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;data_song_no&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;data-song-no&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_song_no&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Extract data-song-no, the attribute used by the like function, from each button and append it to a list.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# to fetch several songs at once, the IDs must be passed as &quot;song_no1, song_no2, ..., song_noN&quot;
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;song_no&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;, &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_song_no&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cashflow&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;https://www.melon.com/commonlike/getSongLike.json?contsIds=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;song_no&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cashflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;headers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;user-agent&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Mozilla/5.0&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cfdata&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;join is used to turn the list into a single comma-separated string.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
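&lt;p&gt;The URL construction can be tried without sending a request. The sketch below uses made-up song IDs and a hypothetical helper build_like_url; it joins the IDs with a plain comma (no space), which assumes the endpoint accepts that form just as it accepts the comma-plus-space form used above.&lt;/p&gt;

```python
from urllib.parse import urlencode

BASE_URL = "https://www.melon.com/commonlike/getSongLike.json"

def build_like_url(song_ids):
    """Build the like-count URL for a list of song IDs (strings or ints)."""
    conts_ids = ",".join(str(song_id) for song_id in song_ids)
    # safe="," keeps the commas readable instead of percent-encoding them
    return BASE_URL + "?" + urlencode({"contsIds": conts_ids}, safe=",")

# Placeholder IDs, not real Melon song numbers.
url = build_like_url(["111", "222", 333])
print(url)
```

&lt;p&gt;Converting each ID with str() matters because join only accepts strings; the data-song-no values collected earlier are already strings.&lt;/p&gt;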

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df_like&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cfdata&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;contsLike&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df_like&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/like_table.png&quot; alt=&quot;like_table&quot; /&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;좋아요_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df_like&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;SUMMCNT&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Only SUMMCNT, the like count, is needed, so just that column is added to the DataFrame.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
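&lt;p&gt;Assigning the SUMMCNT column directly relies on the response listing the songs in the same order as the chart. If that is not guaranteed, the counts could be matched by song ID instead. The sketch below assumes that each record in contsLike also carries the song ID under a key named CONTSID; that key name is an assumption and should be checked against the actual response. The records are made-up sample data.&lt;/p&gt;

```python
def likes_in_order(song_ids, conts_like, id_key="CONTSID", count_key="SUMMCNT"):
    """Return like counts ordered like song_ids; None where an ID is missing."""
    # id_key is an assumed field name; count_key is the field used in the post.
    by_id = {str(item[id_key]): item[count_key] for item in conts_like}
    return [by_id.get(str(song_id)) for song_id in song_ids]

# Made-up records in the assumed shape, deliberately out of order.
sample = [
    {"CONTSID": 222, "SUMMCNT": 50},
    {"CONTSID": 111, "SUMMCNT": 70},
]
ordered = likes_in_order(["111", "222", "333"], sample)
print(ordered)
```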

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df_last&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;순위_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;순위등락_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;앨범이미지_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;제목&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;가수&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;앨범&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;좋아요_변환&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df_last&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/trans_table.png&quot; alt=&quot;trans_table&quot; /&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df_last&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;rank&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;rank_change&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;img_url&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;title&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;musician&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;album&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;like&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Rename the columns so they are easier to read.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Save images
# https://www.delftstack.com/ko/howto/python/download-image-in-python/
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# def jpg_load(name, url):
#     f = open(name + '.jpg','wb')
#     response = requests.get(url)
#     f.write(response.content)
#     f.close()
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
# for idx, row in df_last.head().iterrows():
#     jpg_load(row[&quot;title&quot;], row[&quot;img_url&quot;])
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An image-saving function that takes the name and URL of the image to save as parameters.&lt;br /&gt;
To save the images from the DataFrame built so far, uncomment and run the code above.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
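&lt;p&gt;Song titles can contain characters such as a slash or a colon that are not valid in file names, so the image-saving step may benefit from cleaning the name first. The following is only a minimal sketch under that assumption: the helpers safe_file_name, fetch_bytes, and save_image, the fetch parameter, and the images directory are hypothetical and do not appear in the original post, and the download uses the standard library urllib.request in place of requests.&lt;br /&gt;&lt;/p&gt;

```python
import re
from pathlib import Path
from urllib.request import urlopen


def safe_file_name(title):
    """Replace characters that are risky in file names with underscores."""
    # Keep word characters (including Hangul), whitespace, dots and hyphens.
    cleaned = re.sub(r"[^\w\s.\-]", "_", title).strip()
    return cleaned or "untitled"


def fetch_bytes(url):
    """Download the raw bytes behind a URL (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def save_image(title, url, directory="images", fetch=fetch_bytes):
    """Download one image and write it as a .jpg file inside directory."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / (safe_file_name(title) + ".jpg")
    path.write_bytes(fetch(url))
    return path
```

&lt;p&gt;With these helpers, the loop over df_last.head().iterrows() could call save_image(row[&quot;title&quot;], row[&quot;img_url&quot;]) for each row. Passing the fetcher in as a parameter is a design choice that makes it possible to try the function without network access.&lt;br /&gt;&lt;/p&gt;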

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;#conts &amp;gt; div.multi_row &amp;gt; div.calendar_prid.mt12 &amp;gt; span.yyyymmdd &amp;gt; span class=&quot;year&quot;
#conts &amp;gt; div.multi_row &amp;gt; div.calendar_prid.mt12 &amp;gt; span.hhmm &amp;gt; span class=&quot;hour&quot;
&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;today&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;span.year&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;_&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;html&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;span.hour&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;file_name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;top100-&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;today&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;.csv&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df_last&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Read the date and time shown on the scraped page and use them to name the CSV file.&lt;br /&gt;
The file is saved as top100-yyyy.mm.dd_hh:mm.csv.&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
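&lt;p&gt;The naming pattern top100-yyyy.mm.dd_hh:mm.csv contains a colon, which Windows does not accept in file names. The sketch below shows one way to build a portable name. It assumes the scraped texts look like 2022.09.29 and 14:00 (the exact page text is an assumption), and build_file_name is a hypothetical helper that is not part of the original post.&lt;br /&gt;&lt;/p&gt;

```python
def build_file_name(date_text, time_text, prefix="top100"):
    """Build a CSV file name from the chart's date and time strings."""
    date_part = date_text.strip()
    # Colons are not allowed in Windows file names, so swap them for hyphens.
    time_part = time_text.strip().replace(":", "-")
    return f"{prefix}-{date_part}_{time_part}.csv"


print(build_file_name("2022.09.29", "14:00"))
# prints top100-2022.09.29_14-00.csv
```

&lt;p&gt;The result could be passed to df_last.to_csv(file_name, index=False) in the same way as before.&lt;br /&gt;&lt;/p&gt;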

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read_csv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;file_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;../../images/2022-09-29-melon100/csv.png&quot; alt=&quot;csv&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;:bookmark:Sources&lt;/summary&gt;

- Melon Chart TOP100&lt;br /&gt;
https://www.melon.com/chart/index.htm&lt;br /&gt;
- Text preprocessing with regular expressions&lt;br /&gt;
https://codingspooning.tistory.com/138&lt;br /&gt;
https://kynk94.github.io/devlog/post/re-match-hangul&lt;br /&gt;
- Web scraping dynamic content only using beautiful soup&lt;br /&gt;
https://sdallman.medium.com/web-scraping-dynamic-content-only-using-beautiful-soup-631496473c0e&lt;br /&gt;
- Saving images&lt;br /&gt;
https://www.delftstack.com/ko/howto/python/download-image-in-python/&lt;br /&gt;
- DataFrame iterrows&lt;br /&gt;
https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas&lt;br /&gt;
&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p class=&quot;notice--success&quot;&gt;:mortar_board:&lt;strong&gt;Post notice&lt;/strong&gt; &lt;br /&gt;&lt;br /&gt;
This post covers material from the &lt;strong&gt;Likelion (멋쟁이 사자처럼) AI SCHOOL&lt;/strong&gt; course.&lt;br /&gt;&lt;/p&gt;</content><author><name>JJWS</name></author><category term="likelion" /><category term="TIL" /><category term="miniproject" /><summary type="html">:octocat:Web Scraping</summary></entry></feed>