Exploratory Data Analysis - Seattle, WA

Exploratory Data Analysis (EDA) is a crucial step in the data science process that allows us to understand the structure and relationships within a dataset. It involves visualizing, summarizing, and transforming the data to uncover patterns and insights that inform the development of predictive models. Here are a few reasons why EDA is so important in data science:

  1. Identifying patterns and relationships: EDA helps us understand how different variables within a dataset relate to each other. This is crucial for identifying potential predictor variables and building accurate predictive models.

  2. Detecting outliers and anomalies: EDA can help us identify any unusual or unexpected data points, which can be critical for avoiding misleading results and improving the accuracy of our models.

  3. Understanding the distribution of the data: By visualizing the distribution of the data, we can gain a better understanding of the central tendency and spread of the data, which can inform the choice of statistical methods and models.

  4. Improving the quality of the data: EDA can help us identify issues with the data such as missing values, duplicates, or inconsistent formats. By fixing these issues, we can improve the quality of the data and increase the accuracy of our results.

  5. Providing insights for data cleaning and pre-processing: EDA helps us identify issues with the data that require cleaning and pre-processing, such as transforming variables to a more suitable form, or removing outlier or irrelevant data points.

In conclusion, EDA is an essential step in the data science process that provides valuable insights and improves the quality of the data. It helps us understand the structure of the data, identify potential predictor variables, and avoid misleading results. By taking the time to perform EDA, we can ensure that our data is well understood, and that our models are based on accurate and relevant data.

“The information on the relevance of EDA was generated on 2023-02-08 using OpenAI’s ChatGPT (GPT-3.5) language model. While every effort has been made to ensure its accuracy, the information should not be considered a primary source.”

Using EDA to Untangle the Seattle, US Real Estate Market

At the Neuefische bootcamp, I performed an EDA on the King County house sales dataset, which is publicly available on Kaggle. To simulate a real-world scenario, the Neuefische team defined different stakeholders with individual requirements in advance (the assignment). In this analysis I simulated a real estate company that conducted the EDA to meet the needs of a fictional investor named Timothy Stevens. Mr. Stevens sells houses and is interested in putting his investments on the market. He owns expensive houses in the county center, which are the target of the analysis. The best timing for him is within a year. He is open to renovating if it raises his profits.

The analysis

At Red Cedar Property Advisors (RCPA) we specialize in providing the best advice on buying or selling your real estate properties. We cover King County in Washington State.

Our customer Timothy Stevens asked for our advice. He is interested in selling some of his properties in the center of the county within the next year. A key question that he brought to us was: Is it worth renovating the properties? Although he is open to doing so, he will only renovate if it increases his profit.


Main results:

The mean prices per zip code in King County are widely dispersed, ranging from $2.4 million to $11 million. Because of this dispersion, the analysis treats any price above $4 million as an outlier.
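
The per-zip-code summary and the outlier cut-off described above can be sketched in plain Python. This is a minimal illustration, not the code used in the analysis: the sample rows, the field names `zipcode` and `price`, and the helper functions are hypothetical. Only the 4 million threshold comes from the text, and the sketch assumes it applies to individual sale prices.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the values are made up for illustration.
sales = [
    {"zipcode": "98101", "price": 700_000},
    {"zipcode": "98101", "price": 900_000},
    {"zipcode": "98039", "price": 2_000_000},
    {"zipcode": "98039", "price": 5_000_000},
]

OUTLIER_THRESHOLD = 4_000_000  # cut-off stated in the text


def split_outliers(rows, threshold=OUTLIER_THRESHOLD):
    """Separate rows priced above the threshold from the rest."""
    kept = [row for row in rows if row["price"] <= threshold]
    outliers = [row for row in rows if row["price"] > threshold]
    return kept, outliers


def mean_price_by_zip(rows):
    """Return the mean sale price for each zip code."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["zipcode"]].append(row["price"])
    return {zipcode: mean(prices) for zipcode, prices in grouped.items()}


kept, outliers = split_outliers(sales)
means = mean_price_by_zip(kept)
for zipcode, value in sorted(means.items()):
    print(zipcode, value)
```

Removing the outliers before averaging keeps a single very expensive sale from dominating the mean of its zip code.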

Unfortunately, our customer's properties in the county’s center are not reflected in these prices. Note to the reader: the assignment description did not give the exact location of the stakeholder's properties. Because of that, I used the city of Seattle as the basis for the analysis.

Is it worth remodeling?

Understandably, the property and construction size correlate with the price: the bigger the property and its construction, the higher the market price. In our analysis we discovered other factors that also correlate with the price in the center of the county: current house condition and house grade (~0.63).

The house condition relates to the maintenance state of the house, e.g. whether it requires repairs and how much work is needed. The house grade relates to how well its construction meets the building code, ranging from not meeting it to a custom design.
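
A correlation such as the one reported for grade and price can be computed as follows. This is a hedged sketch: the text does not say which coefficient was used, so Pearson's r is an assumption, and the `grades` and `prices` lists are made-up values, not data from the King County dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical values for illustration only.
grades = [6, 7, 7, 8, 9, 10]
prices = [450_000, 520_000, 610_000, 580_000, 900_000, 870_000]

r = pearson(grades, prices)
print(f"correlation between grade and price: {r:.2f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none. Correlation alone does not show that raising a house's grade would raise its price.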

In our analysis we found that improving the house condition or grade did not have an impact on the price. This result is only valid for the center of the city of Seattle and does not represent the whole county. Therefore RCPA gave the recommendation: do not remodel. Sell as is.
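
One simple way to check whether a better condition goes along with a higher price is to compare the median price per condition level. The sketch below is an illustration under assumptions, not the method used in the analysis: the rows are invented, and the field names `condition` and `price` and the helper `median_price_by_level` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_price_by_level(rows, key):
    """Return the median price for each level of the given field."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row["price"])
    return {level: median(prices) for level, prices in sorted(grouped.items())}


# Hypothetical rows; the values are made up for illustration.
homes = [
    {"condition": 3, "price": 690_000},
    {"condition": 3, "price": 710_000},
    {"condition": 4, "price": 695_000},
    {"condition": 4, "price": 705_000},
    {"condition": 5, "price": 700_000},
]

medians = median_price_by_level(homes, "condition")
for level, value in medians.items():
    print(f"condition {level}: median price {value:,.0f}")
```

If the medians barely differ between levels, as in this invented sample, moving a house to a higher condition level is unlikely to pay for the renovation. The same helper works for the grade by passing a different `key`.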

Takeaways

  • The minimum price to sell is $675,000.

  • House condition and grade correlate with the price.

  • Do not remodel: it is not profitable in the city of Seattle.


The analysis is available in the EDA_Project_KC_Housesales GitHub repository. At the end of the analysis, I presented the results to the Neuefische team and other trainees. The presentation is available at: EDA_Project_KC_Housesales Presentation.