Data Analytics Project.
[Audio] Our data contains information about a sample of 1,000 Australian companies, and we are required to analyse these companies. The dataset has 9 columns, which are explained on the next slide through the data dictionary.
[Audio] The data dictionary describes each of the columns, with their value types and ranges. The 9 columns are: Revenue, Industry, Exp_other, Cust_Satisf, Foreign_Op, Chair_fem, Region, R&D exp, and Marketing exp.
[Audio] We have to clean the data before starting the analysis, and we do this using Power Query. We start by handling blank cells; 5 columns had blanks. We removed the records with a blank in the Industry column, as it is the main key identifier in the record. In the 3 columns Foreign_Op, R&D exp and Marketing exp, we replaced null values (blanks) with zeroes. As for the Cust_Satisf column, we replaced the blanks with the mean value of the column, which was calculated to be 7 according to the table shown.
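The three cleaning rules described on this slide can be sketched in plain Python. This is only an illustration of the logic, not the Power Query steps that were actually used. The sample rows are hypothetical, `None` stands for a blank cell, and rounding the mean to a whole number is an assumption.

```python
from statistics import mean

# Hypothetical sample rows (not the project data); None marks a blank cell.
rows = [
    {"Industry": 4, "Foreign_Op": None, "R&D exp": 10.0, "Marketing exp": None, "Cust_Satisf": 9},
    {"Industry": None, "Foreign_Op": 1, "R&D exp": 5.0, "Marketing exp": 2.0, "Cust_Satisf": 3},
    {"Industry": 9, "Foreign_Op": 0, "R&D exp": None, "Marketing exp": 4.0, "Cust_Satisf": None},
    {"Industry": 6, "Foreign_Op": 1, "R&D exp": 1.0, "Marketing exp": 1.0, "Cust_Satisf": 5},
]

# Columns whose blanks are replaced with zero.
ZERO_FILL = ("Foreign_Op", "R&D exp", "Marketing exp")


def clean(rows):
    """Return cleaned copies of the rows, following the three rules above."""
    # Rule 1: drop records whose Industry (the key identifier) is blank.
    kept = [dict(r) for r in rows if r["Industry"] is not None]

    # Rule 2: replace blanks with zero in the three listed columns.
    for r in kept:
        for col in ZERO_FILL:
            if r[col] is None:
                r[col] = 0

    # Rule 3: replace blank Cust_Satisf with the mean of the known values.
    # Assumption: the mean is taken after rule 1 and rounded to a whole number.
    known = [r["Cust_Satisf"] for r in kept if r["Cust_Satisf"] is not None]
    fill = round(mean(known))
    for r in kept:
        if r["Cust_Satisf"] is None:
            r["Cust_Satisf"] = fill
    return kept


cleaned = clean(rows)
```

Copying each row with `dict(r)` keeps the raw data untouched, so the cleaned and original versions can be compared afterwards.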
[Audio] Then we did further data cleaning on other columns. We normalized the Cust_Satisf column. Normalization is the process of scaling data so that all data points lie in the range 0 to 1, which brings all data points to a common scale. The mathematical formula for normalization is shown above. So instead of a range of 1 to 40, we now have a range of 0 to 1. We also added a Profit column: Profit = Revenue - Exp_other - R&D exp - Marketing exp. Finally, to make the Chair_fem column easier to interpret, we replaced the dummy values 0 and 1 with the labels male and female.
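The three transformations on this slide can be sketched as follows. The normalization uses the standard min-max formula, (x - min) / (max - min), which is assumed to be the one on the slide. Reading "Exp" as the Exp_other column and mapping 0 to male and 1 to female are also assumptions, and the sample values are hypothetical.

```python
def min_max(values):
    """Scale values to the range 0..1 using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def profit(row):
    """Profit = Revenue - Exp_other - R&D exp - Marketing exp."""
    return row["Revenue"] - row["Exp_other"] - row["R&D exp"] - row["Marketing exp"]


# Assumed mapping for the Chair_fem dummy variable.
CHAIR_LABELS = {0: "male", 1: "female"}

# Hypothetical satisfaction scores spanning the 1 to 40 range.
scores = [1, 7, 40]
normalized = min_max(scores)  # first value maps to 0.0, last to 1.0

# Hypothetical company record.
company = {"Revenue": 500.0, "Exp_other": 300.0, "R&D exp": 50.0,
           "Marketing exp": 25.0, "Chair_fem": 1}
company["Profit"] = profit(company)
company["Chair"] = CHAIR_LABELS[company["Chair_fem"]]
```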
[Audio] This is the cleaned data after loading it from Power Query.
[Audio] Now we start analyzing the data using pivot tables and charts. We plot each industry against its count. Industry 4 has the highest count and industry 9 has the lowest. The variation in counts across industries is not large: the largest count is 121 and the smallest is 102.
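A pivot table that counts records per industry amounts to a frequency count. The sketch below shows the idea with a short hypothetical list of industry codes, not the project data.

```python
from collections import Counter

# Hypothetical Industry column values.
industries = [4, 9, 4, 6, 4, 9, 6, 6, 4]

# Count how many records fall in each industry.
counts = Counter(industries)

# Industries with the highest and lowest counts.
top_industry, top_count = max(counts.items(), key=lambda kv: kv[1])
low_industry, low_count = min(counts.items(), key=lambda kv: kv[1])

# Spread between the largest and smallest count.
spread = top_count - low_count
```

Applied to the figures quoted on the slide, the spread would be 121 - 102 = 19.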
[Audio] We plot each industry against its sum of revenue. Industry 4 has the highest revenue and industry 6 has the lowest.
[Audio] We plot each industry against its sum of marketing expenses, R&D expenses and other expenses. Industry 4 has the highest other expenses, while industry 9 has the lowest. Industry 5 has the highest R&D expenses, while industry 9 has the lowest. Industry 9 has the highest marketing expenses, while industry 2 has the lowest. Other expenses carry a very large weight and should be minimized for larger profits.
[Audio] We plot each industry against its sum of profit. Industry 7 has the highest profit and industry 6 has the lowest. Industry 9 is very promising: although it had the second smallest revenue, it managed to have the second largest profit. This was due to its low other expenses and R&D costs, while it spent the most on marketing, so other industries should spend more on their marketing.
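The "sum of profit per industry" pivot can be sketched as a grouped sum followed by a ranking. The records below are hypothetical (industry, profit) pairs chosen only to show the mechanics.

```python
from collections import defaultdict

# Hypothetical (industry, profit) pairs.
records = [(7, 120.0), (6, -10.0), (9, 80.0), (7, 30.0), (6, 15.0), (9, 40.0)]

# Accumulate total profit per industry.
totals = defaultdict(float)
for industry, value in records:
    totals[industry] += value

# Rank industries from highest to lowest total profit.
ranking = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

most_profitable = ranking[0][0]
least_profitable = ranking[-1][0]
```

The same pattern gives the revenue and expense rankings by swapping the summed column.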
[Audio] We plot each industry against its average customer satisfaction. Industries 8 and 7 have the highest customer satisfaction and industry 6 has the lowest. We can see that industry 6 comes last in rating, revenue and profit, so we shouldn't invest more in this industry.
[Audio] We plot the count of male and female chairs per industry. Male chairs are dominant in all industries.
[Audio] We plot each region against its count of industries and its average normalized customer satisfaction. East and north have the largest number of industries, while northeast and offshore have the fewest. Customer satisfaction is nearly the same across regions with a lot of industries.
[Audio] We plot each region against its average profit. East and north have the largest sum of profit; as the number of industries in a region increases, its sum of profit increases. Southwest, northeast and north have the highest average profit.
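Sum and average can rank regions differently, because the sum grows with the number of records while the average does not. The sketch below illustrates this with hypothetical (region, profit) pairs, not the project data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (region, profit) pairs.
records = [("east", 50.0), ("east", 40.0), ("east", 30.0), ("southwest", 70.0)]

# Group profits by region.
by_region = defaultdict(list)
for region, value in records:
    by_region[region].append(value)

# Count, sum and average of profit for each region.
summary = {
    region: {"count": len(values), "sum": sum(values), "average": mean(values)}
    for region, values in by_region.items()
}

largest_sum = max(summary, key=lambda r: summary[r]["sum"])
largest_average = max(summary, key=lambda r: summary[r]["average"])
```

In this toy data the region with three records leads on sum, while the region with a single record leads on average.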
[Audio] We plot each region against its sum of revenue. As the number of industries in a region increases, its revenue increases. East and north have the largest revenue.
[Audio] We plot the sum of profit for each industry in each region. East, north and capital have the highest profits, and offshore has the lowest. Industry 9, the promising one, should receive more investment in regions like the capital, which has very low profit from industry 9.
[Audio] We plot each region against its average marketing expenses, R&D expenses and other expenses. Average other expenses are highest in the west and northwest regions and lowest in offshore and northeast. Average R&D expenses are highest in the northeast and northwest regions and lowest in offshore and south. Average marketing expenses are highest in the offshore and northeast regions and lowest in capital and east. Southwest, northeast and north have the highest average profit, so you should invest in more industries in southwest and northeast, as these regions currently have a limited number of industries.